Objective: Suicide is a critical medical and public health challenge, particularly among individuals with mental illnesses in safety-net hospitals. To uncover insights about suicidality embedded in unstructured clinical notes, we propose to annotate, analyze, and model a corpus of notes to support the understanding of suicidality and, ultimately, suicide prevention.
Methods: A multidisciplinary panel developed an annotation guideline to capture four key suicide-related factors: Suicidal Ideation (SI), Suicide Attempt (SA), Exposure to Suicide (ES), and Non-Suicidal Self-Injury (NSSI). We created an annotated corpus of 500 notes through a clinically validated annotation process and performed cohort analysis to characterize the distributions of demographics and suicide-related labels. A large language model was deployed for automatic classification.
Results: The annotated corpus was created with a Cohen's Kappa of 0.95 and was then de-identified for data sharing. Most notes (79.4%) contained one (34.4%) or more than one (45%) suicide-related label. The co-occurrence of SI and SA was the most frequent combination (35.6%), indicating substantial overlap between the two. The cohort had a mean age of 33.4 years; 51.7% were male and 75.8% were single. Prevalent stressors included unemployment (24.2%), homelessness (12.0%), limited healthcare access (5.4%), and legal challenges (5.0%). We identified four key issues that could be addressed to improve the documentation of suicidality: implicitness, conflicting information, ambiguity, and incomplete coverage of definitions. The baseline model achieved a micro-averaged F1 score of 0.70, a satisfactory starting point for multi-label classification.
Conclusion: The near-perfect inter-annotator agreement supports the reliability of the proposed annotation process and the quality of the data. Cohort analysis highlights the distribution of suicidality and insights into how it is documented. Data modeling demonstrates the potential of AI-powered methods to generate insights by mining large-scale clinical notes.
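For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how a micro-averaged F1 score over the four labels (SI, SA, ES, NSSI) and a Cohen's Kappa between two annotators can be computed. The toy notes, the function names, and the choice to compute Kappa on a single binary label are illustrative assumptions; the abstract does not describe how the authors computed either figure.

```python
from typing import List, Set

# The four suicide-related labels named in the abstract.
LABELS = ["SI", "SA", "ES", "NSSI"]


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all notes and all labels, then compute a single F1."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels both assigned
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(a: List[int], b: List[int]) -> float:
    """Cohen's Kappa for two annotators' decisions on the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical toy data (not from the study): label sets for four notes.
gold = [{"SI", "SA"}, {"SI"}, set(), {"NSSI"}]
pred = [{"SI"}, {"SI", "ES"}, set(), {"NSSI"}]
print(round(micro_f1(gold, pred), 2))  # 0.75

# Hypothetical annotator decisions (1 = label present) for one label.
annotator_a = [1, 1, 0, 0]
annotator_b = [1, 0, 0, 0]
print(round(cohen_kappa(annotator_a, annotator_b), 2))  # 0.5
```

Micro-averaging pools counts across labels, so frequent labels such as SI and SA weigh more heavily in the score than rare ones; a macro average would instead weight each label equally.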
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This research work was supported by the National Library of Medicine under award number R01-LM011934 at the National Institutes of Health and by the Cancer Prevention and Research Institute of Texas under award number RR230020.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was approved by the Institutional Review Board (IRB# HSC-SBMI-17-0354) at UTHealth. All clinical data were manually de-identified and are made available only through a data use agreement, in accordance with UTHealth rules.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes