INSIGHTFUL: Insight Generation through Clinical Annotation, Analysis, and Modeling of Suicide-Related Factors towards Understanding and Lifesaving

Abstract

Objective: Suicide is a critical medical and public health challenge, particularly among individuals with mental illnesses in safety-net hospitals. To uncover insights about suicidality embedded in unstructured clinical notes, we annotate, analyze, and model a clinical corpus to advance the understanding of suicidality and support lifesaving.

Methods: A multidisciplinary panel developed an annotation guideline to capture four key suicide-related factors: Suicidal Ideation (SI), Suicide Attempt (SA), Exposure to Suicide (ES), and Non-Suicidal Self-Injury (NSSI). We created an annotated corpus of 500 notes through a clinically validated annotation process and performed cohort analysis to characterize the demographic and suicidality distributions. A large language model was deployed for automatic classification.

Results: The annotated corpus was created with a Cohen's Kappa of 0.95 and was further de-identified for data sharing. Most notes (79.4%) contained either one (34.4%) or more than one (45.0%) suicide-related label. Co-occurrence of SI and SA was the most frequent combination (35.6%), indicating substantial overlap between the two factors. The cohort had a mean age of 33.4 years and was 51.7% male and 75.8% single. Prevalent stressors included unemployment (24.2%), homelessness (12.0%), limited healthcare access (5.4%), and legal challenges (5.0%). We identified four key insights for improving the documentation of suicidality: implicitness, conflicting information, ambiguity, and incomplete definition coverage. The baseline model achieved a micro-averaged F1 score of 0.70, indicating satisfactory performance in multi-label classification.

Conclusion: The near-perfect inter-annotator agreement underscores the reliability of the proposed annotation process and the quality of the data. Cohort analysis highlights how suicidality is distributed and documented. Data modeling demonstrates the potential of AI-powered methods to generate insights by mining large-scale clinical notes.
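For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how a micro-averaged F1 score for multi-label predictions and Cohen's Kappa for two annotators can be computed. It is not the authors' evaluation code. The function names, the representation of each note as a set of labels, and the choice to compute Kappa over a single sequence of categorical decisions are all assumptions made for illustration; the abstract does not specify these details.

```python
from typing import Hashable, Sequence, Set

# The four factors named in the abstract; the set-of-labels encoding is an assumption.
LABELS = ("SI", "SA", "ES", "NSSI")


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all notes and labels."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels both assigned and correct
        fp += len(p - g)   # labels predicted but not in the gold annotation
        fn += len(g - p)   # gold labels the model missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's Kappa for two annotators' categorical decisions on the same items."""
    a, b = list(a), list(b)
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("both annotators must label the same non-empty set of items")
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if expected == 1.0:
        return 1.0  # both annotators used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Toy data, invented purely for illustration (not drawn from the corpus).
gold = [{"SI", "SA"}, {"SI"}, set(), {"NSSI"}]
pred = [{"SI"}, {"SI", "ES"}, set(), {"NSSI"}]
print(micro_f1(gold, pred))
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

Micro-averaging pools counts across all labels before computing F1, so frequent labels such as SI and SA weigh more heavily on the score than rarer ones such as ES.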

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research work was supported by the National Library of Medicine under award number R01-LM011934 at the National Institutes of Health and by the Cancer Prevention and Research Institute of Texas under award number RR230020.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study was approved by the Institutional Review Board (IRB# HSC-SBMI-17-0354) at UTHealth. All clinical data were manually de-identified and are made available only through a data use agreement, in accordance with UTHealth rules.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
