AI-based Prediction of Imminent Primary Stroke on Claims Data Enables Accurate Patient Stratification

Abstract

Background With an annual rate of 5.5 million cases, ischemic stroke is the second leading cause of death and permanent disability worldwide, posing a significant medical, financial and social burden. Current approaches fold high-risk profiles of imminent stroke into mid- to long-term risk assessment, diminishing the importance of immediate preventive action. Claims data may support the development of new risk prediction paradigms for better, individualized disease management.
Methods We developed a data-driven paradigm to predict the personalized risk of imminent primary ischemic stroke. We used social health insurance data from northeast Germany (2008-2018). Stroke events were defined by the presence of an ischemic stroke ICD-10 diagnosis within the available insurance period. Controls (n=150,091) and stroke cases (n=53,047) were matched by age (mean=76 years) and insurance length (mean=3 years), resulting in a generally elderly, high-risk study population. We trained traditional and machine learning (ML) classifiers to predict the overall likelihood of a primary event based on 55 features, including demographic parameters, ICD-10 disease diagnoses and dependence on care. Binary ICD-10 features were converted into diagnosis durations by counting the days since the disease first appeared in the patient's records. We used SHAP feature importance scores for global and local explanations of model output.
Findings The best ML model, tree boosting, achieved high performance, with an area under the receiver operating characteristic curve (AUC) of 0.91, a sensitivity of 0.84 and a specificity of 0.81. Long durations of hypertension, dyslipidemia and type 2 diabetes were the most influential predictors of stroke, while frequent dependence on care was associated with lower stroke risk.
Interpretation Our data-driven ML approach offers a highly promising direction for improved, personalized prevention and management of imminent stroke, and the developed models are directly applicable to risk stratification in the northeast German population.
Funding Horizon2020 (PRECISE4Q, #777107)
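The duration encoding described in Methods (binary ICD-10 features turned into days since a disease first appeared) can be illustrated with a minimal sketch. It assumes a hypothetical record layout of (ICD-10 code, diagnosis date) pairs per patient, a per-patient index date, an illustrative code-to-feature mapping (`FEATURE_PREFIXES`, hypothetical) and 0 as the value for diseases never recorded; none of these details are specified in the abstract, so this is not the authors' implementation.

```python
from datetime import date

# Hypothetical mapping from ICD-10 code prefixes to feature names; the
# study's actual 55-feature definition is not given in the abstract.
FEATURE_PREFIXES = {
    "hypertension": ("I10", "I11", "I12", "I13", "I15"),
    "dyslipidemia": ("E78",),
    "diabetes_type2": ("E11",),
}


def duration_features(diagnoses, index_date):
    """Turn one patient's dated ICD-10 diagnoses into duration features.

    diagnoses: iterable of (icd10_code, diagnosis_date) tuples.
    index_date: reference date (assumed: stroke date for cases, a matched
        reference date for controls).
    Returns {feature: days from first diagnosis to index_date}, with 0 when
    the disease never appears on or before index_date (assumed encoding).
    """
    first_seen = {}
    for code, when in diagnoses:
        if when > index_date:
            continue  # skip diagnoses after the index date (no look-ahead)
        for feature, prefixes in FEATURE_PREFIXES.items():
            # str.startswith accepts a tuple of candidate prefixes
            if code.startswith(prefixes):
                if feature not in first_seen or when < first_seen[feature]:
                    first_seen[feature] = when
    return {
        feature: (index_date - first_seen[feature]).days if feature in first_seen else 0
        for feature in FEATURE_PREFIXES
    }


if __name__ == "__main__":
    records = [
        ("I10", date(2012, 3, 1)),
        ("I10", date(2014, 6, 1)),    # later repeat; first appearance wins
        ("E11.9", date(2015, 1, 1)),
        ("E78.0", date(2019, 1, 1)),  # after the index date, ignored
    ]
    print(duration_features(records, date(2016, 1, 1)))
    # {'hypertension': 1401, 'dyslipidemia': 0, 'diabetes_type2': 365}
```

Encoding absent diseases as 0 days keeps every feature numeric for tree-based models, but it conflates "never diagnosed" with "diagnosed on the index date"; a separate binary indicator is one alternative if that distinction matters.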

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work received funding from the European Commission through a Horizon2020 grant (PRECISE4Q Grant No. 777107, coordinator: DF) and from the German Federal Ministry of Education and Research through a Go-Bio grant (PREDICTioN2020 Grant No. 031B0154, lead: DF).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

No human or animal studies were conducted by the authors for this article. All ethical guidelines for secondary data analysis apply and were followed. The STROBE statement is available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2020496/ and a description of reporting standards for secondary data in Germany (authored by co-authors of this article) is available at https://www.thieme-connect.de/products/ejournals/pdf/10.1055/s-0042-108647.pdf

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

Models produced in the present study are available upon reasonable request to the authors.
