Predicting stroke severity of patients using interpretable machine learning algorithms

Stroke, or cerebrovascular accident (CVA), occurs in two primary forms: ischemic stroke, caused by blocked blood flow, and hemorrhagic stroke, caused by blood vessel rupture [1,2,3]. Ischemic strokes disrupt oxygen delivery, leading to cerebral infarction and neuronal loss [4]. Timely diagnosis and treatment are critical to restoring blood flow and enabling nerve recovery [5]; delays can result in irreversible damage [4]. Hemorrhagic strokes pose significant mortality risks, with rates of 10–20% in developed countries and up to 50% in developing nations [6].

Stroke is a considerable health concern [7, 8], ranking as the second leading cause of death and the third leading cause of combined death and disability [9]. Moreover, it imposes a considerable financial burden, demanding costly, protracted, and intricate medical interventions [10]. Consequently, there is a pressing need for swift and effective preventive measures, with a particular emphasis on addressing the primary risk factors [11].

A systematic review has highlighted the top ten risk factors associated with stroke: high systolic blood pressure, elevated body mass index, increased fasting glucose levels, exposure to particulate matter pollution, smoking, a diet low in fruits, kidney dysfunction, elevated Low-Density Lipoprotein (LDL) levels, household air pollution from solid fuel use, and a sodium-rich diet [9]. Notably, stroke prevalence is significantly higher in the elderly population, particularly among individuals over 65, underscoring age as a major risk factor for CVA [12, 13]. Moreover, there is evidence that a country's economic and income conditions influence stroke incidence, with a documented rise in stroke cases, especially in low- and middle-income nations [9, 11, 14].

Early consideration of stroke severity is crucial when assessing future clinical outcomes. Stroke severity is a vital indicator for various significant outcomes, including mortality, duration of hospitalization, discharge destination, and functional recovery [15]. Stroke assessment scales fall into two main categories: diagnostic and impairment scales [16].

The Rapid Arterial Occlusion Evaluation (RACE) and the National Institutes of Health Stroke Scale (NIHSS) are two diagnostic scales renowned for their exceptional accuracy in identifying Large Vessel Occlusion (LVO) cases [17]. The RACE scale, tailored for prehospital emergencies, is the first validated tool for diagnosing acute stroke and LVO [18, 19]. Its scores range from 0 to 9, where 0 signifies a normal state and 9 indicates severe obstruction, and the score can be used to estimate the likelihood of LVO: a cut-off of 5 or higher yields 85% sensitivity and 69% specificity for detecting LVO, while a lower cut-off maintains 89% sensitivity at the cost of reduced specificity (55%). The NIHSS, part of a comprehensive set of scales for measuring stroke-related impairments, is adept at evaluating stroke effects in acute settings, although it was primarily designed for research and clinical trial applications rather than widespread bedside assessment [16, 20, 21]. It is essential to recognize that each scale has its strengths and limitations, with no universally acknowledged gold standard among them in research studies [16].
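To make these operating characteristics concrete, the following sketch (illustrative Python, with made-up scores and labels rather than data from the cited validation studies) shows how sensitivity and specificity would be computed for a RACE cut-off of 5:

```python
import numpy as np

def evaluate_race_cutoff(race_scores, lvo_labels, threshold=5):
    """Evaluate a RACE cut-off as a binary LVO screen.

    race_scores: RACE values (0-9)
    lvo_labels:  ground-truth LVO labels (1 = LVO present)
    """
    race_scores = np.asarray(race_scores)
    lvo_labels = np.asarray(lvo_labels)
    predicted = race_scores >= threshold          # flag suspected LVO
    tp = np.sum(predicted & (lvo_labels == 1))    # true positives
    fn = np.sum(~predicted & (lvo_labels == 1))   # false negatives
    tn = np.sum(~predicted & (lvo_labels == 0))   # true negatives
    fp = np.sum(predicted & (lvo_labels == 0))    # false positives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical illustration only -- not real patient data.
scores = [2, 7, 5, 1, 8, 4, 6, 0, 9, 3]
labels = [0, 1, 1, 0, 1, 0, 1, 0, 1, 0]
sens, spec = evaluate_race_cutoff(scores, labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```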

In recent years, Artificial Intelligence (AI), including Machine Learning (ML) and Deep Learning (DL), has gained significant traction in healthcare, particularly for stroke prediction and diagnosis [22]. ML methods such as Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF) have become popular due to their ability to handle structured data, their interpretability, and their relatively fast processing times [23]. The main advantage of traditional ML models is their simplicity and explainability, which makes them suitable for clinical applications; however, they may require manual feature engineering and often struggle with large unstructured datasets [24].

DL methods, especially Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have revolutionized medical imaging and time-series data analysis [25]. CNNs have shown exceptional performance in analyzing complex medical images such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scans, where feature extraction is automated. RNNs, particularly suited to temporal data such as electronic health records, can learn sequential patterns that are difficult to capture with traditional models [23]. The main advantage of DL models is their ability to handle large volumes of high-dimensional data and detect complex patterns. However, these models require vast amounts of training data, are computationally expensive, and are often criticized for their black-box nature, making them difficult to interpret for clinical decision-making [26]. In the medical field, transparency and interpretability are critical, which is why explainable AI (XAI) methods such as SHapley Additive exPlanations (SHAP) are now being integrated with ML and DL models to ensure that clinicians can understand the basis of a model's predictions [27, 28].
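As a rough illustration of how such an integration looks in practice, the following sketch (assuming the shap and scikit-learn packages, with synthetic placeholder data and hypothetical feature names in place of clinical variables) fits a tree ensemble and summarizes which features drive its predictions:

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder tabular data standing in for clinical features.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
feature_names = [f"feature_{i}" for i in range(8)]  # hypothetical names

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# For a binary classifier, keep the attributions for the positive class
# (a list element or a trailing axis, depending on the shap version).
sv = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]

# Global view: which features drive the model's predictions overall.
shap.summary_plot(sv, X, feature_names=feature_names)
```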

Given these considerations, the use of AI in cerebrovascular disorders, such as stroke, holds promise for early detection and severity prediction [27]. However, for AI models to be clinically relevant, they must address several challenges, such as class imbalances in datasets, generalizability across healthcare systems, and the need for model interpretability [24, 29, 30]. Generalization issues can arise when AI models trained on one dataset perform poorly on other datasets due to differences in population, data collection practices, or healthcare systems [31]. Handling imbalanced datasets, common in stroke prediction where certain severity levels may be underrepresented, requires advanced techniques like resampling or specialized algorithms to ensure accurate model performance [32]. Furthermore, clinician trust is paramount, and models perceived as "black boxes" are less likely to be adopted in practice [33]. Addressing these issues is essential for AI's reliable integration into clinical workflows [29].
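For instance, the following sketch (assuming the imbalanced-learn package, with synthetic placeholder data) shows how SMOTE-style resampling rebalances a skewed class distribution before model training:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Placeholder data with a 9:1 class imbalance, mimicking an
# underrepresented severity level.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y))

# SMOTE synthesizes new minority-class samples by interpolating
# between existing minority neighbours (apply to training data only,
# never to the held-out test set).
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))
```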

Author contributions

ASA, AN, and JBM were involved in the conception and design of this study. TS, GT, BR, ZH, SN, and HLA prepared the datasets, and ASA performed the analysis. ASA, AN, UKW, SN, JBM, and HLA interpreted the results. ASA, TS, and GT drafted the manuscript, and all authors (ASA, TS, GT, AN, BR, ZH, UKW, SN, JBM, and HLA) contributed to writing the final draft of the manuscript.

The following are the main contributions of this study, addressing some of the research gaps mentioned earlier:

The study introduces the first stroke severity prediction approach employing the RACE and NIHSS scales across two hospitals in the West Azerbaijan province of Iran.

The study evaluates various ML models with hyperparameter tuning and addresses class imbalance.

The study pioneers the application of explainable ML methods, including SHAP and decision-rule extraction, to uncover the decision-making process of the top-performing RF model in predicting stroke severity, enhancing transparency and ensuring the model is clinically interpretable (a minimal sketch of rule extraction follows this list).
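As a minimal sketch of what decision-rule extraction can look like, the snippet below uses scikit-learn's export_text on one tree of a fitted forest, with synthetic placeholder data and hypothetical feature names; the study's own extraction procedure may differ:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

# Placeholder data and hypothetical feature names.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
feature_names = [f"feature_{i}" for i in range(6)]

rf = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)
rf.fit(X, y)

# A random forest has no single rule set, but each constituent tree
# can be rendered as human-readable if/else rules.
rules = export_text(rf.estimators_[0], feature_names=feature_names)
print(rules)
```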

Literature review

In recent years, ML techniques have emerged as powerful tools in predicting stroke outcomes, with numerous studies highlighting their efficacy. Stroke prediction models have become increasingly sophisticated, driven by advancements in ML algorithms and data availability.

Su et al. [24] developed ML models such as SVM, RF, and the Light Gradient Boosting Machine (LGBM) using stroke registry data from the Chang Gung Healthcare System. Their study focused on predicting modified Rankin Scale (mRS) outcomes and in-hospital deterioration. Notably, RF excelled in both predictions, whereas deep neural networks (NNs) outperformed other models in predicting in-hospital deterioration, particularly when resampling was not applied. This highlights the potential of DL for data with highly complex characteristics.

Wu et al. [30] took a different approach, focusing on predicting stroke risk among elderly individuals using imbalanced datasets. Applying balancing techniques, such as the Synthetic Minority Oversampling Technique (SMOTE), significantly improved the performance of ML models like RF, SVM, and Ridge Logistic Regression (RLR). Their results emphasized the importance of addressing data imbalance, a critical challenge in medical datasets, to achieve robust model predictions.

Similarly, Zhang et al. [31] explored stroke prediction in elderly surgical patients, developing models incorporating data balancing and imputation techniques. Among the seven ML models evaluated, Extreme Gradient Boosting (XGBoost) emerged as the top performer, benefitting from data balancing techniques that optimized model performance. The study illustrated the significance of preprocessing techniques, mainly when working with medical datasets prone to missing values and class imbalances.
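A minimal sketch of chaining imputation and balancing ahead of a boosted model is shown below (assuming scikit-learn and imbalanced-learn, with synthetic placeholder data; plain gradient boosting stands in for XGBoost to keep the example dependency-light):

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score

# Placeholder imbalanced data with some values knocked out,
# mimicking a medical dataset prone to missingness.
X, y = make_classification(n_samples=800, weights=[0.85, 0.15], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # ~5% missing values

# Impute first, then oversample, then fit -- all inside one pipeline
# so resampling happens only on the training folds of each CV split.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("smote", SMOTE(random_state=0)),
    ("model", GradientBoostingClassifier(random_state=0)),
])
print(cross_val_score(pipe, X, y, cv=5, scoring="f1").mean())
```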

Kogan et al. [32] extended the application of ML to Electronic Health Records (EHR), where they aimed to estimate NIHSS scores for stroke patients using Natural Language Processing (NLP). Their RF model demonstrated a strong correlation between NLP-extracted NIHSS scores and clinical assessments, emphasizing the potential of integrating ML with unstructured clinical data for accurate stroke severity prediction.

Cui et al. [33] further validated the use of ML for predicting acute ischemic stroke and neurological impairment severity in patients with Anterior Circulation Large Vessel Occlusions (AC-LVO). This study compared four ML models (RF, SVM, RLR, and Logistic Regression (LR)), with SVM and RLR outperforming the others. Their work underscores the utility of combining different models to achieve optimal predictive performance for acute stroke conditions.

Regarding imaging-based stroke severity prediction, Faust et al. [34] developed a classification system using post-stroke MRI data. They applied various SVM classifiers and found that the SVM with a Radial Basis Function (RBF) kernel achieved superior accuracy, specificity, and sensitivity. This work highlights the importance of selecting appropriate kernels and parameters when applying ML models to imaging data, particularly in stroke.
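A minimal sketch of such kernel and parameter selection follows, assuming scikit-learn and synthetic placeholder data rather than the MRI features used by Faust et al.:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Scaling matters for RBF kernels, whose distance computations
# are sensitive to feature magnitudes.
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],  # ignored by the linear kernel
}

# Cross-validated grid search over kernels and their parameters.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)
```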

Further, Yu et al. [35] focused on real-time stroke severity classification using NIHSS features, employing the C4.5 DT algorithm. This model demonstrated the highest recall and precision, suggesting that DTs can offer high interpretability alongside predictive accuracy, which is crucial for clinical application.

Someeh et al. [36] took a different angle by utilizing a Multilayer Perceptron (MLP) on a decade-long dataset, demonstrating high accuracy rates (81–85%) in stroke prediction. Their study pointed out that MLP models can be particularly effective when trained on longitudinal data, providing insights into long-term stroke risks.

Zhu et al. [37] focused on mortality prediction in stroke patients using a dataset of over 7,000 individuals. Their ML models achieved the highest reported accuracy for this task, pinpointing demographic and clinical factors, such as age, BMI, and marital status, as key predictors of mortality. This work emphasized the role of patient demographics in predicting stroke outcomes, a critical aspect for improving tailored care strategies.

Kokkotis et al. [38] employed a dataset of over 43,000 subjects to investigate ten key stroke risk factors. Their comparative analysis of ML classifiers revealed that the MLP performed best in reducing false negatives, essential for minimizing misdiagnoses in clinical settings. The study underscores the importance of balancing predictive performance with the need for low false-negative rates in critical health conditions like stroke.

Moreover, Dritsas and Trigka [28] advanced stroke prediction by proposing a stacking classifier framework, achieving an impressive Area Under the Curve (AUC) of 98.9%. Their work highlighted the potential of ensemble learning techniques to enhance predictive accuracy, a key consideration for future stroke prediction models.
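A minimal sketch of a stacking classifier in scikit-learn is shown below (synthetic placeholder data; the base and meta-learners are illustrative choices, not necessarily those of Dritsas and Trigka):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Base learners produce out-of-fold predictions that a meta-learner
# (here logistic regression) combines into the final prediction.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
print(cross_val_score(stack, X, y, cv=5, scoring="roc_auc").mean())
```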

JM and P [39] emphasized stroke's growing risk in younger populations due to unhealthy diets and highlighted the need for early detection. They developed an ML model to enhance stroke prediction, applying feature-selection techniques based on gradient boosting and RF. Their evaluation of classifiers, including DT, SVM, LR, Gradient Boosting (GB), RF, K-Nearest Neighbor (KNN), and XGBoost, showed RF achieving the highest accuracy at 98%, demonstrating its strong predictive ability.

Finally, Hassan et al. [40] tackled the issue of imbalanced datasets, using three imputation techniques and SMOTE to enhance model performance. Their Dense Stacking Ensemble (DSE) model achieved over 96% accuracy, illustrating the potential of ensemble methods for handling complex stroke datasets. They identified age, BMI, and glucose levels as crucial for early stroke detection.

A summary of the reviewed literature is provided in Table A1 in Appendix A. While many studies have demonstrated the utility of ML for stroke prediction, few have focused explicitly on predicting stroke severity using the RACE or NIHSS scales. The work of Su et al. [24] and Kogan et al. [32] highlights the promise of ML in predicting patient outcomes and stroke severity; however, a clear gap exists in multi-center studies that combine these two widely recognized scales. Our study addresses this gap by being the first to develop a stroke severity prediction model using both the RACE and NIHSS scales, applying seven ML algorithms (KNN, DT, RF, AdaBoost, XGBoost, SVM, and Artificial Neural Network (ANN)) across two different hospitals. Furthermore, SHAP is used to enhance model interpretability, a crucial step towards increasing clinician trust in ML-driven decision-making.
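As a schematic of this kind of multi-model benchmarking (synthetic placeholder data; XGBoost and the ANN are omitted to keep the sketch dependency-free, and no hyperparameter tuning is shown):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Compare candidate models under the same cross-validation protocol.
models = {
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "SVM": SVC(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```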

Table 1 General information of the IKTH and IRH datasets
