In this study, the XGBoost model was identified as the optimal predictive model. The study aimed to determine the factors that lead to poor functional outcomes 3 months after AIS in young patients, and to compare the predictive performance of ML algorithms with that of logistic regression. The main findings of this study were:
In 2268 young patients, poor functional outcome was significantly associated with a high mRS score at admission, living alone, and a high NIHSS score at discharge.
Several ML models outperformed logistic regression, with XGBoost performing best.
The lifelong impact of stroke in young adults is associated with significant costs for patients themselves, their families, and society. The long-term medical, psychosocial, and socioeconomic consequences are particularly severe at younger ages [23]. Therefore, there is a need to identify risk factors and to develop and validate predictive scores for post-AIS outcomes. Recently, many ML models have been designed to predict adverse outcomes using algorithms that can learn from large amounts of complex data. In a recent study, the RF method using a combination of Random Under-Sampling (RUS) and biomarkers was found to be the best stroke prediction model in Chinese adult patients with hypertension [24]. A multidisciplinary study of atherosclerosis found that, of nine predictive tests, RF was the best model for predicting cardiovascular disease risk, including AIS [25].
In addition, results from the China Longitudinal Health and Longevity Study show that regularized logistic regression (RLR) combined with the Synthetic Minority Over-sampling Technique (SMOTE) was superior to the other models tested in predicting stroke in the elderly [22]. Likewise, Hao et al. showed that a deep neural network model could improve the prediction of long-term outcomes in 2604 AIS patients aged 66.2 ± 12.6 years [9].
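To make the resampling idea concrete, the following is a minimal sketch of SMOTE-style over-sampling, written with the Python standard library only. It is not the implementation used in the cited study: the function name `smote`, its parameters, and the toy data are hypothetical and chosen only for illustration. The sketch shows the core step, in which each synthetic minority-class point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority-class points by interpolation.

    Illustrative sketch only; `minority` is a list of equal-length tuples.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position between base and neighbour.
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples with two numeric features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote(minority, n_new=6)
print(len(new_points), "synthetic points generated")
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class. Resampling of this kind would normally be applied to the training split only, so that the evaluation data keep their original class balance.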
Our study shows that the XGBoost model has good discrimination (AUC = 0.81) and is better than the other algorithms at predicting poor functional outcomes at 3 months in young AIS patients, followed by RF, LightGBM, and GBDT. Of these, XGBoost, RF, and LightGBM outperformed logistic regression. Choosing the right ML model is critical for optimizing disease prediction. Various ML models have already been developed to predict clinical outcomes after stroke in both general and elderly patients. Chen et al. reported that the CatBoost algorithm had the best predictive performance compared with logistic regression and other ML models, and found that gender, age, stroke history, heart rate, D-dimer, creatinine, TOAST classification, mRS at admission and discharge, and NIHSS score at discharge predicted poor outcomes at 90 days in patients with TIA [14]. In addition, Xiang et al. [26] showed that the RF model predicted 6-month outcomes of Chinese AIS patients better than the Houston intra-arterial therapy (HIAT) score, the total health risks in vascular events (THRIVE) score, and the NIHSS score on admission, age, previous diabetes mellitus and creatinine (NADE) nomogram. That study found that NIHSS at admission, age, premorbid mRS, fasting glucose, and creatinine were significant predictors. Moreover, Xio et al. showed that the XGB model is a reliable predictive model, and that hypertension, cancer, congestive heart failure, chronic lung disease, and peripheral vascular disease may be closely associated with stroke in elderly patients [27]. However, how well different types of ML predict poor functional outcomes in young patients remains unclear.
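For readers less familiar with the discrimination metric used in these comparisons, the sketch below computes the AUC directly from its rank-based definition: the probability that a randomly chosen patient with a poor outcome receives a higher predicted risk than a randomly chosen patient with a good outcome, with ties counted as one half. The function name `roc_auc`, the two "models", and all numbers are hypothetical toy values, not data from this study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win
    (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = poor outcome, 0 = good outcome.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.1, 0.7]  # predicted risks, model A
model_b = [0.6, 0.4, 0.5, 0.5, 0.3, 0.7, 0.2, 0.6]  # predicted risks, model B

auc_a = roc_auc(y_true, model_a)
auc_b = roc_auc(y_true, model_b)
print(f"model A AUC = {auc_a:.4f}, model B AUC = {auc_b:.4f}")
# prints: model A AUC = 0.9375, model B AUC = 0.6562
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; a sort-based computation would be preferable for large cohorts.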
Feature selection with ML showed that a high mRS score at admission, a high NIHSS score at discharge, and living alone remained independent predictors of poor 3-month outcomes in young patients with AIS. The NIHSS and mRS are quantitative tools used to assess the degree of neurological impairment in patients with AIS efficiently and effectively. In addition, these neurological severity scores are closely related to the patient’s brain necrosis volume, location, type, perfusion, and injury [28, 29]. Our results on living alone are consistent with those of Waje-Andreassen et al., who found that living alone was a predictor of long-term mortality in 232 young stroke patients [30]. Similarly, in the Riks-Stroke-based study, living alone was an independent predictor of short-term mortality after stroke [31], and a recent study suggests that stroke severity is associated with living alone [32]. Mathew et al. showed that individuals living alone at home were much less likely to arrive at the hospital early than those living with others, and that this delay resulted in a much lower thrombolysis treatment rate [33]. The Swedish stroke registry also showed that treatment rates were ≈ 50% lower in patients living alone [34], which may explain the association between living alone and poor functional outcomes after stroke in young patients. Moreover, other studies have demonstrated that living alone can be considered a proxy for low social support, and for coronary heart disease, biological processes such as inflammatory and prothrombotic disorders, and mental disorders [35].
Our findings show that the XGBoost model can better predict the risk of poor 3-month functional outcomes in young patients with AIS. These results are similar to those of Chung et al., who suggested that the XGBoost model has reliable predictive power for AIS and demonstrated the validity of the model in patients receiving various AIS treatments [15]. In addition, Yuan et al. showed that the XGBoost model performs better in predicting 90-day readmission risk in AIS patients [36]. XGBoost is an ensemble learning method based on gradient boosting that is efficient, flexible, and portable. It can handle large-scale datasets, high-dimensional features, and complex tasks, often outperforming many other ML algorithms, and its built-in regularization helps prevent overfitting and improves the model's generalization ability. The XGBoost classification method may be more suitable for clinical predictive analysis than other ML techniques because it builds on classification and regression trees, allowing it to handle diverse, complex, and nonlinear relationships (such as multiple cases and medical conditions). The potential of ML to significantly improve health care by automating routine processes and improving clinical decision-making is considerable [37]. The future is likely to be characterized by augmented intelligence, in which computers become indispensable tools that allow physicians to spend more time on patient care [38].
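The way gradient boosting and regularization interact can be illustrated with a deliberately small sketch. The code below is not the XGBoost library and not the pipeline used in this study; it is a toy, standard-library-only version of the same idea, under simplifying assumptions: every tree is a single split (a stump), and there is no pruning or subsampling. At each round it computes the first and second derivatives of the logistic loss, chooses the split with the largest regularized gain, and sets each leaf weight to -G / (H + lambda), where G and H are the sums of those derivatives in the leaf and lambda is the L2 penalty. All function names, parameter values, and data are hypothetical; the two toy features stand in for an admission mRS score and a discharge NIHSS score.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, grad, hess, lam):
    """Return the single split with the highest L2-regularized gain."""
    def score(g, h):
        return g * g / (h + lam)

    G, H = sum(grad), sum(hess)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for f in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][f])
        gl = hl = 0.0
        for pos in range(len(order) - 1):
            i = order[pos]
            gl += grad[i]
            hl += hess[i]
            nxt = X[order[pos + 1]][f]
            if X[i][f] == nxt:
                continue  # cannot split between two equal values
            gr, hr = G - gl, H - hl
            gain = 0.5 * (score(gl, hl) + score(gr, hr) - score(G, H))
            if best is None or gain > best[0]:
                thr = (X[i][f] + nxt) / 2.0
                best = (gain, f, thr, -gl / (hl + lam), -gr / (hr + lam))
    return best


def fit_boosted_stumps(X, y, n_rounds=20, eta=0.3, lam=1.0):
    """Fit an additive model of stumps to the logistic loss."""
    stumps = []
    margin = [0.0] * len(X)  # raw scores (log-odds), starting at 0
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margin]
        grad = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log loss
        hess = [pi * (1.0 - pi) for pi in p]      # second derivative of log loss
        stump = fit_stump(X, grad, hess, lam)
        if stump is None:
            break  # no feature offers a valid split
        _, f, thr, wl, wr = stump
        # Shrink the leaf weights by the learning rate before adding the stump.
        stumps.append((f, thr, eta * wl, eta * wr))
        margin = [m + (eta * wl if row[f] < thr else eta * wr)
                  for m, row in zip(margin, X)]
    return stumps


def predict_proba(stumps, row):
    """Predicted probability of the positive class for one sample."""
    margin = sum(wl if row[f] < thr else wr for f, thr, wl, wr in stumps)
    return sigmoid(margin)


# Hypothetical toy data: (admission mRS, discharge NIHSS); 1 = poor outcome.
X = [(0, 1), (1, 2), (1, 0), (2, 3), (3, 8), (4, 6), (4, 10), (5, 12)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

stumps = fit_boosted_stumps(X, y)
print(f"predicted risk for (5, 11): {predict_proba(stumps, (5, 11)):.3f}")
```

Increasing `lam` shrinks every leaf weight toward zero, and decreasing `eta` makes each round contribute less; both make the fitted model more conservative, which is the sense in which regularization limits overfitting. The XGBoost library adds deeper trees, a penalty on the number of leaves, and row and column subsampling, none of which are shown in this sketch.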
In the future, the XGBoost model could be made accessible via an online web page or integrated into clinical decision support systems (CDSS), allowing clinicians to use it conveniently in their daily work. Additionally, providing clear expectations to patients and their families can help them better understand the illness and actively participate in treatment and rehabilitation. Our prediction model will require further validation in prospective studies to confirm its effectiveness. We believe that, with additional research and validation, the XGBoost model has the potential to be widely applied in clinical practice, improving the treatment and prognosis of young stroke patients.
Our study has several strengths. First, it achieves good prediction with a small number of variables; the simpler a model is, the easier it is to validate. Second, the predictors used in our study were comprehensive and included demographic, lifestyle, and clinical variables, which allowed us to examine the relationship between risk factors and stroke from multiple perspectives. Third, the data used in this study came from a large Chinese cohort with high-quality data representing AIS patients in China.
Our study also has some limitations. First, some values were missing, although all missing-value rates were < 5%. We used imputation to fill in missing laboratory data, and no statistical differences were observed between the data before and after imputation. Second, this study did not include genomic or imaging data, which may have limited its predictive power. Third, external validation is absent; it will be conducted in an independent external cohort in the future.