Comprehensive Sepsis Risk Prediction in Leukemia Using a Random Forest Model and Restricted Cubic Spline Analysis

Introduction

Sepsis is a life-threatening condition that significantly increases mortality in leukemia patients.1 Due to the immunosuppressed state of leukemia patients, they are particularly vulnerable to infections, which can rapidly escalate to sepsis.2–4 Early identification of sepsis is critical for improving outcomes, yet accurately predicting sepsis risk remains challenging.5

Although several biomarkers and clinical indicators have been associated with sepsis, there is a need for a reliable, data-driven predictive model that can identify patients at high risk for sepsis.2 This challenge arises due to the multifactorial nature of sepsis and the intricate immune dysfunctions seen in leukemia.6 Leukemic blasts can significantly impair immune responses by inhibiting T cell ligands (such as PD-L1 and Gal-9) and diminishing the cytotoxic activity of NK cells, which increases vulnerability to infections.7 Furthermore, treatments like chemotherapy and hematopoietic stem cell transplantation further elevate the risk of severe infections.8 As a result, identifying more sensitive biomarkers is critical to improving sepsis prediction accuracy in leukemia patients.

In recent years, machine learning has offered new insights for the comprehensive analysis of biomarkers, holding promise for the development of more sensitive and specific predictive models to aid clinicians in identifying high-risk patients and optimizing treatment strategies.9,10 These methods are now widely used in clinical practice to facilitate early disease detection and timely prevention.9 However, their application in leukemia remains limited. For example, the Epic sepsis model, a proprietary tool for predicting sepsis, has been adopted by several hospitals across the United States.11 Despite its widespread use, the model’s sensitivity is relatively low, and its predictive performance has been suboptimal compared to current clinical practices.

This study aims to develop a predictive model using machine learning techniques to assess sepsis risk in leukemia patients. By incorporating clinical, demographic, and laboratory data, we seek to identify key predictors of sepsis and design a model that can be effectively applied in clinical settings to enhance the timely detection and management of this severe complication. Because each algorithm has distinct strengths and limitations, we compared multiple machine learning models to identify the most clinically feasible and effective approach for sepsis prediction, prioritizing sensitivity to ensure timely identification of high-risk cases in this vulnerable population.

Materials and Methods

Data Source

A total of 5872 participants with leukemia were retrospectively enrolled from the Affiliated Hospital of Guangdong Medical University between January 2005 and June 2024. The diagnoses were made based on the criteria of the World Health Organization (WHO), the Society for Hematopathology, and the European Association for Hematopathology. Patients were excluded if more than 20% of their data were missing or if they had a prior malignancy. After applying these criteria, 4310 eligible leukemia patients were included (Figure 1). The data were divided into training (70%) and validation (30%) cohorts so that the predictive model could be developed on one set and tested on another, balancing the learning of complex patterns against the need to detect overfitting. Ethical approval for the study was obtained from the Institutional Review Board of the Affiliated Hospital of Guangdong Medical University (Approval No. PJKT2024-211). This study complies with the Declaration of Helsinki.

Figure 1 Flow diagram of patient selection.

Data and Variables

The extracted variables included demographic and clinical characteristics of the patients: age, weight, marital status, hypertension, coronary heart disease (CHD), diabetes, smoking, alcohol use, infection history, hepatitis, family cancer history, family history, gender, ABO blood group, duration of hospital stay, discharge status, basic vital signs, and laboratory parameters. Blood and biochemical test results were collected on the first day of admission. In cases where multiple test results were available for a specific variable, the first measurement was used in the analysis.

Model Construction and Validation

Univariate logistic analysis, Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis, and Boruta algorithm were employed to determine the potential risk factors in the training dataset. In the univariate logistic analysis, variables with p < 0.05 were considered potential biomarkers. We chose LASSO regression analysis due to its ability to impose penalties on variables, helping to reduce the likelihood of overfitting. To determine the most predictive variables, we employed 10-fold cross-validation, focusing on those that minimized the cross-validated error. Boruta is a feature selection method based on random forests that determines the importance of each variable by comparing its Z-score with that of its “shadow” counterparts. During the algorithm’s execution, all real features were duplicated and randomly shuffled to generate Z-scores. If a real feature’s Z-score consistently exceeded the maximum Z-score of the shadow features across multiple independent tests, it was deemed important and included in subsequent machine learning model construction. The identical variables obtained by the above three methods were subsequently included in the multivariate logistic regression analysis to determine the final model. A variance inflation factor (VIF) of ≤5 indicated no collinearity among the variables in the final model.
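The collinearity check described above can be illustrated with a minimal sketch. It relies on the identity that the VIF of predictor j, 1 / (1 - R_j^2), equals the j-th diagonal entry of the inverse correlation matrix. The function names and the toy columns are assumptions made for illustration; they are not the study's code or data, and the original analysis was run in R.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long numeric columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vif(columns):
    """Variance inflation factor of each column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Toy predictors (made-up values, not patient data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]     # strongly correlated with x1
x3 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]  # only weakly correlated with x1 and x2
vifs = vif([x1, x2, x3])
flagged = [j for j, v in enumerate(vifs) if v > 5]  # columns exceeding the VIF <= 5 rule
```

A VIF close to 1 means a predictor is nearly uncorrelated with the others, which is why an average VIF near 1 is read as absence of multicollinearity.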

The important variables were incorporated into seven different machine learning algorithms for model construction, including logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), decision tree (DCtree), random forest (RF), extreme gradient boosting (XGBoost), and artificial neural network (NNET). To prevent overfitting, we employed a 10-fold cross-validation strategy. The dataset was split into ten folds, with nine folds used for training and one fold used for validation, cycling through all folds. Grid search was used for hyperparameter tuning, optimizing parameters such as the number of trees (n-estimators), maximum tree depth (max-depth), and minimum samples per leaf (min-samples-leaf). The optimal configuration was selected based on the highest AUC and F1 score on the validation folds.
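The cross-validated grid search described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `k_fold_indices`, `grid_search_cv`, and `toy_score` are hypothetical names, the folds are not stratified by outcome, the grid values are invented, and a single score stands in for the AUC and F1 criteria mentioned in the text.

```python
import random
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(param_grid, fit_and_score, n_samples, k=10, seed=0):
    """Return (best_params, best_mean_score) over all parameter combinations.

    fit_and_score(params, train_idx, valid_idx) is a user-supplied callable that
    trains a model on train_idx and returns a score (for example AUC) on valid_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, valid_idx in enumerate(folds):
            # Nine folds train the model, the remaining fold validates it.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(fit_and_score(params, train_idx, valid_idx))
        score = mean(scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params, train_idx, valid_idx):
    """Stand-in for real model fitting: a made-up score surface peaking at depth 5, 300 trees."""
    return (0.7 - 0.01 * abs(params["max_depth"] - 5)
            - 0.0001 * abs(params["n_estimators"] - 300))


# Hypothetical grid; the values are illustrative only.
grid = {"n_estimators": [100, 300, 500], "max_depth": [3, 5, 7], "min_samples_leaf": [1, 5]}
best_params, best_score = grid_search_cv(grid, toy_score, n_samples=3017)
```

In practice `fit_and_score` would fit the chosen algorithm on the training folds and compute the validation metric; keeping it as a callable lets the same loop serve all seven model families.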

The area under the receiver operating characteristic (ROC) curve (AUC) was used as the primary measure for selecting the best-performing model. Model performance was further assessed using sensitivity, specificity, recall, accuracy, and F1 score. Decision curve analysis (DCA) was conducted to evaluate the clinical utility of the models. The best-performing model was subsequently used for further interpretive analysis. After training the model on the training cohort, all model parameters were fixed, and the model’s performance was further evaluated using the validation cohort.
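The threshold-based metrics and the net benefit used in DCA can be computed directly from confusion-matrix counts. The sketch below uses the standard definitions, with net benefit at threshold probability p_t given by TP/n - (FP/n) * p_t / (1 - p_t). The function names and the eight toy observations are assumptions for illustration only. Sensitivity and recall are the same quantity, so one value serves both.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at a probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity (recall), specificity, accuracy, and F1 score.

    Assumes both classes are present; otherwise a denominator would be zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    n = tp + fp + tn + fn
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy labels and predicted probabilities (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
metrics = classification_metrics(y_true, y_prob)
nb = net_benefit(y_true, y_prob, threshold=0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the "treat all" and "treat none" strategies.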

To better understand the decision-making process of the best-performing model, Shapley additive explanations (SHAP) were employed. Based on the Shapley values of cooperative game theory, SHAP allocates the model’s output among the individual input features. SHAP also reveals feature importance and visually displays the direction and magnitude of each feature’s contribution to the predictive outcome, thereby clarifying the model’s decision-making process. Possible nonlinear relationships between the important variables and sepsis were examined using a logistic regression model with restricted cubic splines (RCS). In addition, we explored the interactions between predictors to gain insight into their potential relationships.
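The Shapley allocation underlying SHAP can be made concrete with a small exact computation. The sketch below enumerates every feature subset, which is feasible only for a handful of features; practical SHAP implementations rely on faster model-specific or sampling-based algorithms. The feature names `x1`, `x2`, `x3` and the payoff function `toy_value` are hypothetical and unrelated to the study's data.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function value(frozenset)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(subset):
    """Made-up model output: x1 and x2 contribute alone and interact; x3 does nothing."""
    v = 0.0
    if "x1" in subset:
        v += 2.0
    if "x2" in subset:
        v += 1.0
    if "x1" in subset and "x2" in subset:
        v += 4.0
    return v


phi = shapley_values(["x1", "x2", "x3"], toy_value)
# The contributions sum to value(all features) - value(no features).
total_attribution = sum(phi.values())
```

In this toy case the interaction term is split equally between `x1` and `x2`, and the irrelevant feature `x3` receives zero, which illustrates why the attributions can be read as a fair allocation of the prediction.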

Statistical Analysis

Missing data (<20% of total) were imputed using the Multiple Imputation (MI) method. Additionally, to assess the impact of missing data handling on the model, we performed sensitivity analysis. Continuous variables were expressed as medians and interquartile ranges (IQR), while categorical variables were expressed as total numbers and percentages. Chi-square (χ2) tests, Wilcoxon rank-sum tests, or Fisher’s exact tests were used as appropriate. Propensity score matching (PSM) was performed using 1:1 optimal pair matching to minimize the influence of confounding factors and enhance the validity of our findings. The matching was considered optimal when the sum of the absolute pairwise distances in the matched sample was minimized. The analyses were performed using R software (version 4.2.2). Statistical significance was determined by a two-tailed P value of less than 0.05.
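The optimality criterion for 1:1 pair matching stated above (minimizing the sum of absolute pairwise distances) can be illustrated with a brute-force sketch. It assumes the distance is the absolute difference in propensity scores and enumerates all assignments, which is practical only for a few patients; real analyses use dedicated optimization routines. The function name and the scores are hypothetical.

```python
from itertools import permutations


def optimal_pair_matching(treated, controls):
    """1:1 matching without replacement that minimizes the total absolute distance.

    Returns (pairs, total), where pairs is a list of (treated_index, control_index).
    Brute force: only suitable for very small samples.
    """
    best_total, best_assignment = float("inf"), None
    for assignment in permutations(range(len(controls)), len(treated)):
        total = sum(abs(t - controls[c]) for t, c in zip(treated, assignment))
        if total < best_total:
            best_total, best_assignment = total, assignment
    return list(enumerate(best_assignment)), best_total


# Made-up propensity scores for two sepsis patients and three candidate controls.
treated_ps = [0.40, 0.50]
control_ps = [0.45, 0.10, 0.90]
pairs, total = optimal_pair_matching(treated_ps, control_ps)
```

Here a greedy nearest-neighbor pass would pair the first patient with the control at 0.45 and leave the second with a distant match, for a total distance of 0.45; the optimal solution instead accepts a slightly worse match for the first patient and reaches a total of about 0.35.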

Results

Comparative Analysis of Clinical Profiles in Leukemia Patients With and Without Sepsis

Based on the inclusion and exclusion criteria, a total of 4310 patients with leukemia were enrolled and randomly assigned to the training (n=3017) and validation (n=1293) cohorts in a ratio of 7:3. Baseline characteristics were evenly distributed between the two cohorts (Table 1). Of the 3017 leukemia patients in the training cohort, 917 experienced sepsis.

Table 1 Patient Demographics and Baseline Characteristics in the Training Cohort and Validation Cohort

To minimize potential confounding factors, PSM was performed on variables such as age, weight, marital status, hypertension, CHD, diabetes, smoking, alcohol, infection history, hepatitis, family cancer history, family history, gender, and ABO blood, achieving a balanced distribution of these selected characteristics between the groups. Supplementary Table 1 compares the baseline characteristics before and after matching. Post-PSM, no statistically significant differences were observed between the groups for the matched variables (P > 0.05), indicating that the matching process effectively reduced confounding bias.

The detailed demographic and baseline clinical characteristics of the PSM-adjusted training cohort are summarized in Table 2, with 917 cases each in the sepsis and non-sepsis groups. The analysis revealed significant differences in various clinical parameters between the two groups, particularly in hospitalization days, pulse rate, heart rate, respiratory rate, diastolic pressure, systolic pressure, and several hematological indices. Patients with sepsis experienced longer hospital stays and exhibited elevated pulse rates, heart rates, respiratory rates, white blood cell counts, and lower hemoglobin levels compared to the non-sepsis group. Additionally, liver function tests, including ALT, AST, and GGT, demonstrated significant disparities between the groups. Biochemical markers also indicated lower albumin levels and higher creatinine levels in the sepsis cohort. Furthermore, lipid profiles showed lower HDL cholesterol and higher LDL cholesterol in the sepsis group. Overall, the sepsis group displayed more severe pathological conditions across most physiological and laboratory characteristics. These findings suggest distinct clinical profiles between sepsis and non-sepsis patients, which may facilitate early identification and management of individuals with sepsis.

Table 2 Patient Demographics and Baseline Characteristics in the Training Cohort

Identifying Key Predictors of Sepsis Risk in Leukemia Patients

To identify key variables associated with sepsis occurrence in leukemia patients, univariate logistic analysis, Boruta algorithm, and LASSO regression analysis were employed for feature selection.

Univariate logistic regression analysis revealed several variables significantly associated with sepsis risk, including urine indicators such as Urine URO (p < 0.001), Urine BIL (p < 0.001), and Urine OB (p = 0.013). Physiological metrics such as temperature (p < 0.001), pulse rate (p < 0.001), diastolic pressure (DBP, p < 0.001), and systolic pressure (SBP, p < 0.001) were also significant. Hematological indices, including white blood cell count (WBC, p < 0.001), red blood cell count (RBC, p < 0.001), hemoglobin (Hb, p < 0.001), and C-reactive protein (CRP, p < 0.001), showed notable differences. Additionally, mean platelet volume (MPV, p = 0.029) and the activated partial thromboplastin time ratio (APTTR, p = 0.020) were significant. These results may provide crucial clinical insights for the early identification and management of sepsis patients (Supplementary Table 2). For the Boruta algorithm, Figure 2A illustrates the Z-scores of each variable, showing how this step concentrated the model on the most relevant features. A total of 45 variables were identified as important, including vital signs such as Temp, PR, HR, RR, DBP, and SBP; hematological indices including WBC, RBC, Lymph, Mono, Neut, Hct, Eos, Baso, and Hb; liver function tests like ALT and AST; biochemical markers such as ALB and TP; lipid profiles including HDL.C and LPa; and coagulation factors like Fbg and CRP. This grouping underscores the multifaceted nature of sepsis risk assessment and indicates the strong explanatory power of these variables in predicting sepsis occurrence. In the LASSO regression analysis, the optimal tuning parameter λ, determined through 10-fold cross-validation, was 0.032, based on the one-standard-error criterion. At this λ value, the following variables were identified as risk factors for sepsis: Urine URO, Temperature, Pulse Rate, SBP, RBC, Lymph, Neut, TP, TBA, ChE, TT, CRP, and PCT (Figure 2B and C).
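The one-standard-error criterion mentioned for LASSO can be sketched as a simple selection rule: among the candidate λ values, take the largest one whose mean cross-validated error lies within one standard error of the minimum (the rule exposed as `lambda.1se` in R's glmnet package; whether that package was used here is not stated). The function name and the error values below are invented for illustration and are not the study's cross-validation results.

```python
def lambda_one_se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min has the lowest mean CV error; lambda_1se is the largest lambda whose
    mean CV error does not exceed that minimum plus its standard error.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[i_min] + cv_se[i_min]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= limit]
    return lambdas[i_min], max(eligible)


# Made-up cross-validation curve (error per candidate lambda and its standard error).
lambdas = [0.001, 0.004, 0.016, 0.064, 0.256]
cv_mean = [1.30, 1.22, 1.20, 1.23, 1.40]
cv_se = [0.04, 0.04, 0.04, 0.04, 0.04]
lam_min, lam_1se = lambda_one_se(lambdas, cv_mean, cv_se)
```

Choosing the larger λ applies a stronger penalty and therefore retains fewer variables than the error-minimizing λ, trading a statistically negligible loss of fit for a sparser model.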

Figure 2 Feature selection process for variables included in the prediction model. (A) Variable importance based on the Boruta algorithm, where attributes are classified as “Tentative” (green), “Confirmed” (blue), “Rejected” (red), and “NA” (grey). (B) LASSO regression path showing the coefficients of variables across different values of the regularization parameter (λ). (C) Cross-validation error plot for selecting the optimal λ in LASSO. The vertical dashed line represents the optimal λ where the minimal cross-validation error is achieved. (D) Venn diagram comparing variables selected by three different methods: univariate logistic regression (green), LASSO (blue), and the Boruta algorithm (red), showing the overlap of selected variables.

Following the identification of 13 common biomarkers through the three selection methods (Figure 2D), we incorporated these variables into a multivariate logistic regression analysis to determine the most reliable predictors of sepsis risk. The analysis indicated that SBP, RBC, Lymph, Neut, TBA, TT, CRP, and PCT emerged as significant independent risk factors for sepsis in leukemia patients (Table 3). To ensure the robustness of our model, we conducted a multicollinearity analysis, which yielded an average VIF of 1.038. This result suggests that there is no significant multicollinearity among the selected variables, thereby reinforcing the reliability of our findings (Supplementary Table 3). These variables were incorporated as primary predictors in the subsequent model construction and analysis, providing a solid foundation for developing predictive models.

Table 3 Multivariate Logistic Regression Analysis for Sepsis in the Training Cohort

The RF Model Outperforms Other Machine Learning Models in Predicting Sepsis in Leukemia Patients

The performance of seven machine learning models in predicting sepsis in leukemia patients was compared, with their ROC curves and DCA results presented in Figure 3. In the training cohort, the SVM model achieved an AUC of 0.972. The AUC values for the other models were as follows: RF at 0.765, XGBoost at 0.762, NNET at 0.727, LOGI at 0.722, and DCtree at 0.685, while the KNN model performed the poorest with an AUC of 0.665 (Figure 3A). DCA, which assessed the clinical utility of the models, showed that the RF model provided the highest net benefit across most threshold ranges, particularly within the intermediate range (Figure 3B). Considering discrimination and net benefit together, the RF model exhibited the best overall performance and was therefore chosen as the best model. To further evaluate the model’s performance, results from the validation cohort are depicted in Figure 3C. The RF model achieved an AUC of 0.700 in the validation cohort, indicating that it generalized to new data. Similarly, the net benefit of the RF model in the validation cohort, shown in Figure 3D, followed trends similar to those observed in the training cohort, supporting its predictive capability for sepsis among leukemia patients across both cohorts.

Figure 3 Predictive performance of the model in training and validation cohorts. (A and C) Receiver Operating Characteristic (ROC) curves for the training (A) and validation (C) cohorts. The curves illustrate the discriminatory ability of different predictive models, with the area under the ROC curve (AUC) values displayed for each model: Random Forest (RF), XGBoost (XGB), Support Vector Machine (SVM), Neural Network (NNET), K-Nearest Neighbors (KNN), Logistic Regression (LOGI), and Decision Tree (DCTree). (B and D) Decision Curve Analysis (DCA) for the training (B) and validation (D) cohorts. The curves show the net benefit of each model across various threshold probabilities, comparing them with the “Treat All” and “Treat None” strategies, indicating their potential clinical utility.

Detailed performance metrics for each model, including sensitivity, specificity, recall, accuracy, and F1 score, are presented in Table 4. The RF model demonstrated the highest performance, achieving an accuracy of 0.710, a sensitivity of 0.727, a specificity of 0.694, and an F1 score of 0.715, indicating its ability to differentiate between sepsis and non-sepsis patients. The XGBoost model maintained balanced metrics, with an accuracy of 0.689. The SVM model followed closely, achieving an accuracy of 0.686, with a sensitivity of 0.710 but a lower specificity of 0.661. The NNET model showed moderate performance, with an accuracy of 0.675 and a sensitivity of 0.623, while the KNN model had the lowest overall performance, with an accuracy of 0.593 and a sensitivity of 0.541. The logistic regression model exhibited an accuracy of 0.658, with balanced sensitivity and specificity. The DCtree model reached a specificity of 0.792 but had a low sensitivity of 0.508, leading to an overall accuracy of 0.650. In the validation cohort, the performance metrics exhibited similar trends. The RF model achieved an accuracy of 0.758, with a sensitivity of 0.655 and a specificity of 0.714 (Supplementary Table 4). We also performed sensitivity analyses to assess the impact of missing data handling on the model (Supplementary Table 5). The results obtained without imputation were consistent with those obtained using the MI method, supporting the robustness of the chosen imputation method.

Table 4 Comparison of Training Cohort Results of the Machine Learning Models

Overall, the RF model consistently outperformed other machine learning models in both predictive accuracy and clinical utility, making it the most reliable tool for predicting sepsis in leukemia patients across multiple datasets.

SHAP Analysis Quantifies Feature Contributions to Sepsis Risk in Leukemia Patients

SHAP analysis was employed to interpret the predictions of the best-performing model, the RF model. The analysis highlighted the impact of eight key features on sepsis prediction in leukemia patients, ranked by SHAP values (Figure 4A). Figure 4B provides a detailed visualization of the influence of each feature, with the SHAP values showing how both high and low levels of these variables contribute to the prediction. CRP demonstrates the strongest predictive power, with elevated levels significantly increasing the risk of sepsis. PCT follows closely, contributing substantially to the model, indicating that higher levels of PCT are also associated with an increased risk. Immune-related markers, such as Neut and Lymph, further play important roles in the prediction, where lower neutrophil and lymphocyte counts are linked to a greater sepsis risk. TT, RBC, and TBA also have notable impacts, with abnormal values correlating with heightened risk. Lastly, SBP adds to the prediction, where lower blood pressure suggests a higher likelihood of sepsis, possibly reflecting cardiovascular instability. This analysis highlights the critical roles of inflammatory, immune, and coagulation markers in predicting sepsis risk.

Figure 4 SHAP (SHapley Additive exPlanations) analysis of the model. (A) Bar plot showing the mean SHAP values for the top features ranked by their contribution to sepsis prediction in leukemia patients. Features include C-reactive protein (CRP), procalcitonin (PCT), neutrophil count (Neut), lymphocyte count (Lymph), thrombin time (TT), red blood cell count (RBC), total bile acid (TBA), and systolic blood pressure (SBP). Higher mean SHAP values indicate greater importance of the feature in the model. (B) SHAP summary plot visualizing the distribution of SHAP values for each feature. Each dot represents an individual data point, with the x-axis showing the SHAP value (feature’s impact on the model output) and the color representing the feature value (yellow for higher values, purple for lower values).

Nonlinear Relationship Between Model Factors and Sepsis Risk in Leukemia Patients and Interactions Between Model Factors

The RCS analysis, as shown in Figure 5A–H, visualizes the nonlinear relationships between sepsis risk and various continuous clinical variables in leukemia patients. We used standardized odds ratios (ORs) to compare the relative impact of each variable on sepsis risk (Supplementary Table 6). The results show that CRP levels above 24.068 mg/L significantly increase the risk of sepsis, with a standardized OR of 3.22 (95% CI: 2.58, 4.03, p<0.001), while lower CRP levels (<24.068 mg/L) are associated with a decreased sepsis risk (OR=0.85, 95% CI: 0.74, 0.98, p=0.029). Additionally, lower Lymph and Neut values (Lymph <5, Neut <5) are associated with a reduced sepsis risk, with standardized ORs of 0.68 (95% CI: 0.61, 0.77, p<0.001) and 0.76 (95% CI: 0.67, 0.86, p<0.001), respectively. Elevated PCT levels (≥0.324 μg/L) are also significantly associated with a higher risk of sepsis (OR=4.91, 95% CI: 2.77, 8.71, p<0.001). Similarly, higher TBA levels (≥6.012 μmol/L) increase the risk of sepsis (OR=1.37, 95% CI: 1.18, 1.59, p<0.001), while lower TT values (<17 seconds) reduce the risk (OR=0.83, 95% CI: 0.71, 0.96, p=0.015). Furthermore, RBC counts above 2.48 × 10^6/μL are associated with a significantly decreased risk of sepsis (OR=0.68, 95% CI: 0.60, 0.77, p<0.001). Lastly, lower SBP (<128.63 mmHg) is associated with a decreased risk of sepsis (OR=0.68, 95% CI: 0.61, 0.75, p<0.001).
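For readers unfamiliar with restricted cubic splines, the sketch below builds the commonly used truncated-power basis (Harrell's parameterization): with k knots it yields the linear term plus k - 2 nonlinear terms, and the fitted curve is constrained to be linear beyond the outer knots. The function name, the number of knots, and the knot locations are assumptions for illustration; the paper does not report the knot configuration used.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k >= 3 knots. The nonlinear terms are
    scaled by (t_k - t_1)^2 so they are on the same scale as x.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2
    gap = t[-1] - t[-2]
    basis = [x]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[-2]) * (t[-1] - t[j]) / gap
                + cube_plus(x - t[-1]) * (t[-2] - t[j]) / gap)
        basis.append(term / scale)
    return basis


# Hypothetical knots on an arbitrary measurement scale.
knots = [1.0, 2.0, 4.0, 7.0]
row_inside = rcs_basis(3.0, knots)   # between knots: nonlinear terms are active
row_below = rcs_basis(0.5, knots)    # below the first knot: only the linear term remains
```

Each patient's value is expanded into such a row, and the rows enter the logistic regression as ordinary covariates; a test of the coefficients on the nonlinear terms then assesses departure from linearity.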

Figure 5 Restricted cubic spline (RCS) analysis of the continuous variables included in the model. (A–H) The RCS curves illustrate the nonlinear relationships between the risk of sepsis in leukemia patients and the following variables: C-reactive protein (CRP), Procalcitonin (PCT), Neutrophil count (Neut), Lymphocyte count (Lymph), Thrombin time (TT), Red blood cell count (RBC), Total bile acid (TBA), and Systolic blood pressure (SBP).

In addition to the nonlinear associations, pairwise correlation analysis among the eight variables included in the final model revealed how the predictors relate to one another (Supplementary Figure 1). CRP and PCT show a significant positive correlation (r = 0.231, p < 0.001), as do Neut and Lymph (r = 0.153, p < 0.001). TT exhibits a moderate positive correlation with TBA (r = 0.159, p < 0.001) and weak but significant positive correlations with Neut (r = 0.100, p < 0.001), Lymph (r = 0.092, p < 0.001), and RBC (r = 0.104, p < 0.001). TBA shows a positive correlation with Lymph (r = 0.057, p < 0.05). SBP exhibits negative correlations with PCT (r = −0.066, p < 0.01) and TBA (r = −0.053, p < 0.05), while showing weak but significant positive correlations with Neut (r = 0.048, p < 0.05) and RBC (r = 0.124, p < 0.001). RBC and CRP exhibit a weak negative correlation (r = −0.077, p = 0.001).

Discussion

In this study, we compared multiple machine learning models to evaluate their performance in predicting sepsis in leukemia patients. Based on a comprehensive analysis of multiple metrics, including ROC, DCA, accuracy, sensitivity, specificity, recall, and F1 score, the RF model was selected for its superior overall performance and its ability to effectively balance predictive accuracy and clinical utility.12 These metrics underscore RF’s ability to balance sensitivity and specificity, minimizing missed sepsis cases while maintaining predictive accuracy. Key features, including CRP, PCT, Neut, Lymph, TT, RBC, TBA, and SBP, were identified as significant predictors of sepsis through SHAP analysis. These findings highlight the importance of inflammatory, immune, and coagulation markers in predicting sepsis risk in leukemia patients. The nonlinear relationships between these variables and sepsis risk, as revealed by RCS analysis, provide valuable insights into how changes in these parameters affect patient outcomes.

Previous studies on sepsis diagnosis have identified some biomarkers, including CRP, PCT, RBC, neutrophil-to-lymphocyte ratio (NLR), and prothrombin time (PT).13,14 PCT and CRP are among the most widely used, showing good sensitivity for sepsis screening and monitoring, making them valuable tools for early diagnosis.13,15 PCT, while more specific for bacterial infections, can also be influenced by other factors, reducing its utility in some contexts. Similarly, while CRP is a common marker of systemic inflammation, its elevation in various pathological states limits its effectiveness as a sepsis biomarker.16,17 These challenges are particularly pronounced in leukemia patients, where severe immunosuppression can blunt inflammatory responses, potentially masking sepsis-related elevations in these biomarkers.18,19 This does not negate their relevance in sepsis, as these markers are commonly associated with disseminated intravascular coagulation (DIC) and other coagulopathies in septic patients.20 The significant positive correlation between CRP and PCT (r = 0.231, p < 0.001) aligns with their roles as systemic inflammatory markers. Both are widely used in sepsis detection, and their correlation underscores their complementary contribution to the inflammatory response in septic patients.21 Interestingly, the weak negative correlation between RBC and CRP (r = −0.077, p = 0.001) suggests a potential inverse relationship between systemic inflammation and oxygen-carrying capacity. This finding may reflect the anemia of inflammation commonly observed in sepsis patients, where elevated inflammatory markers are associated with reduced RBC counts.22,23

For inflammatory and immune-related predictors, our final model includes Neut and Lymph counts, both of which provide valuable insights into the systemic inflammatory response and immune status of leukemia patients. NLR, while conceptually relevant, was not explicitly included as a separate predictor because its components (Neut and Lymph) are directly incorporated in the model.24 This approach ensures that the underlying inflammatory dynamics are captured while avoiding redundancy. NLR is a marker of acute inflammation, especially useful for predicting bacterial infections leading to sepsis.25,26 However, as sepsis progresses, most patients quickly exhibit signs of severe immune suppression.27 In leukemia patients, who experience secondary bone marrow suppression and severely reduced leukocyte production,28 NLR may not accurately reflect their true immune status, limiting its reliability in this population. Similarly, the positive correlation between Neut and Lymph (r = 0.153, p < 0.001) highlights the interplay between different components of the immune system, although the suppressed immune response often observed in leukemia may moderate this relationship.29

The protective effects of higher RBC counts and SBP further emphasize the role of cardiovascular stability and oxygen transport in preventing sepsis.30,31 SBP also showed weak but significant positive correlations with Neut (r = 0.048, p < 0.05) and RBC (r = 0.124, p < 0.001), highlighting the interplay between immune response and cardiovascular function.32 Conversely, the negative correlations observed between SBP and PCT (r = −0.066, p < 0.01) and SBP and TBA (r = −0.053, p < 0.05) reflect the hemodynamic instability and liver impairment characteristic of sepsis progression.33 Elevated TT indicates impaired clotting efficiency, which is common in sepsis due to DIC.34 This prolonged TT may reflect the consumption of clotting factors and fibrinolysis, contributing to the higher risk of sepsis-related complications in these patients.20 Higher TBA levels may reflect liver dysfunction, a common complication in sepsis, where impaired bile acid metabolism leads to accumulation in the bloodstream. This can further contribute to systemic inflammation and immune dysregulation, exacerbating the severity of sepsis.35,36 TBA’s positive correlation with Lymph (r = 0.057, p < 0.05) reinforces its role as a marker of liver dysfunction and its downstream effects on immune regulation. TT exhibited a moderate positive correlation with TBA (r = 0.159, p < 0.001), suggesting a potential link between coagulation dysfunction and liver impairment in sepsis.37 This connection is well-documented in the pathophysiology of sepsis, where DIC and hepatic dysfunction are common complications.38 Additionally, TT showed weaker but significant positive correlations with Neut (r = 0.100, p < 0.001), Lymph (r = 0.092, p < 0.001), and RBC (r = 0.104, p < 0.001). 
These findings further emphasize the multifaceted impact of coagulation abnormalities on immune cells and oxygen transport capacity in septic patients.39 During model development, we considered a wide range of potential predictors, including coagulation markers such as D-dimer and PT. However, after rigorous feature selection using SHAP and other statistical methods, these coagulation markers were not retained in the final model due to their relatively lower contributions to predictive performance within our dataset. Although these markers are known to play a role in the coagulopathy associated with sepsis, their signal in this specific leukemia cohort may have been overshadowed by other variables.

In this study, RF demonstrated clear advantages over traditional methods like logistic regression in predicting sepsis risk among leukemia patients. While logistic regression is valued for its simplicity and interpretability, its reliance on linear assumptions and limited ability to capture complex interactions among predictors constrained its performance.40 By contrast, RF achieved higher accuracy, recall, and F1 score due to its ability to model nonlinear relationships and interactions. Furthermore, the use of SHAP analysis addressed RF’s interpretability limitations, providing meaningful insights into the importance of individual predictors, thereby enhancing its clinical utility.41 Although individual biomarkers may contribute to sepsis prediction, their accuracy is often limited.42 The underlying mechanisms of sepsis in leukemia are complex, involving multiple biological pathways and immune responses, making it difficult for a single biomarker to capture the entire pathological process.43,44 SHAP analysis identified eight key predictors of sepsis in leukemia patients. Together, these variables provide a holistic view of sepsis risk by capturing inflammation, immune function, coagulation abnormalities, and hemodynamic stability. For example, CRP and PCT are established inflammatory biomarkers, while Neut and Lymph reflect immune status, which is critical in immunosuppressed leukemia patients. TT highlights coagulopathy, a hallmark of sepsis, and TBA reflects liver dysfunction, a frequent complication. This comprehensive approach ensures that the model’s predictions are grounded in clinically relevant markers, supporting its potential utility in risk stratification and management.
Combining multiple biomarkers offers a more comprehensive view of the dynamic changes in immune function, inflammatory response, and pathogen activity, thereby enhancing the accuracy of early sepsis detection and risk assessment.45 Recently, combined biomarker models have emerged as a key area of research for sepsis prediction, and their integration into machine learning models has shown improved sensitivity and specificity.46
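For readers less familiar with the metrics used to compare RF and logistic regression, the following minimal sketch shows how accuracy, recall, precision, and F1 score follow from confusion-matrix counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for two classifiers evaluated on the same 200 patients
# (50 septic, 150 non-septic). These are illustrative, not study data.
model_a = classification_metrics(tp=40, fp=10, fn=10, tn=140)
model_b = classification_metrics(tp=30, fp=10, fn=20, tn=140)

for name, metrics in (("model A", model_a), ("model B", model_b)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Recall matters particularly for sepsis screening because a false negative corresponds to a missed septic patient; in the hypothetical counts above, the two classifiers share the same number of false positives but differ in how many septic patients they miss.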

The eight biomarkers selected in this study are easily accessible in clinical settings and reflect various aspects of inflammation, immune function, and pathological changes, providing a more accurate assessment of sepsis risk in leukemia patients than single biomarkers. Their combined use significantly improved the model’s predictive accuracy, offering clinicians better tools for identifying high-risk patients.

This study has limitations. The data were sourced from a single medical center, which may introduce regional bias and limit the generalizability of the model. Additionally, the retrospective design may introduce inherent biases, although we sought to reduce their impact through multiple imputation and cross-validation. Nonetheless, external validation using multi-center data from diverse geographical regions is essential to ensure the robustness and broader applicability of the model. Collaborative efforts with other institutions are underway to address these limitations and improve the model’s clinical utility across varied patient populations.
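As a rough illustration of the cross-validation mentioned above, the sketch below builds stratified k-fold splits so that each fold keeps roughly the same proportion of septic and non-septic patients. The function name `stratified_k_fold`, the choice of five folds, the random seed, and the label counts are assumptions made for this example; the number of folds and the stratification scheme are not specified here, and multiple imputation is not shown.

```python
import random


def stratified_k_fold(labels, k, seed=0):
    """Return k (train_indices, test_indices) pairs with class balance preserved."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Shuffle within each class, then deal indices round-robin into the folds.
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


# Hypothetical cohort: 10 septic (1) and 40 non-septic (0) patients.
labels = [1] * 10 + [0] * 40
splits = stratified_k_fold(labels, k=5)

for fold_number, (train, test) in enumerate(splits, start=1):
    positives = sum(labels[i] for i in test)
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test, {positives} septic in test")
```

Each patient appears in exactly one test fold, so every performance estimate comes from data the model did not see during fitting. This guards against overfitting within a single center but, as noted above, does not replace external validation.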

Conclusion

The RF model demonstrated strong predictive performance for sepsis in leukemia patients, using eight key biomarkers. This model provides a valuable tool for early sepsis detection, enhancing clinical decision-making in high-risk patients.

Declaration of Figures Authenticity

All figures submitted have been created by the authors who confirm that the images are original with no duplication and have not been previously published in whole or in part.

Data Sharing Statement

The datasets from Guangdong Medical University Hospital used and analyzed in this study are available from the corresponding author upon reasonable request.

Ethics Approval and Consent to Participate

This study was reviewed and approved by the Ethics Committee of the Affiliated Hospital of Guangdong Medical University (Approval No. PJKT2024-211). The data were anonymized, eliminating the need for informed consent. This study complies with the Declaration of Helsinki.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This study was supported by the Big Data Platform of Affiliated Hospital of Guangdong Medical University.

Disclosure

The authors declared no competing interests for this work.

References

1. Bauer M, Gerlach H, Vogelmann T, Preissing F, Stiefel J, Adam D. Mortality in sepsis and septic shock in Europe, North America and Australia between 2009 and 2019 - results from a systematic review and meta-analysis. Crit Care. 2020;24(1):239. doi:10.1186/s13054-020-02950-2

2. Jantunen E, Hämäläinen S, Pulkki K, Juutilainen A. Novel biomarkers to identify complicated course of febrile neutropenia in hematological patients receiving intensive chemotherapy. Eur J Haematol. 2024;113(4):392–399. doi:10.1111/ejh.14264

3. Malik IA, Cardenas-Turanzas M, Gaeta S, et al. Sepsis and acute myeloid leukemia: a population-level study of comparative outcomes of patients discharged from Texas Hospitals. Clin Lymphoma Myeloma Leuk. 2017;17(12):e27–e32. doi:10.1016/j.clml.2017.07.009

4. Alam ST, Dongarwar D, Lopez E, et al. Disparities in mortality among acute myeloid leukemia-related hospitalizations. Cancer Med. 2023;12(3):3387–3394. doi:10.1002/cam4.5084

5. Bleakley G, Cole M. Recognition and management of sepsis: the nurse’s role. Br J Nurs. 2020;29(21):1248–1251. doi:10.12968/bjon.2020.29.21.1248

6. Cho DS, Schmitt RE, Dasgupta A, Ducharme AM, Doles JD. Acute and sustained alterations to the bone marrow immune microenvironment following polymicrobial infection. Shock. 2022;58(1):45–55. doi:10.1097/SHK.0000000000001951

7. Vago L, Gojo I. Immune escape and immunotherapy of acute myeloid leukemia. J Clin Invest. 2020;130(4):1552–1564. doi:10.1172/JCI129204

8. Peseski AM, McClean M, Green SD, Beeler C, Konig H. Management of fever and neutropenia in the adult patient with acute myeloid leukemia. Expert Rev Anti Infect Ther. 2021;19(3):359–378. doi:10.1080/14787210.2020.1820863

9. Fleuren LM, Klausch TLT, Zwager CL, et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 2020;46(3):383–400. doi:10.1007/s00134-019-05872-y

10. Angione C, Silverman E, Yaneske E. Using machine learning as a surrogate model for agent-based simulations. PLoS One. 2022;17(2):e0263150. doi:10.1371/journal.pone.0263150

11. Wong A, Otles E, Donnelly JP, et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern Med. 2021;181(8):1065–1070. doi:10.1001/jamainternmed.2021.2626

12. Hu J, Szymczak S. A review on longitudinal data analysis with random forest. Brief Bioinform. 2023;24(2). doi:10.1093/bib/bbad002

13. Huang YH, Chen CJ, Shao SC, et al. Comparison of the diagnostic accuracies of monocyte distribution width, procalcitonin, and C-reactive protein for sepsis: a systematic review and meta-analysis. Crit Care Med. 2023;51(5):e106–e114. doi:10.1097/CCM.0000000000005820

14. Póvoa P, Coelho L, Dal-Pizzol F, et al. How to use biomarkers of infection or sepsis at the bedside: guide to clinicians. Intensive Care Med. 2023;49(2):142–153. doi:10.1007/s00134-022-06956-y

15. Kubo K, Sakuraya M, Sugimoto H, et al. Benefits and harms of procalcitonin- or C-reactive protein-guided antimicrobial discontinuation in critically ill adults with sepsis: a systematic review and network meta-analysis. Crit Care Med. 2024;52(10):e522–e534. doi:10.1097/CCM.0000000000006366

16. Paudel R, Dogra P, Montgomery-Yates AA, et al. Procalcitonin: a promising tool or just another overhyped test? Int J Med Sci. 2020;17(3):332–337. doi:10.7150/ijms.39367

17. Rizo-Téllez SA, Sekheri M, Filep JG. C-reactive protein: a target for therapy to reduce inflammation. Front Immunol. 2023;14:1237729. doi:10.3389/fimmu.2023.1237729

18. Schlaweck S, Radcke A, Kampmann S, Becker BV, Brossart P, Heine A. The immunomodulatory effect of different FLT3 inhibitors on dendritic cells. Cancers. 2024;16(21):3719. doi:10.3390/cancers16213719

19. Chen S, Zhu H, Jounaidi Y. Comprehensive snapshots of natural killer cells functions, signaling, molecular mechanisms and clinical utilization. Signal Transduct Target Ther. 2024;9(1):302. doi:10.1038/s41392-024-02005-w

20. Giustozzi M, Ehrlinder H, Bongiovanni D, et al. Coagulopathy and sepsis: pathophysiology, clinical manifestations and treatment. Blood Rev. 2021;50:100864. doi:10.1016/j.blre.2021.100864

21. Pierrakos C, Velissaris D, Bisdorff M, Marshall JC, Vincent JL. Biomarkers of sepsis: time for a reappraisal. Crit Care. 2020;24(1):287. doi:10.1186/s13054-020-02993-5

22. Erickson YO, Samia NI, Bedell B, Friedman KD, Atkinson BS, Raife TJ. Elevated procalcitonin and C-reactive protein as potential biomarkers of sepsis in a subpopulation of thrombotic microangiopathy patients. J Clin Apher. 2009;24(4):150–154. doi:10.1002/jca.20205

23. Dulmovits BM, Tang Y, Papoin J, et al. HMGB1-mediated restriction of EPO signaling contributes to anemia of inflammation. Blood. 2022;139(21):3181–3193. doi:10.1182/blood.2021012048

24. Prüfer S, Weber M, Sasca D, et al. Distinct signaling cascades of TREM-1, TLR and NLR in neutrophils and monocytic cells. J Innate Immun. 2014;6(3):339–352. doi:10.1159/000355892

25. Wu H, Cao T, Ji T, Luo Y, Huang J, Ma K. Predictive value of the neutrophil-to-lymphocyte ratio in the prognosis and risk of death for adult sepsis patients: a meta-analysis. Front Immunol. 2024;15:1336456. doi:10.3389/fimmu.2024.1336456

26. Zahorec R. Neutrophil-to-lymphocyte ratio, past, present and future perspectives. Bratisl Lek Listy. 2021;122(7):474–488. doi:10.4149/BLL_2021_078

27. Venet F, Monneret G. Advances in the understanding and treatment of sepsis-induced immunosuppression. Nat Rev Nephrol. 2018;14(2):121–137. doi:10.1038/nrneph.2017.165

28. Thol F, Döhner H, Ganser A. How I treat refractory and relapsed acute myeloid leukemia. Blood. 2024;143(1):11–20. doi:10.1182/blood.2023022481

29. Greaves M. Infection, immune responses and the aetiology of childhood leukaemia. Nat Rev Cancer. 2006;6(3):193–203. doi:10.1038/nrc1816

30. Tang Y, Sorenson J, Lanspa M, Grissom CK, Mathews VJ, Brown SM. Systolic blood pressure variability in patients with early severe sepsis or septic shock: a prospective cohort study. BMC Anesthesiol. 2017;17(1):82. doi:10.1186/s12871-017-0377-4

31. Piagnerelli M, Boudjeltia KZ, Vanhaeverbeek M, Vincent JL. Red blood cell rheology in sepsis. Intensive Care Med. 2003;29(7):1052–1061. doi:10.1007/s00134-003-1783-2

32. Gong X, Xia L, Su Z. Friend or foe of innate lymphoid cells in inflammation-associated cardiovascular disease. Immunology. 2021;162(4):368–376. doi:10.1111/imm.13271

33. Strnad P, Tacke F, Koch A, Trautwein C. Liver - guardian, modifier and target of sepsis. Nat Rev Gastroenterol Hepatol. 2017;14(1):55–66. doi:10.1038/nrgastro.2016.168

34. Zhang Y, Khalid S, Jiang L. Diagnostic and predictive performance of biomarkers in patients with sepsis in an intensive care unit. J Int Med Res. 2019;47(1):44–58. doi:10.1177/0300060518793791

35. Wang Y, Deng K, Lin P, et al. Elevated total bile acid levels as an independent predictor of mortality in pediatric sepsis. Pediatr Res. 2024. doi:10.1038/s41390-024-03438-3

36. Leonhardt J, Dorresteijn MJ, Neugebauer S, et al. Immunosuppressive effects of circulating bile acids in human endotoxemia and septic shock: patients with liver failure are at risk. Crit Care. 2023;27(1):372. doi:10.1186/s13054-023-04620-5

37. Yan J, Li S, Li S. The role of the liver in sepsis. Int Rev Immunol. 2014;33(6):498–510. doi:10.3109/08830185.2014.889129

38. Anderko RR, Gómez H, Canna SW, et al. Sepsis with liver dysfunction and coagulopathy predicts an inflammatory pattern of macrophage activation. Intensive Care Med Exp. 2022;10(1):6. doi:10.1186/s40635-022-00433-y

39. Unar A, Bertolino L, Patauner F, Gallo R, Durante-Mangoni E. Pathophysiology of disseminated intravascular coagulation in sepsis: a clinically focused overview. Cells. 2023;12(17):2120. doi:10.3390/cells12172120

40. Couronné R, Probst P, Boulesteix AL. Random forest versus logistic regression: a large-scale benchmark experiment. BMC Bioinf. 2018;19(1):270. doi:10.1186/s12859-018-2264-5

41. Ponce-Bobadilla AV, Schmitt V, Maier CS, Mensing S, Stodtmann S. Practical guide to SHAP analysis: explaining supervised machine learning model predictions in drug development. Clin Transl Sci. 2024;17(11):e70056. doi:10.1111/cts.70056

42. Faix JD. Biomarkers of sepsis. Crit Rev Clin Lab Sci. 2013;50(1):23–36. doi:10.3109/10408363.2013.764490

43. Xiao YP, Cheng YC, Chen C, Xue HM, Yang M, Lin C. Identification of the shared gene signatures of HCK, NOG, RNF125 and biological mechanism in pediatric acute lymphoblastic leukaemia and pediatric sepsis. Mol Biotechnol. 2023;67:80–90. doi:10.1007/s12033-023-00979-6

44. Xia S, Zhao YC, Guo L, et al. Do antibody-drug conjugates increase the risk of sepsis in cancer patients? A pharmacovigilance study. Front Pharmacol. 2022;13:967017. doi:10.3389/fphar.2022.967017

45. Barichello T, Generoso JS, Singer M, Dal-Pizzol F. Biomarkers for sepsis: more than just fever and leukocytosis - a narrative review. Crit Care. 2022;26(1):14. doi:10.1186/s13054-021-03862-5

46. Rojas JC, Fahrenbach J, Makhni S, et al. Framework for integrating equity into machine learning models: a case study. Chest. 2022;161(6):1621–1627. doi:10.1016/j.chest.2022.02.001

JOURNAL OF INFLAMMATION RESEARCH
