COVID-19 risk prediction scores for mortality: A validation study from the National Registry of COVID-19 in China

To the Editor: Coronavirus disease 2019 (COVID-19) has evolved into a global pandemic, with over 700 million confirmed cases worldwide and counting.[1] Its clinical impact is immense, given the wide spectrum of disease manifestations, which range from mild respiratory symptoms to severe pulmonary infection. Identifying high-risk individuals, especially at their initial encounters in clinics and hospital emergency departments, is crucial for providing timely and effective treatment. Timely risk stratification of COVID-19 patients in the emergency room can therefore benefit both infected individuals and healthcare professionals.[2] This study aimed to identify features of severe COVID-19 at presentation that are associated with patient mortality.

This retrospective diagnostic study was approved by the Ethics Committee of the First Affiliated Hospital of Sun Yat-sen University (No. 2023-239), and the requirement for written informed consent was waived. Patients with confirmed COVID-19 were selected from Wuhan Hankou Hospital and No. Six Hospital of Wuhan between January 1 and March 28, 2020. Trained research technicians extracted demographic data and laboratory results of confirmed cases from the medical records; the process comprised variable extraction, conversion of variables into standardized units, and data storage. To ensure data quality, a team of experienced doctors randomly checked both the medical records and the data extracted by the technicians. All medical documents of these patients were saved in Portable Document Format (PDF) and processed by computer text mining to read the unstructured records, and a team of research technicians randomly checked the quality of this machine extraction. The models developed from the two Wuhan hospital cohorts were tested in the National Health Commission (NHC) cohort of 18,826 patients with COVID-19 from January 2 to May 12, 2020 (http://www.nhc.gov.cn).

The study included 36 variables collected upon hospital admission, as listed in Supplementary Table 1, https://links.lww.com/CM9/B956. The primary outcome was in-hospital mortality during the index admission. Continuous variables are presented as means ± standard deviations or medians (Q1, Q3), as appropriate; categorical variables are presented as numbers and percentages. Baseline characteristics of surviving and non-surviving patients in the training group were compared using the independent t-test for normally distributed continuous variables, the Kruskal–Wallis test for non-normally distributed continuous variables, and the chi-squared test or Fisher's exact test for categorical variables. Missing data in both the training and validation sets were handled by multiple imputation in R software (Version 3.6.1, R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/), generating 10 imputations and using the mean value of each variable across them. Continuous variables were then standardized by min–max normalization.
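The averaging step described above can be sketched as follows. The letter states only that R generated 10 imputations whose mean values were used; it does not name the package or imputation model. The sketch is written in Python for consistency with the other examples, and `draw_imputation` is a deliberately crude, hypothetical placeholder (a random draw from the observed values) standing in for a proper model-based imputation engine. Only the "impute m times, then average cell by cell" logic is the point.

```python
import random
from statistics import mean


def draw_imputation(column, rng):
    """Hypothetical placeholder for one stochastic imputation: each missing
    entry (None) is replaced by a random draw from the observed values."""
    observed = [v for v in column if v is not None]
    return [v if v is not None else rng.choice(observed) for v in column]


def averaged_imputation(column, m=10, seed=0):
    """Create m completed copies of the column and return their cell-wise mean.

    Note: this collapses the imputations into a single dataset, as the letter
    describes; it is not the same as pooling estimates with Rubin's rules."""
    rng = random.Random(seed)
    copies = [draw_imputation(column, rng) for _ in range(m)]
    return [mean(cell) for cell in zip(*copies)]


# Toy LDH values (U/L) with two missing measurements; not study data.
ldh = [210.0, None, 480.0, 305.0, None]
print(averaged_imputation(ldh))
```

Observed values pass through unchanged, and each imputed cell becomes the average of its 10 draws.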

Three methods were used for variable selection: ranking by the absolute value of the logistic regression weights, ranking by random forest variable importance, and Lasso-Cox variable selection. Six variables were retained as the intersection of the Lasso-Cox selection and the top 10 variables of each of the two rankings. These six variables were then ranked by their area under the receiver operating characteristic curve (AUC) values, and further selection among them using DeLong's test yielded five variables (age, oxygen saturation [SpO2], neutrophil/lymphocyte ratio [NLR], lactate dehydrogenase [LDH], and C-reactive protein [CRP]). Model performance was evaluated by the AUC with a 95% confidence interval (CI) calculated by bootstrapping, together with sensitivity, specificity, positive predictive value, and negative predictive value. The final prediction model was established by logistic regression, and its performance was further validated in an external cohort from the NHC of the People's Republic of China. Data analyses were conducted with Stata/MP (Version 14.0, StataCorp LLC, USA. https://www.stata.com/statamp/) and Python (Version 3.6, Python Software Foundation. https://docs.python.org/3.6/), and two-sided P-values less than 0.05 were considered statistically significant.
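The intersection rule above can be sketched in a few lines. This assumes that each method has already produced its output: a set of Lasso-Cox-selected variables, logistic regression weights, and random forest importances. All numbers below are hypothetical toy values, not results from the study, and `k=3` is used only to keep the toy small (the letter uses the top 10). Ranking by absolute weight is meaningful here because the continuous inputs share a common min–max scale.

```python
def top_k(scores, k):
    """Names of the k variables with the largest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def intersect_selections(lasso_selected, lr_weights, rf_importance, k=10):
    """Keep variables chosen by Lasso that are also in the top k by
    |logistic-regression weight| and in the top k by random-forest importance."""
    lr_top = top_k({name: abs(w) for name, w in lr_weights.items()}, k)
    rf_top = top_k(rf_importance, k)
    return set(lasso_selected) & lr_top & rf_top


# Hypothetical toy numbers, not values from the study.
lasso = {"age", "spo2", "ldh", "crp", "sex"}
lr = {"age": 2.1, "spo2": -3.0, "ldh": 1.5, "crp": 0.2, "sex": 0.9}
rf = {"age": 0.30, "spo2": 0.25, "ldh": 0.10, "crp": 0.20, "sex": 0.01}
print(sorted(intersect_selections(lasso, lr, rf, k=3)))  # ['age', 'spo2']
```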

Data from 2188 COVID-19 patients were extracted from the two hospitals in Wuhan, China. Of these, 1531 patients were assigned to the development cohort and 657 to the validation cohort (ratio, 7:3) [Figure 1A]. The median age was 62 years, and 48.2% (1055/2188) of patients were male [Supplementary Table 1, https://links.lww.com/CM9/B956]. No data were missing for patient characteristics; however, laboratory results were missing for 10.0% (219/2188) of patients because these tests were not performed at admission. A total of 215 patients (9.8%) died during hospitalization. Age, male sex, smoking history, and comorbid hypertension, coronary heart disease, chronic kidney disease, and stroke were risk factors for mortality in patients with COVID-19 [Supplementary Table 2, https://links.lww.com/CM9/B956].

Figure 1:

Data extraction and ROC curves for prediction models. (A) Data extraction for model development and validation. (B) ROC curve for the CRPS model. (C) ROC curve for the S-CRPS model. CRPS: COVID-19 risk prediction score; COVID-19: Coronavirus disease 2019; FPR: False positive rate; NHC: National Health Commission; ROC: Receiver operating characteristic curve; S-CRPS: Simplified version of COVID-19 risk prediction score; TPR: True positive rate.

After variable selection by logistic regression, random forest, and Lasso-Cox analyses, six variables were shortlisted: age, SpO2, NLR, LDH (U/L), CRP, and blood urea nitrogen (BUN) [Supplementary Table 3, https://links.lww.com/CM9/B956]. Age, SpO2, NLR, LDH, and CRP were the five top-ranked risk factors for mortality in both the logistic regression and random forest analyses, whereas BUN ranked 7th in the logistic regression and 6th in the random forest. DeLong's test showed that adding BUN to the top five risk factors did not significantly improve the model's predictive ability. Age, NLR, LDH, and CRP were positively associated with mortality, whereas SpO2 was negatively associated with mortality [Supplementary Table 4, https://links.lww.com/CM9/B956].
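The comparison of the five-variable model with and without BUN relies on DeLong's test for two correlated AUCs measured on the same patients. The letter does not say which software implementation was used. The sketch below is a straightforward O(m·n) version of DeLong's nonparametric variance estimator, and the scores in the usage check are toy values, not model outputs from the study.

```python
import math


def _kernel(case, control):
    """Mann-Whitney kernel: 1 if the case scores higher, 0.5 on a tie."""
    if case > control:
        return 1.0
    return 0.5 if case == control else 0.0


def _components(cases, controls):
    """DeLong structural components: V10 per case and V01 per control."""
    v10 = [sum(_kernel(c, d) for d in controls) / len(controls) for c in cases]
    v01 = [sum(_kernel(c, d) for c in cases) / len(cases) for d in controls]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs on the same patients.

    labels: 1 = died, 0 = survived. Returns (auc_a, auc_b, z, p)."""
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(cases), len(controls)
    v10_a, v01_a = _components([scores_a[i] for i in cases], [scores_a[i] for i in controls])
    v10_b, v01_b = _components([scores_b[i] for i in cases], [scores_b[i] for i in controls])
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Var(AUC_a - AUC_b) from the case and control component covariances.
    variance = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
                + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if variance <= 0:
        raise ValueError("zero variance: the DeLong statistic is undefined")
    z = (auc_a - auc_b) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy check: model A separates perfectly, model B does not.
labels = [1, 1, 1, 0, 0, 0]
print(delong_test(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [0.9, 0.4, 0.2, 0.5, 0.3, 0.1]))
```

In the study's setting, `scores_a` and `scores_b` would be the predicted risks from the five-variable model and the model with BUN added, evaluated on the same patients.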

Based on the five selected variables, a logistic regression model was developed to predict the death risk index (COVID-19 Risk Prediction Score [CRPS]) of COVID-19 patients. This is expressed as follows:

Continuous variables were standardized by min–max normalization, using the following formulas derived from the training set: SpO2sta = (SpO2 − 29)/71, Agesta = (Age − 15)/82, NLRsta = NLR/121, CRPsta = CRP/284, and LDHsta = (LDH − 80)/3048. At a cut-off value of 0.05, the CRPS detected mortality with a sensitivity of 97%, a specificity of 56%, and an AUC of 0.91 in the development cohort, and with a sensitivity of 97%, a specificity of 46%, and an AUC of 0.92 in the validation cohort [Figure 1B].
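The normalization formulas above can be applied to a new patient as shown in the following sketch. It covers only the standardization step and does not reproduce the fitted CRPS coefficients. `is_high_risk` is a hypothetical helper, and comparing the model output with `>=` against 0.05 is an assumption, since the letter does not state how ties at the cut-off are handled. CRP is scaled in whatever units the source data used, which the letter does not state. Values outside the training range are not clipped, so they fall outside [0, 1].

```python
# Min-max constants quoted in the letter (derived from the training set):
# standardized value = (raw value - minimum) / range.
CRPS_SCALING = {
    "spo2": (29.0, 71.0),   # SpO2 in %
    "age": (15.0, 82.0),    # years
    "nlr": (0.0, 121.0),    # neutrophil/lymphocyte ratio
    "crp": (0.0, 284.0),    # CRP, in the units of the source data
    "ldh": (80.0, 3048.0),  # LDH in U/L
}


def standardize(patient):
    """Return min-max standardized inputs for one patient (dict of raw values).
    Values outside the training range are not clipped."""
    return {name: (patient[name] - low) / span
            for name, (low, span) in CRPS_SCALING.items()}


def is_high_risk(predicted_risk, cutoff=0.05):
    """Hypothetical rule: flag a patient whose model output reaches the cut-off."""
    return predicted_risk >= cutoff


# Toy patient, not study data.
example = {"spo2": 93, "age": 62, "nlr": 6.0, "crp": 40.0, "ldh": 300.0}
print(standardize(example))
```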

In the CRPS model, age and SpO2 are variables that can easily be measured at home. SpO2 was negatively associated with mortality (OR = 0.003, 95% CI = 0.002–0.006), whereas age was positively associated with mortality (OR = 61.069, 95% CI = 29.577–132.290). To make the model feasible for COVID-19 outpatients, we developed a simplified two-variable model comprising SpO2 and age [Supplementary Table 4, https://links.lww.com/CM9/B956]. The standardization method was the same as that used in the CRPS model, and the cut-off value was estimated to be 0.05. Based on the logistic regression results in the training dataset, the simplified version of the COVID-19 risk prediction score (S-CRPS) was constructed as follows:

S-CRPS predicted mortality with a sensitivity of 96%, a specificity of 37%, and an AUC of 0.87 in the development cohort. In the validation cohort, it predicted mortality with a sensitivity of 97%, a specificity of 29%, and an AUC of 0.89 [Figure 1C].
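The odds ratios reported for SpO2 and age in the CRPS model look extreme. One possible reason is that they are likely expressed per one unit of the min–max standardized variable, which corresponds to the entire training range. This is an assumption, because the letter does not state the unit explicitly. Under that assumption, the log-odds coefficient is β = ln(OR), and a raw change Δ corresponds to OR^(Δ/range). The arithmetic sketch below applies this conversion to the rounded published ORs. The rescaled values are therefore approximate illustrations, not results reported by the study.

```python
import math


def rescale_or(or_per_full_range, raw_change, training_range):
    """Convert an OR expressed per one min-max unit (the whole training range)
    into the OR for a change of raw_change in the original units."""
    beta = math.log(or_per_full_range)  # log-odds per min-max unit
    return math.exp(beta * raw_change / training_range)


# Rounded ORs quoted for the CRPS model; ranges are the min-max denominators.
print(round(rescale_or(0.003, -10, 71), 2))   # SpO2 10 points lower -> 2.27
print(round(rescale_or(61.069, 10, 82), 2))   # 10 years older -> 1.65
```

Read this way, each 10-point fall in SpO2 would roughly double the odds of death, holding the other CRPS variables fixed.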

In the external NHC cohort of the People's Republic of China, a total of 30,120 COVID-19 patients were identified across different Chinese provinces [Figure 1A]. For the CRPS model, 1000 patients with all five variables available were selected for validation; the CRPS showed a sensitivity of 97%, a specificity of 53%, and an AUC of 0.91 in predicting mortality. For the S-CRPS model, 18,826 patients with available age and SpO2 data were used for validation; the S-CRPS showed a sensitivity of 95%, a specificity of 39%, and an AUC of 0.85 [Figure 1B,C].
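The validation metrics reported above (sensitivity, specificity, predictive values at the cut-off, and AUC with a bootstrap 95% CI) can be sketched as follows. The letter does not state the number of bootstrap replicates or the CI method. The 1000 replicates and the percentile interval used here are assumptions, and treating `risk >= cutoff` as a positive prediction is also assumed. The labels and risks are toy values. With a cut-off as low as 0.05, most deaths are flagged (high sensitivity) at the cost of many false alarms (low specificity), which matches the pattern of the reported results.

```python
import random


def _ratio(num, other):
    """num / (num + other), or None when the denominator is zero."""
    total = num + other
    return num / total if total else None


def confusion_metrics(labels, risks, cutoff=0.05):
    """Sensitivity, specificity, PPV and NPV, counting risk >= cutoff as positive."""
    tp = fp = tn = fn = 0
    for died, risk in zip(labels, risks):
        flagged = risk >= cutoff
        if died and flagged:
            tp += 1
        elif died:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return {"sensitivity": _ratio(tp, fn), "specificity": _ratio(tn, fp),
            "ppv": _ratio(tp, fp), "npv": _ratio(tn, fn)}


def auc(labels, scores):
    """Mann-Whitney estimate of the AUC (ties count one half)."""
    cases = [s for y, s in zip(labels, scores) if y]
    controls = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0 for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI; resamples lacking a class are redrawn."""
    if len(set(labels)) < 2:
        raise ValueError("both outcomes are needed to compute an AUC")
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if any(y) and not all(y):
            estimates.append(auc(y, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lower, upper


# Toy data: 1 = died, 0 = survived.
labels = [1, 1, 0, 0, 0, 0]
risks = [0.40, 0.03, 0.20, 0.04, 0.01, 0.06]
print(confusion_metrics(labels, risks))
print(bootstrap_auc_ci(labels, risks, n_boot=200))
```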

The CRPS, a prediction model comprising five clinical and biochemical parameters (age, SpO2, NLR, LDH, and CRP), showed high predictive efficiency for COVID-19 mortality. In addition, this study established the S-CRPS, which uses only the patient's age and SpO2, for self-assessment before hospital presentation. Given the substantial clinical burden during the pandemic, tools to triage patients to home care vs. hospital care and to stratify admitted patients between intensive care units and low-risk beds are highly desirable.

Several prognostic models have been developed to predict outcomes in patients with COVID-19.[3,4] However, these models require further validation in clinical practice, as they may not represent the entire disease spectrum and may not account for subjective factors and heterogeneous endpoints. Our study suggests that the CRPS, based on age, SpO2, CRP, LDH, and NLR, can predict COVID-19 mortality, and the score was externally validated using national data from the NHC of China. All five parameters are objective measures obtainable within 2–3 hours at the entry point of a clinic or hospital emergency department. By contrast, the S-CRPS may be suitable for home monitoring of patients living in remote areas.

This study was based entirely on clinical data from a large cohort of Chinese patients, which may limit the generalizability of the results to other ethnic groups.

In conclusion, the CRPS, a prediction model with five clinical and biochemical parameters, and the S-CRPS, a simplified version using only age and oxygen saturation, showed good predictive efficiency for COVID-19 mortality. Because the clinical burden during the pandemic is immense, triaging patients to home care vs. hospital care and stratifying admitted patients between intensive care units and low-risk beds are highly desirable.

Funding

This work was supported by a grant from the Guangdong Provincial Key R&D Program for COVID-19 (No. 232020012620600001).

References

1. WHO Coronavirus (COVID-19) Dashboard. World Health Organization, 2023. Available from: https://covid19.who.int. [Last accessed on June 9, 2023].
2. Truog RD, Mitchell C, Daley GQ. The toughest triage - Allocating ventilators in a pandemic. N Engl J Med 2020;382:1973–1975. doi: 10.1056/NEJMp2005689.
3. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study. Lancet 2020;395:1054–1062. doi: 10.1016/S0140-6736(20)30566-3.
4. Wu C, Chen X, Cai Y, Xia J, Zhou X, Xu S, et al. Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA Intern Med 2020;180:934–943. doi: 10.1001/jamainternmed.2020.0994.
