Automatic detection of severely and mildly infected COVID-19 patients with Supervised Machine Learning Models

Elsevier

Available online 1 June 2022

IRBM

Highlights

An effective feature set for determining the prognosis of COVID-19 was obtained.

ML methods can be used to reduce the pressure on COVID-19 intensive care units.

The LWL, K*, NB, and KNN models were the most successful in distinguishing severely from mildly infected patients.

An open-access data source is provided for the dataset used in this article.

Abstract

Objectives

Early detection of the prognosis of COVID-19 can partially reduce the intense pressure on health services and the associated loss of workforce. The primary purpose of this article is to determine the feature dataset, consisting of routine blood values (RBV) and demographic data, that affects the prognosis of COVID-19. The second purpose is to identify severely and mildly infected COVID-19 patients at the time of admission by applying this feature dataset to supervised machine learning (ML) models.

Material and methods

The sample of this study consists of severely (n = 192) and mildly (n = 4010) infected patients hospitalized with a diagnosis of COVID-19 between March and September 2021. The RBV data measured at the time of admission and the age and gender of these patients were analyzed retrospectively. Features were selected using the minimum redundancy maximum relevance (MRMR) method, principal component analysis, and forward multiple logistic regression analysis. The selected feature set was statistically compared between mildly and severely infected patients. Then, the performance of various supervised ML models in identifying severely and mildly infected patients using the feature set was compared.
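To make the MRMR idea concrete, the following is a minimal sketch of greedy minimum redundancy maximum relevance selection. It is not the authors' implementation: it assumes absolute Pearson correlation as a stand-in for both relevance (feature vs. label) and redundancy (feature vs. already selected features), whereas MRMR is often formulated with mutual information. The data are synthetic, and the column names (`a`, `a_copy`, `b`, `noise`) and the function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mrmr_select(columns, labels, k):
    """Greedily pick k features maximizing relevance minus mean redundancy.

    columns: dict mapping feature name -> list of values (hypothetical layout).
    labels:  list of 0/1 class labels (for example 0 = mild, 1 = severe).
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    selected = []
    remaining = sorted(columns)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(columns[name], columns[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data: the label depends on two independent signals, u and v.
rng = random.Random(0)
n = 500
u = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
labels = [1 if a + b > 0 else 0 for a, b in zip(u, v)]
columns = {
    "a": u,                                          # relevant
    "a_copy": [a + rng.gauss(0, 0.05) for a in u],   # relevant but redundant with "a"
    "b": v,                                          # relevant, independent of "a"
    "noise": [rng.gauss(0, 1) for _ in range(n)],    # irrelevant
}
selected = mrmr_select(columns, labels, k=2)
print(selected)
```

With two features requested, the selection keeps `b` together with only one of the near-duplicate columns `a` and `a_copy`, which is the behavior the redundancy penalty is meant to produce.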

Results

In this study, 28 RBV parameters and the age variable formed the feature dataset. The effect of these features on the prognosis of the disease has been clinically proven. The ML models with the highest overall accuracy in identifying the patient groups were, in order: locally weighted learning (LWL), 97.86%; K-star (K*), 96.31%; Naive Bayes (NB), 95.36%; and k-nearest neighbor (KNN), 94.05%. The models with the highest area under the receiver operating characteristic curve (AUC) in identifying the patient groups were, in order: LWL, 0.95; K*, 0.91; NB, 0.85; and KNN, 0.75.
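Because the two groups differ greatly in size (192 severe vs. 4010 mild), overall accuracy and AUC measure different things. The sketch below is an illustration only and does not reproduce any of the study's models: it computes accuracy and a rank-based AUC (the probability that a randomly chosen severe case receives a higher score than a randomly chosen mild case, with ties counted as one half) for a hypothetical baseline that labels every patient as mild. AUC is a value between 0 and 1, not a percentage. The function names and the baseline classifier are assumptions introduced for this example; only the group sizes come from the abstract.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Group sizes reported in the abstract: 1 = severe, 0 = mild.
y_true = [1] * 192 + [0] * 4010

# Hypothetical baseline: every patient is predicted mild with the same score.
baseline_pred = [0] * len(y_true)
baseline_scores = [0.0] * len(y_true)

baseline_accuracy = accuracy(y_true, baseline_pred)
baseline_auc = roc_auc(y_true, baseline_scores)

print(f"baseline accuracy: {baseline_accuracy:.2%}")  # 95.43%
print(f"baseline AUC: {baseline_auc:.2f}")            # 0.50
```

A classifier that never flags a severe case still reaches about 95.43% accuracy on this class distribution while its AUC stays at 0.5, which is why AUC is a useful companion to accuracy here.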

Conclusion

The findings in this article provide significant motivation for healthcare professionals to detect severely and mildly infected COVID-19 patients at admission.

Keywords

COVID-19

biochemical and hematological biomarkers

routine blood values

feature selection methods

classification

supervised machine learning models


© 2022 AGBM. Published by Elsevier Masson SAS. All rights reserved.
