Determining important features for dengue diagnosis using feature selection methods

Abstract

Objectives: This research aims to determine the important features, including symptoms and risk factors, for dengue diagnosis.

Methods: The dataset for this study consists of medical records collected from two hospitals in East Nusa Tenggara Province, Kewapante and Soe. Four feature selection methods (feature importance, recursive feature elimination, a correlation matrix based on the Pearson correlation coefficient, and KBest) were used to determine important features. Important features were also gathered from fifteen Indonesian medical doctors to confirm the results. To identify the most significant features for dengue prediction, we used six machine learning techniques: logistic regression, k-nearest neighbors, eXtreme gradient boosting, random forests, Naive Bayes and support vector machines.

Results: The random forest classifier yields the highest accuracy for the best combination of features, with an accuracy of 0.93 (LR: 0.90 (0.04), KNN: 0.89 (0.04), XGBoost: 0.91 (0.03), RF: 0.93 (0.04), NB: 0.88 (0.09), SVM: 0.89 (0.04)) and a precision of 0.90 (LR: 0.86 (0.22), KNN: 0.67 (0.14), XGBoost: 0.77 (0.13), RF: 0.90 (0.13), NB: 0.66 (0.20), SVM: 0.66 (0.18)). This study shows that the significant features for dengue diagnosis include fever, fever duration, headache, muscle and joint pain, nausea, vomiting, abdominal pain, shivering, malaise, loss of appetite, shortness of breath, rash, nosebleed, bitter taste in the mouth, temperature and age.

Conclusions: This information can help the public differentiate dengue from non-dengue diseases, including malaria, typhoid fever, COVID-19 and other diseases with dengue-like symptoms. This is pivotal for educating the public to seek medical advice when dengue symptoms appear.

Keywords: Dengue fever, Feature selection, Significant dengue features, Dengue prediction, Dengue diagnosis
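
To illustrate one of the feature selection ideas named in the Methods, the following is a minimal sketch of ranking features by the absolute Pearson correlation between each feature and the diagnosis label, then keeping the top k. It is not the authors' implementation: the function names `pearson` and `rank_features`, the value of k, and the toy records are all assumptions made for illustration, and the toy values are not taken from the study's dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(columns, labels, k):
    """Return the k feature names with the largest |r| against the labels."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy records invented for illustration only; NOT data from the study.
toy_columns = {
    "fever":    [1, 1, 1, 0, 0, 1, 0, 0],
    "rash":     [1, 0, 1, 0, 0, 1, 0, 1],
    "headache": [1, 0, 1, 1, 0, 0, 1, 0],
}
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]  # 1 = dengue, 0 = non-dengue (hypothetical)

top_two = rank_features(toy_columns, toy_labels, k=2)
print(top_two)
```

A univariate ranking like this scores each feature in isolation, which is presumably one reason the study combines it with model-based methods such as feature importance and recursive feature elimination, and checks the selection against clinicians' judgment.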

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research was supported by Universitas Katolik Widya Mandira Kupang, East Nusa Tenggara Province, Indonesia (044/WM.H9/SKP/IX/2023).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The ethics committee/IRB of Universitas Katolik Widya Mandira gave ethical approval for this work (Ethics ID: 001/WM.H9/LPPM/SKKEP/X/2023).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The data produced in the present study are not publicly available because they are derived from patient medical records. However, a de-identified dataset can be provided by the authors upon reasonable request.
