Characterizing the Importance of Hematologic Biomarkers in Screening for Severe Sepsis using Machine Learning Interpretability Methods

Abstract

Background: Early detection of sepsis in patients admitted to the emergency department (ED) is an important clinical objective that helps reduce morbidity and mortality. We aimed to use data from an electronic health record (EHR) system to characterize the relative importance of Monocyte Distribution Width (MDW), a new biomarker recently approved by the US Food and Drug Administration (FDA) for sepsis screening, in the presence of routinely available hematologic parameters and vital signs measures.

Methods: In this retrospective cohort study, we included ED patients admitted to the MetroHealth hospital (a large regional safety-net hospital in Cleveland, OH, USA) with suspected infection who later developed severe sepsis. All adult patients presenting to the ED were eligible for inclusion; encounters that did not have complete blood count with differential data or vital signs data were excluded. We developed seven data models and an ensemble of four high-accuracy machine learning (ML) algorithms, using the Sepsis-3 diagnostic criteria for validation. Using the results generated by the high-accuracy ML models, we applied the Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) post-hoc ML interpretability methods to characterize the contributions of individual hematologic parameters, including MDW, and of vital signs measures in screening for severe sepsis.

Findings: We evaluated 7071 adult patients from 303,339 adult ED visits occurring between May 1st, 2020 and August 26th, 2022. The seven data models reflected the ED clinical workflow, with incremental addition of standard complete blood count (CBC), CBC with differential, MDW, and finally vital signs measures. The random forest and deep neural network models achieved classification area under the receiver operating characteristic curve (AUC) values of up to 93% (CI 92-94) and 90% (CI 88-91), respectively, on the data model combining hematologic parameters and vital signs measures. We applied the LIME and SHAP ML interpretability methods to these high-accuracy ML models. Both interpretability methods consistently found that the value of MDW for severe sepsis detection is grossly attenuated (low feature importance scores of 0.015 (SHAP) and 0.0004 (LIME)) in the presence of other routinely reported hematologic parameters and vital signs measures.

Interpretation: Using ML interpretability methods applied to EHR data, we show that MDW can be replaced with routinely reported CBC with differential together with vital signs measures for severe sepsis screening. MDW requires specialized laboratory equipment and modification of existing care protocols; therefore, these results could guide decisions about allocation of limited resources in cost-constrained care settings. Additionally, the analysis shows the practical application of ML interpretability methods in clinical decision making.
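The attenuation of a feature's importance in the presence of correlated features can be illustrated with a small exact Shapley value computation, the quantity that SHAP approximates. The sketch below is not the authors' pipeline: the feature names, the `toy_value` function, and every number in it are hypothetical and chosen only to show the mechanism, under the assumption that part of the signal carried by MDW is also carried by the CBC with differential.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature groups; names are illustrative, not taken from the study data.
FEATURES = ["cbc_diff", "mdw", "vitals"]


def toy_value(subset):
    """Made-up 'screening value' of a feature subset (not study results).

    cbc_diff and vitals each add their own signal; an extra 0.10 of signal
    is available from either cbc_diff or mdw (i.e. it is redundant between them).
    """
    value = 0.0
    if "cbc_diff" in subset:
        value += 0.25
    if "vitals" in subset:
        value += 0.15
    if "cbc_diff" in subset or "mdw" in subset:
        value += 0.10
    return value


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature, enumerating all coalitions."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        result[f] = total
    return result


phi = shapley_values(toy_value, FEATURES)
for name in FEATURES:
    print(f"{name}: {phi[name]:.3f}")
print("mdw alone:", toy_value(frozenset({"mdw"})))
```

In this toy setting MDW has value on its own, but its attribution shrinks once a feature carrying the same signal is available, which is the kind of pattern the interpretability analysis is designed to surface. Exact enumeration is only feasible for a handful of features; SHAP implementations rely on approximations or model-specific algorithms.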

Competing Interest Statement

The authors report no conflicts of interest related to this manuscript. YT receives research funding from Beckman Coulter Inc. (Brea CA USA). Beckman Coulter Inc. played no role in the design or analysis of this study or its resultant manuscript.

Funding Statement

National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health/National Center for Advancing Translational Sciences, National Institute on Drug Abuse

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

MetroHealth hospital system institutional review board (IRB) (approval: STUDY00000097)

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The machine learning workflows and performance metrics were implemented using the Scikit libraries. The individual patient records cannot be made publicly available due to regulatory reasons. Models and data can be made available on request; however, this requires the execution of a data transfer agreement approved by the participating institutions together with an Institutional Review Board (IRB) or equivalent ethics approval for the proposed study.
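For readers who cannot access the models, the headline performance metric can be reproduced in form with a few lines of code. The following is a dependency-free sketch of an AUC estimate with a confidence interval, not the authors' Scikit-based implementation. The source does not state how its intervals were computed; the percentile bootstrap, the 95% level, and the synthetic labels and scores below are all assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # resample lost one class; AUC is undefined, so redraw
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic example data (1 = severe sepsis, 0 = no severe sepsis); not patient data.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.35, 0.5, 0.7, 0.8, 0.9]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC = {point:.2f} (CI {lower:.2f} : {upper:.2f})")
```

The pairwise comparison in `auc` is quadratic in the number of encounters; it is written for clarity, and a rank-based computation or a library routine would be preferable at the scale of the cohort described here.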
