Evaluation of Feature Selection Methods for Preserving Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine

Abstract

Background Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance.

Methods Our dataset consisted of intensive care unit patients from MIMIC-IV categorized by year groups (2008–2010, 2011–2013, 2014–2016, and 2017–2019). We trained baseline models using L2-regularized logistic regression on 2008–2010 to predict in-hospital mortality, long length of stay (LOS), sepsis, and invasive ventilation in all year groups. We evaluated three feature selection methods: L1-regularized logistic regression (L1), Remove and Retrain (ROAR), and causal feature selection. We assessed whether a feature selection method could maintain ID performance (2008–2010) and improve OOD performance (2017–2019). We also assessed whether parsimonious models retrained on OOD data performed as well as oracle models trained on all features in the OOD year group.
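The evaluation protocol described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn and NumPy, replaces MIMIC-IV with synthetic data, covers only the L1 selection method, and reports only AUROC. The helper names (`make_cohort`, `fit`, `auroc`), the 50-feature design, the size of the shift, and the regularization strength `C=0.01` are all hypothetical choices made for illustration. Refitting an L2-regularized model on the L1-selected features is likewise an assumption about how the parsimonious model is built.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_cohort(n, shift):
    """Synthetic stand-in for one year group (hypothetical, not MIMIC-IV).
    `shift` moves every feature mean to mimic temporal dataset shift."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 50))
    # Only the first three features carry signal; the other 47 are noise.
    logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    return X, y

# ID plays the role of 2008-2010, OOD the role of 2017-2019.
X_id_train, y_id_train = make_cohort(2000, shift=0.0)
X_id_test, y_id_test = make_cohort(1000, shift=0.0)
X_ood_train, y_ood_train = make_cohort(2000, shift=0.3)
X_ood_test, y_ood_test = make_cohort(1000, shift=0.3)

def fit(X, y, penalty, C=1.0):
    # liblinear supports both L1 and L2 penalties.
    return LogisticRegression(penalty=penalty, C=C, solver="liblinear").fit(X, y)

def auroc(model, X, y):
    return roc_auc_score(y, model.predict_proba(X)[:, 1])

# Baseline: L2-regularized logistic regression on all features, trained on ID data.
baseline = fit(X_id_train, y_id_train, "l2")

# L1 feature selection on ID data: keep features with nonzero coefficients.
selector = fit(X_id_train, y_id_train, "l1", C=0.01)
selected = np.flatnonzero(selector.coef_[0])

# Parsimonious model: refit on the selected features only (ID training data).
sparse_id = fit(X_id_train[:, selected], y_id_train, "l2")

# Retrained parsimonious model: same features, but fitted on OOD training data.
sparse_retrained = fit(X_ood_train[:, selected], y_ood_train, "l2")

# Oracle: all features, trained directly on OOD training data.
oracle = fit(X_ood_train, y_ood_train, "l2")

results = {
    "baseline_id": auroc(baseline, X_id_test, y_id_test),
    "baseline_ood": auroc(baseline, X_ood_test, y_ood_test),
    "sparse_id": auroc(sparse_id, X_id_test[:, selected], y_id_test),
    "sparse_ood": auroc(sparse_id, X_ood_test[:, selected], y_ood_test),
    "sparse_retrained_ood": auroc(sparse_retrained, X_ood_test[:, selected], y_ood_test),
    "oracle_ood": auroc(oracle, X_ood_test, y_ood_test),
}

print(f"retained {len(selected)} of {X_id_train.shape[1]} features")
for name, value in results.items():
    print(f"{name}: AUROC = {value:.3f}")
```

Comparing `sparse_id` with `baseline_id` corresponds to the question of whether ID performance is maintained, `sparse_ood` with `baseline_ood` to whether OOD performance improves, and `sparse_retrained_ood` with `oracle_ood` to whether a retrained parsimonious model reaches parity with the oracle. A fuller treatment would also assess calibration and repeat the procedure for each task and selection method.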

Results The baseline model showed significantly worse OOD performance than ID performance on the long LOS and sepsis tasks. L1 and ROAR retained 3.7 to 12.6% of all features, whereas causal feature selection generally retained fewer features. Models produced by L1 and ROAR exhibited ID and OOD performance similar to that of the baseline models. Retraining these models on 2017–2019 data, using the features selected from training on 2008–2010 data, generally reached parity with oracle models trained directly on 2017–2019 data using all available features. Causal feature selection led to heterogeneous results, with the superset maintaining ID performance while improving OOD calibration only on the long LOS task.

Conclusions While model retraining can mitigate the impact of temporal dataset shift on parsimonious models produced by L1 and ROAR, new methods are required to proactively improve temporal robustness.

Keywords dataset shift - machine learning - clinical outcomes - feature selection


Consent for Publication

Not applicable.


Availability of Data and Materials

The data that support the findings of this study are available from PhysioNet, but restrictions apply to their availability: the data were used under license for the current study and are therefore not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of PhysioNet.


Ethical Considerations

The institutional review boards of Beth Israel Deaconess Medical Center, Boston, Massachusetts and the Massachusetts Institute of Technology, Cambridge, Massachusetts, United States waived the need for ethics approval and consequently participant informed consent due to the deidentification of patient records. All methods were performed in accordance with relevant guidelines and regulations.


Authors' Contributions

L.L.G. and L.S. designed the project with input from all authors. J.P. suggested the use of causal inference models. J.L. performed all experiments. J.L., L.L.G., and L.S. analyzed and interpreted results, with some input from all other authors. J.L. wrote the manuscript with major contributions from L.L.G. and L.S. All authors revised and commented on the manuscript. All authors read and approved the final manuscript.


Publication History

Received: 02 September 2022

Accepted: 04 January 2023

Article published online: 22 February 2023

© 2023. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
