Explainable machine learning approach to predict extubation in critically ill ventilated patients: a retrospective study in central Taiwan

Weaning from mechanical ventilation is an essential but complex issue in critical care and requires the interpretation of multi-domain data in critically ill patients. In this study, we used an explainable ML approach, including domain-based cumulative feature importance, SHAP, PDP and LIME plots, to develop an extubation prediction model with high accuracy and visualised explanations. Notably, the explainability was in line with the clinical workflow in critical illness, and we think the proposed extubation prediction model could serve as an autonomous screening tool to aid clinicians in the timely initiation of breathing trials.
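As a purely illustrative sketch of the explanation methods named above, the following Python snippet shows how SHAP values and a partial dependence display could be produced for a trained tree-based classifier; the feature names, synthetic data and XGBClassifier estimator are placeholders and do not reproduce the study's actual pipeline.

```python
# Minimal sketch: post-hoc explanation of a trained tree-based classifier with
# SHAP and a partial dependence plot. Features and labels are synthetic
# placeholders, not the study's variables.
import numpy as np
import pandas as pd
import shap
from sklearn.inspection import PartialDependenceDisplay
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "gcs_total": rng.integers(3, 16, 500),          # consciousness domain (placeholder)
    "fluid_balance_24h": rng.normal(0, 1500, 500),  # fluid-status domain (placeholder)
    "pf_ratio": rng.normal(250, 80, 500),           # ventilatory domain (placeholder)
})
y = rng.integers(0, 2, 500)  # synthetic label: extubated within the next day

model = XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)

# Global explanation: SHAP summary of feature contributions
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)

# Feature-level explanation: partial dependence of a single feature
PartialDependenceDisplay.from_estimator(model, X, features=["gcs_total"])
```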

Weaning from mechanical ventilation consists of a patient-tolerated breathing trial followed by extubation, and the start of weaning requires the multi-disciplinary interpretation of data in critical care [3]. AI therefore appears well suited to integrating information in critical care and serving as a decision support system to facilitate weaning. Notably, the establishment of an AI model depends on accurate labelling; however, the tolerability of distinct breathing trials, mainly the T-piece and pressure support trials, is somewhat ambiguous and cannot be precisely defined in critical care databases [23, 24]. We therefore used extubation, an explicit, objective and critical medical event in ventilated patients, as the target label in the present study to establish an extubation prediction model.

In this study, we found that level of consciousness/awareness, fluid status-related features and ventilatory parameters were crucial features with high importance for predicting extubation one day later, a finding in line with the variables of the daily screen of readiness for a spontaneous breathing trial in the respiratory therapist–driven protocol [5]. Indeed, both left-aligned and right-aligned designs can be used to establish ML models [25]. In brief, left-aligned models predict the occurrence of the target event following a fixed time point, but the varying time periods among patients may hinder real-world deployment of an established model. In contrast, right-aligned models can continuously predict whether the target event will occur after a set time period, and are therefore called real-time or continuous prediction models [25]. The right-aligned design in the present study thus enables the proposed model to serve as an autonomous daily screening system that promptly identifies patients ready for a breathing trial and facilitates the weaning process through recognition of potential extubation one day earlier (Supplemental Fig. 3). Furthermore, we consider the practical value of the established explainable ML model to be high, given that the interpretation of the model aligns with the real-world workflow in critical care. Recently, the Good Machine Learning Practice for Medical Device Development has incorporated human interpretability into ML models, the so-called human in the loop [13]. The European Commission has also proposed ethics guidelines for trustworthy AI, which include the need to enhance the explanation of AI-based systems even at the cost of compromised accuracy [14]. Indeed, safety is a fundamental issue in critical care, and increasing the transparency of a model through explanation may at least partly mitigate concerns regarding the black-box issue [26]. Given that clinicians are accountable for patient safety, understanding how AI systems reach their suggested decisions is crucial to the deployment of AI-based systems in critical care [26]. Notably, designing explanations in accordance with the clinical workflow, as shown in this study, should further enable clinicians to understand the explainable ML-based model. Nevertheless, it should be clarified that directly opening the black box remains difficult, and current explanation methods mostly provide post-hoc interpretability of key features by analysing the model after training, rather than direct explanations of the entire model [27].
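To make the right-aligned (continuous) design concrete, the sketch below shows one simple way such daily prediction rows could be assembled, with each ventilated day labelled by whether extubation occurs on the following day; the column names and the `extubation_day` mapping are hypothetical and do not reflect the study's data schema.

```python
# Minimal sketch of a right-aligned labelling scheme: each ventilated day
# becomes one prediction row, labelled by whether extubation occurs on the
# following day. Column names and values are hypothetical.
import pandas as pd

daily = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "icu_day":    [1, 2, 3, 1, 2],
    "gcs_total":  [8, 10, 13, 9, 11],
})
extubation_day = {1: 4, 2: None}  # day of extubation per patient (None = not extubated)

def label_next_day(row):
    """Return 1 if the patient is extubated on the day after this observation."""
    ext_day = extubation_day.get(row["patient_id"])
    return int(ext_day is not None and ext_day == row["icu_day"] + 1)

daily["extubated_next_day"] = daily.apply(label_next_day, axis=1)
print(daily)
# Each row can then be scored by the model every day, yielding a continuously
# updated probability of extubation one day ahead.
```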

Similar to our study, Chen KH et al. used data from 1,483 patients at three medical ICUs in northern Taiwan and an ML approach to predict the shift of the ventilator mode from assisted/controlled mode to a spontaneous breathing trial, and the accuracy of the ML-based model, determined by the area under the receiver operating characteristic curve, was approximately 0.79 [9]. We think the higher performance of the extubation prediction model in the present study can be attributed not only to the large number of enrolled subjects but also to the explicit target labelling with extubation. Furthermore, the proposed individual-level explanation at distinct time points might serve to continuously monitor readiness for extubation; in brief, gradual improvement of crucial clinical parameters together with a steady increase in extubation probability indicates that an individual patient is ready for extubation (Supplemental Fig. 5). These findings further highlight that explanations consistent with clinical evidence should enable clinicians to work with AI, the so-called Human-AI Team [13].
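The following sketch illustrates how an individual-level explanation at successive time points might be generated with LIME; the random forest model, synthetic features and class names are placeholders, not the model developed in this study.

```python
# Minimal sketch of an individual-level explanation with LIME at successive
# time points. Model, features and labels are synthetic placeholders.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["gcs_total", "fluid_balance_24h", "pf_ratio"]
X = rng.normal(size=(500, 3))
y = rng.integers(0, 2, 500)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    training_data=X,
    feature_names=feature_names,
    class_names=["stay intubated", "extubated next day"],
    mode="classification",
)

# Explain the same (hypothetical) patient on two consecutive days; a rising
# probability alongside improving key features suggests readiness for extubation.
for day, row in enumerate(X[:2], start=1):
    exp = explainer.explain_instance(row, model.predict_proba, num_features=3)
    prob = model.predict_proba(row.reshape(1, -1))[0, 1]
    print(f"Day {day}: P(extubation) = {prob:.2f}", exp.as_list())
```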

Indeed, feature selection is an essential issue given that a large number of features may hamper deployment, particularly on edge devices [28, 29]. We therefore used recursive feature elimination and found high accuracy while using the top 20 features in this study (Supplemental Fig. 3) [30]. In line with our findings, Roimi et al. used merely 50 of 7,000 features from two critical care databases, at Beth Israel Deaconess Medical Center and Rambam Health Care Campus, to develop an ML-based model to predict bloodstream infections in critically ill patients [31]. Similarly, Jia et al. used 25 features from the Medical Information Mart for Intensive Care (MIMIC)-III database and a convolutional neural network approach to establish a decision support system for suggesting a breathing trial, with an accuracy of 0.86 [10]. Moreover, Xie et al. employed merely 9–12 variables to establish an easy-to-use, machine learning-based mortality prediction model using data from the MIMIC-III database [32]. These studies and our data demonstrate the potential to establish models with high accuracy using a reasonable number of features for practical deployment.
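For illustration, a minimal recursive feature elimination sketch with scikit-learn is shown below; the synthetic data and random forest estimator are assumptions for the example, and the study's actual candidate feature set is not reproduced here.

```python
# Minimal sketch of recursive feature elimination (RFE) retaining the top 20
# features from a larger candidate set. Data and estimator are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=100, n_informative=25,
                           random_state=0)

selector = RFE(estimator=RandomForestClassifier(n_estimators=100, random_state=0),
               n_features_to_select=20, step=5)
selector.fit(X, y)

# Evaluate a model refit on the selected subset
X_top20 = selector.transform(X)
auc = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                      X_top20, y, cv=5, scoring="roc_auc").mean()
print(f"Cross-validated AUC with the top 20 features: {auc:.3f}")
```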

With respect to the comparison among distinct ML models, we used the DeLong test to determine the difference in performance among the ML models [33] (Supplemental Table 3). Similar to our previous studies, we found that the tree-based models, including XGBoost, CatBoost, LightGBM and RF, performed distinctly better than LR, and we postulate that the relatively low performance of LR may result from its assumption of a linear relationship between the features and the outcome [16, 17]. We also found that XGBoost, LightGBM and CatBoost had slightly higher performance than RF and speculate that this minor difference might be attributed to the high flexibility afforded by the numerous adjustable hyperparameters of XGBoost, LightGBM and CatBoost. However, we think the main difference among XGBoost, CatBoost and LightGBM lies not in performance but in the easier preprocessing of categorical data in CatBoost and the lower hardware requirements of LightGBM.
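Because the DeLong test is not provided by scikit-learn, the sketch below shows one common way to compare two correlated AUCs in Python, following the fast algorithm described by Sun and Xu; the models and data are synthetic placeholders, and this is an illustration rather than the exact procedure used in the study.

```python
# Minimal sketch of the DeLong test for comparing two correlated AUCs,
# following the fast algorithm of Sun & Xu (2014). Data are synthetic.
import numpy as np
from scipy import stats

def midrank(x):
    """Midranks of x, averaging ranks over ties."""
    order = np.argsort(x)
    z = x[order]
    n = len(x)
    t = np.zeros(n)
    i = 0
    while i < n:
        j = i
        while j < n and z[j] == z[i]:
            j += 1
        t[i:j] = 0.5 * (i + j - 1) + 1
        i = j
    out = np.empty(n)
    out[order] = t
    return out

def delong_test(y_true, scores_a, scores_b):
    """Two-sided p-value for the AUC difference of two models on the same data."""
    y_true = np.asarray(y_true)
    preds = np.vstack([scores_a, scores_b])
    m = int((y_true == 1).sum())      # positives
    n = int((y_true == 0).sum())      # negatives
    order = np.argsort(-y_true)       # positives first
    preds = preds[:, order]

    k = preds.shape[0]
    v01, v10, aucs = np.empty((k, m)), np.empty((k, n)), np.empty(k)
    for r in range(k):
        tx, ty, tz = midrank(preds[r, :m]), midrank(preds[r, m:]), midrank(preds[r, :])
        aucs[r] = (tz[:m].sum() - m * (m + 1) / 2) / (m * n)
        v01[r] = (tz[:m] - tx) / n            # structural components (positives)
        v10[r] = 1.0 - (tz[m:] - ty) / m      # structural components (negatives)
    cov = np.cov(v01) / m + np.cov(v10) / n
    z = (aucs[0] - aucs[1]) / np.sqrt(cov[0, 0] + cov[1, 1] - 2 * cov[0, 1])
    return aucs, 2 * stats.norm.sf(abs(z))

# Illustrative comparison of a tree-based model against logistic regression
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
p_tree = GradientBoostingClassifier().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
p_lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
aucs, p_value = delong_test(y_te, p_tree, p_lr)
print(f"AUCs: {aucs.round(3)}, DeLong p-value: {p_value:.3f}")
```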

There are limitations to this study. First, this study used a single-hospital database, and external validation is warranted to confirm our findings. Second, the design was retrospective and the decision to extubate is individualised, but the study hospital is a referral centre in central Taiwan staffed with intensivists and respiratory therapists, which might mitigate this concern. Third, the established model predicts the timing of extubation rather than successful weaning (i.e. extubation without re-intubation); however, the proportion of re-intubation in the present study is consistent with previous studies (Supplemental Fig. 6). Fourth, the single imputation of missing values with the mean could potentially introduce bias in this study.
