Investigating Machine Learning Techniques for Predicting Risk of Asthma Exacerbations: A Systematic Review

Characteristics of Included Studies

The characteristics of the 20 studies included in this review are summarised in Fig. 3. The studies were conducted in a range of countries, shown in Fig. 3a. (One study [23], which used an international data set spanning multiple countries, is not included in the figure.) As Fig. 3a shows, the USA is the most common country of origin among the studies that predict asthma attack risk using ML techniques. As shown in Fig. 3b, 75% of the studies were published as journal articles and the rest as conference papers. Figure 3c shows the distribution of the studies by year of publication; the majority were published after 2020. The data sets used in these studies differ in sample size, measured as the number of participants or records (data instances). These details are presented in Table 1 as absolute values and percentages. Table 1 also shows the distribution of the two classes of the target variable (asthma attack absent or present) and the portions of the data used for training and testing the prediction models.

In this review, we identify the data sources that previous studies used to predict the risk of asthma attacks. Table 2 shows the data domains used to develop the asthma risk prediction models: biological, clinical, environmental and meteorological, hospital and medical, and socio-demographic. Clinical data may include asthma symptoms, peak expiratory flow rate (PEFR), and inhalations, while prescribed medications and treatments come under medical data. Hospital data consists of hospital admissions, previous attacks, comorbidities, emergency department (ED) visits, etc.

Fig. 3 Characteristics of the included studies

Table 1 Characteristics of datasets used in previous studies

Table 2 Data sources used in previous studies

ML Models

The 20 studies used different ML techniques to predict the risk of asthma attacks. The outcome, asthma exacerbation, was considered either as a categorical variable or as a continuous variable in the form of a probability. Therefore, the studies can be categorised into 2 groups: 1) studies that predict the risk of asthma attacks as a classification (n=18) and 2) studies that predict the risk of asthma attacks as a probability (n=2). The classification group was further divided into 2 groups: studies with (n=11) and without (n=7) a prediction window. The studies with a prediction window can again be subdivided based on the window size: less than (n=6) and more than (n=5) a month. This categorisation of the studies is illustrated in Fig. 4.

Fig. 4 Presentation of the results of the review

In the literature, many studies predicted the risk of asthma exacerbations without considering temporal effects, for instance the possibility that weather conditions on the previous day or a few days earlier triggered a patient's symptoms. It is therefore critical to consider the effect of factors from previous days (lags) when forecasting the risk of asthma attacks. Further, rather than making a prediction with no time frame, a group of studies constructed models to predict asthma attacks within a specific period (the prediction window), such as the coming 3 days, 7 days, 3 months, or 1 year. Details about these models are presented in the following sections.
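
The lag and prediction-window ideas can be made concrete with a minimal sketch. It is not taken from any of the reviewed studies: the function name `make_lagged_dataset`, the single daily exposure series, and the toy values are all assumptions chosen for illustration.

```python
from typing import List, Tuple

def make_lagged_dataset(
    exposure: List[float],   # one value per day, e.g. a daily weather reading (assumed input)
    attack: List[int],       # 1 if an attack was recorded on that day, else 0
    n_lags: int = 3,         # number of previous days used as features
    window: int = 3,         # prediction window, in days
) -> Tuple[List[List[float]], List[int]]:
    """Build one (features, label) pair per day that has enough history and future."""
    if len(exposure) != len(attack):
        raise ValueError("exposure and attack must cover the same days")
    X, y = [], []
    for t in range(n_lags, len(exposure) - window + 1):
        # Features: exposure on days t-1, t-2, ..., t-n_lags (the lags).
        X.append([exposure[t - k] for k in range(1, n_lags + 1)])
        # Label: 1 if any attack occurs on days t .. t+window-1.
        y.append(int(any(attack[t:t + window])))
    return X, y

# Toy data, invented for illustration only.
exposure = [10.0, 12.0, 9.0, 15.0, 20.0, 18.0, 11.0, 10.0]
attack = [0, 0, 0, 0, 0, 1, 0, 0]
X, y = make_lagged_dataset(exposure, attack, n_lags=2, window=3)
```

Widening `window` marks more days as positive, which changes the class balance as well as the difficulty of the task, so results obtained with different prediction windows are not directly comparable.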

Among the ML algorithms employed in these studies, logistic regression (LR), decision trees (DT), random forest (RF), gradient boosting machines (GBM), extreme gradient boosting (XGB), support vector machines (SVM), and neural networks (NN) were used most often. Most of the studies used k-fold cross-validation to validate the model on the training data. The studies chose different values of k, such as 3 [38], 4 [27, 28], 5 [16, 23, 24, 32, 34] and 10 [30, 33, 35].
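
As a rough illustration of how k-fold cross-validation partitions the training data, the sketch below yields the train and validation indices for each fold. The helper `k_fold_indices` and the sample count of 23 are hypothetical; the reviewed studies would typically rely on a library routine rather than code like this.

```python
import random
from typing import Iterator, List, Tuple

def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

# Hypothetical example: 23 samples split into 5 folds.
folds = list(k_fold_indices(n_samples=23, k=5))
fold_sizes = [len(val) for _, val in folds]
```

Each sample appears in exactly one validation fold, so every record is used for validation once and for training k-1 times. For time-ordered patient data, a shuffled split like this one can leak future information into training, so a time-aware split may be more appropriate.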

Hyperparameter Tuning

Hyperparameters in ML models are external configuration settings that are not learned from the data but are set before training. They influence the overall behaviour of the model and affect its performance. Hyperparameters can be optimised using different techniques. Among the studies included in this review, only four [24, 34, 38, 40] conducted hyperparameter tuning. Grid search was applied in two of these studies [24, 34] and randomised search in another [38]. (The hyperparameter tuning process is not described in enough detail in [40].) Table 3 presents the details of the hyperparameter tuning conducted in past studies, including the hyperparameters tuned, the values tried, and the techniques used.
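
The difference between grid search and randomised search can be sketched as follows. Everything here is illustrative: the hyperparameter names `max_depth` and `learning_rate`, their candidate values, and the function `toy_cv_score` are assumptions, not the settings reported in Table 3. In a real study the scoring function would train a model and return a cross-validated metric.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]

def grid_search(
    grid: Dict[str, List[float]], score: Callable[[Params], float]
) -> Tuple[Params, float]:
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [
        dict(zip(names, values)) for values in itertools.product(*grid.values())
    ]
    best = max(candidates, key=score)
    return best, score(best)

def randomised_search(
    grid: Dict[str, List[float]],
    score: Callable[[Params], float],
    n_iter: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Evaluate n_iter randomly drawn combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_iter)
    ]
    best = max(candidates, key=score)
    return best, score(best)

def toy_cv_score(p: Params) -> float:
    # Hypothetical stand-in for a cross-validated score; peaks at depth 6, rate 0.1.
    return 1.0 - (p["max_depth"] - 6) ** 2 / 100 - (p["learning_rate"] - 0.1) ** 2

grid = {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}
best_grid, best_grid_score = grid_search(grid, toy_cv_score)
best_rand, best_rand_score = randomised_search(grid, toy_cv_score, n_iter=5)
```

Grid search evaluates all 12 combinations here, while randomised search evaluates only 5 draws, so it is cheaper but may miss the best combination.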

Model Performance

The studies used different metrics to evaluate and compare the performance of the models, as shown in Table 4. These metrics were predominantly accuracy, area under the receiver operating characteristic curve (AUC-ROC), specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV). Accuracy is the ratio of correctly predicted outcomes to the total number of samples, that is, the overall correctness of the model. The AUC-ROC value represents the capability of the model to distinguish between the classes. Sensitivity, also called recall, is the proportion of actual positive cases that the model correctly identifies, while specificity is the proportion of actual negative cases that it correctly identifies. PPV, also called precision, is the proportion of positive predictions that are correct, while NPV is the proportion of negative predictions that are correct.
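
The metric definitions above can be written out as a short sketch computed from the confusion-matrix counts. The function names and the toy labels and scores are illustrative assumptions; the AUC-ROC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, List

def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute threshold-based metrics from true and predicted binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),          # precision
        "npv": ratio(tn, tn + fn),
    }

def auc_roc(y_true: List[int], scores: List[float]) -> float:
    """AUC-ROC as the share of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example with 3 attacks among 10 records (invented data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2, 0.1, 0.05]
metrics = binary_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

In this toy example accuracy is 0.8 while sensitivity is only 2/3, which shows why accuracy alone can be misleading when attacks are the minority class.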

Table 3 Hyperparameter optimisation details of the previous studies

Table 4 Performance of the ML models developed for asthma risk prediction

Predicting Risk of Asthma Attacks As a Classification

This section discusses the 18 studies that predicted the risk of asthma exacerbation as a category. Seventeen of them treated the prediction as a binary classification problem, and one study [37] treated it as a multi-class classification problem. Most studies in the binary category predicted the presence or absence of an asthma attack, while others considered different levels of asthma attack, such as mild, moderate, or severe. Furthermore, attacks can be predicted for a future time frame, for instance the possibility of an attack in the next 3 days. Some studies built their models around such prediction windows and others did not. The following sub-sections describe these groups.

Predicting Risk of Asthma Attacks With a Prediction Window of Less Than a Month

As noted earlier, a group of studies constructed models to predict asthma attacks within a specific period (the prediction window), such as the coming 3 days, 7 days, 3 months, or 1 year. Figure 5 depicts the association between prediction window size and model performance: the shorter the prediction window, the higher the model’s performance. We grouped these studies into two categories according to the size of the prediction window. This section covers the studies that classified the risk of asthma attacks using a prediction window of less than one month. Table 5 in the Appendix summarises the studies that developed ML models to predict asthma risk as a category using the prediction window concept.

Fig. 5 Association between prediction window size and model performance

Six studies [23,24,25,26,27,28] developed models for short-term prediction of severe asthma exacerbations. Five studies [23,24,25,26, 28] kept the prediction window at less than a week, while one study [27] used a longer window of 15 days. To train the ML models, the authors used data from several previous days, a period they defined as the lookback window. Four of the studies [24,25,26,27] applied the lookback window concept for near-term prediction, with lookback windows ranging from 5 to 365 days. One of them [27] included inputs such as event counts over multiple lookback window sizes: 10, 30, 60, 90, and 365 days. With the aim of exploring telemonitoring data for asthma risk prediction, and using the minimum description length (MDL) principle, [26] found that the telemonitoring alert (out of four zones) on day 7 was of higher importance in predicting asthma risk on day 8. Comorbidity burden and previous exacerbations were important predictors, identified through collinearity [27]. Although the authors applied principal component analysis (PCA) and recursive feature elimination to identify important features, the resulting features are not clearly stated in the article. Most works used tree-based algorithms such as DT [23], RF [27, 28], XGB [24, 27, 28], and classification and regression trees (CART) [25]. Studies also developed models using LR [23, 24, 27, 28], SVM [24, 26], and NN [23, 27] algorithms. Only two studies [23, 27] applied data imbalance handling techniques: random under-sampling [23, 27], random over-sampling [23], and the synthetic minority oversampling technique (SMOTE) [23]. The other works do not clearly report how data imbalance was handled.
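
Event counts over several lookback windows, of the kind described for [27], could be derived along the following lines. This is a sketch under assumptions: the helper `lookback_counts`, the choice to exclude the index date itself, and the example dates are illustrative and are not the procedure documented in [27]. Only the window sizes (10, 30, 60, 90, and 365 days) come from the text above.

```python
from datetime import date, timedelta
from typing import Dict, List

def lookback_counts(
    event_dates: List[date], index_date: date, windows: List[int]
) -> Dict[int, int]:
    """Count events in the w days before index_date, for each window size w."""
    counts = {}
    for w in windows:
        start = index_date - timedelta(days=w)
        # Events on the index date itself are excluded so that the features
        # only use information available before the prediction is made.
        counts[w] = sum(1 for d in event_dates if start <= d < index_date)
    return counts

# Hypothetical patient history (e.g. dates of prior events), invented for illustration.
events = [
    date(2023, 5, 28),
    date(2023, 5, 10),
    date(2023, 3, 20),
    date(2022, 9, 1),
    date(2023, 6, 1),   # falls on the index date, so it is never counted
]
features = lookback_counts(events, date(2023, 6, 1), [10, 30, 60, 90, 365])
```

Because the windows are nested, the counts never decrease as the window grows, which makes such features strongly correlated with each other.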

Predicting Risk of Asthma Attacks with a Prediction Window of More Than a Month

Five studies [29,30,31,32,33] defined their prediction window as one month or longer. One study [29] used a 1-month period and another [30] a 6-month period as the prediction window, while the remaining studies [31,32,33] used 1 year. While [29, 30, 32] considered lookback windows similar in size to their prediction windows, no clear details of lookback windows are provided in [31, 33]. One study [29] identified clinical factors such as obesity, atopy, medication, asthma controller plan and patient service utilisation history as important asthma risk predictors. Asthma medication also played an important role in [33]. Further, previous asthma exacerbations and length of treatment with biologics were key predictors of asthma risk in [30]. Meanwhile, age, hospital stay, blight prevalence, and neighbourhood inequality were important predictors in another study [32]. The ML algorithms most commonly used by these studies to develop prediction models were LR [29, 30, 33], RF [29, 30, 32, 33], and XGB [29, 31, 33]. Only one study [32] in this category applied random undersampling to handle data imbalance.
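
Random undersampling, as mentioned above, can be illustrated with a minimal sketch that discards majority-class records at random until both classes are the same size. The function `random_undersample`, the 1:1 target ratio, and the toy data are assumptions for illustration; the reviewed studies do not necessarily balance to this ratio.

```python
import random
from typing import List, Sequence, Tuple

def random_undersample(
    X: Sequence, y: Sequence[int], seed: int = 0
) -> Tuple[list, List[int]]:
    """Drop random majority-class rows until both classes have the same size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)
    # Keep all minority rows plus an equally sized random sample of majority rows.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]

# Toy data: 2 attacks among 10 records (invented for illustration).
X = [[i] for i in range(10)]
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
X_bal, y_bal = random_undersample(X, y)
```

Resampling of this kind is normally applied to the training portion only, so that the test data keep the original class distribution.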

Predicting Risk of Asthma Attacks Without a Prediction Window

This section focuses on the studies that developed ML models to predict the risk of asthma attacks without considering a prediction window. Table 6 summarises these studies. Seven studies [34,
