This study used the same dataset as was used to develop the prediction models in the study from Stevens et al.; the full study protocol can be consulted there [16].
This retrospective two-centre cohort study, performed in two non-university teaching hospitals in the Netherlands (Catharina Hospital, Eindhoven; Elkerliek Hospital, Helmond), included 446 patients who underwent EA for complaints of abnormal uterine bleeding [16]. Between 2004 and 2013, both hospitals used similar ablation techniques: Cavatherm® (Veldana Medical SA, Morges, Switzerland), Gynecare Thermachoice® (Ethicon, Somerville, USA), and Thermablate® EAS (Idoman, Ireland). Recent publications have shown that these ablation techniques are equally effective [14, 25]. Local medical ethical review boards approved the study. All patients gave informed consent.
Patients were identified in the electronic patient care system using specified search terms related to endometrial ablation. Exclusion criteria were postmenopausal status at the time of EA, (suspicion of) endometrial malignancy, or uterine cavity deformations (adenomyosis, anomalies, fibroids, or a polyp). The follow-up period after treatment was at least 2 years. This interval was chosen because previous literature reported that most re-interventions were performed within 2 years. Follow-up ended on the day of hysterectomy, at death, or on April 15, 2015 [9, 17, 18, 25, 26, 27].
Data were extracted from individual patient files by two researchers (K.S. and D.M.) [16]. Next, patients were asked to fill in a questionnaire regarding follow-up information. In case of non-response, patients were contacted by letter and ultimately by telephone by the authors of Stevens et al. [16]. The questionnaire contained questions based on previously published variables that significantly predicted surgical re-intervention after EA [2, 5, 8, 11, 12, 13, 14, 15, 16, 17, 28, 29].
The entire dataset consists of 446 patients with both categorical and continuous variables. For the machine learning algorithms, all features were extracted from the original dataset of Stevens et al. [16]. A total of five pre-operative variables were used to develop the machine learning model. These were the pre-operative variables that were significant predictors in the final multivariate re-intervention model of Stevens et al. (age, duration of menstruation, dysmenorrhea, parity, and previous caesarean section) [16]. Unlike in the development of the previously published logistic regression model, the continuous data were not discretized into categories [16].
Development of the logistic regression model
Statistical analysis of the data was performed using SPSS 21.0 for Windows (IBM Corp., Armonk, NY, USA).
To determine which variables were significant, univariable logistic regression analysis was used.
Variables with a p-value < 0.10 were entered into the multivariable analysis. This was followed by a manual backward stepwise selection process, progressively excluding the variable with the highest p-value [16].
As described by Steyerberg et al., a p-value threshold of 0.10 was used to avoid incorrectly excluding a predictive factor. Such an exclusion would be far more detrimental to the model than retaining a potentially non-discriminating factor [28, 29].
Multicollinearity and interaction between the significant variables in the model were tested. Bootstrap resampling was used for internal validation (n = 5000) [29, 30]. To correct for over-optimism of the model, the regression coefficients were multiplied by the calculated shrinkage factor. A detailed description of the development of the LR model can be found in the study of Stevens et al. [16].
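The two correction steps can be written compactly. This is only a sketch, assuming the standard optimism bootstrap was used; the text states that a shrinkage factor was calculated but not how. The symbols s, p, B, and the AUC subscripts are introduced here for illustration and do not come from the original protocol.

```latex
% Assumed notation: \hat\beta_j is a fitted regression coefficient,
% s is the shrinkage factor (0 < s \le 1), p is the number of predictors.
\hat\beta_j^{\mathrm{shrunk}} = s \, \hat\beta_j , \qquad j = 1, \dots, p

% Optimism-corrected AUC over B bootstrap samples (here B = 5000).
% AUC_b^{boot}: model fitted on bootstrap sample b, evaluated on sample b.
% AUC_b^{orig}: the same model, evaluated on the original data.
\mathrm{AUC}_{\mathrm{corrected}}
  = \mathrm{AUC}_{\mathrm{apparent}}
  - \frac{1}{B} \sum_{b=1}^{B}
    \left( \mathrm{AUC}_{b}^{\mathrm{boot}} - \mathrm{AUC}_{b}^{\mathrm{orig}} \right)
```

In this formulation, a shrinkage factor close to 1 indicates little over-optimism, while smaller values pull the coefficients more strongly towards zero.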
Development of the machine learning model (random forest model)
For the development of the machine learning model, we used a random forest (RF) technique. This is a machine learning method used for classification and regression, which operates by constructing a large ensemble of decision trees on training data [22, 23, 31]. Each tree in the random forest is built using a bootstrap sample randomly drawn from the training dataset. This reduces variance and corrects for the tendency of a single decision tree to overfit the training set. Each tree in the forest gives an individual prediction of the outcome measure. For a classification problem (in this case, surgical re-intervention or no surgical re-intervention after EA), the final random forest model averages the predictions of all the trees in the forest [21, 23, 31, 32].
To build the model, we first trained an RF model using the following five pre-operative predictors: age, duration of menstruation, dysmenorrhea, parity, and previous caesarean section. These factors were associated with a higher probability of surgical re-intervention within 2 years after EA in the previously published multivariate logistic regression model [16].
As described above, an RF model is an ensemble of many decision tree models. Figure 1 shows an example of an individual decision tree in the random forest. The decision tree is a flowchart-like binary branching structure. At each “node split” in the tree, the data are divided in two, based on the value of the variable at that decision node. When no more splits are possible, a prediction is calculated for the cases in the final leaf node [23, 31, 33].
Fig. 1 An illustration of a decision tree in the random forest model. The decision tree directs each case from the root node to the leaf nodes, resulting in a prediction. N, number; SRR, surgical re-intervention rate
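The routing of a case from the root node to a leaf can be sketched in a few lines of Python. This is a minimal illustration, not the tree from Fig. 1: the thresholds, leaf sizes, and rates below are hypothetical, as are the names `Leaf`, `Node`, and `predict`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    n: int       # number of training cases that ended up in this leaf
    srr: float   # surgical re-intervention rate among those cases


@dataclass
class Node:
    feature: str                  # variable tested at this node split
    threshold: float              # cases with value < threshold go left
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def predict(tree, case):
    """Follow the splits from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.left if case[node.feature] < node.threshold else node.right
    return node.srr


# Hypothetical tree; all numbers are invented for illustration only.
example_tree = Node(
    feature="age", threshold=40,
    left=Node(
        feature="duration_of_menstruation", threshold=7,
        left=Leaf(n=60, srr=0.10),
        right=Leaf(n=40, srr=0.30),
    ),
    right=Leaf(n=100, srr=0.05),
)

younger_case = {"age": 35, "duration_of_menstruation": 9}
older_case = {"age": 45, "duration_of_menstruation": 9}
print(predict(example_tree, younger_case))
print(predict(example_tree, older_case))
```

Each case reaches exactly one leaf, and the rate stored in that leaf is the tree's prediction for the case.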
At each node split, only a random subset of features (such as duration of menstruation and parity) is considered. This prevents strongly predictive features from being selected over and over, which would lead to similar splits across the trees. The result is a more robust model that is less prone to overfitting [21, 23, 31, 32, 33, 34].
Following this process, the classification result of an RF model is produced by growing a large ensemble of such trees and averaging the predictions of the individual decision trees on surgical re-intervention. Figure 2 shows a simplified example of the RF model. In practice, the decision trees and the resulting prediction model contain a large number of leaf nodes [31, 35].
Fig. 2 A simplified random forest model for the prediction of surgical re-intervention
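The three ingredients described above (a bootstrap sample per tree, a random feature subset per node split, and averaging over trees) can be illustrated with a small, dependency-free Python sketch. It is not the implementation used in the study: the function names, the Gini splitting criterion, the depth limit, and the toy data are all assumptions made for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Return (feature, threshold) with the lowest weighted impurity, or None."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] < t]
            right = [y for row, y in zip(rows, labels) if row[f] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return None if best is None else (best[1], best[2])


def grow_tree(rows, labels, rng, n_sub, depth=0, max_depth=3):
    """Grow one tree; a leaf stores the event rate of the cases it contains."""
    rate = sum(labels) / len(labels)
    if depth == max_depth or rate in (0.0, 1.0):
        return rate
    # Only a random subset of the features is considered at this node split.
    features = rng.sample(range(len(rows[0])), n_sub)
    split = best_split(rows, labels, features)
    if split is None:
        return rate
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    if not left or not right:
        return rate
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], rng, n_sub, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [y for _, y in right], rng, n_sub, depth + 1, max_depth))


def predict_tree(node, row):
    """Route a case through the splits until a leaf (a float) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def train_forest(rows, labels, n_trees, n_sub, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n cases with replacement from the training data.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_sub))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees in the forest."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Toy data invented for illustration (not the study data): [age, parity].
rows = [[20 + i % 40, i % 3] for i in range(120)]
labels = [1 if age < 40 else 0 for age, _ in rows]

forest = train_forest(rows, labels, n_trees=50, n_sub=1)
young = predict_forest(forest, [25, 1])
old = predict_forest(forest, [55, 1])
print(young, old)
```

Because each tree sees a different bootstrap sample and a different sequence of feature subsets, the individual trees disagree, and the averaged output behaves like a predicted probability between 0 and 1.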
The RF was trained in MATLAB (2018b) using the TreeBagger function in the Statistics and Machine Learning Toolbox.
To predict the chance of surgical re-intervention within 2 years after EA, the model was initially trained and internally validated on the 446 cases. To allow a fair comparison between the RF and LR models, the same validation technique was used: bootstrap resampling with 5000 resamples. The area under the receiver operating characteristic curve (AUROC) was calculated as the performance measure.
Comparison of the prediction models
The performance of the models was tested and compared using the AUROC. Accuracy was not used as a performance measure, since the database is unbalanced (ratio between re-intervention and no re-intervention 1:8 (53:446)) [36]. We chose the same performance measure (AUC) as in the previous study of Stevens et al. [16], so that the models could be compared directly.
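A short Python sketch shows why accuracy is misleading on unbalanced data while the AUROC is not. It assumes that the reported figures correspond to 53 re-interventions among 446 patients; the helper names `auroc` and `accuracy` are illustrative.

```python
def auroc(labels, scores):
    """Probability that a random positive case scores higher than a random
    negative case; ties count as 0.5 (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, predictions):
    """Fraction of cases whose predicted class equals the observed class."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


# Assumed class balance: 53 re-interventions among 446 patients.
labels = [1] * 53 + [0] * 393

# A useless model that always predicts "no re-intervention".
always_negative_calls = [0] * 446
always_negative_scores = [0.0] * 446

acc = accuracy(labels, always_negative_calls)   # 393 / 446, roughly 0.88
auc = auroc(labels, always_negative_scores)     # 0.5, i.e. no discrimination
print(round(acc, 3), auc)
```

The model that never predicts a re-intervention is right for most patients yet cannot rank a single patient correctly, which the AUROC of 0.5 makes visible.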
Predictors of surgical re-intervention: variable importance measure (VIM)
To identify important predictors of surgical re-intervention, we used two methods of analysis.
First, univariable logistic regression analysis was applied to assess the importance of each variable. For each variable, an odds ratio (OR) with a 95% confidence interval (CI) was calculated.
Secondly, a permutation-based variable importance measure was used. This VIM is based on the AUC statistic of the ML model. It is computed by randomly permuting the values of predictor x, which breaks the association between x and the outcome, and comparing the resulting AUC with the AUC obtained on the unpermuted data. Permuting an important feature will result in a lower AUC of the ML model, while permuting an unimportant feature will not change the AUC significantly [23, 35, 37].
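The permutation-based measure can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: `toy_model` stands in for the trained RF, the data are invented, and the number of repeats is arbitrary.

```python
import random


def auroc(labels, scores):
    """Rank-based AUC; ties between a positive and a negative count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, feature, n_repeats, rng):
    """Mean drop in AUC after shuffling one feature column."""
    baseline = auroc(labels, [predict(row) for row in rows])
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)  # break the link between this feature and the outcome
        permuted = [row[:feature] + [v] + row[feature + 1:]
                    for row, v in zip(rows, column)]
        drops.append(baseline - auroc(labels, [predict(row) for row in permuted]))
    return sum(drops) / len(drops)


# Toy data invented for illustration: feature 0 drives the outcome,
# feature 1 is ignored by the model.
rows = [[i, (i * 7) % 5] for i in range(40)]
labels = [1 if row[0] >= 20 else 0 for row in rows]


def toy_model(row):
    """Hypothetical stand-in for a trained model; uses feature 0 only."""
    return row[0] / 40


rng = random.Random(0)
vim_informative = permutation_importance(toy_model, rows, labels, 0, 20, rng)
vim_ignored = permutation_importance(toy_model, rows, labels, 1, 20, rng)
print(vim_informative, vim_ignored)
```

Shuffling the feature the model relies on lowers the AUC, whereas shuffling a feature the model ignores leaves every prediction, and therefore the AUC, unchanged.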