This study followed the CheckList for EvaluAtion of Radiomics research (CLEAR) guidelines to ensure comprehensive reporting for more reproducible and transparent research [27]. Details of the completed checklist can be found in Appendix I. The technical workflow is shown in Fig. 1.
Fig. 1 Technical workflow

Study population

Patients with histologically confirmed NPC who were treated with radiotherapy at the Queen Elizabeth Hospital (QEH), Hong Kong, between 2008 and 2018, and at the Prince of Wales Hospital (PWH), Hong Kong, between 2020 and 2021, were retrospectively enrolled in this study. Institutional review board approval was obtained from each institution, and patient informed consent was waived owing to the retrospective nature of the study. The exclusion criteria were: (1) distant metastasis at diagnosis; (2) absence of the required planning CT image or dose distribution.
The QEH dataset was used for model development, and the PWH dataset for external validation. For the validation dataset, a minimum sample size of 81 patients was determined using MedCalc v22.018 to detect an AUC of 0.7 against a null hypothesis value of 0.5 with 80% power at a 0.05 significance level, assuming a severe OM incidence of 40% [9, 28]. Patients were enrolled consecutively according to their scheduled radiotherapy start date. The patient recruitment diagram is shown in Fig. 2.
Fig. 2 Patient recruitment diagram. *See feature data preprocessing for further details
Imaging acquisition

The contrast-enhanced CT image used for RT planning and the resulting planned radiation dose distribution were collected for each patient. Imaging acquisition parameters are provided in Appendix A.
Clinical data collection and outcome definition

Clinical data included age, sex, height, weight at CT simulation, TNM staging according to the 8th Edition of the UICC/AJCC classification [29, 30], chemotherapy regimen and details of the radiotherapy delivery. The severe OM label was assigned to patients whose maximum CTCAE grade was 3 (severe) or higher during weeks 1 to 7 of radiotherapy [31, 32]. The missing-data handling strategy is reported in Appendix B.
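The labelling rule above reduces to a simple threshold on the weekly CTCAE grades. A minimal sketch, with illustrative function and variable names (not from the study code):

```python
# Severe OM labelling rule: a patient is labelled positive if their maximum
# CTCAE oral mucositis grade reaches 3 or higher during weeks 1-7 of RT.
# Function name and input format are illustrative assumptions.

def severe_om_label(weekly_grades):
    """weekly_grades: CTCAE OM grades recorded for weeks 1-7 (iterable of int)."""
    return int(max(weekly_grades) >= 3)

# A patient peaking at grade 3 in week 5 is labelled severe (1).
print(severe_om_label([0, 1, 1, 2, 3, 2, 2]))  # -> 1
print(severe_om_label([0, 1, 2, 2, 2, 1, 1]))  # -> 0
```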
VOI segmentation

The extended oral cavity and pharyngeal constrictor (PC) muscles were selected as VOIs for this study. Several studies have previously investigated the extended oral cavity for predicting OM [10, 13, 15]. This VOI, as defined in the guidelines by Brouwer et al. [33], contains several areas that typically exhibit the most severe mucosal changes, including the soft palate, tongue and floor of the mouth [34]. The PC VOI, consisting of the superior, middle and inferior constrictor muscles, is routinely contoured as part of the RT planning process and includes part of the mucosa at risk of severe reaction; specifically, the hypopharyngeal mucosa has been reported as the region experiencing the most severe OM after the soft palate [34]. Moreover, Tao et al. reported the radiation dose to the pharyngeal space as a significant predictor of OM [35]. The extended oral cavity and PC contours were automatically segmented using a deep learning model (see Appendix C). The primary and neck nodal GTVs (GTVp and GTVn), used for contouromic feature calculation, were segmented by clinicians during radiotherapy planning.
Pre-processing and feature extraction

Radiomic features were extracted from the planning CT, including shape, first-order and texture features. Dosiomic features were extracted from the planned radiation dose, including first-order and texture features. The original first-order mean, median, minimum and maximum dose features were categorized as DVH features in subsequent analysis. Additional fractional-volume and fractional-dose DVH features were also calculated. Contouromic features were computed as in [24] for GTV-OAR pairs, for both GTVp and GTVn, to quantify the difficulty of dose sparing for each patient. In total, 2206 features were extracted: clinical (8), DVH (126), radiomic (784), dosiomic (712), contouromic GTVp-OAR (288) and contouromic GTVn-OAR (288). Details of the feature extraction settings are provided in Appendix D.
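The fractional-volume and fractional-dose DVH features mentioned above can be computed directly from the voxel doses within a VOI. A minimal sketch (function names and the toy dose array are illustrative; the study's actual extraction settings are in Appendix D):

```python
import numpy as np

# Fractional-volume (Vx) and fractional-dose (Dx) DVH features computed from
# a flattened dose array restricted to one VOI. Names and example values are
# illustrative assumptions, not the study's settings.

def v_x(dose, x_gy):
    """Fraction of the VOI volume receiving at least x_gy Gy."""
    dose = np.asarray(dose, dtype=float)
    return float(np.mean(dose >= x_gy))

def d_x(dose, x_percent):
    """Minimum dose (Gy) received by the hottest x_percent of the VOI."""
    dose = np.asarray(dose, dtype=float)
    return float(np.percentile(dose, 100.0 - x_percent))

voxel_doses = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # toy 5-voxel VOI
print(v_x(voxel_doses, 30.0))  # -> 0.6 (3 of 5 voxels receive >= 30 Gy)
print(d_x(voxel_doses, 50.0))  # -> 30.0 (median dose)
```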
Feature selection

Feature selection was performed in two phases. First, features with low stability or high redundancy were removed in an unsupervised manner: unstable features were removed as outlined in Appendix E, and redundant features were removed using the hierarchical clustering approach outlined in Appendix F. Second, supervised feature selection utilizing the severe OM outcome label was applied as part of the model pipeline, using the Maximum-Relevance Minimum-Redundancy (mRMR) algorithm as implemented in the “mRMR-selection” package for Python [36].
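Correlation-based hierarchical clustering for redundancy removal can be sketched as follows; the distance metric (1 − |r|), average linkage, cut threshold and choice of cluster representative are assumptions for illustration (the study's actual procedure is in Appendix F):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy feature matrix: two near-duplicate features plus one independent one.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base + 0.01 * rng.normal(size=(100, 1)),
               base + 0.01 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 1))])

# Distance = 1 - |Pearson r|, clustered with average linkage.
corr = np.abs(np.corrcoef(X, rowvar=False))
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
clusters = fcluster(Z, t=0.2, criterion="distance")  # cut at |r| ~ 0.8

# Keep one representative feature per cluster (here simply the first member).
keep = sorted(int(np.where(clusters == c)[0][0]) for c in np.unique(clusters))
print(keep)  # the two duplicates collapse into one cluster -> [0, 2]
```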
Model development

Two types of models were developed in this study: (1) conventional models using only clinical and DVH features, and (2) multi-omic models using clinical, DVH, radiomic, dosiomic and contouromic features.
The model pipeline consisted of three steps: feature selection, scaling and model fitting. Several machine learning algorithms were investigated: logistic regression with Ridge (L2) regularization, Support Vector Machine (SVM) with linear and radial basis function kernels, Random Forest, XGBoost and the Gaussian Naïve Bayes classifier.
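The three-step pipeline can be expressed with scikit-learn; this is a sketch only, with `SelectKBest` standing in for the mRMR selector and logistic regression as one of the candidate classifiers (the data are synthetic):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Three steps in the paper's order: feature selection -> scaling -> model.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=5)),  # stand-in for mRMR
    ("scale", StandardScaler()),
    ("model", LogisticRegression(penalty="l2", max_iter=1000)),  # Ridge penalty
])

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 30))                               # toy feature matrix
y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)   # toy outcome label
pipe.fit(X, y)
print(pipe.predict_proba(X[:3]).shape)  # -> (3, 2)
```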
The model pipeline hyperparameters, including those for mRMR and for the model, were optimized in a cross-validated grid search, as outlined in Fig. 3. The search maximized the area under the receiver operating characteristic curve (AUC), a discrimination metric that is threshold-invariant and scale-invariant. Further details on the hyperparameter optimization are given in Appendix G. The pipeline with the optimal settings was re-fitted on the full development dataset, yielding the training (apparent) performance score of the final model, which was then externally validated.
Fig. 3Model development flowchart. ICC = intraclass correlation coefficient, CV = cross-validation, IQR = interquartile range, SVM = support vector machine, RBF = radial basis function, NB = Naïve Bayes
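The cross-validated grid search maximizing AUC can be sketched with scikit-learn's `GridSearchCV` (which, with `refit=True` by default, also re-fits the best settings on the full development data). The grids, the `SelectKBest` stand-in for mRMR, and the synthetic data are illustrative assumptions; the study's actual search space is in Appendix G:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),  # stand-in for mRMR
    ("scale", StandardScaler()),
    ("model", LogisticRegression(penalty="l2", max_iter=1000)),
])
param_grid = {"select__k": [5, 10],            # number of selected features
              "model__C": [0.1, 1.0, 10.0]}    # inverse regularization strength
search = GridSearchCV(pipe, param_grid, scoring="roc_auc",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

rng = np.random.default_rng(7)
X = rng.normal(size=(120, 30))                               # toy feature matrix
y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)   # toy outcome label
search.fit(X, y)  # refit=True re-fits best settings on all development data
print(search.best_params_, round(search.best_score_, 3))
```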
Performance evaluation and validation

The internal validation performance was calculated as the mean AUC across the cross-validation folds, with its associated 95% confidence interval (CI). CIs for the apparent (training) and external validation scores were calculated using 1000 bootstrapped samples drawn from the QEH and PWH data, respectively.
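A 1000-sample bootstrap CI for the AUC can be sketched as follows (percentile intervals and the resampling details are assumptions, as is the synthetic data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling patients with replacement."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    n, aucs = len(y_true), []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if len(np.unique(y_true[idx])) < 2:  # resample must contain both classes
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

# Toy validation set with ~40% event rate, matching the assumed OM incidence.
rng = np.random.default_rng(1)
y = np.array([0] * 60 + [1] * 40)
scores = np.clip(0.35 * y + 0.3 + 0.2 * rng.normal(size=100), 0.0, 1.0)
lo_ci, hi_ci = bootstrap_auc_ci(y, scores)
print(f"95% bootstrap CI for AUC: ({lo_ci:.2f}, {hi_ci:.2f})")
```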
Feature importance assessment

Feature importance in the multi-omic model was assessed using the Shapley Additive exPlanations (SHAP) approach [37], which quantifies the contribution of each feature to the model output.
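The additive decomposition that SHAP computes is easiest to see for a linear model, where (assuming feature independence) the SHAP value of feature j is coef_j·(x_j − E[x_j]) and the values sum to the prediction minus the mean prediction. A toy illustration (coefficients and data are made up):

```python
import numpy as np

# Exact Shapley decomposition for a linear model on a toy dataset.
coef = np.array([2.0, -1.0, 0.5])
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0],
              [2.0, 2.0, 1.0]])

mean_x = X.mean(axis=0)
shap_values = coef * (X - mean_x)  # per-sample, per-feature contribution
f = X @ coef                       # model output (decision score)
base = mean_x @ coef               # expected model output (base value)

# Additivity property: contributions sum to (prediction - base value).
print(np.allclose(shap_values.sum(axis=1), f - base))  # -> True
```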
Decision curve analysis

Decision curve analysis was conducted for the calibrated models. The net benefit, a measure of clinical utility defined in Eq. (1), was plotted against the threshold probability (pt) [38].
$$Net\;benefit = \frac{True\;positives}{n} - \frac{False\;positives}{n} \times \frac{p_t}{1 - p_t}$$
(1)
where n is the total number of patients.
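Eq. (1) translates directly into code; a minimal sketch given outcome labels and calibrated predicted probabilities (the toy data are illustrative):

```python
import numpy as np

def net_benefit(y_true, y_prob, p_t):
    """Net benefit at threshold probability p_t, per Eq. (1)."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    n = len(y_true)
    pred_pos = y_prob >= p_t                  # patients classified as high risk
    tp = np.sum(pred_pos & (y_true == 1))     # true positives
    fp = np.sum(pred_pos & (y_true == 0))     # false positives
    return tp / n - fp / n * p_t / (1.0 - p_t)

# Toy example: 4 patients, threshold p_t = 0.5 -> TP = 1, FP = 1.
print(net_benefit([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.2], 0.5))  # -> 0.0
```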