Predicting pathological complete response to neoadjuvant chemotherapy in breast cancer patients: use of MRI radiomics data from three regions with multiple machine learning algorithms

Patient characteristics

The study design was approved by the local institutional ethics committee, and all patient records were anonymized before data analysis. Because the study was retrospective, the institutional review board waived the need for written informed consent. Figure 1 shows the patient selection procedure and the overall experimental design. The inclusion criteria were: (a) biopsy-proven primary invasive BCa without distant metastases; (b) completion of a standard NACT regimen, with no treatment before NACT; and (c) surgery after NACT followed by complete postoperative pathological evaluation. The exclusion criteria were: (a) no NACT, or a non-standard NACT regimen; (b) no surgery, or surgery performed at another hospital; (c) unilateral BCa with multiple lesions; and (d) poor MRI image quality.

Fig. 1

Disposition of retrospectively enrolled patients and overall study design

A total of 210 patients with primary BCa who received NACT at our hospital between December 2017 and September 2022 and underwent MRI before starting NACT were initially screened. Ten patients with unavailable MRI data and 67 patients with multiple tumors in a single breast were excluded. Of the remaining 133 patients, pathological evaluation showed that 49 achieved pCR (Miller–Payne grade 5 with no lymph node invasion in the ipsilateral sentinel node or in lymph nodes removed during axillary dissection) and 84 did not (non-pCR). These patients were randomly divided in a 7:3 ratio into a training group (n = 93) and a validation group (n = 40). The Supplementary Materials provide details of the NACT protocol, the Miller–Payne grading system, the definition of pCR, immunohistochemical evaluations, and BCa subtypes.
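The 7:3 split can be reproduced in outline with scikit-learn. This is a minimal sketch, not the authors' code: stratification on pCR status and the fixed seed are assumptions (the text states only that the split was random), and `patient_ids` is a hypothetical stand-in for the real case identifiers.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Cohort composition reported above: 49 pCR and 84 non-pCR patients (n = 133).
y = np.array([1] * 49 + [0] * 84)        # 1 = pCR, 0 = non-pCR
patient_ids = np.arange(len(y))           # hypothetical patient identifiers

# Assumption: stratify on pCR status and fix the seed for reproducibility.
train_ids, val_ids, y_train, y_val = train_test_split(
    patient_ids, y, test_size=40, stratify=y, random_state=42
)

print(len(train_ids), len(val_ids))       # 93 40
print(y_train.sum(), y_val.sum())         # pCR cases in each group
```

Passing an integer `test_size` fixes the validation group at exactly 40 patients, which matches the reported group sizes.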

Image preprocessing and segmentation of regions of interest

All patients were scanned in the prone position on a 3.0 T MRI scanner (Skyra; Siemens Healthineers) with a 16-channel body coil. T1-weighted images (T1WIs), third-phase contrast-enhanced T1-weighted (T1 + C) images, and dynamic contrast-enhanced subtraction images were acquired. The subtraction images were generated on a post-processing workstation by subtracting the T1WIs from the third-phase T1 + C images. The Supplementary Materials and Table S1 provide details of the imaging protocol and parameters. Before feature extraction, the T1WI served as the template for rigid registration of all sequences. Preprocessing in Matlab with SPM (https://www.fil.ion.ucl.ac.uk/spm/) registered the T1WIs, T1 + C images, and subtraction images so that the three sequences had the same resolution, spacing, and origin. This reduced the potential influence of scanning protocol parameters.
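Once the volumes share a single voxel grid, subtraction is a voxel-wise difference. The sketch below assumes that registration and resampling have already been done (for example, in SPM). It uses a hypothetical `geom` dictionary to hold spacing and origin, and it only checks that the geometry matches before subtracting. It is an illustration, not the workstation's actual procedure.

```python
import numpy as np


def subtraction_image(pre, post, pre_geom, post_geom):
    """Voxel-wise subtraction of a co-registered pre-contrast T1WI from a T1 + C volume.

    pre, post: 3-D arrays on the same voxel grid.
    pre_geom, post_geom: hypothetical dicts with 'spacing' and 'origin' tuples,
    which in practice would be read from the image headers.
    """
    if pre.shape != post.shape:
        raise ValueError("volumes must share the same matrix size")
    for key in ("spacing", "origin"):
        if not np.allclose(pre_geom[key], post_geom[key]):
            raise ValueError(f"volumes must share the same {key}")
    # Cast to float so negative differences are not wrapped by unsigned integer types.
    return post.astype(np.float32) - pre.astype(np.float32)


geom = {"spacing": (0.9, 0.9, 1.5), "origin": (0.0, 0.0, 0.0)}  # hypothetical values
pre = np.full((4, 4, 4), 100, dtype=np.uint16)
post = pre.copy()
post[1:3, 1:3, 1:3] = 300                  # synthetic "enhancing" region
sub = subtraction_image(pre, post, geom, geom)
print(sub.max(), sub.min())                # 200.0 0.0
```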

The standardized T1WIs were then imported into the open-source ITK-SNAP software (www.itksnap.org, version 3.8.0), and the entire tumor volume of interest (VOI) was manually segmented slice by slice. As described in previous studies (Braman et al. 2017), the peritumoral VOI was manually segmented around the tumor with a radius of 2.5 to 5 mm. Finally, the remaining normal breast tissue was segmented as the background parenchymal enhancement (BPE) region. Two radiologists (with 8 and 10 years of experience in BCa diagnosis, respectively) independently delineated the VOIs, and interobserver reproducibility was assessed. Both radiologists were blinded to clinical information and histopathological diagnoses. Each independently segmented the pre-NACT images of 20 randomly selected patients. The features extracted from the two radiologists' VOIs for these 20 patients were then compared using the intraclass correlation coefficient (ICC) (Curigliano et al. 2017). An ICC greater than 0.8 was considered to indicate almost perfect agreement.
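Two parts of this step can be sketched in Python under stated assumptions. `peritumoral_ring` is a hypothetical automated approximation of the manually drawn peritumoral VOI. It keeps the non-tumor voxels that lie within a given distance (in mm) of the tumor, using SciPy's Euclidean distance transform. `icc_2_1` computes the two-way random-effects, absolute-agreement, single-rater ICC. The text does not state which ICC form was used, so this choice is also an assumption.

```python
import numpy as np
from scipy import ndimage


def peritumoral_ring(tumor_mask, spacing_mm, radius_mm):
    """Hypothetical helper: non-tumor voxels within radius_mm of the tumor."""
    tumor = tumor_mask.astype(bool)
    # For each non-tumor voxel, the distance (mm) to the nearest tumor voxel.
    dist = ndimage.distance_transform_edt(~tumor, sampling=spacing_mm)
    return (~tumor) & (dist <= radius_mm)


def icc_2_1(ratings):
    """ICC(2,1) for ratings of shape (n_subjects, k_raters)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy example: a single-voxel "tumor" on a 1 mm isotropic grid.
mask = np.zeros((11, 11, 11), dtype=bool)
mask[5, 5, 5] = True
print(peritumoral_ring(mask, (1.0, 1.0, 1.0), 1.0).sum())   # 6 face neighbours

# One feature measured on 20 patients by two readers (synthetic values).
reader1 = np.arange(20.0)
reader2 = reader1 + np.tile([0.1, -0.1], 10)
print(icc_2_1(np.column_stack([reader1, reader2])))          # close to 1
```

In the study's workflow, a feature would be kept as reproducible when its ICC exceeds 0.8.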

Extraction and dimensionality reduction of radiomics features

Radiomics features were extracted from the tumor, peritumoral, and BPE VOIs using PyRadiomics version 2.1.2 (Griethuysen et al. 2017). Because the sequences were co-registered, the same VOIs could be applied to the T1WI, T1 + C, and subtraction images. Features were extracted in six categories: first-order, shape, gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), gray-level size zone matrix (GLSZM), and gray-level dependence matrix (GLDM). Together these categories comprised 1132 features per region in each sequence. The three regions (tumor, peritumor, and BPE) therefore yielded 3396 features per sequence, and each patient had 10,188 radiomics features across the three sequences. The Supplementary Materials provide further details of the feature extraction algorithms.
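To make the texture matrices concrete, the sketch below builds a normalized gray-level co-occurrence matrix for a 2-D image and a single pixel offset, then computes the GLCM contrast feature. It is a simplified illustration that assumes integer gray levels that have already been discretized. PyRadiomics computes its matrices in 3-D over several directions, so its values will differ.

```python
import numpy as np


def glcm(image, levels, offset=(0, 1), symmetric=True):
    """Normalized GLCM of a 2-D integer image for a single (row, col) offset."""
    img = np.asarray(image)
    dr, dc = offset
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c], img[r2, c2]] += 1   # count the (reference, neighbour) pair
    if symmetric:
        P = P + P.T                              # count each pair in both directions
    return P / P.sum()


def glcm_contrast(P):
    """Contrast: sum of (i - j)^2 * p(i, j); larger for stronger local variation."""
    i, j = np.indices(P.shape)
    return float(((i - j) ** 2 * P).sum())


image = np.array([[0, 0, 1],
                  [0, 1, 1]])
P = glcm(image, levels=2)
print(P)                   # every entry is 0.25 for this toy image
print(glcm_contrast(P))    # 0.5
```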

Feature dimensionality was reduced using R version 4.3.1. First, the highly reproducible features were screened with the univariate Wilcoxon rank-sum test, and those significantly associated with pCR status were retained. Next, correlation analysis was performed on the features extracted from the intratumoral, peritumoral, and BPE regions, and highly redundant features (correlation coefficient > 0.6) were removed. Finally, to prevent overfitting, elastic net logistic regression was used to select the most informative features for modeling, with the following cost function:
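The first two filtering steps can be sketched as follows, assuming the input matrix already contains only the reproducible features. Several details are assumptions, because the text specifies only the rank-sum test and the 0.6 cut-off:

- the significance threshold (0.05);
- the use of Pearson correlation;
- the rule of keeping the more significant feature of a correlated pair.

The authors worked in R; Python is used here for illustration, with synthetic data.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def univariate_filter(X, y, p_threshold=0.05):
    """Features differing between pCR (y == 1) and non-pCR (y == 0),
    returned in order of increasing p value."""
    pvals = np.array([
        mannwhitneyu(X[y == 1, j], X[y == 0, j], alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ])
    order = np.argsort(pvals)
    return [int(j) for j in order if pvals[j] < p_threshold]


def correlation_filter(X, candidates, threshold=0.6):
    """Greedy redundancy removal: keep a feature only if |Pearson r| <= threshold
    with every feature already kept (more significant features are visited first)."""
    kept = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= threshold for k in kept):
            kept.append(j)
    return kept


rng = np.random.default_rng(0)
n = 120
y = rng.integers(0, 2, size=n)
x0 = 2.0 * y + rng.normal(size=n)
x1 = x0 + 0.01 * rng.normal(size=n)   # near-duplicate of x0
x2 = rng.normal(size=n)               # pure noise
x3 = -1.0 * y + rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])
selected = correlation_filter(X, univariate_filter(X, y))
print(selected)
```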

$$Cost\left( W \right) = \sum\limits_{i = 1}^{N} \left( y_{i} - W^{T} x_{i} \right)^{2} + \lambda \alpha \left\| W \right\|_{1} + \frac{\lambda \left( 1 - \alpha \right)}{2} \left\| W \right\|_{2}^{2}$$

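where y_i is the outcome (pCR status) to be predicted for the i-th of N patients, x_i is the corresponding row of the input feature matrix X, W is the vector of weights to be estimated, λ is the regularization parameter that sets the overall penalty strength, and α (0 ≤ α ≤ 1) balances the two penalty terms (L1 and L2).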
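Elastic net logistic regression of this kind is available in scikit-learn. In scikit-learn's objective, `C` multiplies the summed log-loss, and the penalty is `l1_ratio`·‖w‖₁ + (1 - `l1_ratio`)/2·‖w‖₂². Dividing through by `C` gives the penalty structure above, with α = `l1_ratio` and λ = 1/`C`, and with log-loss in place of the squared error. The sketch fixes α and λ for illustration. In practice, λ would be tuned by cross-validation. The data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def elastic_net_select(X, y, alpha=0.5, lam=1.0):
    """Elastic net logistic regression; returns indices of non-zero coefficients.

    alpha maps to scikit-learn's l1_ratio and lam to 1 / C (mapping assumed as above).
    """
    Xs = StandardScaler().fit_transform(X)   # penalties assume comparable feature scales
    model = LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=alpha, C=1.0 / lam, max_iter=10000)
    model.fit(Xs, y)
    coef = model.coef_.ravel()
    return np.flatnonzero(coef), coef


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
logit = 2.0 * X[:, 0] - 2.0 * X[:, 1]         # only features 0 and 1 are informative
y = (logit + rng.logistic(size=200) > 0).astype(int)
selected, coef = elastic_net_select(X, y, alpha=0.5, lam=5.0)
print(selected)
```

The L1 part drives weak coefficients to exactly zero (the `saga` solver supports this), while the L2 part spreads weight across correlated features instead of arbitrarily keeping one.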

Establishment of an optimal radiomics signature

Three independent radiomics signatures were constructed by multivariable logistic regression with tenfold cross-validation, based on the optimal features of the intratumoral, peritumoral, and BPE regions, respectively, in the training group. The optimal features of these regions were then combined in pairs (Intra-Peri, Intra-BPE, and Peri-BPE) using multivariable logistic regression to construct joint radiomics signatures. The same method was also used to fuse the optimal features of all three regions into a combined signature (Intra-Peri-BPE). The score calculated for each patient from a signature reflects the probability of pCR and was named the "rad-score." The predictive performance of these signatures in the training and validation groups was evaluated using receiver operating characteristic (ROC) curves, and the best-performing signature was selected to construct the prediction model.
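The signature-building loop can be sketched as follows. Each single region, each pair, and the three-region combination (seven signatures in total) is fitted with logistic regression. The out-of-fold predicted probability from tenfold cross-validation serves as the rad-score. Pooling out-of-fold probabilities before computing the AUC is one reasonable choice, not necessarily the authors'. The region arrays and labels are synthetic placeholders.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def evaluate_signatures(features_by_region, y, seed=0):
    """Cross-validated AUC of a logistic rad-score for every region combination."""
    regions = list(features_by_region)               # e.g. ["Intra", "Peri", "BPE"]
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    results = {}
    for size in range(1, len(regions) + 1):
        for combo in combinations(regions, size):
            X = np.hstack([features_by_region[name] for name in combo])
            model = LogisticRegression(max_iter=1000)
            # Out-of-fold predicted probability of pCR is used as the rad-score.
            rad_score = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
            results["-".join(combo)] = roc_auc_score(y, rad_score)
    return results


rng = np.random.default_rng(0)
y = np.array([1] * 34 + [0] * 59)                     # hypothetical training labels (n = 93)
features = {
    "Intra": rng.normal(size=(93, 3)) + y[:, None],   # synthetic, informative
    "Peri": rng.normal(size=(93, 3)),
    "BPE": rng.normal(size=(93, 3)),
}
res = evaluate_signatures(features, y)
for name, auc in res.items():
    print(f"{name}: AUC = {auc:.3f}")
```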

Construction and analysis of machine learning models

Based on the radiomics features of the optimal signature, machine learning models were developed using the logistic regression (LR), support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), Bayesian, and extreme gradient boosting (XGBoost) algorithms (Fig. 2). Each model type was developed in the training set using nested cross-validation with two loops. The outer loop repeated a stratified random split of the training cohort 50 times to evaluate classification performance, and the inner loop used fivefold cross-validation to optimize the algorithm's hyperparameters. One model was created for each stratified random split, yielding 50 models, and the model with the highest accuracy in the test group was retained. The diagnostic performance of the different machine learning models was then assessed in the test group using ROC curves, and their AUCs were compared using the DeLong test. The machine learning model with the highest AUC was selected. SHapley Additive exPlanations (SHAP) were also used to analyze the relationship between features and outputs in the machine learning models (Rodríguez-Pérez and Bajorath 2020). The Supplementary Materials provide additional details of the machine learning and SHAP procedures.
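The nested cross-validation and the AUC comparison can be sketched with scikit-learn, NumPy, and SciPy.

- `nested_cv` uses an SVM pipeline with a hypothetical hyperparameter grid. It assumes a 30% outer test fraction, which the text does not give. The same wrapper would apply to the other algorithms.
- `delong_test` is a compact implementation of DeLong's method for two correlated AUCs measured on the same patients.

The data are synthetic, and the demo uses 5 outer repeats instead of 50 to keep it fast.

```python
import numpy as np
from scipy.stats import norm
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_cv(X, y, n_repeats=50, test_size=0.3, seed=0):
    """Outer loop: repeated stratified splits; inner loop: 5-fold grid search (SVM shown)."""
    outer = StratifiedShuffleSplit(n_splits=n_repeats, test_size=test_size, random_state=seed)
    grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}   # hypothetical grid
    models, accuracies = [], []
    for train_idx, test_idx in outer.split(X, y):
        search = GridSearchCV(make_pipeline(StandardScaler(), SVC()), grid,
                              cv=5, scoring="accuracy")
        search.fit(X[train_idx], y[train_idx])
        models.append(search.best_estimator_)
        accuracies.append(search.score(X[test_idx], y[test_idx]))
    best = int(np.argmax(accuracies))   # model with the highest held-out accuracy
    return models[best], accuracies


def delong_test(y, scores_a, scores_b):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    y = np.asarray(y).astype(bool)
    m, n = int(y.sum()), int((~y).sum())
    v10, v01, aucs = [], [], []
    for s in (np.asarray(scores_a, float), np.asarray(scores_b, float)):
        diff = s[y][:, None] - s[~y][None, :]      # positives x negatives
        psi = (diff > 0) + 0.5 * (diff == 0)        # 1 if ranked correctly, 0.5 if tied
        v10.append(psi.mean(axis=1))                # structural components (positives)
        v01.append(psi.mean(axis=0))                # structural components (negatives)
        aucs.append(psi.mean())
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var <= 0:                                    # identical rankings: no difference to test
        return aucs[0], aucs[1], 1.0
    z = (aucs[0] - aucs[1]) / np.sqrt(var)
    return aucs[0], aucs[1], float(2 * norm.sf(abs(z)))


rng = np.random.default_rng(0)
y = np.array([1] * 34 + [0] * 59)
X = rng.normal(size=(93, 4))
X[:, 0] += 1.5 * y                                  # one synthetic informative feature
best_model, accuracies = nested_cv(X, y, n_repeats=5)
auc_a, auc_b, p = delong_test(y, X[:, 0], X[:, 1])
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, DeLong P = {p:.4f}")
```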

Fig. 2

Procedures used for acquisition of MR images, feature extraction, feature selection, model establishment, and model validation

Statistical analysis

Statistical analyses were performed using SPSS version 24.0, MedCalc version 11.2, R version 4.3.1, and Python version 3.7.3. The normality of continuous variables was tested with the Kolmogorov–Smirnov test. Normally distributed variables were expressed as means ± standard deviations (SDs) and compared using the independent-samples t test. Non-normally distributed variables were expressed as medians and interquartile ranges and compared using the Mann–Whitney U test. Categorical variables were expressed as numbers and percentages and compared using the Chi-square test. The predictive performance of the different models was evaluated using ROC curves, the area under the curve (AUC), sensitivity, and specificity. All tests were two-sided, and a P value below 0.05 was considered statistically significant.
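The decision rule for continuous variables can be sketched with SciPy. One caveat: running the Kolmogorov–Smirnov test on values standardized with the sample mean and SD makes it conservative (the Lilliefors correction addresses this), and the implementations in SPSS or MedCalc may differ. The data and the contingency table below are hypothetical.

```python
import numpy as np
from scipy import stats


def summarize(x, alpha=0.05):
    """Return (looks_normal, description): mean ± SD if normal, else median (Q1, Q3)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    normal = stats.kstest(z, "norm").pvalue >= alpha
    if normal:
        return True, f"{x.mean():.2f} ± {x.std(ddof=1):.2f}"
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return False, f"{med:.2f} ({q1:.2f}, {q3:.2f})"


def compare_continuous(a, b, alpha=0.05):
    """t test if both groups look normal, otherwise the Mann-Whitney U test."""
    normal_a, _ = summarize(a, alpha)
    normal_b, _ = summarize(b, alpha)
    if normal_a and normal_b:
        return "t test", float(stats.ttest_ind(a, b).pvalue)
    return "Mann-Whitney U", float(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)


def compare_categorical(table):
    """Chi-square test on a contingency table (rows: groups, columns: categories)."""
    chi2, p, dof, expected = stats.chi2_contingency(np.asarray(table))
    return float(p)


rng = np.random.default_rng(0)
a = rng.lognormal(mean=0.0, sigma=1.5, size=300)     # strongly skewed synthetic data
b = rng.lognormal(mean=0.5, sigma=1.5, size=300)
method, p_cont = compare_continuous(a, b)
p_cat = compare_categorical([[20, 29], [45, 39]])    # hypothetical counts
print(method, p_cont, summarize(a)[1], p_cat)
```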
