Tomography, Vol. 8, Pages 2784-2795: Deep Learning Prediction of Pathologic Complete Response in Breast Cancer Using MRI and Other Clinical Data: A Systematic Review

A typical workflow of the DL algorithms is shown in Figure 2A. The workflow starts with data inputs, followed by data curation, and then training, validation, and evaluation of the DL methods. Evaluation metrics include, but are not limited to, accuracy, sensitivity, and specificity, as well as clinical outcomes. A typical model of how imaging and non-imaging data are incorporated into the CNN workflow is shown in Figure 2B. DCE MRI or a single post-contrast MRI is often used. Sometimes, multiple treatment time points are included. Images are first fed into the CNN. Multiple treatment time point data, if available, are entered as parallel channels of the CNN. In a separate channel, non-imaging data (such as molecular subtypes and demographics) are entered. The multiple channel networks are then concatenated into a fully connected layer. Note that this is only one of many models of how different data can be incorporated into a CNN to predict pCR.

3.1. CNN Prediction of pCR

Table 1 summarizes the papers on deep learning prediction of pathological complete response in breast cancer using MRI.

Braman et al. utilized a multiphasic CNN to predict pCR from 2D DCE MR images acquired pre-NAC. The study focused on patients with HER2+ breast cancer (N = 157) receiving HER2-targeted NAC [16]. They found that prediction models using a combination of pre-contrast and third post-contrast MR images showed the highest predictive performance, with an AUC of 0.93 and an accuracy of 95%. This study demonstrated the feasibility of DL-based prediction of response, including on data from multiple sites not included in training. Subtype-specific analysis of the therapeutic outcome of specific targeted therapies has the potential to precisely guide treatment.

Comes et al. [17] reported a transfer learning approach to predict pCR by exploiting, separately or in combination, pre-treatment and early-treatment exams. First, low-level features were automatically extracted by a pre-trained CNN, avoiding manual feature extraction. Next, an optimal set of the most stable features was identified and used to train an SVM classifier. By combining the optimal features extracted from both pre-treatment and early-treatment exams with some clinical features, an accuracy of 92.3% and an AUC of 0.90 were achieved on the independent test. They concluded that low-level CNN features play an important role in the early evaluation of NAC efficacy by predicting pCR.

Duanmu et al. (2020) studied 3D T1-weighted post-contrast whole images and included molecular and demographic data in their analysis [18]. Their CNN model differs from conventional CNNs in that MRI data and non-imaging data are convolved to inform each other through interactions, instead of being concatenated as multiple data-type channels. This is achieved by channel-wise multiplication of the intermediate results of the imaging and non-imaging data. Using a subset of curated data from the I-SPY-1 TRIAL of 112 patients with stage 2 or 3 breast cancer who underwent NAC, they found an accuracy of 0.83, an AUC of 0.80, a sensitivity of 0.68, and a specificity of 0.88. This model significantly outperformed models using imaging data only as well as traditional concatenation models. Heatmaps showing the regions the algorithm weighted most heavily were provided.
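To make the two fusion strategies described above concrete, the following is a minimal, hypothetical Keras sketch (not any author's published code): branch (a) concatenates imaging and non-imaging feature vectors before a fully connected layer, as in Figure 2B, while branch (b) multiplies them channel-wise, in the spirit of the interaction mechanism described by Duanmu et al. (2020). All layer sizes and input shapes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): two ways of fusing imaging and
# non-imaging inputs in a pCR-prediction CNN. Layer sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, Model

def image_branch(shape=(128, 128, 1)):
    """Small convolutional encoder for one MRI input (e.g., one DCE phase)."""
    inp = layers.Input(shape=shape)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.GlobalAveragePooling2D()(x)          # 32-dim image feature vector
    return inp, x

def clinical_branch(n_features=8):
    """Dense encoder for non-imaging data (molecular subtype, demographics)."""
    inp = layers.Input(shape=(n_features,))
    x = layers.Dense(32, activation="relu")(inp)    # match image feature width
    return inp, x

# (a) Concatenation model (as in Figure 2B): imaging and non-imaging channels
#     are concatenated and passed to fully connected layers.
img_in, img_feat = image_branch()
cli_in, cli_feat = clinical_branch()
fused = layers.Concatenate()([img_feat, cli_feat])
out = layers.Dense(1, activation="sigmoid")(layers.Dense(16, activation="relu")(fused))
concat_model = Model([img_in, cli_in], out)

# (b) Channel-wise multiplication (in the spirit of Duanmu et al. 2020):
#     the clinical features modulate the imaging features element-wise
#     instead of being appended to them.
img_in2, img_feat2 = image_branch()
cli_in2, cli_feat2 = clinical_branch()
fused2 = layers.Multiply()([img_feat2, cli_feat2])  # element-wise interaction
out2 = layers.Dense(1, activation="sigmoid")(layers.Dense(16, activation="relu")(fused2))
interaction_model = Model([img_in2, cli_in2], out2)

concat_model.summary()
interaction_model.summary()
```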
Duanmu et al. (2022) used a CNN to evaluate 3D DCE whole images at multiple treatment time points and incorporated molecular subtype and demographic data [19]. They predicted pCR as well as residual cancer burden (RCB) and progression-free survival (PFS) in breast cancer patients treated with NAC, using longitudinal (multiple treatment time point), multiparametric MRI, demographics, and molecular subtypes as inputs. The data came from the I-SPY-1 TRIAL (155 patients with stage 2 or 3 breast cancer who underwent NAC). The inputs were DCE MRI and T2-weighted MRI as whole 3D images without tumor segmentation, as well as molecular subtypes and demographics. Three CNN approaches ("Integrated", "Stack", and "Concatenation") were evaluated using receiver-operating characteristics and mean absolute errors. The Integrated approach outperformed the "Stack" and "Concatenation" CNNs. Inclusion of both MRI and non-MRI data outperformed either alone. The combined pre- and post-neoadjuvant chemotherapy data outperformed either alone. Using the best model and data combination, pCR prediction yielded an accuracy of 0.81 ± 0.03 and an AUC of 0.83 ± 0.03; RCB prediction yielded an accuracy of 0.80 ± 0.02 and a Cohen's kappa of 0.73 ± 0.03; PFS prediction yielded a mean absolute error of 24.6 ± 0.7 months (survival ranged from 6.6 to 127.5 months). Deep learning using longitudinal multiparametric MRI, demographics, and molecular subtypes thus accurately predicts pCR, RCB, and PFS in breast cancer patients.

El Adoui et al. applied a 3D CNN to predict pCR from DCE-MRI (N = 42) using two treatment time points [20]. Using a two-branch CNN model taking inputs from MRI pre- and post-chemotherapy, they found an accuracy and ROC AUC of 92.72% and 0.96, respectively, in one study and, similarly, 91.03% and 0.92 in another study. They reported that data augmentation greatly improved prediction performance.

Ha et al. similarly looked at pre-treatment imaging to predict response to NAC [21]. They applied a CNN implemented using the Keras toolbox with a TensorFlow backend in Python. The CNN architecture followed the general structure of the VGG-16 network. The first post-contrast dynamic T1W images acquired prior to NAC were used for their analysis, producing an 88% overall mean accuracy for three-class prediction (complete response versus partial response versus progression/no response) of NAC treatment response.

Huynh et al. utilized a CNN and a Linear Discriminant Analysis (LDA) classifier to predict response to NAC in breast cancer patients (N = 64) using 2D DCE MR images [22]. Features were first extracted using the CNN and then used to train the LDA classifier. They found the best ROC AUC was 0.85 for the pre-contrast DCE. A limitation of this study is that slices containing the tumor were manually selected.

Joo et al. explored the use of deep learning with a 3D CNN applied to pre-treatment MRI (T1W subtraction and T2W images) versus clinical data, as well as a multimodal fusion approach using both clinical and pre-treatment MRI data, to predict post-NAC pCR [23]. They also compared cropped MR images to whole, uncropped 3D bilateral images covering the axilla and chest wall. They had the largest cohort for a pCR prediction model with breast MRI, with 536 patients with invasive breast cancer. T1W and T2W images each showed a poorer AUC alone than when combined (AUCs of 0.725 and 0.663 versus 0.745), clinical data performed better than the combined MRI data (AUC of 0.827 versus 0.745), and whole T1W subtraction images performed better than cropped T1W subtraction images (AUC of 0.745 versus 0.624). Using whole images is less labor intensive, allows multiple findings to be analyzed, and eliminates the need for manual or automated segmentation for tumor area extraction. They conjecture that adding multiple post-contrast T1W time points, which would allow the use of kinetic information, and using DWI could further improve the performance of the model.
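Several of the studies above (e.g., Comes et al. [17] and Huynh et al. [22]) use a pre-trained CNN purely as a feature extractor and train a conventional classifier (SVM or LDA) on the extracted features. The sketch below illustrates that general pattern under stated assumptions: the VGG16 backbone, the pooling choice, the LDA classifier, and the random placeholder arrays standing in for MRI slices and pCR labels are all illustrative, not the pipeline of any specific study.

```python
# Minimal sketch (illustrative, not any study's published pipeline):
# extract features from MRI slices with a pre-trained CNN, then train a
# conventional classifier (LDA here; an SVM would be used the same way).
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Placeholder data: 64 RGB-formatted 2D slices and binary pCR labels.
# In practice these would be DCE-MRI slices resized to the backbone's input size.
X = np.random.rand(64, 224, 224, 3).astype("float32") * 255.0
y = np.random.randint(0, 2, size=64)

# Pre-trained backbone used as a fixed feature extractor (no fine-tuning).
backbone = VGG16(weights="imagenet", include_top=False, pooling="avg",
                 input_shape=(224, 224, 3))
features = backbone.predict(preprocess_input(X), verbose=0)   # shape (64, 512)

# Train and evaluate a classical classifier on the CNN features.
X_tr, X_te, y_tr, y_te = train_test_split(features, y, test_size=0.25,
                                          random_state=0, stratify=y)
clf = LinearDiscriminantAnalysis()
clf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"Hold-out ROC AUC on toy data: {auc:.2f}")
```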
Liu et al. used a 12-layer CNN to analyze patients from the I-SPY trial dataset to predict pCR in NAC patients (N = 131) using 2D first post-contrast DCE MR images [24]. They demonstrated the feasibility of using a CNN algorithm on a multi-institutional MRI dataset, reporting an accuracy, sensitivity, specificity, and ROC AUC of 72.5%, 65.5%, 78.9%, and 0.72, respectively.

Massafra et al. reported the use of DL on different MRI protocols (i.e., axial for the private database or sagittal for the public database) to predict pCR [25]. By merging the features extracted from baseline MRIs with some pre-treatment clinical variables, accuracies of 84.4% and 77.3% and AUC values of 80.3% and 78.0% were achieved on the independent tests related to the public database and the private database, respectively. AUC values with combined clinical and imaging data exceeded those for either clinical or imaging data alone for both the public and private databases.

Peng et al. compared the performance of DL to radiomics analysis in predicting pCR based on pre-treatment DCE-MRI in breast cancer [26]. The AUC of the image-molecular radiomics analysis model was 0.755 (95% CI: 0.708, 0.802). The AUC of the image-kinetic-molecular DL model was 0.83 (95% CI: 0.816, 0.847). They concluded that the pre-treatment DCE-MRI-based DL model is superior to the radiomics analysis model in predicting pCR. Heatmaps showing the regions the algorithm weighted most heavily were provided.

Qu et al. applied a 2D CNN to predict pCR from multiple DCE MR images plus molecular subtypes, including ER, PR, and HER2 status (N = 302) [27]. This model used 12 channels to combine 6 DCE phases from both pre- and post-NAC. They reported an AUC of 0.553 from pre-NAC images, 0.968 from post-NAC images, and 0.970 for the combined model. A limitation is that the pre-NAC and post-NAC MR images were combined by concatenation, and thus the temporal information might not have been optimally utilized.

Ravichandran et al. used a voxel-wise RGB CNN to predict pCR in pre-NAC patients (N = 166) using selected 2D slices (the 3 adjacent slices with the largest tumor area) from DCE MR images [28]. The corresponding pre-contrast, first post-contrast, and second post-contrast images of each slice were placed into the red, green, and blue color channels, creating a three-channel color image that was then evaluated on a pixel-wise basis. Inclusion of HER2 status was beneficial, improving the AUC from 0.77 to 0.85. Because of the nature of this approach, heatmaps could be generated, and the centers of tumors were found to be the most predictive of pCR. A drawback of this study is that the tumor segmentations were provided as part of the dataset and the slices were hand selected. The images from pCR patients were also oversampled due to data imbalance. Heatmaps showing the regions the algorithm weighted most heavily were provided.
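Ravichandran et al.'s input construction, in which three DCE phases of the same slice are packed into the red, green, and blue channels of a single color image, can be sketched in a few lines of NumPy. The normalization scheme and array names below are illustrative assumptions, not the authors' exact preprocessing.

```python
# Minimal sketch (illustrative): stack three DCE-MRI phases of the same slice
# into one 3-channel "RGB" image, as described for the voxel-wise CNN approach.
import numpy as np

def to_rgb_input(pre, post1, post2):
    """pre, post1, post2: 2D arrays of identical shape for one slice."""
    def normalize(img):
        img = img.astype("float32")
        rng = img.max() - img.min()
        return (img - img.min()) / rng if rng > 0 else np.zeros_like(img)
    # Channel order: R = pre-contrast, G = first post-contrast, B = second post-contrast.
    return np.stack([normalize(pre), normalize(post1), normalize(post2)], axis=-1)

# Toy example with random 256 x 256 slices standing in for real DCE phases.
slice_shape = (256, 256)
rgb = to_rgb_input(np.random.rand(*slice_shape),
                   np.random.rand(*slice_shape),
                   np.random.rand(*slice_shape))
print(rgb.shape)  # (256, 256, 3) -> ready for a standard 2D CNN
```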

3.7. How Could DL Be Employed in Practice?

Instead of relying on DL alone, there is also potential for hybrid intelligence, which combines the expertise of radiologists with deep learning AI. Computer-assisted diagnosis (CAD) systems with deep learning AI can assist radiologists in finding breast cancers and can also potentially increase radiologists' efficiency. In a more active role, DL can potentially serve as a secondary or concurrent reader. The potential to increase accuracy while decreasing interpretation time and increasing radiologists' productivity is a win-win, as this could help patients and reduce radiologist burnout. Automated triaging to prioritize scans with findings that require more immediate attention is another potential application of AI. Decision making, such as treatment selection and patient management, can be guided by DL systems. For example, automated preprocessing, segmentation, detection, and classification of lesions may reduce unnecessary biopsies and surgeries owing to the ability of DL to predict the behavior of precancerous lesions. DL systems can also be incorporated into decision support systems. The ability to predict patients' treatment responses early during NAC, and potentially alter treatment strategies to optimize outcome, allows for individualized treatment and precision medicine. This ability to individualize treatment early to improve patient survival would be particularly beneficial in underserved areas with lower socioeconomic populations, who currently suffer from worse breast cancer outcomes; ML may thus help to address some current healthcare disparities. Pooling large quantities of data and combining radiological, histological, and pathological information can provide insights into certain biomarkers for the prediction of individual outcomes.
