Automatic segmentation-based multi-modal radiomics analysis of US and MRI for predicting disease-free survival of breast cancer: a multicenter study

Study design and patients

The patient recruitment and design of this study are presented in Fig. 1. First, data from 620 patients who underwent preoperative US at Sun Yat-sen University Cancer Center (SYUCC) (these patients were previously reported [14]) were collected and divided into training and testing sets to develop and validate the US ASF (uASF); 26 of these patients were included in the subsequent prognostic analysis. Likewise, data from 249 patients who underwent preoperative MRI at SYUCC were collected and divided into training and testing sets to develop and validate the MRI ASF (mASF); 98 of these patients were included in the subsequent prognostic analysis. Then, data from 643 patients who underwent preoperative US and whose tumor presented as a single mass on dynamic contrast-enhanced (DCE) MRI were collected for prognostic analysis. Data from 480 of these patients (mean age, 47.77 years; range, 23–80 years) from SYUCC (between March 2007 and December 2018) were randomly divided into training (n = 311) and internal testing (n = 169) sets; data from the remaining patients (n = 163; mean age, 49.90 years; range, 30–77 years), from the First Affiliated Hospital of Gannan Medical University and Ganzhou People’s Hospital, formed the external testing set.

Fig. 1 The patient recruitment and design of this study

Clinical characteristics and follow-up

The following characteristics were collected: age; menopausal status; breast cancer risk factors (family history of breast cancer and/or history of breast surgery for benign disease); surgery type and postoperative treatment; pathological tumor size; histological type and grade; T, N, and TNM stages; status of lymphovascular invasion and associated ductal carcinoma in situ; and status of hormone receptor (HR) (including estrogen receptor [ER] and progesterone receptor [PR]), human epidermal growth factor receptor 2 (HER2), and Ki-67. In immunohistochemistry, HR was considered positive when nuclear staining was present in at least 1% of tumor cells. HER2 was considered negative when scored as 0 or 1+ and positive when scored as 3+, while a score of 2+ required further confirmation with fluorescence in situ hybridization. Ki-67 expression was considered high when proliferation was ≥ 14% and low when it was < 14%. Tumors were categorized into luminal A, luminal B, HER2-enriched, and TNBC subtypes [32].
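The immunohistochemistry thresholds above can be written as simple rules. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and treating HR as positive when either ER or PR reaches the 1% threshold is an assumption, since the text does not say how the two receptors are combined.

```python
from typing import Optional


def hr_status(er_percent: float, pr_percent: float) -> str:
    """HR status from the percentage of tumor cells with nuclear staining.

    Assumption: HR is positive if either ER or PR reaches the 1% threshold.
    """
    return "positive" if max(er_percent, pr_percent) >= 1.0 else "negative"


def her2_status(ihc_score: str, fish_amplified: Optional[bool] = None) -> str:
    """HER2 status from the IHC score, with FISH confirmation for 2+."""
    if ihc_score in ("0", "1+"):
        return "negative"
    if ihc_score == "3+":
        return "positive"
    if ihc_score == "2+":
        if fish_amplified is None:
            return "needs FISH"  # a 2+ score cannot be resolved without FISH
        return "positive" if fish_amplified else "negative"
    raise ValueError(f"unexpected IHC score: {ihc_score!r}")


def ki67_status(percent: float) -> str:
    """Ki-67 expression: high at >= 14%, low below 14%."""
    return "high" if percent >= 14.0 else "low"
```

For example, `her2_status("2+")` returns `"needs FISH"` until a FISH result is supplied.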

The study end point, DFS, was defined as the time from the date of surgery to the date of locoregional recurrence, distant metastasis, contralateral breast cancer, or death, whichever came first; patients without events at the last follow-up were censored.
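As an illustration of this definition, the sketch below derives a DFS time and an event indicator for one patient. The function name, the argument layout, and the conversion to months (30.4375 days per month) are assumptions made for the example; the text does not state the time unit used.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

DAYS_PER_MONTH = 30.4375  # assumed conversion factor, not taken from the study


def dfs_time_and_event(
    surgery: date,
    event_dates: Sequence[Optional[date]],
    last_follow_up: date,
) -> Tuple[float, int]:
    """Return (DFS in months, event indicator).

    event_dates holds the dates of locoregional recurrence, distant metastasis,
    contralateral breast cancer, and death (None if the event did not occur).
    The earliest observed event ends DFS; with no event the patient is
    censored at the last follow-up (indicator 0).
    """
    observed = [d for d in event_dates if d is not None]
    if observed:
        end, event = min(observed), 1
    else:
        end, event = last_follow_up, 0
    return (end - surgery).days / DAYS_PER_MONTH, event
```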

Image acquisition and segmentation

First, the single US greyscale image (JPEG format) containing the largest section of the tumor and the peak phase of DCE-MRI (DICOM format), identified from the time-signal intensity curve (TIC), were selected. Second, the US tumor regions of interest (ROIs) based on manual segmentation (denoted as US-MSeg-intra-ROIs) were used to train a U-Net for developing the uASF. The MRI breast masks and the MRI tumor ROIs based on manual segmentation (denoted as MRI-MSeg-intra-ROIs) were used to train a three-dimensional (3D) ResUNet and a WNet (a network we proposed in this study based on the structure of U-Net) for breast segmentation and breast cancer segmentation, respectively (supplementary Figure S1), for developing the mASF. The Dice similarity coefficient (DSC) on the testing set was calculated to assess segmentation performance. Details regarding image acquisition, including the DCE-MRI protocol (supplementary Table S1), preprocessing, and segmentation, are provided in the supplement.
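For reference, the DSC between a predicted mask and a manual mask is twice the overlap divided by the sum of the two mask sizes. The sketch below computes it for 2D binary masks stored as nested lists; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for the example.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Dice similarity coefficient of two same-shaped binary masks."""
    overlap = pred_size = truth_size = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            overlap += 1 if (p and t) else 0
            pred_size += 1 if p else 0
            truth_size += 1 if t else 0
    if pred_size + truth_size == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (pred_size + truth_size)
```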

Afterwards, images from the 643 patients were input into the ASFs to generate automatic segmentation-based tumor ROIs (denoted as ASeg-intra-ROIs), which were then manually checked and adjusted by reader 1 (LX), with these adjustments further validated by reader 2 (XHJ). Both readers collaborated to reach a consensus, thereby ensuring standardization and consistency across cases. The criteria for manually checking and adjusting the ASeg-intra-ROIs were as follows: (1) if the ASeg-intra-ROI covered only breast cancer regions, the MRI-ASeg-intra-ROI was left unchanged, whereas any missed part of the breast cancer in the US-ASeg-intra-ROI was manually filled in, because filling in the missed part is relatively easy on a single US image; (2) if it covered both breast cancer and non-breast cancer regions, the latter were manually deleted; and (3) if it did not cover any breast cancer, it was replaced by the MSeg-intra-ROI. Then, the adjusted ASeg-intra-ROI was dilated by morphological expansion of 1–7 pixels for US and 1–7 mm for MRI using Python, and the ASeg-intra-ROI was subtracted from each dilated region to generate the automatic segmentation-based peritumoral ROIs (denoted as ASeg-peri-ROIs). Altogether, sixteen ROIs were defined for each patient: one US-ASeg-intra-, seven US-ASeg-peri-, one MRI-ASeg-intra-, and seven MRI-ASeg-peri-ROIs.
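The dilate-then-subtract step can be sketched as follows for a 2D binary mask. This is an illustration only: the text does not specify the structuring element, so a 3 × 3 (8-connected) neighborhood applied once per pixel of width is assumed, and the function names are hypothetical. For MRI, a width given in millimeters would first have to be converted to voxels using the image spacing, which is not shown here.

```python
from typing import List

Mask = List[List[int]]


def dilate_once(mask: Mask) -> Mask:
    """Dilate a binary mask by one pixel using a 3 x 3 neighborhood (assumed)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            out[rr][cc] = 1
    return out


def peritumoral_ring(intra: Mask, width: int) -> Mask:
    """Dilate the intratumoral mask `width` times, then remove the tumor itself."""
    dilated = intra
    for _ in range(width):
        dilated = dilate_once(dilated)
    return [
        [1 if (d and not i) else 0 for d, i in zip(d_row, i_row)]
        for d_row, i_row in zip(dilated, intra)
    ]
```

Calling `peritumoral_ring(mask, w)` for `w` from 1 to 7 would yield the seven peritumoral ROIs of one modality under these assumptions.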

Radiomics signature construction

The PyRadiomics package for Python, which conforms to the Image Biomarker Standardization Initiative (IBSI) guidelines for radiomic analysis [33], was used to extract radiomics features. Spearman correlation coefficients with the Ward linkage method, followed by the least absolute shrinkage and selection operator (LASSO) Cox regression model, were used to select features for constructing the intra- and peri-radiomics signatures. The corresponding Rad-score was calculated for each patient as a linear combination of the selected features weighted by their respective LASSO coefficients. Then, the intra- and peri-radiomics signatures were combined to construct the gross-radiomics signature by using a Cox regression model. Details regarding radiomics signature construction are provided in the supplement.
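The Rad-score calculation described above reduces to a weighted sum over the selected features. A minimal sketch follows; the function name, the feature names, and the coefficient values in the usage note are made up for illustration and are not taken from the study.

```python
from typing import Mapping


def rad_score(features: Mapping[str, float], coefficients: Mapping[str, float]) -> float:
    """Linear combination of the selected features and their LASSO coefficients.

    Only features with an entry in `coefficients` (that is, features kept by
    LASSO) contribute; all other extracted features are ignored.
    """
    return sum(coef * features[name] for name, coef in coefficients.items())
```

With hypothetical values `features = {"f1": 2.0, "f2": -1.0, "f3": 5.0}` and `coefficients = {"f1": 0.5, "f2": 2.0}`, the score is 0.5 × 2.0 + 2.0 × (−1.0) = −1.0.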

Radiomics signature validation

The potential association of each radiomics signature with DFS was assessed and validated in the training, internal testing, and external testing sets. Using the optimal Rad-score cutoff identified by X-tile [34], patients with Rad-score < cutoff and Rad-score ≥ cutoff were assigned to the low- and high-risk groups, respectively. Kaplan–Meier survival analysis was subsequently performed to estimate survival rates and generate the corresponding survival curves for the two groups, and the log-rank test was applied to determine whether the difference between the curves was statistically significant. Furthermore, Harrell’s concordance index (C-index) [35] and time-dependent receiver operating characteristic (ROC) curves were used to evaluate the performance of each radiomics signature in predicting DFS. Finally, the DeLong test was used to compare the areas under the ROC curves (AUCs), thereby assessing the relative discriminatory power of the radiomics signatures.
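The grouping and Kaplan–Meier steps can be illustrated with a small standard-library sketch. It is not the analysis code of the study (which used the lifelines package): the function names are hypothetical, the cutoff is taken as given rather than derived as X-tile would, and the estimator returns the survival probability only at the distinct event times.

```python
from typing import List, Sequence, Tuple


def split_by_cutoff(rad_scores: Sequence[float], cutoff: float) -> Tuple[List[int], List[int]]:
    """Return patient indices of the low-risk (< cutoff) and high-risk (>= cutoff) groups."""
    low = [i for i, s in enumerate(rad_scores) if s < cutoff]
    high = [i for i, s in enumerate(rad_scores) if s >= cutoff]
    return low, high


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as a list of (event time, survival probability)."""
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)  # still under observation at t
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve
```

Applying `kaplan_meier` separately to the times and events of each index group gives the two curves that a log-rank test would then compare.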

MRI features

Using the 2013 Breast Imaging Reporting and Data System MRI lexicon, two radiologists (LX and XHJ, with 8 and more than 10 years of experience in breast MRI interpretation, respectively) independently reviewed the MRI features and reached a joint decision by consensus. The features evaluated included MR tumor size (the largest diameter of the tumor); shape (oval, round, or irregular); margin (circumscribed, irregular, or spiculated); internal enhancement (homogeneous, heterogeneous, or rim enhancement); and TIC type (persistent, plateau, or washout), which was determined by drawing an ROI on the fastest-enhancing area of the tumor. Three months after the first review, 60 patients were randomly selected and reviewed again to evaluate intra-observer agreement, using the Kappa test for categorical features and the intraclass correlation coefficient (ICC) for continuous features.
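For categorical features, agreement between two readings can be quantified with a kappa statistic. The sketch below assumes the unweighted Cohen's kappa; the text says only "Kappa test", so the exact variant used in the study is not confirmed, and the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two readings of the same cases."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_first, counts_second = Counter(first), Counter(second)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(counts_first[k] * counts_second[k] for k in counts_first) / (n * n)
    if expected == 1.0:
        return 1.0  # both readings use a single identical category
    return (observed - expected) / (1.0 - expected)
```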

Model construction and evaluation

To investigate the value of the multi-modal radiomics signature and MRI features for DFS prediction, we constructed six predictive models using Cox regression: a clinical model, a traditional MRI model, a clinical traditional MRI model, a multi-modal radiomics signature, a multi-modal radiomics model, and a multi-modal clinical imaging model.

We first used univariate Cox regression to analyze the relationship of clinical and MRI features with DFS. Then, to build the clinical model, we used a multivariate Cox proportional hazards model to select the best combination of clinical predictors by including variables in a step-wise manner based on the Bayesian information criterion (BIC). The same method was used to select the best combination of MRI predictors to build the traditional MRI model. Subsequently, we combined the selected clinical and MRI predictors to build the clinical traditional MRI model. We also combined the US-intra-, US-peri-, MRI-intra-, and MRI-peri-radiomics signatures to construct the multi-modal radiomics signature, which was then incorporated into the clinical model to build the multi-modal radiomics model. Finally, we combined the multi-modal Rad-score and the clinical traditional MRI model-score to construct the multi-modal clinical imaging model. The multi-modal Rad-score and the clinical traditional MRI model-score were calculated with the score formula (score = ∑ feature value × feature coefficient) using the features selected for constructing the multi-modal radiomics signature and the clinical traditional MRI model, respectively.
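For reference, the BIC that drives the step-wise selection has the standard form below, where \hat{L} is the maximized likelihood of the candidate model (the partial likelihood for a Cox model), k is the number of estimated coefficients, and n is the sample size. Some software uses the number of events rather than the number of patients as n for Cox models; the text does not say which convention was applied here.

```latex
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
```

At each step, the candidate variable set with the lowest BIC is preferred, so an added predictor is kept only if its gain in likelihood outweighs the \ln n penalty.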

The developed models were evaluated and compared on four aspects: (1) the C-index, which measures the agreement between the model-predicted DFS and the DFS actually observed in all patients, was calculated to assess the discrimination of the models; (2) calibration curves were plotted to evaluate the consistency between predicted and actual survival probabilities at different time points; (3) decision curve analysis (DCA) was conducted to evaluate the clinical usefulness of the models by calculating net benefits across different threshold probabilities [36]; and (4) BIC values were calculated to evaluate the goodness-of-fit of the models.
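The C-index in item (1) can be illustrated with a direct pairwise computation. The sketch below follows the usual definition of Harrell's C-index, in which a pair is comparable when the patient with the shorter time had an event; the function name is hypothetical, pairs with tied times are simply skipped, and a higher score is assumed to mean higher risk.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk_scores: Sequence[float]
) -> float:
    """Fraction of comparable pairs in which the higher-risk patient fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.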

Statistical analysis

Python 3.7.11 and R 4.0.3 were used for statistical analyses. A two-sided P < 0.05 was considered significant. Categorical and continuous variables were compared using the chi-squared test and the Kruskal–Wallis H-test, respectively. The Python package “lifelines” was used to perform the Kaplan–Meier survival analysis, log-rank test, and Cox regression. The R packages “glmnet” and “survival” were used to perform the LASSO-Cox regression analysis. The R packages “rms” and “rmda” were used to draw the calibration and DCA curves, respectively.
