A radiomics-based interpretable machine learning model to predict the HER2 status in bladder cancer: a multicenter study

Patients

Patients with pathologically confirmed BCa who underwent radical or partial cystectomy were retrospectively recruited from four independent hospitals. The inclusion criteria were: (1) pathologically confirmed BCa; (2) radical cystectomy or partial cystectomy; (3) contrast-enhanced CT scans available within 2 weeks before surgery. The exclusion criteria were: (1) receipt of neoadjuvant therapy; (2) pathologically confirmed non-urothelial carcinoma; (3) incomplete CT data or poor image quality. In total, 154 patients enrolled at the primary center between June 2015 and June 2023 formed the training set for model construction. For the test set, 53 patients with BCa were retrospectively enrolled from the other three centers between June 2019 and December 2023. The flowchart of the patient recruitment process is shown in Fig. 1. This study was approved by the institutional review boards of the participating institutions, and the requirement for informed consent was waived. This study is reported in line with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement and was conducted in accordance with the Declaration of Helsinki [25].

Fig. 1. The flowchart of the patient recruitment process

The following clinical and pathological data were collected from the electronic medical records: age, gender, pathological T-stage, pathological N-stage, and pathological grade. Pathologic tumor staging was based on the eighth edition of the American Joint Committee on Cancer Staging System [26].

Assessment of HER2 status

HER2 status was assessed by immunohistochemistry (IHC) performed on formalin-fixed, paraffin-embedded surgical specimens. The samples were scored according to the 2018 American Society of Clinical Oncology HER2 testing guideline by two pathologists who were blinded to the clinical data [27]. IHC staining results were scored as follows: 0, no staining or less than 10% of tumor cells stained; 1+, more than 10% of tumor cells exhibited weak and partial membrane staining; 2+, more than 10% of tumor cells showed weak to moderate, complete membrane staining; 3+, more than 10% of tumor cells displayed strong, complete membrane staining. HER2-positive (HER2-p) status was defined by an IHC score of 2+ or 3+, and HER2-negative (HER2-n) status by a score of 0 or 1+.
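The labelling rule above can be written as a small helper. This is an illustrative sketch only: the function name `her2_status` and the encoding of the scores 0, 1+, 2+, 3+ as the integers 0 to 3 are assumptions made for the example, not part of the study's code.

```python
def her2_status(ihc_score: int) -> str:
    """Map an IHC score to the HER2 group used as the prediction target.

    Assumed encoding (hypothetical): 0 -> 0, 1 -> 1+, 2 -> 2+, 3 -> 3+.
    """
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError(f"IHC score must be 0, 1, 2 or 3, got {ihc_score!r}")
    # Scores of 2+ or 3+ are HER2-p; scores of 0 or 1+ are HER2-n.
    return "HER2-p" if ihc_score >= 2 else "HER2-n"


if __name__ == "__main__":
    for score in (0, 1, 2, 3):
        print(score, her2_status(score))
```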

Image collection and ROI segmentation

All patients underwent contrast-enhanced CT within 2 weeks before surgery. The CT scanners and scanning parameters used at the different institutions are detailed in Table S1. The CT images were downloaded from the Picture Archiving and Communication System (PACS) and saved in the original Digital Imaging and Communications in Medicine (DICOM) format. In this study, the nephrographic phase (NP) CT images, the phase most commonly used for tumor identification in bladder cancer, were chosen for subsequent analysis.

A urological radiologist with 5 years of experience (Reader A), who was blinded to the pathological and clinical information, manually segmented the region of interest (ROI) of the tumors using ITK-SNAP software (version 3.6.0, http://www.itksnap.org). For patients with multiple tumors, the lesion with the largest diameter was chosen for ROI segmentation and subsequent feature extraction. To evaluate the intra- and inter-observer reproducibility of radiomics feature extraction, thirty images were randomly selected and segmented again by Reader A 2 weeks after the first segmentation, and independently by Reader B (with more than 10 years of experience in the diagnosis of genitourinary diseases). Radiomics features with intra- and inter-observer intraclass correlation coefficients (ICC) > 0.75 were considered to exhibit strong reliability and were retained for model construction.
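The reproducibility filter can be sketched as follows. The text does not state which ICC form was used, so this example assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1); the function names `icc_2_1` and `stable_features` are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per image and one column per segmentation.

    Assumption: two-way random effects, absolute agreement, single measurement.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-image mean square
    msc = ss_cols / (k - 1)                # between-segmentation mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def stable_features(intra_icc, inter_icc, threshold=0.75):
    """Keep feature names whose intra- AND inter-observer ICC both exceed the threshold."""
    return [name for name in intra_icc
            if intra_icc[name] > threshold and inter_icc.get(name, 0.0) > threshold]


if __name__ == "__main__":
    # Toy values of one feature: first segmentation vs. repeat segmentation.
    feature_values = [[10.1, 10.3], [12.0, 11.8], [9.5, 9.9], [14.2, 14.0]]
    print(round(icc_2_1(feature_values), 3))
```

In this sketch a feature is kept only if both ICC values exceed 0.75, which is one reading of the criterion stated above.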

Radiomics feature extraction and selection

Radiomics features were extracted from the 3D ROIs of each patient’s CT images using the open-source pyradiomics package (version 2.2.0) in Python. The details of the radiomics feature extraction are shown in Supplementary Material: Appendix E1. All radiomics features were standardized separately using z-score normalization. Several feature selection methods were applied to reduce overfitting and improve the robustness of the model. Feature selection was performed in the training set in the following steps: (1) stable features with ICC > 0.75 were selected for further analysis; (2) Spearman correlation analysis was used to filter out redundant features; (3) features that differed significantly between the HER2-p and HER2-n groups were selected using the independent-samples t-test; (4) the least absolute shrinkage and selection operator (LASSO) regression algorithm with fivefold cross-validation was used to eliminate the remaining irrelevant features.
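As an illustration of the standardization and of step (2) only, the sketch below uses the Python standard library. The correlation threshold of 0.9 and the rule of keeping the first feature of each highly correlated pair are assumptions made for the example; the text does not state them. Steps (3) and (4) are not shown.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one feature to zero mean and unit (population) standard deviation.

    A constant feature has zero standard deviation and would need to be dropped first.
    """
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


def drop_redundant(features, threshold=0.9):
    """Greedy filter: keep a feature only if |rho| <= threshold with every kept feature.

    `features` maps a feature name to its list of values; threshold is hypothetical.
    """
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    toy = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 11], "c": [3, 1, 4, 1, 5]}
    print(drop_redundant(toy))
```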

Model construction

Five commonly used ML algorithms were applied to construct predictive models in the training set: logistic regression (LR), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost), and k-nearest neighbors (KNN). The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were used to evaluate the discrimination performance of the established ML models. DeLong’s test was used to compare the AUCs between models. The cutoff value identified by the Youden index was used to calculate the accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Model construction and performance evaluation were performed in Python 3.8.0. The overall workflow of this study is summarized in Fig. 2.
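The cutoff-based metrics can be illustrated with a short sketch. It assumes binary labels (1 for HER2-p, 0 for HER2-n), predicted probabilities as scores, and a rule that predicts positive when the score is at or above the cutoff; the function names are hypothetical and this is not the study's code.

```python
def confusion(y_true, scores, cutoff):
    """Count TP, FP, TN, FN when 'score >= cutoff' is predicted as positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_cutoff(y_true, scores):
    """Return the observed score that maximizes J = sensitivity + specificity - 1."""
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        tp, fp, tn, fn = confusion(y_true, scores, cut)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


def metrics(y_true, scores, cutoff):
    """Accuracy, sensitivity, specificity, PPV and NPV at a fixed cutoff."""
    tp, fp, tn, fn = confusion(y_true, scores, cutoff)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


if __name__ == "__main__":
    y = [0, 0, 0, 1, 1, 0, 1, 1, 1]
    p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.7, 0.8, 0.9]
    cut = youden_cutoff(y, p)
    print(cut, metrics(y, p, cut))
```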

Fig. 2. The overall workflow of this study. CT, computed tomography; ICC, intraclass correlation coefficient; ROI, region of interest; ROC, receiver operating characteristic; SHAP, Shapley additive explanations; LASSO, least absolute shrinkage and selection operator

Model interpretation

Shapley additive explanations (SHAP), a model interpretation method based on game theory, quantifies the influence of each feature on model predictions by calculating the contribution of that feature [28]. It provides both a global interpretation of the model and a local interpretation for each sample. In this study, we used the SHAP method to interpret the constructed ML model, addressing the “black box” challenge. All analyses were conducted using the SHAP package (version 2.0.0) in Python. SHAP feature importance plots and summary plots were generated. A few representative cases were selected to create SHAP force plots to better understand the model’s predictions.
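To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a toy model by enumerating all feature coalitions. It is not the SHAP package's API, which relies on more efficient estimators; the choice of a fixed baseline vector to represent "absent" features, the toy linear model, and the function name `shapley_values` are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of sample x for the model `predict`.

    Features outside a coalition are replaced by their baseline value (assumption).
    The cost grows exponentially with the number of features, so this is only
    practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


if __name__ == "__main__":
    def toy_model(v):
        # Hypothetical linear score over three features.
        return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]

    print(shapley_values(toy_model, [1.0, 2.0, 4.0], [0.0, 0.0, 0.0]))
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that force plots visualize for individual cases.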

Statistical analysis

All statistical analyses were performed using R software (version 4.1.2; https://www.r-project.org) and SPSS statistical software. Continuous variables were presented as means and standard deviations (SD), and categorical variables were presented as frequencies and percentages; 95% confidence intervals (CI) were calculated using the bootstrapping method. Clinical characteristics were compared between the training and test sets using the chi-square test (or Fisher exact test) for categorical variables and the t-test (or Mann–Whitney U test) for continuous variables. A two-sided p-value < 0.05 was considered statistically significant.
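A bootstrap confidence interval can be sketched as follows. The text does not specify the bootstrap variant, the number of resamples, or the statistic, so this example assumes a percentile bootstrap with 2000 resamples applied to the AUC; the function names and the fixed random seed are hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a positive case scores above a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed variant, not confirmed by the text)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample cases with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue                                 # AUC is undefined without both classes
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    y = [0, 0, 0, 1, 1, 0, 1, 1, 1]
    p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.7, 0.8, 0.9]
    print(auc(y, p), bootstrap_ci(y, p))
```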
