Multi-algorithms analysis for pre-treatment prediction of response to transarterial chemoembolization in hepatocellular carcinoma on multiphase MRI

Study sample

The study was approved by the Institutional Review Board, and the requirement for informed consent was waived because of the retrospective design. We retrospectively identified all consecutive patients who underwent TACE for HCC at a single center from April 2016 to June 2021. Inclusion criteria were HCC treated with initial TACE, contrast-enhanced MR (CE-MR) imaging before and after TACE, and complete clinical information (i.e., demographics, preoperative hepatitis status, serum alpha-fetoprotein (AFP) levels, and liver function tests). Exclusion criteria were underage patients; concurrent therapies during follow-up, such as resection or systemic chemotherapy; other concurrent malignancies; and follow-up of less than 3 months after the procedure. HCC was diagnosed histologically or by MR image evaluation. In total, 144 treatment-naïve HCC patients (median follow-up time, 13.8 weeks) met the inclusion criteria. To further validate the generalization capability of the established models, we collected 28 HCC patients treated with TACE between August 2021 and October 2022 as an independent external validation set. The inclusion and exclusion criteria for these patients were the same as for the preceding dataset.

TACE procedure and reference standard of TACE response

All included patients were treated with TACE, either conventional TACE (cTACE) or drug-eluting bead TACE (DEB-TACE). Interventional physicians chose cTACE or DEB-TACE based on tumor burden and patient characteristics. The basic treatment process of DEB-TACE resembles that of cTACE except for the embolic agents. cTACE used lipiodol (Guerbet), gelatin sponge particles, and polyvinyl alcohol as embolic agents. Selective or super-selective embolization of tumor-supplying vessels was performed whenever technically justified [23]. For DEB-TACE, 100–300 μm diameter CalliSpheres® Beads (CB; Jiangsu Hengrui Pharmaceutical Co., Ltd.) were used as carriers, loaded with 60–80 mg epirubicin, pirarubicin, or doxorubicin. All procedures were performed by interventional physicians with at least 10 years of experience. All patients were admitted for postoperative supportive care after the TACE procedure and were managed routinely.

TACE response in the study cohort was assessed according to the modified Response Evaluation Criteria in Solid Tumors (mRECIST) [24]. In brief, the therapeutic response to TACE was stratified into four grades: (a) complete response (CR): complete disappearance of the lesion; (b) partial response (PR): at least a 30% reduction in the sum of diameters of viable target lesions (enhancement in the arterial phase); (c) progressive disease (PD): at least a 20% increase in the sum of the diameters of viable (enhancing) target lesions; and (d) stable disease (SD): neither PR nor PD. Based on mRECIST, CR and PR patients were categorized as the objective response (OR) group, and PD and SD patients as the non-objective response (NOR) group. This assessment was made by two professional abdominal radiologists based on the follow-up MR images. Among the 144 patients enrolled, 75 were assigned to the NOR group and 69 to the OR group. In the independent external validation set, 14 patients were in the NOR group and 14 in the OR group.
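The grading rule above can be sketched as a small function. This is an illustrative simplification, not the readers' actual workflow: the function names are hypothetical, the inputs are assumed to be sums of viable (arterially enhancing) target-lesion diameters in millimetres, and the change is measured against the pre-TACE baseline only, whereas mRECIST as published defines PD relative to the smallest sum recorded since treatment started.

```python
def mrecist_category(baseline_sum_mm, followup_sum_mm):
    """Grade response from sums of viable target-lesion diameters (hypothetical helper)."""
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum of viable diameters must be positive")
    if followup_sum_mm == 0:
        return "CR"  # no viable (enhancing) tumor left
    change = (followup_sum_mm - baseline_sum_mm) / baseline_sum_mm
    if change <= -0.30:
        return "PR"  # at least 30% reduction
    if change >= 0.20:
        return "PD"  # at least 20% increase
    return "SD"      # neither PR nor PD


def response_group(category):
    """Collapse the four grades into the binary label used for modeling."""
    return "OR" if category in ("CR", "PR") else "NOR"
```

For example, a viable diameter sum that falls from 50 mm to 30 mm is a 40% reduction, so the lesion is graded PR and the patient is labeled OR.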

MRI image acquisition

Before and after TACE, all recruited patients underwent gadopentetate dimeglumine-enhanced MR imaging using 1.5-T and 3.0-T MR scanners. For the Philips ENGENIA 3.0-T MR scanner (Philips Medical Systems), imaging sequences included an axial T2-weighted sequence with spectral presaturation with inversion recovery, a breath-hold precontrast and post-contrast (after injection of 0.1 mmol/kg gadopentetate dimeglumine (Gd-DTPA)) mDIXON-T1-weighted (water) sequence, and a breath-hold diffusion-weighted echo-planar sequence. The main image acquisition parameters were as follows: T2-weighted sequence, repetition time (TR) 3000 ms, echo time (TE) 200 ms, matrix 200 × 195, thickness 7 mm, gap 1 mm; breath-hold T1-weighted sequence, TR 3.6 ms, TE1/TE2 2.38/4.76 ms, matrix 224 × 166, thickness 5 mm, gap 2.5 mm, field of view (FOV) 400 mm × 314 mm, with 4 dynamic phases scanned, namely the hepatic arterial phase (AP) (25–30 s), portal venous phase (PVP) (60–70 s), delayed phase (DP) (180 s), and hepatobiliary phase (HBP) (20 min); diffusion-weighted echo-planar sequence, TR 2500 ms, TE 64 ms, thickness 7 mm, gap 1 mm, FOV 400 mm × 343 mm, matrix 116 × 97, b values 0 and 800 s/mm².

For the Siemens MAGNETOM Aera 1.5-T MR scanner, the MRI scan sequences included a T2-weighted sequence (TR 3500 ms, TE 90 ms, FOV 380 mm × 380 mm, matrix 320 × 320) and CE-MR scans performed with three-dimensional volumetric interpolated breath-hold examination (3D-VIBE): TR 4.1 ms, TE 1.8 ms, FOV 380 mm × 380 mm, matrix 320 × 320, thickness 5 mm, gap 1 mm. After injection of the contrast agent Gd-DTPA (dose 0.1 mmol/kg, flow rate 2 ml/s), AP, PVP, and DP images were acquired at 25 s, 60 s, and 180 s, respectively.

Image segmentation and radiomic features

The flowchart of the study is depicted in Fig. 1. The volumes of interest (VOIs) of the tumors were delineated manually using 3D Slicer version 4.10 (www.slicer.org) by reader 1 (a radiologist with 3 years of experience in abdominal imaging) and reader 2 (a radiologist with 10 years of experience in abdominal neoplasm imaging). The VOIs were drawn on T2-weighted images and on 3 dynamic enhanced phase images (namely AP, PVP, and DP). The radiologists involved in the segmentation were blinded to all clinical and prognostic information. To standardize the voxel spacing and control image noise, all images were resampled to 1 × 1 × 1 mm³ voxels, and gray levels were discretized with a fixed bin width of 25. Radiomic features were extracted automatically from the T2-weighted images and the 3 enhanced phase images using the PyRadiomics toolkit [25]. For each sequence, 110 radiomic features were extracted, giving a total of 440 quantitative features (110 × 4 sequences).
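A minimal sketch of how the resampling and discretization settings might be passed to PyRadiomics is given below. The `resampledPixelSpacing` and `binWidth` keys are standard PyRadiomics settings, but everything else is an assumption: the sequence labels, the helper names, and the file-path layout are hypothetical, and the source does not say which feature classes or image filters produced the 110 features per sequence.

```python
SEQUENCES = ["T2WI", "AP", "PVP", "DP"]  # labels chosen here for illustration

# Settings taken from the text: 1 x 1 x 1 mm resampling, fixed bin width of 25.
PYRADIOMICS_SETTINGS = {"resampledPixelSpacing": [1, 1, 1], "binWidth": 25}


def prefix_features(sequence, result):
    """Keep feature values only (drop diagnostics) and tag them with the sequence."""
    return {
        f"{sequence}_{name}": value
        for name, value in result.items()
        if not name.startswith("diagnostics_")
    }


def extract_case(paths):
    """paths maps each sequence label to (image_path, mask_path); hypothetical layout."""
    from radiomics import featureextractor  # requires the pyradiomics package

    extractor = featureextractor.RadiomicsFeatureExtractor(**PYRADIOMICS_SETTINGS)
    features = {}
    for sequence in SEQUENCES:
        image_path, mask_path = paths[sequence]
        features.update(prefix_features(sequence, extractor.execute(image_path, mask_path)))
    return features
```

Concatenating the per-sequence dictionaries in this way yields one feature vector per patient, with the sequence prefix keeping identically named features from different sequences apart.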

Fig. 1. Flowchart of the study procedure. Abbreviations: KNN, k-nearest neighbor; SVM, support vector machine; Lasso, least absolute shrinkage and selection operator; DNN, deep neural network

To assess the variability of the extracted features, 25% of all included cases were randomly selected and delineated again independently by reader 1 (test–retest variability) and reader 2 (interobserver variability). The second lesion segmentations were performed 2 months after the first. The intraclass correlation coefficient (ICC) was used to evaluate test–retest and interobserver repeatability; an ICC greater than 0.75 indicated good reproducibility.
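The reproducibility check can be illustrated with a short NumPy sketch. The source does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form ICC(2,1) is assumed here; the function names are hypothetical.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects, n_raters) array of one feature's values."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reproducible_features(first, second, threshold=0.75):
    """Indices of features whose ICC between two segmentations exceeds the threshold.

    first, second: (n_cases, n_features) arrays from the two delineations.
    """
    first, second = np.asarray(first, float), np.asarray(second, float)
    return [
        j for j in range(first.shape[1])
        if icc_2_1(np.column_stack([first[:, j], second[:, j]])) > threshold
    ]
```

Here each feature is treated separately: the two columns passed to `icc_2_1` are that feature's values from the first and second segmentation of the same cases.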

Four forecasting models

This experiment compared the predictive capability of four models: the machine learning classifiers KNN, SVM, and Lasso, and the deep learning classifier DNN. The schematic diagram of each algorithm is shown in Fig. 1. All radiomic features described above were standardized using the Z-score before model training. To remove redundant features and reduce bias and over-fitting, the minimum redundancy maximum relevance (mRMR) method was used for dimensionality reduction in the KNN and SVM models, and 10 features were retained for constructing these models. Since Lasso and DNN reduce the dimension of the features automatically during model training, no additional feature selection step was needed for them.
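The standardization and mRMR steps can be sketched as follows. This is a simplified stand-in rather than the authors' implementation: relevance and redundancy are both measured with absolute Pearson correlation (published mRMR variants typically use mutual information or an F-statistic), candidates are scored by relevance minus mean redundancy, and the function names are hypothetical. In practice the Z-score statistics would be estimated on the training set and reused for the test set.

```python
import numpy as np


def zscore(X):
    """Column-wise Z-score; constant columns are left at zero."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std


def mrmr_select(X, y, n_select=10):
    """Greedy mRMR: maximize relevance to y minus mean redundancy with chosen features."""
    Xz = zscore(X)
    yz = zscore(np.asarray(y, dtype=float).reshape(-1, 1)).ravel()
    n = len(yz)
    relevance = np.abs(Xz.T @ yz) / n  # |Pearson r| with the label
    corr = np.abs(Xz.T @ Xz) / n       # |Pearson r| between features
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(n_select, Xz.shape[1]):
        score = relevance - corr[:, selected].mean(axis=1)
        score[selected] = -np.inf      # never pick a feature twice
        selected.append(int(np.argmax(score)))
    return selected
```

The redundancy penalty is what separates mRMR from a plain top-k ranking: an exact duplicate of an already selected feature has redundancy 1 and is therefore passed over in favor of a less correlated candidate.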

The first prediction model applied in this study was the KNN algorithm, an instance-based learning method that classifies a new sample according to the classes of its k nearest neighbors in feature space [26]. In this experiment, the number of neighbors was set to 4. The second predictive model was SVM, a supervised algorithm that separates the object classes in feature space with a hyperplane [27]. SVM can also use a kernel function to distinguish nonlinearly separable classes, so it supports both linear and nonlinear classification. The kernel function used here was the radial basis function (RBF), with gamma set to 0.2.
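With the hyperparameters stated above, the two classifiers might be configured in scikit-learn roughly as follows. Only `n_neighbors=4`, the RBF kernel, and `gamma=0.2` come from the text; the pipeline wrapper, the `probability=True` flag (so that the SVM returns class probabilities), and the function name are assumptions made for this sketch, and all other parameters are left at library defaults.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_classifiers():
    """Return (knn, svm) pipelines; StandardScaler stands in for the Z-score step."""
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=4))
    svm = make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", gamma=0.2, probability=True),  # probability=True is assumed
    )
    return knn, svm
```

Both pipelines expose `fit` and `predict_proba`, so they can be trained on the 10 mRMR-selected features and queried for a per-patient output value in the same way.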

The third prediction model used in this study was Lasso [28], which performs dimensionality reduction and feature selection simultaneously. The Lasso model was built as a linear combination of the selected features weighted by their respective coefficients, and a Lasso score was calculated for each patient. The fourth prediction model was DNN [29], an artificial neural network with multiple layers between the input features and the output predictions. The linear layers in the DNN model are connected by nonlinear activation functions so that the network can learn complex nonlinear relationships. In this research, we used a neural network with BatchNorm and Dropout modules for better performance. BatchNorm [30] is a mini-batch normalization function that can prevent network over-fitting and accelerate training. Dropout [31] is a regularization tool that randomly drops neurons from the neural network during training. The network had three layers with 440, 220, and 2 nodes, respectively. The layers were connected by a Rectified Linear Unit (ReLU) activation function, and the dropout rate was 0.5. The final activation of the output was a softmax function, which produces scores between 0 and 1. A cosine annealing learning rate schedule was used, with the learning rate set to 0.01. All trainable parameters were optimized with the Adam algorithm, the batch size was 32, and the network was trained for 200 epochs.
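A PyTorch sketch of a network matching this description is shown below. Several details are assumptions because the source does not give them: "440-220-2" is read as an input layer of 440 features, one hidden layer of 220 units, and 2 output units; BatchNorm is placed before ReLU and Dropout after it; the loss is cross-entropy; and the cosine schedule spans all epochs. The function names are hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def build_dnn(n_features=440, n_hidden=220, n_classes=2, p_drop=0.5):
    return nn.Sequential(
        nn.Linear(n_features, n_hidden),
        nn.BatchNorm1d(n_hidden),
        nn.ReLU(),
        nn.Dropout(p_drop),
        nn.Linear(n_hidden, n_classes),  # logits; softmax is applied in predict_proba
    )


def train_dnn(model, X, y, epochs=200, batch_size=32, lr=0.01):
    dataset = TensorDataset(X, y)
    # BatchNorm cannot train on a batch of one sample, so drop such a remainder.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        drop_last=(len(dataset) % batch_size == 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    loss_fn = nn.CrossEntropyLoss()  # assumed loss; applies log-softmax internally
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()
            optimizer.step()
        scheduler.step()  # one cosine step per epoch
    return model


def predict_proba(model, X):
    model.eval()  # disables dropout, uses running BatchNorm statistics
    with torch.no_grad():
        return torch.softmax(model(X), dim=1)
```

Keeping the softmax outside the network and training on logits is a common PyTorch convention; the probabilities returned by `predict_proba` are the same scores between 0 and 1 that the text describes.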

Construction and validation of comprehensive models

For the clinical factors, univariate and multivariate logistic regression analyses were applied to determine the independent predictors of TACE response in the training set. Multimodal features, comprising the outputs of the aforementioned classifiers (the corresponding output values) and the clinicopathological variables, were then incorporated into a comprehensive model using multivariate logistic regression analysis.
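The combination step can be sketched with scikit-learn as below. This is an assumption-laden illustration: the source does not say which software fitted the comprehensive model, scikit-learn's `LogisticRegression` is regularized by default, so a very large `C` is used here to approximate an unpenalized fit, and the function name and argument layout are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_combined_model(classifier_scores, clinical_vars, y):
    """Fit a logistic regression on classifier outputs plus clinical predictors.

    classifier_scores: (n_patients, n_classifiers) output values of the radiomics models
    clinical_vars:     (n_patients, n_clinical) independent clinical predictors
    """
    X = np.column_stack([classifier_scores, clinical_vars])
    # Large C makes the default L2 penalty negligible (approximately unpenalized).
    return LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
```

The fitted coefficients then play the role of the weights given to each classifier output and clinical variable in the comprehensive model.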

The discriminative ability of the predictive models was assessed with receiver operating characteristic (ROC) curves, using the area under the curve (AUC), sensitivity, and specificity. Calibration curves were drawn to compare the predicted probabilities of TACE response with the observed response rates. The AUCs of the ROC curves were compared using the DeLong test. To determine the clinical value of the model, decision curve analysis (DCA) was performed to estimate the net benefit at different threshold probabilities.
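The discrimination metrics and the net benefit used in decision curve analysis can be computed as in the sketch below. The net benefit follows the standard definition TP/n − (FP/n) · pt/(1 − pt) at threshold probability pt; the function names and the default cut-off of 0.5 are assumptions, and the DeLong test is not covered because neither scikit-learn nor SciPy provides it.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def discrimination_summary(y_true, y_prob, threshold=0.5):
    """AUC plus sensitivity and specificity at a probability cut-off."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold."""
    y_true = np.asarray(y_true)
    treat = np.asarray(y_prob) >= threshold
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none strategies.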

Statistical analyses

Statistical analyses were performed using SPSS v25.0, R v4.0.4, and Python v3.7.6. The Python classes used for KNN, SVM, and Lasso modeling were sklearn.neighbors.KNeighborsClassifier, sklearn.svm.SVC, and sklearn.linear_model.Lasso, respectively (scikit-learn version 1.0 [32]). The DNN modeling was conducted on the PyTorch platform (version 1.10.0). The 144 included patients were randomly divided into a training set and a test set at a ratio of 8:2. The differences in patient characteristics between the OR and NOR groups were assessed for both the training and test sets. To identify significant predictors of TACE response, continuous variables were analyzed using the t-test or the Mann–Whitney U test according to the results of the Kolmogorov–Smirnov test; categorical variables were analyzed using the Chi-square test or Fisher's exact test. All statistical tests were two-sided; a p value < 0.05 was considered statistically significant.
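The split and the test-selection logic can be sketched with scikit-learn and SciPy as follows. Several choices are assumptions not stated in the source: the split is stratified by response group with a fixed seed, normality is checked with a Kolmogorov–Smirnov test on values standardized by the sample mean and standard deviation, and Fisher's exact test replaces the Chi-square test for 2 × 2 tables with any expected count below 5. The function names are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import train_test_split


def split_cohort(X, y, seed=0):
    """Random 8:2 split; stratification and the seed are assumptions."""
    return train_test_split(X, y, test_size=0.2, random_state=seed, stratify=y)


def compare_continuous(a, b, alpha=0.05):
    """t-test if both groups pass the KS normality check, else Mann-Whitney U."""
    def looks_normal(x):
        x = np.asarray(x, dtype=float)
        z = (x - x.mean()) / x.std(ddof=1)
        return stats.kstest(z, "norm").pvalue > alpha

    if looks_normal(a) and looks_normal(b):
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


def compare_categorical(table):
    """Chi-square test, or Fisher's exact test for sparse 2 x 2 tables."""
    table = np.asarray(table)
    _, p, _, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < 5).any():
        return "Fisher exact", stats.fisher_exact(table)[1]
    return "Chi-square", p
```

With scikit-learn's rounding, `test_size=0.2` on 144 patients gives 115 training and 29 test cases; the source does not report the exact set sizes.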
