Radiomics as a tool for prognostic prediction in transarterial chemoembolization for hepatocellular carcinoma: a systematic review and meta-analysis

Literature retrieval, selection and data extraction

The literature search identified a total of 645 records, of which 203 duplicates were removed. Screening the titles and abstracts of the remaining 442 studies excluded 396 articles that did not meet the language, study type, or PICO criteria specified for our review. Full-text review excluded a further 17 articles, leaving 29 articles for final inclusion (Fig. 1).
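The count at each selection stage follows from simple subtraction. The minimal Python sketch below uses a hypothetical `prisma_flow` helper (not part of the original study) to check that the reported numbers are internally consistent (645 → 442 → 46 → 29).

```python
def prisma_flow(identified, duplicates, excluded_screening, excluded_full_text):
    """Hypothetical helper: records remaining after each PRISMA selection stage."""
    screened = identified - duplicates              # records screened by title/abstract
    full_texts = screened - excluded_screening      # full texts assessed for eligibility
    included = full_texts - excluded_full_text      # studies included in the review
    return {"screened": screened, "full_texts": full_texts, "included": included}


flow = prisma_flow(identified=645, duplicates=203,
                   excluded_screening=396, excluded_full_text=17)
print(flow)  # {'screened': 442, 'full_texts': 46, 'included': 29}
```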

Fig. 1 PRISMA flowchart illustrating the literature selection process in this study.

Summarized information for all 29 included studies, covering a total of 5483 patients, is presented in Table 1. Details of the studies are given in Supplementary Table 4, and the main study features are illustrated in Supplementary Fig. 1. Most articles originated from China (24 articles, 82.8%). The included studies date back to 2016, and the number of publications increased progressively, peaking at 19 studies in 2021–2022. All included studies were retrospective. Most studies used the BCLC staging system to define the eligible population, while others used the China liver cancer staging (CNLC) system [32] or other criteria [21, 31, 40, 43, 48]. Intermediate-stage patients were the most extensively studied population, with 21 articles including BCLC-B patients in whole or in part [20, 28,29,30, 33, 37, 41, 28,29,30, 49, 53], followed by BCLC-A (13 articles) and BCLC-C (9 articles) patients.

Table 1 Baseline characteristics of the included studies

In terms of imaging modalities, all studies used contrast-enhanced examinations. Contrast-enhanced CT was the most frequently investigated modality [20, 36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53], followed by contrast-enhanced MRI [21, 28,29,30,31,32,33,34,35], with only one article exploring contrast-enhanced ultrasound [54]. Various contrast-enhanced phases or sequences were studied: 18 articles used single-phase/sequence images, while the remaining articles analyzed multi-phase/sequence images. The phases most commonly used for radiomics analysis were the arterial phase (AP) [21, 39, 42, 44, 46, 49] and the portal venous phase (PVP) [31, 34, 36, 38, 48, 50]. For MRI-based radiomics, T2WI was the most frequently analyzed sequence [29, 32, 35]. Several studies compared radiomics models built from images of different phases/sequences, but their conclusions were inconsistent: some found that multi-phase/sequence models performed better [20, 30, 42], while others showed that single-phase/sequence models were comparable to, or even better than, multi-phase/sequence models [35, 48]. In addition, studies comparing different single-phase models suggested that PVP-based models might have greater prognostic value than AP-based models [31, 34, 48].

Most studies used 3D volumes of interest (VOIs) for analysis, with only two articles using 2D regions of interest (ROIs) drawn on the maximum tumor cross-section [46, 47]. Most studies delineated only the tumor region, while a few also explored radiomics features in the peritumoral area [36, 37, 45]. Song et al. reported that radiomics models based on the tumor region plus a peritumoral extension had lower prognostic value than models based on the tumor region alone [34]. Twenty-eight articles reported the algorithms used for feature selection and model construction, most of which were machine learning (ML) algorithms. Five articles adopted deep learning (DL) algorithms in the modeling process [32, 33, 39, 46, 54]. A total of 44 independent datasets (training or validation sets) reported both the cohort sample size and the number of features in the predictive model. The median sample-size-to-feature-number ratio was 9.30 (P25–P75: 5.16–12.43), and only 19 datasets had a ratio above 10.
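The sample-size-to-feature-number ratio is computed per dataset and then summarized by its median and interquartile range. A minimal Python sketch follows; the cohorts are hypothetical, the function name `ratio_summary` is illustrative, and the threshold of 10 is assumed here as a commonly cited rule of thumb. Percentile conventions differ between software packages, so quartiles from this sketch may differ slightly from those produced by the tools the authors used.

```python
import statistics


def ratio_summary(datasets, threshold=10):
    """Summarize sample-size-to-feature-number ratios across cohorts.

    datasets: list of (sample_size, n_features) tuples, one per cohort.
    Returns (median, p25, p75, n_above), where n_above counts cohorts whose
    ratio exceeds `threshold` (an assumed rule-of-thumb cut-off).
    """
    ratios = [n / k for n, k in datasets]
    # 'inclusive' interpolates linearly between order statistics; other
    # percentile conventions can give slightly different quartiles.
    p25, _, p75 = statistics.quantiles(ratios, n=4, method="inclusive")
    n_above = sum(r > threshold for r in ratios)
    return statistics.median(ratios), p25, p75, n_above


# Hypothetical cohorts for illustration only, not data from the included studies.
example = [(120, 10), (80, 12), (200, 15), (60, 9), (150, 8)]
median, p25, p75, n_above = ratio_summary(example)
print(median, n_above)  # 12.0 3
```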

Quality assessment

The quality of the included studies was assessed. QUADAS-2 identified prevalent risks of bias arising from unclear patient selection (whether patients were enrolled consecutively), the absence of prespecified thresholds (cut-off values) for the predictive models, and the lack of validation of the established models in independent datasets (Fig. 2a, Supplementary Fig. 2A). The latter two factors were also the main sources of concern about the applicability of the studies.

Fig. 2 Quality assessment of included articles. A The bias risk assessment of included studies utilizing the QUADAS-2 scale. B RQS and METRICS scores of included articles. C Relationship between METRICS and RQS for each included study, illustrated by scatter plot

Study quality according to RQS and METRICS is summarized in Fig. 2b, c, with detailed results in Supplementary Fig. 2B and Supplementary Table 5. The mean RQS across all 29 included studies was 12.90 ± 5.13 (35.82% ± 14.25%). METRICS scores, which averaged 62.98% ± 14.58%, were positively correlated with RQS scores (Fig. 2c). Overall, most studies performed satisfactorily in reporting imaging protocols, performing multiple segmentations, integrating non-radiomics clinical variables, assessing the model’s discrimination and calibration, validating the model, and disclosing the radiomics features included in the model. According to METRICS, 2, 14, 10 and 3 studies were rated “excellent”, “good”, “moderate” and “low”, respectively.
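The RQS percentage is the raw score divided by the 36-point maximum, and METRICS percentages are mapped onto quality categories. The minimal Python sketch below assumes five equal-width METRICS bands of 20 percentage points (“very low” through “excellent”); these cut-offs are an assumption to be checked against the official METRICS documentation, and the function names are hypothetical.

```python
RQS_MAX = 36  # maximum attainable Radiomics Quality Score (range -8 to 36)


def rqs_percent(raw_score):
    """Express a raw RQS as a percentage of the 36-point maximum."""
    return 100.0 * raw_score / RQS_MAX


def metrics_category(percent):
    """Map a METRICS percentage score to a quality category.

    Assumes five equal-width bands of 20 percentage points; verify the
    cut-offs against the official METRICS documentation before use.
    """
    for lower, label in [(80, "excellent"), (60, "good"), (40, "moderate"), (20, "low")]:
        if percent >= lower:
            return label
    return "very low"


print(f"{rqs_percent(12.90):.1f}%")  # 35.8%
print(metrics_category(62.98))        # good
```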

Study data synthesis (Meta-analysis)

Pooled predictive performance of radiomics for TACE response

A total of 23 independent datasets from 14 studies provided sufficient information to construct a complete 2 × 2 contingency table and were therefore entered into the meta-analysis. These comprised 12 training datasets (with or without resampling validation) including 1628 patients and 11 independent validation datasets (internal or external validation) including 815 patients. In all of these datasets, the predictive endpoint of the radiomics models was tumor response after TACE, defined in most studies as objective response (OR) according to the RECIST [14] or mRECIST [15] criteria. All datasets included in the meta-analysis and their key methodological information are outlined in Table 2. Across the 23 datasets, predictive performance was synthesized for 2443 subjects (Fig. 3a–e, Supplementary Table 6). The pooled sensitivity was 0.83 (95% CI: 0.78–0.87) (Fig. 3a) and the pooled specificity was 0.86 (95% CI: 0.79–0.92) (Fig. 3b); the pooled positive and negative likelihood ratios (PLR and NLR) were 6.13 (95% CI: 3.79–9.90) (Fig. 3c) and 0.20 (95% CI: 0.15–0.27) (Fig. 3d), respectively. The area under the summary ROC (sROC) curve was 0.90 (95% CI: 0.87–0.93) (Fig. 3e). Heterogeneity tests showed I² above 70% for sensitivity, specificity, PLR and NLR, with Q-test P-values below 0.01, indicating significant heterogeneity (Supplementary Table 6). A Galbraith plot further suggested that outliers had a limited and symmetrical impact on the results of the meta-analysis (Fig. 4a).
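Each dataset entered the meta-analysis through its 2 × 2 contingency table, from which sensitivity, specificity and the likelihood ratios are derived. The minimal Python sketch below shows that per-dataset arithmetic; the counts are hypothetical, responders are assumed to be the “positive” class, and zero cells (which would need a continuity correction) are not handled. The pooled estimates reported above come from a meta-analytic model across datasets, not from this single-table calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # responders predicted as responders
    fp: int  # non-responders predicted as responders
    fn: int  # responders predicted as non-responders
    tn: int  # non-responders predicted as non-responders

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def plr(self):
        # Positive likelihood ratio = sensitivity / (1 - specificity)
        return self.sensitivity() / (1 - self.specificity())

    def nlr(self):
        # Negative likelihood ratio = (1 - sensitivity) / specificity
        return (1 - self.sensitivity()) / self.specificity()

    def dor(self):
        # Diagnostic odds ratio = PLR / NLR = (TP * TN) / (FP * FN)
        return (self.tp * self.tn) / (self.fp * self.fn)


# Hypothetical validation cohort, not taken from any included study.
table = TwoByTwo(tp=40, fp=6, fn=8, tn=36)
print(f"Sens {table.sensitivity():.2f}, Spec {table.specificity():.2f}, "
      f"PLR {table.plr():.2f}, NLR {table.nlr():.2f}, DOR {table.dor():.1f}")
# Sens 0.83, Spec 0.86, PLR 5.83, NLR 0.19, DOR 30.0
```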

Table 2 Major results and methodological features of the studies in the meta-analysis

Fig. 3 Summarized and pooled performance of radiomics models for predicting TACE response. A Pooled sensitivity. B Pooled specificity. C Pooled positive likelihood ratio. D Pooled negative likelihood ratio. E The summary receiver operating characteristic curve

Fig. 4 Heterogeneity among included studies. A Impact of outliers on the meta-analysis, illustrated by a Galbraith plot. B No significant publication bias was indicated by Deeks’ funnel plot and asymmetry test. C, D Impact of methodological factors on (C) pooled sensitivity and (D) pooled specificity of radiomics models, according to meta-regression
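Deeks’ funnel plot asymmetry test regresses the log diagnostic odds ratio, ln(DOR), of each dataset on 1/√ESS, weighted by the effective sample size ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of subjects with and without the outcome. The minimal Python sketch below computes the weighted slope and its t-statistic; the tables are hypothetical, and the function is an illustration, not the software used in the study. A two-sided P-value could be obtained from a t distribution with k − 2 degrees of freedom (for example with SciPy’s `scipy.stats.t.sf`), which is omitted here to keep the sketch dependency-free.

```python
import math


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weighted by ESS.

    tables: list of (tp, fp, fn, tn) tuples, one per dataset.
    Returns (slope, intercept, t_statistic); a slope far from zero
    suggests funnel-plot asymmetry (small-study effects).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):
            # 0.5 continuity correction keeps ln(DOR) finite
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        n1 = tp + fn  # subjects with the outcome (e.g., responders)
        n2 = fp + tn  # subjects without the outcome
        ess = 4 * n1 * n2 / (n1 + n2)  # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual variance with k - 2 degrees of freedom
    rss = sum(w * (y - (intercept + slope * x)) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (len(xs) - 2) / sxx)
    return slope, intercept, slope / se_slope


# Hypothetical (tp, fp, fn, tn) tables, not data from Table 2.
tables = [(40, 6, 8, 36), (25, 5, 6, 30), (60, 10, 12, 50), (15, 3, 2, 12), (30, 0, 5, 20)]
slope, intercept, t_stat = deeks_regression(tables)
print(f"slope = {slope:.2f}, t = {t_stat:.2f}")
```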

Origin of heterogeneity and subgroup analysis

Meta-regression was performed on six key methodological variables: imaging modality (CT/MR/US), dataset type (training/validation), modeling algorithm (ML/DL), number of imaging phases (single-phase/multi-phase), phase used in single-phase models (AP/PVP), and inclusion of peritumoral features (yes/no). For pooled sensitivity, significant between-subgroup differences were found between training and validation sets, MR studies and other studies, ML and DL models, single-phase and multi-phase models, and AP and PVP models (Fig. 4c, Supplementary Table 7). For specificity, significant differences were found between MR studies and other studies, and between ML and DL models (Fig. 4d, Supplementary Table 7).
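The idea behind meta-regression on a binary methodological covariate (for example ML versus DL) can be illustrated with a simplified, swapped-in technique: a univariate, fixed-effect, inverse-variance regression of logit sensitivity on the covariate. The model actually used in the study is not specified in this section and may be bivariate or random-effects; the function names and the 0/1 coding below are hypothetical. With a single binary covariate, the coefficient equals the difference between the weighted mean logit sensitivities of the two subgroups.

```python
import math


def logit_sensitivity(tp, fn):
    """Logit sensitivity ln(TP/FN) and its approximate variance 1/TP + 1/FN."""
    if tp == 0 or fn == 0:
        tp, fn = tp + 0.5, fn + 0.5  # continuity correction for zero cells
    return math.log(tp / fn), 1 / tp + 1 / fn


def meta_regression_binary(datasets):
    """Fixed-effect inverse-variance meta-regression on one binary covariate.

    datasets: list of (tp, fn, covariate) with covariate in {0, 1},
    e.g. 0 = ML model, 1 = DL model (hypothetical coding).
    Returns (coefficient, z_statistic, two_sided_p) for the covariate.
    """
    ys, ws, zs = [], [], []
    for tp, fn, cov in datasets:
        y, var = logit_sensitivity(tp, fn)
        ys.append(y)
        ws.append(1 / var)
        zs.append(cov)
    sw = sum(ws)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    szz = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    szy = sum(w * (z - z_bar) * (y - y_bar) for w, z, y in zip(ws, zs, ys))
    beta = szy / szz
    se = math.sqrt(1 / szz)  # within-study variances treated as known
    z_stat = beta / se
    p = math.erfc(abs(z_stat) / math.sqrt(2))  # two-sided normal P-value
    return beta, z_stat, p


# Hypothetical (tp, fn, covariate) triples, not data from the included studies.
beta, z_stat, p = meta_regression_binary(
    [(40, 8, 0), (25, 6, 0), (60, 12, 0), (18, 2, 1), (30, 3, 1)])
print(f"coefficient = {beta:.2f}, P = {p:.3f}")
```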

Subgroup analyses were then conducted for the variables found to be significant in the meta-regression. Because few studies reported single-phase AP or PVP models, subgroup analyses were performed for the remaining factors, and the pooled sensitivity, specificity and heterogeneity within each subgroup were calculated (Table 3). Inter-study heterogeneity was significantly lower among the independent validation datasets than among the training datasets, while model performance was comparable (I²: 51.32% vs. 82.11%, Supplementary Fig. 3). Within the 11 validation datasets, further subgrouping was based on imaging modality, imaging phase and modeling algorithm. Low heterogeneity (I² < 50%) was observed in the MR subgroup, the ML subgroup and the multi-phase model subgroup (Supplementary Figs. 4–6). Furthermore, the five studies combining CT and ML also showed satisfactory homogeneity (I²: 45.07%, Supplementary Fig. 7).
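Within-subgroup heterogeneity is usually summarized with Cochran’s Q and Higgins’ I² = max(0, (Q − df)/Q) × 100%, where df = k − 1 for k datasets. The minimal Python sketch below computes both for hypothetical subgroups using univariate inverse-variance weights on logit specificity; this is an assumed simplification, and bivariate models partition heterogeneity differently, so its numbers would not reproduce Table 3.

```python
import math


def cochran_q_i2(estimates, variances):
    """Cochran's Q and Higgins' I^2 (in %) for per-dataset estimates."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def logit_spec(tn, fp):
    """Logit specificity ln(TN/FP) and its variance, with 0.5 correction for zero cells."""
    if tn == 0 or fp == 0:
        tn, fp = tn + 0.5, fp + 0.5
    return math.log(tn / fp), 1.0 / tn + 1.0 / fp


# Hypothetical subgroups of (tn, fp) counts, not data from Table 3.
subgroups = {
    "validation": [(30, 6), (25, 5), (40, 7)],
    "training": [(90, 5), (50, 20), (70, 9), (35, 15)],
}
for name, cells in subgroups.items():
    ests, variances = zip(*(logit_spec(tn, fp) for tn, fp in cells))
    q, i2 = cochran_q_i2(ests, variances)
    print(f"{name}: Q = {q:.2f}, I2 = {i2:.1f}%")
```

An I² below 50% is the threshold the text above uses to describe a subgroup as homogeneous.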

Table 3 The pooled results from subgroup analysis

Evaluation of publication bias and clinical interpretation

The funnel plot revealed no significant evidence of publication bias among the included studies (P = 0.64, see Fig. 
