Radiomics and machine learning for renal tumor subtype assessment using multiphase computed tomography in a multicenter setting

Multiphase CT is the radiological mainstay of renal tumor assessment and aids in planning surgical and thermoablation procedures. While some renal tumors, such as fat-rich AMLs containing macroscopic fat, can be confidently classified on CT, identifying the remaining renal tumor subtypes is still a radiological challenge. Consequently, up to 20% of renal tumors that were radiologically classified as malignant turn out to be benign on histological assessment after surgical resection, resulting in surgical overtreatment [7,8,9]. To date, the diagnostic benefit of multiphase CT has not been systematically evaluated in the context of AI-aided renal tumor assessment using CT studies from multiple centers.

This study evaluated a large-scale cohort of renal tumor patients with multiphase CT from multiple imaging centers and provided an independent external dataset for statistical testing. Different CT scanners with varying slice thickness as well as CT studies with imaging artifacts were included to better reflect the diversity of CT studies that are clinically encountered by radiologists. All renal tumor specimens evaluated in this study underwent histopathological assessment to establish a precise reference standard. In this clinical dataset, radiomic feature analyses and ML algorithms were utilized to predict renal tumor subtypes using preoperative CT imaging.

The demographic characteristics of the included patients are in line with the recent literature, showing a male predominance and a peak incidence of renal tumors between the ages of 60 and 70 [4, 5]. Interestingly, the frequency of specific renal tumor subtypes varied by accruing center, highlighting the heterogeneity encountered when assessing different patient cohorts and corroborating the need for external model testing. Overall, the frequency of benign renal tumors in our study (14.8–24%) was comparable to that in the literature, which reports rates between 20% and 30% [8, 29].

Using an XGB algorithm, we achieved an AUC of 0.84 in the internal validation dataset for discriminating renal tumor subtypes with combined arterial and venous CM-phases. On the independent testing dataset, the algorithm yielded an AUC of 0.75. Using radiomic features from the venous CM-phase only, the XGB algorithm also reached an AUC of 0.75 in the testing dataset, whereas the arterial CM-phase alone yielded an AUC of 0.67. These results suggest that radiomic features derived from the venous CM-phase capture the most relevant imaging characteristics of the different renal tumor subtypes. They also call into question the diagnostic benefit of an additional arterial CM-phase for renal tumor subtyping.
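The text reports a single AUC for a multiclass problem but does not state how per-subtype AUCs were aggregated. A common convention, assumed here, is to compute a one-vs-rest AUC for each subtype and take their unweighted (macro) mean. The following minimal sketch uses only the Python standard library; the function names and the three-subtype toy probabilities are hypothetical and are not study data.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probs: List[Dict[str, float]], truth: List[str]) -> Dict[str, float]:
    """Per-subtype AUC: each subtype in turn is the positive class and all
    other subtypes are pooled as negatives."""
    return {
        c: binary_auc([p[c] for p in probs], [int(t == c) for t in truth])
        for c in sorted(set(truth))
    }


def macro_auc(per_class: Dict[str, float]) -> float:
    """Unweighted mean of the per-subtype AUCs (assumed aggregation)."""
    return sum(per_class.values()) / len(per_class)


# Hypothetical predicted class probabilities for six lesions (not study data).
TRUTH = ["ccRCC", "ccRCC", "pRCC", "pRCC", "oncocytoma", "oncocytoma"]
PROBS = [
    {"ccRCC": 0.7, "pRCC": 0.2, "oncocytoma": 0.1},
    {"ccRCC": 0.5, "pRCC": 0.1, "oncocytoma": 0.4},
    {"ccRCC": 0.1, "pRCC": 0.8, "oncocytoma": 0.1},
    {"ccRCC": 0.3, "pRCC": 0.6, "oncocytoma": 0.1},
    {"ccRCC": 0.5, "pRCC": 0.2, "oncocytoma": 0.3},
    {"ccRCC": 0.2, "pRCC": 0.2, "oncocytoma": 0.6},
]

if __name__ == "__main__":
    per_class = one_vs_rest_auc(PROBS, TRUTH)
    for subtype, auc in per_class.items():
        print(f"{subtype}: {auc:.3f}")
    print(f"macro AUC: {macro_auc(per_class):.3f}")
```

Reporting the per-subtype values alongside the macro mean shows which subtype pulls the aggregate down; in this toy example the lowest one-vs-rest AUC belongs to oncocytoma. Comparing CM-phases would then amount to refitting the classifier on the arterial, venous, or concatenated feature sets and computing the same metric on the same test cases.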

Notably, the XGB algorithm performed better in our training cohort than in the independent testing dataset (AUC = 0.84 vs. AUC = 0.75). This discrepancy might be attributed to statistical overfitting that was not fully mitigated by multifold cross-validation. However, center-specific differences in the distribution of renal tumor subtypes and tumor diameter, varying acquisition protocols, and the high proportion of CT studies acquired at outside imaging centers in the testing cohort could also have contributed. Overall, the center-specific diagnostic performance of our radiomics-based XGB algorithm illustrates the challenges of implementing imaging-based ML models on data from clinical practice and the potential need for algorithmic adaptation at each clinical site. Further, as suggested by other authors, combining AI approaches with expert radiologist knowledge might stabilize and improve diagnostic performance [30].
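The overfitting argument can be made concrete by looking at how validation folds are formed. Assuming the folds are stratified by subtype (the text says only "multifold cross-validation"), stratified k-fold splitting draws every validation fold from the same pooled internal cohort, so center effects such as scanner, protocol, and case mix are shared between training and validation and cannot be penalized. A leave-one-center-out split, included for contrast, holds out an entire center per fold; it is an alternative validation design, not the procedure used in this study. This is a minimal standard-library sketch, and both function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)


def stratified_kfold(labels: Sequence[str], k: int, seed: int = 0) -> List[Split]:
    """Stratified k-fold: shuffle each subtype separately, then deal its cases
    round-robin into k folds so every fold has a similar subtype mix."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    slot = 0  # continues across subtypes so fold sizes stay balanced
    for cls in sorted(by_class):
        idx = by_class[cls]
        rng.shuffle(idx)
        for i in idx:
            folds[slot % k].append(i)
            slot += 1
    everyone = set(range(len(labels)))
    return [(sorted(everyone - set(f)), sorted(f)) for f in folds]


def leave_one_center_out(centers: Sequence[str]) -> List[Split]:
    """Hold out every case from one center at a time, so validation always
    comes from a center the model has not seen during training."""
    return [
        (
            [i for i, c in enumerate(centers) if c != held_out],
            [i for i, c in enumerate(centers) if c == held_out],
        )
        for held_out in sorted(set(centers))
    ]


if __name__ == "__main__":
    subtypes = ["ccRCC"] * 6 + ["oncocytoma"] * 3
    for train, val in stratified_kfold(subtypes, k=3):
        print(len(train), [subtypes[i] for i in val])
```

Comparing the mean AUC across stratified folds with the AUC across held-out centers would give a rough indication of how much performance depends on center-specific factors.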

In both our training and testing datasets, oncocytomas were the most challenging subtype for the XGB algorithm to identify, with the lowest AUCs irrespective of CM phase. This might reflect the similar radiological appearance of oncocytomas and ccRCC, which can present with a central scar and central necrosis, respectively, two morphologically similar features.

Notably, the diagnostic performance for renal tumor subtype assessment in this study is inferior to that reported for discriminating benign from malignant renal tumors (AUC = 0.83) in a previous publication by our research team [31]. Other studies on renal tumor subtype assessment have reported diverging results. Evaluating different renal tumor subtypes, Coy et al used peak lesion attenuation analyses, which yielded pairwise AUCs ranging from 0.79 to 0.96 [19]. Similar to the results of our study, the diagnostic performance reported by Coy et al was lower for identifying oncocytomas (AUC = 0.79) and fat-poor AMLs (AUC = 0.83). Using standard logistic regression without cross-validation or external testing, Sasaguri and colleagues evaluated the diagnostic performance of CT attenuation values and skewness from biphasic contrast-enhanced CT for discriminating oncocytomas from other renal tumors, reporting an AUC of 0.8 [32]. A recent meta-analysis by Firouzabadi et al confirms high heterogeneity among the included studies of radiomic feature analyses for the assessment of renal oncocytomas, reporting a pooled sensitivity and specificity of 0.82 and 0.80, respectively [33]. Given this heterogeneity and the diagnostic uncertainty in renal tumor subtype assessment, particularly regarding oncocytomas, it would be beneficial to establish a diagnostic baseline through visual assessment and renal tumor subtyping by radiologists. Unfortunately, because the CT studies and associated histology were already known to the involved radiologists, a post-hoc blinded renal tumor assessment was not possible within the scope of this study.
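First-order features such as attenuation and skewness are summary statistics of the Hounsfield unit (HU) values inside a tumor region of interest (ROI). The sketch below is a hypothetical helper that assumes the ROI voxel values have already been extracted from one CM-phase; it computes mean attenuation and skewness as the third standardized population moment, the same definition given for the first-order Skewness feature in the PyRadiomics documentation. Segmentation, intensity discretization, and the downstream classifier are omitted.

```python
from typing import Dict, Sequence


def first_order_features(hu: Sequence[float]) -> Dict[str, float]:
    """Mean attenuation and population skewness of ROI voxel values in HU."""
    n = len(hu)
    if n == 0:
        raise ValueError("ROI contains no voxels")
    mean = sum(hu) / n
    m2 = sum((x - mean) ** 2 for x in hu) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in hu) / n  # third central moment
    # A perfectly homogeneous ROI has no spread; report zero skewness.
    skew = 0.0 if m2 == 0 else m3 / m2 ** 1.5
    return {"mean_hu": mean, "skewness": skew}


if __name__ == "__main__":
    # Hypothetical ROI: mostly similar voxels plus a few bright ones,
    # which produces a right (positive) tail.
    roi = [95.0, 100.0, 102.0, 98.0, 101.0, 150.0, 160.0]
    print(first_order_features(roi))
```

A positive skewness indicates a tail toward higher attenuation, a negative one a tail toward lower attenuation; in a study design like the one cited, such values would then serve as inputs to a classifier.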

The higher diagnostic performance of the aforementioned studies might have resulted from the use of standardized renal tumor CT acquisition protocols at a single imaging center. In contrast, the CT studies in our study were obtained in a multicenter setting, with different CT scanners, variable slice thickness, and studies with imaging artifacts. Especially given the independent testing performed, our results might be more generalizable and more realistic for a clinical scenario.

Recent meta-analyses and review articles have summarized the literature on ML for renal tumor assessment [34, 35]. For example, Feng et al evaluated 58 patients to distinguish RCCs from fat-poor AMLs, achieving an accuracy of approximately 94% [20]. Kocak et al reported an accuracy of 85% for discriminating ccRCC from non-clear-cell RCC in 68 patients, with a lower accuracy for renal tumor subtype assessment (69%) [21]. Yu et al achieved an AUC of up to 0.92 for selected renal tumor subtypes in 119 patients using radiomic features and SVMs, although they did not comprehensively assess all tumor subtypes in a single global model [17].

The aforementioned studies yield promising results but might be limited in their generalizability, given their focus on high-quality single-center CT studies and the lack of external, independent testing. However, suboptimal renal tumor CT studies are routinely encountered in clinical radiology practice, because referral patterns can result in urological patients presenting with CT studies acquired at external imaging centers.

Our study is not without limitations. First, patients were accrued at two tertiary urological referral centers in Germany, which might limit the generalizability of our findings to non-Caucasian populations. Second, our analyses were restricted to the five most common renal tumor subtypes and thus do not reflect the full diversity of renal neoplasms (e.g., cystic renal masses) encountered in clinical routine. This limits the a priori applicability of the presented methods to individual patients. Third, because only patients with histopathologically assessed renal tumors were included, patients with fat-rich AMLs that had been correctly identified on CT by radiologists were excluded. In a clinical cohort without this preselection, the presented methods might therefore yield different results. Finally, renal tumor subtypes were not evaluated by radiologists in this study; such an evaluation would have provided a comparative measure for the diagnostic performance of the ML algorithm.
