This single-center retrospective study received approval from our institutional review board, with a waiver for written informed consent. A total of 104 patients, either suspected or confirmed to have primary gallbladder cancer, underwent pre-treatment [18F]-FDG-PET/CT scans between January 2011 and December 2021. Clinical records were subsequently reviewed to select eligible patients for analysis.
Eligibility criteria included: (1) patients with histologically confirmed gallbladder cancer; (2) absence of prior radiotherapy, chemoradiotherapy, or chemotherapy before surgery; and (3) a primary tumor with detectable uptake on PET/CT imaging. Exclusion criteria were: (1) presence of other concurrent malignancies; (2) a small primary tumor making texture analysis difficult (volume of interest [VOI] <64 voxels; Discovery 600M scanner: less than 2.04 ml or Discovery MI scanner: less than 1.20 ml); and (3) incomplete follow-up records.
The flowchart in Fig. 1 illustrates the patient selection process for the study. The training cohort was drawn from 61 patients suspected of or diagnosed with primary gallbladder cancer who underwent a pre-treatment [18F]-FDG-PET/CT scan between January 2011 and January 2018. Among them, 12 without gallbladder cancer who showed physiological [18F]-FDG uptake and 11 with other conditions (9 cholecystitis, 2 adenomyosis) were excluded. In addition, three patients with non-avid [18F]-FDG lesions, four receiving best supportive care, and four lacking follow-up data were excluded. Finally, 27 patients (12 men and 15 women; mean age: 69 ± 10 years, range: 51–92 years) met the criteria for the training cohort, on which the ML models were developed.
Fig. 1 Flowchart of the study patient selection steps
The testing cohort was reserved for external testing to estimate the final predictive performance.
The independent external test cohort was drawn from 43 patients suspected of or diagnosed with primary gallbladder cancer who underwent a pre-treatment [18F]-FDG-PET/CT scan between February 2018 and December 2021. Among them, nine without gallbladder cancer who showed physiological [18F]-FDG uptake and four with other conditions (2 cholecystitis, 1 adenomyosis, 1 metastatic gallbladder cancer) were excluded. In addition, three with coexisting malignancies (2 rectal cancers, 1 sarcoma) and two lacking follow-up data were excluded. Finally, 25 patients (12 men and 13 women; mean age: 72 ± 9 years, range: 56–90 years) were assigned to the independent external test cohort.
Imaging protocols
The PET/CT scans were conducted using two different whole-body PET/CT systems. From January 2011 to January 2018, the Discovery 600M PET/CT scanner (GE Healthcare, Milwaukee, WI, USA) was utilized, and from February 2018 to December 2021, the Discovery MI scanner (GE Healthcare) was employed. Patients fasted for at least 5 h before the scan (mean plasma glucose level ± SD: 110 ± 22 mg/dL, range: 83–169 mg/dL). Intravenous administration of [18F]-FDG (FDG Scan; Nihon Medi-Physics, Tokyo, Japan) was performed. A PET/CT emission scan was conducted 1 h after injection of [18F]-FDG (mean ± SD: 222 ± 33 MBq, range: 159–276 MBq), following the CT data acquisition (parameters: 3.75 mm slice thickness, 1.375 mm pitch, 120 kVp, and auto mA adjustment between 40 and 100 mA based on body mass). The acquisition time was 2.5 min per bed position, totaling 7–11 positions. Attenuation-corrected data were collected. Images from the Discovery 600M scanner were reconstructed using a 3D ordered subset expectation-maximization algorithm (VUE Point Plus), with an image matrix of 192 × 192, 16 subsets, and two iterations, producing voxel sizes of 3.125 × 3.125 × 3.27 mm³. For the Discovery MI scanner, images were reconstructed using time of flight (TOF) with a Bayesian penalized likelihood algorithm (Q.Clear), featuring a matrix size of 192 × 192, voxel sizes of 2.60 × 2.60 × 2.78 mm³, and a penalization factor of 700, incorporating point spread function modeling. Consistent reconstruction settings and matrix were applied for each scanner.
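The scanner-specific volume limits quoted in the exclusion criteria follow from the 64-voxel minimum and the reconstructed voxel sizes given above. The short sketch below reproduces that arithmetic; the function and variable names are illustrative and not taken from the study.

```python
# Voxel dimensions in mm, as reported for each scanner's reconstruction.
VOXEL_MM = {
    "Discovery 600M": (3.125, 3.125, 3.27),
    "Discovery MI": (2.60, 2.60, 2.78),
}
MIN_VOXELS = 64  # minimum VOI size required for texture analysis


def min_voi_volume_ml(voxel_mm, n_voxels=MIN_VOXELS):
    """Return the volume in ml occupied by n_voxels voxels of the given size."""
    dx, dy, dz = voxel_mm
    return dx * dy * dz * n_voxels / 1000.0  # 1 ml = 1000 mm^3


volumes = {name: round(min_voi_volume_ml(size), 2) for name, size in VOXEL_MM.items()}
print(volumes)
```

The results, 2.04 ml and 1.20 ml, match the thresholds stated for the two scanners.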
Image and radiomic feature analyses
Two radiologists, with 12 and 19 years of experience in [18F]-FDG-PET/CT scans, respectively, were informed of the study’s purpose but were blinded to clinical and pathological details. They reached a consensus on whether the primary lesion exhibited abnormal [18F]-FDG uptake, defined as uptake exceeding the background activity of surrounding tissues. A third radiologist, with 17 years of experience in [18F]-FDG-PET/CT, conducted quantitative analysis on the primary visible lesions. This radiologist manually set the volume of interest (VOI) on a reference-fused axial image, determining the craniocaudal and mediolateral boundaries to include the entire visible lesion while excluding any nearby physiological [18F]-FDG-avid tissues. The VOI boundaries were set using a threshold of 40% of the maximum standardized uptake value (SUVmax). The LIFEx software (version 7.2) [22] was employed to derive 49 radiomic features from the PET images (Supplemental Table 1). This software requires VOIs to contain a minimum of 64 voxels to compute textural features. To mitigate correlations between features and reduce noise effects as well as matrix size, both the VOI and SUV values were resampled using an absolute resampling approach [23]. For the PET intensity resampling, a total of 64 bins were created, covering an SUV interval from 0 to 20, resulting in a bin size of approximately 0.3 SUV. Any voxel with an SUV higher than 20 was assigned to the uppermost bin [23]. Voxel dimensions were standardized to 3.0 × 3.0 × 3.0 mm³. Since two PET scanners were utilized, we harmonized the PET parameters post-reconstruction using the ComBat method within R software (https://github.com/Jfortin1/ComBatHarmonization) [24], which has been validated in previous PET research [25].
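The 40% SUVmax threshold and the absolute intensity resampling can be illustrated with a minimal sketch. It is not the LIFEx implementation, whose internal bin-edge conventions are not described here; the helper names and the sample SUVs are hypothetical.

```python
import math


def threshold_voi(suv_values, fraction=0.40):
    """Flag voxels whose SUV is at least `fraction` of SUVmax."""
    cutoff = fraction * max(suv_values)
    return [v >= cutoff for v in suv_values]


def discretize_absolute(suv_values, n_bins=64, suv_min=0.0, suv_max=20.0):
    """Map each SUV to a fixed-width bin index in 0..n_bins-1."""
    bin_width = (suv_max - suv_min) / n_bins  # 20 / 64 = 0.3125 SUV
    bins = []
    for v in suv_values:
        idx = math.floor((v - suv_min) / bin_width)
        # Values above suv_max fall into the uppermost bin.
        bins.append(min(max(idx, 0), n_bins - 1))
    return bins


suv = [1.0, 3.0, 5.0, 12.0, 25.0]  # hypothetical voxel SUVs
mask = threshold_voi(suv)          # cutoff is 40% of 25.0 = 10.0
bins = discretize_absolute(suv)
print(mask, bins)
```

With 64 bins over 0 to 20, the exact bin width is 0.3125 SUV, which rounds to the 0.3 SUV reported above.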
Treatment and follow-up
For each patient, staging of the tumor was conducted using the TNM classification system (7th edition) by the UICC, based on standard pre-treatment clinical examinations and imaging studies, including [18F]-FDG-PET/CT. The results from these assessments guided the treatment strategy (Supplemental Material).
The prognosis of each patient was determined using medical records, with the last follow-up recorded in December 2023. Progression-free survival (PFS) was measured from the start of treatment to disease progression, death from cancer, or the date of the last follow-up, whichever was earlier.
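The PFS definition above translates into a duration and an event indicator for each patient, which is the input format survival models expect. The following is a sketch under that reading; the function name, argument names, and dates are hypothetical.

```python
from datetime import date


def pfs(start, last_follow_up, progression=None, cancer_death=None):
    """Return (days, event): event is 1 if progression or cancer death
    occurred on or before the last follow-up, otherwise 0 (censored)."""
    event_dates = [d for d in (progression, cancer_death) if d is not None]
    end = min(event_dates + [last_follow_up])
    event = int(bool(event_dates) and min(event_dates) <= last_follow_up)
    return (end - start).days, event


# Hypothetical patient with documented progression.
progressed = pfs(date(2019, 1, 10), date(2023, 12, 1), progression=date(2019, 7, 10))
# Hypothetical patient without an event, censored at last follow-up.
censored = pfs(date(2020, 3, 1), date(2020, 3, 31))
print(progressed, censored)
```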
ML approach
The workflow of this study, depicted in Fig. 2, utilized eight clinical factors (T stage, N stage, M stage, UICC stage, histology, tumor size and two biomarkers [carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA19-9)]) and 49 radiomic features to assess PFS with ML approaches.
Fig. 2 Workflow of the present study. Radiomic features were extracted from the volume of interest segmented on PET/CT images. ComBat harmonization was then applied to each radiomic feature because two different PET scanners were used. Finally, two ML models (an all-features model and a selected-features model) were developed to predict PFS in patients with gallbladder cancer using clinical features and PET/CT radiomic features
In the training cohort, the data were stratified by event status and randomly split into training (70%) and validation (30%) sets. ML models were employed to analyze time-to-event data, specifically using the linear Cox proportional hazards (CPH) model and the nonlinear random survival forest (RSF) approach, with detailed settings available in the Supplemental Material.
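An event-stratified 70/30 split can be sketched as follows. The tool actually used for the split is not stated, so this standard-library version is only an illustration, and the event counts in the example are hypothetical.

```python
import random


def stratified_split(events, train_fraction=0.7, seed=0):
    """Split sample indices into training and validation sets so that
    each event class is divided in roughly the same proportion."""
    rng = random.Random(seed)
    train, valid = [], []
    for label in sorted(set(events)):
        idx = [i for i, e in enumerate(events) if e == label]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)


# Hypothetical event flags for a 27-patient cohort (1 = progression).
events = [1] * 17 + [0] * 10
train_idx, valid_idx = stratified_split(events)
print(len(train_idx), len(valid_idx))
```

Splitting within each event class keeps the proportion of progression events similar in both subsets, which matters when the cohort is this small.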
With a small sample size, feature reduction was necessary to minimize overfitting. In the training cohort, three feature selection methods were applied: univariate statistical feature selection based on the Mann–Whitney U test [26], least absolute shrinkage and selection operator (LASSO) Cox regression [27], and recursive feature elimination (RFE) [28]. The Mann–Whitney U test, a non-parametric method, assesses differences between two independent groups to identify features with significantly different distributions [26]. The LASSO method shrinks the coefficients of variables that are not associated with survival, effectively reducing them to zero [29]. It identifies critical features by suppressing the influence of unimportant ones, removing redundancy, and selecting variables with non-zero coefficients for model development [27]. RFE iteratively evaluates and discards features to determine those that best enhance model accuracy [28]. Both full-feature sets and selected sets were used for ML model construction, adhering to the principle of using fewer than 10% of the sample size as features for classification [30]. The study’s total sample size was n = 52 (27 training and 25 test patients), leading us to select fewer than five features for the ML model. Feature importance scores were used to evaluate the contribution of each feature to ML model development [31]. In addition, fivefold cross-validation was applied to reduce overfitting effects [32, 33, 34].
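Univariate filtering with the Mann–Whitney U test can be sketched as below. This illustration uses the normal approximation without tie or continuity correction, which is a simplification and not necessarily what the study's software computed; the feature names and values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value) using the normal
    approximation, without tie or continuity correction."""
    n1, n2 = len(x), len(y)
    # U counts pairs where the x value exceeds the y value; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Hypothetical feature values: (patients with progression, patients without).
features = {
    "feature_a": ([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]),
    "feature_b": ([1.0, 4.0, 5.0, 8.0], [2.0, 3.0, 6.0, 7.0]),
}
selected = [name for name, (x, y) in features.items()
            if mann_whitney_u(x, y)[1] < 0.05]
print(selected)
```

Only the feature whose two groups are clearly separated passes the p < 0.05 filter. For groups this small an exact test would normally be preferred over the normal approximation.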
The predictive power of the ML models was evaluated using the concordance index (C-index), which determines how well the predicted event times correspond with actual patient outcomes. A C-index of 0.5 reflects random chance, while a value of 1 represents perfect accuracy [35, 36]. Typically, a C-index over 0.8 indicates strong predictive performance [35, 36].
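For illustration, Harrell's C-index can be computed directly from its definition as the fraction of comparable patient pairs that the model ranks correctly. This is a minimal sketch, not the PySurvival implementation; the data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: among comparable pairs, the fraction in which the
    patient with the earlier observed event has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count one half
    return concordant / comparable


times = [5, 10, 15, 20]   # hypothetical PFS in months
events = [1, 1, 0, 1]     # 1 = progression observed, 0 = censored
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
print(perfect, uninformative)
```

A model that assigns every patient the same risk scores 0.5, which is the random-chance reference mentioned above.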
The performance of the predictive models was assessed in the testing cohort using root mean squared error and mean/median absolute errors. These metrics compared actual versus predicted progression events [37, 38]. Within the testing cohort, each ML algorithm computed the PFS probability. The median survival time, representing a 50% PFS probability, was utilized to evaluate the precision of survival predictions, where closer proximity to the actual survival time indicated better performance [38]. The PySurvival library was employed for conducting this survival ML analysis [39]. In addition to PySurvival, we used scikit-learn for feature selection and data scaling.
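The error metrics and the median survival time can be sketched with the standard library alone. The function names and numbers below are hypothetical, and PySurvival's own utilities may define these quantities with slightly different conventions.

```python
import math
from statistics import mean, median


def survival_time_errors(actual, predicted):
    """Root mean squared, mean absolute, and median absolute errors."""
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    return {
        "rmse": math.sqrt(mean([e * e for e in errors])),
        "mae": mean(abs_errors),
        "median_ae": median(abs_errors),
    }


def median_survival_time(times, survival_probs):
    """First time point at which the predicted PFS probability is <= 0.5."""
    for t, s in zip(times, survival_probs):
        if s <= 0.5:
            return t
    return None  # the curve never drops to 50%


# Hypothetical actual and predicted PFS times in months.
metrics = survival_time_errors([10, 20, 30, 40], [12, 17, 30, 45])
t_median = median_survival_time([3, 6, 9, 12], [0.9, 0.7, 0.5, 0.2])
print(metrics, t_median)
```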
Statistical analysis
The Mann–Whitney U test was used to compare continuous variables and the Chi-square test to compare categorical data. Kaplan–Meier survival curves were drawn, with log-rank tests used to determine significance. Data were presented as medians with interquartile range (IQR). Statistical significance was established at p < 0.05, with two-sided p values considered. Analyses were performed using MedCalc software (MedCalc, Mariakerke, Belgium).
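The Kaplan–Meier estimate behind such survival curves is a running product over the observed event times. The sketch below shows the product-limit calculation on hypothetical data; it does not reproduce the MedCalc analysis.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time, where S(t) is the
    product-limit estimate of the survival probability."""
    curve, surv = [], 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        surv *= 1 - n_events / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical PFS times in months (event flag 0 = censored).
km_curve = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 0])
print(km_curve)
```

Censored patients leave the risk set without lowering the curve, which is why the estimate falls only at times 2, 4, and 6 in this example.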