Machine learning approach using 18F-FDG-PET-radiomic features and the visibility of right ventricle 18F-FDG uptake for predicting clinical events in patients with cardiac sarcoidosis

Patients

This retrospective study was approved by the institutional review board, and the need for informed consent was waived. In total, 70 consecutive patients with known or suspected CS underwent pretreatment 18F-FDG-PET/CT scans from April 2012 to December 2022. Their clinical records were reviewed to identify eligible patients.

In a previous study [15], the usefulness of Patlak Ki images extracted from dynamic 18F-FDG-PET/CT scans for evaluating the risk of clinical events in CS was examined. That study enrolled 21 patients with CS who underwent 30 18F-FDG-PET/CT scans, including pretreatment, during-treatment, and follow-up scans, between April 2019 and January 2020. However, analyses using ML approaches for predicting the risk of ACEs in patients with CS from pretreatment 18F-FDG-PET-based radiomic features were not performed. Thus, of those 21 patients, the 8 with pretreatment 18F-FDG-PET/CT scans were included in the current study. The inclusion criteria were as follows: (1) patients diagnosed with CS according to the Japanese Society of Sarcoidosis and Other Granulomatous Disorders guidelines [16], (2) those without a history of steroid treatment, and (3) those with visible cardiac 18F-FDG uptake on PET/CT scan. Patients with a history or coexistence of other cardiac disorders were excluded.

Of the 70 patients, 12 without cardiac 18F-FDG uptake were excluded. Among the remaining 58 patients, 11 were further excluded because of hypertrophic cardiomyopathy (n = 2), dilated cardiomyopathy (n = 2), ventricular aneurysm (n = 1), or lack of CS evidence (n = 6).

Finally, 47 patients (38 women and 9 men; mean age: 61 ± 10 [range: 39–81] years) were eligible for the analyses. Immunosuppressive treatment was started in these patients after the pretreatment 18F-FDG-PET/CT scan according to the recommendations of the Japanese Society of Sarcoidosis and Other Granulomatous Disorders guidelines [16]. The loading dose of prednisolone was 30 mg/day, which was tapered to a maintenance dose and administered to all patients during the follow-up period.

Imaging protocols

All patients were instructed to follow a high-fat, low-carbohydrate diet for 1 day, followed by a fast of at least 18 h before the 18F-FDG-PET/CT scan. This resulted in a mean plasma glucose level of 102 (range: 71–154) mg/dL immediately before intravenous 18F-FDG administration.

All 18F-FDG-PET/CT scans were performed using two whole-body PET/CT scanners. The Discovery 600M PET/CT scanner (GE Healthcare, Milwaukee, WI, USA) was used from April 2012 to January 2018 and the Discovery MI scanner (GE Healthcare) from February 2018 to December 2022. The emission scan was performed 1 h after the administration of 18F-FDG (mean: 223 ± 30 [range: 155–277] MBq), following CT data acquisition (slice thickness: 3.75 mm, pitch: 1.375, 120 kV, auto mA: 40–100 mA based on body mass, and reconstructed matrix size: 512 × 512). The acquisition time was 2.5 min per bed position (7–11 bed positions in total). Attenuation-corrected data were acquired. With the Discovery 600M scanner, images were reconstructed with a three-dimensional ordered subset expectation–maximization algorithm (image matrix size: 192 × 192, 16 subsets, two iterations, voxel size: 3.125 × 3.125 × 3.27 mm3, and VUE Point Plus). With the Discovery MI scanner, a Bayesian penalized likelihood reconstruction algorithm was used (image matrix size: 192 × 192, voxel size: 2.60 × 2.60 × 2.78 mm3, penalization factor: 700, and Q.Clear) with the point spread function. The reconstruction settings and matrix were kept consistent for each scanner.

Image and radiomic feature analyses

Two radiologists (with 12 and 20 years of 18F-FDG-PET/CT experience) who were aware of the study purpose but blinded to the clinical information read the 18F-FDG-PET/CT images. The radiologists visually assessed 18F-FDG uptake in the left ventricle (LV) and right ventricle (RV) myocardium on each scan as negative (myocardial visibility lower than or similar to that of the liver) or positive (myocardial visibility higher than that of the liver) [17]. Disagreements were resolved by consensus.

A third radiologist (18 years of 18F-FDG-PET/CT experience) performed quantitative analyses of the visible myocardial lesions. The third radiologist generated the volume of interest (VOI) by manually placing a region of interest on a suitable reference-fused axial image and defining the craniocaudal and mediolateral extents so that they encompassed the whole positive myocardial lesion while excluding any avid extracardiac structures. Next, a threshold of 40% of the maximum standardized uptake value (SUVmax), which was commonly used in previous studies [18], was applied to automatically delineate a VOI containing the voxels with an SUV equal to or greater than 40% of SUVmax. The LIFEx package (version 6.00) [19] was used to extract 49 radiomic features from PET images (Supplemental Table 1). LIFEx calculates textural features only for VOIs containing at least 64 voxels. These 49 radiomic features belonged to five categories (shape and first-order characteristics, gray level co-occurrence matrix, neighborhood gray-tone difference matrix [NGTDM], gray level run length matrix [GLRLM], and gray level zone length matrix). The SUVs within the VOI were resampled into discrete bins using absolute resampling to minimize the correlation between textural features and reduce the impact of noise and matrix size [20]. Sixty-four bins were used for the PET component, with the minimum and maximum bounds of the resampling interval set to SUVs of 0 and 20, respectively. This corresponds to a bin width of approximately 0.3 SUV. Voxels with an SUV of > 20 were grouped in the highest bin [20]. Moreover, the voxel size was resampled to 3.0 × 3.0 × 3.0 mm3.
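The thresholding and absolute resampling steps described above can be illustrated with a minimal sketch. It is not the LIFEx implementation: the function names `threshold_mask` and `discretize_absolute` are hypothetical, the voxel values are made up, and the exact binning formula used by LIFEx may differ in detail. Note that 20 / 64 gives a bin width of 0.3125 SUV, which the text rounds to 0.3.

```python
def threshold_mask(suv_values, fraction=0.40):
    """Mark voxels whose SUV is at or above fraction * SUVmax (hypothetical helper)."""
    cutoff = fraction * max(suv_values)
    return [v >= cutoff for v in suv_values]


def discretize_absolute(suv_values, lower=0.0, upper=20.0, n_bins=64):
    """Map each SUV to a 1-based bin index using fixed (absolute) bounds.

    Values above the upper bound are placed in the highest bin, and values
    below the lower bound in the lowest bin.
    """
    bin_width = (upper - lower) / n_bins  # 20 / 64 = 0.3125 SUV per bin
    bins = []
    for v in suv_values:
        idx = int((v - lower) / bin_width) + 1
        bins.append(min(max(idx, 1), n_bins))
    return bins


# Illustrative voxel SUVs (not study data).
voxels = [1.0, 3.0, 4.0, 10.0, 25.0]
mask = threshold_mask(voxels)       # cutoff is 40% of 25.0, i.e. 10.0
bins = discretize_absolute(voxels)  # the SUV of 25.0 falls in bin 64
```

Because the bounds are fixed rather than derived from each lesion, the same SUV always maps to the same bin across patients, which is the point of absolute resampling.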

As two different PET scanners were used, post-reconstruction harmonization was performed for all PET parameters using the ComBat harmonization method implemented in R (https://github.com/Jfortin1/ComBatHarmonization) [21], which has been reported to be effective for PET data [22].
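To give an intuition for what batch harmonization does, the sketch below applies a plain location and scale adjustment to one feature measured on two scanners. This is a deliberately simplified stand-in and not the ComBat method itself, which additionally estimates batch effects with empirical Bayes shrinkage and can preserve covariates of interest. The function name `harmonize_location_scale` and the feature values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def harmonize_location_scale(values, batches):
    """Align each batch's mean and SD to pooled values (simplified, not full ComBat)."""
    grand_mean = mean(values)
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    # Pooled SD: square root of the size-weighted mean of within-batch variances.
    pooled_var = sum(len(g) * pstdev(g) ** 2 for g in groups.values()) / len(values)
    pooled_sd = sqrt(pooled_var)
    stats = {b: (mean(g), pstdev(g)) for b, g in groups.items()}
    harmonized = []
    for v, b in zip(values, batches):
        batch_mean, batch_sd = stats[b]
        # Standardize within the batch, then rescale to the pooled distribution.
        z = (v - batch_mean) / batch_sd if batch_sd > 0 else 0.0
        harmonized.append(grand_mean + pooled_sd * z)
    return harmonized


# Illustrative feature values from two scanners (not study data).
feature = [1.0, 2.0, 3.0, 11.0, 13.0, 15.0]
scanner = ["600M", "600M", "600M", "MI", "MI", "MI"]
adjusted = harmonize_location_scale(feature, scanner)
```

After the adjustment, both scanner groups share the same mean and standard deviation for this feature, so scanner-related offsets no longer dominate the comparison.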

Confirmation of ACEs

Echocardiography was performed within 2 months of the 18F-FDG-PET/CT scan (mean ± standard deviation: 13 ± 14 days [range: −50 to +58 days]). The echocardiography report was used as the reference standard for cardiac function. Cardiac dysfunction was defined as an LV ejection fraction (LVEF) of < 50% [23]. Further, twelve-lead or Holter electrocardiography was performed within 2 months of the 18F-FDG-PET/CT scan (mean ± standard deviation: 17 ± 15 days [range: −50 to +58 days]). Moreover, patients were assessed for the presence of arrhythmic events, including sustained VT and AVB. AVB was defined as second- or third-degree AVB or trifascicular block [23, 24].

Medical records were used to obtain information on patient prognosis. The last follow-up was conducted in December 2023. An ACE was defined as any of the following: a reduction in LVEF with cardiac dysfunction (LVEF of < 50%); hospitalization due to heart failure or cardiac arrhythmia, such as recurrence or onset of sustained VT or AVB; or death [25, 26]. The change in LVEF was determined by comparing the echocardiography study performed nearest to the pretreatment PET study with the last echocardiography study of the follow-up period. A decrease in LVEF was defined as a negative change in LVEF.

ML approach

We used the 49 radiomic features and the visibility of RV 18F-FDG uptake to predict ACEs with ML approaches. Data were stratified according to event status and randomly assigned to the training (80%) and testing (20%) cohorts. Seven popular ML algorithms were used for binary classification of ACEs: decision tree, random forest (RF), neural network, k-nearest neighbors (kNN), Naïve Bayes, logistic regression (LR), and support vector machine (SVM) [27, 28].
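A stratified random split of this kind can be sketched as follows. This is an illustrative implementation and not the Orange workflow used in the study; the function name `stratified_split`, the seed, and the number of events in the example are assumptions made only for demonstration.

```python
import random


def stratified_split(labels, test_fraction=0.20, seed=0):
    """Split sample indices into train/test while preserving the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Take the same fraction from every class for the testing cohort.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 47 patients, 15 with an event (1) and 32 without (0).
labels = [1] * 15 + [0] * 32
train, test = stratified_split(labels)
```

Splitting within each class separately keeps the proportion of events similar in both cohorts, which matters when events are relatively rare and the sample is small.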

The parameters for each ML method were selected based on the specific clinical challenges and the characteristics of our dataset. For the decision tree, we limited the node levels and split thresholds to prevent overfitting: we induced a binary tree with a minimum of two instances in leaves, did not split subsets smaller than 5, limited the tree depth to a maximum of 100 node levels, and stopped splitting nodes once the majority class reached 95%. For the RF, a moderate number of trees was chosen to balance the model's generalizability and computational efficiency: we used 10 trees and did not split subsets smaller than 5. The neural network used the rectified linear unit (ReLU) activation function and the Adam optimizer for efficient learning and good convergence, with 1000 neurons, alpha = 0.00001, and a maximum of 1000 iterations. For kNN, the number of neighbors was set to 5 with the Euclidean metric and uniform weights, which provided suitable accuracy for our dataset size. The parameters for LR and SVM were chosen to balance model complexity against the risk of overfitting. For LR, we selected ridge regularization with a strength of 1. For SVM, we selected the radial basis function kernel with a cost of 1 and a regression loss epsilon of 0.10; the two optimization parameters, tolerance and iteration limit, were set to 0.0010 and 500, respectively. Naïve Bayes was included for its simplicity and its ability to learn effectively from the distribution of the data. These parameter choices were intended to yield robust and reliable predictive models aligned with the objectives of our study.

To address class imbalance, the synthetic minority over-sampling technique was applied to the training cohort [29]. Because the sample size of this study was small, the feature set was reduced to limit overfitting. A ranking-based method, applied only to the training cohort, was used to reduce the feature set based on the decrease in Gini impurity. As a rule of thumb, the number of features used for a classification problem should be < 10% of the sample size [30]. The final sample size of this study was n = 47; thus, we selected the 4 top-ranking features for constructing each ML model. Moreover, the resampling technique known as k-fold cross-validation is one of the remedies for overfitting [31, 32]. Ten folds are a common choice for k-fold cross-validation, particularly if the dataset is not extremely large or small [32]. In this study, tenfold cross-validation was used on the training cohort to minimize the negative influence of overfitting.
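The core idea of the synthetic minority over-sampling technique is to create new minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbors. The sketch below is a minimal illustration of that idea and is not the implementation used in the study; the function name `smote`, the neighbor count, the seed, and the example points are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Illustrative minority-class feature vectors (not study data).
events = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(events, n_new=6, k=2)
```

Over-sampling is applied to the training cohort only, so the testing cohort keeps its original class distribution and contains no synthetic patients.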

Receiver operating characteristic (ROC) curve analysis was performed to compare the predictive performances of the models, and the area under the ROC curve (AUC) was calculated. The computed performance measures were AUC, accuracy, F1 score, precision (positive predictive value), and recall (sensitivity), averaged over classes. The F1 score (F score or F measure) is the harmonic mean of precision and recall [33]. Each ML algorithm was used to calculate a probability score (range: 0–1) for ACEs. The predictive performance of each ML model was independently estimated in the testing set by quantifying the AUC, accuracy, F1 score, precision, and recall.
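The performance measures named above can be computed from predicted labels and probability scores as in the following sketch. For brevity it reports precision, recall, and F1 for the positive (event) class only rather than averaging over classes, and it computes the AUC as the probability that a randomly chosen event receives a higher score than a randomly chosen non-event. The function names, the 0.5 decision threshold, and the example data are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_from_scores(y_true, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative testing cohort (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
```

The AUC depends only on how the scores rank the patients, whereas accuracy, precision, recall, and F1 depend on the chosen decision threshold.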

The diagnostic indices, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were also calculated for the testing cohort. The importance of each feature in the ML modeling process was calculated using the decrease in AUC [34]. A larger decrease in AUC for a feature indicates that the feature has a higher importance [34].
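One common way to measure importance as a decrease in AUC is permutation importance: the values of one feature are shuffled across patients, the model is re-scored, and the resulting drop in AUC is averaged over several shuffles. The sketch below assumes that this permutation-style procedure is what is meant; the function names, the number of repeats, the toy model, and the data are all hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean decrease in AUC after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that uses only the first feature (illustrative, not study data).
X = [[0.9, 0.1], [0.8, 0.5], [0.7, 0.3], [0.6, 0.9],
     [0.4, 0.2], [0.3, 0.8], [0.2, 0.4], [0.1, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
importances = permutation_importance(lambda row: row[0], X, y)
```

In this toy case the second feature has an importance of exactly zero because the model ignores it, while shuffling the first feature degrades the AUC.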

The ML analysis was performed using Orange version 3.24.1 (Bioinformatics Laboratory, University of Ljubljana, Ljubljana, Slovenia), an open-source data-mining and visualization package [35].

Statistical analysis

The Mann–Whitney U test was used to compare quantitative variables between two groups, and the Chi-square test was used to compare categorical data. The DeLong method was used to analyze the statistical significance of differences between AUCs [36]. The diagnostic indices, including sensitivity, specificity, PPV, NPV, and accuracy, were compared using McNemar's test or the Chi-square test.
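McNemar's test compares two classifiers evaluated on the same patients by looking only at the discordant pairs, that is, the cases that one model classified correctly and the other did not. The sketch below shows the exact (binomial) form of the test; whether the exact or the chi-square form was used in the study is not stated, so this variant is an assumption, as are the function name and the example counts.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the two discordant-pair counts.

    b: cases correct with model A only; c: cases correct with model B only.
    Under the null hypothesis, each discordant pair is equally likely to favor
    either model, so min(b, c) follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative discordant counts (not study data).
p_value = mcnemar_exact(1, 5)
```

With 1 and 5 discordant pairs, the p value is 14/64, which is about 0.22 and would not be considered significant at the 0.05 level used in this study.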

Data were presented as medians and interquartile ranges (IQRs). A p value of < 0.05 was considered statistically significant, and all p values were two-tailed. The MedCalc statistical software (MedCalc Software Ltd., Acacialaan 22, 8400 Ostend, Belgium) was used for statistical analyses.
