We included 3629 unique adult patients from three sites in this multi-site study. At Site 1, MRI examinations were acquired from 2076 patients with a mean ± standard deviation (SD) age of 53.8 ± 13.9 years, of whom 1076 (51.8%) were female. At Site 2, MRI examinations were acquired from 1226 patients with a mean ± SD age of 51.5 ± 15.4 years, of whom 563 (50.4%) were female. At Site 3, MRI examinations were acquired from 327 patients with a mean ± SD age of 54.9 ± 13.8 years, of whom 174 (53.2%) were female. Additional demographic information is listed in Table 1. No significant difference in age (p = 0.23) or sex (p = 0.47) was found among the three sites. A total of 3857 examinations were retrieved: 2304 from Site 1, 1226 from Site 2, and 327 from Site 3.
Table 1 Detailed demographics of the cohort from three study sites
Effectiveness of ComBat Harmonization on Radiomic Features
Table 2 shows the number of features with significant differences and Cohen’s F scores for radiomic features before and after ComBat harmonization in all 6 experiments. Of 172 radiomic features, 76.7% (Siemens), 52.9% (GE), and 26.7% (Philips) were significantly different between 1.5 T and 3 T in Experiments 1–3. Among three manufacturers at 1.5 T, 155/172 (90.1%) radiomic features were significantly different in Experiment 4. Among three manufacturers at 3 T, 129/172 (75.0%) radiomic features were significantly different in Experiment 5. In Experiment 6, 117/172 (68.0%) radiomic features were significantly different between systems from different manufacturers and field strengths. After ComBat harmonization, no significant difference was observed in radiomic features among manufacturers or field strengths. In all (6/6) experiments, Cohen’s F scores for radiomic features were reduced significantly after ComBat harmonization.
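As a rough illustration of the effect size reported above, the following is a minimal sketch that assumes the "Cohen’s F score" is the usual one-way effect size f = sqrt(η² / (1 − η²)), with η² the ratio of between-group to total sum of squares. The function name and the toy values are hypothetical and are not taken from the study. It uses only the Python standard library.

```python
from statistics import fmean

def cohens_f(groups):
    """Cohen's f for a one-way layout: sqrt(eta^2 / (1 - eta^2)).

    `groups` is a list of lists, one list of feature values per batch
    (e.g., per field strength or per manufacturer).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Total sum of squares around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    eta_sq = ss_between / ss_total
    # Undefined when eta_sq == 1 (no within-group variance at all).
    return (eta_sq / (1.0 - eta_sq)) ** 0.5

# Hypothetical values of one feature on two batches (not study data).
before = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # batches clearly separated
after = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]    # batches nearly overlapping

f_before = cohens_f(before)
f_after = cohens_f(after)
```

A smaller f after harmonization means that less of the feature's variance is explained by batch membership, which is the direction of change described in the text.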
Table 2 The number of features with significant differences and Cohen’s F score for radiomic features before and after ComBat harmonization
To visualize the batch effects before and after ComBat harmonization, we generated heatmaps of z-score normalized radiomic features from all MRI exams in Experiment 6 (Fig. 2). The heatmap of original radiomic features shows distinct color variations across study sites (manufacturer, field strength), indicating that batch effects exist in radiomic features (Fig. 2A). In contrast, the heatmap of ComBat harmonized features exhibits uniform variability across patients from different groups (Fig. 2B). This suggests that the harmonization process was successful in mitigating batch effects.
Fig. 2 Heatmaps of radiomic features of all 3857 MRI exams in Experiment 6 before and after harmonization. A Original radiomic features and B ComBat harmonized radiomic features. Each row corresponds to an individual T2W MRI exam, while each column corresponds to an individual radiomic feature. Z-score transformation was applied to normalize individual radiomic features. MRI exams are grouped based on study site (manufacturer, field strength)
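The data preparation behind such a heatmap can be sketched as follows, assuming each feature (column) is z-scored across all exams and the rows are then ordered by batch label. The batch labels and values are hypothetical, and the plotting step is omitted. Only the Python standard library is used.

```python
from statistics import fmean, pstdev

def zscore_columns(rows):
    """Z-score each feature (column) across all exams (rows)."""
    columns = list(zip(*rows))
    means = [fmean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    # A constant column has zero SD; map it to 0.0 instead of dividing by zero.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]

# Hypothetical exams: (batch label, feature vector). Not study data.
exams = [
    ("Site 2", [2.0, 30.0]),
    ("Site 1", [1.0, 10.0]),
    ("Site 1", [3.0, 20.0]),
]
exams.sort(key=lambda e: e[0])        # stable sort groups rows by batch
labels = [batch for batch, _ in exams]
z = zscore_columns([features for _, features in exams])
```

With rows grouped this way, a batch effect appears as horizontal bands of similar color, whereas harmonized features should look comparable across the groups.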
We also analyzed liver and spleen radiomic features separately in all six experiments (Tables 3 and 4). Between field strengths, the numbers of significantly different liver radiomic features were 84/86 (97.7%) for Siemens, 33/86 (38.4%) for GE, and 31/86 (36.0%) for Philips. No significantly different liver radiomic features were found after ComBat harmonization. The numbers of significantly different spleen radiomic features were 48/86 (55.8%) for Siemens, 58/86 (67.4%) for GE, and 15/86 (17.4%) for Philips. Similarly, no significantly different spleen radiomic features were found after ComBat harmonization.
Table 3 The number of features with significant differences and Cohen’s F score for liver radiomic features before and after ComBat harmonization
Table 4 The number of features with significant differences and Cohen’s F score for spleen radiomic features before and after ComBat harmonization
For 1.5 T, 80/86 (93.0%) liver radiomic features were significantly different among the three manufacturers, while for 3 T, 67/86 (77.9%) liver radiomic features were significantly different. Meanwhile, for 1.5 T, 75/86 (87.2%) spleen radiomic features were significantly different among the three manufacturers, while for 3 T, 62/86 (72.1%) spleen radiomic features were significantly different. In Experiment 6, 57/86 (66.3%) and 60/86 (69.8%) radiomic features were significantly different for the liver and spleen, respectively. No liver or spleen radiomic features were significantly different after ComBat harmonization. Cohen’s F scores for both liver and spleen radiomic features decreased significantly after ComBat harmonization.
We visualized the distributions of several example radiomic features before and after ComBat harmonization using kernel density plots. Figure 3A illustrates the distributions of intensity-distance matrix non-uniformity (IDMN) of the liver at 1.5 T and 3 T. Visually, the distributions of the original IDMN differed distinctly between 1.5 T and 3 T systems at Site 1 (Siemens) and Site 3 (Philips). In contrast, harmonized IDMN exhibited very similar patterns between 1.5 T and 3 T across all sites. At Site 2 (GE), because the distributions of the original IDMN were already similar between 1.5 T and 3 T, ComBat made only a minimal adjustment to the distribution of IDMN. Figure 3B visualizes the distributions of the inverse variance (IV) feature of the liver across the three manufacturers. Comparing the original and harmonized features shows that ComBat eliminated the distribution differences among the three manufacturers.
Fig. 3 A Kernel density plots of intensity-distance matrix non-uniformity of the liver by field strength. The top row is for the original data, and the bottom row is for ComBat harmonized data. Columns are plots for Site 1 (Siemens), Site 2 (GE), and Site 3 (Philips), from left to right. B Kernel density plots of inverse variance of the liver by system manufacturer. The top row is for the original data, and the bottom row is for ComBat harmonized data. The left column is for data from 1.5 T systems, and the right column is for data from 3 T systems
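For readers unfamiliar with kernel density plots, the estimate behind each curve can be sketched as below. This assumes a Gaussian kernel and a normal-reference rule-of-thumb bandwidth (1.06 · SD · n^(−1/5)). The kernel and bandwidth actually used for the figures are not stated in the text, and the sample values are hypothetical. The sketch uses only the Python standard library. In practice a library routine such as `scipy.stats.gaussian_kde` would typically be used.

```python
import math
from statistics import stdev

def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the paper's choice).
        bandwidth = 1.06 * stdev(samples) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    # Sum one Gaussian bump per sample at every grid point.
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

# Hypothetical values of one feature within one batch (not study data).
samples = [1.0, 1.5, 2.0, 2.2, 3.1]
step = 0.05
grid = [-10.0 + step * i for i in range(601)]   # covers -10 to 20
density = gaussian_kde(samples, grid)
area = sum(density) * step                      # Riemann sum, close to 1
```

Overlaying one such curve per batch, before and after harmonization, gives the kind of comparison shown in the figure.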
The kernel density plots of the mean absolute deviation (MAD) of the spleen at 1.5 T and 3 T are shown in Fig. 4A. ComBat successfully harmonized the distributions of the MAD feature from Site 1 (Siemens) and Site 2 (GE) but was visibly less effective for Site 3 (Philips). This is likely because the MAD feature of the 3 T systems at Site 3 had a bimodal distribution (i.e., two prominent peaks) that was much more widely spread than that of the 1.5 T systems before harmonization, making the two distributions harder to align. The kernel density plots of the IV feature of the spleen among the three manufacturers are presented in Fig. 4B, where ComBat harmonized the distributions from Site 1 (Siemens), Site 2 (GE), and Site 3 (Philips). Visually, the distributions of the spleen IV feature were closely aligned among the manufacturers’ 1.5 T systems but only sub-optimally aligned on 3 T systems, even though no significant differences were detected among manufacturers.
Fig. 4 A Kernel density plots of mean absolute deviation of the spleen by field strength. The top row is for the original data, and the bottom row is for ComBat harmonized data. Columns are plots for Site 1 (Siemens), Site 2 (GE), and Site 3 (Philips), from left to right. B Kernel density plots of inverse variance of the spleen by system manufacturer. The top row is for the original data, and the bottom row is for ComBat harmonized data. The left column is for data from 1.5 T systems, and the right column is for data from 3 T systems
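One way to see why a bimodal batch is harder to align is that ComBat is, at its core, a per-batch location and scale adjustment. The sketch below is a deliberately simplified stand-in with no covariates and no empirical Bayes shrinkage, so it is not the ComBat implementation used in the study. It shows that shifting and rescaling a batch matches its mean and SD to a target but leaves its shape, summarized here by kurtosis, unchanged. All names and values are hypothetical.

```python
from statistics import fmean, pstdev

def align_location_scale(batch, target_mean, target_sd):
    """Shift and rescale one batch to a target mean and SD (simplified)."""
    m, s = fmean(batch), pstdev(batch)
    return [(v - m) / s * target_sd + target_mean for v in batch]

def kurtosis(values):
    """Fourth standardized moment; low values indicate a flat or bimodal shape."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 4 for v in values])

# Hypothetical batches of one feature (not study data).
unimodal = [4.0, 4.5, 5.0, 5.0, 5.5, 6.0]
bimodal = [1.0, 1.2, 1.1, 8.9, 9.0, 9.1]    # two well-separated peaks

pooled = unimodal + bimodal
target_mean, target_sd = fmean(pooled), pstdev(pooled)

aligned_uni = align_location_scale(unimodal, target_mean, target_sd)
aligned_bi = align_location_scale(bimodal, target_mean, target_sd)
```

After alignment both batches share the same mean and SD, yet the bimodal batch keeps its two peaks, which is consistent with the residual mismatch described for Site 3.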
Effectiveness of ComBat Harmonization on Deep Features
Table 5 lists the number of features with significant differences and Cohen’s F scores for deep features before and after ComBat harmonization in all 6 experiments. Of 1024 deep features, 89.0% (Siemens), 56.5% (GE), and 0.1% (Philips) were significantly different between 1.5 T and 3 T. Among the three manufacturers at 1.5 T, 914/1024 (89.3%) deep features were significantly different. Among the three manufacturers at 3 T, 861/1024 (84.1%) deep features were significantly different. In Experiment 6, 858/1024 (83.8%) deep features were significantly different between systems from different manufacturers and field strengths. After ComBat harmonization, no deep features differed significantly among manufacturers or field strengths. Lower Cohen’s F scores were observed consistently in all (6/6) experiments after ComBat harmonization.
Table 5 The number of features with significant differences and Cohen’s F score for deep features before and after ComBat harmonization
We illustrated the batch effects in deep features before and after ComBat harmonization using heatmaps of deep features from all MRI exams in Experiment 6 (Fig. 5). As with radiomic features, we applied z-score transformation to normalize individual deep features. Figure 5A shows the heatmap of original deep features, where distinct color variations exist across subjects from different study sites (manufacturer, field strength). This suggests that there were batch effects in the original deep features. The heatmap in Fig. 5B exhibits uniform variation across all MRI exams, indicating reduced batch effects among deep features.
Fig. 5 Heatmaps of deep features of all 3857 MRI exams in Experiment 6 before and after harmonization. A Original deep features and B ComBat harmonized deep features. Each row corresponds to an individual T2W MRI exam, while each column corresponds to an individual deep feature. Z-score transformation was applied to normalize individual deep features. MRI exams are grouped based on study site (manufacturer, field strength)
The distributions of deep features before and after ComBat harmonization were also visualized using kernel density plots. Figure 6A illustrates the distributions of a randomly selected deep feature (component index = 72) at 1.5 T and 3 T. The component index was randomly selected from the 1024 components of deep features from the Swin Transformer. The original features from Site 1 (Siemens) differed distinctly between 1.5 T and 3 T systems, while the original features from Site 2 (GE) and Site 3 (Philips) showed no significant difference between field strengths. After harmonization, this deep feature (component index = 72) at Site 1 (Siemens) had very similar patterns between 1.5 T and 3 T. Figure 6B shows another randomly selected deep feature (component index = 104) across the three manufacturers. ComBat also successfully harmonized the distributions among the three manufacturers at the same field strength.
Fig. 6 A Kernel density plots of deep feature (component index = 72) distributions by field strength. The top row is for the original data, and the bottom row is for ComBat harmonized data. Columns are plots for Site 1 (Siemens), Site 2 (GE), and Site 3 (Philips), from left to right. B Kernel density plots of deep feature (component index = 104) distributions by system manufacturer. The top row is for the original data, and the bottom row is for ComBat harmonized data. The left column is for data from 1.5 T systems, and the right column is for data from 3 T systems