Effect of the identification group size and image resolution on the diagnostic performance of metabolic Alzheimer’s disease-related pattern

ADRP is a metabolic brain biomarker for AD and has potential for translation to clinical practice [8, 22, 23]. So far, various ADRPs have been identified and validated on different cohorts, from brain images acquired with different scanners, reconstructed with various algorithms, and smoothed with various filters. However, the effect of variation in these technical parameters on ADRP has not yet been thoroughly investigated. In this study, we systematically evaluated the impact of the size of the identification group and of the image resolution of both identification and validation images on the ADRP’s diagnostic performance. A further purpose of the study was to enable other researchers to estimate the ideal sample size for their future SSM-based projects. The 2-[18F]FDG-PET images were randomly selected from the ADNI database. Since the ADNI database is an extensive multisite dataset, it is ideal for the derivation and validation of network biomarkers.

We identified 750 versions of ADRP and studied their AUC values for the discrimination of AD patients from healthy control subjects. All newly identified patterns had hypo- and hypermetabolic regions similar to those reported in previously published ADRPs [6, 7, 20, 24, 25]. Visually, the differences between patterns identified from identification groups that differed in size and resolution were minor.

Nevertheless, we observed a wide spread of the AUC values among the 25 replications of ADRPs that were identified from identification groups with the same size and image resolution. This variation among AUCs decreased as the identification group grew (at the 20 AD/20 CN identification group size, the standard deviation of AUC was up to 9%, whereas at 60 AD/60 CN and 80 AD/80 CN, it dropped to 2% and 3%). This most likely implies that the biological variability among the subjects is less expressed in larger groups. By averaging AUC over the replications of the ADRPs identified from the same group size and identification image resolution, we noticed that increasing the size of the identification group raised the best average AUC only slightly (from 20 AD/20 CN to 80 AD/80 CN subjects, the AUC increased by about 3%). Generally, AUC values were the highest for medium resolution of the identification images (range from 8 to 15 mm) and, within this range, were not sensitive to minor variations in image resolution. Larger groups favored coarser identification image resolutions (for groups > 40 AD/40 CN the highest AUC is for FWHM > 10 mm, but for smaller groups, the highest AUC is at FWHM < 10 mm). Considering only the average AUCs would suggest that groups of 20 patients and 20 healthy controls can be sufficient for successful pattern identification. However, to reduce the effect of biological variability, using an identification group of 30 AD/30 CN should be beneficial. Our results also imply that smoothing out more details in the images when increasing the group size may produce a more robust pattern.
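The AUC and its spread over replications, as discussed above, can be illustrated with a minimal Python sketch. It assumes that each subject is summarized by a single pattern expression score, that higher scores indicate disease, and that AUC is taken in its rank-based (Mann-Whitney) form. The function names and the use of NumPy are illustrative assumptions, not the analysis code of this study.

```python
import numpy as np


def auc_from_scores(patient_scores, control_scores):
    """Rank-based AUC: probability that a randomly chosen patient score
    exceeds a randomly chosen control score (ties count as 0.5)."""
    p = np.asarray(patient_scores, dtype=float)[:, None]
    c = np.asarray(control_scores, dtype=float)[None, :]
    wins = (p > c).sum()   # patient scored higher than control
    ties = (p == c).sum()  # equal scores contribute half a win
    return (wins + 0.5 * ties) / (p.size * c.size)


def summarize_replications(aucs):
    """Mean and sample standard deviation of AUC over replications
    that share the same group size and image resolution."""
    a = np.asarray(aucs, dtype=float)
    return a.mean(), a.std(ddof=1)
```

Calling `summarize_replications` on the AUCs of all replications for one group size and resolution would give the two quantities compared in this paragraph: the average AUC and its standard deviation.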

We further examined the average of the lowest five AUC values from the ADRPs with equal sizes of identification cohorts. For group sizes of 40 AD/40 CN and above, the behavior was very similar to that of the average AUC. For smaller identification groups, however, and especially for the 20 AD/20 CN one, the average of the lowest five AUC values was about 10% lower than the average AUC value. This suggests that the 20 AD/20 CN group size may be too small for reliable pattern performance and that identification groups of at least 30 AD/30 CN tend to perform better. It also implies that, due to biological variability or a possibly poor selection of subjects, a pattern identified from a small cohort could have compromised performance.
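The comparison between the average AUC and the average of the lowest five AUC values can be sketched as follows. This is a minimal illustration under the assumption that the AUCs of all replications for one group size are available as a flat list; the function name and the default `k=5` are hypothetical choices that mirror the text.

```python
import numpy as np


def lowest_k_gap(aucs, k=5):
    """Return (mean of all AUCs, mean of the k lowest AUCs, their difference)."""
    a = np.sort(np.asarray(aucs, dtype=float))  # ascending order
    if a.size < k:
        raise ValueError("need at least k AUC values")
    mean_all = a.mean()
    mean_lowest = a[:k].mean()  # the k worst-performing replications
    return mean_all, mean_lowest, mean_all - mean_lowest
```

A large gap between the two means indicates that a few replications perform much worse than the typical one, which is the behavior reported here for the smallest identification groups.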

The performance of ADRP was not significantly affected by the resolution of validation images in prospective analyses, implying that a properly identified ADRP can be used successfully with new images even if their resolution differs from that of the identification images, as long as it lies in the range between 6 and 15 mm.

In this study, we had a chance to observe the effect of biological variability, demonstrated with group repetitions, and to compare it to the performance of previously published patterns. Perovnik et al. [8] report AUC values of 0.95 and 0.98 for ADRP identified from 20 AD/20 CN subjects and validated with two different sets of subjects. Iizuka et al. [25], with 50 AD/50 CN identification subjects, report AUC values between 0.80 and 1 for ten different samplings of identification and validation groups. Mattis et al. [24] identified ADRP on 20 AD/20 CN ADNI data and obtained an AUC value of 0.86 for the identification and 0.87 for the validation cohort (personal communication). Two other authors report AUC values for the ADRP, but calculated only for identification subjects. Habeck et al. [4] compared the performance of five replications of ADRP, identified with different groups of 20 AD/20 CN subjects, and obtained AUC values between 0.87 and 0.97. Meles et al. [7] report an AUC value of 0.95 for ADRP identified on a small sample of 15 AD/18 CN subjects.

Our AUC values, calculated for validation subjects, are comparable to those of Iizuka et al. [25] and Mattis et al. [24] but lower than those of Perovnik et al. [8], whose ADRP was identified in AD patients pathologically confirmed with a cerebrospinal fluid biomarker. Other authors report AUCs for the identification groups, which are expectedly higher than for validation groups. Results that stem from ADNI data reach similar AUCs, probably due to the multisite nature of the data, and differ from the results obtained from single-center scans. Habeck et al. [4] and Iizuka et al. [25] also confirm the spread of AUC values for group repetitions and imply that the selection of the subjects may have a significant impact on the AUC value. The somewhat lower AUC values in our study and their wider spread, especially in smaller identification groups, are possibly caused by the random selection of subjects from the ADNI database. In the selection process, we considered patients’ diagnoses but not their detailed clinical data (e.g., disease duration, cognitive status), which could improve subject selection. It should also be noted that subjects in our study were scanned on four different scanners with considerably different configurations (whole-body PET scanner vs. dedicated brain PET scanner) and of different scanner generations. Due to these differences, images were likely more heterogeneous regarding noise level, and these differences could not be fully corrected with scanner-specific image smoothing. Additionally, it should be emphasized that our reported AUC values stem from the validation images, while others generally reported AUC values that stem from the identification image sets, which may introduce an overfitting bias.

Our findings about the appropriate resolution of the ADRP identification images can be roughly compared to the image resolutions chosen by the authors of previous studies. Smoothing with a 10 mm [6, 7, 20, 24] or 12 mm [4, 5] FWHM Gaussian kernel is reported, leading to an image resolution of about 11.4 mm or 13.2 mm FWHM for the Siemens Biograph mCT and coarser for other scanners. This is consistent with the highest AUCs for patterns identified from images with resolutions between 8 and 15 mm FWHM in our study.
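The relation between kernel width and final image resolution quoted above follows from the fact that the FWHMs of successive Gaussian blurs add in quadrature. The sketch below is a minimal illustration of that relation. The function names are hypothetical, and the 5.5 mm intrinsic resolution in the usage note is an assumed value chosen because it reproduces the quoted figures; it is not a number stated in the text.

```python
import math


def effective_fwhm(intrinsic_mm, filter_mm):
    """Final resolution after Gaussian smoothing: FWHMs add in quadrature."""
    return math.hypot(intrinsic_mm, filter_mm)


def filter_fwhm_for_target(intrinsic_mm, target_mm):
    """Kernel FWHM needed to bring an image from its intrinsic
    resolution to a coarser target resolution."""
    if target_mm < intrinsic_mm:
        raise ValueError("smoothing cannot improve resolution")
    return math.sqrt(target_mm ** 2 - intrinsic_mm ** 2)
```

With an assumed intrinsic resolution of 5.5 mm, `effective_fwhm(5.5, 10)` and `effective_fwhm(5.5, 12)` round to 11.4 mm and 13.2 mm, matching the values given for the 10 mm and 12 mm kernels.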

Nevertheless, we found that the effect of image resolution on the pattern’s diagnostic performance is rather small, which is in accordance with previous research exploring the influence of other technical parameters of the 2-[18F]FDG-PET images on other disease-specific metabolic patterns. In Tomše et al. [15], we estimated the effect of 2-[18F]FDG-PET reconstruction algorithms on the expression of the Parkinson’s disease-related pattern (PDRP). The reported AUC values for differentiation between patients and healthy controls are stable at around 0.95, regardless of the reconstruction algorithm. Even so, it was stressed that the prospective determination of pattern expression in a new subject requires scaling with a group of CN subjects’ images scanned and preprocessed with the same protocol. Similarly stable AUC values for PDRP were achieved by Peng et al. [18], who studied the effects of PET scanners and of spatial normalization with different software packages and reported AUC values between 0.96 and 0.97 for different SPM versions. Wu et al. [16] studied PDRP scores in two populations scanned with different scanners and reconstruction algorithms and obtained AUC values between 0.98 and 0.99. Moeller et al. [17] also confirmed a high reproducibility of the PDRPs across four independent pattern identification populations, each scanned with a different PET scanner.

Our study has some limitations. Firstly, random automated selection of patients’ images from the publicly available ADNI database might differ from the recruitment of patients to prospective studies designed specifically for disease-specific pattern identification. Patients were selected from the database according to their clinical diagnosis, which may be wrong in around one-third of patients [26]. Additionally, our data on the scanners’ characteristics were limited to the type of scanner and the reported effective resolution, which we then used to calculate the filter’s FWHM needed to reach the final image resolution. Other factors can affect the quality of the images, such as image count rates, which depend on injected radioactivity and body mass, as well as blood glucose level, medications, possible sleeping, room lighting, ambient sound levels, and staff interactions with the subjects. Image resolution can also be affected by motion artifacts.

Further assessments of ADRP’s diagnostic performance should check whether AUCs from a single-site testing set are systematically greater than analogous values from multisite data, e.g., ADNI, and whether AUCs increase over time as AD progresses.
