Method to determine the statistical technical variability of SUV metrics

This study describes a method to estimate the statistical technical variability of SUV metrics and to compare the variability of SUV metrics between different lesion sizes. The proposed method determines the statistical technical variation of SUV metrics, which is a part of the total variation in PET imaging. The method's value lies in enabling the estimation of the influence of lesion size and choice of SUV metric on the total variation in a simple way.

In Fig. 1, the calculated values of the SUV metrics are shown for all spheres and for all reconstruction lengths. The average SUV values are higher for shorter reconstruction lengths. When the images are noisier, there is a greater chance that a single voxel or group of voxels will have a higher value due to the larger statistical variation. This effect occurs with SUVMax and SUVPeak, but not with SUVMean, where all the voxels in a region are used for the calculation. The values in Fig. 1 could also be translated into recovery ratios in order to include the effect of background activity.
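The upward shift of maximum-based metrics under noise can be illustrated with a small simulation. This is a minimal sketch, not the analysis used in this study: it assumes independent Gaussian voxel noise around a uniform true value, and the function names and numerical settings are hypothetical.

```python
import random
import statistics


def simulate_roi(true_value, noise_sd, n_voxels, rng):
    """Return (max, mean) of one simulated ROI.

    Assumption: voxel noise is independent and Gaussian, which is a
    simplification of real PET noise.
    """
    voxels = [rng.gauss(true_value, noise_sd) for _ in range(n_voxels)]
    return max(voxels), statistics.fmean(voxels)


def average_metrics(true_value, noise_sd, n_voxels=100, n_repeats=2000, seed=0):
    """Average the simulated max and mean over many noise realisations."""
    rng = random.Random(seed)
    results = [simulate_roi(true_value, noise_sd, n_voxels, rng)
               for _ in range(n_repeats)]
    avg_max = statistics.fmean([r[0] for r in results])
    avg_mean = statistics.fmean([r[1] for r in results])
    return avg_max, avg_mean


# Same true value, two noise levels (arbitrary illustrative settings).
low_noise = average_metrics(true_value=10.0, noise_sd=0.5)
high_noise = average_metrics(true_value=10.0, noise_sd=2.0)
print("low noise  (max, mean):", low_noise)
print("high noise (max, mean):", high_noise)
```

In this toy setting, the average maximum grows with the noise level while the average mean stays close to the true value, which mirrors the behaviour described for SUVMax and SUVMean.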

Figures 2, 3, 4, 5 and 6 show the coefficient of variation of the SUV metrics as a function of the reconstruction length for the different spheres. The variation is higher at shorter reconstruction lengths and reaches values up to 30% for the sphere with a 10 mm diameter. This suggests that, when quantifying PET images of small lesions with low counts, the statistical technical variability might not be negligible compared with the variation used for diagnostic purposes. In this regard, the proposed method could be used to define the minimum required acquisition length: once the statistical technical variation of the SUV has become negligible compared with the test–retest variation, a longer acquisition time might not add value.
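One possible way to turn this criterion into a number is sketched below. It assumes that the statistical technical variation and the remaining test–retest variation are independent and therefore add in quadrature. The 5% tolerance, the function names and the example values are hypothetical placeholders, not results or recommendations from this study.

```python
import math


def total_cov(technical_cov, other_cov):
    """Combine two relative variations in quadrature.

    Assumption: the two sources of variation are independent.
    """
    return math.sqrt(technical_cov ** 2 + other_cov ** 2)


def minimum_length(cov_by_length, reference_cov, max_relative_increase=0.05):
    """Return the shortest length whose technical CoV inflates the
    reference variation by at most max_relative_increase, or None."""
    for length in sorted(cov_by_length):
        inflated = total_cov(cov_by_length[length], reference_cov)
        if inflated <= reference_cov * (1.0 + max_relative_increase):
            return length
    return None


# Placeholder values (length in s -> technical CoV in %), NOT measurements.
example_cov = {30: 12.0, 60: 8.0, 120: 5.0, 150: 3.0, 300: 2.0}
chosen = minimum_length(example_cov, reference_cov=10.0)
print("shortest acceptable length (s):", chosen)
```

The tolerance expresses how much inflation of the total variation a user is willing to accept. A stricter tolerance leads to a longer minimum acquisition length.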

Figures 7, 8, 9, 10 and 11 show that the measured and estimated coefficients of variation are comparable and that the standard deviation of the coefficients of variation is relatively small, indicating that this method can be applied to estimate the coefficient of variation at different reconstruction lengths. Figures 12 and 13 show that the choice of the reconstruction length of the subset used for the estimation does not bias the estimate of the coefficient of variation of the full-length dataset.
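As an illustration of how a coefficient of variation measured at one reconstruction length might be carried over to another, the sketch below uses an inverse square-root scaling with length. This rule is an assumption based on Poisson counting statistics and is not necessarily the estimator used in this study. Iterative reconstruction can cause deviations from it, and the function name is hypothetical.

```python
import math


def scale_cov(cov_measured, length_measured, length_target):
    """Scale a CoV from one reconstruction length to another.

    Assumption: the CoV is proportional to 1 / sqrt(length), as expected
    for pure Poisson counting noise at a constant count rate.
    """
    if length_measured <= 0 or length_target <= 0:
        raise ValueError("reconstruction lengths must be positive")
    return cov_measured * math.sqrt(length_measured / length_target)


# Illustrative input only: a 10% CoV at 30 s scaled to 120 s.
print(scale_cov(10.0, 30.0, 120.0))
```

Comparing such a scaled value with a directly measured coefficient of variation is one way to check whether the assumed scaling holds for a given scanner and reconstruction.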

In Table 1, we report the estimated standard deviation of the SUV metrics. We find significant differences in statistical technical variation between sphere dimensions. For each SUV metric, the difference between the smallest (diameter of 10 mm) and the largest (diameter of 37 mm) sphere is always significant. The difference is also significant between all spheres for the SUVMean. The coefficients of variation typically range from 5% for the 10 mm sphere to 1% for the 37 mm sphere, in accordance with the range reported for simulated data [13]. In smaller spheres with lower recovery coefficients, we expect a larger influence of the noise on SUV metrics. Furthermore, for smaller spheres, a partial volume effect can introduce an extra source of variation in the quantification of SUV [14]. The maximum expected variation between images, for any estimated metric, did not exceed 6% for the smallest object (sphere of diameter 10 mm) and 2% for the largest object (sphere of diameter 37 mm) for a reconstruction length of 150 s. This provides an indication of the contribution of the statistical technical variation when the same scanner is used, with equal reconstruction length and activity, and can be compared with the variation measured in FDG PET test–retest studies, which report a typical variation of approximately 10% [15,16,17]. Our study is not a test–retest study: it aims to quantify the variation obtained when (ideally) repeating the exact same acquisition, with no external factors changed other than the statistical ones related to the characteristics of the emitters. The variation measured in our study is, therefore, smaller than the one typically measured in a test–retest study, because we do not have to deal with other factors such as repositioning of the patient or phantom and reinjection of the activity.

Nevertheless, it is important to note that the value of the estimated statistical technical variation calculated for our scanner and reconstruction method is not directly translatable to other centres. The variability in the calculation of SUV metrics prevents a direct comparison of these values [18]. Other factors introducing technical variability are, for example, acquisition settings, voxel size, reconstruction protocols, gating settings, analysis methods and scan duration, and their influence is too prominent for a direct comparison of the absolute values of the variation between scanners [7, 19].

For a given PET scanner, using advanced image reconstruction algorithms [20] will significantly improve the image quality in terms of noise and lesion detectability. Iterative PET reconstruction methods have been shown to be superior to filtered back projection (FBP) in detecting focal regions and in reducing noise [21]. Furthermore, studies have shown that the noise correlation in FBP reconstruction might be object dependent and, therefore, it might not be possible to apply general statistical methods when estimating a coefficient of variation in different regions of interest in an image [22]. In our method, we have used an OSEM algorithm (an advanced Bayesian iterative reconstruction technique) because it is the usual choice when reconstructing whole-body F18 images. One of the advantages of an OSEM algorithm is that it aims to consider all the physical and statistical processes happening during data acquisition. On the other hand, Bayesian reconstruction algorithms penalize the formation of noisy images based on the hypothesis that large local variations in voxel intensity in the images are most likely due to noise. In OSEM algorithms, the degree of this penalization is unregulated, and the number of iterations is often reduced in order to control noise, but at the cost of reducing contrast and lesion detectability [23]. Other reconstruction algorithms have shown better performance in noise reduction and lesion detectability. For example, point spread function (PSF) reconstruction algorithms can also be applied to PET images and have been shown to reduce noise and increase contrast in the reconstructed images, indicating the potential for a further reduction of the coefficient of variation in SUV metrics [24]. Furthermore, deep learning techniques have shown positive results in PET reconstruction applications [25], opening the possibility of reducing scan time or injected activity by up to 50% compared to OSEM algorithms [26].
Once again, we would like to underline that this paper only presents an example of the application of the method we describe for a specific Philips Gemini TF PET/CT system and its OSEM reconstruction algorithm.

The degree of statistical technical variation of an image is strongly dependent on imaging and reconstruction settings and needs to be evaluated for the specific scanner and algorithm in use. When evaluating the technical statistical coefficient of variation, standardization of the complete acquisition, from scanner and acquisition settings to PET reconstruction settings, is therefore strongly advised [7, 14, 27]. A simple method such as the one described in this article can be routinely implemented to identify the contribution of the statistical technical variation in PET images. Once the degree of statistical technical variation is known, a user can evaluate its relevance to the total variation of SUV between clinical studies.

The SUV metrics (Max, Mean and Peak) present some significant differences for the same sphere diameter. For the smaller spheres (d = 10, 13 mm), the averaging step introduced in the calculation of SUVMean and Peak does not provide a significant difference in the coefficient of variation in our measurements. For larger lesions, the difference between the variation in SUV Mean 3D and SUV Peak is significant, suggesting that the dimension of the ROI used for averaging has a significant effect on SUV quantification and that an overly large ROI might flatten the results. It is worth noting that our definition of SUVMean was based on the knowledge of the measured objects, with a ROI defined as a sphere of diameter equal to the nominal diameter of the imaged sphere. This is not always possible during the analysis of images for diagnostic purposes. In that case, another definition of SUVMean must be used, and the variation between measurements might be expected to increase [12, 15, 16]. Furthermore, the biological factors present in clinical practice, such as glucose blood levels, rate of FDG uptake in the lesions or weight recording, can increase the SUV variation in diagnostic images [8, 28,29,30,31].

Another method to estimate the statistical technical variation would be to acquire a dataset with a longer acquisition time than the one used for diagnostic imaging and to generate subsets of the long acquisition with a length similar to the one used for diagnostic imaging. This could be a more direct way to measure the variation, possibly less susceptible to low photon statistics. A similar approach has been shown in [7] for SUVMax and Mean for reconstruction lengths of 5 min, with variation between 1.2% and 11.2% depending on the filter, type of acquisition (2D or 3D) and metric (SUVMax or Mean) used. Another way to quantify voxel-based variation in SUV metrics has been reported in reference [32], showing higher sensitivity for SUV Mean (91.4%) than for SUV Max (82.0%).
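The subset-based alternative could be sketched as follows. This is a toy illustration under stated assumptions: the list of per-second values stands in for list-mode data, the `metric` callable stands in for the full "reconstruct the subset and measure the SUV" step, and all names and numbers are hypothetical.

```python
import random
import statistics


def split_into_subsets(samples, subset_size):
    """Split samples into consecutive, non-overlapping subsets of equal
    size, dropping any incomplete remainder at the end."""
    n_subsets = len(samples) // subset_size
    return [samples[i * subset_size:(i + 1) * subset_size]
            for i in range(n_subsets)]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


def cov_from_long_acquisition(per_second_values, subset_length_s, metric):
    """Apply a metric to each subset and return the CoV across subsets."""
    subsets = split_into_subsets(per_second_values, subset_length_s)
    if len(subsets) < 2:
        raise ValueError("at least two complete subsets are needed")
    return coefficient_of_variation([metric(s) for s in subsets])


# Toy signal: 1500 s of Gaussian samples (arbitrary illustrative values).
rng = random.Random(1)
toy_signal = [rng.gauss(100.0, 10.0) for _ in range(1500)]
cov_150s = cov_from_long_acquisition(toy_signal, 150, statistics.fmean)
print("CoV across 150 s subsets (%):", cov_150s)
```

The number of subsets limits the precision of the estimated coefficient of variation, so the long acquisition needs to be several times longer than the diagnostic length.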

For this study, we worked with a foreground-to-background activity ratio of 10:1. In order to verify the method further, the evaluation could be repeated with other ratios, for example, 5:1 and 2.5:1. Furthermore, the acquisitions could be repeated after a certain number of hours in order to analyse the variation at other noise levels. As previously discussed, a higher coefficient of variation can be expected for noisier images.
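To give a feeling for how waiting changes the noise level, the sketch below computes the fraction of F18 activity remaining after a delay, using the physical half-life of approximately 109.77 min. The accompanying scaling of the coefficient of variation assumes pure Poisson counting noise with counts proportional to activity. This is an assumption for illustration only, and the function names are hypothetical.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of F18 in minutes


def remaining_fraction(delay_min, half_life_min=F18_HALF_LIFE_MIN):
    """Fraction of the initial activity left after delay_min minutes."""
    return 0.5 ** (delay_min / half_life_min)


def cov_scaling_after_delay(delay_min, half_life_min=F18_HALF_LIFE_MIN):
    """Factor by which the CoV would grow for the same acquisition length.

    Assumption: counts are proportional to activity and the CoV scales
    as 1 / sqrt(counts), as for pure Poisson noise.
    """
    return 1.0 / math.sqrt(remaining_fraction(delay_min, half_life_min))


for hours in (1, 2, 4):
    delay = 60.0 * hours
    print(hours, "h:", remaining_fraction(delay), cov_scaling_after_delay(delay))
```

Under these assumptions, a delay of one half-life would increase the coefficient of variation by a factor of about 1.41 for the same acquisition length.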
