Experiments were conducted on two datasets to evaluate the performance of the proposed diffusion model for PET image quality enhancement and to further explore the feasibility of enhancing conventional PET images with the aid of HQ images acquired by long-axial-field-of-view PET scanners. Table 1 provides general descriptions of the two datasets used in the two experiments.
Table 1 Patient characteristics

The first dataset (Dataset 1) was obtained from the Sun Yat-sen University Cancer Center and included a total of 137 patients. The inclusion criterion was age between 15 and 90 years, and the exclusion criterion was a waiting time after 18F-FDG injection of more than 75 min. All data were collected on a total-body PET/CT uEXPLORER scanner (United Imaging Healthcare, Shanghai, China) with a mean injected dose of 0.11 mCi/kg. Approximately 5 min after 18F-FDG injection, raw data were collected for approximately 5 min. The institutional ethics committee approved this study, and all subjects provided written informed consent before participating.
The second dataset (Dataset 2) was obtained from The Cancer Imaging Archive (TCIA) [38,39,40]. A total of 43 cases were selected with the following inclusion criteria: age between 15 and 90 years and reconstruction with the OSEM algorithm; the exclusion criterion was images that did not cover the abdominal or thoracic cavities.
For Dataset 1, 50 patient datasets were used for training, and the remaining 87 were used for testing and validation (see T1 in the supplementary file for details). The normal-dose PET images of the 50 training patients were used to train the standard DDPM, and the trained model was then plugged directly into the null space-constrained diffusion imaging method proposed in this study. The noisy low-dose PET images of the remaining 87 patients were used to test this imaging method. The low-dose data were generated by extracting a portion of the events from the normal-dose list-mode data; PET images reconstructed from 1% of the events served as the low-dose data in this experiment. All PET images were reconstructed via TOF-OSEM with the following parameters: PSF modeling, 3 iterations, 20 subsets, a matrix size of 256 × 256, a slice thickness of 2.89 mm, a voxel size of 2.34 × 2.34 × 2.89 mm³, Gaussian postfiltering (3 mm), and all necessary corrections, including scatter and attenuation corrections. Dataset 2, containing data from 43 patients, was used only to test the proposed method, not to train the DDPM, in order to verify its effectiveness and performance in cross-center and cross-device scenarios. All PET images in the experiments were converted to SUV units, and each image was resampled to a matrix of 128 × 128 pixels so that the deep learning model could run on our computational workstation. Figure 1 illustrates the application processes employed for both datasets.
Fig. 1 Application processes employed for the two datasets with the proposed method. HQ PET images acquired by high-end scanners were used to train the diffusion model, after which the trained model was used to process low-quality (LQ) PET images under null-space constraints to improve their quality
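As a rough illustration of the preprocessing described above, the following sketch converts an activity-concentration volume to SUV units and resamples each axial slice to 128 × 128. The function names and the linear-interpolation choice are our own assumptions, not details reported in the paper.

```python
import numpy as np
from scipy.ndimage import zoom

def to_suv(activity_bq_ml: np.ndarray, weight_kg: float, dose_bq: float) -> np.ndarray:
    """Convert an activity-concentration volume (Bq/mL) to body-weight SUV."""
    return activity_bq_ml * (weight_kg * 1000.0) / dose_bq  # 1 kg of tissue ~ 1000 mL

def resample_inplane(volume: np.ndarray, target: int = 128) -> np.ndarray:
    """Resample each axial slice of a (slices, H, W) volume to target x target."""
    _, h, w = volume.shape
    return zoom(volume, (1.0, target / h, target / w), order=1)  # linear interpolation
```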
Null space-constrained diffusion model

The DDPM defines a \(T\)-step forward process and a \(T\)-step reverse process. The forward process gradually adds Gaussian noise to the input data, whereas the reverse process reconstructs the desired data samples from the noise [25]. The forward process can be defined as a Markov diffusion process \(q(\cdot)\) that adds Gaussian noise with a variance of \(\beta_{t}\) to \(\mathbf{x}_{t-1}\) at each step:
$$q\left( \mathbf{x}_{1:T} \mid \mathbf{x}_{0} \right) = \prod\limits_{t = 1}^{T} q\left( \mathbf{x}_{t} \mid \mathbf{x}_{t - 1} \right)$$
(1)
$$q\left( \mathbf{x}_{t} \mid \mathbf{x}_{t - 1} \right) = \mathcal{N}\left( \mathbf{x}_{t};\ \sqrt{1 - \beta_{t}}\,\mathbf{x}_{t - 1},\ \beta_{t}\mathbf{I} \right)$$
(2)
where \(\mathbf{x}_{t}\) is the noisy image at time step \(t\), \(\beta_{t}\) is the predefined scale factor, and \(\mathcal{N}\) represents the Gaussian distribution. The reverse denoising process \(p_{\theta}(\cdot)\) is defined as follows:
$$p_{\theta}\left( \mathbf{x}_{0:T} \right) = p\left( \mathbf{x}_{T} \right)\prod\limits_{t = 1}^{T} p_{\theta}\left( \mathbf{x}_{t - 1} \mid \mathbf{x}_{t} \right)$$
(3)
$$p\left( \mathbf{x}_{t - 1} \mid \mathbf{x}_{t},\mathbf{x}_{0} \right) = \mathcal{N}\left( \mathbf{x}_{t - 1};\ \mu_{t}\left( \mathbf{x}_{t},\mathbf{x}_{0} \right),\ \sigma_{t}^{2}\mathbf{I} \right)$$
(4)
where \(\mu_{t}\left(\mathbf{x}_{t},\mathbf{x}_{0}\right) = \frac{1}{\sqrt{\alpha_{t}}}\left(\mathbf{x}_{t} - \Theta\frac{\beta_{t}}{\sqrt{1 - \bar{\alpha}_{t}}}\right)\), \(\alpha_{t} = 1 - \beta_{t}\), \(\bar{\alpha}_{t} = \prod_{i = 1}^{t}\alpha_{i}\), and the variance \(\sigma_{t}^{2} = \frac{1 - \bar{\alpha}_{t - 1}}{1 - \bar{\alpha}_{t}}\beta_{t}\). \(\Theta\) represents the noise contained in \(\mathbf{x}_{t}\), which is the only uncertain variable in the reverse process. The DDPM uses a neural network \(\epsilon_{\theta}\) to predict this noise at each time step \(t\), i.e., \(\epsilon_{t} = \epsilon_{\theta}(\mathbf{x}_{t}, t)\), where \(\epsilon_{t}\) denotes the estimate of \(\Theta\) at time step \(t\). As shown in Eqs. (1) and (2), the noise introduced during forward diffusion gradually destroys the image, leading to a loss of structural details that conventional diffusion models cannot recover.
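A minimal PyTorch sketch of Eqs. (1)–(4) is given below. The linear β schedule matches the settings quoted later in this section, while the variable names are illustrative rather than taken from the authors' code.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # beta_t, linear schedule as in [25]
alphas = 1.0 - betas                          # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)     # bar{alpha}_t = prod_{i<=t} alpha_i

def q_sample(x0: torch.Tensor, t: int, noise: torch.Tensor) -> torch.Tensor:
    """Forward process (Eqs. 1-2): x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) * noise."""
    return alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * noise

def reverse_mean(x_t: torch.Tensor, eps_t: torch.Tensor, t: int) -> torch.Tensor:
    """Reverse-process mean mu_t of Eq. (4), with eps_t the predicted noise."""
    return (x_t - eps_t * betas[t] / (1 - alpha_bars[t]).sqrt()) / alphas[t].sqrt()
```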
Given a linear operator \(\mathbf{A}\), its pseudoinverse \(\mathbf{A}^{\dagger}\) satisfies \(\mathbf{A}\mathbf{A}^{\dagger}\mathbf{A} \equiv \mathbf{A}\). \(\mathbf{A}^{\dagger}\mathbf{A}\) can be seen as an operator that projects a sample \(\mathbf{x}\) onto the range space of \(\mathbf{A}\). In contrast, \(\left(\mathbf{I} - \mathbf{A}^{\dagger}\mathbf{A}\right)\) can be seen as an operator that projects \(\mathbf{x}\) onto the null space of \(\mathbf{A}\), since \(\mathbf{A}\left(\mathbf{I} - \mathbf{A}^{\dagger}\mathbf{A}\right)\mathbf{x} \equiv \mathbf{0}\). Any sample \(\mathbf{x}\) can be decomposed into two parts: one in the range space of \(\mathbf{A}\) and the other in the null space of \(\mathbf{A}\):
$$\mathbf{x} \equiv \mathbf{A}^{\dagger}\mathbf{A}\mathbf{x} + \left( \mathbf{I} - \mathbf{A}^{\dagger}\mathbf{A} \right)\mathbf{x}$$
(5)
Consider the problem of noisy image restoration of the form \(\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}\), where \(\mathbf{n}\) denotes additive noise and \(\mathbf{A}\mathbf{x}\) represents the clean measurement. The DDNM [29] is as follows:
$$\hat{\mathbf{x}}_{0|t} = \frac{1}{\sqrt{\bar{\alpha}_{t}}}\left( \mathbf{x}_{t} - \epsilon_{\theta}\left( \mathbf{x}_{t},t \right)\sqrt{1 - \bar{\alpha}_{t}} \right)$$
(6)
$$\hat{\mathbf{x}}_{0|t} = \hat{\mathbf{x}}_{0|t} - \Sigma_{t}\mathbf{A}^{\dagger}\left( \mathbf{A}\hat{\mathbf{x}}_{0|t} - \mathbf{y} \right)$$
(7)
$$\hat{p}\left( \mathbf{x}_{t - 1} \mid \mathbf{x}_{t},\hat{\mathbf{x}}_{0|t} \right) = \mathcal{N}\left( \mathbf{x}_{t - 1};\ \mu_{t}\left( \mathbf{x}_{t},\hat{\mathbf{x}}_{0|t} \right),\ \Phi_{t}\mathbf{I} \right)$$
(8)
where \(\Sigma_{t}\) is used to scale the range-space correction \(\mathbf{A}^{\dagger}\left(\mathbf{A}\hat{\mathbf{x}}_{0|t} - \mathbf{y}\right)\) and \(\Phi_{t}\) is used to scale the noise added in \(\hat{p}\left(\mathbf{x}_{t-1} \mid \mathbf{x}_{t},\hat{\mathbf{x}}_{0|t}\right)\). In this work, the problem we address includes not only noise suppression but also the recovery of image details. We use a pooling-inverse-pooling scheme to model detail loss and blurring, so \(\mathbf{A}\) is a composition of the two operators (see T2 in the supplementary file for details). The red block in Fig. 1 shows the reverse diffusion process in detail; a sketch of one such reverse step is given below.
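Reusing the schedule and toy operators from the sketches above, one reverse step of Eqs. (6)–(8) can be written as follows. The scalar stand-ins for \(\Sigma_{t}\) and \(\Phi_{t}\) and the posterior-mean expression written in terms of \(\hat{\mathbf{x}}_{0|t}\) are our assumptions based on the DDNM formulation [29], not the authors' exact implementation.

```python
@torch.no_grad()
def ddnm_step(x_t, t, y, eps_model, sigma_scale, phi_scale):
    """One null-space-constrained reverse step (Eqs. 6-8); assumes t >= 1."""
    eps_t = eps_model(x_t, t)                                                  # predicted noise
    x0_hat = (x_t - eps_t * (1 - alpha_bars[t]).sqrt()) / alpha_bars[t].sqrt() # Eq. (6)
    x0_hat = x0_hat - sigma_scale * A_pinv(A(x0_hat) - y)                      # Eq. (7)
    # Posterior mean mu_t(x_t, x0_hat), then Phi_t-scaled Gaussian noise (Eq. 8)
    mean = (alpha_bars[t - 1].sqrt() * betas[t] * x0_hat
            + alphas[t].sqrt() * (1 - alpha_bars[t - 1]) * x_t) / (1 - alpha_bars[t])
    return mean + phi_scale * torch.randn_like(x_t)
```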
Implementation details

The neural network \(\epsilon_{\theta}\) used to predict the noise at each time step was a UNet containing attention and residual blocks, as described in [25]. In addition, to avoid axial artifacts in the coronal and sagittal views, several neighboring axial slices were provided to the UNet as input. The network input therefore consisted of multislice data of size H × W × S, where H and W denote the height (128) and width (128), respectively, and S denotes the number of successive adjacent slices. To reduce the computational time and memory consumption of the model, S was fixed to 3. The loss function was the same as that in [25]:
$$L\left( \theta \right) = \mathbb{E}_{t \sim \left[ 1,T \right],\ \mathbf{x}_{0} \sim q\left( \mathbf{x}_{0} \right),\ \epsilon \sim \mathcal{N}\left( 0,\mathbf{I} \right)} \left( \left\| \epsilon - \epsilon_{\theta}\left( \sqrt{\bar{\alpha}_{t}}\,\mathbf{x}_{0} + \epsilon\sqrt{1 - \bar{\alpha}_{t}},\ t \right) \right\|_{2}^{2} \right)$$
(9)
where \(\mathbb{E}\) denotes the expectation, the number of time steps \(T\) was set to 1000, \(t\) was chosen uniformly from 1 to \(T\), and \(\beta_{t}\) was set to grow linearly from \(\beta_{1} = 0.0001\) to \(\beta_{T} = 0.02\), as in [25].
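A compact sketch of the training objective in Eq. (9), under the schedule defined earlier, is shown below; the batch-wise sampling of \(t\) and the MSE reduction are standard choices assumed here.

```python
def ddpm_loss(eps_model, x0: torch.Tensor) -> torch.Tensor:
    """Noise-prediction loss of Eq. (9) for a batch of (assumed 3-slice) inputs x0."""
    t = torch.randint(0, T, (x0.shape[0],))            # t ~ Uniform over the T steps
    eps = torch.randn_like(x0)                         # eps ~ N(0, I)
    ab = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps       # noisy sample (forward process)
    return ((eps - eps_model(x_t, t)) ** 2).mean()     # ||eps - eps_theta(...)||_2^2
```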
The diffusion network was trained for 400 epochs over approximately 4 days with a batch size of 15 and an initial learning rate of 1 × 10⁻⁴. In the experiments, the DDIM sampling method with 100 sampling steps was adopted to speed up the sampling process. The network was implemented with the PyTorch deep learning framework on an Ubuntu 16.04 system with an RTX 4090 graphics processing unit (GPU) and was optimized with the adaptive moment estimation (Adam) optimizer and an annealing strategy to accelerate convergence.
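The accelerated sampling mentioned above can be sketched as a deterministic DDIM update over an evenly spaced 100-step subsequence of the trained schedule; the even spacing and the η = 0 (fully deterministic) setting are our assumptions, as the paper does not specify them.

```python
def ddim_steps(num_steps: int = 100) -> list:
    """Evenly spaced subsequence of the T = 1000 time steps (an assumed choice)."""
    return torch.linspace(T - 1, 0, num_steps).long().tolist()

def ddim_update(x_t, t: int, t_prev: int, eps_t: torch.Tensor) -> torch.Tensor:
    """Deterministic DDIM (eta = 0) jump from step t to t_prev."""
    x0_hat = (x_t - eps_t * (1 - alpha_bars[t]).sqrt()) / alpha_bars[t].sqrt()
    return alpha_bars[t_prev].sqrt() * x0_hat + (1 - alpha_bars[t_prev]).sqrt() * eps_t
```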