Improvements of 177Lu SPECT images from sparsely acquired projections by reconstruction with deep-learning-generated synthetic projections

The convolutional neural network

The utilised convolutional neural network was the deep Convolutional U-net-shaped neural network for generation of Synthetic Intermediate Projections (CUSIP), presented by Rydén et al. in 2021 [16]. A schematic illustration of the network structure can be found in the supplementary material (Fig. S1). This network generates 90 SIPs from an input of 30P (down-sampled from the acquired 120P choosing every fourth projection starting with the first, i.e., projections 1,5,9…117). To generate SIPs the CUSIP was trained, using 120P as reference, three times to yield three different sets of SIPs: projections 2,6,10…118; projections 3,7,11…119, and projections 4,8,12…120. These 90 SIPs were added to the 30P input, forming a so-called CUSIP set of 120 projections.
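The projection bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, projections are numbered from 1 as in the text, and the "images" are stand-in labels.

```python
N_ACQUIRED = 120  # projections in the full acquisition (120P)
STEP = 4          # every fourth projection is kept as network input


def input_projections():
    """Projection numbers of the 30P input: 1, 5, 9, ..., 117."""
    return list(range(1, N_ACQUIRED + 1, STEP))


def sip_targets(offset):
    """Projection numbers generated by one trained network (offset 1, 2 or 3)."""
    return [p + offset for p in input_projections()]


def assemble_cusip_set(acquired_30, sips_by_offset):
    """Interleave 30 acquired projections with 3 x 30 SIPs into 120 projections.

    acquired_30: list of 30 projections in angular order.
    sips_by_offset: dict mapping offset (1, 2, 3) to a list of 30 SIPs.
    """
    full_set = []
    for k, projection in enumerate(acquired_30):
        full_set.append(projection)
        for offset in (1, 2, 3):
            full_set.append(sips_by_offset[offset][k])
    return full_set


# Using projection numbers as stand-ins for the images themselves:
cusip_set = assemble_cusip_set(
    input_projections(), {o: sip_targets(o) for o in (1, 2, 3)}
)
```

With projection numbers used as placeholders, the assembled set is simply 1 to 120 in order, which confirms that the three SIP sets fill every gap between the acquired projections.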

For this study, the network was retrained with an expanded set of 177Lu-DOTATATE training images. The training parameters were the same as those used by Rydén et al. [16], except that the network was trained for 300 epochs. All imaging time points were mixed during training (96% were acquired one day post-administration because of the historically used hybrid planar-SPECT/CT protocol). We compared two loss functions, L1 (least absolute deviations) and L2 (least square errors), which were used to minimize the error between the SIPs and the acquired projections during training. Further, we added 111In-octreotide images to the training material, with L2 chosen as the loss function (as it outperformed L1 in the comparison; see the results section), forming the extended L2 network (Table 1).
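The difference between the two loss functions can be illustrated with a short sketch. It assumes that each loss is averaged over the pixels of a projection (the text does not state whether a sum or a mean was used), and the pixel values are invented for illustration.

```python
def l1_loss(predicted, reference):
    """Mean absolute deviation between two equally long pixel lists."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def l2_loss(predicted, reference):
    """Mean squared error between two equally long pixel lists."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical pixel counts: one exact pixel, one small and one large error.
sip_pixels = [10.0, 12.0, 30.0]
acquired_pixels = [10.0, 10.0, 20.0]

l1_value = l1_loss(sip_pixels, acquired_pixels)  # (0 + 2 + 10) / 3
l2_value = l2_loss(sip_pixels, acquired_pixels)  # (0 + 4 + 100) / 3
```

In this example the single large deviation contributes 10 of 12 to the L1 sum but 100 of 104 to the L2 sum, which shows how L2 weights large errors more heavily than L1.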

Table 1 The networks trained in this study and the number (no.) of examinations in the training, validation, and test groups

SPECT reconstruction

The 16 test patients, each with five sets of projection data (Table 2), were reconstructed using the Sahlgrenska Academy Reconstruction Code (SARec), an ordered subset expectation maximization (OSEM)-based reconstruction algorithm that uses Monte Carlo (MC) simulations in the forward projection of the iterative process to correct for photon attenuation and scattering (in the patient and the collimator) and to perform resolution recovery [18]. Resolution recovery correction was also applied in the back projection. The OSEM reconstruction was performed with 10 iterations and 6 subsets using an in-house developed software program. MC simulations within the reconstruction were performed with 200 photons/voxel emitted in an angular range of 0.06 radians.
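For readers unfamiliar with OSEM, the generic multiplicative update can be sketched on a toy problem. This is not SARec: the forward projection here is a small fixed system matrix rather than an MC simulation, no attenuation, scatter or resolution modelling is included, and the interleaved assignment of detector bins to subsets is an assumption. All names and numbers are hypothetical.

```python
def osem(system, counts, n_subsets, n_iterations):
    """Toy OSEM. system[i][j] is the contribution of voxel j to detector bin i."""
    n_bins = len(system)
    n_voxels = len(system[0])
    estimate = [1.0] * n_voxels  # uniform, strictly positive starting image
    # Assumed ordering: bin i belongs to subset i % n_subsets.
    subsets = [list(range(s, n_bins, n_subsets)) for s in range(n_subsets)]
    for _ in range(n_iterations):
        for subset in subsets:
            # Forward project the current estimate and compare with the data.
            ratios = {}
            for i in subset:
                forward = sum(system[i][j] * estimate[j] for j in range(n_voxels))
                ratios[i] = counts[i] / forward if forward > 0 else 0.0
            # Back project the ratios and apply the multiplicative update.
            for j in range(n_voxels):
                sensitivity = sum(system[i][j] for i in subset)
                if sensitivity > 0:
                    back = sum(system[i][j] * ratios[i] for i in subset)
                    estimate[j] *= back / sensitivity
    return estimate


# Two voxels seen by four detector bins; noise-free data from a known object.
SYSTEM = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
TRUE_ACTIVITY = [2.0, 5.0]
COUNTS = [sum(a * x for a, x in zip(row, TRUE_ACTIVITY)) for row in SYSTEM]
RECONSTRUCTED = osem(SYSTEM, COUNTS, n_subsets=2, n_iterations=50)
```

With noise-free, consistent data the estimate approaches the known activity. The toy uses 2 subsets for brevity, whereas the reconstructions in this study used 10 iterations and 6 subsets.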

Table 2 SPECT reconstruction and projection sets, with corresponding terminology. Networks 1, 2 and 3 refer to the networks presented in Table 1

Subjects and image acquisition

For this study, we selected 2214 examinations in which SPECT/CT imaging was performed after treatment with 177Lu-DOTATATE or during examination with 111In-octreotide, at Sahlgrenska University Hospital, between 2003 and 2021. The inclusion criterion was SPECT/CT acquisition with 120P. The retrospective use of image data was approved by the Swedish Ethics Review Board, and the need for written informed consent was waived.

The examinations were performed using dual-head Anger cameras of models Millenium VG Hawkeye, Infinia Hawkeye 4, Discovery 670, and two Discovery 670 Pro, all from General Electric Medical Systems (Milwaukee, WI, USA). The crystal thickness was 3/8” for Infinia Hawkeye 4, and 5/8” for the other cameras. Acquisitions were performed with medium-energy general purpose (MEGP) parallel-hole collimators. The 177Lu-DOTATATE acquisitions were performed using the 208 keV photon peak of 177Lu, with an energy window of ± 10%. The 111In-octreotide acquisitions were performed using the 171 keV and 245 keV photon peaks, both with energy windows of ± 10%. These two energy windows were acquired and summed in the same image, so they cannot be separated retrospectively. Imaging was generally performed 1 day after administration of 177Lu-DOTATATE or 111In-octreotide; 4% of the 177Lu-DOTATATE acquisitions were from other time points (range: day 0 to day 7). Each acquisition comprised 120P in step-and-shoot mode with a duration of 30 s/frame. The matrix size was 128 × 128, with a pixel size and slice thickness of 4.42 mm. For the CT, the matrix size was 256 × 256 for the Infinia Hawkeye 4, and 512 × 512 for the other cameras. The CT slice thickness was 5 mm for all cameras. The CT pixel size was 2.21 mm for the Infinia camera, 1.10 mm for the Millenium camera, and 0.98 mm for the Discovery 670 and Discovery 670 Pro cameras. All imaging acquisition parameters are presented in Table 3.

Table 3 Imaging acquisition parameters for the cameras used for the training, validation and test data

Evaluation/test group

To evaluate the performance of the networks, a test group including 16 sequential patients, 8 men and 8 women, treated with 177Lu-DOTATATE between 2019 and 2021 was selected. The mean age was 70 years (range 46–86 years). The patients were treated with a mean activity ± standard deviation (SD) of 7601 ± 123 MBq of 177Lu-DOTATATE, and imaging was performed after administration at day 0 (D0, range: 1.3–5.8 h), day 1 (D1, range: 19.5–24.5 h), day 2 (D2, range: 43.8–51.2 h), and day 7 (D7, range: 168.0–173.0 h). Each imaging time point included SPECT/CT with 120P over the abdomen, acquired using the two Discovery 670 Pro cameras described above, yielding a total of 64 image sets in the test group. For the projections, the original 120P served as the ground truth, with which the projection sets from the CUSIPs were compared. Similarly for the reconstructed images, the reconstruction of 120P (120P_rec) served as the ground truth, with which the reconstructions of the CUSIP sets, as well as reconstructions of 30P (30P_rec), were compared. Table 2 defines the terminology of the projection and reconstruction sets.

Quantitative measures

To evaluate the similarity between the acquired projections and the CUSIP sets, as well as between 30P_rec or the CUSIP_recs and 120P_rec, we used four quantitative measures: normalised root mean square error (NRMSE), normalised mean absolute error (NMAE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM). The root mean square error (RMSE) is the square root of the mean of the squared differences between the image (IM) and the reference image (RI) and measures the average magnitude of the error per pixel (for projections) or voxel (for reconstructions). To allow comparisons between different acquisition time points with their different count levels, the RMSE was normalised to the mean pixel or voxel value of the RI (Eq. 1).

$$\text{NRMSE}=\frac{\sqrt{\frac{1}{nml}\sum_{x=1}^{n}\sum_{y=1}^{m}\sum_{z=1}^{l}{\left(\text{IM}\left(x,y,z\right)-\text{RI}\left(x,y,z\right)\right)}^{2}}}{\overline{\text{RI}}}$$

(1)

n, m, and l are the number of voxels in each direction; and x, y, and z are the coordinates in the SPECT image. \(\overline{\text{RI}}\) is the mean voxel value in the reference image. For the projection images, IM and RI represent 2D images instead. The mean absolute error (MAE) is the average of the absolute differences between IM and RI (in pixels/voxels) and, like RMSE, is normalised to the mean pixel/voxel value of the RI (Eq. 2).

$$\text{NMAE}=\frac{\frac{1}{nml}\sum_{x=1}^{n}\sum_{y=1}^{m}\sum_{z=1}^{l}\left|\text{IM}\left(x,y,z\right)-\text{RI}\left(x,y,z\right)\right|}{\overline{\text{RI}}}$$

(2)
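As a minimal sketch of Eqs. 1 and 2, the two normalised error measures can be computed on flattened images (a list of pixel or voxel values, in the same order for IM and RI). The function names and the example values are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def nrmse(im, ri):
    """RMSE between IM and RI, divided by the mean value of RI (Eq. 1)."""
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(im, ri)) / len(ri))
    return rmse / mean(ri)


def nmae(im, ri):
    """MAE between IM and RI, divided by the mean value of RI (Eq. 2)."""
    mae = sum(abs(a - b) for a, b in zip(im, ri)) / len(ri)
    return mae / mean(ri)


# Hypothetical four-voxel example.
reference = [10.0, 20.0, 30.0, 40.0]
image = [12.0, 18.0, 33.0, 37.0]

nrmse_value = nrmse(image, reference)  # sqrt(6.5) / 25
nmae_value = nmae(image, reference)    # 2.5 / 25
```

Because both measures are divided by the mean of the reference image, scaling IM and RI by the same factor (for example, a lower count level at a later time point) leaves the values unchanged.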

The NRMSE and NMAE are measures of the magnitude of the error in relation to the mean pixel/voxel value, and lower values imply smaller differences. The PSNR (in decibels) can be used to compare the image quality between IM and RI, and is derived from the RMSE (Eq. 3):

$$\text{PSNR}=20\,{\log}_{10}\left(\frac{\text{MAX}}{\text{RMSE}}\right)$$

(3)

MAX is the maximum pixel/voxel value in any of the images. The PSNR describes the maximum possible pixel/voxel value in relation to the noise (in terms of the introduced error, RMSE), and could be considered a measure of contrast. A higher PSNR indicates a better match between the IM and RI in terms of image quality. NRMSE, NMAE and PSNR rely on numeric comparisons and do not reflect the human visual system. To capture perceived image quality, SSIM treats image degradation as a perceived change in structural information (Eq. 4). SSIM ranges from 0 to 1, with a higher value implying a higher similarity between the images.

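$$\text{SSIM}\left(\text{IM},\text{RI}\right)=\frac{(2{\mu}_{\text{IM}}{\mu}_{\text{RI}}+{c}_{1})(2{\sigma}_{\text{IMRI}}+{c}_{2})}{({{\mu}_{\text{IM}}}^{2}+{{\mu}_{\text{RI}}}^{2}+{c}_{1})({{\sigma}_{\text{IM}}}^{2}+{{\sigma}_{\text{RI}}}^{2}+{c}_{2})}$$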

(4)

µ is the average voxel value, σ2 is the variance, and σIMRI is the covariance of IM and RI. Additionally, c1 and c2 are variables used to stabilize the division and depend on the dynamic range of the pixel or voxel values [19].
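Eqs. 3 and 4 can be sketched in the same way on flattened images. Several choices below are assumptions rather than details taken from this study: the SSIM is evaluated once over the whole image (common implementations average it over local windows), population variances are used, and c1 and c2 are set as (0.01 L)² and (0.03 L)², where L is the dynamic range, which are the constants commonly used in the SSIM literature.

```python
import math


def psnr(im, ri):
    """Peak signal-to-noise ratio in dB (Eq. 3); infinite for identical images."""
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(im, ri)) / len(ri))
    if rmse == 0:
        return math.inf
    peak = max(max(im), max(ri))  # MAX: largest value in either image
    return 20 * math.log10(peak / rmse)


def ssim_global(im, ri, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (Eq. 4).

    Assumes the images are not both constant, so the dynamic range is non-zero.
    """
    n = len(ri)
    dynamic_range = max(max(im), max(ri)) - min(min(im), min(ri))
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_im = sum(im) / n
    mu_ri = sum(ri) / n
    var_im = sum((a - mu_im) ** 2 for a in im) / n
    var_ri = sum((b - mu_ri) ** 2 for b in ri) / n
    cov = sum((a - mu_im) * (b - mu_ri) for a, b in zip(im, ri)) / n
    numerator = (2 * mu_im * mu_ri + c1) * (2 * cov + c2)
    denominator = (mu_im ** 2 + mu_ri ** 2 + c1) * (var_im + var_ri + c2)
    return numerator / denominator


# Hypothetical four-voxel example.
reference = [10.0, 20.0, 30.0, 40.0]
image = [12.0, 18.0, 33.0, 37.0]

psnr_value = psnr(image, reference)
ssim_value = ssim_global(image, reference)
```

For identical images the SSIM equals 1 and the PSNR is infinite, which is why both measures increase as the CUSIP sets approach the 120P reference.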

Dosimetry

Kidney dosimetry was performed for the 16 patients in the test group, and for all reconstruction sets. Bone marrow dosimetry was performed for 15 patients, among whom 6 had confirmed bone metastases. One patient with severe bone metastatic involvement was excluded because of the strong influence of uptake in the metastases on the nearby bone marrow cavities. The dosimetry was based on reconstructed SPECT images at days 0, 1, 2, and 7 post-administration, and biexponential curve fits were used for the kinetics of the activity concentrations in segmented volumes of interest (VOIs). The kidney VOIs were manually delineated on CT images, and for the bone marrow, 4 ml spherical VOIs were used [20, 21] to mitigate the impact of the partial volume effect. The spherical VOIs were placed in the CT images inside vertebrae T9–L5 (the interval included in the field of view) and were manually modified in a few cases to avoid bone metastases or calcifications. Hemmingsson et al. have shown that the red marrow has a specific uptake of 177Lu-DOTATATE; hence, a volume fraction of 0.57 (mean of men and women, and of lumbar and thoracic vertebrae) was used to scale the activity concentration in the vertebra VOIs (which also contain yellow marrow and trabecular bone) [15]. The VOIs for each time point were used for all reconstruction sets. Calibration factors for each camera were used. For kidney dosimetry, specific recovery coefficients (RCs) for each reconstruction method were estimated to correct the kidney activity concentration for the partial volume effect. The RCs were calculated using MC simulations of raw data (120P) from a typical kidney VOI. The VOI was retrieved from a patient and filled with a uniform, known activity, and the MC simulations were executed using the patient’s CT images. The data were down-sampled to 30P, and the CUSIPs were used to compile the three CUSIP projection sets. All five projection sets were reconstructed with SARec OSEM, and the activity within the kidney VOI for each reconstruction set was established and compared to the known activity. For the bone marrow, the partial volume effect was disregarded as the sphere was placed in homogeneous surroundings. The time-integrated activity concentration was determined by integrating the curve-fitted biexponential function from time zero to infinity. When calculating the absorbed dose to the kidneys, local energy deposition of the electrons was assumed and the dose contribution from photons was disregarded. The absorbed fraction for the red bone marrow was set to 0.65 (mean of men and women) [15]. All dosimetric calculations were performed, and the respective figures produced, with MATLAB version 9.11.0 (R2021b) (The MathWorks Inc, Natick, Massachusetts, United States of America).
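Two of the steps above lend themselves to a short sketch: the integration of a biexponential fit from zero to infinity, and the recovery coefficient correction. The study's calculations were done in MATLAB; the Python below is only an illustration. The parameterisation A1·exp(−λ1·t) + A2·exp(−λ2·t), the division of the measured concentration by the RC, and all numerical values are assumptions.

```python
import math


def biexponential(t, a1, lam1, a2, lam2):
    """Activity concentration at time t for a biexponential fit."""
    return a1 * math.exp(-lam1 * t) + a2 * math.exp(-lam2 * t)


def time_integrated_concentration(a1, lam1, a2, lam2):
    """Analytic integral of the biexponential from t = 0 to infinity."""
    if lam1 <= 0 or lam2 <= 0:
        raise ValueError("both rate constants must be positive")
    return a1 / lam1 + a2 / lam2


def recovery_coefficient(measured_activity, known_activity):
    """Fraction of the known VOI activity recovered by a reconstruction."""
    return measured_activity / known_activity


def correct_partial_volume(measured_concentration, rc):
    """Scale a measured concentration by the recovery coefficient."""
    return measured_concentration / rc


# Hypothetical fit (time in hours): a fast uptake term with negative
# amplitude and a slow washout term.
FIT = (-0.5, 1.0, 1.0, 0.01)
analytic = time_integrated_concentration(*FIT)

# Trapezoidal check of the closed form over a long, finite time span.
STEP_H = 0.1
N_STEPS = 30000
values = [biexponential(k * STEP_H, *FIT) for k in range(N_STEPS + 1)]
numeric = STEP_H * (sum(values) - 0.5 * (values[0] + values[-1]))
```

The closed form A1/λ1 + A2/λ2 avoids numerical integration altogether, and the trapezoidal sum is included only as a consistency check of that expression.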

Statistical analysis

Statistical analyses were performed using IBM SPSS Statistics version 29 (IBM Corporation, Armonk, New York, USA). For the quantitative measures, one-way ANOVA for repeated measures was performed with adjustment for multiple comparisons (Bonferroni). A P value of < 0.05 was considered to indicate statistical significance. For the kidney and bone marrow dosimetry, dependent samples t-tests were used with Bonferroni adjustment of the significance level (P value < 0.0125) for 4 consecutive tests.
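The Bonferroni-adjusted threshold and the paired test statistic can be illustrated as follows. The analyses in this study were run in SPSS; this sketch only shows the arithmetic, with hypothetical function names and invented dose values, and it stops at the t statistic because the p value requires the t distribution, which the Python standard library does not provide.

```python
import math


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests


def paired_t_statistic(sample_a, sample_b):
    """t statistic and degrees of freedom for a dependent-samples t-test."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    return mean_diff / math.sqrt(var_diff / n), n - 1


adjusted_alpha = bonferroni_alpha(0.05, 4)  # four tests, as in the text

# Invented absorbed doses (Gy) for the same four patients under two
# reconstructions; a real comparison would use the study data.
doses_reference = [1.0, 2.0, 3.0, 4.0]
doses_compared = [0.5, 1.0, 2.5, 3.0]
t_value, dof = paired_t_statistic(doses_reference, doses_compared)
```

Dividing 0.05 by the 4 tests reproduces the significance level of 0.0125 stated above.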
