TAI-GAN: A Temporally and Anatomically Informed Generative Adversarial Network for early-to-late frame conversion in dynamic cardiac PET inter-frame motion correction

Dynamic cardiac positron emission tomography (PET) myocardial perfusion imaging is more accurate in detecting coronary artery disease than other non-invasive imaging procedures (Prior et al., 2012). A dynamic frame sequence is acquired over several minutes following the injection of the radioactive tracer rubidium-82 (82Rb), until the myocardium is sufficiently perfused. Regions of interest (ROIs) of the myocardial tissue and left ventricle blood pool (LVBP) are labeled on the reconstructed frames to collect time-activity curves (TACs) for subsequent kinetic modeling and the estimation of myocardial blood flow (MBF) and myocardial flow reserve (MFR). Quantification of MBF and MFR has demonstrated improved diagnostic and prognostic effectiveness (Sciagrà et al., 2021).
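
As a brief illustration of the kinetic modeling step, 82Rb TACs are commonly fitted with a one-tissue compartment model; the exact model is not stated in this section, so the following is a generic sketch rather than the paper's specific formulation:

```latex
% Generic one-tissue compartment model (an assumption; the exact kinetic
% model used in this work is not specified in this section):
\frac{dC_T(t)}{dt} = K_1\,C_a(t) - k_2\,C_T(t), \qquad
C_{\mathrm{PET}}(t) = (1 - f_v)\,C_T(t) + f_v\,C_a(t)
```

Here C_a(t) is the LVBP input TAC, C_T(t) the myocardial tissue TAC, K_1 and k_2 the rate constants, and f_v the fractional blood volume; MBF is then derived from K_1 via a tracer-specific extraction fraction correction, and MFR is computed as the ratio of stress to rest MBF.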

However, in dynamic cardiac PET, subject motion originating from respiratory, cardiac, and voluntary body movements can seriously impact both voxel-based and ROI-based MBF quantification (Hunter et al., 2016). On the one hand, intra-frame motion can cause blurring in reconstructed frames and incorrect activity measurements. On the other hand, inter-frame motion introduces spatial mismatch across the dynamic frames, resulting in attenuation mismatch, incorrect TAC measurements, and subsequent MBF estimation errors (Koshino et al., 2012, Hunter et al., 2016, Shi et al., 2021). Current inter-frame motion correction approaches for dynamic PET include external motion tracking systems (Noonan et al., 2015, Lu et al., 2018), data-driven motion estimation algorithms (Ren et al., 2017, Feng et al., 2017, Lu et al., 2019, Lu et al., 2020), and non-rigid registration using conventional optimization (Mourik et al., 2009, Jiao et al., 2014, Guo et al., 2022b, Sun et al., 2022b) or deep learning-based methods (Zhou et al., 2021, Guo et al., 2022d, Guo et al., 2022c, Zhou et al., 2023, Guo et al., 2023c). However, few studies have specifically addressed the challenges of inter-frame motion correction in dynamic cardiac PET with the rapid tracer kinetics of 82Rb. In the early phases of tracer perfusion, 82Rb concentrates first in the right ventricle blood pool (RVBP) and subsequently in the LVBP; in the late scans, 82Rb is well distributed in the myocardial tissue. This significant change in tracer distribution can substantially complicate inter-frame motion correction, since intensity-based frame registration typically depends on the resemblance between the two frames to be registered, which is particularly low between early and late frames (Lu and Liu, 2020, Shi et al., 2021). In 82Rb dynamic cardiac PET, most existing motion correction research and clinical software focuses exclusively on the later frames in the myocardial perfusion stage (Burckhardt, 2009, Woo et al., 2011, Rubeaux et al., 2017, Lu and Liu, 2020). A blood pool isolation strategy was proposed for the blood pool phase (Lee et al., 2016) but not for the late myocardium-phase frames. A frame registration method using normalized gradient fields instead of original intensities was proposed to narrow this gap, but unidentifiable boundaries in the transition-phase frames and the use of the blurred summed tissue-phase frame as the reference might introduce additional motion estimation errors (Lee et al., 2020). A supervised-learning motion correction framework was proposed for 82Rb dynamic cardiac PET under simulated translational motion (Shi et al., 2021), but it requires training two separate models to address the variation between early and late frames, which can be computationally expensive and inconvenient in clinical practice.

An alternative solution is frame conversion: generating mapped early frames whose tracer distribution resembles that of the corresponding late frame, so that standard motion correction methods can then be applied. Recent studies have indicated that modality conversion through image synthesis can facilitate optimization in multi-modality image registration by reducing it to an intra-modality problem (Iglesias et al., 2013, Xiao et al., 2020). For instance, image synthesis methods have been used to convert magnetic resonance (MR) to computed tomography (CT) images (Roy et al., 2014, Cao et al., 2018), MR T1-weighted (T1w) to T2-weighted (T2w) images (Chen et al., 2017, Liu et al., 2019), and MR to X-ray mammography (Maul et al., 2021); however, few studies involve PET images.

Such medical image synthesis tasks have largely utilized convolutional neural networks (CNNs) (Sevetlidis et al., 2016, Liu et al., 2019, Zhou et al., 2020). In molecular imaging, CNNs have been successfully implemented for the generation of attenuation maps (Shi et al., 2019b, Shi et al., 2022, Chen et al., 2022), cross-tracer images (Wang et al., 2021), and parametric Ki images (Miao et al., 2023), mostly deploying the structure of a 3-D U-Net (Çiçek et al., 2016). Generative adversarial networks (GANs) (Goodfellow et al., 2020) and their derivatives are a family of CNNs comprising a generator and a discriminator trained in competition under an adversarial loss. GAN-based image synthesis from MR T1w images has been implemented to generate CT images (Lei et al., 2019, Abu-Srhan et al., 2021), T2w images (Dar et al., 2019), and fluid-attenuated inversion recovery images (Yu et al., 2018). In nuclear medicine imaging, recent works have investigated attenuation map generation for single-photon emission computed tomography (SPECT) (Shi et al., 2020) and direct SPECT attenuation correction deploying conditional GANs (cGANs), a GAN variant conditioned on auxiliary information to learn a specific mapping. A 3-D cGAN with self-attention and spectral normalization mechanisms was proposed to synthesize brain PET from MR in an Alzheimer’s disease database (Lan et al., 2021). A cGAN with a visual information fidelity loss was developed to generate synthetic CT images for the attenuation correction of small-animal PET (Li et al., 2021). A short-to-long acquisition conversion module using a cGAN was included in a motion correction and reconstruction framework for accelerated PET (Zhou et al., 2023). In CT-less attenuation correction, one of the newest works utilizes a vanilla cGAN for SPECT-to-PET translation (Kawakubo et al., 2023); in PET motion correction, another recent work adopts a baseline U-Net as its network structure (Reimers et al., 2023). In myocardial perfusion SPECT denoising, current work deploys a vanilla GAN structure without awareness of temporal or anatomical information (Sun et al., 2022a). Many recent image translation studies in nuclear medicine imaging use image information as the input to U-Net-based or GAN-based models, without considering tracer dynamics.

In dynamic PET synthesis, the state-of-the-art work for improving inter-frame motion correction deployed a cGAN to convert low-count early frames to the high-count late frame in brain (Sundar et al., 2021a) and total-body (Sundar et al., 2021b) dynamic scans. In this way, both the low-count limitation and the distinct tracer uptake patterns of the early frames are addressed, and motion correction using the standard multi-scale mutual information method is successfully improved. However, this approach trains a one-to-one mapping between each specific early frame and the reference frame, which may not generalize well to new acquisitions and can be challenging to implement in clinical practice. Additionally, the backbone of the generator is a simple 3-D U-Net, and tracer kinetics and the related temporal analysis are not incorporated into network training. This method was originally developed on 2-deoxy-2-[18F]fluoro-D-glucose (FDG) PET scans, whereas 82Rb kinetics change substantially more rapidly than those of FDG, a potential limitation when their approach is directly applied to 82Rb scans.

Recently, feature-wise linear modulation (FiLM) (Perez et al., 2018) has been reported to be effective for encoding conditional information and for visual reasoning. FiLM layers have been incorporated into GAN models to encode text or semantic information for natural images, specifically in fashion image generation (Ak et al., 2019, Mao et al., 2019, Ak et al., 2020). A text-to-image synthesis model using a GAN with FiLM to incorporate text input was proposed (Tao et al., 2022). In 3D-aware image synthesis, FiLM was also used to encode positional information (Chan et al., 2021). In style transfer, recent works encode language guidance (Günel et al., 2018) as well as style and content latent codes (Kwon and Ye, 2021) into the networks. For image-to-image translation, AlBahar and Huang (2019) proposed a bi-directional feature transformation between the input image and the guidance. For medical images, a FiLM layer was proposed to process metadata for MRI generation (Rachmadi et al., 2019). In MRI harmonization, Ren et al. (2021) proposed employing a semantic extractor to process segmentation masks as the structural embedding for feature modulation. In MRI registration, Dey et al. (2021) proposed using a FiLM layer as a conditional embedding in a GAN for deformable template generation. However, such a structure has not been proposed for either tracer kinetics encoding or dynamic PET synthesis. Additionally, in terms of structural design, most current works apply FiLM to introduce conditional embeddings layer-wise, which can substantially increase model size and memory consumption.
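
For concreteness, a FiLM layer predicts a per-channel scale and shift from a conditioning vector and applies them to a feature map. The following is a minimal sketch in PyTorch; the class and variable names are illustrative and not taken from any particular codebase:

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Predicts per-channel scale (gamma) and shift (beta) from a
    conditioning vector and applies them to a 3-D feature map."""
    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        # One linear layer mapping the condition to 2*C modulation parameters.
        self.proj = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W) feature map; cond: (B, cond_dim) condition vector.
        gamma, beta = self.proj(cond).chunk(2, dim=1)      # each (B, C)
        gamma = gamma.view(*gamma.shape, 1, 1, 1)          # broadcast spatially
        beta = beta.view(*beta.shape, 1, 1, 1)
        return gamma * x + beta
```

Because the modulation is a single affine transform per channel, the conditioning cost is independent of the spatial resolution of the feature map being modulated.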

Thus, in this work, we introduce a new framework named Temporally and Anatomically Informed GAN (TAI-GAN), an all-to-one mapping approach that converts all early frames to the appearance of the last frame, which serves as the reference. To provide temporal information to the generator, we incorporate a FiLM layer that embeds channel-wise parameters generated from the temporal frame index and the blood pool TACs. To inform the network with auxiliary anatomical locators, we utilize rough segmentation masks of the RVBP, LVBP, and myocardium, with local shifts, as an additional image input channel. To our knowledge, TAI-GAN is the first frame conversion work for dynamic cardiac PET that addresses the challenges of both high tracer distribution variability and spatial mismatch by incorporating both temporal and anatomical information, handling the variations in tracer distribution over time while promoting accurate spatial alignment in the generated frames, for a novel application in nuclear medicine imaging. In contrast to layer-wise embedding techniques, our work applies FiLM only at the bottleneck of the U-Net, an innovative and memory-efficient form of feature modulation (see the sketch below). Using 5-fold cross-validation, we evaluated TAI-GAN in terms of frame conversion similarity, motion correction accuracy, and MBF quantification errors.
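
To illustrate the bottleneck-only design, below is a hedged sketch of how such a conditioned generator could be wired, reusing the FiLM module from the earlier sketch; the encoder/decoder modules, tensor shapes, and the layout of the conditioning vector (frame index plus sampled blood-pool TAC values) are our assumptions rather than the exact published architecture:

```python
import torch
import torch.nn as nn

class TemporallyConditionedUNet(nn.Module):
    """Sketch of a U-Net generator with anatomical masks as an extra input
    channel and FiLM conditioning applied only at the bottleneck
    (hypothetical layout, not the published implementation)."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module,
                 cond_dim: int, bottleneck_channels: int):
        super().__init__()
        self.encoder = encoder  # assumed to return (bottleneck, skip_features)
        self.decoder = decoder  # assumed to consume (bottleneck, skip_features)
        self.film = FiLM(cond_dim, bottleneck_channels)  # from the earlier sketch

    def forward(self, early_frame, masks, cond):
        # early_frame: (B, 1, D, H, W) early dynamic frame.
        # masks: (B, 1, D, H, W) rough RVBP/LVBP/myocardium labels (locally shifted).
        # cond: (B, cond_dim) vector built from the frame index and blood-pool TACs.
        x = torch.cat([early_frame, masks], dim=1)  # anatomical locators as a channel
        bottleneck, skips = self.encoder(x)
        bottleneck = self.film(bottleneck, cond)    # temporal conditioning, here only
        return self.decoder(bottleneck, skips)      # late-frame-like synthesis
```

Conditioning only the lowest-resolution features keeps the added parameter count to a single linear projection, rather than one projection per resolution level as in layer-wise FiLM designs.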
