MRI recovery with self-calibrated denoisers without fully-sampled data

Plug-and-play based recovery

MRI reconstruction entails estimating the underlying image from noisy and potentially undersampled k-space measurements. The measured noisy k-space data are related to the image via

$$\begin{aligned} \varvec{y} = \varvec{A}\varvec{x} + \varvec{n}, \end{aligned}$$

(1)

where \(\varvec{x}\in {\mathbb {C}}^N\) is a vectorized N-pixel image, \(\varvec{y}\in {\mathbb {C}}^M\) is the pre-whitened and scaled pMRI data from C receive coils, \(\varvec{n}\in {\mathbb {C}}^M\) is circularly symmetric zero-mean white Gaussian noise with known variance \(\sigma ^2\) that depends on the data scaling, and \(\varvec{A}\in {\mathbb {C}}^{M\times N}\) is a known forward operator that subsumes coil sensitivity maps, discrete Fourier transform, and k-space undersampling.
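As an illustration of Eq. 1, the forward operator \(\varvec{A}\) can be composed from coil sensitivity weighting, a Fourier transform, and k-space sampling. The following is a minimal NumPy sketch under the assumption of 2D Cartesian sampling; the function name `forward_op` and the simulation sizes are illustrative, not taken from the paper.

```python
import numpy as np

def forward_op(x, maps, mask):
    """Apply A = M F S: coil sensitivity weighting (S), a centered 2D DFT (F),
    and k-space undersampling (M).

    x:    (ny, nx) complex image
    maps: (C, ny, nx) coil sensitivity maps
    mask: (ny, nx) boolean sampling mask
    Returns the retained k-space samples, shape (C, n_sampled).
    """
    coil_imgs = maps * x                                        # S x
    ksp = np.fft.fftshift(
        np.fft.fft2(np.fft.ifftshift(coil_imgs, axes=(-2, -1)), norm="ortho"),
        axes=(-2, -1),
    )                                                           # F S x
    return ksp[:, mask]                                         # M F S x

# Simulate y = A x + n with circularly symmetric white Gaussian noise
rng = np.random.default_rng(0)
ny = nx = 32
C = 4
x = rng.standard_normal((ny, nx)) + 1j * rng.standard_normal((ny, nx))
maps = rng.standard_normal((C, ny, nx)) + 1j * rng.standard_normal((C, ny, nx))
mask = rng.random((ny, nx)) < 0.25                              # roughly 4x undersampling
sigma = 0.01
y = forward_op(x, maps, mask)
y = y + sigma * (rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)) / np.sqrt(2)
```

Here M equals the number of retained k-space samples times the number of coils, matching the dimensions stated above.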

To reduce acquisition time, the k-space data are often prospectively undersampled to achieve an acceleration rate \(R\triangleq \frac{CN}{M} > 1\). At high acceleration rates, the problem in Eq. 1 becomes ill-posed. In that case, a common remedy is to inject prior knowledge about \(\varvec{x}\) using a regularizer, resulting in an optimization problem of the form

$$\begin{aligned} \hat{\varvec{x}} = \mathop {\mathrm{arg\,min}}\limits _{\varvec{x}} \frac{1}{2}\Vert \varvec{y} - \varvec{A}\varvec{x} \Vert ^2_2 + \mathcal {R}(\varvec{x}), \end{aligned}$$

(2)

where the first term enforces data consistency and \(\mathcal {R}(\cdot )\) represents the regularizer. For CS-based MRI reconstruction, popular choices for \(\mathcal {R}(\varvec{x})\) include total variation [33] and \(\lambda \Vert \varvec{T} \varvec{x}\Vert _1\), where \(\varvec{T}\) represents a linear sparsifying transform, and \(\lambda > 0\) controls the regularization strength [3]. It has been shown that a simple sparsity-based regularizer may not fully capture the rich structure in medical images [34]. To leverage more complex priors, Venkatakrishnan et al. proposed an algorithmic framework called plug-and-play (PnP) [35]. In PnP, an off-the-shelf image denoiser, \(\varvec{f}(\cdot )\), is called within an iterative algorithm, e.g., primal-dual splitting (PDS) [36]. A PDS-based implementation of PnP is given in Algorithm 1. In the subsequent sections, we will use this algorithm as a starting point to explain ReSiDe. Note that the implementation of PnP or ReSiDe is not limited to PDS and can be carried out using other algorithms [37].
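The structure of a PDS-based PnP loop can be sketched as follows. This is a generic Chambolle-Pock-style primal-dual iteration in which the proximal step of the regularizer is replaced by a denoiser call; it is a simplified stand-in for Algorithm 1, not the paper's exact implementation, and the step-size choices `mu` and `gam` are ours.

```python
import numpy as np

def pnp_pds(y, A, denoise, T=50, gam=1.0):
    """Plug-and-play reconstruction via primal-dual splitting.

    The proximal operator of the regularizer is replaced by a generic
    denoiser `denoise(x)`, as in the PnP framework. `A` is a matrix here
    for simplicity; in MRI it would be the forward operator of Eq. 1.
    """
    mu = 0.99 / (gam * np.linalg.norm(A, 2) ** 2)  # ensure mu*gam*||A||^2 < 1
    x = A.conj().T @ y                             # adjoint initialization
    v = np.zeros_like(y)                           # dual variable
    x_bar = x.copy()
    for _ in range(T):
        v = (v + gam * (A @ x_bar - y)) / (1 + gam)   # dual (data-consistency) step
        x_new = denoise(x - mu * (A.conj().T @ v))    # primal step: prox -> denoiser
        x_bar = 2 * x_new - x                         # over-relaxation
        x = x_new
    return x
```

With the identity map as the "denoiser" (no regularization), the iteration converges to the least squares solution, which is a convenient sanity check.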

Algorithm 1

PnP implemented using PDS

ReSiDe-S

ReSiDe-S is a scan-specific technique that operates on a single set of undersampled measurements, \(\varvec{y}\). A PDS-based implementation of ReSiDe-S is given in Algorithm 2. ReSiDe-S differs from PnP in the way the denoiser \(\varvec{f}(\cdot )\) is trained. The denoiser used in PnP is either generic or pre-trained on an additional training dataset. In contrast, the denoiser in ReSiDe-S is progressively trained from the image being recovered. This is akin to blind compressed sensing [38], where both the image and the sparsifying transform are simultaneously learned during the reconstruction process. Following the work by Xu et al. [17], we propose training a DL-based denoiser by adding synthetic noise to the image being recovered. The denoiser training process is described in Line 3 of Algorithm 2. In summary, \(\varvec{x}_t\) is an intermediate image at iteration t, and \(\mathcal {P}_i[\varvec{x}_t]\) represents the \(i^\text {th}\) patch from \(\varvec{x}_t\). For training purposes, \(\mathcal {P}_i[\varvec{x}_t]\) and \(\mathcal {P}_i[\varvec{x}_t] + \mathcal {N}(\varvec{0}, s_{t}^2 \varvec{I})\) act as “clean” and “noisy” patches, respectively. Here, \(\mathcal {N}(\varvec{0}, s_{t}^2 \varvec{I})\) represents complex-valued zero-mean white Gaussian noise with variance \(s_{t}^2\). The denoiser \(\varvec{f}(\cdot ; \varvec{\theta })\), parameterized by \(\varvec{\theta }\), is trained using \(P\ge 1\) patches in a supervised fashion by minimizing the loss \(\mathcal {L}(\cdot , \cdot )\), which measures the difference between the denoiser output and the clean patches. Once the denoiser is trained, it is used to denoise the intermediate image \(\varvec{x}_t\) (Line 4 in Algorithm 2), this time without the added noise. The process of training and applying the denoiser is repeated in each of T iterations or until some convergence criterion is reached.
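The construction of noisy/clean training pairs in Line 3 of Algorithm 2 can be sketched as follows; `make_patch_pairs` is a hypothetical helper name, and the random patch placement mirrors the description above.

```python
import numpy as np

def make_patch_pairs(x_t, s2, P=16, psize=8, rng=None):
    """Build (noisy, clean) training pairs from the current estimate x_t.

    P patches of size psize x psize are drawn at random locations; complex
    circularly symmetric white Gaussian noise of variance s2 is added to
    form the noisy inputs (cf. Line 3 of Algorithm 2).
    """
    rng = np.random.default_rng(rng)
    ny, nx = x_t.shape
    clean = np.empty((P, psize, psize), dtype=complex)
    for i in range(P):
        r = rng.integers(0, ny - psize + 1)
        c = rng.integers(0, nx - psize + 1)
        clean[i] = x_t[r:r + psize, c:c + psize]
    # variance s2 split evenly between the real and imaginary components
    noise = np.sqrt(s2 / 2) * (rng.standard_normal(clean.shape)
                               + 1j * rng.standard_normal(clean.shape))
    return clean + noise, clean
```

A supervised denoiser would then be fit to map the noisy patches back to the clean ones, e.g., under a mean-squared-error loss.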

The strength of the denoiser is controlled by \(s_{t}^2\), which plays a role similar to that of the regularization strength in CS, i.e., larger \(s_{t}^2\) leads to smoother images and smaller \(s_{t}^2\) leads to noisier but sharper images. As evident from our preliminary results [32], the value of \(s_{t}^2\) also controls the speed of convergence, with larger values preferred in the earlier iterations to speed up convergence and smaller values preferred in later iterations to avoid over-smoothing of the recovered images. To adapt the value of \(s_{t}^2\) over iterations, we propose using Morozov’s discrepancy principle [39, 40] (Line 6 and Line 7 in Algorithm 2). The discrepancy principle is based on the assumption that the squared residual norm, \(\Vert \varvec{A}\varvec{x}_t - \varvec{y}\Vert _2^2\), monotonically increases with \(s_{t}^2\), i.e., more aggressive denoising takes the image away from the least squares solution. By exploiting this monotonic relationship, the discrepancy principle updates \(s_{t}^2\) such that the value of \(\Vert \varvec{A}\varvec{x}_t - \varvec{y}\Vert _2^2\) approaches \(M\sigma ^2\), which is the expected value of the squared \(\ell _2\)-norm of \(\varvec{n}\). This is accomplished by using \(\frac{M\sigma ^2}{\Vert \varvec{A}\varvec{x}_t - \varvec{y}\Vert _2^2}\) as a multiplicative correction term, \(c_t\). In each iteration, the value of \(s_t^2\) is multiplied with \(c_t\) to drive the ratio \(\frac{M\sigma ^2}{\Vert \varvec{A}\varvec{x}_t - \varvec{y}\Vert _2^2}\) toward one. The value of \(\alpha >0\) (Line 6 in Algorithm 2) controls the contribution of the corrective term, with larger values leading to a more rapid adjustment of \(s_t^2\). Optionally, a user-defined scalar \(\tau > 0\) can be used to provide further control over the final strength of the denoiser, with smaller values generating noisy but sharper images. Note that adjusting \(s_t^2\) directly is much more challenging, as it needs to be varied over iterations, whereas the fixed value of \(\tau\) can be selected once and then kept constant for a given MRI application.
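One plausible realization of the discrepancy-principle update (Lines 6 and 7 of Algorithm 2) is sketched below. The paper specifies that \(c_t\) multiplicatively corrects \(s_t^2\), that \(\alpha\) damps the correction, and that \(\tau\) scales the final denoiser strength; the exact way \(\alpha\) and \(\tau\) enter here (an exponent on \(c_t\) and a factor inside the target, respectively) is our assumption.

```python
import numpy as np

def update_noise_var(s2, Ax, y, sigma2, alpha=0.1, tau=1.0):
    """Discrepancy-principle update of the training-noise variance s_t^2.

    c_t = tau * M * sigma^2 / ||A x_t - y||^2 pushes the squared residual
    toward its expected value M*sigma^2; alpha damps the multiplicative
    correction (here via c_t ** alpha, one plausible form), and tau < 1
    yields a weaker final denoiser (noisier but sharper images).
    """
    M = y.size
    resid2 = np.linalg.norm(Ax - y) ** 2
    c = tau * M * sigma2 / resid2
    return s2 * c ** alpha
```

When the residual already matches its expected value, the update leaves \(s_t^2\) unchanged; when the residual is too large (over-smoothing), \(s_t^2\) is reduced.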

Algorithm 2

ReSiDe-M

There are two major limitations of ReSiDe-S. First, it requires de novo training of a denoising network in each iteration, which is time-consuming and thus unrealistic for clinical deployment. Second, ReSiDe-S is a scan-specific method and strictly operates on a single set of undersampled measurements. However, for most MRI applications, more than one set of undersampled measurements is generally available. To reduce computation time at inference and to leverage the availability of multiple sets of undersampled measurements, we propose ReSiDe-M, which is outlined in Algorithm 3. Here, a tilde over a symbol indicates a joint representation of \(K \ge 1\) sets of measurements. For example, \(\tilde{\varvec{A}}\), \(\tilde{\varvec{y}}\), and \(\tilde{\sigma }^2\) indicate the forward operator, k-space measurements, and average noise variance, respectively, from K sets of measurements.

ReSiDe-M is implemented in two stages: training and inference. The training stage is similar to ReSiDe-S, where both image recovery and denoiser training happen in tandem. However, in contrast to ReSiDe-S, the denoiser training in ReSiDe-M is performed using multiple undersampled datasets. For \(K=1\), the training phase of ReSiDe-M is identical to ReSiDe-S. Although the training stage of ReSiDe-M can reconstruct images, its main objective is to store the resulting sequence of denoisers, parameterized by \(\{\varvec{\theta }_t\}_{t=1}^T\). The second stage of ReSiDe-M is inference, where an unseen undersampled dataset is reconstructed using a PnP algorithm, with the denoising in the \(t^{\text {th}}\) iteration of PnP (Line 11 in Algorithm 3) performed using the denoiser \(\varvec{f}(\,\cdot ;\varvec{\theta }_t)\) trained in the first stage. The computational complexity of ReSiDe-M at the inference stage is similar to that of sparsity-based iterative CS methods. A high-level description of ReSiDe-M is given in Fig. 1.
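The inference stage can be sketched as a PnP loop that simply replays the stored denoisers, one per iteration. The function name `reside_m_inference` is hypothetical, the PDS step sizes are illustrative, and `A`/`Ah` stand for the forward operator and its adjoint.

```python
import numpy as np

def reside_m_inference(y, A, Ah, denoisers, gam=1.0, mu=0.2):
    """Inference stage of ReSiDe-M: a PnP-PDS loop in which the denoising
    at iteration t calls the t-th stored denoiser f(.; theta_t) from the
    training stage (cf. Line 11 of Algorithm 3) -- no further training.

    A, Ah: callables implementing the forward operator and its adjoint.
    denoisers: sequence of callables, one per iteration.
    Requires mu * gam * ||A||^2 <= 1 for convergence of the PDS loop.
    """
    x = Ah(y)                                        # adjoint initialization
    v = np.zeros_like(y)
    x_bar = x.copy()
    for f_t in denoisers:                            # one stored denoiser per iteration
        v = (v + gam * (A(x_bar) - y)) / (1 + gam)   # data-consistency step
        x_new = f_t(x - mu * Ah(v))                  # denoising with stored network
        x_bar = 2 * x_new - x
        x = x_new
    return x
```

Because the denoisers are fixed at inference, the per-iteration cost reduces to one forward/adjoint pair plus a network evaluation, consistent with the complexity claim above.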

Algorithm 3

Fig. 1

A high-level description of ReSiDe-M. In the training stage (a), a convolutional neural network (CNN)-based denoiser is trained on patches from intermediate images \(\tilde{\varvec{x}}_t\) (Line 3 in Algorithm 3). The resulting sequence of denoisers is stored. In the inference stage (b), the reconstruction is performed using a PnP method, which conceptually alternates between data consistency and denoising steps. The denoising in b is performed by calling the denoisers from (a), without further training

Study I–brain MRI

In this study, ReSiDe-S and ReSiDe-M were evaluated on T1- and T2-weighted images from the fastMRI dataset [5]. For each contrast, twenty-two sets of multi-coil measurements were used. All T1 images were cropped to \(320\times 320\), and all T2 images were cropped to \(384\times 304\). Since the noise pre-scan was not available for the fastMRI data, noise pre-whitening was not applied. The multi-coil k-space data were compressed to eight virtual coils [41]. The data were retrospectively downsampled at \(R=4\) using two realistic Cartesian sampling masks, i.e., a 1D pseudo-random mask with a 32-line wide ACS region (M1) and a 1D random mask with a 32-line wide ACS region (M2). The M1 mask was kept fixed across training and testing sets of measurements, while a different M2 mask was randomly drawn for each training and testing instance. The coil sensitivity maps were estimated using ESPIRiT [42]. Sixteen out of the 22 sets of measurements were used for network training in ReSiDe-M and SSDU. Five measurement sets were used for performance evaluation and for comparing ReSiDe-S and ReSiDe-M with CS, PnP-BM3D, ConvDecoder [43], and SSDU [25]. The CS reconstruction employed \(\ell _1\) regularization in the wavelet domain and was implemented using SigPy [44]. The PnP-BM3D reconstruction was implemented using Algorithm 1, with \(\varvec(\cdot )\) in Line 3 of Algorithm 1 representing a call to the BM3D denoiser [45]. To benchmark against a well-established supervised learning method, we further compare ReSiDe-M with a VarNet model pre-trained on brain fastMRI data using rate-4 equispaced (ES) sampling and 8% ACS, resulting in a net acceleration rate of 3.2 [7].
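For concreteness, a 1D random Cartesian mask with a fully sampled ACS region, in the spirit of masks M1/M2, might be generated as follows; the exact line distributions used in the study may differ, and the function name is ours.

```python
import numpy as np

def mask_1d_random_acs(ny, R=4, acs=32, rng=None):
    """1D Cartesian sampling mask over ny phase-encode lines.

    A fully sampled `acs`-line center is kept, and additional lines are
    drawn uniformly at random outside it so that ny / R lines are
    retained in total (net acceleration rate R).
    """
    rng = np.random.default_rng(rng)
    mask = np.zeros(ny, dtype=bool)
    lo = ny // 2 - acs // 2
    mask[lo:lo + acs] = True                   # ACS region
    n_keep = ny // R - acs                     # remaining lines to sample
    outside = np.flatnonzero(~mask)
    mask[rng.choice(outside, size=n_keep, replace=False)] = True
    return mask
```

Drawing a fresh mask per instance (as done for M2) amounts to calling this with a different seed for each training and testing measurement set.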

For each method, one set of measurements was used for manual parameter tuning to maximize peak SNR (PSNR). These parameters included: the regularization strength, \(\lambda\), for CS; the denoising level for BM3D in PnP-BM3D; the number of iterations and input size for ConvDecoder; the cardinality of the loss mask and number of epochs for SSDU; and the parameters \(\alpha\) and \(\tau\), the number and size of patches, and the total number of iterations, T, for ReSiDe-S and ReSiDe-M. For CS, the reconstruction process was repeated with the regularization set at \(\lambda /2\) and \(2\lambda\) to assess the impact of regularization strength on image quality.

Study II–MRXCAT perfusion phantom

Twenty-two sets of perfusion image series from the MRXCAT digital phantom were considered in this study [46]. Each set of measurements was cropped to \(112 \times 168\) pixels, with 32 frames and 4 receive coils. All data were retrospectively downsampled at \(R=4\) using a 1D pseudo-random Cartesian sampling mask (M3) [47]. Due to the interleaving nature of M3, the ACS region was not acquired for individual frames, and the fully sampled time-averaged k-space was used to estimate coil sensitivity maps. To simulate realistic data, circularly symmetric zero-mean white Gaussian noise was added to k-space measurements to generate a signal-to-noise ratio of approximately 12 dB. Sixteen sets of measurements were used to train ReSiDe-M, and five sets of measurements were used for performance evaluation and for comparing ReSiDe-S and ReSiDe-M with CS and PnP-BM4D. The CS reconstruction employed \(\ell _1\) regularization in the spatiotemporal wavelet domain [48]. The PnP-BM4D reconstruction was implemented using Algorithm 1, with \(\varvec(\cdot )\) in Line 3 of Algorithm 1 representing a call to the BM4D denoiser [49]. As described in the previous study, one set of measurements was used to manually optimize free parameters in each method.

Study III–first-pass perfusion

This study included 22 first-pass perfusion image series from patients clinically referred for a cardiac MRI exam at our institute. All measurements were performed on a commercial 1.5T scanner (MAGNETOM Sola, Siemens Healthcare, Erlangen, Germany) with a fast low angle shot (FLASH) sequence using an echo-planar imaging (EPI) readout. The data were collected in three different views, i.e., short-axis, two-chamber, and four-chamber views. The other imaging parameters were: flip angle 25 degrees, temporal footprint 75.48 to 99.36 ms, matrix size \(144\times 108\) to \(144\times 144\), field of view \(360 \times 270\) to \(420 \times 380\) mm\(^2\), echo train length 4, echo spacing 6.06 to 6.29 ms, slice thickness 8 to 10 mm, and 60 frames. The images were prospectively undersampled in the \(k_x\)-\(k_y\) domain at an acceleration rate of two, with uniform undersampling that was interleaved across time. Noise pre-whitening was applied using the noise pre-scan, and the multi-coil k-space data were compressed to eight virtual coils [41]. Sixteen sets of measurements were used to train ReSiDe-M, and five sets of measurements were used for performance evaluation and comparison with PnP-BM4D and CS with \(\ell _1\) regularization in the spatiotemporal wavelet domain [48]. Again, one set of measurements was used to manually optimize free parameters.

Quality assessment

In Studies I and II, where the fully sampled reference was available, image quality was assessed using the structural similarity index (SSIM) and PSNR in dB, defined as \(20\log _{10}\left( \sqrt{N}\,|\varvec{x}|_\text {max} / \Vert \varvec{x} - \hat{\varvec{x}}\Vert _2\right)\), with \(|\varvec{x}|_{\text {max}}\) being the maximum absolute value in \(\varvec{x}\). For Study III, where the fully sampled reference was not available, each perfusion image series was blindly scored by three expert reviewers, including two cardiologists, each with more than ten years of experience in cardiac MRI. Each image series, presented as a movie, was scored on a five-point Likert scale (1: non-diagnostic, 2: poor, 3: adequate, 4: good, 5: excellent) in terms of overall image quality.
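The PSNR definition above translates directly into code; the function name `psnr_db` is ours.

```python
import numpy as np

def psnr_db(x, x_hat):
    """PSNR in dB, as defined in the text:
    20*log10( sqrt(N) * max|x| / ||x - x_hat||_2 ), with N the pixel count."""
    err = np.linalg.norm(x - x_hat)
    return 20 * np.log10(np.sqrt(x.size) * np.abs(x).max() / err)
```

For example, a uniform error of 0.1 on a unit-magnitude 100-pixel image yields a PSNR of 20 dB.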

Implementation of ReSiDe

In Study I, \(P=576\) randomly positioned patches and \(P=2,\!306\) patches were used to train the denoiser in ReSiDe-S and ReSiDe-M, respectively. For ReSiDe-M, the \(2,\!306\) patches were evenly distributed across the 16 training images. The patch size was fixed at \(64 \times 64\). For Studies II and III, \(P=288\) randomly positioned patches and \(P=4,\!608\) patches were used to train the denoiser in ReSiDe-S and ReSiDe-M, respectively. For ReSiDe-M, the \(4,\!608\) patches were again evenly distributed across the 16 training image series. The patch size was fixed at \(64 \times 64 \times 20\). In all three studies, the locations of the patches were randomly shuffled from one epoch to the next. The mean squared error was used as the cost function to train the denoiser. The real and imaginary parts were split into two channels. The architecture of the denoiser is shown in Fig. 2. Each convolutional layer had 128 kernels of size \(3 \times 3\) for Study I and size \(3 \times 3 \times 3\) for Studies II and III. We used the Adam optimizer, with one fixed learning rate for Study I and another for Studies II and III. The training in ReSiDe-M was performed on an NVIDIA RTX 2080 Ti for Study I and an NVIDIA RTX 3090 for Studies II and III. For Study I, the measurement noise variance \(\sigma ^2\) was approximated from the outer fringes of k-space. For Study II, the noise was synthetically added; therefore, the value of \(\sigma ^2\) was precisely known. For Study III, the value of \(\sigma ^2\) was inferred from the data scaling factor applied after noise pre-whitening. The value of \(\tau\) was set at 0.65, 0.9, and 1.15 for Studies I, II, and III, respectively, and the value of \(\alpha\) was set at 0.1 for all studies. The value of \(s_0^2\) was set such that the initial SNR for the denoiser training was 5 dB. The number of iterations, T, was set at 80 for ReSiDe-S and ReSiDe-M in all three studies.
Within each iteration, the denoiser was trained for a total of ten epochs. The code for ReSiDe-S and ReSiDe-M is available at https://github.com/OSU-MR/reside.

Fig. 2

The denoiser architecture used in ReSiDe-S and ReSiDe-M. Here, “conv” represents a 2D convolutional layer for Study I and a 3D convolutional layer for Studies II and III
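The two-channel handling of complex images described in "Implementation of ReSiDe" can be sketched as below; the conv stack of Fig. 2 (128 kernels of size \(3 \times 3\) or \(3 \times 3 \times 3\) per layer) would operate between these two helpers, whose names are ours.

```python
import numpy as np

def complex_to_channels(x):
    """Split a complex image into a 2-channel real array (real, imag),
    the input format used for the CNN denoiser in Fig. 2."""
    return np.stack([x.real, x.imag], axis=0)

def channels_to_complex(x2):
    """Merge the denoiser's 2-channel real output back into a complex image."""
    return x2[0] + 1j * x2[1]
```

The round trip is lossless, so the channel split only changes the data layout, not the image content.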
