Progressive 3D biomedical image registration network based on deep self-calibration

Introduction

The development of high-resolution light microscopy, sparse labeling techniques, neuronal tracking methods, and several advanced microscopic imaging pipelines, such as fluorescence micro-optical sectioning tomography (fMOST), light-sheet fluorescence microscopy (LSFM), and serial two-photon tomography (STPT), has made it possible to map the entire mammalian brain at single-cell resolution. However, biological scans captured by different imaging pipelines at different locations or periods show certain differences in voxel intensity, image texture, and brain anatomy, which makes it difficult to explore biological working mechanisms from multiple sources of information. In order to make full use of these precious data resources, it is important to align the anatomical structures of these scans into one coordinate system by image registration. In particular, deformable image registration (DIR) can handle the local non-linear deformation of biological organs and has become an important technology in biomedical image processing and analysis.

A practical image registration pipeline generally includes a global linear registration and a local non-linear registration. The global linear registration usually consists of a series of linear transformations, such as scaling, translation, and affine transformation, to align the main structures of the images. The remaining local non-linear deformations are then aligned by DIR algorithms. In the past few years, researchers have proposed many DIR methods. Traditional registration methods such as Elastix (Klein et al., 2010) and ANTs (Avants et al., 2009) optimize each pair of images by continuous iteration while promoting the smoothness of the registration mapping. They are effective, and registration tasks for various biological organs can be completed without training specific models. Recently, with the widespread application of deep learning in the field of computer vision, deep learning-based biomedical image registration algorithms have become a hot and attractive research direction (Litjens et al., 2017; Haskins et al., 2020). Current deep learning-based biomedical image registration algorithms can be divided into supervised and unsupervised learning methods based on whether training labels (ground truth) are required. The accuracy of supervised registration methods improves with the accuracy of the supervised labels. However, labeling biomedical images is usually time-consuming and laborious, which has promoted the development of unsupervised learning methods.

Unsupervised registration methods usually utilize training data after global linear registration to learn a local non-linear displacement vector field (Balakrishnan et al., 2018; Zhang, 2018; Qu et al., 2021), which typically describes small local deformations. Most existing methods cannot handle large deformations due to the smoothness constraints imposed on the displacement vector field during the training phase. Although some cascaded registration methods (Cheng et al., 2019; Zhao et al., 2019a,b) and multi-scale registration methods (Kim et al., 2021; Shao et al., 2022) have been proposed to alleviate this issue, they still have problems. For example, the multi-stage alignment process of cascaded methods may cause serious information loss. Figure 1 shows the registration process of a three-stage cascaded registration network. The moving image is aligned onto the fixed image step by step through three registration networks, by which the large deformation is decomposed into three smaller ones. Although the large deformation problem is alleviated by this method, the information loss caused by interpolation is unavoidable at each stage of the warping process. As shown in Figure 1, as the number of interpolations increases, the structural boundary of the warped image in the orange box becomes more and more blurred. Furthermore, the multi-stage cascaded approach also tends to accumulate errors. In addition, existing progressive registration methods (Cheng et al., 2019; Zhou et al., 2020; Kim et al., 2021; Zhang et al., 2021) integrate the displacement vector fields by direct addition. However, the same position on two adjacent displacement vector fields may not describe the displacement of the same corresponding point, so it is unreasonable to obtain the total deformation field by directly summing multiple deformation fields.

FIGURE 1


Figure 1. The example of three-stage cascaded network registration methods. Three different colored modules represent registration models with the same structure but no shared parameters.

In this article, we aim to leverage the progressive registration idea of cascaded networks to deal with large deformations without causing information loss and increasing network parameters. Specifically, we propose a progressive registration network based on a deep self-calibration strategy. The main contributions of this article can be summarized as follows:

(1) We propose a novel progressive registration strategy to reduce the cascaded information loss, in which an iterative registration strategy is implemented to decompose large deformation into small ones in multiple iterations. To avoid the cascaded information loss, we design a new displacement vector field integration algorithm to integrate the learned displacement vector field in each iteration into the total displacement vector field, and then the finally warped image can be generated with this total displacement vector field in one interpolation process.

(2) We propose a new hierarchical registration strategy to achieve a fast and coarse-to-fine progressive registration to improve registration accuracy. The progressive registration strategy is implemented on a cascaded network with one low-resolution and one original-resolution network. With this hierarchical registration deployment, the images can be aligned on multiple scales to improve registration accuracy. Moreover, the low-resolution deformation field quickly learned by the low-resolution network can be upsampled and used as the initial value for the high-resolution network. Since the coarse-grained alignment of images is already done by the low-resolution network, using this deformation field can greatly improve the convergence speed of the high-resolution network.

(3) The proposed progressive registration strategy can generate abundant training data during the training phase. By setting the number of iterations, the amount of training data generated can be controlled to achieve a dynamic data augmentation, thus effectively alleviating the limitations caused by the lack of training data.

The rest of the article is organized as follows. We review the related works about deformable biomedical image registration in Section Related works, followed by specifically introducing our proposed method and the detailed network structures in Section Method. Then the datasets, comparison methods, implementation, and evaluation metrics are described in Section Experiments configurations. The results are presented and analyzed in Section Results and analysis. Finally, Section Conclusion gives the conclusion of this work.

Related works

Traditional image registration methods

Traditional image registration methods here mainly refer to non-learning-based algorithms, including intensity-based methods and landmark-based methods. Several popular non-learning-based methods have been designed for deformable registration, such as elastic mapping (Bajcsy and Kovačič, 1989), Demons (Thirion, 1998), and HAMMER (Shen and Davatzikos, 2002). Diffeomorphic methods have also made significant achievements in different computational tasks while preserving topology, such as large deformation diffeomorphic metric mapping (Oishi et al., 2009; Zhang et al., 2017) and symmetric normalization (SyN) (Avants et al., 2008). Moreover, there are many existing biomedical registration pipelines, such as aMap (Niedworok et al., 2016), ClearMap (Renier et al., 2016), MIRACL (Goubran et al., 2019), ANTs (Avants et al., 2009), Elastix (Klein et al., 2010), and BIRDS (Wang et al., 2021).

Most of the abovementioned methods solve the registration task by iteratively exploring the space of potential transformation parameters based on a predefined objective function, which is computationally intensive and time-consuming; this characteristic also prevents these methods from being used in real-time clinical applications.

Learning-based image registration methods

Supervised registration methods

Supervised learning-based registration methods need real or generated deformation fields as training labels to guide the network training. Dosovitskiy et al. (2015) proposed the FlowNet for 2D MRI brain registration, which utilizes statistical appearance models to generate ground truth to guide network training. Cao et al. (2017) extracted the patches of image pairs and used the spatial transformation relationship of corresponding patches generated by SyN (Avants et al., 2008) as labels to train the network. Ito and Ino (2018) utilized a convolutional neural network (CNN) to learn plausible deformations for ground truth generation. Fan et al. (2019) proposed the BIRNet which uses hierarchical and dual-supervised learning to predict the deformation field for 3D brain MR registration. Ni et al. (2021) proposed the DeepMapi by designing a sub-feedback strategy and a hierarchical registration strategy for 3D fMOST mouse brain registration.

While supervised learning methods have achieved considerable registration accuracy under the guidance of ground truth and high-quality synthetic labels, the difficulty of collecting label information greatly restricts their application.

Unsupervised registration methods

The proposal of the spatial transformer network (STN) (Jaderberg et al., 2015) has enabled the rapid development of unsupervised registration methods. Balakrishnan et al. (2018) proposed the VoxelMorph framework to implement a fast unsupervised end-to-end registration network. Zhao et al. (2019b) developed the VTN, which achieves both end-to-end affine and non-rigid registration by cascading affine subnetworks. Zhao et al. (2019a) designed the recursive cascaded network, which improves registration performance by sequentially warping the image pairs through multiple cascaded sub-registration networks. Cheng et al. (2019) employed a U-Net with cascaded separable convolutions to handle large and small deformations step by step. Zhang (2018) proposed the ICNet, which adds an inverse-consistent constraint and an anti-folding constraint to the loss function of the network in order to avoid local overlapping of deformation fields. Kim et al. (2021) devised CycleMorph, which utilizes topology consistency before and after registration to constrain the network and adopts a multi-scale, global-to-local registration strategy. Qu et al. (2021) designed the TIUNet to apply an inter-image deformation consistency constraint to network training.

Avants et al. (2008) claimed that Dice scores above 0.6 for smaller structures and above 0.8 for larger structures indicate good alignment. On this basis, we define an initial Dice score of small structures below 0.6 as a large deformation and above 0.6 as a small deformation; likewise, we define an initial Dice score of larger structures below 0.8 as a large deformation and above 0.8 as a small deformation. Existing unsupervised registration methods focus on aligning small local deformations between pairs of images. Therefore, these methods generally require global linear registration pre-processing to eliminate large linear deformations. However, some large non-linear deformations inevitably remain in the pre-processed images due to imperfect pre-processing. Solving these large non-linear deformations is critical to improving the performance of the registration algorithm.
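The Dice-based criterion above is straightforward to compute. The sketch below (pure NumPy; the function names are illustrative, not from the paper) scores the overlap of two binary structure masks and classifies an initial misalignment as a large or small deformation using the 0.6/0.8 thresholds:

```python
import numpy as np

def dice_score(a, b):
    """Dice coefficient between two binary masks (1 = structure, 0 = background)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def deformation_class(initial_dice, large_structure):
    """Classify the initial misalignment with the thresholds above
    (0.8 for larger structures, 0.6 for smaller ones)."""
    threshold = 0.8 if large_structure else 0.6
    return "large" if initial_dice < threshold else "small"
```

In practice the masks would come from annotated anatomical regions of the moving and fixed images after global linear registration.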

Method

The goal of a learning-based unsupervised registration method is to estimate the non-linear transformation between two or more images without supervised information. The general objective function of an unsupervised registration method can be formulated as Equation (1),

L(M, F, DVF) = Lsim(M∘ϕ, F) + α · Lsmooth(DVF)    (1)

where M and F represent the moving and fixed image, respectively. DVF indicates the displacement vector field and ϕ represents the deformation field. The relationship between DVF and ϕ is formulated in Equation (2),

ϕ(x, y, z) = (x, y, z) + DVF(x, y, z)    (2)

Additionally, Lsim represents the similarity loss between the warped image M ∘ ϕ and the fixed image F, and the smoothness constraint Lsmooth is utilized to prevent over-distortion of regions and an overly sharp displacement vector field. α is a weight parameter that controls the proportion of Lsmooth in the total loss function. Since the smoothness constraint restricts the differences between adjacent points in the displacement vector field, it limits the displacement between a strongly deformed anatomical region and its smooth surroundings, and thus limits the ability of general one-shot deformable registration methods to perform large-deformation registration.
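As a concrete illustration of Equation (1), the NumPy sketch below combines a similarity term with the spatial-gradient smoothness penalty. Mean squared error stands in for Lsim here (the paper uses local normalized cross-correlation), and all function names are illustrative:

```python
import numpy as np

def smoothness_loss(dvf):
    """Spatial-gradient regularizer on a displacement vector field of
    shape (3, D, H, W): mean of squared finite differences (Lsmooth)."""
    total = 0.0
    for c in range(dvf.shape[0]):
        grads = np.gradient(dvf[c])              # one array per spatial axis
        total += sum((g ** 2).sum() for g in grads)
    return total / dvf.size

def registration_loss(warped, fixed, dvf, alpha=1.0):
    """Equation (1), with mean squared error standing in for Lsim."""
    sim = ((warped - fixed) ** 2).mean()
    return sim + alpha * smoothness_loss(dvf)
```

A perfectly aligned pair with an identically zero displacement field yields a loss of zero; any sharp jump in the field is penalized through the gradient term.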

The overview of our proposed method is shown in Figure 2. To handle large non-linear deformations, we design three tailored components: (1) the progressive registration strategy, (2) the hierarchical registration strategy, and (3) the matching loss function. We detail these designs in the rest of this section.

FIGURE 2


Figure 2. The pipeline of the proposed method. (A) Schematic diagram of the progressive registration strategy. (B) The structure of the yellow shared parameter module in (A). (C) Schematic diagram of orange AF module in (A). (D) Schematic diagram of total registration strategy.

Progressive registration strategy

Figure 2A demonstrates the progressive registration framework of the proposed method. A group of convolutional neural network (CNN) modules with shared weights is used to generate multi-stage deformation fields. Meanwhile, the aggregate flow (AF) module is designed to integrate the multi-stage deformation fields into a total deformation field.

Shared weight module

Each yellow module in Figure 2A is a UNet-based registration module with shared parameters; the shared parameters are reused iteratively in the non-training stage. The structure of the shared weight module is shown in Figure 2B. This module learns the displacement vector field between the input image pair. Its backbone is a UNet, as shown in Figure 3, which has been successfully applied in a variety of biological image segmentation and registration networks. The network consists of an encoder and a decoder with skip connections. In the encoder, each resolution stage has one 4 × 4 × 4 convolution layer with a stride of 2 to down-sample the feature maps between stages. In the decoder, each resolution stage has one upsampling layer and one 3 × 3 × 3 convolution layer with a stride of 1, and the last two stages have two 3 × 3 × 3 convolutions with a stride of 1 for finer estimation of details. Each CONVblock is followed by a LeakyReLU activation. Finally, a 3 × 3 × 3 convolution without LeakyReLU is utilized to estimate a three-channel DVF describing the deformation of each voxel in the x, y, and z directions.

FIGURE 3


Figure 3. Overview of the CNN framework used in our proposed method.

Aggregate flow module

Inspired by the invertibility loss formula in VTN (Zhao et al., 2019b), we propose the aggregate flow module. The detailed structure of the AF module is shown in Figure 2C. The inputs of the Aggregator in Figure 2C are Flow(n) and Flow(n+1) (n ≥ 0), the displacement vector fields output by the network in two adjacent iterations. The function of the Aggregator is to integrate these two displacement vector fields into a total displacement vector field. In the following formulae, we use Flow_Agg(n, n+1) to represent the total displacement vector field formed by integrating Flow(n) and Flow(n+1) with the Aggregator. F(x, y, z) and M(x, y, z) indicate the voxel with spatial coordinate (x, y, z) in the fixed and moving image, respectively. I represents a displacement vector field (flow), and I(x, y, z) indicates the displacement vector at coordinate (x, y, z). ϕ represents a deformation field, and ϕIn indicates the deformation field obtained by inputting Flow(n) into the STN. The symbol ∘ represents the warp operation, so M ∘ ϕ indicates the moving image warped by the deformation field ϕ. The Aggregator can be expressed as In+1 + (In ∘ ϕIn+1), and its derivation is detailed as follows.

Ideally, the moving image can be warped by the deformation field (ϕ) to obtain the same image as the fixed image as formulated in Equation (3),

F(x,y,z)=(M∘ϕ)(x,y,z).    (3)

Combining Equations (2) and (3), we can obtain,

(M∘ϕIn)(x,y,z) = M((x,y,z)′),
(x,y,z)′ = (x,y,z) + In(x,y,z),
(M∘ϕIn)(x,y,z) = M((x,y,z) + In(x,y,z)).    (4)

And from Equation (4), we can derive,

((M∘ϕIn)∘ϕIn+1)(x,y,z) = (M∘ϕIn)((x,y,z) + In+1(x,y,z))
= M[(x,y,z) + In+1(x,y,z) + In((x,y,z) + In+1(x,y,z))].    (5)

The combination of Equations (4) and (5) can finally be rewritten as Equation (6),

M[(x,y,z) + In+1(x,y,z) + (In∘ϕIn+1)(x,y,z)]
= M[(x,y,z) + [In+1 + (In∘ϕIn+1)](x,y,z)]
= (M∘ϕ[In+1 + (In∘ϕIn+1)])(x,y,z).    (6)

Therefore, from the above derivation, Flow_Agg(n, n+1) is equivalent to In+1 + (In∘ϕIn+1).

The above formulae are extended to integrate the output Flow of n iterations of the network into a total displacement vector field Flow_Agg(0, …, n), as shown in Equation (7),

Flow_Agg(0,1) = I1 + (I0∘ϕI1),
Flow_Agg(0,1,2) = I2 + (Flow_Agg(0,1)∘ϕI2),
…
Flow_Agg(0,…,n) = In + (Flow_Agg(0,…,n-1)∘ϕIn).    (7)

Therefore, Flow_Agg(0, …, n) can be obtained progressively with n AF module integrations.
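A minimal NumPy sketch of one AF integration step is given below. Nearest-neighbor resampling stands in for the trilinear interpolation an STN would use, and the function names are illustrative:

```python
import numpy as np

def warp_nearest(vol, dvf):
    """Warp a 3-D volume by a displacement vector field dvf of shape
    (3, D, H, W) using nearest-neighbor sampling (an STN would use
    trilinear interpolation)."""
    d, h, w = vol.shape
    grid = np.indices((d, h, w)).astype(float)   # identity coordinates
    coords = np.rint(grid + dvf).astype(int)     # phi = identity + DVF
    for ax, size in enumerate((d, h, w)):
        coords[ax] = np.clip(coords[ax], 0, size - 1)
    return vol[coords[0], coords[1], coords[2]]

def aggregate_flows(flow_n, flow_np1):
    """AF module: Flow_Agg(n, n+1) = I_{n+1} + (I_n o phi_{I_{n+1}}).
    Each channel of I_n is resampled through the deformation induced by
    I_{n+1}, then added to I_{n+1}."""
    warped_prev = np.stack([warp_nearest(flow_n[c], flow_np1)
                            for c in range(3)])
    return flow_np1 + warped_prev
```

Note that the resampling step is exactly what distinguishes this aggregation from the direct addition used by earlier progressive methods: I_n is first brought into the coordinates implied by I_{n+1}.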

Elaboration of strategy

Figure 2A is the flow chart of our proposed progressive registration strategy. We iteratively apply the registration module with shared parameters, decomposing a large spatial displacement into multiple iterations without adding network parameters. The progressive registration strategy is detailed as follows.

Initial registration (the zeroth iteration): Input the moving image and the fixed image into the shared weight module to generate Flow(0). Then, the moving image and Flow(0) are input into the STN to get the initial registration result, which is denoted as Warped(0).

The first iterative registration: The Warped(0) is utilized as the new moving image and inputted into the shared weight module with the fixed image again to generate Flow(1). Then, the AF Module is conducted to integrate the displacement vector fields Flow(0) and Flow(1) to synthesize a total displacement vector field Flow(0, 1). Subsequently, the moving image is warped to Warped(0, 1) by the Flow(0, 1).

The second iterative registration: Warped(0, 1) and the fixed image are input into the shared weight module again to obtain Flow(2). Then, Flow(0), Flow(1), and Flow(2) are integrated into a total deformation field Flow(0, 1, 2) by Equation (7). As in the previous iteration, the moving image is warped into Warped(0, 1, 2) by Flow(0, 1, 2).

By extending the above procedure to a general form, we can obtain Warped(0, …, n) from the total deformation field Flow(0, …, n) in a single interpolation process. We need to set the numbers of iterations n_train and n_test before the training and testing procedures; these are independent of each other. The optimal values of n_train and n_test in this article are determined by comparing experimental results.
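The whole iterative procedure can be summarized with the following sketch, where `predict_flow`, `warp`, and `aggregate` are caller-supplied stand-ins for the shared-weight module, the STN, and the AF module (the names are illustrative, not from the paper):

```python
def progressive_register(moving, fixed, predict_flow, warp, aggregate, n_iters):
    """Progressive registration sketch. `predict_flow(moving, fixed)` plays the
    shared-weight module, `warp(image, flow)` the STN, and
    `aggregate(prev_flow, new_flow)` the AF module. The original moving image
    is interpolated only through the accumulated total flow, so the final
    result involves a single interpolation of the input image."""
    total_flow = predict_flow(moving, fixed)            # Flow(0)
    warped = warp(moving, total_flow)                   # Warped(0)
    for _ in range(n_iters):
        new_flow = predict_flow(warped, fixed)          # Flow(k)
        total_flow = aggregate(total_flow, new_flow)    # Flow_Agg(0,...,k)
        warped = warp(moving, total_flow)               # Warped(0,...,k)
    return warped, total_flow
```

Note that each intermediate warp always starts from the original moving image with the accumulated total flow, which is precisely what avoids the compounding interpolation blur of cascaded networks.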

Hierarchical registration strategy

Figure 2D shows the overall registration architecture of our proposed method. We employ two iterative registration networks with the same structure (as shown in Figure 2A) to build a cascaded, multi-scale architecture. This architecture has two advantages for fast and accurate registration. First, the low-scale network has a larger receptive field, so it is capable of handling large deformations; moreover, the smaller input size of the low-scale network leads to a smaller number of parameters, enabling fast coarse-grained registration. Second, we up-sample the deformation field generated by the low-scale network and use it as the initial value for the original-scale network, which facilitates fast convergence of the original-scale network.

Specifically, as shown in Figure 2, Down Sample (M) and Down Sample (F) are obtained by down-sampling the moving and fixed images. We feed Down Sample (M) and Down Sample (F) into the low-scale registration network, that is, the blue Base Module in Figure 2D. The low-scale image pairs are then registered with the progressive registration strategy proposed in Section Progressive registration strategy. We denote the final output of the low-scale network as Flow_Low Scale. This flow is upsampled by a factor of two, with each displacement value doubled, to obtain Flow_Up Sample, as formulated in Equation (8), where Up2 represents the double upsampling,

Flow_Up Sample = Up2(Flow_Low Scale) × 2    (8)

Loss function design

The proposed loss function consists of a similarity loss and a smoothness loss. Specifically, the similarity loss is utilized to enforce the alignment between the warped moving image and its corresponding fixed image. We employ the local normalized cross-correlation loss (LLNCC) as the similarity loss. The smoothness loss is utilized to prevent excessive deformation. We utilize spatial gradient regularization as the smoothness loss (Lsmooth) of the displacement vector field.
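A minimal, loop-based sketch of the local normalized cross-correlation is given below (illustrative only; practical implementations compute the local sums with convolutions, and the loss is the negation of this score):

```python
import numpy as np

def lncc(a, b, win=3):
    """Mean local NCC over cubic windows of side `win` for two 3-D volumes
    of the same shape. Uses the squared-correlation form common in
    registration losses; values lie in [0, 1]."""
    pad = win // 2
    eps = 1e-5
    scores = []
    D, H, W = a.shape
    for z in range(pad, D - pad):
        for y in range(pad, H - pad):
            for x in range(pad, W - pad):
                pa = a[z-pad:z+pad+1, y-pad:y+pad+1, x-pad:x+pad+1].ravel()
                pb = b[z-pad:z+pad+1, y-pad:y+pad+1, x-pad:x+pad+1].ravel()
                pa = pa - pa.mean()           # center each local patch
                pb = pb - pb.mean()
                scores.append((pa @ pb) ** 2 /
                              ((pa @ pa) * (pb @ pb) + eps))
    return float(np.mean(scores))
```

Because the score is computed over small local windows, it tolerates the slowly varying intensity differences that are common between microscopy modalities, which a global similarity metric would penalize.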

We use the function G to indicate the shared weight module, whose input is a registration image pair and whose output is the three-channel displacement vector field (Flow). During training, assuming the number of iterations is n (n > 0), the smooth loss of Flow(n) can be formulated as Equation (9),

Flow(n) = G(Warped(0,…,n-1), F),
Lsmooth(Flow(n)) = ∑p∈Ω ∥∇Flow(n)(p)∥²    (9)
