Domain generalization for retinal vessel segmentation via Hessian-based vector field

Deep learning approaches have shown extraordinary potential in delineating patterns in high-dimensional data. However, deep models are not guaranteed to work well on out-of-distribution (OOD) data, which severely limits their applications. This problem attracts particular attention in medical image analysis, since even data of the same modality acquired at multiple sites may be distributed differently owing to different scanners and imaging protocols. Such variation can result in distinct contrast and resolution in the output images. Moreover, a discrepancy in imaging modality has an even greater impact on data distribution. In Fig. 1, we show examples illustrating such domain shifts. All these factors severely degrade the performance of deep models trained for downstream tasks such as semantic segmentation and disease diagnosis. Therefore, a domain generalization (DG) (Zhou et al., 2022) method is needed to enhance the robustness of deep learning models across different data distributions.

Unlike other strategies such as transfer learning (Zhuang et al., 2020) and domain adaptation (Guan and Liu, 2021) that have been leveraged to tackle this problem, DG is defined under a stricter condition: the data from the target domain are entirely inaccessible during training. This is more realistic for medical images, since we are unlikely to have any sample data from other clinical centers. Hence, several methods have been presented to solve the DG problem for medical image analysis, and most of them fall into three major categories. First, there are data augmentation/generation based methods (Zhang et al., 2020, Lyu et al., 2022, Zhou et al., 2020), in which the training domain is expanded by applying hand-crafted perturbations to the training data or by leveraging adversarial models to generate new data outside the current domain distribution. The second approach is domain alignment, which can be performed in either image or feature space. Image-space alignment refers to harmonization, which is usually achieved by image-to-image translation (Zuo et al., 2021). In feature-space alignment, additional constraints, such as KL divergence (Li et al., 2020a) and adversarial regularization (Aslani et al., 2020), are imposed to align the feature distributions. The last category is meta-learning, a general training strategy. After its introduction by Finn et al. (2017), the episodic training paradigm has been extended to medical image analysis (Dou et al., 2019, Khandelwal and Yushkevich, 2020). In general, it mimics the condition in which the model confronts data from a distribution unseen during training by dividing the source domains into meta-train and meta-test subsets. In this study, we investigate a workflow that marries the first two approaches to achieve DG for retinal vessel segmentation.

Although retinal vessels are visualized with various appearances by different modalities such as color fundus photography, OCT angiography, and fluorescein angiography, the tubular shape of the vessels remains a domain-agnostic feature that makes them recognizable to humans. This tubular shape, termed vesselness, was mathematically modeled by a Hessian-based expression in Frangi et al. (1998). Even though learning-based models have outperformed the traditional Frangi filter in many aspects, the vesselness feature remains relevant, since it describes the essential character of vessels regardless of data distribution. Bridging conventional handcrafted approaches, such as the Frangi filter, and completely data-driven deep learning algorithms can therefore be a good solution to the domain generalization problem. From a higher-level standpoint, for both humans and deep models, incorporating well-established prior knowledge into training can be better than learning from scratch.
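For reference, with the Hessian eigenvalues ordered by magnitude, |λ1| ≤ |λ2|, the 2D vesselness measure of Frangi et al. (1998) at scale s reads (written here in the original bright-vessel convention; for dark vessels on a bright background, as in color fundus images, the sign condition on λ2 is flipped):

```latex
\mathcal{V}(s) =
\begin{cases}
0, & \text{if } \lambda_2 > 0,\\[4pt]
\exp\!\left(-\dfrac{\mathcal{R}_B^{2}}{2\beta^{2}}\right)
\left(1 - \exp\!\left(-\dfrac{\mathcal{S}^{2}}{2c^{2}}\right)\right), & \text{otherwise,}
\end{cases}
\qquad
\mathcal{R}_B = \frac{\lambda_1}{\lambda_2},
\quad
\mathcal{S} = \sqrt{\lambda_1^{2} + \lambda_2^{2}},
```

where β controls the sensitivity to the blobness ratio R_B and c the sensitivity to the overall second-order structureness S.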

Therefore, we propose to leverage a handcrafted feature map inspired by the Hessian description of tubular morphology, so that the model can discern the vessels by recognizing their shape in addition to their intensities. Unlike the scalar vesselness feature computed from the eigenvalues in the Frangi filter, we use the secondary eigenvector of the Hessian at each pixel as our geometric feature. In this way, we transform the original intensity image into a vector field. Ideally, the vectors within a vessel will be homogeneously oriented along the vessel direction, mimicking the blood flow. While computing the Hessian, we optimize the standard deviation of the Gaussian filter by maximizing the vesselness value presented in Frangi et al. (1998), adapting the Hessian to vessels of various thicknesses. We regard this vector field as a common domain for data in different distributions. Such a vectorized feature is particularly suitable for a transformer model based on the cosine similarity attention mechanism. Hence, we introduce a specific model architecture called the vector field transformer (VFT) to effectively leverage the correlation between eigenvectors in different ranges of context for vessel segmentation in 2D images.
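The construction above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, scale set, and the Frangi parameters β and c are hypothetical, and the dark-vessel sign convention (λ2 > 0 inside a vessel, as in fundus images) is an assumption. At each pixel, the scale with the highest vesselness response wins, and the unit eigenvector of the small-magnitude eigenvalue (which points along the vessel) is stored.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_vector_field(image, sigmas=(1.0, 2.0, 4.0), beta=0.5, c=15.0):
    """Per-pixel unit Hessian eigenvector along the vessel direction,
    with the Gaussian scale chosen by maximizing a Frangi-style
    vesselness response. Parameter values are illustrative."""
    h, w = image.shape
    best_v = np.full((h, w), -np.inf)
    field = np.zeros((h, w, 2))
    for sigma in sigmas:
        # Scale-normalized second derivatives (gamma = 2).
        Ixx = sigma**2 * gaussian_filter(image, sigma, order=(0, 2))
        Iyy = sigma**2 * gaussian_filter(image, sigma, order=(2, 0))
        Ixy = sigma**2 * gaussian_filter(image, sigma, order=(1, 1))
        # Closed-form eigenvalues of the symmetric 2x2 Hessian,
        # sorted so that |l1| <= |l2|.
        mean = 0.5 * (Ixx + Iyy)
        disc = np.sqrt((0.5 * (Ixx - Iyy)) ** 2 + Ixy**2)
        la, lb = mean - disc, mean + disc
        swap = np.abs(la) > np.abs(lb)
        l1 = np.where(swap, lb, la)
        l2 = np.where(swap, la, lb)
        # Frangi vesselness; dark vessels on a bright background
        # produce l2 > 0 (assumption for fundus-like images).
        rb2 = (l1 / (l2 + 1e-12)) ** 2
        s2 = l1**2 + l2**2
        v = np.exp(-rb2 / (2 * beta**2)) * (1 - np.exp(-s2 / (2 * c**2)))
        v = np.where(l2 > 0, v, 0.0)
        # Eigenvector of l1 (aligned with the vessel), normalized.
        ex, ey = Ixy, l1 - Ixx
        norm = np.sqrt(ex**2 + ey**2) + 1e-12
        ex, ey = ex / norm, ey / norm
        # Keep the response (and vector) of the best scale per pixel.
        update = v > best_v
        field[..., 0] = np.where(update, ex, field[..., 0])
        field[..., 1] = np.where(update, ey, field[..., 1])
        best_v = np.where(update, v, best_v)
    return field, best_v
```

On a synthetic dark vertical stripe, the returned vectors inside the stripe are near-vertical, consistent with the blood-flow analogy in the text.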

Additionally, to enhance the robustness of the VFT, we introduce a novel data augmentation method that can generate synthetic angiograms in various distributions without modifying the anatomical structure of interest. We implement a full-resolution variational auto-encoder (f-VAE) by setting the latent space to have the same width and height as the input image. This latent image is usually regarded as the extracted anatomical representation, an idea widely leveraged in unsupervised segmentation (Liu et al., 2020, Hu et al., 2021) and representation disentanglement (Dewey et al., 2020, Ouyang et al., 2021). Since there is no direct supervision on the latent feature, we observe that the style of the latent images varies every time we re-train the f-VAE, while the vasculature remains stable. This randomness is solely induced by the stochastic gradient descent (SGD) in training. Therefore, we treat the encoder as a synthetic network that generates augmented vessel maps.
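A minimal sketch of the f-VAE idea, assuming PyTorch: the class name, layer widths, and loss weighting are hypothetical and not the authors' configuration. The key point is that the latent mean is a 1-channel map with the same height and width as the input, so each independently retrained encoder yields a differently styled latent "vessel map" usable as an augmented sample.

```python
import torch
import torch.nn as nn

class FullResVAE(nn.Module):
    """Sketch of a full-resolution VAE: the latent image keeps the
    input's spatial size. Architecture details are illustrative."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        self.to_mu = nn.Conv2d(ch, 1, 1)      # latent mean map (H x W)
        self.to_logvar = nn.Conv2d(ch, 1, 1)  # latent log-variance map
        self.dec = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence to a standard normal prior.
    rec = ((recon - x) ** 2).mean()
    kl = (-0.5 * (1 + logvar - mu**2 - logvar.exp())).mean()
    return rec + kl
```

In this reading, one would train several such models from different random seeds and use each encoder's mu map as a style-randomized input for the segmentation network; since the only supervision is reconstruction, the vasculature is preserved while the rendering varies.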

In this paper, we implement both data augmentation and domain alignment to improve the generalizability of retinal vessel segmentation. To show the effectiveness of the proposed method, we test it in both cross-resolution and cross-modality scenarios. The experiments are conducted on six public datasets in different modalities, including color fundus and OCT angiography. A preliminary version of this work was presented at the International Conference on Medical Imaging with Deep Learning (Hu et al., 2022). In this extended version, we present a more in-depth analysis of our method and evaluate it on additional datasets. Our main contributions are:

A full-resolution variational auto-encoder (f-VAE) network that generates synthetic latent images for data augmentation (Section 3.1).

A Hessian-based vector field that serves as an aligned image space that delineates the morphology of vessels (Section 3.2).

A novel architecture with paralleled transformer blocks that helps to learn the local features in different scales (Section 3.3).

A comprehensive evaluation on public datasets which shows superior cross-resolution and cross-modality generalization performance (Section 4.5).
