Self-supervised learning for medical image data with anatomy-oriented imaging planes

Medical image analysis (MedIA) has important clinical applications, including diagnosis (Silveira et al., 2009), quantitative analysis (Wei et al., 2013), prognosis (González et al., 2018), therapy planning (Jackson et al., 2018), and risk assessment (Klifa et al., 2010). Since manual analysis of large amounts of medical image data can be labor-intensive, time-consuming, and subjective, computer-aided automated methods are of great value. Benefiting from the progress of deep learning techniques, especially deep neural networks (DNNs), automated MedIA methods have advanced remarkably in recent years (Litjens et al., 2017). However, to achieve satisfactory performance, DNNs often need large amounts of labeled data for effective training, which can be difficult and costly to obtain in practice.

Transfer learning is an effective technique when the available data are insufficient for training DNNs from scratch (Tan et al., 2018): the model is first pretrained on tasks with sufficient data and then fine-tuned on the target task with limited data and annotations. For transfer learning on natural images, models pretrained on ImageNet (Russakovsky et al., 2015) are available for a variety of popular DNN structures and are routinely used nowadays. However, as the distance between the pretraining and target tasks plays a crucial role in transfer learning (Zhang et al., 2017), these models may become less effective when transferred to MedIA tasks due to the large gap between the two types of images. Therefore, researchers are confronted with a dilemma. On one hand, for effective transfer learning on medical images, models pretrained with medical images of highly relevant tasks are preferred. On the other hand, it is difficult to obtain large quantities of annotations for medical images for the pretraining. Fortunately, an emerging subfield of deep learning known as self-supervised learning (SSL) suggests a way out.
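As a concrete illustration of this pretrain-then-fine-tune recipe, consider the minimal PyTorch sketch below; the backbone choice, the 4-class head, and the learning rate are illustrative assumptions on our part, not prescriptions from this work:

```python
import torch
import torchvision

# Minimal sketch of the pretrain-then-fine-tune recipe. The ResNet-18
# backbone, the 4-class head, and the learning rate are illustrative
# assumptions only.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 4)  # new task-specific head

# Fine-tune all weights at a small learning rate; when labels are very
# scarce, one may instead freeze the backbone and train only the head.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```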

In SSL, the training tasks and supervision signals are defined by inherent properties of the data without any manual annotation (Jing and Tian, 2020). Therefore, SSL can pretrain models on large amounts of unlabeled data with proper pretext tasks. To exploit unique properties of medical image data, various pretext tasks have been proposed. Jamaludin et al. (2017) proposed to train a Siamese network based on patient identity. More recently, Models Genesis (Zhou et al., 2019a) and the Rubik's cube series (Zhu et al., 2020) presented generic pretext tasks for medical image data, especially volumetric images. However, previous works have rarely paid attention to medical image data with anatomy-oriented imaging planes, e.g., magnetic resonance imaging (MRI) of various organs and body parts such as the heart, knee, and shoulder, which constitute a large portion of medical image data besides full 3D volumetric (e.g., CT) and 2D (e.g., X-ray) images. As such imaging planes are defined with respect to the anatomy of the imaged organ, pretext tasks that can effectively utilize this information are expected to be more relevant to potential target tasks (also known as downstream tasks) on the organ of interest than the generic ones.

In this work, we propose two pretext tasks for medical image data with anatomy-oriented imaging planes, based on the spatial alignment relationship among multiple imaging planes. The first is to learn the relative orientation between the imaging planes. In clinical imaging, it is common to define anatomy-oriented view planes for obtaining comparable biometric measurements across populations; these are called standard view planes for a specific organ. For example, the neuroimaging community defines the mid-sagittal plane for evaluating pathological brains by estimating departures from bilateral symmetry in the cerebrum (Stegmann et al., 2005). Similarly, standard views are used in cardiac magnetic resonance (CMR) imaging for quantification of cardiac volumetry, function, and blood flow (Kramer et al., 2020). A key component of acquiring these standard views is the identification of specific anatomical landmarks to prescribe the imaging planes. As a result, these imaging planes often intersect with each other at anatomically meaningful landmarks, and the intersecting lines can provide strong cues about the imaged organ. Therefore, we propose to predict the intersecting lines between the imaging planes by regressing a distance-based heatmap.
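To make the regression target concrete, the sketch below generates one plausible distance-based heatmap for a known intersecting line in a 2D imaging plane; the Gaussian decay and the value of sigma are our illustrative assumptions and are not fixed by the description above:

```python
import numpy as np

def line_distance_heatmap(shape, point, direction, sigma=5.0):
    """One plausible distance-based heatmap for a line in a 2D plane.

    Each pixel's target value decays with its perpendicular distance to
    the intersecting line, here via a Gaussian (an assumed form).

    shape:     (H, W) of the imaging plane
    point:     (y, x) of any point on the line, in pixel coordinates
    direction: (dy, dx) vector along the line
    """
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)  # ensure unit length
    ys, xs = np.mgrid[:shape[0], :shape[1]]
    dy, dx = ys - point[0], xs - point[1]
    # Perpendicular distance via the 2D cross-product magnitude.
    dist = np.abs(dy * direction[1] - dx * direction[0])
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2))
```

Such a per-pixel target can then be regressed with a standard dense-prediction loss, e.g., mean squared error.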

Our second pretext task is complementary to the first: it exploits the spatial relationship among parallel imaging planes (in contrast with that between intersecting ones). Specifically, we propose to regress the relative locations of the slices within a stack. To solve this task, the network must gain an understanding of the within-slice content (focused on the imaged organ) and the cross-slice context, and is thus better prepared for potential downstream tasks on the specific organ. Closely related to our work, Zhang et al. (2017) proposed pair-wise ordering of slices extracted from volumetric scans, for the downstream task of fine-grained body part recognition. However, their pretext task may encounter difficulty in handling objects with symmetrical structures, whereas we solve this problem with a small yet effective alteration, i.e., centrosymmetric mapping, for wider applicability. Another difference is that we directly regress the relative slice locations with a single network, instead of ordering paired slices with a Siamese architecture. In addition, we further investigate multi-task SSL combining both of the proposed pretext tasks, to fully exploit the two types of complementary spatial relationships among the imaging planes.
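For illustration, the following sketch constructs one plausible form of the relative-location targets, including a centrosymmetric mapping that folds locations about the stack center so that mirrored slices of a (roughly) symmetric structure share the same label; the exact normalization used in this work may differ:

```python
import numpy as np

def slice_location_targets(num_slices, centrosymmetric=True):
    """Regression targets for relative slice location within a stack.

    A minimal sketch under assumed normalization. Raw target: the
    normalized slice index in [0, 1]. With symmetric anatomy, slices i
    and (S-1-i) can look alike, so the centrosymmetric mapping assigns
    them the same label.
    """
    loc = np.arange(num_slices) / max(num_slices - 1, 1)  # in [0, 1]
    if centrosymmetric:
        loc = 1.0 - np.abs(2.0 * loc - 1.0)  # 0 at both ends, 1 at center
    return loc

# Example for a 5-slice stack:
# raw:             [0.00, 0.25, 0.50, 0.75, 1.00]
# centrosymmetric: [0.00, 0.50, 1.00, 0.50, 0.00]
```

Because a single network regresses these scalar targets directly, no paired-input Siamese architecture is needed at pretraining time.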

In summary, the main contribution of this work is the proposal of two complementary pretext tasks for self-supervised learning on medical image data with anatomy-oriented imaging planes. We hypothesize that these tasks are more relevant to potential downstream tasks than general transformation-and-recovery based pretext tasks (Zhou et al., 2019a; Noroozi and Favaro, 2016) on such data, and are thus expected to lead to better transfer learning. Besides, they are conceptually straightforward and easy to implement. For evaluation, we conduct thorough experiments on imaging-based analysis of two different anatomical structures (heart and knee) and two representative downstream tasks (semantic segmentation and classification). We not only investigate the impact of the proposed pretext tasks on the downstream tasks but also study their learnability. The results indicate that the proposed pretext tasks lead to better transfer learning than other recently proposed competitors, empirically confirming our hypothesis.
