Unsupervised model adaptation for source-free segmentation of medical images

Semantic segmentation is an area of computer vision dedicated to identifying and labeling different parts of an image (Garcia-Garcia et al., 2018). Segmentation datasets often contain multi-channel images that require pixel-level labels drawn from several available semantic categories. Given sufficient annotated images, a classification model can be trained to generalize the labeling task. Owing to the complexity of pixel-level labeling, deep neural networks, and in particular convolutional neural networks (CNNs) (LeCun et al., 1995), have proven especially effective at this task.
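To make the task concrete, the sketch below (PyTorch) shows how a segmentation network maps a multi-channel image to a semantic label for every pixel; the shapes and the one-layer stand-in model are illustrative only.

```python
import torch
import torch.nn as nn

num_classes = 5                                    # K semantic categories (illustrative)
model = nn.Conv2d(3, num_classes, kernel_size=1)   # stand-in for a full segmentation CNN

image = torch.randn(1, 3, 256, 256)   # (batch, channels, height, width)
logits = model(image)                 # (1, K, 256, 256): one score per class, per pixel
labels = logits.argmax(dim=1)         # (1, 256, 256): a class index for every pixel
```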

CNNs are designed to exploit the structure of images through convolution filters. They have been successfully applied to various vision tasks, such as object tracking across scenes (Yilmaz et al., 2006, Bertinetto et al., 2016, Zhu et al., 2018), street image segmentation and autonomous vehicle control (Kim and Canny, 2017, Hecker et al., 2018, Stan and Rostami, 2021), and, in the medical field, tissue analysis and diagnosis (Ker et al., 2018, Shen et al., 2017, Ayache, 2017, Kazeminia et al., 2020). CNNs have properties similar to those of the human visual system (Morgenstern et al., 2014) and offer human-level or super-human performance in vision tasks. However, this performance relies heavily on the availability of large amounts of annotated data. Data annotation, a laborious manual task, poses a major challenge in using deep learning (Rostami et al., 2018). The difficulty in training such networks arises from their large number of parameters: empirically, deeper neural networks have outperformed their shallower counterparts, albeit at the cost of increased training difficulty (He et al., 2016). This reliance on training data makes CNNs sensitive to changes in the input image distribution. Techniques like dropout (Baldi and Sadowski, 2013) and image augmentation (Shorten and Khoshgoftaar, 2019) have been developed to mitigate overfitting. However, the effectiveness of these approaches is limited to scenarios where the application domain (target domain) of the network is the same as the training domain (source domain). When the segmentation task remains the same but the source and target domains have different input distributions, as between computer-generated and real-world street images, CNNs suffer considerable performance degradation.

Differences between source and target distributions, known as domain shift, can be naively addressed by collecting more labeled data. However, while this may be feasible for image classification tasks, semantic segmentation demands pixel-level annotations for labeled images, making the process both expensive and time-consuming (Liu et al., 2011). Furthermore, labeling medical image data requires trained professionals, adding to the cost and complexity of the process. Unsupervised Domain Adaptation (UDA) is a field within AI that addresses the challenge of model generalization to unseen input distributions. Its goal is to maintain performance without the need to re-label samples (Ghifary et al., 2016, Venkateswara et al., 2017, Saito et al., 2018, Zhang et al., 2018b, Rostami, 2021b, Liu et al., 2021b). To achieve this, UDA approaches strive to create a shared feature space between source and target embeddings. If successful, a classifier trained on latent features from the source domain will generalize to the unannotated target data. The creation of a domain-invariant feature space has been explored using adversarial learning (Hoffman et al., 2018, Dou et al., 2019, Tzeng et al., 2017, Bousmalis et al., 2017, Jian and Rostami, 2023), where source and target feature extractors are trained alongside a GAN discriminator loss (Goodfellow et al., 2020). Direct distribution minimization between the latent features of the source and target has also been employed to produce a shared feature space (Chen et al., 2019b, Sun et al., 2017, Lee et al., 2019, Rostami, 2022).
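As a concrete illustration of the direct distribution-minimization family, the hedged sketch below penalizes the discrepancy between batches of source and target latent features with a simple RBF-kernel maximum mean discrepancy (MMD); the cited works use a variety of such discrepancy measures, and all names here are illustrative.

```python
import torch

def rbf_mmd(source_z: torch.Tensor, target_z: torch.Tensor,
            sigma: float = 1.0) -> torch.Tensor:
    """Biased MMD^2 estimate between two batches of latent features (N, d)."""
    def k(a, b):
        # RBF kernel on pairwise squared Euclidean distances
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return (k(source_z, source_z).mean()
            + k(target_z, target_z).mean()
            - 2 * k(source_z, target_z).mean())

# In joint UDA training, this term is added to the supervised source loss so the
# encoder maps both domains to a shared latent space, e.g.:
#   loss = seg_loss(src_logits, src_labels) + lam * rbf_mmd(src_z, tgt_z)
```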

UDA benefits from having access to both the source and target domains simultaneously (Choi et al., 2019, Sankaranarayanan et al., 2018, Rostami et al., 2023). This assumption, however, cannot always be met. In healthcare applications, sharing data is often difficult due to privacy regulations. To retain the benefits of UDA, source-free adaptation has been developed to bypass the need for direct access to the source domain at adaptation time. While source-free UDA has been previously explored for image classification (Ganin and Lempitsky, 2015, Long et al., 2016, Kang et al., 2019, Rostami and Galstyan, 2023b) and street-scene semantic segmentation (Van Gansbeke et al., 2021, Zou et al., 2018), few works address this problem for medical image analysis (Bateson et al., 2022). Medical images are produced by electromagnetic devices such as MRI or CT scanners, which directly shape the input distribution of the image data. For example, CT imaging uses X-rays to create cross-sectional images, while MRI uses strong magnetic fields and radio waves (Ponsaing et al., 2007, Cho and Chang, 2011). While CT is better at visualizing bone and dense tissue, MRI is superior at visualizing soft tissues and organs, making it valuable for imaging the brain, spinal cord, and muscles. Despite its advantages, MRI is more expensive and less widely available than CT. Additionally, MRI is a safer imaging method because CT exposes patients to ionizing radiation. As a result, the images generated by these and other modalities differ in their observed distributions. However, when the semantic classes are shared across two modalities, we may be able to benefit from transfer learning because both modalities capture the same underlying anatomy. Compared to natural-image semantic segmentation, large portions of medical images carry no semantic label. This makes directly applying segmentation algorithms designed for street semantic segmentation unsuitable, as we show in our experimental section.

Contributions: We propose a new source-free semantic segmentation algorithm for medical images that relies on distributional distance minimization. After source training, we learn a sampling distribution that approximates the source latent embeddings. During adaptation we lose access to the source data and use this distribution as a surrogate, performing direct distributional alignment between the target embeddings and the sampling distribution via an optimal-transport-based metric.
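A minimal sketch of this pipeline is given below, assuming a Gaussian mixture model (GMM) as the learned sampling distribution and the sliced Wasserstein distance (SWD) as the optimal-transport-based metric; component counts, batch sizes, and variable names are illustrative rather than our exact implementation.

```python
import numpy as np
import torch
from sklearn.mixture import GaussianMixture

# --- After source training: approximate the source latent embeddings, so no
# --- source images need to be retained.
source_z = np.random.randn(1000, 64).astype(np.float32)  # placeholder for collected features
gmm = GaussianMixture(n_components=10).fit(source_z)

# --- During source-free adaptation: align target embeddings with GMM samples.
def sliced_wasserstein(x: torch.Tensor, y: torch.Tensor,
                       num_projections: int = 128) -> torch.Tensor:
    """SWD between two equal-size batches of d-dimensional embeddings."""
    d = x.shape[1]
    theta = torch.randn(d, num_projections, device=x.device)
    theta = theta / theta.norm(dim=0, keepdim=True)   # random unit directions
    x_proj, _ = torch.sort(x @ theta, dim=0)          # 1-D optimal transport = sort and match
    y_proj, _ = torch.sort(y @ theta, dim=0)
    return (x_proj - y_proj).pow(2).mean()

target_z = torch.randn(256, 64, requires_grad=True)   # stand-in for target encoder output
surrogate_z = torch.from_numpy(
    gmm.sample(256)[0].astype(np.float32))            # surrogate "source" batch
loss = sliced_wasserstein(target_z, surrogate_z)
loss.backward()                                       # gradient drives target encoder updates
```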

We support our approach with a theoretical analysis demonstrating how our algorithm minimizes an upper bound on the target domain error, thereby ensuring robust adaptation performance.
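For intuition, bounds of this type in the optimal-transport domain adaptation literature typically take the following form; the exact statement, assumptions, and constants in our analysis differ.

```latex
e_{\mathcal{T}}(h) \;\le\; e_{\mathcal{S}}(h)
  + W\!\big(\hat{\mu}_{\mathcal{S}}, \hat{\mu}_{\mathcal{T}}\big)
  + \lambda^{*} + \varepsilon
```

Here $e_{\mathcal{S}}$ and $e_{\mathcal{T}}$ denote the source and target errors of a hypothesis $h$, $W$ is the Wasserstein distance between the empirical source and target feature distributions, $\lambda^{*}$ is the error of a jointly optimal hypothesis, and $\varepsilon$ collects estimation-error terms. Minimizing the transport distance between target embeddings and the source surrogate therefore tightens the bound on the target error.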

Additionally, we evaluate our method on two challenging medical image segmentation tasks – cardiac and abdominal imaging – and demonstrate its competitive performance compared to current state-of-the-art methods, including those that do not operate under source-free constraints. Although our experiments consider CT and MRI data, our approach is more general and can be applied to any two modalities that share the same semantic classes, subject to using a suitable segmentation architecture.

We note that this work is partially based on our prior work Stan and Rostami (2022a), presented at the British Machine Vision Conference.

Our source-free UDA approach enables several beneficial patient and workflow outcomes when applied in practical clinical settings. Among these:

We offer clinicians a way to leverage previously trained models that does not require extensive input from experts for data labeling. This may significantly reduce operational costs and allow for better deployment of human capital.

Our approach addresses one of the major concerns in healthcare data analytics: patient privacy. By eliminating the need for direct access to source data during the adaptation process, our method ensures that patient-sensitive information remains confidential. This feature is crucial for complying with stringent healthcare regulations like HIPAA and GDPR, making our approach highly suitable for clinical applications.

Our method allows clinicians to receive the benefits of deep vision models in patient diagnostics even in resource-limited settings where annotating medical data may not be feasible. By leveraging pre-trained models and adapting them to local imaging conditions without the need for extensive labeled datasets, healthcare providers can harness powerful vision models to improve diagnostic processes and patient care.
