Standardization of ultrasound images across various centers: M2O-DiffGAN bridging the gaps among unpaired multi-domain ultrasound images

Medical ultrasound imaging, a noninvasive, radiation-free and ubiquitous modality, has proven to be an important tool in the investigation of organs, tissue functionality and health (Carovac et al., 2011, Linte et al., 2010). Recent advances in deep learning have significantly improved the state of the art across medical image analysis tasks such as lesion detection, segmentation and classification (Guo et al., 2022, Kim et al., 2021a, Kim et al., 2021b, Liu et al., 2019, Qi et al., 2021), and computer-aided analysis greatly facilitates clinical diagnosis. These achievements are, to a great extent, attributable to the availability of large-scale labeled data for supervised learning (Xu et al., 2018). Such data-driven research imposes stricter requirements on the uniformity of image datasets.

However, owing to biases introduced by imaging devices, imaging frequencies and other acquisition factors, variations in gray-level distribution, contrast and related properties are commonplace among ultrasound datasets acquired from different medical centers. In other words, these datasets lie in distinct domains (Xia et al., 2022). As illustrated in Fig. 1, due to the presence of domain shift (Quiñonero-Candela et al., 2009), the performance of a model optimized for a specific center (the target domain) tends to degrade heavily on newly introduced protocols or scanners (source domains), because the image feature differences caused by imaging settings may exceed those induced by the pathological characteristics themselves (Gao et al., 2019). One natural solution is to redesign traditional workflows or retrain deep learning networks for the source-domain datasets, but this is burdensome and time-consuming. To mitigate the performance degradation caused by domain shift, multi-source domain transformation (MSDT) provides a promising alternative. It eliminates the gaps between domains by translating data from multiple source domains into realistically target-domain-looking images, which can then be handled by target-domain-specific models. By minimizing the domain distribution discrepancy, models trained on the target domain can be applied directly to source instances. Moreover, because extensive uniformly distributed data can be obtained through MSDT, problems such as mode collapse and overfitting, which arise when training models on limited and non-uniformly distributed datasets, can be alleviated. Hence, MSDT should be performed to reduce discrepancies between ultrasound images from various medical centers, benefiting data-driven studies as well as the reliability and generalization of learning-based algorithms.

MSDT focuses on preserving detailed information while reducing intensity discrepancies between images from various medical centers. Compared with conventional single-source domain adaptation, it is a more practical and challenging problem, as it achieves standardization across multiple domains in a single integrated model without repeatedly designing or training single-source adaptations. Two requirements must be met for MSDT to perform effective many-to-one domain transformation on ultrasound images. First, multi-center ultrasound images are unlabeled, since massive annotation is time-consuming and often unavailable; an effective unsupervised domain combination scheme is therefore required to jointly model the multiple unlabeled source domains and the target domain and to obtain uniformly distributed transformed images. Second, to ensure the accuracy of subsequent ultrasound image analysis tasks, the fidelity and preservation of detailed information in the generated images are of critical importance. In clinical ultrasound diagnosis, tissue structures, fine details and speckle patterns play a central role in extracting the image features that provide diagnostic information. Hence, a translated image should fully retain the detailed structures (i.e., intrinsic content) of its counterpart source-domain image while adopting the intensity distribution and textures (i.e., extrinsic style) of the target-domain images (Xia et al., 2022). In real-life scenarios, however, ultrasound images typically suffer from low resolution, indistinct tissue boundaries and speckle noise, which makes it difficult to extract the feature information needed to generate ultrasound images with well-preserved details.

MSDT involves synthesizing target-domain-looking images under the guidance of the acquired source modalities and the target domain. Existing approaches comprise single-source and multi-source domain transformation algorithms. Single-source methods (de Bel et al., 2021, Hoffman et al., 2018, Huang et al., 2022, Isola et al., 2017, Kong et al., 2021, Liu et al., 2017, Özbey et al., 2023, Zhu et al., 2017, Xia et al., 2022) can only perform one-to-one domain transformation, which limits their applicability to multi-source ultrasound images. Multi-source methods have therefore been proposed to handle many-to-one domain transformation. For example, GAN-based algorithms (Gao et al., 2019, Saito et al., 2018, Tzeng et al., 2017) align the source and target domains through adversarial learning, in which the implicit characterization is susceptible to learning biases and may lead to poor fidelity and detail distortion. Other approaches (Xu et al., 2018, Peng et al., 2019) adopt category shift and moment matching, respectively, to achieve domain alignment. Most of these works regard MSDT as transferring knowledge learned from multiple annotated source datasets to one unlabeled target domain; they ignore the practical situation in which most ultrasound images are unannotated, because annotation requires heavy manual labor. Hence, the main challenges in MSDT research are that: (1) source data acquired from multiple ultrasound domains hamper the effectiveness of mainstream single-source image-to-image translation methods; (2) conventional multi-domain adaptation approaches assume massive labeled ultrasound data and are thus inapplicable to our MSDT task, which lacks extensive annotations; and (3) poor sample fidelity and quality resulting from learning biases hinder the development of GAN models. In addition, the limited feature information of ultrasound images, together with the difficulty of extracting features from images with poor resolution and indistinct tissue boundaries, makes it harder for models to preserve detailed structures during the domain transformation process.

In this paper, we propose a novel unsupervised many-to-one domain transformation model, M2O-DiffGAN, to tackle the MSDT task for ultrasound image synthesis using multi-source unlabeled data. M2O-DiffGAN integrates a domain combination scheme built on a many-to-one adversarial learning skeleton with conditional adversarial diffusive generation, achieving efficient and high-fidelity multi-source domain standardization of ultrasound images. It can normalize images from multiple centers, allowing existing dataset-specific ultrasound image analysis methods to generalize across datasets. Images domain-adapted by M2O-DiffGAN are highly consistent with the target domain in intensity distribution, facilitating the acquisition of large, uniformly distributed ultrasound datasets.

Our major contributions are summarized as follows:

We address the complex problem of jointly modeling multiple unlabeled ultrasound domains through a many-to-one adversarial learning procedure. A unified forward GAN and multiple domain-specific backward GANs compose multi-way cycle-consistent GANs for each source domain and produce paired target–source samples for training the subsequent diffusive generation module.
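To make the wiring of this scheme concrete, the following is a minimal NumPy sketch of the many-to-one cycle-consistency idea: one shared forward mapping serves every source domain, while each domain keeps its own backward mapping. The linear maps, the dimension `D`, the number of domains and the L1 cycle loss are illustrative stand-ins for the actual deep generators, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "generators": near-identity linear maps on flattened patches.
# In the real model these are deep networks; here they only illustrate
# the many-to-one wiring of the cycle-consistency scheme.
D = 16  # flattened patch dimension (hypothetical)

# Shared forward generator: any source domain -> target domain.
G_fwd = np.eye(D) + rng.normal(scale=0.1, size=(D, D))
# One domain-specific backward generator per source domain k.
F_bwd = {k: np.eye(D) + rng.normal(scale=0.1, size=(D, D)) for k in range(3)}

def cycle_loss(x_k, k):
    """L1 cycle-consistency for source domain k: x_k -> target -> back to k."""
    y = G_fwd @ x_k       # translate source sample into the target domain
    x_rec = F_bwd[k] @ y  # domain-specific backward mapping reconstructs x_k
    return np.mean(np.abs(x_rec - x_k))

# One sample per source domain; the same forward generator serves all of them.
losses = [cycle_loss(rng.normal(size=D), k) for k in range(3)]
```

Minimizing such a loss per domain ties every source domain to the single shared forward generator, which is what allows one model to standardize all sources at once.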

Considering the difficulty of extracting features from ultrasound images, we adopt a conditional adversarial diffusive generation module with a more stable inference process to generate high-fidelity ultrasound images. It incorporates an adversarial projector that captures reverse transition probabilities over large step sizes, increasing the sampling efficiency.
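The sampling-efficiency idea can be sketched as follows: instead of hundreds of small denoising steps, a network that predicts the clean image directly makes a handful of large reverse steps feasible. This toy NumPy version uses only T = 4 steps; the linear noise schedule, the placeholder `denoiser` and the deterministic (DDIM-style) reverse transition are illustrative assumptions standing in for the adversarially trained projector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy schedule: a handful of large denoising steps (T = 4) rather than
# the hundreds used in standard diffusion models.
T = 4
betas = np.linspace(0.1, 0.4, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative signal-retention coefficients

def q_sample(x0, t, eps):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def denoiser(x_t, t):
    """Placeholder for the adversarially trained projector that predicts x0
    directly from x_t, which is what makes large reverse steps possible."""
    return x_t / np.sqrt(alpha_bar[t])  # naive stand-in estimate of x0

def reverse_step(x_t, t):
    """Deterministic reverse transition x_t -> x_{t-1} via the predicted x0."""
    x0_hat = denoiser(x_t, t)
    if t == 0:
        return x0_hat
    eps_hat = (x_t - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
    return np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat

x = rng.normal(size=(8, 8))  # start from pure noise
for t in reversed(range(T)):
    x = reverse_step(x, t)   # only T large steps from noise to a sample
```

With a learned denoiser in place of the placeholder, the same four-step loop is the entire inference procedure, which is the source of the efficiency gain over step-by-step Gaussian reverse sampling.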

In addition, considering the inherently limited feature information of ultrasound images, we design an ultrasound-specific content loss constraint that extracts more perceptual information and provides more accurate guidance for the MSDT task.
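The general shape of such a constraint can be sketched as a pixel-level L1 term plus an L1 term over feature-map responses. In this self-contained NumPy version, fixed convolution kernels stand in for the pretrained feature extractor, and the weights `pixel_weight` and `feat_weight` are hypothetical; the actual ultrasound-specific loss in the paper is a different, learned construction.

```python
import numpy as np

def extract_features(img, kernels):
    """Toy perceptual extractor: valid 2-D cross-correlation with fixed
    kernels, standing in for pretrained-network feature maps."""
    feats = []
    H, W = img.shape
    for k in kernels:
        h, w = k.shape
        out = np.zeros((H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + h, j:j + w] * k)
        feats.append(out)
    return feats

def content_loss(x, y, kernels, pixel_weight=1.0, feat_weight=0.5):
    """Hypothetical content constraint: pixel L1 plus L1 over features."""
    loss = pixel_weight * np.mean(np.abs(x - y))
    for fx, fy in zip(extract_features(x, kernels), extract_features(y, kernels)):
        loss += feat_weight * np.mean(np.abs(fx - fy))
    return loss

rng = np.random.default_rng(0)
kernels = [rng.normal(size=(3, 3)) for _ in range(2)]
a = rng.random((8, 8))
b = rng.random((8, 8))
```

Penalizing feature-map discrepancies in addition to raw intensities is what lets the constraint react to structural differences (edges, speckle texture) that a pure pixel loss under-weights.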

Extensive qualitative and quantitative domain transformation results on six clinical ultrasound datasets (including thyroid, carotid and breast ultrasound images) demonstrate the state-of-the-art performance of the proposed M2O-DiffGAN.
