A survey of label-noise deep learning for medical image analysis

The intelligent analysis of medical images can leverage deep neural networks (DNNs) to help doctors locate, analyze, and study diseases (Litjens et al., 2017). Compared with traditional medical image analysis, it offers objectivity, low cost, and high performance. With rapid advances in deep learning, many DNN-based methods have been developed and achieve promising performance in various computer vision tasks, ranging from low-level denoising (Cheng et al., 2021), enhancement (Liu et al., 2021), and reconstruction (Akçakaya et al., 2022) to high-level analysis tasks such as segmentation (Luo et al., 2022), detection (Hardalaç et al., 2022), and classification (Ghasemi et al., 2021). Although many factors contribute to the success of deep learning, one of the most important is the availability of large-scale datasets with clean annotations (Han et al., 2020a, Tasci et al., 2022). However, collecting such large-scale datasets with clean labels is expensive and time-consuming. One solution is to employ non-experts or automated systems to label the datasets. The resulting datasets typically suffer from low-quality annotations with label noise, which limits their applicability to medical imaging.

In the medical image domain, it is particularly difficult to obtain datasets with accurate labels (Wang et al., 2020a, Ashraf et al., 2022). First, the labeling procedure requires additional work by experts, which is time-consuming and resource-intensive. Experts often do not record precise labels during clinical diagnosis, whereas algorithm development requires a large amount of accurately labeled data, so substantial additional effort is needed to accumulate useful datasets with accurate labels. For example, a typical CT scan consists of tens or hundreds of slices, each of which must be labeled individually. This heavy labeling burden makes the process prone to errors, leading to noisy labels. Second, the labeling procedure relies heavily on expert experience and domain knowledge. Although crowd-sourcing is promising for many applications in the natural image domain, it has limitations in the medical image domain: for many disease datasets that require a medical background, non-expert annotations have low confidence and may introduce noisy labels. Ambiguity, low contrast, and small target objects are also very common in medical images, making them more likely to contain label noise; by contrast, it is easier to provide relatively accurate labels for natural images with clearer boundaries and higher contrast (e.g., apples or cars). Third, multiple annotators are typically employed in the medical image domain to improve annotation accuracy, yet different experts provide labels influenced by their preferences and competence levels. Even for experienced experts, the variability in lesion location, size, and shape across patients makes accurate labeling difficult. This process is notoriously subject to high inter-observer variability.

In short, the reliability and consistency of medical image labeling are poor compared with the labeling of natural images. Medical image datasets often contain low-quality annotations and exhibit ubiquitous label noise. Many previous studies have shown that label noise can significantly degrade the generalization and accuracy of learning models, and medical image models trained on noisy labels may lead to suboptimal clinical decisions. Hence, algorithmic approaches that can effectively handle label noise are highly desirable. We must note that noisy labels affect not only the training of a model but also its performance evaluation, because data with clean labels are required to determine the effectiveness of any trained model. This raises an interesting issue: for many medical image datasets, even the provided “gold” labels are likely to be noisy.

After reviewing current studies on handling label noise with deep learning for medical image analysis, we found that this challenging problem has not received sufficient attention. However, for intelligent applications of medical images, deep learning with noisy labels is an urgent problem that needs to be solved. To better understand the related problems, we present recent advances in deep-learning techniques for overcoming noisy labels in medical image analysis. We hope that this survey will draw more attention from researchers to this field.

Label-noise learning for natural images has attracted the attention of many researchers, and several pioneering surveys on label noise for classification tasks have been conducted. For example, Frénay and Verleysen (2013) focused on label-noise statistical learning, such as support vector machines and Bayesian learning, rather than label-noise deep learning. Nigam et al. (2020) provided a brief overview of machine learning algorithms used to mitigate noisy labels; however, they reviewed only the negative effects of label noise on classification tasks and covered a limited range of noise types. Recently, Han et al. (2020b) provided a formal definition of label-noise representation learning and were the first to categorize the extensive literature in terms of data, objectives, and optimization. Song et al. (2022) also examined label-noise learning from the perspective of supervised learning; their work differs from that of Han et al. (2020b) in its categorization philosophy. All these studies concentrated on reviewing noise-robust classification tasks for natural images.

From the perspective of the medical image domain, Karimi et al. (2020) reviewed techniques for handling label noise in deep learning. They also conducted experiments and provided insights into the performance of several medical label-noise learning methods. In recent years, this domain has developed rapidly, and a large amount of new work has emerged that needs to be summarized and organized. Our survey reviews recent studies on medical images with noisy labels. We use Medical Label-Noise Learning (MLNL) to denote learning with noisy labels based on deep learning in the medical image domain. Specifically, we provide a comprehensive review of recent robust training methods for MLNL, all of which are categorized into several groups based on their methodological differences. In addition, we discuss the applicability of typical methods designed for natural images to the medical domain. Our survey thus offers a recent and in-depth review of state-of-the-art medical label-noise learning. The main contributions of this study are as follows: (1) We review the extensive literature on MLNL and present a unified taxonomy based on methodological differences. We highlight representative studies and analyze their corresponding advantages and disadvantages. (2) We conduct a methodological comparison of existing MLNL approaches. We also discuss noise in “gold” labels and how to reduce label noise at the annotation stage, and introduce datasets commonly used for MLNL. (3) We propose several promising future directions based on the characteristics of medical images and their corresponding tasks.

MLNL aims to develop robust learning models that mitigate the negative impact of noisy labels on algorithm performance. In previous studies, the term label noise has been used to refer to various forms of label imperfection and contamination, and in the medical image domain, label noise manifests in various forms. We therefore clarify the meaning of label noise and delineate the scope of this literature review.

In this review, we focus on label noise. We consider a pair consisting of a medical image and its corresponding label, where we assume that the image is intact but the label may be corrupted (Van Engelen and Hoos, 2020). We do not discuss data (feature) noise in this study. One approach for dealing with imperfect data labels is semi-supervised learning, which trains models on both labeled and unlabeled data; its main goal is to use a large amount of unlabeled data to improve model performance, which is not within the scope of our study. Another form of label imperfection is weak labeling, studied under weakly supervised learning (Zhang et al., 2021), which is also beyond the scope of our study. In this survey, we review only studies related to learning with noisy labels.
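For concreteness, this setting can be stated as risk minimization with corrupted labels. The sketch below is one common formalization; the notation is ours and is not taken from any specific work cited in this survey. The learner observes image-label pairs whose labels may have been flipped, yet is evaluated against the clean label distribution.

```latex
% One common formalization of learning with noisy labels (notation is ours).
% D is the clean joint distribution over images x and true labels y;
% the learner only observes a corrupted training set drawn from \tilde{D}.
\begin{aligned}
&\text{Observed training set: } \tilde{S} = \{(x_i, \tilde{y}_i)\}_{i=1}^{n} \sim \tilde{\mathcal{D}},
  \qquad \tilde{y}_i \text{ possibly } \neq y_i,\\
&\text{Class-conditional noise example: } T_{jk} = p(\tilde{y} = k \mid y = j)
  \quad \text{(noise transition matrix } T\text{)},\\
&\text{Goal: } \min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\!\left[\ell\!\left(f_{\theta}(x),\, y\right)\right]
  \quad \text{using only } \tilde{S}.
\end{aligned}
```

Roughly speaking, the robust training methods surveyed below can be read as different strategies for approximating this clean-risk objective from the noisy training set, for example by estimating the noise process, reweighting samples, or correcting labels.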

Our goal is to provide a comprehensive survey of label-noise deep learning for medical image analysis. We attempted to retrieve as many recent publications as possible at the time of submission (May 2023), and have reviewed the most representative works and influential developments in this field.

The remainder of this paper is organized as follows. In Section 2, the problem analysis of the MLNL is described. Section 3 presents the typical methods that have addressed label noise in medical image datasets. In Section 4, we present an in-depth analysis and the methodological comparison. This section also presents the typical experimental datasets. Future work on MLNL is discussed in Section 5, and conclusions are presented in Section 6. A concise description of the organization of this study is presented in Fig. 1.
