Detection of Masses in Mammogram Images Based on the Enhanced RetinaNet Network With INbreast Dataset

Introduction

Breast cancer is one of the most common and deadly cancers in women and accounts for about 15% of all cancer deaths in women.1,2 The World Health Organization reported that 670,000 women died of breast cancer in 2022.2 The death rate of breast cancer continues to increase globally. In order to improve the prognosis and survival rate of breast cancer, early diagnosis of patients has become crucial. Although they have been used for more than 100 years, mammograms are still the primary screening tool for the clinical diagnosis of breast cancer.3 Doctors usually make a diagnosis by examining a patient's mammogram and combining it with the patient's clinical information. However, making a correct diagnosis is a challenging task because it is dependent on the doctor's experience. Therefore, computer-aided diagnosis (CAD) systems for medical images are often used to assist medical doctors in detecting and diagnosing lesions in breast imaging examinations by providing them with auxiliary advice and thus reducing their burden.4–6

However, traditional CAD systems have several limitations that can impact the accuracy and efficiency of breast cancer detection. Firstly, the traditional CAD systems are usually dependent on manually extracted features. This requires people to select those useful features based on prior knowledge of the data, which is inefficient and computationally intensive. Secondly, the features can be affected by external factors such as patient breast density and imaging equipment.7,8 As a result, poor diagnosis results are often produced by these types of approaches. In addition, traditional CAD systems may be less effective in detecting breast cancer in women with dense breast tissue, as the dense tissue can obscure the visibility of tumors.9,10 This can lead to a higher risk of missed diagnoses in these patients.

Deep learning methods have recently advanced to an unprecedented stage and have attracted much research attention because of the rapidly emerging big data era and vast improvements in computer hardware. Introducing deep learning technology into CAD systems can potentially address these limitations and offer several benefits. By using a standardized deep learning model, the variability in interpretation among radiologists can be minimized, leading to more consistent and reliable results. Furthermore, deep learning methods can avoid the limitations of traditional CAD systems due to their automatic feature extraction capabilities.11,12 In this way, deep learning methods have attained breakthrough achievements in image recognition,13 natural language processing,14 bioinformatics,15 object detection,16 and many other fields17,18 without relying on manually extracted features. For example, we have used deep learning to classify and analyze breast cancer histopathological images and obtained the best results reported so far.19 Although we can successfully classify breast cancer histopathological images into benign and malignant classes and further into eight classes, including Adenosis, Fibroadenoma, Phyllodes Tumor, Tubular Adenoma, Ductal Carcinoma, Lobular Carcinoma, Mucinous Carcinoma, and Papillary Carcinoma, our study was limited to these breast cancer histopathological images. Therefore, it cannot significantly assist medical doctors in early breast cancer screening using mammograms. Furthermore, most studies have focused on the classification of various types of breast cancer,13,20–23 but there have been few studies about lesion region detection in mammogram images.

Our research in this paper is dedicated to deep learning-based detection of lesion regions in mammogram images with the aim of enhancing the efficiency and accuracy of breast cancer screening and treatment, as well as providing assistance for medical doctors to administer clinical treatment to breast cancer patients in a timely manner. Specifically, we aim to address the class imbalance problem, reduce the high false positive and false negative rates in existing models, and focus on detecting small masses and multiple targets in mammogram analysis. To achieve these goals, we initially adapt the end-to-end one-stage object detection model RetinaNet24 to the mammogram masses detection problem and conduct experiments on the INbreast database.25 Subsequently, we propose improvements to the RetinaNet network and combine transfer learning and fine-tuning techniques to further enhance RetinaNet model and our experimental results. Ultimately, our approach achieves superior performance on the INbreast database compared to previous studies in the literature, thus contributing to more effective breast cancer screening and clinical treatment.

The enhanced detection capabilities of our proposed method hold significant implications for patient care. By improving the accuracy of mass detection in mammogram images, we can reduce the number of false negatives, which is of utmost importance as missed detections can lead to delayed diagnosis and treatment, negatively affecting patient prognosis. At the same time, minimizing false positives is equally crucial as it reduces unnecessary patient anxiety and further diagnostic procedures. Our approach aims to strike a balance between these two aspects, ultimately leading to more reliable and effective breast cancer screening. This improvement not only benefits individual patients by providing more accurate diagnoses but also has the potential to enhance the overall efficiency and effectiveness of breast cancer screening programs.

Related Works

There have been many classification studies on the INbreast database. For example, in a 2015 study by Carneiro et al,26 researchers divided the images in the INbreast database into two categories (benign and malignant) by using a convolutional neural network pre-trained on the ImageNet dataset (a very large and popular dataset in the field of computer vision). They attained good experimental results in terms of an AUC (Area Under the ROC Curve) of 0.91.

However, compared with the classification task, it is much more important to detect the location of breast lesions in the mammogram images so that medical doctors can make a correct diagnosis and administer further appropriate clinical treatment to patients. As a result, there have been some studies published that used the INbreast dataset and focused on detecting breast lesions. In 2015, Dhungel et al27 proposed a new method for detecting masses in mammography images in the INbreast database. They adopted R-CNN (Region-based Convolutional Neural Network), a two-stage object detection method and deep learning technique cascaded with a random forest classifier, and obtained experimental results of [email protected] (TPR@FPPI). TPR refers to True Positive Rate and FPPI denotes False Positives Per Image. In 2017, they28 combined the deep learning technique with Bayesian optimization to perform further research with the INbreast database and achieved results of [email protected] in mammography mass detection. Also in 2017, Akselrod-Ballin et al29 proposed to use Faster R-CNN (Faster Region-based Convolutional Neural Network) to detect masses in the INbreast database and achieved experimental results of [email protected]. In 2018, Ribli et al30 used Faster R-CNN to do more challenging research on the INbreast database. They tried to detect all types of lesions in the mammography images, such as masses and calcifications, and obtained experimental results of [email protected] (TPR@FPPI). They also found that masses are the most common type of lesion in mammography images. Furthermore, a 2018 study by Jung et al31 proposed to use the one-stage object detection model RetinaNet to detect lesions in images in the INbreast database due to the advantages offered by its speed and high precision. They obtained results of [email protected] on INbreast and also used the parameters obtained by training RetinaNet on their inhouse database GURO. These parameters will be helpful for related research in the future. A further study on INbreast was conducted by Agarwal et al32 in 2019, in which the InceptionV333 network was combined with transfer learning to obtain results of [email protected] for mass detection in mammography images. They further proposed a fully automated framework to detect masses in full-field digital mammograms based on the Faster R-CNN network in 2020, and they obtained [email protected] for malignant masses and [email protected] for benign masses on the INbreast dataset.34 In the same year, Gao et al35 proposed a multi-task deep learning model enabled by feature transfer and obtained [email protected] and 0.91@5 for malignant and benign masses on the INbreast dataset, respectively. They also tested Mask R-CNN on the INbreast dataset, obtaining [email protected] and 0.85@5 for malignant and benign masses, respectively.35 Li et al36 proposed a convolutional neural network-based bilateral image analysis method for mass detection in mammograms and obtained [email protected] on the INbreast dataset. Min et al37 put forward a scheme based on the sifting architecture for detecting mammographic lesions, obtaining [email protected] on the INbreast dataset. Yan et al38 proposed the combined matching and classification network (CMCNet), which was used on the INbreast dataset, obtaining results of [email protected] for detecting masses in mammograms. Zhang et al39 proposed to use the YOLOv3 network for mass detection in mammogram images and obtained TPR@FPPI results of [email protected] for the INbreast dataset.
Elmoufidi40 developed a framework for automating the analysis of suspicious areas in mammograms. It utilizes BEMD (Bidimensional empirical mode decomposition) to decompose ROIs (Regions of interest) into BIMFs (Bidimensional intrinsic mode functions) and an SVM (Support vector machine) for classification, significantly improving diagnostic accuracy compared to the recent literature. In the past two years, Aslan and colleagues41 introduced two types of end-to-end deep learning models tailored for the analysis of mammography images, comprising a CNN (Convolutional neural network) and a more complex model integrating the CNN with a BiLSTM (Bidirectional long short-term memory) to leverage temporal feature extraction. Their findings revealed that these models achieved commendable classification accuracies for benign samples, with 0.95 and 0.971 on the INbreast dataset, respectively, underscoring the efficacy of these deep learning architectures in breast imaging diagnostics. A deep learning-based model for the severity classification of breast cancer was introduced in Chakravarthy's work.42 It employs transfer learning for feature extraction and enhances the wKNN (Weighted k-nearest neighbor) algorithm with an optimization algorithm that transforms non-linear input features into linearly separable feature vectors. The proposed model achieved its best performance on the INbreast dataset, with a TPR of 0.807 and an accuracy of 0.832, when the features transformed by the CSO algorithm were used with the wKNN classifier. Chakravarthy et al43 proposed a CAD framework leveraging deep learning for breast cancer diagnosis. Four distinct experiments were conducted to identify the most effective classification strategy, including pre-trained deep CNNs, feature extraction with SVM using various kernels, deep feature fusion for improvement, and PCA to reduce the computational burden. An impressive classification accuracy of 0.966 was achieved on the INbreast dataset through the fusion of deep features. Prinzi et al44 developed an automated data-driven model using a Yolo-based architecture for breast cancer detection in mammograms. The comparison of various Yolo models highlighted the efficacy of the YoloV5s model with transfer learning, achieving a mean TPR of 0.742±0.146 on the INbreast dataset.

Although masses are very common lesions in mammography, the aforementioned studies show that the best TPR@FPPI for mass detection in the mammography images of the INbreast database is no more than [email protected], which was achieved using the InceptionV3 network. In addition, although it was reported that the best TPR@FPPI of [email protected] was achieved using Faster R-CNN for malignant masses on the INbreast dataset, the TPR@FPPI for benign masses was only [email protected].34 Therefore, it is necessary for us to advance the study of mass detection in mammography. However, masses in mammography images are typically small, which makes detecting them a very challenging task. It is reported that, in addition to having high speed and accuracy, the one-stage object detection network RetinaNet has advantages that enhance its ability to detect small targets. Although Jung et al31 proposed to use RetinaNet to detect lesions in images from the INbreast database and obtained a TPR @ FPPI of [email protected], their results are not better than the [email protected] achieved using the InceptionV3 network reported in Agarwal’s work.32

With the advantages of RetinaNet and the challenges of mass detection in mammography in mind, we chose to adopt the RetinaNet network to perform mass detection on the mammography images in the INbreast database. The choice of the RetinaNet network was driven by its unique advantages. Its one-stage architecture provides a balance between speed and accuracy, making it suitable for clinical applications. The focal loss function proposed alongside it effectively tackles the class imbalance issue prevalent in mammogram images, where normal regions are abundant compared to mass regions. Additionally, the hierarchical feature maps in RetinaNet enable better detection of small masses and masses of various sizes, which is essential given the diverse size range of masses in mammograms. These characteristics distinguish RetinaNet from other commonly used models, such as Faster R-CNN and YOLO, and make it a particularly apt choice for our mammogram analysis task. Here, we try to improve the TPR while also reducing the FPPI for mass detection using the techniques introduced in the next section.

Materials and Methods

This section introduces the data and the networks used in this paper. For constructing the datasets in this study, we used the publicly available INbreast image database. The network architectures used to train our models include the existing RetinaNet and our improved RetinaNet.

Data

The INbreast dataset used in this paper is a mammographic dataset, which was acquired at a breast center located in a university hospital (Centro Hospitalar de S. João [CHSJ], Breast Centre, Porto) with the permission of the Portuguese National Committee of Data Protection and the Hospital's Ethics Committee.25 It consists of 410 X-ray breast images from 115 patients, depicting the craniocaudal (CC) and mediolateral oblique (MLO) views of both breasts for each patient. There are 90 patients who have not had a mastectomy (they have two breasts) and thus can provide four breast X-ray images. The remaining 25 patients have had a mastectomy (they have only one breast) and can only provide two breast X-ray images. Figure 1a–d show four breast images from a patient who did not have a mastectomy. The INbreast images are either 3328×4048 or 2560×3328 pixels in size, depending on the compression plate used during acquisition, which was chosen according to the size of the breast being imaged.

Figure 1 Detailed information about the INbreast dataset used in this study. Subfigures (a)-(d) show the four mammograms of one patient from two views:25 (a) the CC view of the patient's right breast, (b) the CC view of the patient's left breast, (c) the MLO view of the patient's right breast, (d) the MLO view of the patient's left breast. Subfigure (e) presents the number of images with and without masses included in the training and test subsets, and the number of unlabeled images, respectively.

There are six categories in the INbreast database, including asymmetry, calcification, distortion, mass, multi-cases, and normal status. Since masses are the most common lesions, we focus on detecting masses in the mammogram images in this paper. The experts provided accurate contour annotations in XML file format for the lesion sites in 343 mammogram images in the INbreast dataset. However, there were 67 mammogram images that we could not use in our experiments because they were left without any labels. A total of 116 masses can be found among 107 images, with an average of about 1.1 masses per image.25 According to this description of the INbreast dataset in reference 25, it can be inferred that all images containing masses are included within the 343 labeled images. In other words, none of the 67 mammogram images without labels contain masses. The detailed information about the images in the INbreast dataset is displayed in Figure 1e.

As a result of the differences between the INbreast dataset and the PASCAL VOC (Visual Object Classes) dataset that RetinaNet was originally tested with, we first needed to process these 343 images before they could be used. Specifically, we needed to convert the INbreast dataset’s annotations for the masses in the images, which are in the form of an array of points representing the contour line centered around regions of interest, to the rectangular bounding box target format used by RetinaNet and the PASCAL VOC dataset. In order to accomplish this, we used the point with the minimum horizontal and vertical coordinates in each mass contour line in the mammogram images as the upper-left point of each rectangular bounding box and the point with the maximum coordinates as the bottom-right point. By doing this, the rectangular bounding box target format abides by the rules of the PASCAL VOC dataset. It should be noted that the directions of the horizontal and vertical coordinates we used here are toward right and down, respectively. In addition, we normalize images following the normalization protocol utilized in the original RetinaNet.
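As a minimal illustration of this conversion, the sketch below turns a list of contour points (assumed to be already parsed from the INbreast XML annotation into (x, y) pixel coordinates) into a PASCAL VOC style (xmin, ymin, xmax, ymax) box; the function name and the example contour are hypothetical.

```python
def contour_to_voc_bbox(contour_points):
    """Convert a mass contour (list of (x, y) pixel points parsed from the INbreast
    XML annotation) into a PASCAL VOC style (xmin, ymin, xmax, ymax) bounding box.
    x increases toward the right and y increases downward."""
    xs = [p[0] for p in contour_points]
    ys = [p[1] for p in contour_points]
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))

# Hypothetical contour used only to illustrate the conversion:
contour = [(1200.5, 980.0), (1235.7, 1002.3), (1198.2, 1040.8), (1170.4, 1001.1)]
print(contour_to_voc_bbox(contour))  # -> (1170, 980, 1235, 1040)
```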

RetinaNet Network

Deep learning-based object detection algorithms can be divided into two families.45 The first family is composed of two-stage detection algorithms, such as R-CNN46 and its variants. This kind of algorithm divides the object detection task into two stages that involve generating candidate regions before performing classification and regression. As a result, they can typically achieve high detection accuracy, but they often have a very slow detection speed. The second family contains one-stage deep learning-based object detection algorithms, such as YOLO47 and its variants. These algorithms simplify object detection to an end-to-end classification and regression task without generating candidate regions. As a result, they are typically fast but are often less accurate compared with two-stage networks. How to achieve both high speed and high accuracy has been a challenging task in the advancement of deep learning-based object detection studies, and many researchers have devoted themselves to making improvements in this area. In 2018, Lin et al24 pointed out that this challenge mainly comes from the extreme imbalance typically found in sample category distributions. They proposed a novel loss function named Focal Loss (FL) as an alternative to the traditional cross-entropy loss, and they developed the RetinaNet network to test this FL function. Focal Loss is defined in equations (1)-(3).

$$p_t = \begin{cases} p, & \text{if } y = 1 \\ 1 - p, & \text{otherwise} \end{cases} \tag{1}$$

$$\alpha_t = \begin{cases} \alpha, & \text{if } y = 1 \\ 1 - \alpha, & \text{otherwise} \end{cases} \tag{2}$$

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{3}$$

Here, y is the label, and y = 1 denotes that the sample belongs to the positive class. For binary classification, p represents the probability that a sample will be predicted as a positive sample. p_t equals the model's estimated probability p for the positive class and 1 - p for the negative class. α is a weighting factor used to adjust the contributions of positive and negative samples to the FL. Similarly to the definition of p_t, α_t equals α for the positive class and 1 - α for the negative class. (1 - p_t)^γ is the modulating factor used to suppress the weights of samples that are easily classified and to lead the model to focus on samples that are more difficult to classify. We used fixed values of α and γ in our experiments. Thanks to the improvements from the FL function, it was finally shown that the RetinaNet network is capable of real-time object detection with both high accuracy and speed.
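To make the definition above concrete, the following is a minimal sketch of the binary focal loss written with TensorFlow/Keras (the frameworks used for our experiments); the default values α = 0.25 and γ = 2.0 shown here are the ones proposed in the original RetinaNet paper and are assumptions for illustration, not necessarily the exact values used in this study.

```python
import tensorflow as tf

def binary_focal_loss(alpha=0.25, gamma=2.0):
    """Sketch of FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); alpha and gamma
    default to the values proposed in the original RetinaNet paper."""
    def loss_fn(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # Equation (1): p_t = p for positive samples (y = 1), 1 - p otherwise.
        p_t = tf.where(tf.equal(y_true, 1.0), y_pred, 1.0 - y_pred)
        # Equation (2): alpha_t = alpha for positive samples, 1 - alpha otherwise.
        alpha_t = tf.where(tf.equal(y_true, 1.0),
                           alpha * tf.ones_like(y_true),
                           (1.0 - alpha) * tf.ones_like(y_true))
        # Equation (3): the modulating factor (1 - p_t)^gamma down-weights easy samples.
        return -alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t)
    return loss_fn
```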

Figure 2 shows a diagram of RetinaNet, which comprises four parts: (a) is the residual network used to extract features from input images. It uses skip connections to avoid the vanishing gradient problem caused by increasing network depth. (b) is the FPN (Feature Pyramid Network)48 used to generate multi-scale feature maps. (c) is the fully convolutional classification subnetwork. It is used to classify the feature maps generated by (b). (d) is the fully convolutional regression subnetwork. This part is parallel to the subnetwork in (c) and shares the same structure, but its function is to perform regression on the feature maps generated by (b). To understand the details of the RetinaNet network, so as to improve it, we present its detailed structure in Figure 3.

Figure 2 The diagram of the RetinaNet network structure.24

Figure 3 The diagram of the refined network structure of RetinaNet.

The residual network shown in Figure 3a is composed of five convolutional residual blocks, each of which transfers the output of earlier layers to deeper layers via skip connections. This helps the network avoid the vanishing gradient problem and ensures good performance while training deeper models. Figure 3b is the feature pyramid network, which achieves accurate location detection by adding a connection between each reconstruction layer and the corresponding feature map. Finally, five feature maps of different scales (P3, P4, P5, P6 and P7) feed both the classification subnet in Figure 3c and the regression subnet in Figure 3d. This allows target classification and regression to be performed simultaneously. The whole network is an end-to-end, one-stage object detection model. Once an image is fed as input, the network outputs both classification and regression results after a series of forward propagations. Small objects can be detected by the RetinaNet network using the multi-scale feature maps produced by the embedded feature pyramid network. Therefore, it is a suitable network for detecting masses in breast X-ray images. Additional details concerning the RetinaNet network can be found in Lin's work.24
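To make the pyramid structure in Figure 3b concrete, the sketch below builds the five feature maps P3-P7 from the backbone outputs C3-C5 with Keras layers, following the standard RetinaNet/FPN recipe of 1×1 lateral connections, top-down upsampling, and 3×3 output convolutions; the layer names and the 256-channel width are common defaults assumed here for illustration rather than the exact implementation used in this study.

```python
from tensorflow.keras import layers

def build_fpn(C3, C4, C5, channels=256):
    """Sketch of the RetinaNet feature pyramid (P3-P7) built from backbone outputs C3-C5."""
    # 1x1 lateral connections bring the backbone feature maps to a common channel width.
    M5 = layers.Conv2D(channels, 1, padding='same', name='M5')(C5)
    M4 = layers.Add()([layers.UpSampling2D()(M5),
                       layers.Conv2D(channels, 1, padding='same', name='lateral_C4')(C4)])
    M3 = layers.Add()([layers.UpSampling2D()(M4),
                       layers.Conv2D(channels, 1, padding='same', name='lateral_C3')(C3)])
    # 3x3 convolutions produce the pyramid levels consumed by the two subnets.
    P3 = layers.Conv2D(channels, 3, padding='same', name='P3')(M3)
    P4 = layers.Conv2D(channels, 3, padding='same', name='P4')(M4)
    P5 = layers.Conv2D(channels, 3, padding='same', name='P5')(M5)
    # P6 and P7 extend the pyramid to coarser scales.
    P6 = layers.Conv2D(channels, 3, strides=2, padding='same', name='P6')(C5)
    P7 = layers.Conv2D(channels, 3, strides=2, padding='same', name='P7')(layers.Activation('relu')(P6))
    return P3, P4, P5, P6, P7
```

The P5 line above, a 3×3 convolution applied directly to M5, is the step modified by our improvement described in the next subsection.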

The Improved RetinaNet Network

To improve the ability of RetinaNet to detect very small masses in breast X-ray images, we propose the improved RetinaNet network shown in Figure 4. The improved part of the network is depicted in red font. Specifically, the feature map M5 is first processed by the nonlinear function ReLU (Rectified Linear Unit) before being processed by the original 3 × 3 convolution kernel. This improvement was made to prevent the features of very small masses in breast X-ray images, which are still informative after being processed by ResNet50, from losing so much resolution that the masses can no longer be detected. The ReLU function strengthens certain features of feature map M5, enhancing their resolution and helping to guarantee that very small masses will be detected in the X-ray images. Therefore, we use the ReLU function to enhance the feature information of the high-level feature map M5 of the RetinaNet network, so as to achieve the effect of enhancing resolution. Then the convolution operation is performed to improve the detection accuracy for very small masses in breast X-ray images.
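Relative to the standard pyramid sketch above, our modification amounts to inserting a ReLU activation on M5 before the existing 3×3 convolution that produces P5. A minimal sketch of this single change is given below; the layer names are again illustrative.

```python
from tensorflow.keras import layers

# Baseline RetinaNet (see the pyramid sketch above): P5 is computed directly from M5.
#   P5 = layers.Conv2D(256, 3, padding='same', name='P5')(M5)

# Improved version proposed here: pass M5 through a ReLU first, then the same 3x3 convolution.
def improved_P5(M5, channels=256):
    M5_enhanced = layers.Activation('relu', name='M5_relu')(M5)  # feature enhancement step
    return layers.Conv2D(channels, 3, padding='same', name='P5')(M5_enhanced)
```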

Figure 4 The diagram of the improved RetinaNet network structure.

Evaluation Metrics

Although sensitivity (recall), specificity, precision and F1-score are popular metrics for evaluating the performance of a model, this study focuses on the detection of masses in mammogram images, so metrics that are popular and more comprehensive in object detection are utilized to evaluate the performance of the models. The evaluation metrics used in this paper are mAP (mean Average Precision)49 and FROC (Free-Response Receiver Operating Characteristic) curves.50 mAP is the most popular metric used in the object detection field. It is defined as the mean of the AP (Average Precision), which is the area under the precision-recall curve (PR curve), and lies within the interval [0, 1]. Each point on the curve corresponds to a specific threshold, and models with better overall performance typically exhibit higher mAP values. FROC is a common metric for cancer diagnosis that uses the x-axis to denote false positives per normal case and the y-axis for sensitivity. It is used to evaluate the performance of a model for each image by generating a series of points derived by changing the threshold that determines the True Positive Rate (TPR) and the False Positives per Image (FPPI). Finally, the performance of the model is evaluated by the minimum FPPI value (TPR@FPPI) when the TPR has reached its maximum value. Our experiments used the implementations of these two metrics available as library functions in Python scikit-learn and the Matlab 2017b function evaluateDetectionMissRate, respectively.
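As a rough illustration of how the FROC points are obtained, the sketch below sweeps a score threshold over per-image detection results and converts the counts into (FPPI, TPR) pairs; the per-image data structure and the assumption that each detection has already been matched to ground truth by an overlap criterion are simplifications for illustration, not the exact scikit-learn or Matlab implementations we used.

```python
def froc_points(per_image_results, thresholds):
    """per_image_results: one dict per test image with
       'scores'   - confidence scores of the detections in the image,
       'is_match' - whether each detection overlaps a ground-truth mass (already decided),
       'n_gt'     - number of ground-truth masses in the image."""
    total_gt = sum(r['n_gt'] for r in per_image_results)
    n_images = len(per_image_results)
    points = []
    for t in thresholds:
        tp = fp = 0
        for r in per_image_results:
            for score, matched in zip(r['scores'], r['is_match']):
                if score >= t:
                    if matched:
                        tp += 1   # detection kept and it hits a ground-truth mass
                    else:
                        fp += 1   # detection kept but it does not hit any mass
        points.append((fp / n_images, tp / total_gt))  # (FPPI, TPR) at this threshold
    return points
```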

Results

It is well known that RetinaNet was very successful on the PASCAL VOC database, and its parameters have been used by many researchers across a wide variety of different experiments thanks to the proliferation of transfer learning techniques. We mostly used the default parameters of RetinaNet, but we chose to tune a few of them, such as batch_size and num_epoch (set to 1 and 50, respectively, in our experiments; the batch_size parameter was set to 1 because of the memory limits of our computers). In addition, the Adam (Adaptive Moment Estimation) optimizer was adopted to train our models, with parameters β1=0.9 and β2=0.999 and a learning rate of 1e-6. We performed experiments on the original INbreast database and a modified version, which we obtained by using image augmentation techniques. All of our experiments were carried out using the TensorFlow and Keras frameworks.
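A minimal sketch of the corresponding Keras training configuration is given below, assuming a model built as in the previous section and a generator yielding one image per batch; the function and argument names are placeholders, while the optimizer settings mirror those listed above.

```python
from tensorflow.keras.optimizers import Adam

def compile_and_train(model, train_generator, loss_dict):
    """Sketch of the training configuration described above: Adam optimizer with
    beta_1=0.9 and beta_2=0.999, a learning rate of 1e-6, 50 epochs, and a batch
    size of 1 (the generator is assumed to yield one image per batch)."""
    model.compile(
        optimizer=Adam(learning_rate=1e-6, beta_1=0.9, beta_2=0.999),
        loss=loss_dict,  # e.g. {'classification': focal loss, 'regression': smooth L1 loss}
    )
    model.fit(train_generator, epochs=50)
    return model
```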

Experiments on the Raw Dataset

We randomly partitioned the 343 mammogram images in the INbreast dataset into training and test subsets in a 7:3 ratio under the condition that the distributions of the test and training subsets are similar. Finally, the training and test subsets include 226 and 117 mammogram images respectively. The number of images with and without masses included in the training and test subsets is detailed in Figure 1e. Figure 3 shows that the residual network is the first part of RetinaNet, which comprises several stacked convolutional layers. ResNet50,51 ResNet101,51 and ResNet15251 are three of the most popular residual networks that we found in our literature review. Therefore, we conducted our experiments by using these three residual networks as the backbone of the RetinaNet network for training and testing. First, we used the 226 mammogram images in the training subset to train the RetinaNet network with ResNet50, ResNet101, and ResNet152 backbones. Then, we tested the network with the 117 mammogram images in the test subset. The test results are recorded here as train_raw_50, train_raw_101, and train_raw_152, respectively. Figure 5 displays the curves depicting the change in training loss over each iteration (epoch) observed when using ResNet50, ResNet101 and ResNet152 as the backbone networks. Figure 6a displays the PR curves with the values of mAP for the three resulting models. Figure 6b displays the FROC curves of these models and their TPR@FPPI values.
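As an illustration of how such a 7:3 split with similar composition can be produced, the sketch below stratifies the images by whether they contain a mass before sampling; the identifiers and flags are assumed inputs, and the concrete file handling in our experiments may differ.

```python
import random

def stratified_split(image_ids, has_mass, train_fraction=0.7, seed=42):
    """Split image ids roughly 7:3 while keeping the with/without-mass proportions similar."""
    random.seed(seed)
    with_mass = [i for i, m in zip(image_ids, has_mass) if m]
    without_mass = [i for i, m in zip(image_ids, has_mass) if not m]
    train, test = [], []
    for group in (with_mass, without_mass):
        random.shuffle(group)
        cut = int(round(train_fraction * len(group)))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test
```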

Figure 5 The training loss curves of the three RetinaNet models.

The results in Figure 5 show that all three training loss curves descend as the number of iterations increases. The loss decreases rapidly in the early stages of training, while it gradually stabilizes in the later stages of training. These results also show that the ResNet50 backbone exhibits the most rapidly decreasing training loss curve, while the curve belonging to ResNet101 decreases at the slowest rate. This is one of the reasons why we chose RetinaNet with the ResNet50 backbone for detecting masses in the INbreast mammogram images.

The results in Figure 6 show that the mAP and TPR@FPPI values of each model are good regardless of the backbone used for RetinaNet (ResNet50, ResNet101, or ResNet152). Specifically, the results in Figure 6a show that the mAP values of the three RetinaNet models are 0.9971, 0.9966 and 0.9939, respectively, which decrease as the depth of the RetinaNet network increases. These results demonstrate that the rule of thumb, ie, the deeper the network, the better the performance of a deep learning network, is not always correct. Therefore, we need to take the data properties, domain-specific target metrics, and task-specific empirical optimization into consideration when designing the proper network architecture for a specific task.

Figure 6 Performance comparison of three RetinaNet models in terms of PR and FROC curves, respectively. (a) PR curves with mAP values, (b) FROC curves with TPR@FPPI values.

The results in Figure 6b show that the minimum FPPI values of the three RetinaNet models are not the same when they reach the same maximum TPR value. Although the minimum FPPI value of the RetinaNet model using the ResNet50 backbone is 2% higher than those of the other two RetinaNet models with ResNet101 and ResNet152 backbones, it is acceptable considering the model's maximum TPR value and its efficiency. Therefore, our models based on RetinaNet are well suited for detecting masses in the INbreast mammogram images.

To further compare the detection capability of our three models, Figure 7 shows the specific results for two mammogram images in the test subset. Here, the green boxes represent the ground-truth locations of the masses, and the red boxes depict the locations of the masses predicted by our three models. To display the results clearly, the ground truth and the detected results are enlarged and displayed in the upper-right corner of each image.

Figure 7 The ground truth (green boxes) and predicted (red boxes) locations of the masses in two mammogram images from the test subset generated using different backbones for RetinaNet. (a) ResNet 50, (b) ResNet101, (c) ResNet152.

The results in Figure 7 show an example of how the RetinaNet model with the ResNet50 backbone outperformed the other two models in the INbreast mass detection test. This was due to the ground-truth box and the predicted box for the masses in the test images nearly completely overlapping. In addition, the results in Figures 5 and 6 show that, for the INbreast database, our RetinaNet model using ResNet50 as the backbone was faster and more accurate than our RetinaNet models that used ResNet101 and ResNet152 for their backbones. This is likely caused by the images in the INbreast database having a relatively simple black background. Therefore, this well-defined sample space does not need a very deep network to learn the features for images, and our RetinaNet model with ResNet50 as the backbone is capable of learning the features used to detect masses in the INbreast database. However, our RetinaNet models with the ResNet101 and ResNet152 backbones may be too deep to learn these features and most likely succumbed to overfitting. Furthermore, these two deeper networks are more complex and will require too much training time. Therefore, we will conduct further experiments on RetinaNet with ResNet50 as the backbone network.

Experiments on the Augmented Dataset

To avoid the overfitting that could be caused by the limited number of 226 breast X-ray images in the training subset, this subsection focuses on experiments that we performed using our augmented training subset. The mammogram images in the training subset were augmented by randomly applying the following operations: flipping horizontally, flipping vertically, rotating up to 90 degrees counterclockwise, and rotating up to 180 degrees counterclockwise. Finally, the training subset was augmented to comprise 1,130 mammogram images, including the original images in the training subset and their augmented counterparts. These 1,130 images were used to train the RetinaNet network with ResNet50 as the backbone. The 117 images in the original test subset were used to test this trained RetinaNet model, and the test results were recorded as train_aug_50.
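A minimal sketch of the augmentation step is given below; for simplicity it applies fixed 90-degree and 180-degree counterclockwise rotations instead of random angles, produces one variant per operation (consistent with 226 × 5 = 1,130 images), and omits the corresponding transformation of the mass bounding boxes.

```python
import numpy as np

def augment_image(image):
    """Return the original image plus four augmented variants, as described above.
    (The mass bounding boxes must be transformed consistently; omitted in this sketch.)"""
    return [
        image,
        np.fliplr(image),      # horizontal flip
        np.flipud(image),      # vertical flip
        np.rot90(image, k=1),  # 90-degree counterclockwise rotation
        np.rot90(image, k=2),  # 180-degree counterclockwise rotation
    ]
```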

Figure 8 compares the training loss curves of the original training subset and the augmented training subset. Figure 9a shows the PR curves and the mAP values of the two RetinaNet models trained on the original training subset and the augmented training subset. Figure 9b depicts the FROC curves and the TPR@FPPI values of these two RetinaNet models. Figure 10 compares the performance of the RetinaNet models trained using the original and augmented training subsets on one image with multiple masses from the test subset, where the green boxes represent the ground truth and the red boxes represent the predicted results. The ground truth and the detected results are enlarged and displayed in the upper-left corner of each image, so as to display the predicted results clearly.

Figure 8 The training loss curves of the RetinaNet models for the original and augmented training subsets.

Figure 9 The PR and FROC curves of the RetinaNet models trained using the original and augmented training subsets, respectively. (a) PR curves, (b) FROC curves.

Figure 10 The test comparison of two RetinaNet models on one image in the test subset. (a) RetinaNet model trained on the original training subset, (b) RetinaNet model trained on the augmented training subset.

The results in Figure 8 show that the RetinaNet model’s training loss on the augmented training subset converges faster than that of the original training subset. Furthermore, this model converges to a smaller training loss value, which indicates this model is more accurate than the RetinaNet model trained on the original training subset.

The results in Figure 9a show that the RetinaNet model trained on the augmented training subset obtains a higher mAP on the test subset than the model trained on the original training subset. Furthermore, the results in Figure 9b show that the RetinaNet model trained on the augmented training subset obtains a better TPR@FPPI on the test subset than the model trained on the original training subset. The maximum TPR value of the model trained on the augmented training subset is higher than that of the model trained on the original training subset, while its minimum FPPI value at the maximum TPR is lower.

The results in Figure 10 show that the RetinaNet model trained on the augmented training subset has more accurate detection capability than the model trained on the original 226 breast X-ray images in the INbreast database, especially for very small masses, which demonstrates the important role of image augmentation in training the RetinaNet model.

The intuition here is that image augmentation increases the number of images containing small masses, which gives RetinaNet more opportunities to extract the features of small masses and therefore enables the model to detect small masses correctly. In contrast, the RetinaNet model trained on the raw training subset may not extract the features of small masses extensively, which subsequently limits its capability to detect them. Therefore, we will train our proposed RetinaNet using the augmented training subset.

Test Experiments With Our Improved RetinaNet Trained on the Augmented Dataset

This subsection will test the power of our improved RetinaNet network shown in Figure 4. We trained this improved RetinaNet on our augmented training subset because we have proved that the original RetinaNet network can obtain better performance when trained on this subset.

In addition, it is well known that pre-training datasets can influence a model's performance: the size and diversity, quality, class balance, and domain of the pre-training dataset all affect the model in different ways. A larger and more diverse pre-training dataset will provide the model with a broader understanding of the real world, which can be beneficial when transferring knowledge to a new task. However, larger datasets also require more computational resources for pre-training. The quality of the data, including the absence of noise and the presence of high-quality labels, is crucial because high-quality data can lead to better-learned representations, which can transfer more effectively to new tasks. Class imbalance in the pre-training data can lead to biased models that perform poorly on minority classes; therefore, balancing the pre-training datasets or using techniques to mitigate imbalance can improve transferability. The domain of a pre-training dataset can either help or hinder transfer learning. If the domains of the pre-training dataset and the target task are similar, the transfer is likely to be more effective. However, if the domains are different, the model may need to "unlearn" some irrelevant features.

Considering these aforementioned effects of pre-training datasets on model performance, we chose to utilize a specific pre-training dataset for our improved RetinaNet model, so as to enhance its capability in detecting masses, particularly the very small masses in breast X-ray images. Here, we introduce the specific transfer learning techniques that we used to advance the training process of RetinaNet. First, we applied the weights of RetinaNet trained by Jung et al31 using their inhouse database called GURO to our improved RetinaNet in Figure 4 as its initial weights. Then, we performed fine-tuning on our RetinaNet model using the 1,130 mammogram images in our augmented training subset. We recorded these test results as improved_train_aug_50.
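A hedged sketch of this transfer learning step is shown below, assuming the GURO-pretrained weights are available as a Keras checkpoint whose layer names largely match our improved model; the file path and function name are illustrative.

```python
def load_pretrained_and_finetune(model, weights_path, train_generator, epochs=50):
    """Sketch: initialize the improved RetinaNet with weights pretrained on the GURO
    dataset, then fine-tune it on the 1,130 images of the augmented INbreast training
    subset. `weights_path` is an illustrative placeholder for the released checkpoint."""
    # by_name=True matches layers by name; skip_mismatch=True ignores layers whose
    # shapes differ (for example, layers touched by our modification of the M5 path).
    model.load_weights(weights_path, by_name=True, skip_mismatch=True)
    model.fit(train_generator, epochs=epochs)
    return model
```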

Figure 11 displays the training loss curves of the original model and our improved RetinaNet model. Figure 12a shows the PR curves and the mAP values of the two resulting RetinaNet models on the test dataset. Figure 12b shows the FROC curves and the TPR@FPPI values of the two RetinaNet models. Figure 13 compares the performance of the original RetinaNet and our improved RetinaNet on some images in the test subset, where the green boxes represent the ground truth and the red boxes show the predicted results. The ground truth and the results detected by the models are enlarged and displayed in the upper-right or upper-left corner of each image, so as to display the detected results clearly.

Figure 11 The training loss curves of the original RetinaNet and our improved RetinaNet.

Figure 12 The PR and FROC curves of the original model and our improved RetinaNet. (a) PR curves, (b) FROC curves.

Figure 13 The performance comparison of the original and our improved RetinaNet models on three mammogram images in the test subset. (a) the original RetinaNet, (b) our improved RetinaNet.

The results in Figure 11 show that our improved RetinaNet converges faster than the original RetinaNet. The training loss of our improved RetinaNet is always lower than that of the original RetinaNet during the training process. They converge to the same training loss value after 40 epochs.

The results in Figure 12 demonstrate that the improved RetinaNet outperformed the original RetinaNet with an mAP of 1 and a TPR@FPPI of [email protected], which means that the improved RetinaNet can detect all the masses present in the 117 mammograms in the test subset. Although the performance of the original RetinaNet is good, with an mAP of 0.9985, its minimum FPPI is 0.02 when it achieves its maximum TPR of 1, whereas our improved RetinaNet obtains a minimum FPPI of 0 at its maximum TPR of 1. In this sense, our improved RetinaNet achieves perfect detection on this test subset.

Therefore, we tested the 67 unlabeled mammogram images in the INbreast database with our trained and improved RetinaNet model. We found that they were all predicted to be without masses. We believe that this information should be useful for future studies using this dataset. For example, the 67 unlabeled mammogram images in the INbreast database could be added to the training subset to improve the robustness of the model.
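As a small illustration of how the "no mass" prediction can be checked, the sketch below runs inference on each unlabeled image and reports those in which no detection reaches a score threshold; the (boxes, scores, labels) output format and the 0.5 threshold are assumptions for illustration, not necessarily the exact settings used in this study.

```python
import numpy as np

def images_without_masses(model, images, score_threshold=0.5):
    """Return the indices of images in which no detection reaches the score threshold."""
    clean = []
    for idx, image in enumerate(images):
        # The prediction model is assumed to return (boxes, scores, labels) for a batch of one.
        boxes, scores, labels = model.predict(np.expand_dims(image, axis=0))
        if scores.size == 0 or np.max(scores) < score_threshold:
            clean.append(idx)
    return clean
```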

After performing a comprehensive analysis, we believe that the following reasons lead to our improved RetinaNet model achieving such high performance. First, the features of feature map M5 were enhanced by our extra ReLU activation function, which enhanced the resolution of the extracted features and led to the model being able to detect very small masses more accurately. Second, we utilized transfer learning techniques by adopting the weights trained using the inhouse GURO dataset from a South Korean hospital as the initial weights of our RetinaNet model before using the INbreast database to fine-tune these weights. We believe that the weights obtained from training with two completely different mammogram databases led to the good generalization performance of our model and helped it to successfully avoid overfitting.

As a result, our new RetinaNet model can be successfully applied even for patients with widely varying types of breast masses. This is especially true for the patients with multiple breast masses, which can be seen in Figure 13.

The results in Figure 13 show that our improved RetinaNet model successfully detects both the instances of very small masses and the cases of multiple masses in the mammography images of the INbreast database. It solves the challenge of enabling RetinaNet to detect multiple masses in these mammography images. Furthermore, in addition to improving the poor multi-mass detection of the original RetinaNet, it also improved the detection accuracy for specific masses in the INbreast database, which can be seen from the experimental results for the third mammography image in Figure 13.

The results in Figure 13 also demonstrate the significance of our improvement to RetinaNet. The model gained the capability to detect small and multiple masses through the addition of the ReLU function that enhances the features in feature map M5 before it is processed by the original convolution kernel.

Experimental Comparisons

This subsection compares our experimental results with those from other researchers on the same public database (INbreast). Since the mAP metric was not used by other researchers, we only compare the results in terms of TPR@FPPI or TPR. Table 1 displays the comparison of the existing experimental results with our own results. The values in bold font in Table 1 refer to the best experimental results. Three studies34,35,52 presented two types of TPR@FPPI for detecting malignant and benign masses on the INbreast dataset, respectively.

Table 1 The Comparison of the Existing Experimental Results on the INbreast Database With Our Own

The results in Table 1 show that our best experimental results obtained on the INbreast dataset in this paper are superior to those of the available studies that we have found. Even our RetinaNet model trained on the original 226 mammogram images in the training subset can outperform the majority of studies in terms of TPR@FPPI, with its value of [email protected]. However, the models built by Agarwal et al,34 Zebari et al,55 Muduli et al,57 Ayana et al,60 and Adedigba et al61 performed better than this baseline on the INbreast dataset and could correctly identify more patients. Although this TPR value is the same as the result of Agarwal et al,32 it was obtained with a considerably lower FPPI value. In addition, although the study by Agarwal et al34 achieved a TPR@FPPI of [email protected] for detecting malignant masses, its TPR@FPPI for benign masses was only 0.85@1.

The results in Table 1 also show that the experimental results can be further improved in terms of the metric of TPR@FPPI when the original training subset is augmented by our augmentation techniques. In addition, the results show that our improved RetinaNet model can achieve the best performance in terms of TPR@FPPI ([email protected]) among all the studies we have found, which demonstrates that this model can successfully avoid over-fitting and easily handle several of the challenging tasks of performing mass detection on the INbreast database, such as very small mass detection and multi-mass detection. Furthermore, thanks to improvements in the model’s generalization capabilities granted by our use of transfer learning techniques, it can be applied to different kinds of mass detections.

Discussion

This paper presented a deep learning-based approach that introduces RetinaNet and transfer learning to detect masses in mammograms. A series of experiments were conducted to compare the detection performance of the RetinaNet network and our improved RetinaNet on the INbreast dataset and its augmented version, respectively.

Our study’s innovations lie in the comprehensive optimization and improvement of the RetinaNet model for mammogram image analysis. By carefully adapting the model to the specific characteristics of the INbreast dataset, we have achieved a level of performance that surpasses previous studies. The selection of the ResNet50 backbone, based on the analysis of training loss curves and the nature of the dataset, was a crucial step in enhancing the model’s efficiency and accuracy. This approach challenges the traditional notion of simply relying on deeper network architectures and emphasizes the importance of customizing models to fit the data at hand.

The implementation of data augmentation techniques further distinguished our work. By expanding the training subset through horizontal and vertical flipping and various rotations, we provided the model with a more diverse set of samples. This not only improved its ability to detect small masses but also enhanced its overall generalization capabilities. The augmented dataset enabled the model to learn more robust and discriminative features, leading to better performance in terms of mAP and TPR@FPPI. Moreover, our proposed improvements to the RetinaNet model, such as the addition of the ReLU activation function to enhance the feature map M5 and the utilization of transfer learning with pre-trained weights from the GURO dataset, were innovative strategies. These modifications allowed the model to converge faster and achieve a perfect mAP of 1 and a TPR@FPPI of [email protected], successfully detecting all masses in the test subset. This level of performance is a significant milestone in the field, as it addresses the challenges of detecting small and multiple masses, which are often difficult to identify with traditional methods. In comparison with existing research on the INbreast dataset, our results stand out. We have demonstrated that our model outperforms previous studies in terms of both accuracy and detection capabilities. This not only validates the effectiveness of our proposed techniques but also sets a new benchmark for future research in this area.

Our research has the potential to impact the field of breast cancer screening in several ways. The improved detection accuracy can lead to earlier diagnosis, which is crucial for improving patient outcomes. By providing more reliable and detailed information to medical doctors, our model can assist in more informed clinical decision-making, potentially reducing the need for unnecessary biopsies and improving the overall efficiency of the screening process.

It is noteworthy that when these 67 unlabeled images were utilized as an independent test dataset, our proposed enhanced RetinaNet model indicated that there were no masses in the 67 unlabeled mammogram images in the INbreast database. This re
