This paper reviews the literature on the automated segmentation and/or classification of breast abnormalities from sequential mammograms, using feature-based ML and DL techniques. The first part is devoted to the importance of including prior views in the interpretation of mammography, with studies that compare the performance of radiologists with and without the use of prior mammographic images. Next, image registration techniques, which are of critical importance in the comparison of sequential mammograms, are summarized. The following section is devoted to temporal analysis of sequential mammograms for the diagnosis of breast MCs and masses. Subsequently, the detection and classification of breast abnormalities using the subtraction of temporally sequential mammograms is described. Finally, a description of the open access mammography datasets is provided. This review concludes with an overall discussion.
2. Review Methodology
The bibliographic literature was thoroughly searched to identify all the relevant studies. This search was limited to articles published between 2000 and 2022, written in English. Articles that included sequential data from screening methods other than mammography were excluded. Articles were also excluded if any important information regarding the algorithm’s performance was missing, making the study nonreproducible. Several review articles related to the diagnosis of breast cancer using mammograms appear in the literature [2,6,7,20,21,22,23]. However, unlike this review, none of these articles is devoted to the analysis of sequential mammograms for the detection and classification of breast cancer.
After the selection process was completed, the articles were split into two major groups, based on the approach that was used to exploit the sequential information: (a) temporal analysis, which uses both the current and prior images to extract relevant features and, then, combines them, and (b) temporal subtraction, where the prior image is subtracted from the recent one before further analysis. Subsequently, the articles in each group were further divided into two subcategories based on the breast abnormality under investigation: (a) MCs or (b) masses. Finally, each subcategory was further divided according to the classification approach: (a) not employing ML, (b) feature-based ML, or (c) DL.
Overall, there is no straightforward way to directly compare all the studies or to definitively conclude which is the most successful algorithm. The main reason is that each study uses different datasets, processing techniques, validation methods, evaluation metrics, etc. Thus, all their methodological differences must be considered when comparing results. In this review, the area under the receiver operating characteristic curve (AUC) is often used as a metric of performance to compare the various algorithms. The AUC shows the efficacy of the classification model in separating the classes; thus, the higher the AUC, the more successful the model.
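As a rough illustration of what the AUC measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation, with ties counted as one half). The function name `roc_auc` and the labels and scores are hypothetical values chosen for illustration; they are not taken from any of the reviewed studies.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical malignancy scores (1 = malignant, 0 = benign).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to scores that do not separate the classes at all, whereas 1.0 corresponds to perfect separation. The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch but not for large datasets.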
3. Importance of Prior Views
Comparison between recent and prior mammographic views is a practice that has been employed by radiologists since the establishment of mammography as the standard screening procedure for breast cancer. During the visual inspection of images, the evolution of disease can be better assessed using sequential information, which makes any change easier to visualize. Comparisons between the images increase the effectiveness of the diagnosis and reduce the recall rates (20–50% of women recalled are found to have a malignancy [24]).
Gelig et al. evaluated the effect of the availability of prior mammograms on the performance of the radiologists during mammographic screening [25]. Three experienced radiologists assessed 150 sets of sequential mammograms twice: once without seeing the prior view (using only the most recent mammogram) and once using both the recent and prior mammographic views. The radiologists detected an average of 40 cancers with 87% specificity using only the most recent mammograms, as opposed to 37 cancers with 96% specificity when using both sequential mammograms. The increase in specificity was statistically significant, indicating that the addition of the prior views reduced the recall rate. Five years later, Varela et al. also verified the importance of including prior mammograms for the classification of benign and malignant breast masses [26]. In that study, five senior radiologists and one resident radiologist evaluated 198 sequential mammograms. The mammograms were evaluated twice: once without and once with the prior images. The use of prior views increased the classification performance from 0.76 to 0.80 AUC, which was statistically significant.
Hadjiiski et al. compared the performance of eight accredited radiologists and two breast imaging fellows, with and without the use of a so-called interval change CAD system [27]. The software used information from prior and recent mammograms to estimate a malignancy rating.
A total of 90 pairs of sequential mammograms were gathered, with 47 malignant and 43 benign masses. The introduction of the interval change analysis CAD algorithm increased the AUC from 0.83 to 0.87, suggesting that the analysis of prior mammograms could significantly improve the performance of the radiological assessment. Timp et al. compared single reading, with and without a CAD system, against independent double reading for the diagnosis of breast abnormalities on 198 cases of sequential mammograms (Figure 2) [24]. Six radiologists participated in the study and three reading scenarios were considered: single reading, single reading with CAD, and independent double readings. The CAD algorithms, which included temporal information, yielded a statistically significant improvement in diagnostic performance (0.83 vs. 0.81 AUC).
4. Registration
For the development of algorithms that can effectively compare sequential mammogram pairs, accurate matching between the prior and recent images, with image registration, is of critical importance. Image registration can be defined as the process of aligning two images, where one image is the reference and remains fixed and the other is the registered or moving image. The main objective is to find the optimal transformation that aligns the points of interest in the moving image to better match the fixed image. However, registration cannot be easily applied to mammograms due to the significant variations of the breast tissue between screenings, variations in breast compression, and operating factors at the time of imaging [28]. Several algorithms have been developed to address the challenges of image registration, with some approaches specifically formulated for medical images [29,30,31] and mammograms [32].
Overall, registration algorithms can be divided into “global” or “local” based on the extent of the image information used. An algorithm is classified as global if all the pixels present in an image are used.
Rigid and affine transformations (translation, rotation, shearing) are considered global registration techniques, since all pixels undergo the same transformation [29]. On the other hand, an algorithm is classified as local if only some of the pixels included in a region of interest (ROI) are used at a time. Local methods, also known as deformable methods, operate on local similarities and positions and include B-spline free-form deformations [33], polyrigid transformation [34] and the Demons algorithm [35]. Registration techniques also vary with regard to the features used. Techniques based on pixel intensity are called “intensity-based”, whereas those based on the geometrical structures of the images are known as “feature-based”. Usually, intensity-based methods are global, and feature-based methods are local. Although these methods are often applied independently, combining two or more approaches can improve the performance in terms of accuracy and robustness [31]. The combination of global and local registration algorithms, for example, can recover the main (global) scale differences while also accounting for the localized nonlinear deformations (local) [32].
Various image registration techniques have been specifically applied to mammograms (Table 1). van Engeland et al. compared four different methodologies for the registration of temporally sequential mammograms. Overall, the use of mutual information provided the best performance for global mammogram registration [36]. Vujovic and Brzakovic, as well as Marti et al., developed local registration algorithms that identified and used control points or common structures between prior and recent images, in order to establish a correspondence between those points [37,38]. Sanjay-Gopal et al., Hadjiiski et al., and Filev et al. designed computerized methods for interval change analysis, using a regional registration technique to identify corresponding lesions on temporal pairs of mammograms [19,39,40].
In a relatively recent study, Ma et al. introduced a method that incorporates fuzzy sets, based on spatial relationships, along with graph matching [41]. Hybrid registration techniques for mammogram matching have also been proposed by Wirth et al., Timp and Karssemeijer, and Li et al. [42,43,44]. A temporal mammogram registration methodology, based on curvilinear coordinates, was proposed by Abdel-Nasser et al. (Figure 3). This method combined global and local deformations in the breast area in order to improve the registration performance [45]. Recently, Sharma et al. proposed a technique for the segmentation of breast regions using a combination of data-driven clustering and deformable image registration. This approach combines traditional segmentation approaches with ML techniques and clustering, for improved registration results [46]. Furthermore, Mendel et al. exploited B-splines and multi-resolution registration to evaluate architecture changes for cancer risk assessment [47].
6. Temporal Subtraction
To address some of the limitations of temporal analysis, temporal subtraction was developed by Loizidou et al. for the detection and classification of breast abnormalities [57]. The key difference between temporal analysis and temporal subtraction is that the latter exploits the entire prior image by subtracting its registered version from the entire recent image. Direct subtraction of the mammograms effectively removes the regions that have remained unchanged between screenings and enhances the contrast of new changes.
Temporal subtraction was first applied for the detection and classification of breast MCs [58]. For that purpose, 100 pairs of digital mammograms were collected with precise annotation of each individual MC (benign and suspicious), as assessed by two expert radiologists (Figure 8). That dataset is now available online with open access [59].
Pre-processing, registration, subtraction, and segmentation effectively detected all ROIs that could be MCs. Machine learning was then used to reject falsely detected regions using several shape, texture, and intensity features extracted from all the ROIs. Subsequently, the correctly detected MCs were classified as BI-RADS benign or suspicious, using leave-one-patient-out cross-validation. The classification accuracy increased by approximately 7.6 percentage points (90.3% vs. 82.7%) when using temporal subtraction, as compared to using temporal analysis on the same dataset (Table 6).
Temporal subtraction was also applied to the detection and classification of breast masses. A new dataset was collected by Loizidou et al., consisting of 80 pairs of digital temporally sequential mammograms [60]. This dataset is also available online with open access [61]. The algorithm consists of three steps: (a) detection of the masses, which includes pre-processing, image registration, subtraction, and segmentation (Figure 9); (b) FP elimination, where falsely detected ROIs are rejected using feature extraction and ML; and (c) classification, where the detected breast masses are classified as benign or suspicious. The classifiers were trained using leave-one-patient-out cross-validation. The classification performance reached 98% accuracy when using temporal subtraction, as opposed to 92.7% when using temporal analysis on the same dataset (Table 7).
8. Discussion
This review summarizes the recent advances in the automated detection and/or classification of breast abnormalities using temporally sequential mammograms. Unfortunately, comparing all the existing techniques is very challenging. Although the main steps of these algorithms are similar, there are several possible approaches to implement each step and, further, to analyze the images. Another important factor that makes the comparison between the studies difficult is the datasets used.
Open-access mammographic databases generally do not include sequential mammograms; thus, each research group has had to collect sequential data independently. Unfortunately, only two such sequential datasets are available with open access [59,61]. Furthermore, even if the same dataset and classifier are exploited, other parameters can significantly vary. One such example is the validation method, which can significantly affect the outcome. The majority of the studies used k-fold cross-validation, with the value of k varying depending on the number of subjects. However, the construction of the training and test sets is crucial. Introducing information from the same patient in both sets, by performing cross-validation per image, or per ROI, instead of per patient, artificially increases the performance of the algorithm. All the ROIs or images of the same patient should be included either in the training or in the validation set, to avoid such bias. Unfortunately, that practice is not always adhered to, resulting in approaches that fail in real-world applications.
Despite its limitations, temporal analysis clearly offers an advantage in the detection and/or classification of breast masses and MCs, with a significant increase in the performance (0.77 vs. 0.90 AUC for the classification of masses [55] or 0.87 vs. 0.81 AUC for the classification of MCs [56]). However, temporal analysis offers no benefit when a newly developed abnormality appears, with few traces in the prior mammogram. To address some of the limitations, subtraction of temporally sequential mammograms exploits the whole prior screening, by subtracting the registered version of the prior images from recent ones. With direct subtraction of the mammogram pairs, ROIs that remained unchanged between screenings are effectively removed, which improves the detection and classification performance (90.3% accuracy and 0.87 AUC for the classification of MCs [58] or 98% accuracy and 0.98 AUC for the classification of masses [60]).
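The per-patient validation recommended in this discussion can be sketched as follows. This is a minimal illustration, not the procedure of any specific reviewed study: the helper `leave_one_patient_out` and the patient identifiers are hypothetical, and the sketch only shows how ROIs are grouped so that no patient contributes samples to both the training and the test set.

```python
from collections import defaultdict


def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_idx, test_idx), one split per patient.

    All samples (images or ROIs) of the held-out patient go to the test
    set; every other sample goes to the training set.
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    for pid, test_idx in by_patient.items():
        train_idx = [i for i, p in enumerate(patient_ids) if p != pid]
        yield pid, train_idx, test_idx


# Hypothetical ROI list: entry i names the patient that ROI i came from.
roi_patients = ["P1", "P1", "P2", "P3", "P3", "P3"]
splits = list(leave_one_patient_out(roi_patients))

for pid, train_idx, test_idx in splits:
    # Sanity check: the held-out patient never appears in the training set.
    train_patients = {roi_patients[i] for i in train_idx}
    assert pid not in train_patients
    print(pid, "train:", train_idx, "test:", test_idx)
```

By contrast, a split made per ROI could place two ROIs of patient P3 on opposite sides of the split, which is the source of the optimistic bias described above. The same grouping idea extends to k-fold cross-validation by assigning whole patients, rather than individual ROIs, to folds.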