Inter-reader agreement of the BI-RADS CEM lexicon

The CEM BI-RADS lexicon provides standardized, formal definitions for describing breast lesions and facilitates communication between breast specialists [9, 10]. Its clinical utility depends on the ability of individual readers to agree on interpretations, which ensures consistency in clinical recommendations. In our experience, inter-reader agreement on the CEM lexicon was moderate to substantial for most features, with some exceptions for RC descriptors and BI-RADS assessment, which did not affect diagnostic performance.

Focusing on RC images only, the strongest agreement was found for the assessment of the type of enhancement, particularly for the identification of masses (κ = 0.73). Inter-reader agreement was moderate for non-mass enhancement and only fair for enhancing asymmetry. To the best of our knowledge, this is the first study to assess inter-reader agreement for the CEM BI-RADS lexicon. Therefore, our results on post-contrast features can be compared only with previous studies that focused on MRI, particularly the study by Grimm et al [13]. The agreement on the type of enhancement obtained in their MRI study was lower than that obtained in our CEM study; this finding could be related to the readers' greater familiarity with these CEM descriptors, as they are similar to those used in breast MRI. Considering the descriptors separately, however, the trend reverses, with the agreement for each individual descriptor being slightly higher in the MRI study by Grimm et al [13]. In our analysis, the highest agreement was found for mass enhancement, namely moderate for shape and margin and substantial for internal enhancement pattern. These levels of agreement are slightly lower than those found for MRI, which were substantial for all three mass descriptors [13]. The agreement on CEM non-mass descriptors was lower overall, especially for non-mass distribution. By contrast, the study by Grimm et al [13] found substantial agreement on MRI, even though the descriptors used in CEM evaluation were the same as those used in MRI. As is already known for MRI, non-mass lesions are more complex to detect and characterize [14, 15]; this seems to hold for CEM as well and is reflected in the lower inter-reader agreement, with the difficulty compounded by the lack of 3D data.

A comparison between CEM and MRI was made by Knogler et al [16], who, before the release of the CEM BI-RADS lexicon, used the MRI BI-RADS descriptors to evaluate CEM images. They demonstrated that enhancement characteristics were similar in the malignant cases, a finding confirmed in our study by the analysis that considered benign and malignant lesions separately.

The lowest agreement was found for the definition of enhancing asymmetry. This result might be explained by the fact that the descriptor is newly introduced, has no counterpart in previous imaging modalities, and is still unfamiliar to readers. Another aspect that might account for the low agreement on the descriptors of non-mass enhancement and enhancing asymmetry is the difference between two similar terms that belong to different breast imaging lexicons, namely asymmetry in the BI-RADS MG lexicon [10] and enhancing asymmetry in the BI-RADS CEM lexicon [9]. In the BI-RADS MG lexicon, the term asymmetry refers to an area of fibroglandular-density tissue, not conforming to the definition of a radiodense mass, that is visible on only one mammographic projection. In addition to asymmetry, however, the same lexicon contains three other types of asymmetry, namely global asymmetry, focal asymmetry, and developing asymmetry, which are visible on more than one projection. This spectrum of asymmetries is not present in the CEM BI-RADS lexicon, which may have led to confusion in the interpretation of the findings. For example, an enhancement that cannot be defined as a mass and is visible on more than one RC projection should not be defined as an “enhancing asymmetry” but as a “non-mass enhancement with focal distribution”. The low agreement on the non-mass enhancement and enhancing asymmetry descriptors suggests that they need further evaluation.

In 2022, Nicosia et al [17] created a predictive score for the malignancy of a breast lesion based on the main contrast enhancement features on CEM (intensity, pattern, margin, and ground glass). An experienced breast imaging radiologist was asked to evaluate 377 lesions and assign a score of 0 or 1 to each descriptor, depending on whether the enhancement characteristic was predictive of benignity or malignancy. An overall enhancement score ranging from 0 to 4 was then obtained, and the histological results were considered the gold standard in evaluating the relationship between enhancement patterns and malignancy. Although the study was conducted before the publication of the BI-RADS CEM lexicon, some of the descriptors they used are the same as those found in the lexicon (e.g., regular or irregular margin morphology, homogeneous or heterogeneous enhancement pattern). The study by Nicosia et al [17] showed that some features of mass and non-mass enhancement on CEM are important predictors for differentiating benign from malignant lesions. In our study, inter-reader agreement for major malignancy predictors such as “irregular and spiculated mass margins” and “heterogeneous enhancement pattern” was notably high, particularly among malignant lesions. Similarly, benign features such as “circumscribed mass margins” and “homogeneous mass internal enhancement pattern” showed moderate inter-reader agreement, especially among benign lesions. These findings suggest that readers are familiar with these descriptors, particularly in the characterization of malignancy.

The agreement for the assessment of breast density on LE images was moderate (κ = 0.569), consistent with earlier studies that carried out the same analysis [18, 19, 20]. Specifically, in our study, the highest agreement was recorded for the category “scattered areas of fibroglandular density” (κ = 0.601), which did not show high agreement in the study by Ciatto et al [19] on DM (κ = 0.25). Increased use of BI-RADS, and the consequent increase in readers’ confidence in the classification, might explain this difference. The overall agreement for the description of the type of lesion on LE images was substantial (κ = 0.654). Similar results were reported by Berg et al [18] and Lee et al [21]; in particular, in our study, there was almost perfect agreement in the identification of microcalcifications (κ = 0.820), the same value reported by Lee et al [21]. For the evaluation of associated architectural distortion, the inter-reader agreement in our study was moderate (κ = 0.496), significantly higher than that reported by Lee et al (fair, κ = 0.28) [21]. In this instance, together with readers’ experience, improvements in MG image quality in recent years might have facilitated the detection of these types of findings. Overall, compared with previous studies conducted on DM, the interobserver agreement on LE image features was equal if not greater, particularly for associated architectural distortion.

The agreement for the final BI-RADS assessment reported in our analysis (LE BI-RADS κ = 0.421, CEM BI-RADS κ = 0.364) might seem low, but it was not dissimilar from that reported by previous studies on MG [18, 22]. The diagnostic performance of individual readers was good, especially for sensitivity, and similar to that reported in the literature [4, 5, 7, 23]. This confirms that the disagreement reflects variation in the assessment threshold of individual readers between categories with similar outcomes (biopsy for BI-RADS 4–5; no biopsy for BI-RADS 1–3). The lower agreement for the combined pre- and post-contrast evaluation, compared with that for LE alone, could be related not only to the readers’ greater familiarity with scoring non-contrast images but also to the additional variability introduced by the findings on RC images.
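The qualitative labels attached to the κ values in this discussion (fair, moderate, substantial, almost perfect) match the widely used Landis and Koch bands. The text does not state which scale was applied, so the following is only a minimal sketch under that assumption; the function name `agreement_label` is hypothetical and the κ values are those quoted above.

```python
# Minimal sketch: map a kappa value to a qualitative agreement label.
# ASSUMPTION: the Landis and Koch (1977) bands are used; the text does not
# name its scale, but its labels are consistent with these cut-offs.
# `agreement_label` is a hypothetical helper, not part of the study.

def agreement_label(kappa: float) -> str:
    """Return the Landis and Koch label for a kappa in [-1, 1]."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values quoted in the discussion above.
reported = {
    "breast density (LE)": 0.569,
    "lesion type (LE)": 0.654,
    "microcalcifications (LE)": 0.820,
    "architectural distortion (LE)": 0.496,
    "LE BI-RADS assessment": 0.421,
    "CEM BI-RADS assessment": 0.364,
}

for feature, kappa in reported.items():
    print(f"{feature}: kappa = {kappa:.3f} ({agreement_label(kappa)})")
```

Under these cut-offs, the LE BI-RADS assessment falls in the moderate band and the CEM BI-RADS assessment in the fair band, in line with the labels used in the text.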

Our study has some limitations. It was retrospective and conducted in a single center, although the three readers were from different institutions and had been trained in different centers; this may have reduced their agreement, even though they were all familiar with the BI-RADS lexicon. The three readers had similar levels of experience in CEM, so it was not possible to examine the potential relevance of different levels of experience, even in a two-reader analysis. Although the readers were experienced in interpreting CEM images, the results may have been influenced by a lack of familiarity with the terminology of the CEM BI-RADS lexicon, as it is the first edition and was released relatively recently. It is likely that inter-reader agreement could be improved by repeating the study after longer exposure to, and regular use of, the new lexicon. Finally, the images were acquired on a single-vendor scanner and with only one type of contrast medium; agreement might differ between vendors.

In conclusion, our study showed moderate to substantial inter-reader agreement for most lesion features on both LE and RC images. Lower inter-reader agreement was found for the descriptors of non-mass enhancement and the newly introduced enhancing asymmetry. This might be related to the readers’ lack of familiarity with the new descriptor. Inter-reader agreement for LE and CEM BI-RADS assessment was moderate and fair, respectively. In terms of diagnostic performance, the disagreement on BI-RADS assessment did not translate into significant variability in breast cancer diagnosis.
