Traditional versus modern approaches to screening mammography: a comparison of computer-assisted detection for synthetic 2D mammography versus an artificial intelligence algorithm for digital breast tomosynthesis

Study population/case selection

Screening mammographic examinations were collected under an IRB-approved case collection protocol with a waiver for written informed consent. Under this protocol, cases of various dispositions were acquired consecutively between January 2016 and August 2018 from five clinical sites, and a stratified random sample of 764 cases was drawn. Of the 764 cases, 106 were biopsy-proven cancers and 658 were cancer-negative cases (97 biopsy-proven benign findings, 81 recalled but not recommended for biopsy cases, and 480 cases assessed as negative). One-year follow-up data were not available for the cases.

Cases were included for female subjects of any race/ethnicity who underwent a screening mammography examination. Cases were excluded for subjects with symptomatic lesions, pacemakers in the mammography field of view, breast implants, motion during imaging, skin markers, cut-off anatomy, prior surgery and/or presence of biopsy clips visible on imaging, or missing standard views or reports.

CADe and AI systems

All cases were analyzed by CADe using ImageChecker v10.0 (Hologic, Inc.) and AI using Genius AI Detection v2.0 (Hologic, Inc.). CADe analysis used for-processing synthetic 2D images (70 µm resolution) generated by Hologic tomosynthesis systems following DBT acquisitions. The CADe algorithm offers one operating point for synthetic 2D images. Per image, the number of marks is limited to four for calcifications, two for masses, and two for masses/calcifications in the same location. Per case, the number of marks is limited to eight for calcifications, four for masses, and four for masses/calcifications in the same location. AI analysis used tomosynthesis slices with 1 mm slice thickness and 70 µm in-plane resolution. The AI algorithm offers a single operating point and implements a simple capping mechanism of five marks per image for each mark type (mass and calcification).
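As an illustration only, the following minimal sketch shows how per-image and per-case mark caps of the kind described above could be enforced; it is not the vendor implementation, and the mark fields (image, type, score) are assumed for the example.

```python
# Hypothetical illustration of per-image and per-case mark caps similar to
# those described for the 2D CADe algorithm; not the vendor implementation.
from collections import defaultdict

# Caps per image and per case for each mark type (limits quoted above).
PER_IMAGE_CAP = {"calc": 4, "mass": 2, "combined": 2}
PER_CASE_CAP = {"calc": 8, "mass": 4, "combined": 4}

def cap_marks(candidate_marks):
    """Keep the highest-scoring marks without exceeding the per-image and
    per-case caps. Each mark is a dict with 'image', 'type', and 'score'."""
    kept = []
    per_image = defaultdict(int)   # (image, type) -> count
    per_case = defaultdict(int)    # type -> count
    for mark in sorted(candidate_marks, key=lambda m: m["score"], reverse=True):
        img_key = (mark["image"], mark["type"])
        if (per_image[img_key] < PER_IMAGE_CAP[mark["type"]]
                and per_case[mark["type"]] < PER_CASE_CAP[mark["type"]]):
            kept.append(mark)
            per_image[img_key] += 1
            per_case[mark["type"]] += 1
    return kept
```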

Outcomes

Overall performance of 2D CADe and 3D AI was analyzed with the following metrics: area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and the rate of false marks. Both algorithms generated lesion marks with corresponding scores reflecting the likelihood of malignancy. For each algorithm, an overall case score was defined as the highest score among all lesion marks. AUC was determined using the overall case scores from both algorithms. Sensitivity was estimated as the percentage of cancer cases correctly marked by CADe or AI and was further divided into lesion-specific and examination-specific analyses. Lesion-specific sensitivity was estimated as the percentage of cancer cases with at least one lesion mark in the correct location by AI or CADe. Examination-specific sensitivity was the percentage of cancer cases with at least one lesion mark by AI or CADe, regardless of location accuracy. Specificity was estimated as the percentage of non-cancer cases with no lesion marks. The false-mark rate was defined as the number of false marks per study, calculated across all non-cancer cases.
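For clarity, the sketch below reproduces these case-level definitions in code. It is illustrative only; the field names (is_cancer, mark_scores, correctly_localized) are assumptions for the example, not the study's actual data structures.

```python
# Sketch of the case-level metrics defined above, under assumed data fields.
from sklearn.metrics import roc_auc_score

def case_score(mark_scores):
    """Overall case score = highest lesion-mark score; 0 if no marks."""
    return max(mark_scores, default=0.0)

def summarize(cases):
    """cases: list of dicts with 'is_cancer' (bool), 'mark_scores'
    (list of float), and 'correctly_localized' (bool, cancers only)."""
    labels = [c["is_cancer"] for c in cases]
    scores = [case_score(c["mark_scores"]) for c in cases]
    auc = roc_auc_score(labels, scores)

    cancers = [c for c in cases if c["is_cancer"]]
    non_cancers = [c for c in cases if not c["is_cancer"]]

    # Examination-specific sensitivity: any mark, regardless of location.
    exam_sens = sum(bool(c["mark_scores"]) for c in cancers) / len(cancers)
    # Lesion-specific sensitivity: at least one correctly located mark.
    lesion_sens = sum(c["correctly_localized"] for c in cancers) / len(cancers)
    # Specificity: non-cancer cases with no marks at all.
    spec = sum(not c["mark_scores"] for c in non_cancers) / len(non_cancers)
    # False marks per non-cancer study.
    fp_rate = sum(len(c["mark_scores"]) for c in non_cancers) / len(non_cancers)
    return auc, lesion_sens, exam_sens, spec, fp_rate
```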

Performance with respect to sensitivity and specificity was also analyzed by breast density, with cases categorized into two groups: non-dense (almost entirely fatty and scattered areas of fibroglandular density) and dense (heterogeneously dense and extremely dense). As in the overall analysis, sensitivity was further divided into lesion-specific and examination-specific analyses.
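The grouping can be expressed compactly as below; the mapping from the standard BI-RADS density letters (a-d) to the two analysis groups follows the descriptions above, and the 'density' field name is assumed for illustration.

```python
# Illustrative grouping of BI-RADS density categories into the two
# subgroups used in the density analysis above.
DENSITY_GROUP = {
    "a": "non-dense",  # almost entirely fatty
    "b": "non-dense",  # scattered areas of fibroglandular density
    "c": "dense",      # heterogeneously dense
    "d": "dense",      # extremely dense
}

def split_by_density(cases):
    """Partition cases so the sensitivity/specificity summaries can be
    re-run per density subgroup."""
    groups = {"non-dense": [], "dense": []}
    for case in cases:
        groups[DENSITY_GROUP[case["density"]]].append(case)
    return groups
```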

Ground truth determination

Determination of ground truth overlays on images for all 106 cancer cases was performed by an independent Mammography Quality Standards Act (MQSA)-qualified and board-certified radiologist (30 years of experience) using anonymized copies of clinical reports and the associated images available for review. The expert radiologist reviewed the screening and diagnostic imaging reports dictated by radiologists at the clinical sites. Images from screening, diagnostic, and post-biopsy studies were presented when available. Pathology reports of the biopsied lesions were reviewed to identify the lesions proven malignant by biopsy. These biopsied lesions were overlaid and tagged as malignant.

Generation of ground truth overlays was performed using a proprietary tool, which enables simultaneous display of four standard-view tomosynthesis images in Digital Imaging and Communications in Medicine (DICOM) format and facilitates drawing overlays on top of the images. The overlays were generated by the expert on the tomosynthesis slice where the lesion was in-focus and best visualized.
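Because the annotation tool itself is proprietary, the following hypothetical sketch only illustrates how an overlay drawn on the in-focus slice might be recorded and anchored to DICOM identifiers; the function, file layout, and field names are assumptions for the example.

```python
# Hypothetical sketch of recording a ground-truth overlay for a tomosynthesis
# slice; the actual annotation tool is proprietary. pydicom is used only to
# read the identifiers needed to anchor the overlay to the source image.
import json
import pydicom

def record_overlay(dicom_path, slice_index, polygon_points, label="malignant"):
    """Store an expert-drawn overlay (list of (row, col) points) together
    with identifiers of the in-focus slice on which it was drawn."""
    ds = pydicom.dcmread(dicom_path, stop_before_pixels=True)
    overlay = {
        "study_instance_uid": str(ds.StudyInstanceUID),
        "sop_instance_uid": str(ds.SOPInstanceUID),
        "slice_index": slice_index,   # in-focus slice chosen by the expert
        "points": polygon_points,     # overlay outline in pixel coordinates
        "label": label,               # biopsy-proven malignant lesion
    }
    with open(dicom_path + ".overlay.json", "w") as f:
        json.dump(overlay, f, indent=2)
    return overlay
```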

Statistical analysis

The AUCs with 95% CIs for 2D CADe and 3D AI were calculated with a tool that utilizes the Scikit-learn library in Python. The VassarStats p-value calculator for two independent receiver operating characteristic (ROC) curves was used to compare the AUCs between 2D CADe and 3D AI. Overall sensitivity (lesion-specific and examination-specific) and specificity for 2D CADe and 3D AI were analyzed with a Wilson score interval for the 95% CI and a two-tailed Z-test to compare sensitivity and specificity between the two algorithms; these calculations were performed using Statistics Kingdom calculators. False-mark rates for 2D CADe and 3D AI were compared using a paired t-test in Excel. For all statistical analyses, P < 0.05 was considered statistically significant.
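The same computations can be reproduced with open-source libraries, as sketched below. The study used VassarStats, Statistics Kingdom, and Excel for these steps, so this is an equivalent reconstruction rather than the original analysis code; the AUC comparison for two independent ROC curves is omitted because it was performed with the VassarStats calculator.

```python
# Open-source reconstruction of the statistical tests described above.
from scipy import stats
from sklearn.metrics import roc_auc_score
from statsmodels.stats.proportion import proportion_confint, proportions_ztest

def auc(labels, case_scores):
    """AUC from overall case scores, using the scikit-learn implementation."""
    return roc_auc_score(labels, case_scores)

def wilson_ci(successes, total, alpha=0.05):
    """95% Wilson score interval for a proportion (sensitivity/specificity)."""
    return proportion_confint(successes, total, alpha=alpha, method="wilson")

def compare_proportions(success_a, n_a, success_b, n_b):
    """Two-tailed z-test comparing, e.g., sensitivity of 2D CADe vs. 3D AI."""
    stat, p = proportions_ztest([success_a, success_b], [n_a, n_b])
    return stat, p

def compare_false_marks(fp_per_case_cade, fp_per_case_ai):
    """Paired t-test on per-case false-mark counts for the two algorithms."""
    return stats.ttest_rel(fp_per_case_cade, fp_per_case_ai)
```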
