Training within the current study, using remote e-learning and automated formative assessment, improved overall diagnostic accuracy (concordance with the true outcome, reader agreement with the true outcome (kappa), and DOR) and specificity compared to previous FAST MRI training using in-person versions of the same standardised training programme and assessment dataset delivered as one-to-one [13] or small group [11] training. There was, however, lower sensitivity at cancer detection.
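As a reminder of how the accuracy measures named above relate to one another, the following minimal sketch derives them from a single 2×2 table of reader calls against the true outcome. It uses the standard textbook definitions; the counts are hypothetical and are not taken from the study, and the function name `diagnostic_metrics` is ours. The study's own analysis (for example, how reads were pooled across readers) may differ.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard accuracy measures from one 2x2 table (reader call vs. true outcome)."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)        # cancers correctly recalled
    specificity = tn / (tn + fp)        # non-cancers correctly not recalled
    concordance = (tp + tn) / n         # overall agreement with the true outcome
    dor = (tp * tn) / (fp * fn)         # diagnostic odds ratio; undefined if fp or fn is 0
    # Cohen's kappa: agreement with the true outcome, corrected for chance agreement
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (concordance - p_expected) / (1 - p_expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "concordance": concordance,
        "dor": dor,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only (not study data)
metrics = diagnostic_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, a reader can gain specificity and overall concordance while sensitivity stays flat or falls, which is the pattern described in this section.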
Learning curves of increasing overall accuracy (concordance with the true outcome) and of increasing specificity were observed during the formative assessment task but reader sensitivity did not significantly change, and this was observed for all categories of reader. Those who had attended previous FAST MRI interpretation-training reached peak overall accuracy and specificity at 75 scans read but for those new to FAST MRI interpretation, specificity continued to increase.
Maximising the overall diagnostic accuracy of a test is desirable but, for a given overall diagnostic accuracy, there is a trade-off between the test’s sensitivity and specificity. For a diagnostic imaging test, interpretation-training provides an opportunity to improve overall diagnostic accuracy and can also be used to influence the balance between reader sensitivity and specificity. The choice of which metric (sensitivity or specificity) is more important depends greatly on the pre-test probability of the population to be screened. For example, the survival benefit achieved by screening women with BRCA mutations (high pre-test probability) using fpMRI depends on fpMRI’s high sensitivity for aggressive breast cancers and necessitates the prioritisation of sensitivity over specificity for this relatively small population of women [19,20,21]. In contrast, the specificity of mammographic mass screening achieved through double reading in the NHS Breast Screening Programme (NHSBSP) is 96% [22], while reported reader sensitivity is much lower (67–78%) [23]. For population-risk women, who have a low pre-test probability, specificity is arguably the most important diagnostic accuracy parameter to optimise because small changes in specificity can have a large effect on the number of false positive recalls in a population screening programme, with each recall causing harm to the woman screened and also incurring a financial and workforce cost [24,25,26].
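The effect of small changes in specificity on false positive recalls can be illustrated with simple arithmetic. The sketch below is illustrative only: the cohort size and cancer prevalence are hypothetical assumptions, not figures from the study, and the function name `false_positive_recalls` is ours. The specificity values are those quoted in this discussion (96% for NHSBSP double-read mammography, 94% for FAST MRI in the current study, and 86.7% from one of the reported breast MRI trials).

```python
def false_positive_recalls(n_screened, prevalence, specificity):
    """Expected number of women without cancer who would be recalled."""
    n_without_cancer = n_screened * (1 - prevalence)
    return n_without_cancer * (1 - specificity)


N_SCREENED = 100_000   # hypothetical cohort size (assumption)
PREVALENCE = 0.008     # hypothetical: 8 cancers per 1,000 women screened (assumption)

for spec in (0.96, 0.94, 0.867):
    fp = false_positive_recalls(N_SCREENED, PREVALENCE, spec)
    print(f"specificity {spec:.1%}: about {fp:,.0f} false positive recalls")
```

Under these assumptions, a two percentage point fall in specificity (96% to 94%) adds roughly 2,000 false positive recalls per 100,000 women screened, because almost all women in a low-prevalence population do not have cancer.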
FAST MRI was designed as a screening test that would provide a higher sensitivity for aggressive breast cancers than mammography at a fraction of the cost of fpMRI, through shorter acquisition and reading times [4], with the intention that it could be used to screen a wider population than currently benefit from screening with fpMRI [19, 20]. Trials of breast MRI (scans single read by expert fpMRI readers) for women with dense breasts, but otherwise at population risk of breast cancer, have reported results with high sensitivity (95.7% [3] and 95.2% [1]) but lower specificity (86.7% [3] and 92.6% [1]). If FAST MRI is to be provided at scale to a large population of women with low pre-test probability, then both specificity-optimisation and expansion of the workforce of MRI readers will be required. The specificity achieved for FAST MRI by mammogram readers in the current study following 2 days of standardised training (94%) compares well with the results from both these reported MRI screening trials and approaches the specificity of mammography achieved with double reading within the NHSBSP for population screening (96%) [22].
In the current study, readers achieved, at single read, a sensitivity of 83% in a challenging dataset that included a high proportion of lobular carcinomas and of mammographically occult cancers and an invasive cancer size ≤ 25 mm (Additional file 2) [13]. Whilst this level of sensitivity could be considered insufficient to screen a population at very high risk of breast cancer, it could potentially be increased through double reading [27], and could be adequate to screen a larger population with lower pre-test probability, given the significant gains achieved in specificity and overall diagnostic accuracy.
Achievement of reporting benchmarks for fpMRI and literature comparison of diagnostic accuracy
Two days of standardised FAST MRI interpretation-training, undertaken as remote e-learning, enabled NHSBSP mammogram readers, both those experienced in fpMRI interpretation (Group 1) and novice MRI readers (Group 2), to achieve, at single read of an enriched dataset, benchmarks set for fpMRI interpretation in practice by the American College of Radiology’s Breast Imaging Reporting and Data System (BI-RADS) for both sensitivity (Groups 1 (84%) and 2 (82%) vs. > 80% BI-RADS benchmark [28]) and specificity (Groups 1 (94%) and 2 (93%) vs. > 85% BI-RADS benchmark [28]). Of 43 participants, the two-day remote e-learning programme was sufficient for 43/43 (100%) to achieve specificity above the 85% BI-RADS benchmark and for 33/43 (77%) to achieve sensitivity above the 80% BI-RADS benchmark.
Novice MRI readers (Group 2) achieved similar sensitivity to experienced fpMRI readers (Group 1) (p = 0.14) but lower specificity (p = 0.001), although specificity differed between the groups by only one percentage point (Group 1: 94%; Group 2: 93%).
The single reading performance at FAST MRI achieved by experienced (Group 1) and novice (Group 2) readers in the current study, reading an enriched dataset, compares well with published figures for diagnostic performance at fpMRI for radiologists experienced in breast MRI interpretation in community screening practice in the USA (Breast Cancer Surveillance Consortium (BCSC) [29]): sensitivity 84% (Group 1) and 82% (Group 2) vs. 81% (BCSC), and specificity 94% (Group 1) and 93% (Group 2) vs. 83% (BCSC).
Comparison between the performance of those who had previously attended in-person FAST MRI interpretation training and those who had not
Whilst the reader agreement with the true outcome (kappa) and the DOR did not differ significantly between the readers who had previously attended in-person FAST MRI interpretation-training (11/22 in Group 1 and 7/21 in Group 2) and those who had not, the sensitivity for cancer detection was higher and the specificity lower for the “attended” cohort than for the “not attended” cohort. Looking at individual performance during a previous study [11], of the 14 participants of the current study who had attended previous small group training, 8 had a sensitivity among the top 9 sensitivities of participants in that study and none were among the bottom 7 [11]. Additionally, 8 of these participants had a specificity among the bottom 12 in the previous study and 3 were among the top 11 [11]. Therefore, self-selection bias could have contributed to the significant within-group differences in sensitivity and specificity found for attendance vs. non-attendance at previous in-person training.
Literature comparison – the effect of batch size on diagnostic performance
The Co-Ops Study assessed the effect of reading practice, including batch size, on reader diagnostic performance in mammography within the NHSBSP and demonstrated increased specificity with increased batch size up to 40 mammograms per batch with the trend continuing in longer batches [30]. The current study, whilst it showed a trend for increasing specificity with batch size up to 50 FAST MRI scans per batch and decreasing sensitivity with increasing batch size, also demonstrated that concordance with the true outcome (as a measure of overall accuracy) tended to worsen when more than 50 scans were read within one batch. This accords with results from a study of 2,937,312 mammogram reads that demonstrated both small increases in specificity and small decreases in sensitivity for mammograms read at later positions within a batch. The authors of the study suggested that optimal batch-size for reading mammograms could be 60–70 reads per batch [31].
One possible explanation for the optimal batch size for FAST MRI (50 scans per batch) being smaller than that suggested for mammograms could be the difference in complexity between reading FAST MRI scans and mammograms. Reading FAST MRI scans in the current study could cause reader fatigue more quickly than reading mammograms because the FAST MRI reading format requires more images to be reviewed per scan than a mammogram does. However, the reading format of digital breast tomosynthesis (DBT) (2D plus stack of reconstructed slabs) has a similar complexity to that of FAST MRI (MIP plus stack of slices) and, although we could find no study that reported the effect of reading batch size on the diagnostic accuracy of DBT, evidence of increasing reader fatigue during the process of reading a batch of 40 DBT scans has been reported [32].
Literature comparison – reading times
The reading times achieved by readers in this study (56 and 78 s for Groups 1 and 2) were longer than times reported for NHSBSP mammogram readers to interpret mammograms (35 and 76 s [33, 34]) and about half that reported for NHSBSP mammogram readers to interpret DBT (2.81 min) [32]. However, evidence is emerging that various AI strategies may reduce reading times for DBT without affecting accuracy [35, 36]. In the future, similar approaches may prove valuable for FAST MRI.
Limitations of the current study
Readers who had previously attended FAST MRI interpretation-training had interpreted the same test set of 125 FAST MRI scans during the previous study. However, since they had not previously seen the ground truth (true outcome) of the scans in the test set at any time, and there was an average time interval of 24 months (range 17–30 months) between reading the test set in the two studies, it is unlikely that their diagnostic performance was affected by this.
The test set was read outside normal clinical practice and therefore reader performance is likely to have been subject to a laboratory effect [37].
Readers were free to self-select batch length when reading the test set assessment task. Therefore, our conclusions on optimal batch size could potentially have been confounded through self-selection bias. However, similar results were seen with the subset of readers who completed all 125 scans of the test set in a single batch (7 from Group 1 and 8 from Group 2) (Additional file 3), suggesting the effect of self-selection bias, although unquantifiable, is likely to be small.
Implications of the research
The results of the current study demonstrate that the inclusion of immediate feedback for each scan during test set interpretation in FAST MRI reader training optimised specificity and overall diagnostic accuracy whilst maintaining high levels of sensitivity, which would be suitable for a screened population with low pre-test probability.