Gradient-Based Saliency Maps Are Not Trustworthy Visual Explanations of Automated AI Musculoskeletal Diagnoses

P. Rajpurkar and M. P. Lungren, “The Current and Future State of AI Interpretation of Medical Images,” N. Engl. J. Med., vol. 388, no. 21, pp. 1981–1990, May 2023.

A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, “A survey of the recent architectures of deep convolutional neural networks,” Artif. Intell. Rev., vol. 53, no. 8, pp. 5455–5516, Dec. 2020.

P. Rajpurkar et al., “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning,” arXiv [cs.CV], 14-Nov-2017.

P. Rajpurkar et al., “Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists,” PLoS Med., vol. 15, no. 11, p. e1002686, Nov. 2018.

R. Ranjbarzadeh, A. Bagherian Kasgari, S. Jafarzadeh Ghoushchi, S. Anari, M. Naseri, and M. Bendechache, “Brain tumor segmentation based on deep learning and an attention mechanism using MRI multi-modalities brain images,” Sci. Rep., vol. 11, no. 1, p. 10930, May 2021.

L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal, “Explaining Explanations: An Overview of Interpretability of Machine Learning,” arXiv [cs.AI], 31-May-2018.

J. R. Zech, M. A. Badgeley, M. Liu, A. B. Costa, J. J. Titano, and E. K. Oermann, “Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study,” PLoS Med., vol. 15, no. 11, p. e1002683, Nov. 2018.

J. Teneggi, P. H. Yi, and J. Sulam, “Examination-level Supervision for Deep Learning–based Intracranial Hemorrhage Detection at Head CT,” Radiol. Artif. Intell., p. e230159, Dec. 2023.

N. Bien et al., “Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet,” PLoS Med., vol. 15, no. 11, p. e1002699, Nov. 2018.

A. Mitani et al., “Detection of anaemia from retinal fundus images via deep learning,” Nat. Biomed. Eng., vol. 4, no. 1, pp. 18–27, Jan. 2020.

Z. Kang, E. Xiao, Z. Li, and L. Wang, “Deep Learning Based on ResNet-18 for Classification of Prostate Imaging-Reporting and Data System Category 3 Lesions,” Acad. Radiol., Jan. 2024.

L. Alzubaidi et al., “Trustworthy deep learning framework for the detection of abnormalities in X-ray shoulder images,” PLoS One, vol. 19, no. 3, p. e0299545, Mar. 2024.

J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim, “Sanity Checks for Saliency Maps,” arXiv [cs.CV], 08-Oct-2018.

J. Zhang, H. Chao, G. Dasegowda, G. Wang, M. K. Kalra, and P. Yan, “Revisiting the Trustworthiness of Saliency Methods in Radiology AI,” Radiol. Artif. Intell., vol. 6, no. 1, p. e220221, Jan. 2024.

N. Arun et al., “Assessing the Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging,” Radiol. Artif. Intell., vol. 3, no. 6, p. e200267, Nov. 2021.

A. Saporta et al., “Benchmarking saliency methods for chest X-ray interpretation,” Nat. Mach. Intell., vol. 4, no. 10, pp. 867–878, Oct. 2022.

W. Jin, X. Li, and G. Hamarneh, “One Map Does Not Fit All: Evaluating Saliency Map Explanation on Multi-Modal Medical Images,” arXiv [cs.CV], 11-Jul-2021.

P. Rajpurkar et al., “MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs,” arXiv [physics.med-ph], 11-Dec-2017.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” arXiv [cs.CV], 02-Dec-2015.

G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” arXiv [cs.CV], 25-Aug-2016.

S. S. Halabi et al., “The RSNA Pediatric Bone Age Machine Learning Challenge,” Radiology, vol. 290, no. 2, pp. 498–503, Feb. 2019.

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization,” arXiv [cs.CV], 07-Oct-2016.

K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps,” arXiv [cs.CV], 20-Dec-2013.

M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, Sydney, NSW, Australia, 2017, pp. 3319–3328.

D. Smilkov, N. Thorat, B. Kim, F. Viégas, and M. Wattenberg, “SmoothGrad: removing noise by adding noise,” arXiv [cs.LG], 12-Jun-2017.

A. Kapishnikov, T. Bolukbasi, F. Viégas, and M. Terry, “XRAI: Better Attributions Through Regions,” arXiv [cs.CV], 06-Jun-2019.

J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for Simplicity: The All Convolutional Net,” arXiv [cs.LG], 21-Dec-2014.

R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra, “Grad-CAM: Why did you say that?,” arXiv [stat.ML], 22-Nov-2016.

S. Hooker, D. Erhan, P.-J. Kindermans, and B. Kim, “A benchmark for interpretability methods in deep neural networks,” in Proceedings of the 33rd International Conference on Neural Information Processing Systems, Red Hook, NY, USA: Curran Associates Inc., 2019, pp. 9737–9748.

J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.

M. He, X. Wang, and Y. Zhao, “A calibrated deep learning ensemble for abnormality detection in musculoskeletal radiographs,” Sci. Rep., vol. 11, no. 1, p. 9097, Apr. 2021.

J. Irvin et al., “CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison,” arXiv [cs.CV], 21-Jan-2019.

L. Oakden-Rayner, J. Dunnmon, G. Carneiro, and C. Ré, “Hidden stratification causes clinically meaningful failures in machine learning for medical imaging,” in Proceedings of the ACM Conference on Health, Inference, and Learning, Toronto, Ontario, Canada, 2020, pp. 151–159.

G. Yona and D. Greenfeld, “Revisiting Sanity Checks for Saliency Maps,” arXiv [cs.LG], 27-Oct-2021.

S. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” arXiv [cs.AI], 22-May-2017.

J. Teneggi, A. Luster, and J. Sulam, “Fast Hierarchical Games for Image Explanations,” arXiv [cs.CV], 13-Apr-2021.

J. Teneggi, B. Bharti, Y. Romano, and J. Sulam, “SHAP-XRT: The Shapley Value Meets Conditional Independence Testing,” Transactions on Machine Learning Research, 11-Jul-2023.

Z. Liu, E. Adeli, K. M. Pohl, and Q. Zhao, “Going Beyond Saliency Maps: Training Deep Models to Interpret Deep Models,” Inf. Process. Med. Imaging, vol. 12729, pp. 71–82, Jun. 2021.

JOURNAL OF DIGITAL IMAGING
