Gradient-Based Saliency Maps Are Not Trustworthy Visual Explanations of Automated AI Musculoskeletal Diagnoses

P. Rajpurkar and M. P. Lungren, “The Current and Future State of AI Interpretation of Medical Images,” N. Engl. J. Med., vol. 388, no. 21, pp. 1981–1990, May 2023.

A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, “A survey of the recent architectures of deep convolutional neural networks,” Artif. Intell. Rev., vol. 53, no. 8, pp. 5455–5516, Dec. 2020.

P. Rajpurkar et al., “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning,” arXiv [cs.CV], 14-Nov-2017.

P. Rajpurkar et al., “Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists,” PLoS Med., vol. 15, no. 11, p. e1002686, Nov. 2018.

R. Ranjbarzadeh, A. Bagherian Kasgari, S. Jafarzadeh Ghoushchi, S. Anari, M. Naseri, and M. Bendechache, “Brain tumor segmentation based on deep learning and an attention mechanism using MRI multi-modalities brain images,” Sci. Rep., vol. 11, no. 1, p. 10930, May 2021.

L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal, “Explaining Explanations: An Overview of Interpretability of Machine Learning,” arXiv [cs.AI], 31-May-2018.

J. R. Zech, M. A. Badgeley, M. Liu, A. B. Costa, J. J. Titano, and E. K. Oermann, “Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study,” PLoS Med., vol. 15, no. 11, p. e1002683, Nov. 2018.

J. Teneggi, P. H. Yi, and J. Sulam, “Examination-level Supervision for Deep Learning–based Intracranial Hemorrhage Detection at Head CT,” Radiol Artif Intell, p. e230159, Dec. 2023.

N. Bien et al., “Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet,” PLoS Med., vol. 15, no. 11, p. e1002699, Nov. 2018.

A. Mitani et al., “Detection of anaemia from retinal fundus images via deep learning,” Nat Biomed Eng, vol. 4, no. 1, pp. 18–27, Jan. 2020.

Z. Kang, E. Xiao, Z. Li, and L. Wang, “Deep Learning Based on ResNet-18 for Classification of Prostate Imaging-Reporting and Data System Category 3 Lesions,” Acad. Radiol., Jan. 2024.

L. Alzubaidi et al., “Trustworthy deep learning framework for the detection of abnormalities in X-ray shoulder images,” PLoS One, vol. 19, no. 3, p. e0299545, Mar. 2024.

J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim, “Sanity Checks for Saliency Maps,” arXiv [cs.CV], 08-Oct-2018.

J. Zhang, H. Chao, G. Dasegowda, G. Wang, M. K. Kalra, and P. Yan, “Revisiting the Trustworthiness of Saliency Methods in Radiology AI,” Radiol Artif Intell, vol. 6, no. 1, p. e220221, Jan. 2024.

N. Arun et al., “Assessing the Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging,” Radiol Artif Intell, vol. 3, no. 6, p. e200267, Nov. 2021.

A. Saporta et al., “Benchmarking saliency methods for chest X-ray interpretation,” Nature Machine Intelligence, vol. 4, no. 10, pp. 867–878, Oct. 2022.

W. Jin, X. Li, and G. Hamarneh, “One Map Does Not Fit All: Evaluating Saliency Map Explanation on Multi-Modal Medical Images,” arXiv [cs.CV], 11-Jul-2021.

P. Rajpurkar et al., “MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs,” arXiv [physics.med-ph], 11-Dec-2017.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” arXiv [cs.CV], 02-Dec-2015.

G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” arXiv [cs.CV], 25-Aug-2016.

S. S. Halabi et al., “The RSNA Pediatric Bone Age Machine Learning Challenge,” Radiology, vol. 290, no. 2, pp. 498–503, Feb. 2019.

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization,” arXiv [cs.CV], 07-Oct-2016.

K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps,” arXiv [cs.CV], 20-Dec-2013.

M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, Sydney, NSW, Australia, 2017, pp. 3319–3328.

D. Smilkov, N. Thorat, B. Kim, F. Viégas, and M. Wattenberg, “SmoothGrad: removing noise by adding noise,” arXiv [cs.LG], 12-Jun-2017.

A. Kapishnikov, T. Bolukbasi, F. Viégas, and M. Terry, “XRAI: Better Attributions Through Regions,” arXiv [cs.CV], 06-Jun-2019.

J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for Simplicity: The All Convolutional Net,” arXiv [cs.LG], 21-Dec-2014.

R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra, “Grad-CAM: Why did you say that?,” arXiv [stat.ML], 22-Nov-2016.

S. Hooker, D. Erhan, P.-J. Kindermans, and B. Kim, “A benchmark for interpretability methods in deep neural networks,” in Proceedings of the 33rd International Conference on Neural Information Processing Systems, Red Hook, NY, USA: Curran Associates Inc., 2019, pp. 9737–9748.

J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, no. 6, pp. 679–698, Jun. 1986.

M. He, X. Wang, and Y. Zhao, “A calibrated deep learning ensemble for abnormality detection in musculoskeletal radiographs,” Sci. Rep., vol. 11, no. 1, p. 9097, Apr. 2021.

J. Irvin et al., “CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison,” arXiv [cs.CV], 21-Jan-2019.

L. Oakden-Rayner, J. Dunnmon, G. Carneiro, and C. Re, “Hidden stratification causes clinically meaningful failures in machine learning for medical imaging,” in Proceedings of the ACM Conference on Health, Inference, and Learning, Toronto, Ontario, Canada, 2020, pp. 151–159.

G. Yona and D. Greenfeld, “Revisiting Sanity Checks for Saliency Maps,” arXiv [cs.LG], 27-Oct-2021.

S. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” arXiv [cs.AI], 22-May-2017.

J. Teneggi, A. Luster, and J. Sulam, “Fast Hierarchical Games for Image Explanations,” arXiv [cs.CV], 13-Apr-2021.

J. Teneggi, B. Bharti, Y. Romano, and J. Sulam, “SHAP-XRT: The Shapley Value Meets Conditional Independence Testing,” Transactions on Machine Learning Research, 11-Jul-2023.

Z. Liu, E. Adeli, K. M. Pohl, and Q. Zhao, “Going Beyond Saliency Maps: Training Deep Models to Interpret Deep Models,” Inf. Process. Med. Imaging, vol. 12729, pp. 71–82, Jun. 2021.
