ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives

Elsevier

Available online 27 April 2024

Diagnostic and Interventional Imaging

Highlights

A systematic review revealed that 84.1% (37 of 44) of radiology studies demonstrated ChatGPT's effectiveness, and none recommended unsupervised use in clinical practice.

Key benefits of ChatGPT include its valuable assistance for radiologists' decision-making, effectiveness in organizing and simplifying radiology reports, and improved patient outcomes.

ChatGPT has the potential to revolutionize radiology; however, it is too soon to confirm its complete proficiency and accuracy, and it necessitates oversight and preparation before implementation into clinical practice.

Abstract

Purpose

The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.

Materials and methods

After a comprehensive review of PubMed, Web of Science, Embase, and Google Scholar databases, a cohort of published studies was identified up to January 1, 2024, utilizing ChatGPT for clinical radiology applications.

Results

Of the 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, while seven (7/44; 15.9%) indicated lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported ChatGPT's performance quantitatively as a proportion. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and five (5/24; 20.8%) studies reported a median agreement of 83.6% between ChatGPT outputs and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared the two most recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPT-4 outperformed ChatGPT-3.5, showing notable improvements in answering higher-order thinking questions, better comprehension of radiology terms, and greater accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks.

Conclusion

Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.

Keywords

Artificial intelligence

ChatGPT

Decision support systems

Large language model

OpenAI

Abbreviations

AI

Artificial intelligence

NOS

Newcastle-Ottawa Scale

IR

Interventional radiology

ACR

American College of Radiology

SIR

Society of Interventional Radiology

© 2024 The Author(s). Published by Elsevier Masson SAS on behalf of Société française de radiologie.
