Imagine performing a gastroscopy on a 50-year-old woman with epigastric pain, assisted by an advanced artificial intelligence (AI) system that suggests autoimmune gastritis and recommends a biopsy. Given that the AI tool offers excellent diagnostic performance, with 90% sensitivity and 90% specificity, would you follow its recommendation?
In this issue of Endoscopy, Chen et al. tackle the challenge of diagnosing autoimmune gastritis, a relatively rare yet clinically significant condition that requires a different management approach from that used for common gastritis [1]. Autoimmune gastritis is a progressive disease associated with serious complications such as vitamin B12 deficiency anemia, neuropathy, and an increased risk of gastric neuroendocrine tumors. Despite these clinical consequences, autoimmune gastritis remains difficult to detect endoscopically because its appearance resembles that of common gastritis. Furthermore, limited awareness of the disease among endoscopists frequently hinders early detection [2].
To overcome these diagnostic obstacles, the authors developed an AI tool that differentiates autoimmune gastritis from other types of gastritis by analyzing endoscopic images. They trained a deep neural network on a dataset of approximately 20 000 images collected from six hospitals in China. The model achieved impressive diagnostic accuracy, with 90% sensitivity and 93% specificity in external validation. Its performance was further assessed in a rigorous benchmark evaluation on 90 white-light images, 30% of which showed autoimmune gastritis, underscoring the model’s high diagnostic performance.
If implemented successfully, this AI tool could substantially improve clinical practice, potentially reducing missed diagnoses and ensuring that patients with autoimmune gastritis receive timely, targeted treatment. The authors’ commitment to comprehensive data collection, their collaboration with computer scientists, and their rigorous testing of the software underline the importance of this study in the field of endoscopy.
AI applications in endoscopy are advancing rapidly, with a gradual shift in focus from common diseases to rare but clinically relevant conditions. The present study’s focus on autoimmune gastritis is the next logical step beyond more common targets such as colorectal polyps. However, applying AI to rare diseases presents unique challenges. For example, the low prevalence of autoimmune gastritis in real-world settings can limit the tool’s practical utility and may lead to unexpected consequences.
If the prevalence of autoimmune gastritis is approximately 1%, even a tool with 90% sensitivity and 90% specificity would have a positive predictive value (PPV) of only 8% (Table 1). In other words, of every 100 AI alerts, only about 8 would be true cases of autoimmune gastritis.
Table 1 A hypothetical case study: 1000 patients undergo gastroscopy with an artificial intelligence (AI) tool that identifies a rare disease (1% prevalence) with 90% sensitivity and 90% specificity. Its positive predictive value is only 8% (9/108).

| AI prediction | Gold standard: positive | Gold standard: negative |
|---|---|---|
| Positive | 9 | 99 |
| Negative | 1 | 891 |
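The arithmetic behind Table 1 can be reproduced in a few lines. The following is a minimal Python sketch; the names `expected_counts` and `ExpectedCounts` are illustrative and do not come from the study. It uses exact fractions so that the expected counts come out as whole numbers. As an additional illustration, it contrasts the PPV at the benchmark’s 30% prevalence with a real-world prevalence of about 1%, under the hypothetical assumption that the 90% sensitivity and 93% specificity reported in external validation would hold at both prevalences.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ExpectedCounts:
    """Expected 2x2 counts when a diagnostic test is applied to a cohort."""
    true_pos: Fraction   # AI positive, disease present
    false_pos: Fraction  # AI positive, disease absent
    false_neg: Fraction  # AI negative, disease present
    true_neg: Fraction   # AI negative, disease absent

    @property
    def ppv(self) -> Fraction:
        # Positive predictive value: share of positive AI alerts that are true cases.
        return self.true_pos / (self.true_pos + self.false_pos)


def expected_counts(n: int, prevalence: Fraction, sensitivity: Fraction,
                    specificity: Fraction) -> ExpectedCounts:
    """Expected counts for n patients, given prevalence, sensitivity and specificity."""
    diseased = n * prevalence
    healthy = n - diseased
    return ExpectedCounts(
        true_pos=diseased * sensitivity,
        false_pos=healthy * (1 - specificity),
        false_neg=diseased * (1 - sensitivity),
        true_neg=healthy * specificity,
    )


if __name__ == "__main__":
    # Table 1: 1000 patients, 1% prevalence, 90% sensitivity, 90% specificity.
    t1 = expected_counts(1000, Fraction(1, 100), Fraction(9, 10), Fraction(9, 10))
    print(t1.true_pos, t1.false_pos, t1.false_neg, t1.true_neg)  # 9 99 1 891
    print(t1.ppv, f"{float(t1.ppv):.1%}")  # 1/12 8.3%

    # Hypothetical: reported external-validation figures at two prevalences.
    for prev in (Fraction(1, 100), Fraction(30, 100)):
        c = expected_counts(1000, prev, Fraction(90, 100), Fraction(93, 100))
        print(f"prevalence {float(prev):.0%}: PPV {float(c.ppv):.1%}")
    # prevalence 1%: PPV 11.5%
    # prevalence 30%: PPV 84.6%
```

Under that assumption, the same tool that appears highly reliable on an enriched benchmark set would see most of its alerts turn out to be false positives in routine practice.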
Returning to our initial example: you now know that the PPV of the AI tool is only 8%, yet it persistently recommends a biopsy to confirm autoimmune gastritis. To identify a single case, you would need to perform about 12 biopsies (108 alerts for 9 true cases), 11 of them in patients without the disease, potentially with accompanying serological tests. Would you still follow the AI’s suggestion? You might well reconsider your first answer.
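The biopsy burden follows directly from the counts in Table 1 (true positives TP = 9, false positives FP = 99):

```latex
\text{biopsies per confirmed case} = \frac{1}{\mathrm{PPV}} = \frac{TP + FP}{TP} = \frac{9 + 99}{9} = 12,
\qquad
\text{unnecessary biopsies per confirmed case} = \frac{FP}{TP} = \frac{99}{9} = 11.
```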
This is a typical example of the “low-prevalence effect” in human–AI interaction, a psychological phenomenon in which clinicians faced with low-prevalence conditions may be less inclined to trust positive findings, potentially leading to missed diagnoses [3] [4] [5]. In other words, the low PPV for rare diseases can psychologically influence physicians, making them less likely to accept positive AI recommendations and ultimately decreasing the effective sensitivity of AI-assisted detection. This effect has been studied extensively in screening mammography, which highlights a pressing concern for the application of AI to rare diseases and calls for further research into mitigating these psychological biases [3] [4]. In theory, the high diagnostic performance of such AI tools could be preserved by deliberately accounting for these psychological factors in clinical decision-making [6] [7]. Autonomous AI, which removes human–AI interaction altogether, would be another effective solution; however, current regulations do not, in principle, permit its use.
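A deliberately simplified model makes the loss of sensitivity concrete. The Python sketch below assumes, hypothetically, that the endoscopist acts on a fixed fraction of AI alerts regardless of whether they are true or false, and that cases the AI misses are not picked up by other means. The function name `effective_sensitivity` and the acceptance rates are illustrative, not values measured in any study.

```python
def effective_sensitivity(ai_sensitivity: float, acceptance_rate: float) -> float:
    """Sensitivity of the human-AI pair under a simplified, hypothetical model.

    Assumes a true case is detected only if the AI flags it AND the
    endoscopist acts on the alert (e.g. takes a biopsy), and that the
    endoscopist accepts alerts at a fixed rate without telling true from false.
    """
    if not (0.0 <= ai_sensitivity <= 1.0 and 0.0 <= acceptance_rate <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    return ai_sensitivity * acceptance_rate


if __name__ == "__main__":
    # Illustrative acceptance rates only; a 90%-sensitive tool is assumed.
    for acceptance in (1.0, 0.8, 0.5):
        sens = effective_sensitivity(0.90, acceptance)
        print(f"acceptance {acceptance:.0%}: effective sensitivity {sens:.0%}")
```

Under these assumptions, an endoscopist who dismisses half of the alerts would halve the tool’s effective sensitivity, from 90% to 45%, even though the algorithm itself is unchanged.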
The present study represents a pivotal step forward in exploring the role of AI in diagnosing autoimmune gastritis, a condition that is often overlooked in routine endoscopic practice. Alongside this advancement, however, the study also highlights the real-world challenge of the “low-prevalence effect” in human–AI interaction, which both AI developers and users need to recognize. As we await the results of the prospective trials that the authors call for, now is the time for endoscopists to embrace the intersection of medicine and psychology, fostering an integrated approach to implementing AI tools effectively in endoscopic practice.
Publication history: Article published online 13 December 2024
© 2024. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany