Diagnostic Accuracy of Artificial Intelligence in Endoscopy: Umbrella Review


Introduction

Gastrointestinal diseases impose a serious burden on health care systems worldwide. The data show that gastrointestinal diseases cause millions of deaths worldwide every year []. Endoscopy, as an efficient and convenient method, can effectively diagnose various gastrointestinal diseases []. Endoscopic intervention can also effectively treat early gastrointestinal cancers [].

In recent years, with the rise of artificial intelligence (AI), numerous studies have been conducted to explore its application in the field of endoscopy, aiming to assist medical professionals in lesion identification and endoscopy quality control [,].

At present, some meta-analyses have reported the diagnostic value of AI in endoscopy [-]. Although AI has high sensitivity and specificity in identifying lesions in some studies, due to merger heterogeneity and sample size variations, the reliability of merger analysis outcomes needs further discussion [-].

In this study, an umbrella review methodology was used to elucidate current research directions and identify potential future research ideas by evaluating existing meta-analyses on AI in endoscopy. The meta-analyses of current studies were screened and extracted, and the quality of outcomes was assessed.


MethodsRegistration

The protocol was registered on PROSPERO (CRD42023483073) before the study began. PROSPERO is an open access database of systematic reviews. Registration before the start of the study effectively reduced selective reporting [,]. This umbrella review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The details can be seen in .

Search Strategy

Two researchers searched PubMed, Web of Science, Embase, and Cochrane Library with a comprehensive search strategy up to November 2023. In addition, we searched “Google Scholar” to identify gray literature and searched for references of eligible articles. Two researchers independently screened the titles and abstracts and reviewed the full texts to identify eligible studies. Any discrepancies were resolved through consultation with a third researcher until a consensus was reached. The search strategy details are available in Table S1 in .

Inclusion and Exclusion Criteria

The inclusion criteria were as follows: (1) studies evaluating the diagnostic value of AI in endoscopy; (2) studies that provided at least one outcome data—sensitivity or specificity; (3) articles that had meta-analyses and were conducted by systematic methods; and (4) articles published in English.

We excluded studies that met the following criteria: (1) experiments not on humans, (2) unavailable full text, (3) duplicate studies, and (4) studies lacking critical information.

Data Extraction

Two researchers independently extracted data. The third researcher would extract data if there were any discrepancies. The following basic information was included: the first author, year of publication, country, kind of endoscopy, detection, followed guidelines, registered number, number of included studies in the meta-analyses, outcomes, included study types in the meta-analyses, and tools for assessing the risk of the Bias. Then, we collected outcome information, including sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio, and area under the curve. We searched for missed information in primary studies if necessary.

Evaluation of Article Quality

Two reviewers independently evaluated the quality of the articles using A Measurement Tool to Assess Systematic Reviews 2 (AMSTAR2). AMSTAR is a tool for evaluating the systematic reviews of randomized trials [,]. In 2015, researchers introduced AMSTAR2, which expanded the application scope of AMSTAR to include the evaluation of systematic reviews of nonrandomized trials []. AMSTAR2 consists of a 16-item questionnaire prompting reviewers to respond with “yes,” “partly yes,” or “no” to each item. We viewed 2 “partly yes” answers as 1 “yes.” In total, 7 items were considered important. If all the items were in conformity or only 1 unimportant item was out of conformity, the study was evaluated as having high quality. If more than 1 unimportant item did not fit, the study was rated as having moderate quality. If 1 important item did not conform, the study was rated as having low quality; the study was regarded as having critically low quality if more than 1 important item did not conform.

Data Analysis

We collected the outcome indicators of applying AI technology in different scenarios. This study evaluated the application of AI diagnostic techniques in different endoscopes. Considering that there are several studies analyzing the same issues, if there were multiple meta-analyses, we selected high-quality studies according to the AMSTAR2 criteria. If the quality of different studies was consistent, we chose the latest published study among them. After that, the most recent meta-analysis was collected and performed again to ensure that the most recent results were obtained. To make the results more reliable, we chose a more conservative method. Moreover, we used the random effect model to ensure the reliability of the result.

We calculated the effect quantity and 95% CI of each meta-analysis. In each meta-analysis, the P value of the Cochran Q test and the I2 metric were used to evaluate the heterogeneity caused by the threshold effect. The Deek test was used to test publication bias. We used forest figures to show the diagnostic value of AI in endoscopy. We also used the bar accumulation charts to show the conformity of the included articles. In this study, we used R (version 4.3.2; R Foundation for Statistical Computing) for calculation. If the P value was more than .05, we considered that there was no statistically significant difference.

Grading of the Evidence

Using the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) principle, 2 reviewers evaluated the credibility of evidence independently. GRADE proposes 5 factors for downgrading certainty in the evidence (the risk of bias, inconsistency, indirectness, imprecision, and publication bias) and 2 factors for upgrading certainty in the evidence (large effect and dose-response). These factors were used to evaluate outcomes as being of high, moderate, low, or very low quality. The body of evidence for diagnostic test accuracy studies begins with high quality. There was no guidance on the up factors in the diagnostic test accuracy study; we only downgraded the evidence using the 5 downgrading factors. For the comparative study, we defined its initial reliability according to the results of AMSTAR2 and then adjusted it according to the above factors.


ResultsStudy Selection

We initially identified 3230 studies through the database and 208 studies through manual retrieval. After eliminating duplicates, we had 3013 studies. Then, the researchers eliminated 2897 studies that did not meet the criteria based on their titles and abstracts. After reading the full text of 116 studies, 80 irrelevant studies and 15 studies without meta-analyses were excluded, and finally, 21 studies were included for statistical analysis and evaluation. These included 10 studies pertaining to upper gastrointestinal endoscopy[,-], 5 studies focusing on colonoscopy [-], and 4 studies on capsule endoscopy [-]. Additionally, there was 1 study about endoscopic ultrasound (EUS) and 1 study about laryngoscopy [,]. Detail can be seen in .

Figure 1. Search strategy and study screening. Included Study Characteristics

A total of 10 studies reported the diagnostic value of AI technoloyg in upper gastrointestinal endoscopy. These studies encompassed various original research papers, ranging from 7 to 39 studies per investigation. These studies analyzed the diagnostic value of AI in various diseases, including esophageal and gastric neoplasia, Barrett esophagus, and Helicobacter pylori infection. In terms of research strategies, 9 research reports followed PRISMA guidelines, and 5 studies were registered on PROSPERO. With regard to evaluating bias, 8 studies used Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2), 1 study used QUADAS, and 1 study was not evaluated. QUADAS evaluates the diagnostic accuracy of the research system, including patient selection, index test, reference standard, flow, and timing. In 2011, researchers developed QUADAS-2 for better evaluation []. All studies were included in observational studies for diagnostic evaluation.

A total of 5 research studies on the diagnostic value of AI in colonoscopy were included. Among them, 1 study focused on ulcerative colitis, one focused on colon polyps and tumors, one used Prediction Model Risk of Bias Assessment Tool (PROBAST) to evaluate bias, 3 used QUADRAS-2, and one was not evaluated.

Of the remaining 6 included studies, 4 studies reported the value of AI in capsule endoscopy for diagnosing bleeding and ulcers; 2 studies reported AI’s diagnostic value of laryngoscopes in examining normal or diseased throat structures and in EUS for diagnosing gastrointestinal stromal tumors separately; 5 studies were conducted according to the PRISMA guidelines; and 3 studies were registered in advance. All 6 studies were included in the observational study, and 5 of them used QUADRAS-2. The details can be seen in and Table S2 in .

Table 1. Basic information of included studies.StudyYearCountryKind of endoscopyAimIncluded studies, nFollowed guidelinesTan et al []2022AustraliaUpper endoscopyDetection of Barrett esophagus12PRISMAMa et al []2022ChinaUpper endoscopyDetection of esophagus cancer7PRISMABang et al []2020KoreaUpper endoscopyDetection of Helicobacter pylori infection8PRISMAShi et al []2022ChinaUpper endoscopyDetection of chronic atrophic gastritis8PRISMAGuidozzi et al []2023South AfricaUpper endoscopyDetection of Barrett esophagus and cancer14PRISMAJahagirdar et al []2023AmericaColonoscopyDetection of ulcerative colitis12PRISMAKeshtkar et al []2023IranColonoscopyDetection of colorectal polyp and cancer24NRBang et al []2022KoreaWireless capsule EndoscopyDetection of ulcers, polyps, celiac disease, bleeding, and hookworm39PRISMASoffer et al []2020IsraelWireless capsule EndoscopyDetection of ulcers, polyps, celiac disease, bleeding, and hookworm19PRISMAGomes et al []2023AmericaEndoscopic ultrasonographyDetection of gastrointestinal stromal tumor8PRISMAZurek et al []2022PolandLaryngeal endoscopyDetection of lesions in the larynx11PRISMABai et al []2023ChinaColonoscopyPrediction of invasion depth of colorectal cancer or neoplasms10PRISMAQin et al []2021ChinaWireless capsule endoscopyDetection of erosion/ulcer, gastrointestinal bleeding, and polyps/cancer16PRISMAMohan et al []2021AmericaWireless capsule endoscopyDetection of gastrointestinal ulcers9NRBang et al []2021KoreaColonoscopyDetection of diminutive colorectal polyps13PRISMALui et al []2020ChinaColonoscopyDetection of colorectal polyp and cancer18PRISMALui et al []2020ChinaUpper endoscopyDetection of gastric and esophageal neoplastic lesions and Helicobacter pylori23PRISMAVisaggi et al []2021ItalyUpper endoscopyDetection of Barrett neoplasia19NRZhang et al []2021ChinaUpper endoscopyDetection of esophageal cancer and neoplasm16PRISMAXie et al []2022ChinaUpper endoscopyDetection of gastric cancer and prediction invasion depth17PRISMAChen et al []2022ChinaUpper endoscopyDetection of early gastric cancer12PRISMA

aPRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses.

bNR: not reported.

Methodological Quality of Included Studies

In all the included studies, methodological quality ranged from very low to moderate. Results show that the methodology was rated as moderate for 6 studies, low for 2 studies, and very critically low for the remaining 13 studies. Among the articles about upper endoscopy, it was found that 5 studies exhibited a moderate level of methodological quality. In comparison, 2 studies were deemed to have low quality, and 3 studies had were very low quality. The critical problems were the need for advanced registration and an incomplete retrieval strategy. The noncritical problem was that the original literature funding had not been reported. Besides, studies of moderate methodological quality were conducted on both the stomach and esophagus of the upper gastrointestinal tract. Three studies on colonoscopy were of moderate quality, 2 were of low quality, and the remaining 8 were of very low quality. The main problems were the meta-merging method and the evaluation of publication bias.

Regarding the application of AI in capsule endoscopy, 1 study was of moderate quality, and the other 3 had critically low quality. In addition, the research on applying EUS to identify gastrointestinal stromal tumors and laryngoscope to identify normal and pathological structures of the throat had critically low quality. The details can be seen in Table S3 and Table S4 in .

Meta-Analyses

There were 4 outcomes for the esophagus. The sensitivity was 0.89 (95% CI 0.84-0.93) for esophageal neoplasia, 0.95 (95% CI 0.91-0.98) for esophageal squamous cell carcinoma, 0.94 (95% CI 0.67-0.99) for abnormal intrapapillary loops, and 0.97 (95% CI 0.67-1.00) for gastroesophageal reflux disease. Their specificity was 0.86 (95% CI 0.83-0.93) for esophageal neoplasia, 0.92 (95% CI 0.82-0.97) for esophageal squamous cell carcinoma, 0.94 (95% CI 0.84-0.98) for abnormal intrapapillary loops, and 0.97 (95% CI 0.75-1.00) for gastroesophageal reflux disease. The sensitivity of gastric cancer and chronic atrophic gastritis was 0.89 (95% CI 0.85-0.93) and 0.94 (95% CI 0.88-0.97), respectively. At the same time, their specificity was 0.93 (95% CI 0.88-0.97) and 0.96 (95% CI 0.88-0.98), respectively. The sensitivity and specificity of judging the invasion depth of gastric cancer were 0.82 (95% CI 0.78-0.85) and 0.90 (95% CI 0.82-0.95), respectively. The sensitivity and specificity of Helicobacter pylori infection were 0.87 (95% CI 0.72-0.94) and 0.86 (95% CI 0.72-0.96).

In colonoscopy, the sensitivity and specificity of colon polyps were 0.93 (95% CI 0.91-0.95) and 0.87 (95% CI 0.76-0.93), respectively. The sensitivity and specificity of colon neoplasia were 0.94 (95% CI 0.85-0.98) and 0.98 (95% CI 0.94-0.99), respectively. The sensitivity and specificity of ulcerative colitis were 0.83 (95% CI 0.78-0.87) and 0.92 (95% CI 0.89-0.95), respectively. For invasion depth of colon neoplasia, the sensitivity and specificity were 0.71 (95% CI 0.58-0.81) and 0.95 (95% CI 0.91-0.97), respectively.

For wireless capsule endoscopy, we got 2 results. The sensitivity and specificity of the diagnosis of gastrointestinal ulcer were 0.93 (95% CI 0.89-0.95) and 0.92 (95% CI 0.89-0.95), respectively. The sensitivity and specificity of the diagnosis of gastrointestinal bleeding were 0.96 (95% CI 0.94-0.97) and 0.97 (95% CI 0.95-0.99), respectively. The sensitivity and specificity of EUS in diagnosing gastrointestinal stromal tumors were 0.92 (95 % CI 0.89-0.95) and 0.80 (95 % CI 0.75-0.85), respectively. The sensitivity of healthy and diseased tissues in AI-identified laryngoscope was 0.91 (95 % CI 0.83-0.98) and 0.91 (95 % CI 0.86-0.96), respectively, and the specificity was 0.97 (95 % CI 0.96-0.99) and 0.95 (95 % CI 0.90-0.99), respectively. The details can be seen in and .

Figure 2. Diagnostic value of artificial intelligence in different endoscopic outcomes. Table 2. Outcomes of artificial intelligence in endoscopy diagnosis.StudyDetectionSensitivity (95% CI)Specificity (95% CI)PLR (95% CI)NLR (95% CI)DOR (95% CI)AUC (95% CI)ModelTan et al []Early Barrett esophagus0.90 (0.87-0.93)0.84 (0.80-0.88)NRNR0.90 (0.87-0.93)NRRandomMa et al et al []Early esophageal cancer0.90 (0.82-0.94)0.91 (0.79-0.96)9.8 (3.8-24.8)0.11 (0.06-0.21)NR0.95NRBang et al []Helicobacter pylori Infection0.87 (0.72-0.94)0.86 (0.72-0.96)6.2 (3.8-10.1)0.15 (0.07-0.34)40 (15‐112)0.92 (0.90-0.94)NRGuidozzi et al[]Esophageal squamous cell carcinoma0.91 (0.84-0.95)0.80 (0.63-0.90)NRNRNRNRRandomGuidozzi et al []Esophageal adenocarcinoma0.91 (0.87-0.94)0.87 (0.82-0.91)NRNRNRNRNRShi et al []Chronic atrophic gastritis0.94 (0.88-0.97)0.96 (0.88-0.98)21.58 (7.91-58.85)0.07 (0.04-0.13)320.19 (128.5-797.84)0.98 (0.96-0.99)NRJahagirdar et al []Ulcerative colitis0.83 (0.78- 0.87)0.92 (0.89-0.95)NRNRNR0.92 (0.88-0.94)NRKeshtkar et al []Colorectal polyp0.92 (0.85-0.96)0.94 (0.89-0.96)14.5 (8.4-25.2)0.09 (0.05-0.16)162 (59.44-5)0.97 (0.96-0.99)NRKeshtkar et al []Colorectal cancer0.94 (0.85-0.98)0.98 (0.94-0.99)41.2 (13.7-124.2)0.06 (0.02-0.16)677 (108-4240)0.99 (0.98-1.00)NRBang et al []Gastrointestinal ulcer0.93 (0.89-0.95)0.92 (0.89-0.94)NRNR138 (79-243)0.97 (0.95-0.98)NRBang et al []Gastrointestinal hemorrhage0.96 (0.94-0.97)0.97 (0.95-0.99)NRNR888 (343-2303)0.99 (0.98-0.99)NRSoffer et al []Mucosal ulcers0.95 (0.89-0.98)0.94 (0.90-0.96)NRNRNRNRRandomSoffer et al []Bleeding0.98 (0.96-0.99)0.99 (0.97-0.99)NRNRNRNRRandomGomes et al []Gastrointestinal stromal tumors0.92 (0.89-0.95)0.80 (0.75-0.85)4.26 (2.7-6.7)0.09 (0.14-0.18)71.74 (22.43-229.46)0.949NRZurek et al []Healthy laryngeal tissue0.91 (0.83-0.98)0.97 (0.96-0.99)NRNRNR0.945RandomZurek et al []Benign and malignant lesions0.91 (0.86-0.96)0.95 (0.90-0.99)NRNRNR0.924RandomBai et al []Invasion depth of early colorectal cancer0.71 (0.58-0.81)0.95 (0.91-0.97)NRNRNR0.93 (0.90-0.95)NRQin et al []Erosion or ulcers0.96 (0.91-0.98)0.97 (0.93-0.99)36.8 (12.3-110.1)0.04 (0.02-0.09)893 (103-5834)0.99 (0.98-1.00)NRQin et al []Gastrointestinal bleeding0.97 (0.93-0.99)1.00 (0.99-1.00)289.4 (80.3-1043.0)0.03 (0.01-0.08)10,291 (1539-68,791)1.00 (0.99-1.00)NRQin et al []Polyps and cancer0.97 (0.82-0.99)0.98 (0.92-0.99)42.7 (11.3-161.8)0.03 (0.01-0.21)1291 (60-27-808)0.99 (0.98-1.00)NRMohan et al []Gastrointestinal ulcers or hemorrhage0.96 (0.94-0.97)0.96 (0.95-0.97)NRNRNR95.4 (94.3-96.3)NRBang et al []Colorectal polyps0.93 (0.91-0.95)0.87 (0.76-0.93)7.1 (3.8-13.3)0.08 (0.06-0.11)87 (38-201)0.96 (0.93-0.97)NRLui et al []Colorectal polyps0.92 (0.89-0.95)0.90 (0.85-0.93)NRNRNR0.96 (0.95-0.98)RandomLui et al []Neoplastic lesions in the stomach0.92 (0.88-0.95)0.88 (0.78-0.95)NRNRNR0.96 (0.94-0.99)NRLui et al []Barrett esophagus0.88 (0.83-0.92)0.90 (0.86-0.95)NRNRNR0.96 (0.93-0.99)NRLui et al []Neoplastic lesions in squamous esophagus0.76 (0.48-0.93)0.92 (0.67-0.99)NRNRNR0.88 (0.82-0.96)NRLui et al []Helicobacter pylori status0.84 (0.71-0.93)0.90 (0.79-0.96)NRNRNR0.92 (0.88-0.97)NRVisaggi et al []Barrett neoplasia0.89 (0.84-0.93)0.86 (0.83-0.93)6.50 (1.59-2.15)0.13 (0.20-0.08)50.53 (24.74-103.22)0.90 (0.85-0.94)RandomVisaggi et al []Esophageal squamous cell carcinoma0.95 (0.91-0.98)0.92 (0.82-0.97)12.65 (1.61-3.51)0.05 (0.11-0.02)258.36 (44.18-1510.7)0.97 (0.92-0.98)RandomVisaggi et al []Abnormal intrapapillary capillary loops0.94 (0.67-0.99)0.94 (0.84-0.98)14.75 (1.46-3.70)0.07 (0.39-0.01)225.83 (11.05- 4613.93)0.98 (0.86-0.99)RandomVisaggi et al []Gastroesophageal reflux disease0.97 (0.67-1.00)0.97 (0.75-1.00)38.26 (0.98-6.22)0.03 (0.44-0.00)1159.6 (6.12-219711.69)0.99 (0.80-0.99)RandomZhang et al []Esophageal neoplasms0.94 (0.92-0.96)0.85 (0.73-0.92)6.40 (3.38-12.11)0.06 (0.04-0.10)98.88 (39.45-247.87)0.97 (0.95-0.98)RandomXie et al []Gastric cancer0.89 (0.85-0.93)0.93 (0.88-0.97)13.4 (7.3-25.5)0.11 (0.07-0.17)NR0.94 (0.91-0.98)RandomXie et al []Invasion depth of gastric cancer0.82 (0.78-0.85)0.90 (0.82-0.95)8.4 (4.2-16.8)0.20 (0.16-0.26)NR0.90 (0.87-0.93)RandomChen et al []Gastric cancer0.86 (0.75-0.92)0.90 (0.84-0.93)NRNRNR0.94NR

aPLR: positive likelihood ratio.

bNLR: negative likelihood ratio.

cDOR: diagnostic odds ratio.

dAUC: area under the curve.

eNR: not reported.

f95% CIs were not reported.

Grading of Evidence

We evaluated the reliability of each outcome through GRADE. Results showed that the quality was evaluated as very low for 44.1% of the outcomes and low for 55.9% of the outcomes. Our research found that the sensitivity and specificity of Barrett neoplasia, esophageal squamous cell carcinoma, Helicobacter pylori infection, chronological gastritis, colorectal polyp, gastrointestinal ulcer, and gastrointestinal hemorrhage had low credibility. The other outcomes had very low credibility. Generally speaking, the primary defects were indirectness and imprecision. These problems were caused by the different AI models and training methods used in the original literature, and there were also differences in the selection of recognition samples. Endoscopists in different regions used different samples and chose different AI algorithms to train and test the models, making the synthesized results less credible. Detail can be seen in Table S4 in .


DiscussionPrincipal Findings

In this study, we conducted a systematic review of the current use of AI in endoscopic diagnosis, assessing the quality of research and meta-analyses conducted in this field. AI has been studied and applied in upper gastrointestinal endoscopy, colorectal endoscopy, capsule endoscopy, and laryngoscopy. The meta-analysis results showed that AI has high sensitivity and specificity for these types of endoscopy. However, the overall evidence level of the outcomes was low.

In previous studies, AI could effectively assist in sedation and training in the operation process of upper digestive tract examination [,]. The earliest research we examined was conducted in 2007, when computers were trained to identify esophageal cancer []. At that time, the research only distinguished malignant and nonmalignant esophageal tissues in vitro.

With the rise of AI and the continuous upgrading of training methods, the application of AI in gastrointestinal endoscopy, including esophageal cancer, gastric cancer, and Helicobacter pylori infection, has been widely studied. In addition to the ordinary white light examination, computer-aided systems have shown a certain diagnostic value in stained and magnifying endoscopic imaging [,]. Moreover, some studies have found that trained models have research value in diagnosing gastric cancer’s infiltration depth [].

A study in 2022 compared the diagnostic value of computer-aided systems and professional endoscopists in gastric cancer images through retrospective data and found no significant difference in the diagnostic rate between the two groups []. This shows that AI aid is not inferior to endoscopists in image diagnosis. Wu conducted a single-center randomized controlled trial and found that the missed diagnosis rate of gastric adenoma could be significantly reduced using AI []. Multi-center randomized controlled studies are still needed for further analysis in the future.

AI has been widely studied in colorectal endoscopy. A meta-analysis showed that AI could effectively improve adenoma detection rate []. However, another meta-analysis based on real-world research reached the opposite conclusion []. The findings of our study proposed that AI has a noticeable effect in identifying intestinal lesions. However, many problems still need to be effectively addressed, particularly in terms of clinical implementation and practical translation.

In November 2023, the team at West China Hospital led a 12-center study with more than 10,000 patients []. This randomized controlled trial compared the relationship between AI-assisted and routine examinations in the missed diagnosis rate of esophageal lesions. The results showed that AI could not significantly improve the missed diagnosis rate of esophageal lesions. Many teams are constantly developing, improving and trying to use AI models in clinics. As mentioned above, although AI has been shown to have a significant effect in many studies, there has been an increase in research regarding the failure of AI to significantly improve the effectiveness of endoscopy in the clinical situation. In the application process, we found that the recognition threshold of AI greatly affected its application value. We are explored the possiblity of classifying patients according to some baseline information or endoscopic mucosal background images and then continuously optimized the AI recognition threshold according to the risk stratification of different patients. This approach aimed to achieve individualized endoscopic examinations and improve overall identification accuracy [-].

Moreover, the economic impact of a large-scale rollout of AI systems in clinical work on patients and health care institutions must be further studied. In addition, the differences in validation sets make it difficult to truly achieve accurate side-by-side comparisons when evaluating the capabilities of different AI models, which may lead to biased results. We believe it would be beneficial to produce an open platform that includes test data sets from different parts and different lesions of the gastrointestinal tract so that researchers can test the effectiveness of AI recognition in the future.

This study has several strengths. According to our preliminary understanding, the umbrella evaluation of using AI in endoscopic applications must be revised. To a certain extent, we have filled this blank. Second, we conducted a strict analysis and discussion following the PRISMA guidelines. Third, two researchers conducted all analyses, and the results were reliable.

There are also some limitations to this study. First, various computer-aid models have certain heterogeneity, and this could not be avoided in the analysis. Therefore, our results are a general summary of the current technology. Second, we could not gather the data of some unpublished studies. Third, the limited number of studies made it difficult to do further subgroup analyses. Fourth, we only included studies reported in English, which might have introduced some biases to our study.

Conclusions

This study found that AI has high diagnostic value in endoscopy. These findings provide a theoretical basis for the development and evaluation of AI-assisted systems, aimed at assisting endoscopists in conducting examinations, thereby improving patient health outcomes. However, it is worth noting that there is no convincing high-quality evidence in the existing research and further research is needed in the future.

This research was supported by the following grants: (1) Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (CIFMS; grants 2021-I2M-1-015, 2021-I2M-1-061, 2021-I2M-1-013, and 2022-I2M-C&T-B-054); (2) Sanming Project of Medicine in Shenzhen (SZSM201911008); (3) Capital’s Funds for Health Improvement and Research (grant CRF2020-2-4025); and (4) Beijing Hope Run Special Fund of Cancer Foundation of China (grant LC2021A03).

The data supporting this study’s findings are available on request from the corresponding author.

BZ was responsible for data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, supervision, writing the original draft, as well as reviewing and editing the final draft. AC was responsible for data curation, formal analysis, investigation, and resources. GW was responsible for conceptualization, funding acquisition, investigation, and resources.

None declared.

Edited by Christian Lovis; submitted 15.01.24; peer-reviewed by Feng Yu, Kai Liu; final revised version received 25.05.24; accepted 26.05.24; published 15.07.24.

© Bowen Zha, Angshu Cai, Guiqi Wang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 15.7.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

留言 (0)

沒有登入
gif