Artificial Intelligence–Assisted Colonoscopy in Real-World Clinical Practice: A Systematic Review and Meta-Analysis

INTRODUCTION

Adequate detection of colorectal polyps during colonoscopy is crucial for colorectal cancer (CRC) prevention. However, the quality of screening colonoscopies varies significantly, with tandem studies suggesting an adenoma miss rate (AMR) of at least 25% (1). Adenoma detection rate (ADR) is an established quality indicator in screening colonoscopy. Higher ADR is associated with a lower risk of interval CRC and CRC death. Adenomas per colonoscopy (APC) is a potentially more accurate quality indicator because it may reflect complete colon clearance from all adenomas and may be associated with an incremental survival benefit over ADR-based monitoring (2).

Several artificial intelligence (AI)–augmented real-time detection systems (computer-aided detection [CADe]) aiming to increase polyp detection have been developed and are currently commercially available. The first commercially available CADe system in the United States is GI Genius (Medtronic), which was assessed in the landmark randomized controlled trial (RCT) by Repici et al (3). Multiple RCTs have demonstrated improved ADR and significant AMR reduction with CADe. In a 2021 meta-analysis of existing RCTs, Hassan et al found that CADe achieved significant improvements in pooled ADR vs non-CADe colonoscopy (36.6% vs 25.2%) (4). The benefit of CADe has been observed in both experienced and novice endoscopists. However, conflicting evidence on the impact of CADe has begun to emerge, including a recent RCT in community practice that showed no benefit (5).

A critical question is whether the results of CADe RCTs are reproducible in the real world, which is very different from the sterile RCT environment. There may be crucial differences in patient population, endoscopist expertise, work environment, time constraints, and reporting and recording biases, all of which may affect colonoscopy performance. To understand novel technology, evaluation needs to be performed in real-world settings in addition to RCTs. Several recent real-world CADe implementation studies have failed to demonstrate the benefits reported in RCTs (6–8). Our aims in this systematic review and meta-analysis were to synthesize all emerging data on the impact of CADe in colonoscopy in a real-world setting, focusing on ADR and APC.

METHODS

Search strategy and selection criteria

We systematically searched PubMed, Embase, and Web of Science for relevant studies published between January 1, 2020, and April 1, 2023. The study is registered under PROSPERO (CRD42023424037). The search strategy included AI, computer-assisted, computer-aided, colonoscopy, polyp, and adenoma. In addition, we reviewed reference lists of included articles and systematic reviews to identify additional relevant studies. Further details of the search can be found in Supplementary Digital Content (see Supplementary Table 1, https://links.lww.com/CTG/B55).

Two independent reviewers (M.W. and S.F.) screened the titles and abstracts in accordance with a predetermined list of inclusion and exclusion criteria (Table 1). Inclusion criteria consisted of publication in English (complete manuscripts or abstracts) and evaluation of CADe in colonoscopy for polyp detection. We excluded case reports, case series, review articles, meta-analyses, and systematic reviews. Because the focus of this meta-analysis was to review real-world evidence, we also excluded RCTs. We excluded studies that did not report on ADR and APC. To reflect routine clinical practice, we excluded studies focusing on trainees or involving only video review. When there were multiple reports from the same cohort, we selected the most recent study.

Table 1. Search inclusion criteria

Inclusion/exclusion criteria
Population | Adults who undergo a colonoscopy for detecting colorectal cancer
Intervention | The use of AI-assisted colonoscopy
Comparator | Conventional colonoscopy
Outcome | ADR, APC
Timing of effect | During colonoscopy
Time of search | January 1, 2020, to April 1, 2023
Setting | Outpatient
Study design | Prospective and retrospective. RCTs were excluded

ADR, adenoma detection rate; AI, artificial intelligence; APC, adenomas per colonoscopy; RCT, randomized controlled trial.


Data extraction

Following screening of the titles and abstracts in accordance with the inclusion/exclusion criteria, the same 2 reviewers (M.W. and S.F.) extracted data independently from the selected studies. Discrepancies were resolved by consensus with 2 additional investigators (U.L. and U.K.). In accordance with the Meta-analysis Of Observational Studies in Epidemiology statement (9), we developed a case report form to include the following information from each study: year of publication, first author name, full manuscript vs abstract, study country, study location (single center vs multicenter), study type (prospective vs retrospective), type of control (concurrent vs historical), and type of CADe used. We extracted procedural characteristics, including ADR and APC, when available. If needed, we contacted authors for clarification or to identify additional data.

Statistical analysis

The primary outcome of this study was comparison of the pooled ADR, defined as the proportion or percentage of colonoscopies in which 1 or more adenomas were detected. The secondary outcome was pooled APC. The pooled risk ratio (RR) with 95% confidence intervals (CIs) was used to compare ADR in colonoscopies with vs without CADe. Pooling was achieved using the Mantel-Haenszel random-effects model due to high study heterogeneity. Heterogeneity was determined using the I² statistic, with an I² value above 50% indicating high heterogeneity. The pooled rate ratio with 95% CIs was used to compare APC in colonoscopies with vs without CADe. Heterogeneity was examined using forest plots to visualize the contributions of individual studies to the pooled results, with sequential removal of significant outliers to assess their effects. Subgroup analyses, e.g., comparing study design or the use of particular CADe tool(s), were also conducted. Publication bias was examined using a funnel plot, with symmetry taken to indicate a low risk of publication bias. Statistical analyses were performed using Cochrane Review Manager 5.3 software. P < 0.05 was used as the threshold to determine statistical significance.
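As an illustration of the pooling described above, the following minimal Python sketch computes per-study log risk ratios, a random-effects pooled RR with a 95% CI, and the I² statistic. It is a sketch under stated assumptions, not the analysis performed in this study: it uses inverse-variance DerSimonian-Laird weighting as a stand-in for the Mantel-Haenszel random-effects option in Review Manager, the function names are illustrative, and the event counts are hypothetical rather than taken from the included studies.

```python
import math


def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Return (log RR, variance of log RR) for a single study."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    # Standard large-sample variance of the log risk ratio.
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, var


def pool_random_effects(studies):
    """DerSimonian-Laird random-effects pooling of log risk ratios.

    studies: list of (events_cade, n_cade, events_control, n_control).
    Returns (pooled RR, CI lower, CI upper, I2 in percent).
    """
    effects = [log_risk_ratio(*s) for s in studies]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
        i2,
    )


# Hypothetical counts: (adenoma-positive with CADe, N with CADe,
#                       adenoma-positive without CADe, N without CADe)
example = [(150, 300, 120, 300), (90, 400, 100, 400), (60, 100, 40, 100)]
rr, lower, upper, i2 = pool_random_effects(example)
print(f"Pooled RR {rr:.2f} (95% CI {lower:.2f}-{upper:.2f}), I2 = {i2:.0f}%")
```

Because random-effects weights are more evenly spread across studies than sample-size weights, a pooled RR obtained this way need not equal the ratio of the crude pooled detection percentages.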

Quality assessment

Two authors (M.W. and S.F.) evaluated the quality of the included studies independently using the modified Newcastle-Ottawa scale, which scored studies across 3 categories: selection, comparability, and outcome (10). Disagreements that arose in scoring were resolved with evaluation by 2 additional investigators (U.L. and U.K.).

Data availability

The review protocol, template data collection forms, and extracted data may be made available on request.

RESULTS

Study selection

The literature search yielded 2,502 results. After removing duplicates and studies published before 2020, 1,314 articles were screened against eligibility criteria, of which 928 were excluded based on abstract review because they did not evaluate use of AI (Figure 1). Of the 386 full texts reviewed for eligibility, 12 studies fit our criteria.

Figure 1. Study selection. Flowchart of the study selection process.

Study characteristics

The 12 studies included a total of 11,660 patients (Table 2). Studies originated from the United States, Europe, Asia, the Middle East, and New Zealand. Of these studies, 10 were fully published (6–8,11–17) and 2 (18,19) were abstracts. Of these, 6 (5 fully published, 1 abstract) were prospective (6,11–13,16,18) in design and 6 (5 fully published, 1 abstract) were retrospective (7,8,14,15,17,19). Five studies (7,8,12,13,18) used the GI Genius (Medtronic) system (N = 6,892). On evaluation of quality, all included studies had a Newcastle-Ottawa score of 8 or 9 stars (see Supplementary Table 2, https://links.lww.com/CTG/B55), indicating high quality. More details of the studies are available in Supplementary Digital Content (see Supplementary Table 3, https://links.lww.com/CTG/B55).

Table 2. Study characteristics

Study | Year | Location | Study design | Control | CADe used (N) | Without CADe (N) | CADe vs without CADe APC (P value) | CADe vs without CADe ADR (P value)
Quan (6) | 2022 | United States | Multicenter prospective | Historical | EndoVigilant (N = 300) | 300 | 1.35 vs 1.07 (0.099) | 52 vs 46.3 (0.165)
Koh (12) | 2022 | Singapore | Single-center prospective | Historical | GI Genius (N = 298) | NA | - | 30.4 vs 24.3 (0.02) [b]
Ishiyama (11) | 2021 | Japan | Single-center prospective | Concurrent | EndoBRAINEYE (N = 918) | 918 | 0.42 vs 0.3 (0.003) | 26.4 vs 19.9 (0.001)
Shaukat (16) | 2022 | United States | Single-center prospective | Historical | Skout (N = 83) | 283 | 1.46 vs 1.01 (0.104) | 54.2 vs 40.6 (0.028)
Richter (14) | 2023 | Germany | Single-center retrospective | Historical | CADEye (N = 163) | 140 | - | 0.39 vs 0.41 (>0.05)
Nehme (13) | 2021 | United States | Single-center prospective | Historical | GI Genius (N = 403) | 641 | 1.27 vs 1.17 (0.45) | 50.4 vs 53 (0.41)
Ahmad [a] (18) | 2021 | England | Single-center prospective | Historical | GI Genius (N = 82) | 86 | - | 48.8 vs 46.5 (0.77)
Agazzi [a] (19) | 2022 | Italy | Single-center prospective | Historical | CADEye (N = 250) | 450 | - | 46 vs 30.7 (<0.005)
Wong (17) | 2022 | Hong Kong | Single-center retrospective | Historical | ENDO-AID (N = 119) | 115 | - | 52.9 vs 37.4 (0.017)
Schauer (15) | 2021 | New Zealand | Single-center retrospective | Historical | ENDO-AID (N = 213) | 213 | - | 47.9 vs 38.5 (0.03)
Levy (8) | 2022 | Israel | Single-center retrospective | Historical | GI Genius (N = 1,969) | 2,175 | 0.6 vs 0.68 (0.001) | 30.3 vs 35.2 (0.001)
Ladabaum (7) | 2022 | United States | Single-center retrospective | Historical and concurrent | GI Genius (N = 619) | 619 | 0.78 vs 0.89 (0.23) | 40.1 vs 41.8 (0.44)

ADR, adenoma detection rate; APC, adenoma per colonoscopy; CADe, computer-assisted detection.

[a] Abstract available only.

[b] ADR for without CADe was using baseline polypectomy rate.


Pooled analysis: ADR

Among 10 fully published studies (6–8,11–17) and 2 abstracts (18,19), ADR was statistically significantly higher with CADe (36.3%) than without CADe (35.8%), with an RR of 1.13 (95% CI 1.01–1.28), P = 0.04 (Figure 2). When only fully published studies were included, the pooled RR was no longer statistically significant (1.11, 95% CI 0.98–1.24, P = 0.10) (see Supplementary Figure 1, https://links.lww.com/CTG/B55).

Figure 2. Pooled ADR risk ratio of all studies including abstracts. ADR, adenoma detection rate; CADe, computer-assisted detection.

Among 6 prospective studies (6,11–13,16,18) (5 fully published, 1 abstract), ADR was statistically significantly higher with CADe than without CADe (37.3% vs 35.2%; RR 1.15, 95% CI 1.01–1.32, P = 0.04) (Figure 2). By contrast, among 6 retrospective studies (7,8,14,15,17,19) (5 fully published, 1 abstract), ADR did not differ with CADe vs without CADe (35.7% vs 36.2%; RR 1.12, 95% CI 0.92–1.36, P = 0.27) (Figure 2). Among the 5 fully published retrospective studies (7,8,14,15,17), there was also no difference with vs without CADe (RR 1.04, 95% CI 0.88–1.23, P = 0.65) (see Supplementary Figure 1, https://links.lww.com/CTG/B55).

Pooled analysis: APC

Among 6 studies (6–8,11,13,16) (all fully published) that included adequate data on APC, the pooled rate ratio for APC with vs without CADe was 1.12 (95% CI 0.95–1.33), P = 0.18 (Figure 3). Among the 4 prospective studies (6,11,13,16), the pooled rate ratio for APC with vs without CADe was 1.27 (95% CI 1.11–1.46), P = 0.0006 (see Supplementary Figure 2, https://links.lww.com/CTG/B55).

Figure 3. Pooled rate ratio for APC with vs without CADe. APC, adenomas per colonoscopy; CADe, computer-assisted detection.
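For readers less familiar with rate ratios, the per-study quantity pooled for APC can be sketched as follows. This is a minimal illustration, assuming adenoma counts behave like Poisson counts, so that the standard error of the log rate ratio is sqrt(1/a + 1/b) for a and b total adenomas. The function name and the counts are hypothetical and are not drawn from the included studies, and adenoma counts in practice are often overdispersed, so an interval computed this way may be too narrow.

```python
import math


def rate_ratio(adenomas_cade, n_cade, adenomas_control, n_control):
    """Rate ratio of adenomas per colonoscopy with an approximate 95% CI.

    Assumes Poisson-distributed adenoma counts (an illustrative
    simplification, not the method reported in the paper).
    """
    apc_cade = adenomas_cade / n_cade
    apc_control = adenomas_control / n_control
    ratio = apc_cade / apc_control
    # Standard error of the log rate ratio under the Poisson assumption.
    se = math.sqrt(1 / adenomas_cade + 1 / adenomas_control)
    lower = math.exp(math.log(ratio) - 1.96 * se)
    upper = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, lower, upper


# Hypothetical example: 120 adenomas in 100 colonoscopies with CADe
# vs 100 adenomas in 100 colonoscopies without CADe.
ratio, lower, upper = rate_ratio(120, 100, 100, 100)
print(f"APC rate ratio {ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

A ratio above 1 indicates more adenomas found per procedure with CADe, and an interval that includes 1 is consistent with no difference.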

Evaluation of GI Genius

Of the various CADe platforms, GI Genius was used in the largest number of studies (4 published studies (7,8,12,13)). ADR did not differ with vs without GI Genius (RR 0.96, 95% CI 0.85–1.07, P = 0.42) (Figure 4). Among the 3 GI Genius studies (7,8,13) with data allowing for APC comparison, APC did not differ with vs without GI Genius (rate ratio 0.94, 95% CI 0.82–1.08, P = 0.37) (see Supplementary Figure 3, https://links.lww.com/CTG/B55). There were insufficient data for separate analyses of other CADe platforms.

Figure 4. Pooled ADR risk ratio of full studies evaluating only GI Genius. ADR, adenoma detection rate; CADe, computer-assisted detection.

DISCUSSION

In this systematic review, we synthesized the currently available body of evidence on the impact of CADe in real-world colonoscopy practice. We found a statistically significant but clinically minimal improvement in ADR with CADe vs without CADe when all 12 available studies were considered (ADR 36.3% vs 35.8%; RR 1.13, 95% CI 1.01–1.28, P = 0.04). A statistically significant improvement in ADR was no longer found when 2 abstracts were excluded. In subanalyses, a statistically significant improvement in ADR with CADe vs without CADe was found in the 6 prospective, but not the 6 retrospective, studies. We found no differences in APC with vs without CADe among the 6 applicable studies. Among the 4 and 3 applicable studies, we found no significant differences in ADR or APC, respectively, with vs without GI Genius.

To date, most studies have examined the use of CADe in the RCT setting. In general, the RCT results have been favorable, demonstrating a significant increase in ADR and APC and a corresponding drop in AMR. In 2021, Hassan et al analyzed 5 RCTs and found pooled ADR (36.6% vs 25.2%; RR 1.44, 95% CI 1.27–1.62, P < 0.01) and APC (0.58 vs 0.36; RR 1.70, 95% CI 1.53–1.89, P < 0.01) favoring CADe vs non-CADe colonoscopy (4). However, we did not find the same trend in real-world data collected to date.

There are several important distinctions between RCTs and real-world studies that underscore the importance of real-world data. Many RCTs include only colonoscopies with optimal preparation and documented withdrawal time performed by expert endoscopists. By contrast, in real-world studies, the procedures are performed by a wide array of endoscopists with different skills and work practices. Furthermore, the lack of blinding of endoscopists is a crucial factor for any emerging technology evaluation. It is possible that the effect of knowing one is being observed (Hawthorne effect) is a more powerful determinant of performance in RCTs. While real-world studies cannot completely avoid the Hawthorne effect, they could be less influenced by unconscious bias in favor of CADe. Notably, in our evaluation of studies published in full, CADe demonstrated a modest benefit in prospective studies, but not in retrospective studies.

We performed a subanalysis for GI Genius, the first United States Food and Drug Administration–approved CADe platform. Contrary to RCTs such as the study by Repici et al, (3) we found no significant improvement in ADR or APC. It is unlikely that this is a finding specific to GI Genius only, especially because most of the available CADe platforms seem to have similar performance characteristics and usability features. After the dates of our literature search, an abstract has reported improved ADR and sessile serrated lesion detection in the real-world setting using GI Genius, with the benefit attributable to those endoscopists who used CADe in most of their cases (20). Because our meta-analysis included 6 different CADe platforms, additional data and more focused studies will be needed to understand any differential impact between CADe systems.

Our study has several limitations. There was significant heterogeneity among included studies, with I² ranging from 64% to 91%. However, this is inherent in evaluation of real-world studies, in which multiple study designs and CADe systems are aggregated and evaluated. The reasons for the variability in real-world experience remain to be determined, including the potential effect of the specific implementation strategy, expectations, and incentives beyond the simple application of CADe. Another source of heterogeneity may be that our definition of colonoscopy for detecting colorectal cancer included both screening and surveillance colonoscopies, and some studies included patients with other gastrointestinal symptoms under this definition. However, we believe that given our objective to understand the real-world impact of CADe, it was acceptable to pool studies based on the similarities between CADe platforms. The most significant limitation is the use of historical controls by most of the included studies, which is prone to multiple potential biases. However, this was an intentional decision on our part because we aimed to audit the effect of implementation of CADe systems on real-world practices, with a warts-and-all pragmatic approach. The ultimate benchmark for implementation of these systems, and what any clinical practice contemplating adoption of CADe would want to know, is whether we detect more polyps with CADe and what the real-world cost of CADe per polyp is. With extended follow-up, the interval cancer rate will be the ultimate marker of the clinical utility of CADe.

We believe it would be an error to dismiss the potential of AI to improve colonoscopy quality based on the discordant results to date of CADe in RCTs and in the real-world setting. Rather, this is a call for further research to try to understand the factors at play. Colonoscopy quality rests on the fundamentals of good procedural technique, which includes meticulous inspection of the entire surface area of the colorectum. A minimum withdrawal time is necessary for this, but not sufficient. CADe can only highlight polyps on mucosa exposed by the endoscopist, though there is ongoing AI work focusing on adequacy of mucosal exposure (21). CADe combined with other AI-related tools to enhance quality of endoscopy examination may lead to better standardization of colonoscopy quality and lower polyp miss rates.

In conclusion, in a systematic review and meta-analysis of real-world studies evaluating the utility of CADe, we found a statistically significant but clinically minimal improvement in ADR with CADe vs without CADe when all available studies were considered, but not after excluding abstracts or when focusing on retrospective studies. Furthermore, we found no difference in APC with vs without CADe. Given the discrepancy between these real-world results and the accumulated data from RCTs, further studies are needed to understand the true benefit of CADe in colonoscopy and the subtleties of human-AI interactions that may underlie the differences in performance in RCTs vs real-world implementation.

CONFLICTS OF INTEREST

Guarantor of the article: Uri Kopylov, MD.

Specific author contributions: M.T.W.: planning the study, screening studies, extracting data, analyzing and interpreting data, and drafting the manuscript. This author approves the final draft submitted. S.F.: planning the study, screening studies, extracting data, analyzing and interpreting data, and drafting the manuscript. This author approves the final draft submitted. D.Y.: planning the study, analyzing and interpreting data, and drafting the manuscript. This author approves the final draft submitted. U.L.: planning the study, analyzing and interpreting data, and drafting the manuscript. This author approves the final draft submitted. U.K.: planning the study, analyzing and interpreting data, and drafting the manuscript. This author approves the final draft submitted.

Financial support: None to report.

Potential competing interests: M.T.W.: consultant for Capsovision, Neptune Medical, AgilTX. S.F.: none. D.Y.: none. U.L.: consultant for Neptune Medical, Medtronic, Medial EarlySign, Freenome, Guardant, and Kohler Ventures; Advisory Board for Universal Dx, Vivante. U.K.: advisory and speaker fees: AbbVie, BMS, Celltrion, Takeda, Janssen, Medtronic, and Pfizer; research support: Janssen, Medtronic, and Takeda.

Study Highlights

WHAT IS KNOWN

✓ Computer-aided detection (CADe) has been found to improve adenoma detection rate and adenomas per colonoscopy in randomized controlled trials.
✓ However, emerging real-world data seem to contrast with the results of randomized controlled trials.
✓ There is a need to assess the impact of CADe in real-world settings.

WHAT IS NEW HERE

✓ In this meta-analysis, we evaluated 12 studies of CADe in real-world settings and found that adenoma detection rate was slightly higher with vs without CADe (36.3% vs 35.8%).
✓ However, in subgroup analyses, this improvement did not persist across retrospective studies or when only studies published in full were considered.
✓ Future research is needed to understand the true impact of artificial intelligence technology on colonoscopy quality and polyp detection.

REFERENCES

1. Zhao S, Wang S, Pan P, et al. Magnitude, risk factors, and factors associated with adenoma miss rate of tandem colonoscopy: A systematic review and meta-analysis. Gastroenterology 2019;156(6):1661–74.e11.
2. Wang HS, Pisegna J, Modi R, et al. Adenoma detection rate is necessary but insufficient for distinguishing high versus low endoscopist performance. Gastrointest Endosc 2013;77(1):71–8.
3. Repici A, Badalamenti M, Maselli R, et al. Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology 2020;159(2):512–20.e7.
4. Hassan C, Spadaccini M, Iannone A, et al. Performance of artificial intelligence in colonoscopy for adenoma and polyp detection: A systematic review and meta-analysis. Gastrointest Endosc 2021;93(1):77–85.e6.
5. Wei MT, Shankar U, Parvin R, et al. Evaluation of computer-aided detection during colonoscopy in the community (AI-SEE): A multicenter randomized clinical trial. Am J Gastroenterol 2023;118(10):1841–7.
6. Quan SY, Wei MT, Lee J, et al. Clinical evaluation of a real-time artificial intelligence-based polyp detection system: A US multi-center pilot study. Sci Rep 2022;12(1):6598.
7. Ladabaum U, Shepard J, Weng Y, et al. Computer-aided detection of polyps does not improve colonoscopist performance in a pragmatic implementation trial. Gastroenterology 2023;164(3):481–3.e6.
8. Levy I, Bruckmayer L, Klang E, et al. Artificial intelligence-aided colonoscopy does not increase adenoma detection rate in routine clinical practice. Am J Gastroenterol 2022;117(11):1871–3.
9. Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-analysis of observational studies in epidemiology (MOOSE) group. JAMA 2000;283(15):2008–12.
10. Wells G, Shea BJ, O'Connell D, et al. The Newcastle-Ottawa Scale (NOS) for Assessing the Quality of Nonrandomised Studies in Meta-Analyses. Ottawa Hospital Research Institute: Ottawa, Ontario, Canada, 2021.
11. Ishiyama M, Kudo SE, Misawa M, et al. Impact of the clinical use of artificial intelligence-assisted neoplasia detection for colonoscopy: A large-scale prospective, propensity score-matched study (with video). Gastrointest Endosc 2022;95(1):155–63.
12. Koh FH, Ladlad J, Teo EK, et al. Real-time artificial intelligence (AI)-aided endoscopy improves adenoma detection rates even in experienced endoscopists: A cohort study in Singapore. Surg Endosc 2023;37(1):165–71.
13. Nehme F, Coronel E, Barringer DA, et al. Performance and attitudes toward real-time computer-aided polyp detection during colonoscopy in a large tertiary referral center in the United States. Gastrointest Endosc 2023;98(1):100–9.e6.
14. Richter R, Bruns J, Obst W, et al. Influence of artificial intelligence on the adenoma detection rate throughout the day. Dig Dis 2023;41(4):615–9.
15. Schauer C, Chieng M, Wang M, et al. Artificial intelligence improves adenoma detection rate during colonoscopy. N Z Med J 2022;135(1561):22–30.
16. Shaukat A, Colucci D, Erisson L, et al. Improvement in adenoma detection using a novel artificial intelligence-aided polyp detection device. Endosc Int Open 2021;9(2):E263–70.
17. Wong YT, Tai TF, Wong KF, et al. The study on artificial intelligence (AI) colonoscopy in affecting the rate of polyp detection in colonoscopy: A single centre retrospective study. Surg Pract 2022;26(2):115–9.
18. Ahmad A, Dhillon A, Wilson A, et al. Early evaluation of a computer assisted polyp detection system in bowel cancer screening. Gut 2021:A42.
19. Agazzi S, Chicco F, Scudeller L, et al. Real-time artificial intelligence-aided colonoscopy experience: The impact on routine clinical practice in a high-volume center-preliminary data. United Eur Gastroenterol J 2021:813–4.
20. Keswani R, Thakkar U, Sals A, et al. Adoption of a computer-aided detection system significantly improves polyp detection in routine clinical practice. Gastrointest Endosc 2023;97(6):AB468–9.
21. Thakkar S, Carleton NM, Rao B, et al. Use of artificial intelligence-based analytics from live colonoscopies to optimize the quality of the colonoscopy examination in real time: Proof of concept. Gastroenterology 2020;158(5):1219–21.e2.
