Artificial intelligence and the push for small adenomas: all we need?

Artificial intelligence (AI) has been enthusiastically welcomed by the endoscopic community, although many of its potential applications remain in the testing environment, evaluated on images and videos rather than in the clinical setting. Some computer-aided polyp detection (CADe) systems have, however, entered clinical application and have been studied extensively in randomized trials. At the time of writing, 19 such trials were available as full publications [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19]. They all show a more or less significant increase in detection of small adenomas. The same results are summarized over and over again in no fewer than 16 meta-analyses [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35], plus a protocol for a further one [36], as well as a meta-analysis of some of these meta-analyses [37]!

In this issue of Endoscopy, Ahmad et al. report on their use of CADe within a bowel cancer screening program [38]. The study is distinct from most other randomized trials in several respects. The patient population was homogeneous, including only colonoscopies from a fecal immunochemical test (FIT)-based screening program. Furthermore, a cap device, mostly Endocuff Vision (Olympus, Tokyo, Japan), was used at the discretion of the endoscopist. This device has been shown to increase polyp yield in a recent meta-analysis [39], and also recently in a FIT-based randomized screening colonoscopy study published in Endoscopy [40]. Endocuff was used in about 70 % of colonoscopies in the Ahmad et al. study, and it is a shame that its use was not standardized. Finally, polyp detection rate (PDR) and not adenoma detection rate (ADR) was used as the main outcome parameter, in contrast to almost all other studies.

The results show that PDR was slightly higher in the CADe group (85.7 %) than in the control group (79.7 %; P = 0.05). ADRs were 71.4 % and 65.0 %, respectively. So, almost all patients had a polyp and more than two-thirds had an adenoma. No significant difference in ADR was found, but the level was already high in the control group. Among other FIT-based screening programs, ADR in the Netherlands [41] was as high as in the control group of the current study, whereas it was lower in studies from other countries, namely 55 % in Spain [42], 51 % in Sweden, and 41 % in Italy [40]. In primary screening colonoscopy, ADR is naturally lower, as patients have not been selected by a stool test. In the recently published NordICC trial, ADR was 31 % [43], which is similar to the ADR found in the German screening colonoscopy setting [44]. A recent review examined ADR in the control groups of randomized trials and found very wide variation overall. The overall mean value of 37.5 % was, however, higher by a factor of 1.6 within FIT-based screening programs [45].

The sample size calculation in the present study seems a bit strange. The study was powered “to detect a 10 % increase in polyp detection, from a detection rate of 20 % in the control group up to 30 % in the CADe group”. In reality, the PDR was almost 80 % in the control group. In addition, higher ADRs and PDRs have been published in FIT-based screening colonoscopy programs, as outlined above. So, in essence, the study could have been underpowered, and with larger case numbers, an ADR increase from 65 % to 71 % could have been significant (whether it would be relevant is another question). Interestingly, numerical differences in adenomas were seen in the 1–5 mm group (from 765 to 846) but also in the > 10 mm group (from 94 to 120 polyps). This somewhat contradicts most previous studies, which show that it is mainly small polyps that are found more frequently by AI, but it may be coincidental.
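
The effect of the assumed baseline rate on the required case numbers can be illustrated with a rough calculation. The following is only a sketch using the standard normal-approximation formula for comparing two proportions; the two-sided alpha of 0.05 and the power of 80 % are assumptions for illustration and are not taken from the study, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control: float, p_cade: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm needed to compare two proportions.

    Normal approximation with unpooled variance and equal allocation.
    alpha (two-sided) and power are illustrative assumptions only.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_cade * (1 - p_cade)
    effect = p_cade - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Rates assumed in the sample size calculation quoted above (20 % vs. 30 %)
planned = n_per_group(0.20, 0.30)

# Rates actually observed, as reported in the editorial
observed_pdr = n_per_group(0.797, 0.857)  # PDR 79.7 % vs. 85.7 %
observed_adr = n_per_group(0.650, 0.714)  # ADR 65.0 % vs. 71.4 %

print(f"planned 20 % vs. 30 %:   {planned} per group")
print(f"observed PDR difference: {observed_pdr} per group")
print(f"observed ADR difference: {observed_adr} per group")
```

Under these assumptions, the difference of about six percentage points that was actually observed requires considerably more patients per group than the planned ten-point difference, which is consistent with the suspicion that the study was underpowered for ADR.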

So, what can we conclude about the clinical relevance of AI increasing ADR? The first question is, do we see an upper limit of ADR that cannot be substantially improved upon by AI or any other means? Furthermore, might this study by Ahmad et al. have reached the possible ADR ceiling? Looking at the randomized trials cited above, the relative increase in ADR depended on the baseline ADR level in the control group in most of the papers; relative ADR increases were around 50 % in low ADR papers [15] [18], about 30 % in trials with ADR levels between 20 % and 30 % [11] [17], and 20 %–25 % in studies with baseline ADR levels of 30 %–45 % [8] [10] [14], such as in FIT-based screening programs [2]. This is somewhat logical: there is less room for improvement in high ADR performers. We are confident that this will be the topic of another meta-analysis soon. Tandem studies in AI showed an even greater effect on the adenoma miss rate, but it is well known that tandem trials yield better results than comparative randomized studies [46].
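
The "less room for improvement" argument is simple arithmetic and can be sketched as follows. The ADRs of 65.0 % and 71.4 % are those reported above for the Ahmad et al. study; the function names are hypothetical, and the ceiling of 100 % is a purely theoretical assumption used only to show how the possible relative gain shrinks as the baseline rises.

```python
def relative_increase(baseline: float, with_cade: float) -> float:
    """Relative change in detection rate, e.g. 0.10 for a 10 % relative gain."""
    return (with_cade - baseline) / baseline


def max_relative_increase(baseline: float, ceiling: float = 1.0) -> float:
    """Largest relative gain possible if the rate could reach `ceiling`.

    The default ceiling of 100 % is a theoretical assumption, not a
    realistic clinical target.
    """
    return (ceiling - baseline) / baseline


# ADR in the Ahmad et al. study: 65.0 % (control) vs. 71.4 % (CADe)
observed = relative_increase(0.650, 0.714)
print(f"relative ADR increase at 65 % baseline: {observed:.1%}")

# Theoretical headroom at a low, a moderate, and a high baseline ADR
for baseline in (0.20, 0.40, 0.65):
    print(f"baseline {baseline:.0%}: "
          f"at most {max_relative_increase(baseline):.0%} relative gain")
```

With a baseline ADR of 65 %, the relative increase seen here is just under 10 %, well below the 20 %–50 % relative increases cited for trials with lower baseline ADRs.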

The second question relates to the ultimate clinical relevance of increasing ADR from high to even higher levels. We know that ADR correlates inversely with the interval cancer rate (i.e., a higher ADR prevents more colorectal cancer). However, we still do not know whether there is a cutoff ADR level above which there is little or no further improvement in cancer prevention. This was indirectly suggested by the Polish and Austrian follow-up studies [47] [48], which primarily took a cutoff level of 20 % for their data analyses. In contrast, the Californian group suggested a linear correlation in their studies [49] [50]. Their recent publication, however, includes figures that somewhat contradict its conclusions [50]. It appears logical that if more than two-thirds of individuals undergoing endoscopy have a (mostly small/very small) adenoma, then the risk associated with adenomas per se may be of less importance, and finding more and more small adenomas may not be the key to improved cancer prevention.

The third and fourth questions are: which adenomas should we chase, and what else needs to be done? The relevance of small adenomas has been doubted by many database analyses in the past 10 years, which correlated patients’ prognoses with the stage of adenomas found and removed during colonoscopy, and this doubt was reconfirmed in a large review at the end of 2022 [51]. Another recent meta-analysis summarized the evidence of 12 such studies [52]. These studies uniformly showed no relevant disadvantage with regard to colorectal cancer development for individuals with small adenomas only, compared with the wider population or people with a normal colonoscopy. The worrisome finding of all these studies, however, is that the colorectal cancer incidence was fourfold higher when patients had advanced adenomas, which were obviously removed. Should we be ensuring that these patients benefit more from colonoscopy and polypectomy in order to bring their risk level down to “normal”? We therefore presume that there is more to colonoscopy quality and cancer prevention than just finding more and more small adenomas. We know that resection can be incomplete, with sometimes worrisome rates reported [53] [54] [55], and adherence to follow-up recommendations is not optimal. Mostly, patients with small adenomas tend to undergo surveillance earlier than recommended, and those with advanced lesions have surveillance intervals that are longer than those recommended in guidelines [56] [57] [58] (and perhaps are initially also under-resected). Is it possible that our continual focus on ADR is misplaced?

Finally, randomized trials, despite the high-quality evidence they provide, may also have shortcomings. The focus of a study and the desire to produce good results, and to achieve a better publication, may increase the examiner’s awareness and attention in both groups (Hawthorne effect), but perhaps more so in the study group, for example when an AI-based application is being used. On the other hand, there are recent sobering reports about what happens when AI systems are incorporated into a colonoscopy program. ADR did not increase with AI assistance in a US center [59], and it decreased in an Israeli hospital [60], perhaps due to shorter withdrawal times. The group from Würzburg used eye-tracking technology and showed that trainees focused their eyes on the middle of the picture when using AI, whereas without AI, their gaze tended to travel further around the endoscopy screen [61]. This phenomenon is known from radiology and is called “deskilling”. This may not be the intention of AI, which is still dependent on the examiner’s ability to maneuver the scope adequately and also to pay attention; however, if the expectation is that AI will do the job, then attention may be reduced.

What do we need? Studies should look at more relevant outcomes than just whether we find more small polyps (we already have abundant evidence that this is the case). Authors and editors should refrain from writing and accepting more and more meta-analyses about AI polyp detection. There are also more challenging applications to be studied, starting with tissue diagnosis (and histology replacement, which may face quite a few legal and credentialing hurdles), but also performance, interpretation, documentation, risk stratification, and follow-up of colonoscopies and patients with more advanced polyps, all parameters that have to be documented before, during, and after colonoscopy. If some of this documentation paperwork were automated and the burden on endoscopists reduced, we could perhaps focus better on good patient care and finally also on good clinical research.

Publication History

Article published online:
07 March 2023

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
