Verily Life Sciences, San Francisco, California, USA
Stanford University Medical Center, Clinical Excellence Research Center, Palo Alto, California, USA
Several artificial intelligence (AI) systems for polyp detection during colonoscopy have emerged in the gastroenterology literature and continue to demonstrate significant improvements in quality outcomes. This study assesses clinical quality outcomes during white-light colonoscopy with and without a novel AI computer-aided detection system, DEtection of Elusive Polyps (DEEP2), using Fuji 7000 series colonoscopes (Fujifilm, Singapore).
Methods
An unblinded, randomized (1:1), controlled, prospective study was performed at a single ambulatory care endoscopy center under institutional review board approval. Included participants were aged 40 to 85 years and were scheduled to undergo colonoscopy for screening, surveillance, or symptoms. Exclusion criteria were inflammatory bowel disease, prior colorectal surgery, known polyp referral, pregnancy, inadequate bowel preparation, and incomplete colonoscopy. DEEP2 was trained and validated only on white-light imaging, without the use of continuous digital chromoendoscopy.
Results
Mean patient age was 62.4 years (SD, 10.29), and 49% were men. Of 674 colonoscopies analyzed, the adenoma detection rate (ADR) differed significantly between the 2 arms of the study, without versus with DEEP2 (27% vs 37%, respectively; absolute difference, 10%; P = .0057). Significant differences were also found for adenomas per colonoscopy (APCs; .39 vs .62, respectively; P < .001) and polyp detection rate (39% vs 56%, respectively; absolute difference, 17%; P < .001). In the right-sided colon, where most interval cancers are found, DEEP2 use was also associated with significant ADR and APC differences (P < .01). The false alert rate (mean, 4 per examination) was lower than the mean of >20 false alerts reported for other computer-aided detection systems. Withdrawal times were similar between arms (mean, 7.2 minutes with DEEP2 vs 6.6 minutes without; not significant).
Conclusions
Seven enrolled physicians and 5 participating nurses unanimously reported a desire to continue using DEEP2 after completion of the study, once it becomes commercially available. (Clinical trial registration number: MYTRIALS.)
In recent years, artificial intelligence (AI) and machine learning have been described as high-tech software tools that enable more personalized patient care. Indeed, AI has been positioned by leading academic clinicians such as Eric Topol and others as technology that can help “make healthcare human again.” Aspirations for AI include relieving endoscopists and other clinicians of the often mundane task of procedural documentation, thus allowing more time for conversing with patients and for exercising empathy, discernment, and compassion. Another aim of AI-enabled endoscopy is rooted in studies demonstrating variation in individual endoscopist performance. These variations include differences in visual and interpretational skills, speed and accuracy of diagnosis, attention span and fatigue, and, ultimately, rates of polyp and adenoma detection.
Improving adenoma detection, as measured by adenoma detection rates (ADRs) and adenomas per colonoscopy (APCs), and increasing confidence in colonoscopy effectiveness could increase acceptance of population screening and significantly improve overall colorectal cancer prevention. Beyond contributing to higher ADRs and detecting more elusive polyps, AI may soon be expected to automatically capture images of lesions, document real-time quality metrics, and record procedural times, thus providing information to improve overall endoscopy performance and efficacy. In recent years, a small but growing number of AI-assisted polyp detection systems have been studied for their impact on ADRs and APCs, with positive published results beginning with the inaugural international conference on AI for gastroenterology, held in Washington, DC, in 2018.
Recently published meta-analyses show consistent results across a range of overall performance outcomes and demonstrate meaningful impact on key quality indicator benchmarks, reporting an average absolute increase in ADR of 6%.
Despite the advancements in AI technology and successful incorporation and growing acceptance of this technology into gastroenterology practices, a few core questions remain regarding its use in everyday colonoscopy screening and surveillance, which we aimed to address with this study.
The present research studied an AI system called DEtection of Elusive Polyps (DEEP2; Fujifilm, Singapore), first presented by Livovsky et al in an early, small clinical safety and feasibility study in 2021, in which it demonstrated high polyp detection sensitivity (>99%) with only 3.9 false alerts per procedure. The primary aim of the current study was to compare the ADR between colonoscopies randomized to the AI system (DEEP2) and those randomized to standard-of-care white-light endoscopy (WLE; ie, without AI overlay), performed by 6 community endoscopists with different baseline ADRs and withdrawal times.
Our endoscopy center (Elisha Hospital, Haifa, Israel) previously field-tested 2 different commercially available AI polyp detection systems and found both burdensome to use because of high false alert rates (as many as 30 or more false alerts in an individual colonoscopy). Therefore, a secondary aim of this study was to specifically measure the false alert rate as a surrogate for both the accuracy and the user experience of the DEEP2 algorithm.
Methods
From March through December 2021, we enrolled 1033 patients undergoing routine screening and surveillance colonoscopy at a single endoscopy center in Haifa, Israel. The DEEP2 polyp detection system was trained and validated using WLE without the aid of mechanical augmentation devices. Data from the subset of colonoscopies performed using linked-color imaging (LCI) or mechanical augmentation devices (endocuffs and endocaps) were therefore excluded from the primary analysis and will be the subject of future exploratory research and publication.
In addition to overlaying a green bounding box on suspected lesions of interest, DEEP2 provides a simultaneous audible alert. When a polyp or lesion enters the field of view, the audible alert repeats only once, no sooner than 5 seconds after the index alert, to prevent alert fatigue. An additional feature of the DEEP2 system is a picture-in-picture thumbnail image that temporarily shows the suspected lesion flagged by the algorithm (Fig. 1). The thumbnail appears alongside the real-time endoscopy video feed for 5 seconds, giving endoscopists an opportunity to consider, in real time, whether the lesion had indeed been overlooked, without having to relocate it.
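The alerting behavior described above can be expressed as a small per-lesion state machine. The sketch below is hypothetical and is not DEEP2's implementation: it assumes an upstream tracker already assigns each detection to a lesion, and the names `LesionAlertState`, `on_detection`, and `thumbnail_visible` are illustrative. Only the timing rules (a single repeat no sooner than 5 seconds after the index alert, and a 5-second thumbnail) come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Timing rules taken from the text; everything else here is an assumption.
REPEAT_DELAY_S = 5.0  # a repeat alert may sound no sooner than 5 s after the index alert
MAX_REPEATS = 1       # the audible alert repeats only once per lesion
THUMBNAIL_S = 5.0     # the picture-in-picture thumbnail stays up for 5 s


@dataclass
class LesionAlertState:
    """Hypothetical alert state for one tracked lesion (tracking is assumed upstream)."""
    index_time: Optional[float] = None
    repeats: int = 0
    alert_times: List[float] = field(default_factory=list)

    def on_detection(self, t: float) -> bool:
        """Return True if an audible alert should sound for a detection at time t (seconds)."""
        if self.index_time is None:
            # First detection of this lesion: index alert (plus box and thumbnail).
            self.index_time = t
            self.alert_times.append(t)
            return True
        if self.repeats < MAX_REPEATS and t - self.index_time >= REPEAT_DELAY_S:
            # The single permitted repeat, throttled to limit alert fatigue.
            self.repeats += 1
            self.alert_times.append(t)
            return True
        return False

    def thumbnail_visible(self, t: float) -> bool:
        """True while this lesion's thumbnail is shown next to the live video feed."""
        return self.index_time is not None and self.index_time <= t < self.index_time + THUMBNAIL_S


# Example: a lesion detected once per second for 12 seconds.
state = LesionAlertState()
for second in range(13):
    state.on_detection(float(second))
print(state.alert_times)  # [0.0, 5.0]
```

Keeping the state per lesion means a second, different lesion would get its own index alert, while a lesion that lingers in view cannot trigger more than two sounds in total.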
Figure 1. The thumbnail confirms that a polyp was passed by the endoscope.
Inclusion criteria were patients aged between 40 and 85 years who were previously scheduled to undergo colonoscopy. Colonoscopy indications were classified as screening (asymptomatic), surveillance after previous polypectomy or with positive occult bleeding (ie, positive fecal immunochemistry test), or symptomatic (visible/overt rectal bleeding, abdominal pain, or change in bowel habits). Exclusion criteria were suspicion of or known inflammatory bowel disease, a known polyp (large polyps indicated for excision), previous surgery of the colon or rectum including for past colon cancer, known polyposis syndromes, pregnancy, and any unwillingness or inability to personally give informed consent. All patients were advised in writing a week before their scheduled procedure of possible enrollment in the study. Oral and written informed consent was obtained from all study participants on the day of the procedures, after confirming inclusion criteria were met and before enrollment and randomization.
After study completion and before analysis, patients were excluded if their total Boston Bowel Preparation Scale score was less than 6 of 9 or if the procedure was incomplete because cecal intubation was not achieved. The minimum withdrawal time was set at 6 minutes, per American Society for Gastrointestinal Endoscopy guidelines, and all withdrawal times were recorded during the procedure. Seven experienced endoscopists, each having completed more than 2000 colonoscopies, participated in the study. User experience and provider satisfaction were assessed qualitatively by interviewing each of the enrolling physicians and the participating endoscopy nurses.
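The pre-enrollment criteria and the postprocedure exclusions described above can be written as two predicates. This is a hypothetical sketch: the `Candidate` record and its field names are illustrative rather than the study's database schema, and it assumes the age bounds of 40 and 85 years are inclusive, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    age: int
    ibd_known_or_suspected: bool = False
    known_polyp_referral: bool = False   # e.g., large polyp referred for excision
    prior_colorectal_surgery: bool = False
    polyposis_syndrome: bool = False
    pregnant: bool = False
    consents_personally: bool = True


def is_eligible(c: Candidate) -> bool:
    """Pre-enrollment inclusion and exclusion criteria (age bounds assumed inclusive)."""
    if not 40 <= c.age <= 85:
        return False
    excluded = (
        c.ibd_known_or_suspected
        or c.known_polyp_referral
        or c.prior_colorectal_surgery
        or c.polyposis_syndrome
        or c.pregnant
        or not c.consents_personally
    )
    return not excluded


def in_per_protocol_set(bbps_total: int, cecum_reached: bool) -> bool:
    """Postprocedure exclusions: total Boston Bowel Preparation Scale score below 6 of 9, or no cecal intubation."""
    return bbps_total >= 6 and cecum_reached
```

Separating the two predicates mirrors the timing in the protocol: eligibility is checked before consent and randomization, whereas bowel preparation and cecal intubation can only be judged after the procedure.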
This was an investigator-initiated study, performed with assistance and funding from Verily Life Sciences LLC. All colonoscopy procedures were performed at a single ambulatory endoscopy facility at the Elisha Medical Center, a private hospital in northern Israel. The colonoscopes used during this study were all Fuji (Fujifilm, Singapore) high-definition 7000 series.
This study was approved by the United Health Services Institutional Review Board for ethical research and was overseen by both the institutional review board and the independent Arazi group clinical research oversight monitor at monthly intervals, ensuring the study met all international standards of good clinical practice. The study was registered with the Israeli Ministry of Health registry of ongoing trials (MYTRIALS; MOH_2021-04-07_009879). Patients were not compensated for participation in the trial.
The protocol called for endoscopists to remove all suspicious lesions in accordance with their standard clinical practice and for each lesion to be sent separately for histology. Primary data collected were physician impression of the size, location, and morphology of each polyp; procedural withdrawal time; number of false alerts during each procedure (both insertion and withdrawal); and final pathologist-determined histology of each resected lesion (input to electronic database once available).
Lesions could be resected during both the insertion and withdrawal phases of the examination, at the endoscopist’s discretion. All suspicious lesions were resected and sent for pathologic evaluation, except for diminutive, obviously hyperplastic lesions in the rectosigmoid, which were managed at the colonoscopist’s discretion in accordance with standard clinical practice.
Randomization was achieved by alternating weeks: 1 week with the use of DEEP2 and the next week with standard of care (WLE without DEEP2). Patients were informed of their assigned study arm (intervention or control) both in advance and on the day of the procedure. Neither the patients nor the physicians were blinded to the intervention.
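Alternating-week allocation makes the study arm a deterministic function of the procedure date. The sketch below is illustrative: `assign_arm` is a hypothetical helper, and it assumes a start date and that the first week used DEEP2 (the text gives neither the exact start date nor how weeks were delimited, only that enrollment began in March 2021).

```python
from datetime import date


def assign_arm(procedure_date: date, start_date: date) -> str:
    """Alternating-week allocation: even-numbered weeks use DEEP2, odd-numbered weeks use WLE alone.

    Week 0 begins on start_date; both the start date and the DEEP2-first order are assumptions.
    """
    if procedure_date < start_date:
        raise ValueError("procedure date precedes the study start date")
    week_index = (procedure_date - start_date).days // 7
    return "DEEP2" if week_index % 2 == 0 else "WLE"


start = date(2021, 3, 1)  # hypothetical start date within March 2021
for d in (date(2021, 3, 1), date(2021, 3, 8), date(2021, 3, 15)):
    print(d.isoformat(), assign_arm(d, start))
```

Because the arm can be computed from the calendar alone, this scheme is consistent with the text's statement that patients were told their arm before the day of the procedure.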
Statistical analyses were performed independently by the statistics laboratory of the University of Haifa. Lesion-level and patient-level occurrences in both study arms were summarized using standard colonoscopy quality metrics: ADR and polyp detection rate were defined as the percentage of colonoscopies in which at least 1 adenoma or polyp, respectively, was detected. At the polyp level, differences between arms were summarized using incidence rate ratios for APCs and polyps per colonoscopy. Incidence rate ratios were calculated using unconditional maximum likelihood estimation (Wald); confidence intervals and statistical tests used the normal approximation (Wald intervals and χ2 test). For ADR and polyp detection rate, the null hypothesis was that these percentages were equal across arms. For incidence rate ratios, the null hypothesis was that the ratio equals 1. P < .05 was considered statistically significant and was not adjusted for multiple testing. Resected lesions with missing pathology values were imputed as negative for pathology (ie, normal tissue) and therefore were not counted toward adenomatous or hyperplastic polyp totals.
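For per-colonoscopy counts such as APCs, one standard Wald form of the incidence rate ratio treats each arm's total count as Poisson with the number of colonoscopies as exposure, so that IRR = (a/N1)/(b/N2) and SE(ln IRR) = sqrt(1/a + 1/b). The sketch below implements that textbook estimator; it is not the statistics laboratory's code, `incidence_rate_ratio` is a hypothetical helper, and the example counts are toy values, not study data.

```python
import math


def incidence_rate_ratio(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Wald incidence rate ratio for event counts per colonoscopy (e.g., adenomas).

    IRR = (events_a / n_a) / (events_b / n_b); SE(ln IRR) = sqrt(1/events_a + 1/events_b).
    Returns (irr, ci_low, ci_high, two_sided_p) for H0: IRR = 1.
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("the Wald IRR is undefined when either arm has zero events")
    irr = (events_a / n_a) / (events_b / n_b)
    log_irr = math.log(irr)
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    # Interval built on the log scale, then exponentiated back to the ratio scale.
    ci_low = math.exp(log_irr - z * se_log)
    ci_high = math.exp(log_irr + z * se_log)
    # Two-sided normal-approximation P value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(log_irr / se_log) / math.sqrt(2))
    return irr, ci_low, ci_high, p_value


# Toy counts (NOT study data): 20 adenomas in 100 colonoscopies vs 10 in 100.
irr, lo, hi, p = incidence_rate_ratio(20, 100, 10, 100)
print(f"IRR {irr:.2f} (95% CI {lo:.2f} to {hi:.2f}), P = {p:.3f}")
```

Working on the log scale keeps the interval positive and asymmetric around the point estimate, which is the usual choice for ratio measures.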
Results
Data were collected from 1033 consecutive eligible patients who consented to participate in the trial. Three patients who otherwise met the eligibility criteria declined participation and were not enrolled. Per-protocol postprocedure exclusions were failed cecal intubation (n = 9), inadequate bowel preparation or cleansing (<6/9 on the Boston Bowel Preparation Scale, n = 95), and procedures performed entirely using LCI or with mechanical augmentation devices (n = 255). We therefore analyzed data from 674 participants randomized to WLE with or without DEEP2 (mean age, 61.0 ± 9.95 years with DEEP2 vs 60.8 ± 9.79 years without DEEP2; 317 men [47%] and 357 women). Enrolled participants underwent screening (32%), postpolypectomy surveillance (35%), or diagnostic (31%) colonoscopy from March through December 2021.
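The participant flow above can be checked with simple arithmetic. All counts below come from the text (330 and 344 are the per-arm totals given in the table headers); the three exclusions sum exactly to the gap between enrolled and analyzed patients, which is consistent with each excluded patient being counted under a single reason.

```python
# Participant flow as reported in the Results.
enrolled = 1033
exclusions = {
    "failed cecal intubation": 9,
    "inadequate bowel preparation (BBPS < 6/9)": 95,
    "entirely LCI or mechanical augmentation": 255,
}
analyzed = enrolled - sum(exclusions.values())
with_deep2, without_deep2 = 330, 344  # per-arm sizes from the table headers

assert analyzed == with_deep2 + without_deep2
print(analyzed)                      # 674
print(f"men: {317 / analyzed:.0%}")  # men: 47%
```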
Mean Boston Bowel Preparation Scale scores were similar in the intervention (DEEP2) arm (7.51 ± 1.07) and the control arm (7.52 ± .09; not significant). Average withdrawal times (excluding time spent resecting and washing) were also similar (7.2 ± 3.6 minutes with DEEP2 vs 6.6 ± 3.4 minutes without DEEP2). No procedural adverse events were reported in the control arm or in connection with the use of DEEP2.
Five hundred thirty-five suspected lesions were resected. Endoscopists estimated lesion size during the procedure, before resection. Each resected lesion received a histopathologic diagnosis of adenomatous, hyperplastic, cancerous, or “other” from the pathologist. Table 1 shows the complete listing of polyp histology by size.
Table 1. Lesion size and histology with and without DEEP2
For the 674 colonoscopies analyzed using WLE, the overall ADR was 37% in the intervention arm (122/330) compared with 27% in the control arm without DEEP2 (93/344) (P = .006) (Tables 2 and 3). Likewise, APCs were significantly greater in the intervention (DEEP2) arm than in the control arm (.62 vs .39, respectively; P < .001) (Table 4). By polyp size, APCs differed significantly between the DEEP2 and control groups for small polyps (≤5 mm, P < .001) and, more noteworthy, for medium polyps (6-9 mm, P = .021), but not for large polyps (≥10 mm, P = .271) (Table 4).
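The ADR comparison can be reproduced from the reported counts. This sketch computes the difference in proportions with an unpooled Wald confidence interval and a two-sided test based on the pooled standard error, whose squared z statistic equals the uncorrected Pearson χ2 (df = 1). These are textbook formulas that fit the statistical methods as described; that they are exactly what the statistics laboratory ran is an assumption, and `compare_proportions` is a hypothetical helper. Applied to 122/330 versus 93/344, the P value is about .0057, matching the reported P = .006, and applied to the proximal-colon counts in Table 2 (71/330 vs 43/344) the interval is about 3.4 to 14.7 percentage points.

```python
import math


def compare_proportions(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Difference in proportions (arm 1 minus arm 2) with a Wald CI and a chi-square (df = 1) P value.

    Returns (difference, ci_low, ci_high, two_sided_p), with proportions as fractions.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error for the Wald confidence interval.
    se_wald = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Pooled standard error under H0 (equal proportions); z**2 is the Pearson chi-square.
    pooled = (x1 + x2) / (n1 + n2)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    p_value = math.erfc(abs(diff / se_null) / math.sqrt(2))
    return diff, diff - z * se_wald, diff + z * se_wald, p_value


# Overall ADR: 122/330 colonoscopies with DEEP2 vs 93/344 without.
diff, lo, hi, p = compare_proportions(122, 330, 93, 344)
print(f"difference {diff:.1%} (95% CI {lo:.1%} to {hi:.1%}), P = {p:.4f}")

# Proximal colon counts from Table 2: 71/330 vs 43/344.
prox = compare_proportions(71, 330, 43, 344)
print(f"proximal: {prox[0]:.1%} ({prox[1]:.1%} to {prox[2]:.1%}), P = {prox[3]:.4f}")
```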
Table 2. ADR with and without DEEP2 analyzed by location and size

| ADR | With DEEP2 (n = 330) | Without DEEP2 (n = 344) | Difference (%) (95% confidence interval), P value; χ2 (df = 1, n = 674) |
|---|---|---|---|
| Location | | | |
| Rectum | 3.9 (13) | 3.5 (12) | .5 (–2.4 to 3.3), .757 |
| Sigmoid colon | 14.5 (48) | 9.0 (31) | 5.5 (.7 to 10.4), .026 |
| Descending colon | 6.7 (22) | 5.5 (19) | 1.1 (–2.5 to 4.8), .535 |
| Transverse colon | 8.8 (29) | 5.2 (18) | 3.6 (–.3 to 7.4), .070 |
| Ascending colon | 12.4 (41) | 7.3 (25) | 5.2 (.7 to 9.7), .024 |
| Cecum | 4.2 (14) | 1.7 (6) | 2.5 (–.1 to 5.1), .056 |
| Proximal colon | 21.5 (71) | 12.5 (43) | 9.0 (3.4 to 14.7), .002 |
| Distal colon | 23.3 (77) | 17.4 (60) | 5.9 (–.2 to 12.0), .057 |
| Size | | | |
| Small | 23.9 (79) | 16.6 (57) | 7.4 (1.3 to 13.4), .017 |
| Medium | 14.5 (48) | 10.2 (35) | 4.4 (–.6 to 9.3), .084 |
| Large | 5.8 (19) | 3.8 (13) | 2.0 (–1.2 to 5.2), .227 |

Values are % (no. of colonoscopies with at least 1 adenoma) unless otherwise defined.
ADR, Adenoma detection rate; DEEP2, DEtection of Elusive Polyps.
By location, the greatest differences in ADRs and APCs between the DEEP2 group and the control group were in the proximal colon (Table 5). In the right-sided colon, the APC was .21 in the DEEP2 group versus .12 in the control group (P = .007) (Table 5). Differences were less pronounced in other segments of the colon.
Table 5. Adenomas per colonoscopy with and without DEEP2 analyzed by location and size

| | With DEEP2 (n = 330) | Without DEEP2 (n = 344) | Incidence rate ratio (95% confidence interval), P value |
|---|---|---|---|
In the DEEP2 arm only, the mean number of false alerts per colonoscopy was 4, with a minimum and maximum number of false alerts per examination reported as 0 and 8, respectively. Of note, the gastroenterologists reported these false alerts in real time during both insertion and withdrawal phases.
Discussion
In this large, single-center, randomized controlled study conducted in a community setting in Israel, the DEEP2 AI-enabled polyp detection system performed as expected based on the results of the previously published clinical feasibility study.
Our current exploratory clinical trial demonstrated statistically and clinically significant improvements in both ADRs and APCs, including in the proximal colon.
Compared with WLE alone, the use of DEEP2 increased ADR from 27% to 37% and APCs from .39 to .62; both improvements were significant (P = .006 and P < .001, respectively). Across the range of AI-enabled polyp detection systems studied in recent meta-analyses, DEEP2 performs at the higher end of the efficacy spectrum, particularly with regard to its impact on ADR.
This positive effect on ADR was also pronounced and statistically significant in the right-sided colon, which has historically been the relative Achilles heel of most endoscopists.
with randomization by indication and a tandem study design, if possible, to further assess the impact of DEEP2 on adenoma miss rate. For future studies, we hypothesize that mechanical colonoscope caps or Endocuffs (Olympus, Tokyo, Japan) may be most beneficial in the narrower parts of the colon, specifically in the left-sided and distal segments of the colon, because mechanical devices assist with spreading folds for increased visibility.
It is plausible that by combining AI-enabled polyp detection systems, with their potential relative strength in the right-sided colon, and mechanical augmentation, with its strength in the left-sided colon, endoscopists might achieve a more optimized pancolonic inspection, as has been shown with longer withdrawal times.
Finally, we recommend future investigation of whether a version of DEEP2 trained and validated with LCI, narrow-band imaging, or other electronic chromoendoscopy could further enhance the overall effectiveness of colonoscopy for cancer screening and prevention.
Feedback from the endoscopy nursing staff was positive regarding what they perceived as improvements to the colonoscopy procedure. All participating nurses reported that the AI system may help focus the physicians’ actions, for instance by encouraging a slower pace so that physicians could follow the system cues and by prompting their attention to areas of interest where the system had located a polyp.
More specifically, in a 5-point survey administered retrospectively to the nursing staff asking how helpful the device was during the procedures (1 = annoying, 2 = neutral, 3 = good, 4 = very good, 5 = excellent), 3 nurses responded “excellent” and 2 responded “very good.” Likewise, of 7 physicians, 4 reported “excellent” and 3 reported “very good.”
In qualitative responses, nurses indicated that the use of this AI system may improve the effectiveness of the colonoscopy procedure and ultimately have a positive impact on the overall management of patients. Participating physicians likewise gave positive qualitative feedback on the AI system. All 7 physicians and all 5 nurses expressed a desire to continue using the DEEP2 system after the study if and when it becomes commercially available.
Limitations of this study include the single-center design and the single geographic region from which participants were drawn. This narrow population sampling and limited heterogeneity may reduce the generalizability and reproducibility of our findings relative to similar-sized studies. The average age of participants in this study was similar to that in other comparable reports. However, colorectal cancer rates vary regionally and may differ between Israel and other countries. Another limitation may be the relatively low racial diversity of the study population. Although race and ethnicity were not prospectively documented in this study, we presume that >95% of the participants were white, which may also contribute to differences in ADR and polyp detection rate between this study and similarly sized studies of racially mixed populations.
With the growing body of evidence in support of AI-enabled colonoscopy, we also believe the increasing use of these systems to augment human behavior could potentially usher in a new era of AI-empowered nurse endoscopists or mid-level providers. Given the reports of physician shortages globally, regions of the world could greatly benefit from well-trained advanced practice providers aided by AI to potentially achieve acceptable ADRs.
On the whole, we support the continued advancement of these AI systems to accommodate LCI, narrow-band imaging, and blue-light imaging, which will undoubtedly require new training and validation sets. Although there may yet be a long road to optimizing AI in gastroenterology, studies like this one suggest that we will soon have highly safe and effective tools as a strong foundation on which to build. AI assistance has staying power, and we believe these tools will become increasingly refined and user-friendly as physicians partner with industry to build these platforms collaboratively. Such partnership will help ensure that high-tech solutions stay clinically relevant and remain devoted to serving patients and saving more lives.
In conclusion, the DEEP2 AI polyp detection system demonstrated significant improvements in both ADRs and APCs, including in the proximal colon, with very few false alerts per procedure. We believe the relatively low false alert rate was a primary driver of the high overall provider satisfaction scores, compared with other AI-enabled systems previously tested at our ambulatory surgical center.
Acknowledgments
We acknowledge the endoscopists who participated in this study, Badira Makhoul, Stanislav Bezobchuk, Ayalet Partoush-Abu, Eran Zittan, and Raffea Shallaby; Chief Nurse Anna Kobzan and the staff nurses; the statistician Anil Patwardhan; and Shlomi Israelit for their participation, assistance, and teamwork in completing this study.
Disclosure
The following authors disclosed financial relationships: J. Lachter: Consultant for Google Health and Verily Life Sciences LLC. S. C. Schlachter, S. Plowman, R. Goldenberg, N. Aizenberg, E. Rivlin: Employee of Verily Life Sciences LLC. N. Rabani: Full-time employee of Google. All other authors disclosed no financial relationships.
References
Topol E. Deep medicine: how artificial intelligence can make healthcare human again.
Optimizing colorectal cancer screening by race and sex: microsimulation analysis II to inform the American Cancer Society colorectal cancer screening guideline.