Cooperation between artificial intelligence and endoscopists for diagnosing invasion depth of early gastric cancer

Definitions of gastric cancer terminology

Macroscopic type, histological type, and invasion depth of gastric cancer were classified based on the classification of the Japanese Gastric Cancer Association [9]. Mainly, depressed-type early gastric cancer lesions were included in this study. Therefore, the macroscopic type was either type 0-IIc or type 0-III. For histological type, well-differentiated adenocarcinoma, moderately differentiated adenocarcinoma, and papillary adenocarcinoma were considered differentiated-type cancer. Poorly differentiated adenocarcinoma, signet-ring cell carcinoma, and mucinous carcinoma were considered undifferentiated-type cancer. When both were mixed, the lesions were classified into the histological type of the predominant lesion. Depth was classified as intramucosal cancer (M), cancer with submucosal (SM) invasion of < 500 μm (SM1), and cancer with submucosal invasion of ≥ 500 μm (SM2).

Training and test images

A flowchart outlining the study is shown in Supplementary Fig. 1. The Ethics Committees of Yamaguchi University Hospital and Hofu Institute of Gastroenterology approved this study. For the training set, we retrospectively reviewed the endoscopic images and histopathological diagnoses at Yamaguchi University Hospital from 2009 to October 2020. We selected cases in which the patient underwent endoscopic resection or surgery for depressed-type early gastric cancer. After excluding cases with poor observational conditions, multiple lesions in one image, or lesions that did not fit within one image, there were 250 training cases each of intramucosal and submucosal cancer. We selected one representative white-light image from each case, giving 500 images for training. For the test set, endoscopic images and histopathological diagnoses collected at the Hofu Institute of Gastroenterology from 2007 to January 2017 were reviewed retrospectively. The cases were identified in the same way as those for training. In total, 200 test images were obtained from 100 cases of intramucosal cancer and 100 cases of submucosal cancer. Both the training and test cases were consecutive cases and were not arbitrarily selected. Training and test images were captured using a GIF-H260, GIF-Q260J, GIF-H260Z, GIF-H290, or GIF-H290Z endoscope (Olympus, Tokyo, Japan).

The clinicopathological characteristics of the training cases are shown in Supplementary Table 1. Of the 250 SM cancers, 95 were SM1 cancer and 155 were SM2 cancer. The clinicopathological characteristics of the test cases are shown in Supplementary Table 2. Of the 100 SM cancers, 26 were SM1 cancer, and 74 were SM2 cancer.

Design of an AI classifier to diagnose invasion depth of early gastric cancer

To design the AI classifier for diagnosing invasion depth of early gastric cancer, we used the EfficientNetB1 model pre-trained on ImageNet, a large dataset of more than 14 million images. The EfficientNetB1 model served as the feature extraction layer for the input images. Its original output layer was replaced by a fully connected layer that takes the extracted features and produces two outputs: intramucosal (M) and submucosal (SM) cancers.

Two experienced endoscopists (A.G. and J.N.), both certified members of the Japan Gastroenterological Endoscopy Society, examined all images along with the corresponding macroscopic and histopathological findings and then circled the cancerous area on each image. The images were cropped to a frame bounding the cancerous area and resized to 240 × 240 pixels. The hardware used to construct the AI classifier included an NVIDIA GeForce RTX 3070 graphics processing unit (GPU) and an AMD Ryzen Threadripper 3960X 24-core central processing unit. The classifier was built in a Python 3.8.5 and TensorFlow 2.4.1 environment. The classifier was trained with a batch size of 32. Epoch counts between 15 and 25 were evaluated, and 15 epochs, which gave the best evaluation results, was adopted. The fully connected layer was optimized with the Adam optimizer at a learning rate of 0.001.
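The architecture and training settings described above can be sketched in Keras as follows. This is a minimal sketch, not the authors' code: the input size, two-class output, batch size, epoch count, optimizer, and learning rate come from the text, whereas freezing the backbone, using global average pooling, the loss function, and the names `CONFIG` and `build_classifier` are assumptions made for illustration.

```python
# Settings stated in the text. Everything else in this sketch is an assumption.
CONFIG = {
    "input_size": 240,       # images resized to 240 x 240 pixels
    "n_classes": 2,          # M and SM
    "batch_size": 32,
    "epochs": 15,
    "learning_rate": 0.001,
}


def build_classifier(config=CONFIG, weights="imagenet"):
    """Build an EfficientNetB1 feature extractor with a two-class dense head."""
    # Imported here so the settings above can be inspected without TensorFlow.
    import tensorflow as tf

    size = config["input_size"]
    base = tf.keras.applications.EfficientNetB1(
        include_top=False,            # drop the original ImageNet output layer
        weights=weights,              # ImageNet pre-trained weights
        input_shape=(size, size, 3),
        pooling="avg",                # assumption: global average pooling
    )
    base.trainable = False            # assumption: only the new head is trained

    # Dense (fully connected) layer with softmax over the two classes.
    outputs = tf.keras.layers.Dense(
        config["n_classes"], activation="softmax"
    )(base.output)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=config["learning_rate"]),
        loss="sparse_categorical_crossentropy",  # assumption: integer labels 0/1
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call such as `model.fit(images, labels, batch_size=CONFIG["batch_size"], epochs=CONFIG["epochs"])`, where `images` and `labels` stand for the cropped training images and their M/SM labels.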

Internal evaluation of diagnostic ability of the AI classifier for training images

The diagnostic ability of the AI classifier to differentiate intramucosal and submucosal cancers was evaluated by the leave-one-out method [10] with the 500 training images. In this method, the 500 training images were divided into 499 training images and one pseudo-test image. We used the 499 training images to train the AI classifier by deep learning and then diagnosed the one pseudo-test image. Because the pseudo-test image is never used for training, the risk of an optimistic bias due to over-fitting to the training images is reduced. This step was repeated 500 times so that each of the 500 training images was selected once as the pseudo-test image, and the diagnostic ability of the AI classifier was thereby evaluated internally. Submucosal cancer was defined as positive, and the accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 measure were calculated. The F1 measure is the harmonic mean of sensitivity and PPV, expressed as 2 × sensitivity × PPV/(sensitivity + PPV). We used the F1 measure as a benchmark for making a balanced diagnosis of M and SM cancers.
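The evaluation procedure above can be illustrated with a short sketch. It shows how leave-one-out splits are generated and how the six measures follow from the confusion counts with SM as the positive class. The function names and the toy labels are hypothetical and serve only as illustration; no study data are used.

```python
def leave_one_out_splits(n_images):
    """Yield (training indices, held-out index) once for every image."""
    for held_out in range(n_images):
        train = [i for i in range(n_images) if i != held_out]
        yield train, held_out


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(y_true, y_pred, positive="SM"):
    """Compute accuracy, sensitivity, specificity, PPV, NPV, and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    sensitivity = _ratio(tp, tp + fn)
    ppv = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": _ratio(tn, tn + fn),
        # Harmonic mean of sensitivity and PPV, as defined in the text.
        "f1": _ratio(2 * sensitivity * ppv, sensitivity + ppv),
    }


# Toy labels (invented for illustration only).
truth = ["SM", "SM", "SM", "M", "M", "M", "M", "SM"]
predicted = ["SM", "SM", "M", "M", "M", "SM", "M", "SM"]
metrics = diagnostic_metrics(truth, predicted)
```

In a full leave-one-out run, each `(train, held_out)` pair from `leave_one_out_splits(500)` would be used to train a fresh classifier and predict the single held-out image, and the 500 collected predictions would then be passed to `diagnostic_metrics`.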

The softmax function was used to output a continuous value from 0 to 1 as the diagnostic probability of classification as intramucosal (M) or submucosal (SM) cancer. A diagnostic probability of the AI classifier exceeding 75% was defined as high confidence and 51–75% as low confidence, and diagnostic ability was compared between the high-confidence and low-confidence groups.
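A minimal sketch of this confidence rule is given below. The 75% threshold is taken from the text. The function names and the example logits are hypothetical, and the text does not say how a probability of exactly 50% would be handled, so this sketch simply labels it low confidence.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)                      # subtract the max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_confidence(logits, labels=("M", "SM"), high_threshold=0.75):
    """Return (label, probability, confidence) for one image."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    probability = probs[best]
    # High confidence only when the probability exceeds 75%.
    confidence = "high" if probability > high_threshold else "low"
    return labels[best], probability, confidence


# Example logits (invented): the second class clearly dominates.
label, probability, confidence = classify_with_confidence([0.0, 2.0])
```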

Diagnoses by individual endoscopists

Eight endoscopists unaware of the pathology results of the 500 training images were asked to judge whether the invasion depth of gastric cancer was intramucosal or submucosal. We mainly use white-light imaging to diagnose the invasion depth of gastric cancer. For mainly depressed lesions, M cancer is characterized by a flat depressed base, and the tip of the converging fold narrows irregularly or is abruptly interrupted. In contrast, SM cancer has an irregular depressed base, and the tip of the converging fold is enlarged [11]. Nagahama et al. also reported that lesions positive for the non-extension sign were classified as SM2 cancers, whereas those negative for the non-extension sign were classified as M-SM1 cancers [12]. The endoscopists diagnosed the invasion depth of gastric cancer based on these previously published findings. The diagnoses were made by four experts in esophagogastroduodenoscopy (A–D) with over 10 years (10–15 years) of experience and four endoscopists (E–H) with less than 5 years (3–5 years) of experience. Submucosal cancer was defined as positive, and the accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 measure were calculated for each endoscopist.

To combine the diagnoses of individual endoscopists into one decision, we adopted a majority vote for simplicity. It is well known in pattern recognition that markers, also called features (here, specific attributes of the patient used for diagnosis), cannot be selected on the basis of their individual effectiveness [13]. In this study, both the AI classifier and the endoscopists are considered markers. Thus, all combinations of markers should be considered [14]. Treating the endoscopists as markers, we studied every combination of 3 out of the 4 expert endoscopists. For each combination, the diagnosis was determined by a majority vote of the 3 expert endoscopists, and the combination with the highest F1 measure was selected. Unanimous voting by all 3 endoscopists was defined as high confidence and any other result as low confidence. We compared diagnostic ability between the high-confidence and low-confidence groups.
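The selection of the best three-expert combination can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the readings for endoscopists A to D are invented toy values, not study data.

```python
from itertools import combinations


def f1_for_sm(y_true, y_pred):
    """F1 with SM as positive; equals the harmonic mean of sensitivity and PPV."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == "SM" and p == "SM" for t, p in pairs)
    fp = sum(t == "M" and p == "SM" for t, p in pairs)
    fn = sum(t == "SM" and p == "M" for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def majority_vote(votes):
    """Return (label, confidence); a unanimous vote counts as high confidence."""
    sm_votes = sum(v == "SM" for v in votes)
    label = "SM" if 2 * sm_votes > len(votes) else "M"
    confidence = "high" if sm_votes in (0, len(votes)) else "low"
    return label, confidence


def best_trio(readings, y_true):
    """Try every 3-reader combination and keep the one with the highest F1."""
    best = None
    for trio in combinations(sorted(readings), 3):
        voted = [
            majority_vote([readings[reader][i] for reader in trio])[0]
            for i in range(len(y_true))
        ]
        score = f1_for_sm(y_true, voted)
        if best is None or score > best[1]:
            best = (trio, score)
    return best


# Toy readings for four images (invented for illustration only).
truth = ["SM", "SM", "M", "M"]
readings = {
    "A": ["SM", "SM", "M", "M"],
    "B": ["SM", "M", "M", "M"],
    "C": ["SM", "SM", "M", "SM"],
    "D": ["M", "M", "SM", "SM"],
}
trio, score = best_trio(readings, truth)
```

With four experts there are only four possible trios, so an exhaustive search like this is cheap.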

Cooperation between the AI classifier and endoscopists

To explore how endoscopists can utilize the diagnostic support of the AI classifier in the clinical setting, we devised a diagnostic method based on cooperation between the AI classifier and the endoscopists, as shown in Fig. 1. If the diagnoses of AI and the endoscopists were consistent, the diagnosis was considered final (indicated by the blue cells in Fig. 1). If the diagnoses of AI and the endoscopists differed and their confidence levels also differed, the diagnosis with the higher confidence level was adopted (indicated by the yellow cells in Fig. 1). If the diagnoses of AI and the endoscopists differed at the same confidence level (indicated by the pink cells in Fig. 1), the following four patterns were examined. Pattern I: the AI diagnosis was adopted for both mismatch 1 and 2. Pattern II: the endoscopists’ diagnosis was adopted for mismatch 1, where the diagnosis by AI is SM and by the endoscopists is M, and the AI diagnosis was adopted for mismatch 2, where the diagnosis by AI is M and by the endoscopists is SM. Pattern III: the AI diagnosis was adopted for mismatch 1, and the endoscopists’ diagnosis was adopted for mismatch 2. Pattern IV: the endoscopists’ diagnosis was adopted for both mismatch 1 and 2. The pattern with the best F1 measure on the training images was used as the final diagnosis for cooperation between the AI classifier and the endoscopists in this study.

Fig. 1

Diagnostic method of determining invasion depth by cooperation between AI and the endoscopists. If the diagnosis by AI and the endoscopists was consistent, the diagnosis was considered final (indicated by the blue cells). If the diagnosis by AI and the endoscopists differed, the diagnosis with the higher confidence level was adopted (indicated by the yellow cells). Diagnoses by AI and the endoscopists that did not agree at the same confidence level are considered mismatches (indicated by the pink cells). M, intramucosal cancer; SM, submucosal invasion; AI, artificial intelligence; Mismatch 1, AI diagnosis is SM but that of the endoscopists is M; Mismatch 2, AI diagnosis is M but that of the endoscopists is SM
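The decision rule of Fig. 1 and the four mismatch patterns can be written as a small function. This is a sketch under stated assumptions: the function name, the tuple representation `(label, confidence)`, and the `USE_AI` table are hypothetical, chosen only to mirror the rules described in the text.

```python
# For each pattern, whether the AI diagnosis is adopted for mismatch 1 and 2.
USE_AI = {
    "I": {1: True, 2: True},
    "II": {1: False, 2: True},
    "III": {1: True, 2: False},
    "IV": {1: False, 2: False},
}


def cooperative_diagnosis(ai, endoscopists, pattern="I"):
    """Combine two (label, confidence) diagnoses into one final label.

    label is "M" or "SM"; confidence is "high" or "low".
    """
    ai_label, ai_conf = ai
    endo_label, endo_conf = endoscopists

    # Blue cells: both agree, so the shared diagnosis is final.
    if ai_label == endo_label:
        return ai_label

    # Yellow cells: they disagree, so the more confident side wins.
    if ai_conf != endo_conf:
        return ai_label if ai_conf == "high" else endo_label

    # Pink cells: disagreement at the same confidence level.
    # Mismatch 1: AI says SM, endoscopists say M. Mismatch 2: the reverse.
    mismatch = 1 if ai_label == "SM" else 2
    return ai_label if USE_AI[pattern][mismatch] else endo_label


# Example: same-confidence disagreement resolved under Pattern III.
final = cooperative_diagnosis(("M", "low"), ("SM", "low"), pattern="III")
```

It follows directly from the definitions that Pattern II always resolves a same-confidence mismatch to M and Pattern III always resolves it to SM, whereas Patterns I and IV defer entirely to the AI or to the endoscopists, respectively.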

Diagnosis of test images

We collected 200 test images of cases from another institution, independent of the training cases. The selected endoscopists were also asked to judge whether the invasion depth of gastric cancer was intramucosal or submucosal for the test images, and a majority vote was then taken. The accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 measure were calculated for the AI classifier, the endoscopists, and cooperation between the AI classifier and the endoscopists.

Characteristics of misdiagnosis

We present the cases that the AI classifier misdiagnosed despite a high diagnostic probability of 95% or more, and we summarize their characteristics.

Statistical analysis

The data were statistically analyzed using StatFlex V6 statistical software (Artech Co., Ltd., Osaka, Japan). Quantitative variables are presented as median and range, and qualitative variables are presented as frequency and percentage. Comparison of the accuracy between high confidence and low confidence was examined by the chi-square test for 2 × 2 contingency tables.
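For readers who wish to reproduce the comparison outside StatFlex, the chi-square test for a 2 × 2 table can be sketched in plain Python. This is an illustration under assumptions: no continuity correction is applied (the text does not state whether one was used), the function name is hypothetical, and the counts in the example are invented.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    table is [[a, b], [c, d]], for example
    rows = high / low confidence, columns = correct / incorrect diagnoses.
    Returns (chi-square statistic, two-sided p-value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column must contain at least one case")

    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: 90/100 correct at high confidence, 70/100 at low confidence.
chi2, p_value = chi_square_2x2([[90, 10], [70, 30]])
```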
