A Semi-Supervised Learning Framework for Classifying Colorectal Neoplasia Based on the NICE Classification

The Proposed Semi-Supervised Learning Framework (SimCLR)

The proposed framework, which consists of SSL (Fig. 1) and fine-tuning, is presented in Fig. 2.

Fig. 1

The self-supervised learning (SSL) flowchart

Fig. 2

The semi-supervised learning framework (SimCLR). The framework includes two parts: self-supervised learning (SSL) and fine-tuning. The SimCLR-based models were evaluated on the test dataset

Self-Supervised Learning (SSL) (Fig. 1)

The self-supervised learning model consists of three main components: data augmentation, a base encoder, and a projection head.

Data augmentation applies a combination of transformations, including cropping, flipping, and color distortion. The augmentation module randomly transforms any given image, producing correlated views of that image, e.g., two views denoted pg and ph, which constitute a positive pair. The module also yields uncorrelated views from different images, e.g., ph vs. ni and ph vs. nk, which form negative pairs [14].
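As a concrete illustration, the augmentation step can be sketched in NumPy (a simplified stand-in for the SimCLR augmentation module; the crop size, flip probability, and jitter range here are illustrative, not the study's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, crop_size=200):
    """Randomly crop, flip, and color-distort one image of shape (H, W, 3) in [0, 1]."""
    h, w, _ = image.shape
    # Random crop
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    view = image[top:top + crop_size, left:left + crop_size]
    # Random horizontal flip
    if rng.random() < 0.5:
        view = view[:, ::-1]
    # Color distortion: per-channel brightness jitter
    view = np.clip(view * rng.uniform(0.8, 1.2, size=3), 0.0, 1.0)
    return view

image = rng.random((224, 224, 3))
p_g, p_h = augment(image), augment(image)   # correlated views -> positive pair
n_i = augment(rng.random((224, 224, 3)))    # view of a different image -> negative
```

Two calls on the same image give the correlated views (a positive pair), while a view of a different image supplies a negative.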

A base encoder extracts representation vectors from the augmented data examples. The framework imposes no constraints on the choice of network architecture.

The projection head maps the representations to the space in which the contrastive loss is applied.
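The contrastive objective SimCLR applies in that space (the NT-Xent loss) can be sketched for a single positive pair as follows; this is a NumPy illustration with an assumed temperature, not the study's training code:

```python
import numpy as np

def nt_xent(z_i, z_j, negatives, tau=0.1):
    """NT-Xent loss for one positive pair (z_i, z_j) against a set of negatives.

    z_i, z_j: projections of two views of the same image.
    negatives: array of shape (N, d), projections of views from other images.
    tau: temperature (assumed value here).
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(cos(z_i, z_j) / tau)
    neg = sum(np.exp(cos(z_i, n) / tau) for n in negatives)
    # Loss is small when the positive pair dominates the similarity mass
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(1)
z = rng.standard_normal(128)
loss_aligned = nt_xent(z, z + 0.01 * rng.standard_normal(128),
                       rng.standard_normal((10, 128)))
loss_random = nt_xent(z, rng.standard_normal(128),
                      rng.standard_normal((10, 128)))
```

Nearly identical views (a true positive pair) yield a lower loss than an unrelated pair, which is what drives the encoder to pull correlated views together.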

In our study, unlabelled NBI endoscopic images of colorectal neoplasia from PolypsSet [15] were used to train the SSL model. VGG16, MobileNet, ResNet50, and Xception (initially trained on ImageNet) were loaded as the backbones of the base encoders. We manually removed the heads and added three fully connected layers (1 × 2048, 1 × 256, and 1 × 128) to the above backbones. In the SSL procedure, the last three convolutional layers and the three added fully connected layers were trainable, while the other layers of the backbones were nontrainable.
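A minimal Keras sketch of one such base encoder with the added projection layers is shown below. ResNet50 is built with weights=None here to avoid downloading pretrained weights (the study loaded ImageNet weights), and the global-average-pooling step between the backbone and the dense layers is an assumption:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

# Backbone without its classification head (study: ImageNet weights)
backbone = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))

# Freeze everything except the last three convolutional layers
conv_layers = [l for l in backbone.layers if isinstance(l, layers.Conv2D)]
for l in backbone.layers:
    l.trainable = False
for l in conv_layers[-3:]:
    l.trainable = True

# Added fully connected layers: 1 x 2048, 1 x 256, 1 x 128 (projection)
x = layers.GlobalAveragePooling2D()(backbone.output)
x = layers.Dense(2048, activation="relu")(x)
x = layers.Dense(256, activation="relu")(x)
z = layers.Dense(128)(x)
encoder = Model(backbone.input, z)
```

The 128-dimensional output is the projection on which the contrastive loss is computed.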

Fine-Tuning (Fig. 2)

After the initial training with the SSL model, the semi-supervised model was fine-tuned with the labelled NBI images of colorectal neoplasia from the PICCOLO dataset. Two extra fully connected layers (1 × 64 and 1 × 16) and a classifier were added. In the fine-tuning procedure, only the two newly added layers and the classifier were trainable for the target classification task, while the other layers were nontrainable.
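The fine-tuning head can be sketched as follows. The layer types, the 128-dimensional input (the SSL projection output), and the 3-class softmax are assumptions; the study specifies only the layer sizes (1 × 64 and 1 × 16) and a classifier:

```python
from tensorflow.keras import layers, Model, Input

# Assumed input: the 128-dimensional projection of the SSL-trained model
inp = Input(shape=(128,))
x = layers.Dense(64, activation="relu")(inp)    # 1 x 64
x = layers.Dense(16, activation="relu")(x)      # 1 x 16
out = layers.Dense(3, activation="softmax")(x)  # NICE I/II/III classifier
head = Model(inp, out)
print(head.output_shape)  # (None, 3)
```

Only these newly added layers would be trainable during fine-tuning; the SSL-trained encoder beneath them stays frozen.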

Supervised Transfer Learning (Fig. 3)

Fig. 3

The supervised transfer-learning flowchart

Supervised transfer learning was performed on the same dataset used for fine-tuning, i.e., the PICCOLO dataset. As in the SSL setting, the four networks pretrained on ImageNet were loaded. Similarly, five fully connected layers (1 × 2048, 1 × 256, 1 × 128, 1 × 64, and 1 × 16) and a classifier were added to the backbones without their heads. In the supervised transfer learning procedure, the last three convolutional layers, the five added fully connected layers, and the classifier were trainable, while the other layers were nontrainable.

Model Training

The models were trained with Keras (TensorFlow backend, version 2.8.0) under Python 3.8.0. Each image was resized to 224 × 224 pixels and input into the models as RGB channels. The training parameters are listed in Supplementary Table 2. The training code for SSL was inspired by that of Sayak Paul, which is available at https://github.com/sayakpaul/SimCLR-in-TensorFlow-2. Our training code is available at https://osf.io/t3g8n.

Datasets

PolypsSet

Li et al. [15] combined several publicly available endoscopic datasets with a new dataset from the University of Kansas Medical Center to build a relatively large endoscopic dataset for polyp detection and classification (https://doi.org/10.7910/DVN/FCBUOR). The publicly available portion includes 155 colorectal video sequences with 37,888 frames from the MICCAI 2017, CVC colon DB, and GLRC datasets [16]. NBI images were collected from this dataset to train the SSL model. To prevent duplication and ensure image quality, three endoscopists from Soochow University, each with more than 10 years of experience, reviewed the images and finally selected 2000 unlabelled NBI images.

PICCOLO

This dataset contains 3433 images from clinical colonoscopy videos of 40 human patients, including 2131 white-light images and 1302 NBI images (https://www.biobancovasco.bioef.eus/en/Sample-and-data-catalog/Databases/PD178-PICCOLO-EN.html, Basque Biobank: https://labur.eus/EzJUN) [17]. To prevent duplication and ensure image quality, the three endoscopists mentioned above reviewed and labelled 551 eligible NBI images according to the NICE classification (NICE I, n = 219; NICE II, n = 221; NICE III, n = 111). These 551 labelled endoscopic images were used to fine-tune the semi-supervised model. Detailed information on the two public datasets is presented in Supplementary Table 3.

Soochow University/Shanghai Jiao Tong University Dataset

A total of 1432 NBI images of colorectal neoplasia were collected from the First Affiliated Hospital of Soochow University and Kowloon Hospital of Shanghai Jiao Tong University. Three senior endoscopists independently reviewed and labelled 358 eligible images based on the NICE classification (NICE I, n = 126; NICE II, n = 109; NICE III, n = 123). The procedure for endoscopist review and labelling is shown in Supplementary Fig. 1. The characteristics of the colorectal neoplasia are listed in Supplementary Table 4. This dataset served as the external test dataset. The study was approved by the ethics committee of the First Affiliated Hospital of Soochow University (approval number 2022098).

Human Endoscopists

To further evaluate the performance of the models, the images in the test dataset (Soochow University/Shanghai Jiao Tong University) were also classified by two independent endoscopists (a junior endoscopist with 3 years of endoscopic experience and a senior endoscopist with more than 10 years of experience). Neither had participated in reviewing or labelling the training images, and both were blinded to the test-set labels. They were given the 551 labelled images as a reference and then classified the 358 test images independently. Moreover, to simulate a real clinical environment, the two endoscopists were asked to complete the classification within 1 h. A custom web interface allowed the reviewers to window, zoom, manipulate, and categorize each image.

Statistical Analysis

A confusion matrix was constructed and used to evaluate the performances of the models and endoscopists. TN, FN, TP, and FP indicate true negatives, false negatives, true positives, and false positives, respectively.

Accuracy is the proportion of correctly classified samples among all samples.

The Matthews correlation coefficient (MCC) [18] measures the agreement between actual and predicted classes and is regarded as one of the most informative single-value metrics for summarizing a confusion matrix.

Cohen’s kappa [19] was used to measure the level of agreement between two raters who each classified items into mutually exclusive categories.
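All three metrics can be computed directly from a K × K confusion matrix; the sketch below uses the standard multiclass formulas (Gorodkin's generalisation for the MCC) with illustrative example counts, not the study's results:

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy, multiclass MCC, and Cohen's kappa from a K x K confusion
    matrix (rows = actual class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    row, col = cm.sum(axis=1), cm.sum(axis=0)
    # Accuracy: correctly classified samples over all samples
    accuracy = np.trace(cm) / n
    # Multiclass MCC (Gorodkin's generalisation of Matthews' coefficient)
    mcc = (np.trace(cm) * n - row @ col) / np.sqrt(
        (n**2 - col @ col) * (n**2 - row @ row))
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_e = (row @ col) / n**2
    kappa = (accuracy - p_e) / (1 - p_e)
    return accuracy, mcc, kappa

# Hypothetical 3-class (e.g. NICE I/II/III) confusion matrix
cm = [[50, 5, 2], [4, 48, 6], [1, 3, 40]]
acc, mcc, kappa = metrics_from_confusion(cm)
```

For this example matrix the accuracy is about 0.87 while both MCC and kappa are about 0.80, illustrating how the chance-corrected metrics sit below raw accuracy.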

A detailed explanation of MCC and Cohen’s Kappa is presented in the Supplementary Introduction.

Interpretation of Models

t-SNE Analysis

In this study, clustering patterns of predictions generated by models were visualized using t-SNE, an unsupervised technique for reducing the dimensionality of data [20]. By leveraging t-SNE with principal component analysis initialization, the high-dimensional vectors were processed and transformed into a two-dimensional visualization, revealing both the local structure and global geometry.
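A minimal sketch of this step using scikit-learn's TSNE with PCA initialization (the implementation and parameter values are assumptions; the study does not name them, and random features stand in for the models' high-dimensional outputs):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for high-dimensional model outputs (e.g. penultimate-layer features)
features = rng.standard_normal((90, 64))

# PCA initialization preserves more of the global geometry than random init
embedded = TSNE(n_components=2, init="pca", perplexity=15,
                random_state=0).fit_transform(features)
```

Each 64-dimensional vector is mapped to a 2-D point, which can then be scattered and coloured by predicted class to reveal clustering patterns.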

Grad-CAM

To enhance the interpretability of convolutional neural networks, Grad-CAM highlights the regions of an input image that contribute most to a prediction [21], providing insight into how the networks make decisions.
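At its core, Grad-CAM reduces to a gradient-weighted sum of convolutional feature maps followed by a ReLU; the NumPy sketch below shows that core step, with random arrays standing in for the real activations and gradients:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from a conv layer's activations A_k, shape (H, W, K),
    and the gradients of the class score with respect to those activations."""
    # alpha_k: global-average-pool the gradients over spatial positions
    alpha = gradients.mean(axis=(0, 1))                       # shape (K,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence only
    cam = np.maximum((feature_maps * alpha).sum(axis=-1), 0.0)
    if cam.max() > 0:
        cam /= cam.max()                                      # normalise to [0, 1]
    return cam

rng = np.random.default_rng(0)
cam = grad_cam(rng.random((7, 7, 512)), rng.standard_normal((7, 7, 512)))
```

The resulting low-resolution map is typically upsampled to the input size and overlaid on the endoscopic image to show which regions drove the classification.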

A detailed explanation of t-SNE and Grad-CAM is presented in the Supplementary Introduction.
