Diagnostics, Vol. 13, Pages 123: GAR-Net: Guided Attention Residual Network for Polyp Segmentation from Colonoscopy Video Frames

1. Introduction

Colorectal cancer is one of the most common cancers in humans. Its precursor is the formation of polyps in the colon. The malignancy degree assessment of colorectal adenocarcinoma consists of two stages. The first stage involves the detection and delineation of polyps in the colon via colonoscopy examination; here the polyps are identified and delineated by an expert clinician. The second stage involves analysis of biopsy samples from the segmented polyps using the Hematoxylin and Eosin (H&E) staining technique. To reduce the risk of colorectal cancer, the polyps are often analyzed under H&E staining to assess the malignancy degree and are eventually resected [1]. Morphological segmentation of the glands for histopathology is commonly performed by pathologists to determine the stage of the cancer or tumor. Accurate segmentation of the glands is crucial for obtaining reliable morphological statistics for quantitative diagnosis. Hence, it is generally performed by an expert pathologist, who segments and studies the structure of the glands in the biopsy sample. However, before H&E staining can be performed on a biopsy sample, polyps must be detected, segmented, and delineated in colonoscopy video frames. Unfortunately, even with careful perusal of each frame in a colonoscopy video, an expert clinician might miss some polyps [2]. Hence, there is a need for a real-time Computer-Aided Detection (CAD) system that can detect and segment polyps in the first stage itself, which is the focus of this research work. Such a system can assist clinicians with the delineation of polyps in the colon and can reduce the polyp miss rate. Automated polyp segmentation is a challenging task, mainly due to the varied appearance, shape, and size of polyps.
Even though the texture, size, and shape of colorectal polyps change progressively in the later stages, polyps are small and may have no obvious distinguishing texture in the earlier stages. This makes them difficult to differentiate from intestinal tissue. Some polyps might even fill the entire field of view of the colonoscopy camera. Also, each frame is susceptible to image artifacts, shadow patterns, highlights, and even occlusions caused by the illumination used in colon screening. In some types of polyps, there is no obvious boundary between the polyp and the surrounding tissue, and the same polyp may look significantly different depending on the camera angle. So, the reliability of polyp segmentation by manual delineation is greatly affected by the lab’s guidelines and the experience of the clinician. Hence, it is hard to determine a gold standard for an automatic segmentation method that deals with all possible types of polyps efficiently, which increases the difficulty of developing a reliable polyp-segmenting CAD system. The earlier techniques, such as template matching, contour detection, and texture-based analysis, required manual intervention. Many machine learning and computer vision techniques have been applied to the polyp segmentation problem. Ref. [3] studied the application of active contours for the segmentation of polyps. As this approach relies heavily on pre-defined template and shape models, it failed to detect small polyps. Ref. [4] introduced the “depth of valley” concept to detect more general polyp shapes, segmenting the polyps by evaluating the relationship between the detected edges and the pixels. Their region segmentation algorithm could not handle all types of polyps and lacked robustness: it not only fails to identify some small polyps but also incorrectly segments the endoluminal scene as a polyp.
To address the challenges in measuring the segmented colon, different image processing techniques were combined with statistical analysis [5], but this is a time-consuming and tedious process and could not detect new types of polyps. Also, the above approaches do not consider contextual information and are not robust.

Recently, deep learning techniques have proven able to solve many real-world problems with high robustness. Many variants of deep neural network architectures have been reported in the literature for semantic segmentation in various applications, viz., remotely sensed data segmentation [6], road-scene segmentation, indoor scene segmentation [7], and biomedical image segmentation [8]. The study by [9] shows the superior performance of a Fully Convolutional Network (FCN) for semantic segmentation of colonoscopy images, but the network is not able to yield accurate predictions for noisy images. In another work [10], FCNs are employed for the segmentation of polyps along with a probabilistic post-processing algorithm. In this approach, a heuristic threshold was used to differentiate polyps from normal tissue, which is error-prone and could not characterize well all types of polyps, which are irregular in shape and size. Incorporating an attention mechanism in deep neural networks can help the model generate a less noisy and more refined output map. For the diagnosis of coronary artery disease [11], an attention-based vessel segmentation approach has been applied by adding low-level and high-level features. A sparse contour attention mechanism has been applied to obtain accurate region boundaries for liver segmentation in abdominal CT images [12].
They combined sparse contour attention with an auto-context algorithm and applied a self-supervised algorithm to improve segmentation performance.

In this research work, we propose a novel deep end-to-end architecture for segmenting polyps from colon screening frames by employing a modified residual network with a special attention mechanism. In the proposed approach, the low-level semantic information captured in the initial layers is also considered, both to handle different sizes and shapes of polyps and to suppress the noise in the input. A novel Guided Attention mechanism is proposed that allows the model to generate and apply attention maps for each feature map in the input to obtain a refined and accurate segmentation output. We evaluated our model on two datasets, viz., the CVC-ClinicDB polyp dataset and the recent Kvasir-SEG [13] dataset, and achieved state-of-the-art performance over other proposed deep learning models. Our significant contributions in this research work are summarized below:

A novel end-to-end deep learning framework for segmenting polyps from colonoscopy video frames.

A modified and enhanced Residual Block is proposed that suppresses the noise and preserves the low-level feature maps for a more accurate semantic segmentation.

A special learning technique is introduced with a novel attention mechanism for obtaining an accurate segmentation map.

A novel attention mechanism to capture refined attention maps regardless of the size and shape of the polyp, even under improper illumination conditions.

Design of a competitive and robust model with consistent performance on the benchmark CVC-ClinicDB dataset and the Kvasir-SEG dataset.

This paper is organized as follows: related works are discussed in Section 2. The proposed methodology is elaborated in Section 3. The results and discussion of all experiments are presented in Section 4, followed by a conclusion in Section 5.

2. Related Works

Many machine learning and deep learning methods have been proposed in the field of medical image analysis, for tasks including lesion identification in pulmonary nodules, lung nodules, colon cancer, brain lesion segmentation, and polyp segmentation. Ref. [14] presented a detailed survey on image-based cancer detection using various deep learning architectures. Their study outlines the methods suitable for different types of cancer, including breast, lung, skin, prostate, brain, colon, cervical, and bladder cancer. They discussed the lack of the large datasets required for training better models and the available solutions, such as image augmentation and transfer learning. Computer-Aided Detection (CAD) plays a major role in detecting lesions by assisting the radiologist's workflow. Ref. [15] studied the problems in identifying lesions in pulmonary nodules. To address the issue of highly imbalanced data and to reduce false negatives in classification, they proposed a multi-kernel approach. In their work, feature fusion and oversampling were employed to select the important subset of relevant features. Ref. [16] proposed a deep learning-based technique for lung nodule detection on a low-dose thoracic helical CT (LDCT) dataset, exploiting a Convolutional Neural Network (CNN) and a traditional Artificial Neural Network (ANN). They observed that the CNN architecture is better at capturing low-level and high-level features than the ANN. In this research work, we mainly focus on the polyp segmentation problem.
Many solutions have been proposed to automate the segmentation of polyps in colon screening. In many cases, polyps have well-defined shapes and structures, and earlier methods tried to leverage this to perform polyp segmentation. Ref. [17] proposed using the Canny edge detector to process the images and identify relevant edges with the assistance of template matching techniques. Following this, Ref. [3] studied the application of active contours for the segmentation of polyps, but these template-based models are not suitable for detecting small polyps. Many texture-based methods were also introduced as a solution to the polyp segmentation problem. Karkanis et al. [18] used Grey-Level Co-occurrence Matrix (GLCM) and wavelet methods to detect polyps. Ref. [19] proposed an SVM-based method to detect and classify abnormalities in endoscopic images. They mainly focused on feature extraction and developed an algorithm that assigns weights to the relevant features and removes the useless ones from the hand-crafted features extracted from the endoscopic image. With their deep sparse SVM-based approach, they were able to reduce the feature dimension and build a better model for classifying the endoscopic images in their own dataset. In another work, an SVM-based approach with hand-crafted features was applied [19] to detect abnormalities in endoscopic images. As their SVM model could not handle noisy and poor-quality images, they introduced a rejection stage. The image quality was pre-assessed based on its pixels, and the image was fed to the subsequent SVM segmentation stage only if its quality was acceptable. Otherwise, the image was rejected in the pre-processing stage itself, which limits the method's usage. Ref. [20] attempted to characterize polyps with traditional methods such as edge detection, feature extraction, and feature reduction, and then applied an ensemble-based approach for classification.
But these handcrafted features were not accurate for delineating the polyp boundary.

For colonic polyp measurement, Ref. [21] followed a topographical height map approach. They computed topographic features from the generated height maps of the polyp. The concentric patterns from the height maps were then used for texture analysis. By applying an SVM classifier, the normal surface of the colon was differentiated from colonic polyps. They compared experimental measurements with those from the height map approach, which was found to be more efficient than the other methods. Recently, a rapid change has been observed in these tasks as Convolutional Neural Networks (CNN) are being employed to provide more robustness compared to hand-crafted features.

The approaches above are not sufficient for the polyp segmentation task, as they fail to capture the contextual information in the images. To address these problems, the Fully Convolutional Networks (FCN) proposed by [22] were adapted for semantic segmentation. Later, the U-Net [8] architecture was widely used for developing end-to-end models for semantic segmentation. Following these works, Ref. [9] proposed a standard FCN for the segmentation of polyps and used the Random Forest algorithm to reduce false positives. From neural machine translation to sentence classification, applying the attention mechanism has allowed models to focus on important features, resulting in less noisy, more refined feature maps [23]. Recently, Ref. [24] incorporated an attention mechanism, and their study suggests that attention mechanisms can substantially reduce the noise in the output and help the model generate a more refined output map. Hence, it can be concluded that a better attention mechanism can further improve the performance of the models. Ref. [25] tried various methods, from Machine Learning to Deep CNN models, and suggested various methods for classifying Gastrointestinal (GI) tract diseases.

As deep learning methods have proven able to learn robust features for segmentation problems, we have applied them to obtain a robust model for the segmentation of polyps of varied textures, shapes, and sizes. However, recent deep learning approaches to polyp segmentation produce noisy outputs and broken segmentation maps. Hence, the incorporation of a better attention mechanism can help the model generate a less noisy and more refined output map. In this work, we propose a Guided Attention Residual Network (GAR-Net) that employs both residual blocks and attention mechanisms to obtain a refined segmentation map for polyp segmentation.

5. Conclusions

In this paper, we presented GAR-Net: Guided Attention-based Residual Network, an architecture designed to address the need for a more accurate and refined segmentation map for the colorectal polyps found in colonoscopy examinations. The proposed architecture takes advantage of residual blocks and attention mechanisms to output refined segmentation maps. We have modified the residual block by including a convolution layer in the skip connection to suppress the noise and capture a refined low-level feature map. We have proposed a new attention mechanism that successfully captures a refined attention map in both the earlier and the deeper layers of the model. The Guided Attention mechanism proposed for the GAR-Net architecture generates a more refined output map regardless of improper illumination, providing a robust model for segmenting polyps from colonoscopy video frames.
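
The two ideas summarized above, a residual block whose skip connection passes through a convolution and an attention map applied to each feature map, can be illustrated with a small sketch. This is only an illustration under stated assumptions and not the GAR-Net implementation: the 1x1 kernel size, the placement of the ReLU activations, the sigmoid gating, and all function and weight names are hypothetical choices made here, since the layer configuration is not given in this section.

```python
import math

def conv1x1(x, weights):
    """1x1 convolution: mixes channels independently at every pixel.

    x       : list of C_in feature maps, each an H x W list of lists.
    weights : C_out x C_in matrix (fixed here for illustration; learned in practice).
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w * x[c][i][j] for c, w in enumerate(row)) for j in range(width)]
         for i in range(height)]
        for row in weights
    ]

def relu(x):
    """Element-wise ReLU over all feature maps."""
    return [[[max(0.0, v) for v in row] for row in fmap] for fmap in x]

def add(a, b):
    """Element-wise sum of two stacks of feature maps with equal shape."""
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(fa, fb)]
            for fa, fb in zip(a, b)]

def residual_block(x, w_main1, w_main2, w_skip):
    """Residual block whose skip path is a convolution rather than the identity."""
    main = conv1x1(relu(conv1x1(x, w_main1)), w_main2)
    skip = conv1x1(x, w_skip)  # assumption: 1x1 kernel in the skip connection
    return relu(add(main, skip))

def apply_attention(x, w_att):
    """Compute one sigmoid attention map per feature map and apply it element-wise."""
    logits = conv1x1(x, w_att)  # assumption: attention logits from a 1x1 convolution
    return [
        [[v * (1.0 / (1.0 + math.exp(-a))) for v, a in zip(rx, ra)]
         for rx, ra in zip(fx, fa)]
        for fx, fa in zip(x, logits)
    ]

# Tiny example: 2 channels, 1 x 2 pixels.
features = [[[1.0, -2.0]], [[0.5, 3.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

block_out = residual_block(features, identity, identity, identity)
gated = apply_attention(features, zeros)  # zero logits give a uniform 0.5 gate
```

With identity weights the block reduces to `relu(relu(x) + x)`, which makes the contribution of the skip path easy to inspect; in a trained network the skip weights would instead be learned, which is what would allow the skip path to filter noise before the addition.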

Comprehensive experiments were conducted on the benchmark CVC-ClinicDB and Kvasir-SEG datasets to evaluate the proposed model against existing state-of-the-art architectures. The experimental results show that the proposed GAR-Net provides a reliable and robust model with the highest Dice coefficient and mIoU score, outperforming other semantic segmentation models such as FCN8, U-Net, U-Net with Gated Attention, ResUNet, SegNet, and DeepLabv3. The computational overhead of the proposed GAR-Net architecture is slightly high, as we used standard convolution rather than depth-wise separable convolution. We experimented with depth-wise separable convolution and found it quite detrimental, especially to the attention mechanisms. There is further research scope to improve this model by making it lightweight and by incorporating spatial information in Guided Attention Learning. We conclude that the proposed GAR-Net architecture can be considered a strong baseline for further investigation toward a robust and clinically useful method for polyp segmentation from colonoscopy video frames.
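
For reference, the Dice coefficient and mIoU named above can be sketched for a single pair of binary masks as follows. This is a minimal illustration and not the evaluation code used in the paper: the function names, the nested-list mask representation, the smoothing term `eps`, and the choice to average IoU over the polyp and background classes are assumptions made here (mIoU is also sometimes averaged over images instead).

```python
def _flatten(mask):
    """Flatten an H x W nested list of 0/1 values into a single list."""
    return [v for row in mask for v in row]

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |P and T| / (|P| + |T|) for binary masks."""
    p, t = _flatten(pred), _flatten(target)
    intersection = sum(a * b for a, b in zip(p, t))
    return (2.0 * intersection + eps) / (sum(p) + sum(t) + eps)

def iou(pred, target, eps=1e-7):
    """IoU = |P and T| / |P or T| for binary masks."""
    p, t = _flatten(pred), _flatten(target)
    intersection = sum(a * b for a, b in zip(p, t))
    union = sum(p) + sum(t) - intersection
    return (intersection + eps) / (union + eps)

def mean_iou(pred, target):
    """Average of the foreground (polyp) IoU and the background IoU."""
    def invert(mask):
        return [[1 - v for v in row] for row in mask]
    return 0.5 * (iou(pred, target) + iou(invert(pred), invert(target)))

# Tiny example: the prediction covers the true polyp pixel plus one false positive.
predicted = [[1, 1],
             [0, 0]]
ground_truth = [[1, 0],
                [0, 0]]

dice = dice_coefficient(predicted, ground_truth)  # 2*1 / (2 + 1)
miou = mean_iou(predicted, ground_truth)          # (1/2 + 2/3) / 2
```

Dice weights the overlap twice relative to the mask sizes, so for the same prediction it is never lower than IoU; reporting both, as done here, gives a fuller picture of boundary quality.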
