A novel end-to-end deep learning framework for segmenting polyps from colonoscopy video frames.
A modified and enhanced Residual Block is proposed that suppresses the noise and preserves the low-level feature maps for a more accurate semantic segmentation.
A special learning technique is introduced with a novel attention mechanism for obtaining an accurate segmentation map.
A novel attention mechanism that captures refined attention maps regardless of the size and shape of the polyp, even under improper illumination conditions.
A competitive and robust model with consistent performance on the benchmark CVC-ClinicDB and Kvasir-SEG datasets.
This paper is organized as follows: related works are discussed in Section 2, the proposed methodology is elaborated in Section 3, the results and discussion of all experiments are presented in Section 4, and conclusions follow in Section 5.
2. Related Works
Many machine learning and deep learning methods have been proposed for medical image analysis, including lesion identification in pulmonary nodules, lung nodule detection, colon cancer detection, brain lesion segmentation, and polyp segmentation. Ref. [14] presented a detailed survey on image-based cancer detection using various deep learning architectures. Their study outlined the methods suitable for different types of cancer, including breast, lung, skin, prostate, brain, colon, cervical, and bladder cancer. They discussed the lack of the large datasets required for training better models and the available remedies, such as image augmentation and transfer learning. Computer Aided Detection (CAD) plays a major role in detecting lesions by assisting the radiologist's workflow. Ref. [15] studied the problems in identifying lesions in pulmonary nodules. To address highly imbalanced data and to reduce false negatives in classification, they proposed a multi-kernel approach in which feature fusion and oversampling were employed to select the important subset of relevant features. Ref. [16] proposed a deep learning-based technique for lung nodule detection on a low-dose thoracic helical CT (LDCT) dataset, exploiting both a Convolutional Neural Network (CNN) and a traditional Artificial Neural Network (ANN). They observed that the CNN architecture captures low-level and high-level features better than the ANN. In this research work, we mainly focus on the polyp segmentation problem.
Many solutions have been proposed to automate the segmentation of polyps in colon screening. In many cases, polyps have well-defined shapes and structures, and earlier methods tried to leverage this for polyp segmentation. Ref. [17] proposed using the Canny edge detector to process the images and identify relevant edges with the assistance of template matching techniques. Following this, Ref. [3] studied the application of active contours for the segmentation of polyps, but these template-based models are not suitable for detecting small polyps. Many texture-based methods were also introduced as a solution to the polyp segmentation problem. Karkanis et al. [18] used the Grey-Level Co-occurrence Matrix (GLCM) and wavelet methods to detect polyps. Ref. [19] proposed an SVM-based method to detect and classify abnormalities in endoscopic images. They mainly focused on feature extraction and developed an algorithm that assigns weights to the relevant features and removes the useless ones from the hand-crafted features extracted from the endoscopic image. With their deep sparse SVM-based approach, they were able to reduce the feature dimension and build a better model for classifying the endoscopic images in their own dataset. In another work, an SVM-based approach with hand-crafted features was applied [19] to detect abnormalities in endoscopic images. As the SVM model could not handle noisy and poor-quality images, the authors introduced a rejection stage: the image quality was pre-assessed based on its pixels, and only images of acceptable quality were fed to the SVM-based segmentation stage. All other images were rejected in the pre-processing stage itself, which limits the usability of the method. Ref. [20] attempted to characterize polyps with traditional methods such as edge detection, feature extraction, and feature reduction, followed by an ensemble-based approach for classification.
However, these hand-crafted features were not accurate enough for delineating the polyp boundary.
For colonic polyp measurement, Ref. [21] followed a topographical height map approach. They computed topographic features from the generated height maps of the polyp, and the concentric patterns in the height maps were then used for texture analysis. By applying an SVM classifier, the normal surface of the colon was differentiated from colonic polyps. They compared the experimental measurements with those of the height map approach, which was found to be more efficient than the other methods. Recently, a rapid change has been observed in these tasks as Convolutional Neural Networks (CNNs) are being employed to provide more robustness than hand-crafted features.
The approaches above are not sufficient for the polyp segmentation task, as they fail to capture contextual information from the images. To address these problems, the Fully Convolutional Networks (FCN) proposed by [22] were adapted for semantic segmentation. Later, the U-Net [8] architecture was widely used for developing end-to-end models for semantic segmentation. Following these works, Ref. [9] proposed a standard FCN for the segmentation of polyps and used the Random Forest algorithm to reduce false positives. From neural machine translation to sentence classification, applying an attention mechanism has allowed models to focus on important features, resulting in less noisy, more refined feature maps [23]. Recently, Ref. [24] incorporated an attention mechanism, and their study suggests that attention mechanisms can substantially reduce the noise in the output and help the model generate a more refined output map. This suggests that a better attention mechanism can further improve model performance. Ref. [25] tried various methods, from machine learning to deep CNN models, and suggested various methods for classifying Gastrointestinal (GI) tract diseases.
As deep learning methods have proven able to learn robust features for segmentation problems, we apply them to obtain a robust model for the segmentation of polyps of varied textures, shapes, and sizes. However, recent deep learning approaches to polyp segmentation produce noisy outputs and broken segmentation maps. Incorporating a better attention mechanism can help the model generate a less noisy and more refined output map. In this work, we propose a Guided Attention Residual Network (GAR-Net) that employs both residual blocks and attention mechanisms to obtain a refined segmentation map for polyp segmentation.
5. Conclusions
In this paper, we presented GAR-Net: Guided Attention-based Residual Network, an architecture designed to address the need for a more accurate and refined segmentation map of the colorectal polyps found in colonoscopy examinations. The proposed architecture takes advantage of residual blocks and attention mechanisms to output refined segmentation maps. We modified the residual block by including a convolution layer in the skip connection to suppress noise and capture a refined low-level feature map. We proposed a new attention mechanism that successfully captures a refined attention map in both the earlier and the deeper layers of the model. The Guided Attention mechanism proposed for the GAR-Net architecture generates a more refined output map even under improper illumination, providing a robust model for segmenting polyps from colonoscopy video frames.
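The modified residual block described above can be illustrated with a minimal, framework-free sketch. It assumes the skip-path convolution is a 1x1 convolution (the kernel size is not stated here), and it treats the main path as an arbitrary function supplied by the caller. The names `conv1x1`, `residual_block`, and `toy_main_path` are hypothetical and are not taken from the GAR-Net implementation.

```python
from typing import Callable, List

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def conv1x1(x: FeatureMap, weight: List[List[float]]) -> FeatureMap:
    """Mix channels independently at every pixel; weight is [C_out][C_in]."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w * x[c][i][j] for c, w in enumerate(out_w)) for j in range(width)]
            for i in range(height)
        ]
        for out_w in weight
    ]


def relu(x: FeatureMap) -> FeatureMap:
    """Element-wise max(0, v)."""
    return [[[max(0.0, v) for v in row] for row in channel] for channel in x]


def residual_block(
    x: FeatureMap,
    main_path: Callable[[FeatureMap], FeatureMap],
    skip_weight: List[List[float]],
) -> FeatureMap:
    """Compute ReLU(main_path(x) + conv1x1(x)).

    A plain residual block would add x itself; here the skip branch is
    passed through a convolution first (assumed 1x1 for this sketch).
    """
    main = main_path(x)
    skip = conv1x1(x, skip_weight)
    summed = [
        [[m + s for m, s in zip(m_row, s_row)] for m_row, s_row in zip(m_ch, s_ch)]
        for m_ch, s_ch in zip(main, skip)
    ]
    return relu(summed)


def toy_main_path(x: FeatureMap) -> FeatureMap:
    """Hypothetical stand-in for the conv/normalization stack of the main path."""
    return [[[-3.0 * v for v in row] for row in x[0]]]


# Two input channels, one row, two columns.
x = [[[1.0, -2.0]], [[3.0, 4.0]]]
out = residual_block(x, toy_main_path, skip_weight=[[0.5, 0.5]])
print(out)
```

In a real network the skip weights are learned, so the shortcut can filter the low-level features before they are added back rather than passing them through unchanged.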
Comprehensive experiments were conducted on the benchmark CVC-ClinicDB and Kvasir-SEG datasets to evaluate the proposed model against existing state-of-the-art architectures. The experimental results show that the proposed GAR-Net model is reliable and robust, achieving the highest Dice coefficient and mIoU score and outperforming other semantic segmentation models such as FCN8, U-Net, U-Net with Gated Attention, ResUNet, SegNet, and DeepLabv3. The computational overhead of the proposed GAR-Net architecture is slightly high, as we used standard convolutions rather than depth-wise separable convolutions. We experimented with depth-wise separable convolutions and found them quite detrimental, especially to the attention mechanisms. There is further scope to improve this model by making it lightweight and by incorporating spatial information in Guided Attention Learning. We conclude that the proposed GAR-Net architecture can be considered a strong baseline for further investigation toward a robust and clinically useful method for polyp segmentation from colonoscopy video frames.
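For reference, the two reported metrics can be computed on binary masks as in the sketch below. It assumes that mIoU is the mean of per-image IoU values and that two empty masks score 1.0. Both conventions are assumptions, since the exact evaluation protocol is not restated here, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask: 1 = polyp pixel, 0 = background


def dice_and_iou(pred: Mask, target: Mask) -> Tuple[float, float]:
    """Return (Dice coefficient, IoU) for two binary masks of equal shape."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    intersection = sum(a * b for a, b in zip(p, t))
    pred_area, target_area = sum(p), sum(t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treated as a perfect match (assumed convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


def mean_iou(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average the IoU over (prediction, ground truth) pairs."""
    return sum(dice_and_iou(pred, target)[1] for pred, target in pairs) / len(pairs)


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
print(f"mIoU = {mean_iou([(pred, target), (target, target)]):.4f}")
```

For a single pair of masks the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.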