MCFA-UNet: Multiscale cascaded feature attention U-Net for liver segmentation

The liver is the largest solid organ in the human abdomen. Liver cancer, a malignant tumor arising in the liver, is a major public health problem and one of the most common malignant tumors worldwide. Medical imaging plays an essential role in computer-aided diagnosis, and techniques such as B-mode ultrasound, magnetic resonance imaging, and computed tomography (CT) contribute significantly to the diagnosis of liver diseases. In traditional medical image processing, physicians annotate the image data manually, which is time-consuming, laborious, and dependent on the physician's experience. An efficient and accurate method for automatic liver segmentation from medical imaging data is therefore of great significance for the diagnosis and treatment of liver diseases.

As deep learning has made repeated breakthroughs in recent years, convolutional neural networks (CNNs) have performed very well in object detection and image segmentation. Many researchers have therefore applied CNNs to medical image segmentation and achieved outstanding results in tasks such as lung nodule [1], [2], brain tumor [3], [4], and fundus blood vessel segmentation [5], [6]. In 2015, Ronneberger et al. [7] designed U-Net, a deep fully convolutional network for biomedical image segmentation. It comprises an encoding path that captures feature information and a decoding path that enables precise localization. Skip connections between the encoding and decoding paths fuse feature information from different levels, reducing information loss and making U-Net more accurate in pixel-level localization.
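
The encoder, decoder, and skip-connection layout described above can be traced with a small shape calculation. The sketch below is only illustrative: the function name `unet_shapes`, the input size, the base channel count, the depth, and the use of size-preserving ("same") convolutions are all assumptions made for the example, not details taken from [7].

```python
def unet_shapes(in_size=256, base_channels=64, depth=4):
    """Trace (channels, spatial size) through a U-Net-style network.

    Assumptions (illustrative, not from the paper): size-preserving
    convolutions, 2x down/up-sampling, and channel doubling per level.
    """
    encoder = []
    size, channels = in_size, base_channels
    for _ in range(depth):
        # This feature map is kept and later sent across the skip connection.
        encoder.append((channels, size))
        size //= 2        # pooling halves the spatial size
        channels *= 2     # the next level doubles the channels

    bottleneck = (channels, size)

    decoder = []
    for skip_channels, skip_size in reversed(encoder):
        size *= 2         # upsampling restores the spatial size
        channels //= 2    # the up-convolution halves the channels
        assert size == skip_size, "skip and decoder maps must align"
        # Skip connection: concatenate encoder and decoder features.
        concat_channels = channels + skip_channels
        # (channels after concatenation, channels after the convs, size)
        decoder.append((concat_channels, channels, size))
    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_shapes()
for level, (concat_ch, out_ch, size) in enumerate(decoder):
    print(f"decoder level {level}: concat={concat_ch} -> {out_ch} @ {size}x{size}")
```

At every decoder level the concatenation doubles the channel count before the following convolutions reduce it again, which is how the fine-grained encoder features re-enter the decoding path.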

Liver segmentation is challenging because of scale diversity, complex backgrounds, ambiguous boundaries, and low contrast between organs. Accurate automatic liver segmentation can greatly assist clinical diagnosis and analysis. Liu et al. [8] improved the 2D U-Net and proposed a new framework, IU-Net. Increasing the depth of the model yielded higher-level semantic features, and the IU-Net segmentation results were refined with a graph cut method to segment liver images effectively. Li et al. [9] established a bottleneck supervision model (BS-UNet) consisting of an encoding network and a segmentation network. The encoding network was trained as an autoencoder to obtain an encoding of the label maps, and the segmentation network was trained with additional supervision, so as to reduce information loss and improve the segmentation results.

Because each patient's liver CT scan is a 3D volume composed of multiple 2D slices, a 2D segmentation network cannot fully exploit the semantic information within and between slices, which limits the accuracy of the segmentation results. In 2016, Cicek et al. [10] proposed 3D U-Net, a volumetric segmentation network that learns from sparsely annotated volumetric images. It extended the U-Net architecture by replacing all two-dimensional operations with three-dimensional ones, and used the feature information between slices to achieve better segmentation results from only a few annotations. In 2018, Li et al. [11] designed H-DenseUNet, which combines 2D and 3D U-Net models to make full use of the information between CT slices, obtaining higher segmentation accuracy while alleviating the high memory consumption of a single 3D network. Lei et al. [12] presented LV-NET, a lightweight 3D model for liver segmentation. Its inverted residual bottleneck block reduces the number of parameters while still extracting inter-slice information well, and 3D deep supervision improves the training process, leading to better liver segmentation results. These methods show that three-dimensional U-Net networks can effectively use the feature information within and between CT slices to obtain better segmentation results from sparsely labeled volumetric images. The three-dimensional U-Net is therefore taken as the base network to be improved in this paper.

MultiResUNet is an improved U-Net-based network proposed by Ibtehaz et al. [13] to obtain multiscale features and reduce the number of network parameters. Its MultiRes block uses a chain of lightweight convolutions to extract multiscale features, and the outputs of the convolution layers are concatenated. Yang et al. [14] replaced the standard convolutions in the MultiRes block with dilated convolutions, enlarging the receptive field of the convolution kernel without increasing the number of parameters.
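
A short receptive-field calculation shows why chained small convolutions give multiscale features and why dilation enlarges the receptive field at no extra parameter cost. This is a sketch under stated assumptions: stride-1 convolutions, 3-wide kernels, and a dilation rate of 2 chosen only for illustration; the helper name `receptive_field` is hypothetical, and the rates actually used in [14] are not given here.

```python
def receptive_field(kernel_sizes, dilations=None):
    """Receptive field along one axis of a stack of stride-1 convolutions."""
    if dilations is None:
        dilations = [1] * len(kernel_sizes)
    field = 1
    for kernel, dilation in zip(kernel_sizes, dilations):
        # A dilated kernel spans k + (k - 1) * (d - 1) input positions,
        # although it still has only k weights along this axis.
        effective_kernel = kernel + (kernel - 1) * (dilation - 1)
        field += effective_kernel - 1
    return field


# Outputs after one, two, and three chained 3-wide convolutions.
plain = [receptive_field([3] * n) for n in (1, 2, 3)]
# Same chain with an (assumed) dilation rate of 2 in every layer.
dilated = [receptive_field([3] * n, [2] * n) for n in (1, 2, 3)]

print("plain chain:  ", plain)
print("dilated chain:", dilated)
```

Concatenating the intermediate outputs of the chain therefore combines features seen at several scales, and the dilated variant widens each scale while every layer keeps the same number of weights.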

However, the MultiRes block performs only a single concatenation on the output multiscale features and cannot make full use of the multiscale feature information, which limits its segmentation ability on liver images with variable shapes and fuzzy boundaries. Based on the above analysis of encoding-path feature modules and of the difficulties of the liver segmentation task, this paper proposes an improved 3D U-Net network structure. Its main contributions are:

(1) A multiscale cascaded feature attention module is designed for feature extraction in the encoding path. It makes full use of the integrated multiscale cascaded features to enhance the network's ability to segment liver boundary and detail features.

(2) An attention-gate mechanism is introduced in the skip connections to reduce the semantic gap between the encoding path and the decoding path, suppress features from irrelevant regions, and improve the sensitivity and accuracy of the network's dense label prediction by integrating feature information from different levels.

(3) A new loss function is adopted to overcome data imbalance in medical images.
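
To make the attention-gate idea in contribution (2) concrete, the following is a minimal, dependency-free sketch of an additive attention gate of the kind commonly paired with U-Net skip connections. Whether the paper uses exactly this formulation is not stated in this section, so treat it as an assumption. The function names, the hand-set weights, and the toy two-voxel input are all hypothetical, and the gating signal is assumed to be already resampled to the resolution of the skip features.

```python
import math


def linear(weights, vector):
    """A 1x1x1 convolution at one voxel is a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def attention_gate(skip_feats, gate_feats, w_x, w_g, bias, psi, psi_bias):
    """Additive attention gate applied voxel by voxel.

    skip_feats: encoder channel vectors, one per voxel.
    gate_feats: decoder (gating) channel vectors, one per voxel.
    Returns the gated skip features and the attention coefficients.
    """
    gated, alphas = [], []
    for x, g in zip(skip_feats, gate_feats):
        projected_x = linear(w_x, x)
        projected_g = linear(w_g, g)
        # Sum both projections, add the bias, then apply ReLU.
        hidden = [max(0.0, a + c + b)
                  for a, c, b in zip(projected_x, projected_g, bias)]
        score = sum(p * h for p, h in zip(psi, hidden)) + psi_bias
        alpha = 1.0 / (1.0 + math.exp(-score))  # sigmoid, in (0, 1)
        alphas.append(alpha)
        gated.append([alpha * value for value in x])  # rescale skip features
    return gated, alphas


# Toy input: two voxels with identical skip features but opposite gating
# signals (hand-set values, chosen only to make the effect visible).
skip = [[1.0, 0.5], [1.0, 0.5]]
gate = [[1.0], [-1.0]]
gated, alphas = attention_gate(
    skip, gate,
    w_x=[[1.0, 0.0]], w_g=[[4.0]], bias=[0.0],
    psi=[2.0], psi_bias=-4.0,
)
print("attention coefficients:", [round(a, 3) for a in alphas])
```

In this toy case the first voxel, supported by the gating signal, passes through almost unchanged, while the second is scaled close to zero; this is the sense in which a gate can suppress features from irrelevant regions before they reach the decoding path.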
