Development of a modified 3D region proposal network for lung nodule detection in computed tomography scans: a secondary analysis of lung nodule datasets

Lung nodule datasets

In this secondary data analysis, three datasets of lung nodules acquired on LDCT were used to evaluate the performance of the modified 3D RPN. The Lung Nodule Analysis 2016 (LUNA16) dataset is the largest public dataset, comprising 1186 lung nodules from 888 patients [18]. This dataset has been widely used to evaluate a variety of deep-learning–based pulmonary nodule detection methods [7, 20, 21, 22]. In addition, two private, ongoing pulmonary nodule datasets maintained by the Radiology Department at the National Cheng Kung University Hospital (NCKUH) were used: the NCKUH Lung Nodule received Operation (LNOP) dataset, which included patients who underwent surgical resection for lung nodules with histological confirmation, and the NCKUH Lung Nodule in Health Examination (LNHE) dataset, which included patients with lung nodules found by LDCT.

The LUNA16 dataset contains 1186 lung nodules. To minimize the bias caused by differences in nodule number among datasets, approximately 1000 pulmonary nodules were retrieved from each of the LNOP and LNHE datasets. Accordingly, 1027 lung nodules from 708 patients, collected in the LNOP dataset from Dec. 2018 to Dec. 2021, were retrieved for training and testing the deep learning models. In addition, 1000 lung nodules from 420 patients, collected in the LNHE dataset from Jan. 2019 to Dec. 2020, were used.

Moreover, for temporal validation, all 1027 and 1000 lung nodules from LNOP and LNHE, respectively, were used as training sets. An additional 348 and 500 lung nodules that were more recently collected in LNOP and LNHE, respectively, were used as test sets.

Data annotation

The regions of interest (ROIs) of pulmonary nodules on axial images were manually labeled slice by slice by a thoracic radiologist (C.L.) and a thoracic surgeon (C.C.). After consensus was reached, the 2D ROIs were converted into a 3D ROI. The 3D ROI of each lung nodule was defined as the ground truth in this study.
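One possible reading of the 2D-to-3D conversion is a union of the per-slice boxes. The sketch below assumes each 2D ROI is an axis-aligned box stored as `(z, x_min, y_min, x_max, y_max)` in voxel indices and that the 3D ROI is summarized as a center plus a diameter `(x, y, z, d)`. The tuple layouts and the function name `merge_slices_to_3d_roi` are illustrative assumptions, not the annotation format used in this study.

```python
from typing import Iterable, Tuple

Roi2D = Tuple[int, float, float, float, float]  # (z, x_min, y_min, x_max, y_max), assumed layout
Roi3D = Tuple[float, float, float, float]       # (cx, cy, cz, d), assumed layout


def merge_slices_to_3d_roi(rois: Iterable[Roi2D]) -> Roi3D:
    """Merge slice-wise 2D boxes of one nodule into a single 3D ROI.

    Assumption: the 3D ROI is the tightest box enclosing all 2D boxes,
    reported as its center and its longest edge (used as the diameter d).
    """
    rois = list(rois)
    if not rois:
        raise ValueError("at least one 2D ROI is required")

    z_min = min(r[0] for r in rois)
    z_max = max(r[0] for r in rois)
    x_min = min(r[1] for r in rois)
    y_min = min(r[2] for r in rois)
    x_max = max(r[3] for r in rois)
    y_max = max(r[4] for r in rois)

    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    cz = (z_min + z_max) / 2
    # The slice count (+1) gives the extent along z in voxels.
    d = max(x_max - x_min, y_max - y_min, z_max - z_min + 1)
    return cx, cy, cz, d
```

If the original annotations are masks rather than boxes, the same idea applies after taking the bounding box of each slice mask.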

3D region proposal network

The architecture of the proposed 3D RPN consists of three blocks: backbone, neck, and head (Fig. 1A). The backbone network is used for feature extraction; the neck is used for feature fusion; and the head is used for dense prediction, which generates a prediction frame (anchor box) for each anchor point on the feature map. The training environment and training strategy are listed in Table 1.

Fig. 1

The architecture of the deep learning model. (A) 3D RPN. Boxes with anchor sizes of 5, 10, and 20 voxels were used in each detector layer of the head block. Because the outputs for each anchor comprised five values (probability, x, y, z, d), the output dimension of each layer was 3 × 5 = 15. (B) The complete pulmonary nodule detection system
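The 3 × 5 = 15 arithmetic in the caption can be illustrated by splitting a detector output into one group per anchor size. This is a minimal sketch: the channel-first layout `(15, D, H, W)`, the ordering of channels (anchor-major), and the function name `split_head_output` are assumptions made for illustration only.

```python
import numpy as np

ANCHOR_SIZES = (5, 10, 20)   # anchor sizes in voxels, from the caption
VALUES_PER_ANCHOR = 5        # probability, x, y, z, d


def split_head_output(raw: np.ndarray) -> np.ndarray:
    """Reshape (15, D, H, W) into (3 anchors, 5 values, D, H, W).

    Assumption: channels are grouped anchor by anchor.
    """
    n_channels = len(ANCHOR_SIZES) * VALUES_PER_ANCHOR  # 3 * 5 = 15
    if raw.shape[0] != n_channels:
        raise ValueError(f"expected {n_channels} channels, got {raw.shape[0]}")
    return raw.reshape(len(ANCHOR_SIZES), VALUES_PER_ANCHOR, *raw.shape[1:])
```

After the split, `out[a, 0]` would hold the probability map for anchor size `ANCHOR_SIZES[a]`, and `out[a, 1:]` the corresponding x, y, z, d maps.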

Table 1 The training environment and training strategy

Architecture and modification of the pulmonary nodule detection system

The 3D lung-nodule detection system is composed of three modules: pre-processing, the deep learning model (3D RPN), and post-processing (Fig. 1B). 3D patch-based image input was adopted for pre-processing and post-processing. In the pre-processing module, all CT images were resampled to a voxel spacing of 1 × 1 × 1 mm so that they shared the same resolution. Each radiodensity value was converted from Hounsfield units (HU) (range, −1200 to 600 HU) to a decimal between 0 and 1 and stored as a single-precision floating-point number. In the post-processing module, the extrapulmonary region was removed to reduce false positives.
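The intensity conversion and the resampling target can be sketched as follows. The HU range and the 1 mm spacing come from the text. Clipping values outside the range before scaling, the linear mapping, and the helper names `normalize_hu` and `resampled_shape` are assumptions. The interpolation itself is left to an image library and is not shown.

```python
import numpy as np

HU_MIN, HU_MAX = -1200.0, 600.0   # HU window stated in the text
TARGET_SPACING_MM = 1.0           # isotropic 1 x 1 x 1 mm


def normalize_hu(volume_hu: np.ndarray) -> np.ndarray:
    """Map HU values to [0, 1] as float32.

    Assumption: values outside [HU_MIN, HU_MAX] are clipped, then scaled linearly.
    """
    clipped = np.clip(volume_hu.astype(np.float32), HU_MIN, HU_MAX)
    return ((clipped - HU_MIN) / (HU_MAX - HU_MIN)).astype(np.float32)


def resampled_shape(shape, spacing_mm):
    """Voxel grid size after resampling to TARGET_SPACING_MM on every axis."""
    return tuple(
        int(round(n * s / TARGET_SPACING_MM)) for n, s in zip(shape, spacing_mm)
    )
```

For example, a scan of 100 slices at 2.5 mm spacing covers 250 mm along that axis, so it would be resampled to 250 voxels.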

Pruning experiments

During training, a series of pruning experiments was performed on the LUNA16 dataset to modify each block of the 3D RPN for better performance. Although the ResNet module is commonly used to construct the backbone network [9], we first replaced the ResNet module with the ResNeXt module [15]. Subsequently, the design of the Cross Stage Partial Network (CSPNet) [23] was incorporated into the ResNeXt module to form the CSP-ResNeXt module (Fig. 2). The feature pyramid network (FPN) design was then added to the neck and detector of the selected 3D RPN with the CSP-ResNeXt module, achieving feature fusion and multi-level outputs on the neck and detector. The next pruning experiment involved modification of the anchor assignment of the 3D RPN.

Fig. 2

The CSPNet and ResNeXt modules are integrated into the design of the backbone network

Nearest anchor assignment

Anchor assignment, also called training sample selection, is the step in training an object detection model that decides which anchor boxes on the input image patch are positive, negative, or ignored samples based on the ground truth [9]. Only positive and negative samples are used for calculating the loss function. Because most lung nodules were nearly spherical with varied sizes, boxes with anchor sizes of 5, 10, and 20 voxels were used in each detector layer of the head block of the 3D RPN (Fig. 1A). Several object detection studies have used fixed Intersection over Union (IoU) matching for anchor assignment [5, 24]; however, the IoU matching method often results in multiple positive samples for a single ground truth (Fig. 3).
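To illustrate why fixed IoU matching can yield several positives, the sketch below computes the 3D IoU between cubic boxes and lists every anchor above a threshold. Treating boxes as cubes given by `(cx, cy, cz, d)`, the threshold value of 0.5, and the function names are assumptions for illustration; the text does not state the threshold used in the cited studies.

```python
def cube_iou(a, b):
    """3D IoU of two axis-aligned cubes given as (cx, cy, cz, d)."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i] - a[3] / 2, b[i] - b[3] / 2)
        hi = min(a[i] + a[3] / 2, b[i] + b[3] / 2)
        if hi <= lo:          # no overlap along this axis
            return 0.0
        inter *= hi - lo
    union = a[3] ** 3 + b[3] ** 3 - inter
    return inter / union


def iou_positive_anchors(anchors, gt, threshold=0.5):
    """Indices of all anchors whose IoU with the ground truth meets the threshold.

    The threshold is a hypothetical value chosen for this example.
    """
    return [i for i, a in enumerate(anchors) if cube_iou(a, gt) >= threshold]
```

With a ground truth of side 10 and two anchors of the same size whose centers are 0 and 1 voxel away, both anchors exceed the threshold, so both would be labeled positive.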

To find a more suitable anchor assignment method for 3D lung nodule detection, we applied the nearest anchor method in this study. The nearest anchor method assigns only the one anchor box whose anchor point is closest to the ground truth as the positive anchor (Fig. 3). If multiple anchor boxes share a common anchor point, only the anchor box closest in size to the ground truth is selected as the positive sample.
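The two-step rule described above can be sketched as a pair of nearest-neighbor lookups. This is a minimal illustration, not the authors' implementation: Euclidean distance between the anchor point and the ground-truth center, absolute difference in diameter as the size criterion, first-match tie-breaking, and the name `nearest_anchor` are all assumptions.

```python
import math

ANCHOR_SIZES = (5, 10, 20)  # anchor sizes in voxels, from the text


def nearest_anchor(anchor_points, gt_center, gt_diameter, sizes=ANCHOR_SIZES):
    """Return (point_index, size_index) of the single positive anchor.

    Step 1: pick the anchor point closest to the ground-truth center.
    Step 2: among boxes sharing that point, pick the size closest to the
            ground-truth diameter.
    Ties resolve to the lowest index (an assumption of this sketch).
    """
    point_idx = min(
        range(len(anchor_points)),
        key=lambda i: math.dist(anchor_points[i], gt_center),
    )
    size_idx = min(range(len(sizes)), key=lambda j: abs(sizes[j] - gt_diameter))
    return point_idx, size_idx
```

Every other anchor would then be treated as negative or ignored, so each ground truth contributes exactly one positive sample.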

Fig. 3

Illustration of the nearest anchor method. The IoU-based method could recognize both blue and yellow anchor boxes as positive samples. In contrast, the nearest anchor method recognized the blue anchor as the positive anchor, because it had an anchor point closest to the ground truth (green)

Performance evaluation measures

The modified 3D RPN was then trained on the LUNA16, LNOP, and LNHE datasets. Its performance was evaluated by 10-fold cross-validation using the free-response receiver operating characteristic (FROC) curve and the competition performance metric (CPM). The FROC curve plots the true positive rate (sensitivity) against the average number of false positives per scan as the confidence threshold varies. The recall rate (sensitivity) was determined at 0.125, 0.25, 0.5, 1, 2, 4, and 8 false positives per scan, as previously described [25, 26]. CPM, a metric derived from the FROC curve, was the average recall over these 7 false-positive rates. CPM and sensitivity were expressed as mean ± standard deviation (SD). After training, the modified 3D RPN was tested on the LUNA16, LNOP, and LNHE test sets, with the average recall rate evaluated at 2 false positives per scan.
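As a rough illustration of how CPM follows from an FROC curve, the sketch below reads the sensitivity at the seven false-positive rates and averages them. The seven rates come from the text. Linear interpolation between FROC points and the function name `cpm` are assumptions, and the evaluation script used in the study may differ.

```python
import numpy as np

FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)  # false positives per scan, from the text


def cpm(fps_per_scan, sensitivity):
    """Average sensitivity at the seven FP rates of an FROC curve.

    Assumption: sensitivity between measured FROC points is linearly
    interpolated; points are sorted by false positives per scan first.
    Returns (cpm_score, sensitivities_at_rates).
    """
    fps = np.asarray(fps_per_scan, dtype=float)
    sens = np.asarray(sensitivity, dtype=float)
    order = np.argsort(fps)
    at_rates = np.interp(FP_RATES, fps[order], sens[order])
    return float(at_rates.mean()), at_rates
```

Under 10-fold cross-validation, this would be computed once per fold, and the mean and SD taken over the ten scores.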
