Utility of the deep learning technique for the diagnosis of orbital invasion on CT in patients with a nasal or sinonasal tumor

This retrospective study was approved by the Institutional Review Board of Hokkaido University, and the requirement for patients' written informed consent was waived.

Study population

Based on the medical records, we selected the cases of 233 patients who were treated at our hospital during the period from January 2009 to March 2021 and who met the following inclusion criteria: (1) patients with a pathologically confirmed malignant nasal or sinonasal tumor and (2) pretreatment coronal CT images reconstructed with a soft tissue kernel including the tumor lesion. Patients were excluded based on the following criteria: (1) patients with an inoperable tumor regardless of the presence of orbital invasion (e.g., a malignant hematologic tumor) (n = 20), (2) patients whose CT acquisition or reconstruction parameters (e.g., slice thickness, matrix size, convolution kernel) were not available (n = 12), (3) patients whose primary tumor was located clearly apart from the orbital wall (n = 32), and (4) patients with intraorbital structures that severely affected the imaging findings of the target lesion, such as an artificial eye (n = 2). Ultimately, 167 patients and their 168 lesions (one patient had metachronous multiple cancers) were considered eligible for this study. We randomly selected 119 lesions as the training dataset to create the diagnostic model and used the other 49 lesions as the test dataset to evaluate the performance of the established model, i.e., an approximately 7:3 ratio (Fig. 1).

Fig. 1

Study population, study flow, and recruitment pathway

CT images

CT images of the total of 168 lesions were obtained with various scanners from four vendors. Of the 168 lesions, 150 were imaged with contrast enhancement and the other 18 without. We used the coronal reconstructed CT images for the evaluation. The other imaging parameters were as follows: slice thickness, 1–3 mm; matrix size, approximately 512 × 512; reconstruction kernel, soft tissue.

Determination of the final diagnosis

Two board-certified radiologists with 6 and 15 years of experience in head and neck radiology determined whether the quality of the CT images was appropriate for interpretation. Subsequently, these two radiologists, reading in consensus, divided all of the coronal CT images into groups that were positive or negative for orbital invasion beyond the periorbita (hereinafter referred to as "invasion-positive" and "invasion-negative"), using a Digital Imaging and Communications in Medicine (DICOM) viewer (XTREK, J-MAC SYSTEM, Tokyo). Bone destruction of the orbital wall, irregularity between the tumor margin and the orbital components, extraocular muscle involvement by the tumor, and orbital fat obliteration around the tumor were carefully assessed, and the invasion-positive or -negative status was determined for each image by taking all of these imaging findings into consideration. After this image assessment, a total of 81 cases were diagnosed as invasion-positive and the other 87 cases as invasion-negative. Approximately 9 months after the above-mentioned consensus reading, the two board-certified radiologists re-evaluated all cases individually, dividing them into invasion-positive and -negative cases in order to determine the inter-observer agreement of their case-based decisions.

After the case-based evaluation dividing the invasion-positive and -negative cases, we further performed a slice-based evaluation to separate the invasion-positive and -negative slices within each positive case in order to prepare the training dataset; this procedure was conducted for the training dataset only. In each orbital invasion-positive case, all slices in the range of evaluation (i.e., from the nasolacrimal duct orifice to the tip of the middle cranial fossa) were assessed and divided into invasion-positive and -negative slices by the abovementioned two radiologists. In contrast, all of the CT images in the orbital invasion-negative cases were assigned as invasion-negative slices.

To assess the variability of consensus readings by board-certified radiologists in determining the invasion-positive or -negative status, two other board-certified radiologists with 7 and 12 years of experience evaluated all cases in consensus and divided them into invasion-positive and -negative cases, as an additional consensus reading session.

Image analysis

Image selection and post-processing for the deep learning analysis

We randomly selected a training dataset to create a diagnostic model and a test dataset to evaluate the performance of the established model (see the Results below). First, image segmentation was performed on all coronal CT images. We manually drew square regions of interest (ROIs) to encompass the orbital bone wall, with an ROI size of approx. 12 cm². If the target tumor was located around the midline and contacted the bilateral orbits, ROIs were drawn separately for the right and left sides. We used the CT images in the range from the nasolacrimal duct orifice to the tip of the middle cranial fossa for the ROI placement. However, specific CT slices in which the primary tumor was observed to be far from the orbital wall were excluded from further analysis. The segmented images of the right orbit were then flipped horizontally and all images were aligned so that the right orbit could be assessed in the same orientation as the left. A triangular mask was applied to the upper and lateral parts of the orbit. The CT window of all images was adjusted to a window level of 60 and a window width of 300 Hounsfield units (HU). Finally, each processed image was exported as a Joint Photographic Experts Group (JPEG) file. The image processing steps are illustrated in Fig. 2.

Fig. 2

Image preprocessing of CT images for the deep learning analysis. First, a square region of interest (ROI) was manually placed to fully include the orbital wall on coronal CT images (red arrow). Next, the images segmented by the ROI were extracted as continuous slices including the tumor. Then, the right-side lesion images were flipped horizontally and all images were aligned so that the lesion appears on the left side. Thereafter, the top and outer areas of each image were masked (white asterisk). Finally, these processed images were fed into the data augmentation process with image rotation and/or shifting for the training data
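As an illustration of these steps, a minimal MATLAB sketch follows (MATLAB R2021a was the environment used for the image analyses; the ROI rectangle, mask geometry, and file names here are placeholders rather than the study's actual values):

```matlab
% Minimal preprocessing sketch for one coronal CT slice.
% roiRect, the mask vertices, and the file names are placeholders.
function preprocessSlice(dicomFile, roiRect, isRightSide, outFile)
    info = dicominfo(dicomFile);
    img  = double(dicomread(dicomFile));
    hu   = img * info.RescaleSlope + info.RescaleIntercept;  % convert to HU

    roi = imcrop(hu, roiRect);          % square ROI around the orbital wall
    if isRightSide
        roi = fliplr(roi);              % mirror right orbit to match the left
    end

    % Window level 60 HU / window width 300 HU -> display range [-90, 210] HU
    lo = 60 - 300/2;  hi = 60 + 300/2;
    roi = (min(max(roi, lo), hi) - lo) / (hi - lo);  % clip and scale to [0,1]

    % Triangular mask over the upper and lateral parts (placeholder geometry)
    [h, w] = size(roi);
    mask = poly2mask([1 w w], [1 1 h/2], h, w);
    roi(mask) = 0;

    imwrite(uint8(255 * roi), outFile, 'jpg');       % export as a JPEG file
end
```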

For the preparation of the invasion-positive group in the training dataset, we included only the specific slices that were judged as invasion-positive in the slice-based evaluation (see above: Determination of the final diagnosis); this dataset consisted of 408 images from 56 lesions. In contrast, for the invasion-negative group we included the slices on which the tumor lesion was in contact with the orbital wall but had not invaded the periorbita; this dataset consisted of 635 images from 63 lesions. Before the training session, data augmentation was performed to improve the robustness of the model by applying random rotation and vertical and/or horizontal shifts to each image; a total of 10 additional images were generated for each image.
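This offline augmentation could be sketched in MATLAB as follows; the rotation and shift ranges are assumptions, since the exact ranges were not reported, and the folder name is a placeholder:

```matlab
% Offline data augmentation: 10 randomly rotated/shifted copies per image.
% The +/-10 degree rotation and +/-5 pixel shift ranges are assumptions.
files = dir(fullfile('trainJPEG', '*.jpg'));
for i = 1:numel(files)
    img = imread(fullfile(files(i).folder, files(i).name));
    [~, base] = fileparts(files(i).name);
    for k = 1:10
        ang   = -10 + 20 * rand;            % random rotation angle (degrees)
        shift = -5  + 10 * rand(1, 2);      % random [x y] shift (pixels)
        aug = imrotate(img, ang, 'bilinear', 'crop');
        aug = imtranslate(aug, shift, 'FillValues', 0);
        imwrite(aug, fullfile('trainJPEG', sprintf('%s_aug%02d.jpg', base, k)));
    end
end
```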

In the test dataset, all images in all lesions (both invasion-positive and -negative) were used; this consisted of 25 invasion-positive lesions (191 invasion-positive images and 123 invasion-negative images; every lesion included at least one invasion-positive image) and 24 invasion-negative lesions (326 invasion-negative images). The test dataset was evaluated without a data augmentation procedure.

Deep learning analysis

We classified the invasion-positive or -negative status of the coronal CT images by using transfer learning from a pre-trained CNN devoted to image classification. The original model used in this work was the Visual Geometry Group 16 (VGG16) model, developed at the University of Oxford and published in 2015, which had been trained and evaluated on the ImageNet collection (http://image-net.org/index) [8]. The VGG16 model is composed of 16 layers, combining five convolutional blocks (13 convolutional layers) with three fully connected layers, and it finishes with a dense layer that performs the final classification across the 1000 categories of the ImageNet dataset [8]. Thanks to its simplicity, the VGG16 model is well suited to transfer learning on a small dataset [9]. For the model's training, the last fully connected layer was trained, whereas the parameters of the earlier layers were frozen at the original weights of the VGG16 model. This allowed us to keep the more generic features of the VGG16 model and adapt the model to the CT images through a limited number of trainable parameters.
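In MATLAB's Deep Learning Toolbox, this setup could look roughly like the following sketch, which freezes the transferred layers and trains only a new two-class head; it illustrates the approach described above rather than reproducing the authors' exact code:

```matlab
% Transfer-learning sketch: reuse the pre-trained VGG16 backbone and
% train only a new two-class (invasion-positive/-negative) head.
net = vgg16;                              % pre-trained on ImageNet
layersTransfer = net.Layers(1:end-3);     % drop the original FC-1000 head

% Freeze the transferred layers at the original VGG16 weights
for i = 1:numel(layersTransfer)
    if isprop(layersTransfer(i), 'WeightLearnRateFactor')
        layersTransfer(i).WeightLearnRateFactor = 0;
        layersTransfer(i).BiasLearnRateFactor   = 0;
    end
end

layers = [layersTransfer
          fullyConnectedLayer(2)          % the only trainable layer
          softmaxLayer                    % outputs class probabilities
          classificationLayer];
```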

For the training session, the stochastic gradient descent with momentum (sgdm) optimizer was used. The hyperparameters were set to 15 epochs, a mini-batch size of 32, and a learning rate of 1.0 × 10⁻⁵. In the transfer learning, 30% of the training data was used for internal validation during the training. The VGG16 model converts an input image into a probability for each candidate category; in the present study, the model output a binary classification of invasion-positive or -negative status on the test dataset. The model was established using an Ubuntu 18.04 long-term support (LTS)-based server with a Core i9-10980XE 18-core/36-thread 3.0-GHz central processing unit (CPU), four NVIDIA Quadro RTX8000 graphics processing unit (GPU) cards, and 128 GB (16 GB × 8) of DDR4-2933 quad-channel memory for the training and validation. The time required for one epoch was approx. 14 min, and approx. 13 epochs were sufficient to reach the final score evaluated on the test dataset. All image analyses were performed using MATLAB (R2021a, MathWorks, Natick, MA, USA) and Metavol software (https://www.metavol.org) [10].
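With the hyperparameters listed above, the training and inference calls might look like this sketch; augimdsTrain, augimdsVal, and augimdsTest are hypothetical datastore names for the training, internal-validation (30%), and test images, and layers is the network assembled in the previous sketch:

```matlab
% Training sketch with the reported hyperparameters (sgdm, 15 epochs,
% mini-batch 32, learning rate 1e-5); datastore names are hypothetical.
options = trainingOptions('sgdm', ...
    'MaxEpochs',            15, ...
    'MiniBatchSize',        32, ...
    'InitialLearnRate',     1e-5, ...
    'ValidationData',       augimdsVal, ...   % 30% of the training data
    'Shuffle',              'every-epoch', ...
    'ExecutionEnvironment', 'multi-gpu');     % four Quadro RTX8000 cards

netTransfer = trainNetwork(augimdsTrain, layers, options);

% Slice-based binary classification on the test dataset
[slicePred, sliceScores] = classify(netTransfer, augimdsTest);
```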

Visual evaluation by radiologists

Two general radiologists with 6 and 3 years of experience who were not specialists in the interpretation of head and neck images reviewed all of the images in the test dataset (49 lesions; 25 invasion-positive/24 invasion-negative) and independently determined whether each tumor lesion was invasion-positive or -negative. They referred to all of the slices with a complete field of view (not the segmented images) for the evaluation. For patients with lesions bordering the bilateral orbits, they evaluated the right and left sides separately to determine the presence of invasion. Approximately 2 months after the first reading session, the same two general radiologists reviewed all of the images in the test dataset again to determine the invasion-positive or -negative status, this time referring to the patient-based diagnoses provided by the CNN model developed with VGG16 as described above.

Statistical analyses

The distributions of patient characteristics between the training and test cohorts were compared using the χ2-test for categorical variables and the Mann–Whitney U-test for continuous variables.
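Although the statistical analyses in this study were performed with BellCurve for Excel, equivalent tests could be run in MATLAB as in the following minimal sketch (variable names are hypothetical; crosstab and ranksum are from the Statistics and Machine Learning Toolbox):

```matlab
% Chi-squared test for a categorical characteristic across the two cohorts
[tbl, chi2, pCat] = crosstab(cohortLabel, sexCategory);

% Mann-Whitney U-test (rank-sum) for a continuous characteristic, e.g., age
pCont = ranksum(ageTrain, ageTest);
```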

We used the kappa coefficient to evaluate the inter-observer agreement regarding the case-based invasion-positive and -negative decisions made by the two board-certified radiologists with extensive head-and-neck imaging experience. The kappa coefficient was also used to assess the agreement between one of the board-certified radiologists and the result of the consensus reading, and between the other board-certified radiologist and the result of the consensus reading. Kappa values < 0.40 were interpreted as poor agreement, 0.41–0.57 as fair agreement, 0.58–0.74 as good agreement, and > 0.75 as excellent agreement [11].
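For reference, Cohen's kappa for two raters' binary decisions reduces to the following computation (a minimal sketch; the function name and input vectors are hypothetical):

```matlab
% Cohen's kappa for two raters' binary (0/1) decision vectors a and b.
function k = cohensKappa(a, b)
    po = mean(a == b);                                   % observed agreement
    pe = mean(a) * mean(b) + mean(1 - a) * mean(1 - b);  % chance agreement
    k  = (po - pe) / (1 - pe);
end
```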

In addition, the agreement between the result of the first consensus reading (i.e., the final diagnosis) and that of the additional consensus reading by the two other board-certified radiologists was assessed using the kappa coefficient. A receiver operating characteristic (ROC) curve analysis to calculate the area under the curve (AUC) was also performed using the result of the additional consensus reading, with the result of the first consensus reading set as the gold standard.

The diagnostic performance for the test dataset was evaluated for each of the following: (1) the developed CNN diagnostic model, (2) the general radiologists without the CNN model's assistance, and (3) the general radiologists with the CNN model's assistance. When we evaluated the CNN diagnostic model, we first performed slice-based diagnoses by dividing all slices in each patient into those indicating invasion-positive or -negative status using the developed CNN model. The slice-based diagnoses were then converted to a patient-based diagnosis by counting the number of consecutive slices that the CNN model determined to be invasion-positive per patient. A ROC curve analysis with the Youden index was performed to determine the optimal number of consecutive slices for diagnosing invasion-positive or -negative status as a patient-based diagnosis; a sketch of this conversion and cutoff selection is given below. The diagnostic performance was assessed by computing the following performance metrics: the AUC, accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The diagnostic performance achieved by each of the general radiologists without the CNN model's assistance was compared with that obtained by the CNN model alone and with that of the same radiologist with the CNN model's assistance, respectively. The AUCs were compared using the χ2-test. Statistical significance was set at p-values < 0.05. BellCurve for Excel (Social Survey Research Information Co., Tokyo) was used to perform all statistical analyses.
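The following sketch illustrates the slice-to-patient conversion and Youden-index cutoff selection; lesionLabels and runLengths are hypothetical variables holding the ground-truth label and the longest run of consecutive CNN-positive slices for each lesion, and perfcurve is from the Statistics and Machine Learning Toolbox:

```matlab
% ROC analysis over per-lesion run lengths; the cutoff maximizing the
% Youden index J = sensitivity + specificity - 1 = TPR - FPR is selected.
% lesionLabels: ground-truth 0/1 per lesion; runLengths: longest run of
% consecutive CNN-positive slices per lesion (see the helper below).
[fpr, tpr, thr, auc] = perfcurve(lesionLabels, runLengths, 1);
[~, best]     = max(tpr - fpr);
optimalCutoff = thr(best);

% Longest run of consecutive invasion-positive slices in one lesion;
% slicePred is a 0/1 vector ordered from anterior to posterior.
function n = maxConsecutivePositives(slicePred)
    d    = diff([0; slicePred(:); 0]);
    runs = find(d == -1) - find(d == 1);   % lengths of the runs of 1s
    n    = max([runs; 0]);                 % 0 if no positive slice
end
```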
