J. Imaging, Vol. 8, Pages 327: Fully Automated Segmentation Models of Supratentorial Meningiomas Assisted by Inclusion of Normal Brain Images

1. Introduction

Meningiomas are tumors that arise from the meninges, the membranes that cover the brain and spinal cord. As many of them are asymptomatic, they are often detected incidentally during magnetic resonance imaging (MRI) examinations, for example, during routine medical check-ups. Patients with an incidentally discovered meningioma undergo routine MRI scans to monitor tumor growth. Two-dimensional measurements of tumors can underestimate the risk of tumor growth, whereas volumetric measurements allow growth to be monitored with high accuracy.
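Once a segmentation mask exists, the volumetric measurement itself is simple: the tumor volume is the number of tumor voxels multiplied by the volume of one voxel. The minimal NumPy sketch below assumes a binary 3D mask and a voxel spacing in millimetres read from the image header; the function names `tumor_volume_cm3` and `relative_growth` are hypothetical and are not part of the authors' pipeline.

```python
from typing import Sequence

import numpy as np


def tumor_volume_cm3(mask: np.ndarray, spacing_mm: Sequence[float]) -> float:
    """Volume of a binary 3D segmentation mask in cm^3.

    mask: 3D array; every nonzero voxel is counted as tumor.
    spacing_mm: voxel edge lengths in millimetres, one per mask axis
    (assumed to come from the image header).
    """
    if mask.ndim != 3 or len(spacing_mm) != 3:
        raise ValueError("expected a 3D mask and three spacing values")
    voxel_volume_mm3 = float(np.prod(spacing_mm))
    n_tumor_voxels = int(np.count_nonzero(mask))
    return n_tumor_voxels * voxel_volume_mm3 / 1000.0  # 1 cm^3 = 1000 mm^3


def relative_growth(baseline_cm3: float, followup_cm3: float) -> float:
    """Fractional volume change between two scans (0.5 means +50%)."""
    if baseline_cm3 <= 0:
        raise ValueError("baseline volume must be positive")
    return (followup_cm3 - baseline_cm3) / baseline_cm3


# Usage: a 20 x 20 x 20 voxel lesion at 0.5 x 0.5 x 1.0 mm spacing
lesion = np.ones((20, 20, 20), dtype=bool)
print(tumor_volume_cm3(lesion, (0.5, 0.5, 1.0)))  # 8000 voxels x 0.25 mm^3 = 2.0 cm^3
```

Comparing such volumes across follow-up scans with `relative_growth` is what makes volumetric monitoring more sensitive than a single diameter, because a change in any direction changes the voxel count.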

However, manual measurement of tumor volume is laborious, which makes treatment planning challenging. Measurements also vary because of (1) differing levels of expertise among radiologists and (2) inherent human error. Hence, automating tumor segmentation is imperative for tumor monitoring.

There has been substantial progress in deep-learning-based 3D medical image segmentation, especially since the advent of U-Net [1], because U-Net is able to learn feature maps from many slices. Since then, there have been breakthrough studies building on U-Net [2,3,4]. Because data representations differ across domains, from images to sentences, certain deep learning architectures are known to perform better than others on a given task [5]. The attention mechanism is very popular in natural language processing (NLP) because it enriches the input features and guides the network toward the most relevant elements [6]. There have been attempts to incorporate attention modules into U-Net for medical image segmentation [2,3,7,8]. For example, Yeung et al. introduced a novel dual attention-gated U-Net architecture, called Focus U-Net, for polyp segmentation in colonoscopy images [2].

Training any model requires a large dataset, a good model, and a well-defined loss function and optimizer [9]. The first requirement is hard to meet, because it is challenging to collect a sufficient amount of medical imaging data. Transfer learning from another domain is a conventional strategy in the machine learning community and is widely used to overcome this limitation. The glioma dataset of the Brain Tumor Segmentation (BraTS) benchmark [10,11,12] has been used to evaluate various state-of-the-art segmentation methods. While glioma segmentation methods are being actively studied using the BraTS benchmark [13,14,15,16], relatively few methods have been reported for meningioma segmentation, especially from MRI images. Using BraTS glioma images to enable meningioma segmentation is therefore a domain adaptation problem. Efforts have been made to overcome such problems: Ouyang et al. [17] achieved state-of-the-art performance in 3D CT image segmentation when the model was pre-trained with a different modality, 3D MRI.

Laukamp et al. [18] successfully segmented lesions in meningioma patients using a three-dimensional (3D) convolutional neural network (CNN) trained solely with the BraTS benchmark. Later, Laukamp et al. [19] demonstrated an improved meningioma segmentation model trained with the same 3D CNN but with meningioma MRI images alone, and postulated that training models with matched tumor types is superior to borrowing a model developed for a different tumor type. Bouget et al. reported a meningioma segmentation model trained on a large dataset, which achieved good overall performance but performed worse on small tumors [20].

In brain tumor segmentation studies [18,19], lesion structures are typically classified as contrast-enhancing tumor, non-contrast-enhancing tumor, necrosis, and edema. However, meningioma lesions are much more clinically diverse, ranging from solid tumors to tumors with necrosis, edema, cysts, calcification, or heterogeneous enhancement. Such diverse lesions are expected to hinder efficient neural network training, because these structures are assumed to act as noise. Hence, previous studies have focused only on well-defined lesions [18,19]. To reflect the diversity of real-world clinical data, we used meningioma data with diverse radiological findings to build an automatic deep-learning-based segmentation model.

Recently, fine-tuning U-Net-structured neural networks (TernausNet) pre-trained on large datasets such as ImageNet [21] has yielded good performance in two-dimensional (2D) medical image segmentation [17]. Although a model built in a non-medical domain has fared well in this task, better performance can be expected if the model is trained with medical images.
Inspired by previous studies [16,17,18,19], we used a model trained with BraTS glioma images as our starting point. We chose nnU-Net, proposed by Isensee et al. [22], as the neural network structure. We then extended the definition of the soft Dice loss proposed by Milletari et al. [23] to incorporate brain MRI images without lesions; we named this the balanced Dice loss (BDL). Finally, we used the Adam optimizer to minimize the loss function.

In this paper, we report ablation studies on training strategies for settings in which only scarce medical datasets are available. An automated meningioma segmentation model was built in a series of steps: transfer learning with BraTS glioma images, followed by fine-tuning with meningioma and radiologically clean brain images. We implemented a modified version of the soft Dice loss for an nnU-Net model [22] to enable the model to learn all of the features in our dataset.

3. Results

Our meningioma dataset included MRI scans from follow-up patients. To prevent the same patient's MRIs from appearing in both the training and test sets, the test set (17 MRIs) was randomly drawn from non-follow-up cases. According to the experts' manual segmentation, the mean meningioma volume in the test set was 30.31 cm³ (minimum: 0.24 cm³; maximum: 139.87 cm³) (Figure S1). We compared training strategies that varied the data used for (1) pre-training and (2) fine-tuning. Five-fold cross-validation (CV) was used for hyperparameter selection. The test set was fixed across all strategies, and we report performance scores on this test set.

As shown in Table 1, the 3D U-Net trained with the meningioma dataset achieved a higher Dice score (0.72, sd: 0.28) than the one trained with the BraTS 2019 dataset (0.60, sd: 0.32). As reported by Laukamp et al. [18,19], performance increased when the neural network was trained with the disease of interest, that is, meningioma.
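The Dice score used throughout this section and the soft Dice loss that BDL extends can be made concrete with a short NumPy sketch. It follows the formulation of Milletari et al. [23], with squared terms in the denominator, plus a small smoothing constant `eps`, which is an assumption here (nnU-Net's own implementation may differ in details). It is not an implementation of BDL; it only illustrates why the plain soft Dice loss gives little signal on lesion-free images.

```python
import numpy as np


def dice_score(pred_mask, true_mask) -> float:
    """Hard Dice coefficient 2|A and B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(pred_mask, dtype=bool)
    b = np.asarray(true_mask, dtype=bool)
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * int(np.logical_and(a, b).sum()) / denom


def soft_dice_loss(probs, target, eps: float = 1e-6) -> float:
    """Soft Dice loss in the form of Milletari et al.:
    1 - 2*sum(p*g) / (sum(p^2) + sum(g^2)), with a small eps for stability."""
    p = np.asarray(probs, dtype=float).ravel()
    g = np.asarray(target, dtype=float).ravel()
    numerator = 2.0 * np.sum(p * g) + eps
    denominator = np.sum(p * p) + np.sum(g * g) + eps
    return float(1.0 - numerator / denominator)


# A lesion-free ("normal") image: the ground truth is all zeros.
empty_truth = np.zeros(10_000)
nearly_clean = np.full(10_000, 0.01)  # almost no false-positive probability
poor = np.full(10_000, 0.5)           # heavy false-positive probability
print(soft_dice_loss(nearly_clean, empty_truth))  # ~1.0
print(soft_dice_loss(poor, empty_truth))          # ~1.0
```

On the lesion-free image, the nearly clean prediction and the poor prediction both yield a loss of about 1, so the loss barely distinguishes them; only a prediction that is exactly zero everywhere escapes this. This is consistent with the behavior on normal data reported in the next paragraph.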
This implies that transfer learning [35] from one disease to another requires fine-tuning with data from the latter. Indeed, pre-training with BraTS 2019 followed by fine-tuning with the meningioma dataset increased the Dice score to 0.76 (sd: 0.23). It appears that pre-training not only stabilizes the training process but also helps the model learn features that cannot be learned from the meningioma dataset alone.

We also evaluated the use of normal brain data during training, which increased the sample size from 74 to 84. Transfer learning combined with normal brain data increased the Dice score to 0.79 (sd: 0.23). However, the soft Dice loss did not properly account for the contribution of normal data: for these lesion-free images, the loss remained close to 1. Our BDL can give more weight to normal data through the hyperparameter β. Using five-fold CV, β was optimized to 100. As a result, we achieved a Dice score of 0.84 (sd: 0.15) on the test dataset. The average segmentation performance across all folds was 0.85 (sd: 0.04) (Dice scores of each fold: 0.88, 0.82, 0.86, 0.92, and 0.80), confirming the stability of the model. Although our test set was small, its performance was very similar to the stable cross-validation performance on the larger training set, which suggests that the model did not overfit. Two representative examples of segmentation results from the final model (transfer learning + normal + BDL) are shown in Figure 1; the Dice scores for these two subjects were 0.96 (Figure 1A) and 0.93 (Figure 1B).

4. Discussion

In this study, deep learning was used for the fully automated segmentation of supratentorial meningiomas. To compensate for the relatively small amount of meningioma image data, transfer learning from a large number of publicly available BraTS glioma images was used to produce the initial model for meningioma segmentation. MRIs of both meningiomas and normal brains were then included in the fine-tuning of the final model.

Typical meningiomas appear as dural-based masses that are isointense to gray matter on both T1- and T2-weighted images. To the best of our knowledge, previous studies have focused only on well-defined meningioma MRI samples for the development and evaluation of such models [18,19], whereas real-world imaging appearances vary widely [35].

To reflect this variability, we gathered meningioma MRI images with diverse characteristics, including cysts, calcifications, necrosis, and heterogeneously enhancing lesions. We focused our model on learning the features of supratentorial meningiomas, because infratentorial meningiomas are relatively rare and intermingled with complicated neurovascular structures. As this study is the first trial to assess the utility of automatic segmentation for meningioma, we simplified our MRI dataset.

Bouget et al. investigated automated meningioma segmentation using a single imaging modality (T1) with a lightweight model [20]. However, that model had a severe drawback: its Dice score dropped to about 0.5 at best when the meningioma lesion was smaller than 1 cm³. Small tumors should not be ignored, because tumor growth rates are unpredictable. To assess the consistency of performance across tumor sizes, we categorized tumors into three size levels, from smallest to largest, and created Dice score boxplots (Figure S2). Our model showed a modest decrease in performance for small tumors (~0.7 for Category A tumors (3)).

Although our model performed well on clinically diverse lesions, such appearances were infrequent in our dataset. Hence, performance on the test samples fluctuated to some extent, and it was especially poor for meningiomas with heterogeneous enhancement. As shown in Figure 2A, heterogeneous enhancement due to necrosis was observed inside the tumor, and the model achieved a Dice score of only 0.34 for this subject. Heterogeneous enhancement was also observed in Figure 2B, for which the model achieved a Dice score of 0.84. To overcome this issue, brain images with heterogeneously enhancing lesions should be collected and used for training. If the model becomes mature enough to handle its primary task, meningioma segmentation, we believe it can be further improved, which would also help explain various features of meningioma [36].
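The size-stratified analysis behind Figure S2 can be sketched as follows. The category cut-offs are not stated here, so this minimal NumPy example assumes the three categories are volume tertiles, with "A" as the smallest third; the function name `dice_by_size_category` and the labels are hypothetical, and the inputs in the usage lines are synthetic, not study data.

```python
import numpy as np


def dice_by_size_category(volumes_cm3, dice_scores, labels=("A", "B", "C")):
    """Group per-case Dice scores into three tumor-size categories.

    Assumption: categories are volume tertiles ("A" = smallest third).
    Returns {label: (number_of_cases, mean_dice)}.
    """
    v = np.asarray(volumes_cm3, dtype=float)
    d = np.asarray(dice_scores, dtype=float)
    cuts = np.quantile(v, [1 / 3, 2 / 3])
    # np.digitize gives 0 below cuts[0], 1 between the cuts, 2 at or above cuts[1]
    category = np.digitize(v, cuts)
    summary = {}
    for k, label in enumerate(labels):
        in_cat = category == k
        mean_dice = float(d[in_cat].mean()) if in_cat.any() else float("nan")
        summary[label] = (int(in_cat.sum()), mean_dice)
    return summary


# Synthetic example (not study data): nine cases with increasing volume
volumes = [0.5, 1.0, 2.0, 10.0, 20.0, 30.0, 80.0, 100.0, 140.0]
dices = [0.60, 0.70, 0.80, 0.85, 0.85, 0.85, 0.90, 0.90, 0.90]
print(dice_by_size_category(volumes, dices))
```

Tertiles keep roughly a third of the cases in each category; fixed clinical thresholds, such as the 1 cm³ cut-off discussed for Bouget et al. [20], would be an alternative design choice.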

While most other studies have focused mainly on model architecture to improve performance, our proposed strategy involved transfer learning and the inclusion of normal brain MRIs. To make effective use of normal MRIs, we developed a new loss function, BDL. Notably, performance improved after normal cases were included in the training set.

Our study has some limitations. It only included MRI images from a single institution. Previous studies have attempted to account for inter-hospital or inter-protocol variability by including images from multiple institutions or scanners. Because our model used data from a single institution, its generalizability may be limited, and data collection involving multiple institutions or scanners is required.

Realistic data curation was performed to examine how deep learning can expedite the meningioma segmentation process. We used nnU-Net with widely used components, namely the Adam optimizer and the LeakyReLU activation function. However, other optimizers and loss functions could be tested to determine the robustness of the model. Many recent studies suggest possible improvements, such as the AdamP optimizer of Heo et al. [37]. A performance gain can be expected, because several studies have explained the link between the optimizer and model performance [38]. In future work, a loss function could also be designed specifically for tasks such as meningioma segmentation. Class imbalance, in which the lesion volume is much smaller than the whole brain volume, affects model performance; to address this, Unified Focal loss, which is designed to handle class imbalance, could be tried [39].
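Unified Focal loss [39] generalizes Dice- and cross-entropy-based losses to handle class imbalance. As a minimal sketch of the focal idea it builds on, the NumPy code below implements only the standard binary focal loss of Lin et al., not Unified Focal loss itself. The defaults γ = 2 and α = 0.25 are common choices from that work and are assumptions here, as is the synthetic toy volume with 1% lesion voxels.

```python
import numpy as np


def binary_focal_loss(probs, target, gamma: float = 2.0, alpha: float = 0.25,
                      eps: float = 1e-7) -> float:
    """Mean binary focal loss (Lin et al.): -alpha_t * (1 - p_t)^gamma * log(p_t).

    probs: predicted foreground probabilities; target: 0/1 labels.
    gamma down-weights easy, well-classified voxels; alpha re-weights classes.
    """
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(target, dtype=float)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


# Imbalanced toy volume: 1% lesion voxels, mostly easy background predictions
rng = np.random.default_rng(0)
target = np.zeros(10_000)
target[:100] = 1
probs = np.where(target == 1, 0.6, 0.05) + rng.uniform(-0.02, 0.02, size=10_000)
print(binary_focal_loss(probs, target))             # focal (gamma = 2)
print(binary_focal_loss(probs, target, gamma=0.0))  # alpha-weighted cross-entropy
```

The (1 − p_t)^γ factor shrinks the contribution of the many easy background voxels, so the rare lesion voxels carry relatively more of the loss; with γ = 0 and α = 0.5, the function reduces to half the ordinary binary cross-entropy.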
