Clinically oriented automatic three-dimensional enamel segmentation via deep learning

Data acquisition and image processing

The CBCT image data were selected from the outpatient database of Peking University School and Hospital of Stomatology. A 26-year-old female patient underwent CBCT scanning with an i-CAT machine (Imaging Sciences International, Hatfield, PA, USA) at 120 kVp, 18.45 mAs, a 20-second acquisition time, and a 16 × 13 cm field of view. The voxel size was 250 × 250 × 250 μm. The data were stored in the Digital Imaging and Communications in Medicine (DICOM) 3.0 format, following the guidelines for medical image data. The Hounsfield unit (HU) values in the CBCT data, representing gray scales, were normalized to a range of 0-1 for consistency and ease of analysis.
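As an illustration of the normalization step, the sketch below reads a DICOM slice and min-max scales its gray values to 0-1; the pydicom library and the per-slice min-max scheme are assumptions, since the exact normalization used by the software pipeline is not specified here.

```python
import numpy as np
import pydicom

def load_normalized_slice(dicom_path: str) -> np.ndarray:
    """Read one DICOM slice and rescale its gray values to the 0-1 range."""
    ds = pydicom.dcmread(dicom_path)
    # Map stored pixel values to HU-like gray values via the DICOM rescale tags.
    gray = ds.pixel_array.astype(np.float32) * float(ds.RescaleSlope) + float(ds.RescaleIntercept)
    # Hypothetical min-max normalization; the actual scaling scheme may differ.
    return (gray - gray.min()) / (gray.max() - gray.min() + 1e-8)
```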

Data annotation and data augmentation

Data annotation

The CBCT DICOM data sets of 30 teeth were imported into the image processing and 3D reconstruction software Dragonfly (v. 2024.1, Object Research Systems, Montreal, Canada) for preprocessing and annotation. Image annotation and neural network training were conducted in the Segmentation Wizard module of the Dragonfly software. Because previous attempts yielded poor enamel segmentation results on the occlusal surface, the coronal plane was chosen for image annotation and segmentation. The methodology of this study is depicted in Fig. 1.

The data processing in this study focuses on the enamel region of the teeth. We attempted to annotate and train on enamel using three datasets generated from the CBCT volume in three different orientations. The data before and after annotation are shown in Fig. 1. To acquire gold-standard labels, every selected frame was manually annotated into different classes, including enamel, dentin, cortical bone, trabecular bone, and pulp. A “frame” is defined as a single slice or a portion of a slice. Manual annotation was performed by an engineer and confirmed by an oral and maxillofacial radiologist.

Data augmentation

To expand the volume and diversity of the image data, we developed a data augmentation strategy that applies targeted transformations to generate additional training instances. This enriches the training dataset by exposing the model to a broader range of inputs. Data augmentation was applied to all frames in the dataset, resulting in a tenfold increase in the number of annotated slices. The augmentation parameters used during training were set as follows: augmentation factor = 10, with horizontal flipping, vertical flipping, rotation up to 180 degrees, shearing up to 2 degrees, scaling between 90% and 110%, and brightness adjustment between 0.75 and 1.25. The dataset comprised 25,481,018 voxels, of which 326,519 (1.28%) were used; of these, 8,393 (2.57%) were marked as enamel. The total number of training patches was 5,615, and the total number of validation patches was 72. To develop and evaluate the model, we randomly partitioned our institutional dataset into training and validation subsets, with 80% of the data allocated for training and 20% for validation. The validation subset was used exclusively to assess the model’s performance.
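A minimal sketch of an equivalent augmentation pipeline is given below. The albumentations library and the variable names `frame` and `label` are illustrative assumptions; Dragonfly applies its own augmentation internally, and the brightness mapping is approximate.

```python
import albumentations as A

# Transform set mirroring the augmentation parameters listed above.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate(limit=180, p=0.5),                        # rotation up to 180 degrees
    A.Affine(shear=(-2, 2), scale=(0.9, 1.1), p=0.5),  # shear up to 2 deg, scale 90-110%
    A.RandomBrightnessContrast(brightness_limit=0.25, contrast_limit=0.0, p=0.5),
])

# `frame` (a 2-D gray-scale slice) and `label` (its annotation) are hypothetical arrays;
# each frame is augmented ten times and the mask receives the same geometric transforms.
augmented_pairs = [augment(image=frame, mask=label) for _ in range(10)]
```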

Fig. 1

Overview of the study recruitment process, gold standard generation, and network architecture of 2.5D Attention U-Net

Neural network

2.5D U-Net

The U-Net neural network, a convolutional neural network, derives its name from the resemblance of its architecture to the letter “U.” The network comprises two primary components, the encoding and decoding paths, which enable segmentation of images of arbitrary size.

Given the superior performance of Attention U-Net in small-sample training and the benefits of U-Net for three-dimensional model reconstruction, we employed a 2.5D Attention U-Net approach (using the current layer together with its adjacent layers) for training. This method aims to progressively expand the dataset to achieve an optimal neural network model for enamel segmentation.
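The sketch below illustrates how such a 2.5D input can be assembled, with each coronal slice stacked together with its two neighbors as a three-channel input; the function and array layout are illustrative assumptions rather than Dragonfly's internal implementation.

```python
import numpy as np

def make_25d_stack(volume: np.ndarray, index: int) -> np.ndarray:
    """Stack slice `index` with its two neighbors along a new channel axis
    (the 2.5D input described above). `volume` has shape (slices, H, W)."""
    lo = max(index - 1, 0)
    hi = min(index + 1, volume.shape[0] - 1)
    # Edge slices reuse the nearest available neighbor.
    return np.stack([volume[lo], volume[index], volume[hi]], axis=0)  # shape (3, H, W)
```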

The dataset was divided into two subsets, with 80% allocated for training and 20% for validation, to monitor model performance and reduce the risk of overfitting during training. Additionally, an independent validation set comprising manually annotated regions with diverse anatomical variations was used. This external validation step was designed to assess the model’s robustness and generalizability, ensuring that its performance was not overly optimized for the training data. The Attention U-Net model was provided by the Dragonfly software, with the input slice count set to 3, the depth level to 4, and an initial filter count of 64. We employed early stopping to prevent overfitting, halting training when the Dice coefficient did not improve over ten consecutive epochs. The resulting model, consisting of 5,528,393 parameters, was implemented within Dragonfly on a Windows workstation and trained on an NVIDIA Quadro RTX 5000 GPU.
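The early-stopping rule can be stated as the short sketch below, which assumes that the monitored quantity is the validation Dice coefficient recorded once per epoch; Dragonfly handles this logic internally.

```python
def should_stop(dice_history: list[float], patience: int = 10) -> bool:
    """Return True when the monitored Dice coefficient has not improved
    for `patience` consecutive epochs (the early-stopping rule described above)."""
    if len(dice_history) <= patience:
        return False
    best_before_window = max(dice_history[:-patience])
    return max(dice_history[-patience:]) <= best_before_window
```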

Model evaluation

To quantify the segmentation accuracy of the 2.5D Attention U-Net, the Dice similarity coefficient [19] was used. Surface deviations between the gold standards and the 2.5D Attention U-Net-based 3D models were calculated to evaluate the segmentation accuracy around the edges of enamel structures.
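For reference, the Dice similarity coefficient between a predicted segmentation \(X\) and its gold-standard reference \(Y\) is defined as \(\mathrm{DSC} = 2\,|X \cap Y| \,/\, (|X| + |Y|)\), ranging from 0 (no overlap) to 1 (perfect agreement).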

Generation of the 3D surface model of enamel

Once a satisfactory image segmentation model was obtained, it was applied to all sections to generate a comprehensive 3D segmentation model. Using the distance map and watershed algorithm, the teeth were separated from the overall region of interest (ROI) into multiple ROIs, with each tooth representing an individual ROI. Boolean operations were then employed to separate the enamel of each tooth into a distinct ROI, enabling the creation of a 3D enamel model for each tooth. The thickness mesh tool then generated the thickness grids, which underwent Laplacian smoothing with a single iteration to produce the final 3D enamel thickness model.
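A minimal sketch of the tooth-separation step is shown below, using SciPy/scikit-image equivalents of Dragonfly's distance-map and watershed tools; the marker spacing (`min_distance=20`) and array names are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def split_teeth(tooth_mask: np.ndarray) -> np.ndarray:
    """Separate a binary mask of all teeth into one labeled ROI per tooth
    using a distance map and marker-based watershed."""
    distance = ndimage.distance_transform_edt(tooth_mask)
    # Local maxima of the distance map serve as one seed marker per tooth.
    peaks = peak_local_max(distance, min_distance=20, labels=tooth_mask)
    markers = np.zeros(tooth_mask.shape, dtype=np.int32)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    return watershed(-distance, markers, mask=tooth_mask)

# Example (hypothetical arrays): a Boolean AND isolates the enamel of tooth 3 as its own ROI.
# enamel_tooth_3 = enamel_mask & (split_teeth(all_teeth_mask) == 3)
```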

We computed thickness values from the mesh of a single, closed 3D shape using a directional method known as normal ray tracing. In this method, a ray is cast from each point on the mesh surface along the surface normal, and its intersection with the opposite surface is computed to obtain the thickness value. The Euclidean (straight-line) distance between the two intersection points is taken as the thickness at the given point. If the two surfaces containing the intersection points are not parallel, the thickness values computed at the two opposite points will differ depending on the ray direction, which may lead to conflicting results for the pair of points.
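The sketch below shows one way to implement normal ray tracing on a closed surface mesh, using the trimesh library as an assumed stand-in for Dragonfly's thickness-mesh tool.

```python
import numpy as np
import trimesh

def normal_ray_thickness(mesh: trimesh.Trimesh) -> np.ndarray:
    """Estimate the thickness at every vertex by casting a ray along the inward
    surface normal and measuring the Euclidean distance to the opposite surface."""
    # Nudge the ray origins slightly inside the surface so the ray does not
    # immediately re-intersect the surface it starts from.
    origins = mesh.vertices - mesh.vertex_normals * 1e-4
    directions = -mesh.vertex_normals
    locations, ray_ids, _ = mesh.ray.intersects_location(
        ray_origins=origins, ray_directions=directions, multiple_hits=False)
    thickness = np.full(len(mesh.vertices), np.nan)
    thickness[ray_ids] = np.linalg.norm(locations - mesh.vertices[ray_ids], axis=1)
    return thickness
```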

The enamel segmentation was performed on CBCT images using the trained deep learning model, accurately delineating enamel regions as ROIs. For each tooth, the enamel volume was calculated by determining the total number of voxels within the segmented ROI and multiplying this by the voxel volume, derived from the known CBCT voxel dimensions. The specific calculation method is as follows:

The enamel volume \(V\) was calculated as \(V = N \times v\), where \(N\) is the total number of voxels within the segmented region of interest (ROI) and \(v\) is the volume of a single voxel, given by the cube of the voxel size.
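As a minimal sketch of this calculation, assuming a binary enamel mask and the 250 μm isotropic voxel size reported above:

```python
import numpy as np

VOXEL_SIZE_MM = 0.25                    # 250 um isotropic voxel (see acquisition parameters)
VOXEL_VOLUME_MM3 = VOXEL_SIZE_MM ** 3   # v = 0.015625 mm^3 per voxel

def enamel_volume_mm3(enamel_mask: np.ndarray) -> float:
    """V = N * v: count enamel voxels in the segmented ROI and scale by the voxel volume."""
    return int(np.count_nonzero(enamel_mask)) * VOXEL_VOLUME_MM3
```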
