Accuracy of automated segmentation and volumetry of acute intracerebral hemorrhage following minimally invasive surgery using a patch-based convolutional neural network in a small dataset

Approval of the institutional review board was obtained, and the requirement for informed consent was waived. We selected patients with supratentorial intracerebral hemorrhage (ICH) treated with minimally invasive surgery (MIS) from a retrospective database. Inclusion criteria were age ≥ 18 years and available CT imaging. No exclusions were made based on scanner model, scanner settings, voxel size, or presence of artefacts.

Ground truth (GT) annotation and development of the CNN were carried out using a local instance of the Nora imaging platform (https://www.nora-imaging.com). Image calculations were done using MATLAB (MATLAB R2021a, The MathWorks). Statistical evaluation of the results and plotting were done using R software version 4.2.0 [15].

Imaging datasets

Thirty-nine scans from 29 patients examined between 2011 and 2018 were randomly selected from our database. To avoid data leakage, we partitioned the data at the patient level, ensuring that all scans of a given patient, including repeat examinations, were assigned to the same partition. We randomly divided the data into training (n = 21 patients / 29 scans), validation (n = 3 patients / 4 scans) and testing (n = 5 patients / 6 scans) sets. To reduce the influence of random patient selection on the results, we added consecutive, previously unincluded patients examined between 2010 and 2012 to the testing dataset, for a total of 59 scans belonging to 44 patients. The mean age was 70 years (SD ± 13.56), and there were 36 male patients (52.9%).
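A patient-level split can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `split_by_patient`, the `(scan_id, patient_id)` input format and the fixed seed are hypothetical. The key point is that patients, not scans, are shuffled and assigned, so repeat examinations never straddle two partitions.

```python
import random
from collections import defaultdict


def split_by_patient(scans, n_train, n_val, seed=0):
    """Hypothetical helper: assign scans to train/validation/test by patient.

    scans: list of (scan_id, patient_id) tuples.
    n_train, n_val: number of *patients* in the training and validation sets;
    all remaining patients go to the test set.
    """
    # Group scan IDs by patient so repeat examinations stay together.
    by_patient = defaultdict(list)
    for scan_id, patient_id in scans:
        by_patient[patient_id].append(scan_id)

    # Shuffle patients (not scans) reproducibly.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    groups = {
        "train": patients[:n_train],
        "validation": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Expand each patient group back into its list of scans.
    return {
        name: [scan for p in members for scan in by_patient[p]]
        for name, members in groups.items()
    }
```

With the counts reported above, the call would be `split_by_patient(scans, n_train=21, n_val=3)` on the 29-patient pool, leaving 5 patients for testing.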

Ground truth

Segmentation masks of the ICH and of the intracranial part of the drain were manually delineated by a neuroradiologist with three years of experience (A.E.). To make the two masks non-overlapping, voxels labeled in both masks were subsequently identified and assigned to a single mask by a threshold operation: voxels ≥ 100 HU were assigned to the drain mask and the remaining overlapping voxels to the ICH mask.
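The overlap resolution can be sketched with NumPy boolean masks. This is a minimal sketch, assuming the masks and the CT volume share the same voxel grid and that overlapping voxels below the threshold go to the ICH mask; the function name `resolve_overlap` is hypothetical.

```python
import numpy as np


def resolve_overlap(ct_hu, ich_mask, drain_mask, drain_threshold_hu=100):
    """Hypothetical helper: make ICH and drain masks mutually exclusive.

    Voxels labeled in both masks go to the drain if their CT value is
    >= drain_threshold_hu and to the ICH otherwise (assumption).
    Non-overlapping voxels keep their original label.
    """
    ich = np.asarray(ich_mask, dtype=bool)
    drain = np.asarray(drain_mask, dtype=bool)
    overlap = ich & drain

    # Split the overlap by the HU threshold.
    to_drain = overlap & (np.asarray(ct_hu) >= drain_threshold_hu)
    to_ich = overlap & ~to_drain

    # Remove each reassigned voxel from the mask it no longer belongs to.
    return ich & ~to_drain, drain & ~to_ich
```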

CNN segmentation of ICH and drain

No preprocessing was applied to the CT data. The CNN model was developed with the Patchwork CNN Toolbox [12]. The input to the CNN was the CT image in Hounsfield units (HU). Instead of normalizing or cropping the image, an initial channel-splitting layer was used. This layer separates the input value range into 11 feature channels, each sensitive to a particular HU range. The approach was inspired by the windowing that radiologists use when reading images, dividing the full HU range into separate sub-ranges, e.g., CT windows for soft tissue or bone. The ranges are initialized with the following centers: [-1000, -500, -100, -50, -25, 0, 25, 50, 100, 500, 1000] HU, and are further refined during training. Three hierarchical scales (patches) were used; the finest scale was reformatted to 1-mm isotropic voxels.
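The idea of a channel-splitting layer can be illustrated with a fixed, non-trainable stand-in. This sketch is not the Patchwork implementation, whose exact formulation is not described here: it assumes a Gaussian response per channel and uses hypothetical widths (half the distance to the nearest neighboring center). In the actual model, the centers (and presumably the widths) are learned parameters.

```python
import numpy as np

# Initial window centers in HU, as listed in the text.
CENTERS_HU = np.array(
    [-1000, -500, -100, -50, -25, 0, 25, 50, 100, 500, 1000], dtype=float
)


def split_channels(ct_hu, centers=CENTERS_HU, widths=None):
    """Map a CT array in HU to one soft 'window' channel per center.

    Assumption: each channel responds with a Gaussian around its center.
    Returns an array of shape ct_hu.shape + (len(centers),).
    """
    if widths is None:
        # Hypothetical widths: half the gap to the nearest neighboring center.
        gaps = np.diff(centers)
        nearest = np.minimum(np.r_[gaps[0], gaps], np.r_[gaps, gaps[-1]])
        widths = nearest / 2.0
    x = np.asarray(ct_hu, dtype=float)[..., np.newaxis]  # trailing channel axis
    return np.exp(-(((x - centers) / widths) ** 2))
```

Narrow widths near 0 HU give fine contrast in the soft-tissue and blood range, while the wide outer channels coarsely flag air and bone, which mirrors the windowing analogy above.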

To determine the best model parameters, we initially tested six different combinations on 10⁶ image patches, experimenting with two different versions of three model parameters:

1.

Feature dimensions in each scale: [8, 16, 16, 32, 64] or [8, 16, 16, 32, 64, 64]

2.

Loss function: categorical or binary cross-entropy.

3.

Augmentation at each level of the network: either rotation angle 0.2, right-left flipping and 10–20% zooming; or rotation angle 0.4, flipping in all dimensions, 10–20% zooming and random uniform scaling of the voxel values in each scale.
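The flipping and intensity-scaling parts of the second augmentation setting can be sketched in NumPy. This is a hedged illustration only: rotation and zooming are omitted because they require interpolation, the scaling range `(0.9, 1.1)` is a hypothetical value not given in the text, and treating axis 0 as the right-left axis is an assumption.

```python
import numpy as np


def augment_patch(patch, rng, flip_all_axes=True, scale_range=(0.9, 1.1)):
    """Sketch of flip + random uniform intensity scaling for one patch.

    flip_all_axes=False flips only axis 0 (assumed right-left).
    scale_range is hypothetical; pass None to disable intensity scaling.
    """
    out = np.array(patch, dtype=float, copy=True)
    axes = range(out.ndim) if flip_all_axes else [0]
    for axis in axes:
        # Flip each selected axis with probability 0.5.
        if rng.random() < 0.5:
            out = np.flip(out, axis=axis)
    if scale_range is not None:
        # Multiply all voxel values by one random factor.
        out = out * rng.uniform(*scale_range)
    return out
```

In a multi-scale model, a function like this would be applied consistently to the patches of each scale; passing a `numpy.random.Generator` keeps the augmentation reproducible.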

Performance measures

We used DeepMind's surface-distance library (https://github.com/deepmind/surface-distance) to compute the following overlap and spatial distance metrics:

1.

Dice similarity coefficient (DSC), which measures the overlap of two sets of points.

2.

Surface DSC, which measures the overlap of the surfaces of two sets of points at a specified tolerance (1 mm). Because it scores agreement between boundaries rather than overlap of volumes, the surface DSC is better suited than the DSC for assessing performance in 3D segmentation tasks [16].

3.

Surface overlap, which measures the average overlap at a specified tolerance (1 mm) and returns two values: the average overlap from the GT surface to the predicted surface and vice versa.

4.

Hausdorff distance, which measures the maximum distance between two sets of points. Because this maximum is sensitive to outliers, both the Hausdorff100 (the maximum distance) and the Hausdorff95 (the 95th percentile of the distances, which discards the largest 5%) were evaluated.

5.

Average surface distance, which measures the mean distance between the surfaces of two sets of points and returns two values: the average distance from the GT surface to the predicted surface and vice versa [16, 17].
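The metrics above can be approximated in plain NumPy for small masks. This is a simplified sketch, not the surface-distance library: it represents each surface by its boundary voxels (voxels with at least one 6-neighbor outside the mask) without the library's surface-element area weighting, uses brute-force pairwise distances, and assumes both masks are non-empty. The Hausdorff95 here is taken as the larger of the two directed 95th percentiles, one common definition.

```python
import numpy as np


def dice(gt, pred):
    """Volumetric Dice similarity coefficient of two binary masks."""
    gt, pred = np.asarray(gt, bool), np.asarray(pred, bool)
    denom = gt.sum() + pred.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(gt, pred).sum() / denom


def boundary_points(mask, spacing_mm):
    """Coordinates (mm) of mask voxels with a 6-neighbor outside the mask."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    interior = mask.copy()
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # Neighbor along this axis; padding supplies "outside" at the edges.
            interior &= np.roll(padded, shift, axis=axis)[core]
    return np.argwhere(mask & ~interior) * np.asarray(spacing_mm, dtype=float)


def surface_metrics(gt, pred, spacing_mm=(1.0, 1.0, 1.0), tolerance_mm=1.0, percentile=95):
    a = boundary_points(gt, spacing_mm)
    b = boundary_points(pred, spacing_mm)
    # Brute-force pairwise distances: O(N*M) memory, fine for toy masks only.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    gt_to_pred = dist.min(axis=1)  # each GT surface point to nearest predicted point
    pred_to_gt = dist.min(axis=0)  # each predicted surface point to nearest GT point
    within_a = gt_to_pred <= tolerance_mm
    within_b = pred_to_gt <= tolerance_mm
    return {
        "hausdorff100": float(max(gt_to_pred.max(), pred_to_gt.max())),
        "hausdorff95": float(max(np.percentile(gt_to_pred, percentile),
                                 np.percentile(pred_to_gt, percentile))),
        "avg_surface_distance": (float(gt_to_pred.mean()), float(pred_to_gt.mean())),
        "surface_overlap": (float(within_a.mean()), float(within_b.mean())),
        "surface_dice": float((within_a.sum() + within_b.sum()) / (len(a) + len(b))),
    }
```

For example, a 3 × 3 × 3 cube compared with a copy shifted by one voxel has a DSC of 2/3, yet every surface point lies within 1 mm of the other surface, so the surface DSC at 1-mm tolerance is 1. This illustrates why the two overlap metrics can diverge.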

The top-performing model on the validation dataset was trained using 1.2 × 10⁷ patches. The model output is a 4D NIfTI object containing two 3D 1-mm isotropic volumes, which give the probability of each voxel belonging to ICH/drain or to the background. Binary masks were produced by thresholding these probability maps, with the threshold chosen to optimize the performance measures of the CNN. Because each 1-mm isotropic voxel has a volume of 1 mm³, the ICH volume was calculated by summing the voxels of the ICH mask.
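Thresholding and volumetry can be sketched as follows. This is a minimal sketch under stated assumptions: the candidate thresholds and the choice of Dice as the optimized measure are hypothetical (the text does not specify either), and `pick_threshold` and `ich_volume_ml` are illustrative names. The volume conversion uses 1 mL = 1000 mm³.

```python
import numpy as np


def dice(a, b):
    """Volumetric Dice of two binary masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


def pick_threshold(prob, gt, candidates=np.linspace(0.1, 0.9, 9)):
    """Hypothetical: choose the probability threshold that maximizes Dice."""
    scores = [dice(prob >= t, gt) for t in candidates]
    return float(candidates[int(np.argmax(scores))])  # first best on ties


def ich_volume_ml(prob_ich, threshold, voxel_volume_mm3=1.0):
    """ICH volume in mL from a probability map on a 1-mm isotropic grid."""
    mask = np.asarray(prob_ich) >= threshold
    return float(mask.sum() * voxel_volume_mm3 / 1000.0)  # 1 mL = 1000 mm^3
```

For instance, a mask of 25,000 voxels on a 1-mm isotropic grid corresponds to 25 mL.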

Comparison with no-new-U-Net (nnU-Net)

Isensee et al. [18] published nnU-Net, a self-adapting semantic segmentation method that performed well on a wide variety of medical imaging datasets and achieved top placements in multiple segmentation challenges. Using the same data partitions, we trained and tested an nnU-Net model on our datasets.

Statistical tests

To assess the agreement between predicted and GT ICH volumes, we calculated the intraclass correlation coefficient (ICC) [19]. We also generated concordance plots and Bland–Altman plots [20] to visualize the agreement between the two measurements.
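The agreement statistics can be sketched in NumPy. The text does not state which ICC form was used, so this sketch assumes the two-way random-effects, absolute-agreement, single-measure form (Shrout and Fleiss ICC(2,1)), which penalizes systematic bias between methods. The Bland–Altman summary uses the usual bias ± 1.96 SD limits of agreement. Both function names are hypothetical.

```python
import numpy as np


def icc_2_1(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    data = np.column_stack([x, y]).astype(float)  # rows: subjects, cols: methods
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((data - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def bland_altman(pred, gt):
    """Bias and 95% limits of agreement of pred - gt."""
    diff = np.asarray(pred, float) - np.asarray(gt, float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A constant offset between methods leaves the Pearson correlation at 1 but lowers this ICC below 1 and shows up as a non-zero Bland–Altman bias, which is why absolute-agreement measures suit volumetry.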
