Validity and accuracy of automatic Cobb angle measurement on 3D spinal ultrasonographs for children with adolescent idiopathic scoliosis: SOSORT 2024 award winner

Patient population

Ethics approval was received from the University of Alberta Health Research Ethics Board to extract 144 3D spinal ultrasonographs of children with adolescent idiopathic scoliosis (AIS) from the local scoliosis clinic. The inclusion criteria were: (1) diagnosed with AIS, (2) had a major radiographic Cobb angle between 10° and 55°, (3) had no prior surgery, and (4) had an out-of-brace PA radiograph taken on the same day. A training set of 70 ultrasonographs was used to train a machine learning model. Data augmentation by random zooming, translation, rotation, contrast adjustment, and noise addition increased the effective size of the training data to more than 100,000 images. The remaining 74 ultrasonographs formed a measurement validation set, which was used to evaluate the accuracy and reliability of the final algorithm. The measurement validation set was selected randomly, that is, without looking at any subject’s ultrasonograph, PA radiograph, or Cobb angle measurements.
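The augmentation step can be illustrated with a minimal sketch. It assumes that each ultrasonograph is a 3D NumPy array and that the lamina-center labels are voxel coordinates in the same axis order. The geometric transforms (zoom, rotation, translation) must therefore be applied to the labels as well as to the image. The function name `augment` and all parameter ranges are hypothetical and are not the settings used in the study.

```python
import numpy as np
from scipy import ndimage


def augment(volume, landmarks, rng):
    """Return one randomly augmented copy of a 3D volume and its landmarks.

    volume: 3D array; landmarks: (N, 3) lamina-center coordinates in voxel
    units, in the same axis order as the volume. Parameter ranges are
    illustrative assumptions, not the values used in the study.
    """
    c = (np.array(volume.shape, dtype=float) - 1.0) / 2.0  # volume center
    z = rng.uniform(0.9, 1.1)                               # random zoom factor
    theta = np.deg2rad(rng.uniform(-10.0, 10.0))            # random rotation
    t = rng.uniform(-5.0, 5.0, size=3)                      # random shift (voxels)

    # Rotation in the plane spanned by array axes 0 and 1.
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])

    # Forward map of a point p: o = z * rot @ (p - c + t) + c.
    # ndimage.affine_transform expects the inverse (output -> input) map.
    inv = rot.T / z
    offset = c - t - inv @ c
    out = ndimage.affine_transform(volume.astype(float), inv, offset=offset,
                                   order=1, mode="constant", cval=0.0)
    # Apply the same forward map to the labels so they stay aligned.
    new_landmarks = z * (landmarks - c + t) @ rot.T + c

    # Contrast adjustment about the mean intensity, then additive Gaussian noise.
    mean = out.mean()
    out = (out - mean) * rng.uniform(0.8, 1.2) + mean
    out = out + rng.normal(0.0, 0.02 * out.std(), size=out.shape)
    return out, new_landmarks
```

Calling `augment` repeatedly with one `np.random.default_rng(seed)` generator produces a stream of distinct augmented copies from each training ultrasonograph.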

Each ultrasonograph was obtained using a SonixTablet integrated with a SonixGPS system (BK Medical, Massachusetts, USA). Subjects were instructed to stand in a standardized posture similar to the one used for PA radiographs. The C5-2/60 GPS curvilinear convex transducer was then moved along the surface of the subject’s back, following the lateral curvature of the spine in the coronal plane, from the seventh cervical vertebra (C7) to the lowest lumbar vertebra (L5). The system tracked the transducer's position and orientation throughout the scan. The scanning parameters were a frequency of 2.5 MHz, an imaging depth of 6 cm, and a gain of 10% with linear time gain compensation. Approximately 700–1000 axial B-mode images were acquired per spine at a resolution of 0.2 mm per pixel, together with the transducer's position and orientation data [14]. The axial B-mode images were then stitched together using the position and orientation data to generate 3D reconstructions. This was done with an in-house program, the Medical Imaging Analysis System (version 10.3.41.6). The matching PA radiographs were acquired on the same day using either a conventional digital X-ray system (Philips, Canada) or the EOS system (EOS Imaging Inc., France).
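Stitching tracked 2D frames into a volume can be illustrated with pixel-nearest-neighbour (PNN) compounding, a standard freehand 3D ultrasound reconstruction technique. The source does not state which method the Medical Imaging Analysis System uses, so PNN is an assumed stand-in. Each pixel is mapped into tracker coordinates using its frame's rotation and translation, then assigned to the nearest voxel, and overlapping contributions are averaged. The function name and the in-plane axis convention (x along image columns, y along depth) are hypothetical.

```python
import numpy as np


def reconstruct_pnn(frames, rotations, translations, pixel_mm=0.2, voxel_mm=0.2):
    """Pixel-nearest-neighbour (PNN) compounding of tracked 2D frames into 3D.

    frames: (F, H, W) B-mode intensities.
    rotations: (F, 3, 3) rotation of each image plane into the tracker frame.
    translations: (F, 3) position of each image's top-left pixel, in mm.
    Returns the voxel volume (axes follow tracker x, y, z) and its origin in mm.
    """
    n_frames, h, w = frames.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # In-plane pixel positions in mm: x along columns, y along depth (rows), z = 0.
    local = np.stack([cols.ravel() * pixel_mm, rows.ravel() * pixel_mm,
                      np.zeros(h * w)], axis=1)
    # Map every pixel of every frame into tracker coordinates.
    world = np.einsum("fij,pj->fpi", rotations, local) + translations[:, None, :]
    world = world.reshape(-1, 3)
    values = frames.reshape(-1).astype(float)

    # Assign each pixel to its nearest voxel and average overlapping contributions.
    origin = world.min(axis=0)
    idx = np.rint((world - origin) / voxel_mm).astype(int)
    shape = tuple(idx.max(axis=0) + 1)
    total = np.zeros(shape)
    count = np.zeros(shape)
    np.add.at(total, tuple(idx.T), values)
    np.add.at(count, tuple(idx.T), 1.0)
    volume = np.divide(total, count, out=np.zeros(shape), where=count > 0)
    return volume, origin
```

A real system would also apply the probe-to-sensor calibration transform and would typically fill empty voxels by interpolation. Both steps are omitted here for brevity.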

Automatic measurement method

A convolutional neural network (CNN), a type of machine learning model designed to process image data, was trained to predict the locations of the centers of the laminae on an input 3D spinal ultrasonograph. To provide training labels, a rater with over 20 years of scoliosis research experience labelled the centers of the laminae in 3D on the 70-image training set. Labelling was performed in a custom graphical user interface developed in Python. The subject’s matching PA radiograph was displayed during labelling to guide placement of the lamina centers.

Once trained, the CNN predicted the 3D coordinates of the lamina centers from an input 3D ultrasonograph. A post-processing algorithm was developed to derive coronal curve angle measurements from these predicted centers. The algorithm worked in three steps:
(1) it paired the lamina centers belonging to the same vertebra;
(2) it calculated the tilt angle of each lamina pair;
(3) it measured the coronal curve angle as the difference between the tilt angles of the most steeply tilted vertebrae tilting in opposite directions.
Although the lamina centers were predicted in 3D, only their coordinates in the coronal plane were used to calculate the coronal curve angle. The coronal plane was defined by the global positioning system of the SonixGPS system.
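The post-processing steps can be sketched as follows, assuming a simple input format. Each side's predicted lamina centers are given as an (N, 3) array with axes ordered (lateral, craniocaudal, anteroposterior), with exactly one left and one right center per vertebra. Under that assumption, sorting each side by height and zipping the two lists pairs them. The helper names, the axis order, and the pairing rule are all hypothetical. The study's own pairing logic is not described in the text.

```python
import numpy as np


def pair_laminae(left, right):
    """Hypothetical pairing step: match left and right lamina centers by level.

    left, right: (N, 3) predicted centers, assumed ordered as (x lateral,
    y craniocaudal, z anteroposterior), one center per side per vertebra.
    Only the coronal-plane (x, y) coordinates are returned.
    """
    left = left[np.argsort(left[:, 1])]
    right = right[np.argsort(right[:, 1])]
    return left[:, :2], right[:, :2]


def tilt_angles(left_xy, right_xy):
    """Tilt of each left-to-right lamina line relative to the horizontal, in degrees."""
    d = right_xy - left_xy
    return np.degrees(np.arctan2(d[:, 1], d[:, 0]))


def coronal_curve_angle(tilts):
    """Difference between the most positive and most negative vertebral tilts."""
    return float(tilts.max() - tilts.min())


# Usage: three vertebrae tilted +26.6°, 0° and -26.6° give a 53.1° curve.
ys = np.array([0.0, 30.0, 60.0])
left = np.column_stack([np.zeros(3), ys, np.zeros(3)])
right = np.column_stack([np.full(3, 10.0), ys + [5.0, 0.0, -5.0], np.zeros(3)])
angle = coronal_curve_angle(tilt_angles(*pair_laminae(left, right)))
```

Taking the global maximum and minimum tilt matches the single-curve description in the text. Spines with several curves would need the tilt profile split into segments, which this sketch does not attempt.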

After each measurement, the algorithm also output a visualization of how the coronal curve angle was measured. This visualization showed the coronal projection image of the ultrasonograph with the predicted lamina pairs overlaid on top of it. The vertebral tilts used for the measurement were highlighted.

Measurement validation

The performance of the measurement algorithm was assessed through all pairwise comparisons of three sets of measurements on the 74-image measurement validation set: automatic ultrasound (A-US) coronal curve angle measurements, manual ultrasound (M-US) coronal curve angle measurements, and manual radiographic (M-Xray) Cobb angle measurements.

The A-US versus M-Xray comparison was investigated because Cobb angle measurement on a PA radiograph is the gold standard. The accuracy and reliability between the two therefore directly indicate whether ultrasonography could feasibly replace radiography. The M-US versus M-Xray comparison served as a control. It showed whether any errors in the A-US measurements, relative to the M-Xray measurements, could be explained by inherent discrepancies between the imaging modalities rather than by faults in the automatic measurement algorithm.

All A-US measurements were run on a Windows computer with an NVIDIA GeForce RTX 3060 Ti GPU and an Intel Core i7-12700 CPU. All M-US measurements were performed by a researcher with over 20 years of scoliosis research experience. The researcher used the previous radiograph method, in which the subject’s previous PA radiograph is overlaid on the coronal projection of the ultrasonograph to improve measurement accuracy [6]. All M-Xray measurements were performed by clinicians with over 20 years of experience. Both the researcher and the clinicians were blinded to the A-US, M-US, and M-Xray measurements.

For each comparison, accuracy and reliability were quantified using four statistics: the mean absolute error (MAE), the standard deviation of the absolute errors (SD), the inter-method intraclass correlation coefficient (ICC2,1), and the standard error of measurement (SEM). The ICC2,1 was interpreted qualitatively using Koo’s categories of poor (< 0.5), moderate (0.5–0.75), good (0.75–0.90), and excellent (≥ 0.90) [15]. The percentage of measurements within clinical acceptance was also calculated, where a pair of measurements was considered clinically acceptable if their absolute difference was ≤ 5°. This threshold is based on the intra-observer and inter-observer variation of manual Cobb angle measurement on both radiographs [16] and ultrasonographs [6]. Bland–Altman analysis was also performed to assess agreement between the measurement methods [17]. Finally, the error index was calculated to evaluate vertebral level agreement in each comparison [18].
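The per-comparison statistics can be sketched for two paired arrays of angles in degrees. ICC2,1 is computed with the Shrout and Fleiss two-way random-effects, absolute-agreement, single-measure formula. Three choices here are assumptions, because the text does not specify them:
- the SEM is computed as SD × √(1 − ICC), using the pooled SD of all measurements;
- SDs use the sample (ddof=1) estimator;
- the Bland–Altman limits of agreement are the mean difference ± 1.96 SD of the differences.

The error index is not shown because its definition is given only in reference [18].

```python
import numpy as np


def icc_2_1(x, y):
    """Shrout & Fleiss ICC(2,1): two-way random effects, absolute agreement, single measure."""
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()   # between methods
    ss_err = ((data - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def agreement_summary(a, b, threshold=5.0):
    """MAE, SD, ICC2,1, SEM, % within threshold, and Bland-Altman bias/limits."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b
    abs_err = np.abs(diff)
    icc = icc_2_1(a, b)
    # Assumed SEM definition: pooled SD * sqrt(1 - ICC).
    sem = np.concatenate([a, b]).std(ddof=1) * np.sqrt(max(0.0, 1.0 - icc))
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return {
        "MAE": abs_err.mean(),
        "SD": abs_err.std(ddof=1),
        "ICC2,1": icc,
        "SEM": sem,
        "within_5deg_pct": 100.0 * (abs_err <= threshold).mean(),
        "bias": bias,
        "LoA": (bias - half_width, bias + half_width),
    }
```

Because ICC2,1 measures absolute agreement, a constant offset between methods lowers it. This is the appropriate behavior when asking whether one method could replace another.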

To identify potential systematic biases in the measurement methods, results were analyzed by curve severity and by curve region. Curve severity was divided into two groups: mild (< 25°) and moderate (≥ 25°). This threshold was chosen because it roughly coincides with the point at which clinicians begin to consider bracing rather than observation [1]. ICC2,1 values were not reported for the curve-severity groups. Restricting each group to a narrow range of coronal curve angles reduces between-subject variance, which attenuates the ICC. Curve region was divided into four groups according to the location of the curve apex: upper thoracic (UT, T2–T6), main thoracic (MT, T7–T11), thoracolumbar (TL, T12–L1), and lumbar (L, L2–L4) [19]. One-way analysis of variance (ANOVA) was used to test whether the MAEs differed significantly between groups within the same category. Statistical analysis was conducted using the pandas [20] and SciPy [21] Python libraries with an alpha level of 0.05.
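The subgroup analysis can be sketched with pandas and `scipy.stats.f_oneway`. The apex-to-region mapping and the 25° threshold come from the text. The column names (`a_us`, `m_xray`, `apex`) are hypothetical. The choice to classify severity by the M-Xray Cobb angle is also an assumption, because the text does not say which measurement defines severity.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Apex-to-region mapping from the text: UT T2-T6, MT T7-T11, TL T12-L1, L L2-L4.
REGION_BY_APEX = {
    **{f"T{i}": "UT" for i in range(2, 7)},
    **{f"T{i}": "MT" for i in range(7, 12)},
    "T12": "TL",
    "L1": "TL",
    **{f"L{i}": "L" for i in range(2, 5)},
}


def add_groups(df):
    """Add absolute-error, severity, and region columns (hypothetical column names)."""
    out = df.copy()
    out["abs_err"] = (out["a_us"] - out["m_xray"]).abs()
    # Assumption: severity is defined by the M-Xray Cobb angle.
    out["severity"] = np.where(out["m_xray"] < 25, "mild", "moderate")
    out["region"] = out["apex"].map(REGION_BY_APEX)  # unmapped apices become NaN
    return out


def anova_by(df, column):
    """One-way ANOVA of absolute errors across the groups in `column`."""
    # groupby drops NaN keys by default; groups need >1 value to contribute variance.
    groups = [g["abs_err"].to_numpy() for _, g in df.groupby(column) if len(g) > 1]
    result = stats.f_oneway(*groups)
    return result.statistic, result.pvalue


df = pd.DataFrame({
    "m_xray": [15.0, 18.0, 20.0, 30.0, 35.0, 40.0],
    "a_us": [16.0, 19.2, 20.8, 35.0, 40.5, 44.5],
    "apex": ["T8", "T9", "L1", "L2", "T12", "T4"],
})
f_stat, p_value = anova_by(add_groups(df), "severity")
```

A p-value below the 0.05 alpha level would indicate that the MAE differs between the groups of that category.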
