Development and validation of a reliable method for automated measurements of psoas muscle volume in CT scans using deep learning-based segmentation: a cross-sectional study

Introduction

Sarcopenia is characterised by age-related loss of muscle mass, strength and physical performance.1 2 Previous studies have shown a high prevalence of sarcopenia among patients with diabetes3 4 and cardiovascular disease,5 6 leading to worsening health outcomes, such as falls and fractures. Therefore, an accurate diagnosis of sarcopenia is crucial for improving health outcomes.

Clinical diagnostic methods for assessing muscle mass include bioimpedance analysis and whole-body dual-energy X-ray absorptiometry (DXA).7 DXA is advantageous owing to its low radiation exposure and cost. However, it has low accuracy when measuring the mass of individual muscles.8

Recent advances in imaging technology have enabled more precise assessment of muscle mass to improve sarcopenia diagnosis.9–12 Opportunistic screening becomes feasible when existing CT images are used.13–16 The skeletal muscle index, evaluated from the cross-sectional area of several muscles visible within the CT images acquired at the third and fourth lumbar vertebrae (L3 and L4), is among the most widely used indicators for the diagnosis of sarcopenia.10 17 18 In particular, the cross-sectional area of the psoas muscle has been used in some studies to identify sarcopenia.19 20 However, measuring the total volume of the psoas muscle, rather than its cross-sectional area, leads to a more accurate diagnosis.21–23 Automated segmentation is necessary to establish a clinically relevant analysis of the psoas muscle volume based on a large population.

In addition, to ensure the accuracy and generalisability of psoas muscle volume measurements, it is essential to establish a normal distribution for the general population using a large data set. This would require a programme that can analyse a large data set to ensure reliability and validity. Therefore, in this study, we aimed to develop a fast, reliable and automated method for obtaining volume measurements of the psoas muscle from a large data set using CT images. We evaluated the method through qualitative and quantitative assessments without performing hypothesis testing.

Materials and methods

Data collection

The study was conducted in accordance with the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board (IRB) of Asan Medical Center (AMC, Seoul, Republic of Korea) (IRB No. 2021-0634). To develop and validate our automated method for measuring psoas muscle volume using CT images, we used a data set comprising 520 participants who underwent routine health check-up at the Health Screening and Promotion Center in AMC between December 2004 and February 2021.

Before undergoing check-up procedures, all subjects consented to their data being used for research approved by the AMC IRB; the consent is not specific to the present study but is intended for various study designs. The IRB waiver was granted for using deidentified data, acknowledging the retrospective nature of the study design.

We selected a data set of N=520, with participants aged 20–92 years (mean age 55.1 ± 18.5 years for men and 60.5 ± 18.9 years for women), predominantly Korean (96%), alongside a minority of foreign participants. To maintain anonymity and enable precise comparisons, each data set was deidentified and assigned a random index number before analysis, allowing for tracking and comparison between automatically and manually segmented images.

The data set included abdomen and pelvis CT, CT pelvis, CT lower extremity venography and CT lower extremity angiography. These scans were used for artificial intelligence (AI) development and external validation purposes. The data set used for training our AI model for psoas muscle volume measurement consisted of 320 participants, with an approximate 2:3 male-to-female ratio (126 men, 194 women). This data set was further divided randomly into the training, validation and test sets for AI model development in the ratio 8:1:1. We used CT images from 200 participants for the external validation data set, with an approximate 1:1 male-to-female ratio (102 men and 98 women). Table 1 summarises the demographics of our data set.
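The 8:1:1 random division of the 320 development participants can be illustrated with a minimal sketch. The function name, the fixed seed and the rule that any remainder goes to the test set are assumptions made for illustration; the text does not describe how the split was implemented.

```python
import random

def split_participants(ids, ratios=(8, 1, 1), seed=0):
    """Shuffle participant IDs and split them into train/validation/test sets."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # fixed seed is an assumption, for reproducibility
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test

train, val, test = split_participants(range(320))
print(len(train), len(val), len(test))  # 256 32 32
```

With 320 participants this gives 256 training, 32 validation and 32 test cases, separate from the 200 participants reserved for external validation.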

Table 1

Subject demographics

Data augmentation for the segmentation model

To increase the amount of CT data for training our neural network model, we used a data augmentation method, considering the characteristics of medical data. This method involved applying various transformations to the original CT images, including scaling (0.7 to 1.4); rotation on X, Y and Z-axes (−30 to 30); and gamma correction (0.7 to 1.5).24 Each data augmentation method was randomly applied during training at each epoch, with scaling, rotation and gamma correction applied with probabilities of 20%, 20% and 30%, respectively.
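A rough sketch of how such per-epoch random augmentation could be parameterised is given below. The ranges and probabilities are those stated above. The uniform sampling, the independent decision for each transform, the reading of the rotation range as degrees and the min-max normalisation before gamma correction are all assumptions, not details taken from the study.

```python
import random

# Ranges and probabilities as stated in the text; rotation is assumed to be in degrees.
AUGMENTATIONS = {
    "scaling": {"range": (0.7, 1.4), "prob": 0.2},
    "rotation": {"range": (-30.0, 30.0), "prob": 0.2},
    "gamma": {"range": (0.7, 1.5), "prob": 0.3},
}

def sample_augmentation(rng):
    """Decide independently whether each transform is applied, then draw its parameter(s)."""
    params = {}
    for name, cfg in AUGMENTATIONS.items():
        if rng.random() < cfg["prob"]:
            low, high = cfg["range"]
            if name == "rotation":
                # One angle per axis (X, Y, Z).
                params[name] = tuple(rng.uniform(low, high) for _ in "xyz")
            else:
                params[name] = rng.uniform(low, high)
    return params

def gamma_correct(intensities, gamma):
    """Gamma-correct a flat list of intensities after min-max normalisation to [0, 1]."""
    lo, hi = min(intensities), max(intensities)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [(((v - lo) / span) ** gamma) * span + lo for v in intensities]

rng = random.Random(0)
print(sample_augmentation(rng))             # a dict with zero to three sampled transforms
print(gamma_correct([0.0, 50.0, 100.0], 2.0))
```

Because each transform is drawn afresh at every epoch, the network rarely sees exactly the same version of a scan twice.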

Manual psoas muscle segmentation

The manual segmentation of the psoas muscle from CT images of all 520 participants was a collaborative effort between a skilled operator and a doctor with more than 20 years of experience. They used a digital paint brush via a computer mouse, ensuring precision and accuracy in the data set (figure 1). For each slice, the psoas muscles were labelled by manually tracing the outline of the muscle (AVIEW, Coreline Soft Co., Ltd., Seoul, Republic of Korea). The space between the labelled slices was automatically masked (by interpolation) in the software. We randomly partitioned the manually segmented data into two groups: one for AI model development labels (N=320) and the other for comparison in the external validation phase (N=200). This division guaranteed a thorough evaluation of our model’s performance, thereby eliminating any overlap between the development and validation phases, and bolstering the integrity of our evaluation.

Figure 1

Mask of the psoas muscle created manually by a skilled operator: (A) labelled boundary in axial view and (B) labelled muscle in 3D view.

AI-based automated segmentation

The nnU-Net architecture, a deep learning model that has recently shown strength in the medical imaging field,24 was implemented in AVIEW for the automated segmentation of the psoas muscle. To make the model more efficient, a region of interest was set first and the psoas muscle was searched for only within that region.

Automated psoas muscle measurement

The psoas muscle segmentation model was developed using deep learning over three successive versions. The initial version (Version 1) used a two-step inference method in which a 2D U-Net model first determined whether the psoas muscle was present in a given slice and then segmented the muscle in the corresponding slices. Version 2 used a 3D V-Net model and modified the goal of the first step to locating the approximate position of the muscle (localisation). Subsequently, the volume around the location found was cropped and segmented in the second step, using 3D spatial information to achieve more precise segmentation. Version 3 improved the existing model by using nnU-Net and addressing the muscle disconnection issue observed in Version 2. The segmentation performance of the three versions on the external validation data set is summarised in table 2; the third version performed best across all considered metrics.
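The localise-then-crop idea behind Version 2 can be sketched as follows. This is only an illustration under assumptions: the helper names, the margin and the toy volume are hypothetical, and the actual models and cropping rules used in AVIEW are not described in the text.

```python
def bounding_box(voxels, margin, shape):
    """Axis-aligned bounding box of a coarse mask, padded by a margin and clipped to the volume."""
    box = []
    for axis in range(3):
        coords = [v[axis] for v in voxels]
        low = max(min(coords) - margin, 0)
        high = min(max(coords) + margin + 1, shape[axis])  # exclusive upper bound
        box.append((low, high))
    return box

def crop(volume, box):
    """Crop a nested-list volume indexed as volume[z][y][x] to the given box."""
    (z0, z1), (y0, y1), (x0, x1) = box
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]

# Toy example: a 6x6x6 volume and a coarse localisation of two voxels given as (z, y, x).
shape = (6, 6, 6)
volume = [[[0] * 6 for _ in range(6)] for _ in range(6)]
coarse_mask = [(2, 3, 3), (3, 3, 4)]

box = bounding_box(coarse_mask, margin=1, shape=shape)
patch = crop(volume, box)  # this smaller patch would be passed to the fine segmentation step
print(box)  # [(1, 5), (2, 5), (2, 6)]
```

Restricting the second step to the cropped patch lets a 3D network work at full resolution on a much smaller input than the whole scan.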

Table 2

Performance evaluation of AI models (average ± standard deviation)

AVIEW (Psoas muscle segmenter Version 3) implemented the nnU-Net method for automated segmentation. The software automatically calculated the volume of the psoas muscle (figure 2) after segmentation. The cross-sectional areas at the endplates of the L3 and L4 vertebrae were also calculated after a skilled operator identified the proper demarcation lines. The muscle volume and cross-sectional area (L3 and L4) were measured for external validation using a sample size of 200.
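How the software derives volume and area from a mask is not detailed here. A common approach, assumed in this sketch, is to count the segmented voxels (or pixels) and multiply by the physical size of one voxel (or pixel). The function names and the example counts and spacing values are hypothetical.

```python
def mask_volume_cm3(n_voxels, spacing_mm):
    """Volume of a binary mask: voxel count times the volume of one voxel (mm^3 to cm^3)."""
    sx, sy, sz = spacing_mm
    return n_voxels * sx * sy * sz / 1000.0

def slice_area_mm2(n_pixels, pixel_spacing_mm):
    """Cross-sectional area on one slice: pixel count times the area of one pixel."""
    sx, sy = pixel_spacing_mm
    return n_pixels * sx * sy

# Hypothetical example: 250,000 voxels at 0.7 x 0.7 mm in-plane spacing and 2 mm slices.
print(mask_volume_cm3(250_000, (0.7, 0.7, 2.0)))  # about 245 cm^3
print(slice_area_mm2(2_000, (0.7, 0.7)))          # about 980 mm^2
```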

Figure 2

Mask of psoas muscle generated using automated segmentation: (A) muscle boundary in axial view and (B) segmented muscle in 3D view.

Evaluation

The results of AI-based automated segmentation on the external validation data set were evaluated both qualitatively, through comparison and confirmation by an expert, and quantitatively, through comparison of index values.

The difference between the measured volume (cm3) and area (mm2) is defined as follows:

$Embedded Image$

Using the Dice similarity coefficient,25 we compared the psoas muscle segmented manually with that segmented automatically using the nnU-Net method. Other evaluation indicators provide additional metrics for measuring the accuracy of the segmentation results.26–29 These include intersection over union (IoU), also referred to as the Jaccard similarity coefficient, which measures the overlap between the ground truth and automatically segmented masks. The remaining metrics were the Hamming distance (calculated as the proportion of non-matching voxels between the two masks), the Hausdorff distance (measured as the greatest distance from a point on the surface of one mask to the closest point on the surface of the other) and the mean surface distance (calculated as the average distance between the surfaces of the two masks).
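The metrics above can be illustrated with a small, dependency-free sketch. Masks are represented as sets of voxel coordinates and surfaces as point sets. The function names are hypothetical, the brute-force nearest-point search is only suitable for toy data, and the pooled average used for the mean surface distance is one of several conventions, so this is not necessarily the exact computation used in the study.

```python
import math

def overlap_metrics(manual, automated, n_voxels):
    """Dice, IoU and Hamming distance for two non-empty masks given as sets of voxel coordinates."""
    inter = len(manual & automated)
    union = len(manual | automated)
    dice = 2 * inter / (len(manual) + len(automated))
    iou = inter / union
    hamming = len(manual ^ automated) / n_voxels  # proportion of non-matching voxels
    return dice, iou, hamming

def directed_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def surface_metrics(surface_a, surface_b):
    """Symmetric Hausdorff distance and pooled mean surface distance between two point sets."""
    a_to_b = directed_distances(surface_a, surface_b)
    b_to_a = directed_distances(surface_b, surface_a)
    hausdorff = max(max(a_to_b), max(b_to_a))
    mean_surface = (sum(a_to_b) + sum(b_to_a)) / (len(a_to_b) + len(b_to_a))
    return hausdorff, mean_surface

# Toy masks inside a 4x4x4 volume (64 voxels), shifted by one voxel along the last axis.
manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
automated = {(0, 0, 1), (0, 1, 1), (0, 0, 2), (0, 1, 2)}

print(overlap_metrics(manual, automated, n_voxels=64))  # (0.5, 0.333..., 0.0625)
print(surface_metrics(manual, automated))               # (1.0, 0.5)
```

Dice and IoU reward overlap, whereas the Hausdorff distance is driven by the single worst boundary point, which is why it is far more sensitive to outliers than the mean surface distance.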

Time measurement for clinical use

We investigated the time efficiency results for use in practical and large-scale analyses (n=20). The duration of manual segmentation was gauged by the time taken to measure the area and volume after finalising the manual masks, including uploading of the images in AVIEW. Conversely, the automated segmentation timing spanned from the upload of images to the point of obtaining area and volume measurements (figure 3).

Figure 3

Comparisons of the time elapsed for manual segmentation and automated segmentation. Manual segmentation (mean 111 min 6 s ± 25 min 25 s) includes contouring and refinement, which the automated segmentation (mean 2 min 20 s ± 20 s) does all at once.

Results

Qualitative evaluation of psoas muscle measurement

The segmentation was found to be accurate and consistent (figure 4) when comparing the muscle contours in the 2D and 3D views. The 3D shapes of the psoas muscles were similar, with no significant difference in the overall shape. Additionally, the segmented muscle areas were clearly defined and accurate at the 2D slice level.

Figure 4

Overlapping segmentations generated by the skilled operator (blue) and AI (red) in (A) axial view, (B) coronal view and (C) 3D view. AI, artificial intelligence.

Quantitative evaluation of psoas muscle measurement

No significant variation was observed when repeatedly measuring the psoas muscle volume and area from the same set of CT images of the same individual. A Dice score of 0.8 or higher was deemed acceptable. An average Dice score of 0.927 ± 0.019 on the external validation data set demonstrated the reliability of the automated segmentation method. The individual Dice scores showed that the majority of the data sets achieved very high scores, with few outliers (figure 5). Furthermore, the reported values of IoU (0.864 ± 0.033), Hamming distance (0.0008 ± 0.0003), Hausdorff distance (21.322 ± 10.133) and mean surface distance (0.612 ± 0.209) for the total psoas muscle area indicated reliable and accurate segmentation results (figure 5).
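As a consistency check, the Dice score D and the IoU J of a single pair of masks are related by J = D / (2 − D). The sketch below applies this identity to the reported mean Dice score. The identity holds exactly only case by case, so applying it to averages is an approximation; the function name is hypothetical.

```python
def dice_to_iou(dice):
    """Convert a Dice score to the equivalent IoU (Jaccard index) for the same pair of masks."""
    return dice / (2.0 - dice)

print(round(dice_to_iou(0.927), 3))  # 0.864
print(round(dice_to_iou(0.8), 3))    # 0.667
```

The first value agrees with the reported mean IoU of 0.864, and the second shows that the 0.8 Dice acceptance threshold corresponds to an IoU of about 0.667.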

Figure 5

Violin plots of metrics evaluating the quality of image segmentation on the external validation data set.

The volume and area differences between the two masks of the external validation set (manually and automatically segmented) were compared (figure 6). The average volume of the psoas muscle masked by a skilled operator was 349 ± 146 cm3. In contrast, the average volume of the automatically segmented psoas muscle was 340 ± 139 cm3. The mean absolute difference was 10 ± 17 cm3, indicating an error (residual) rate of 2.9%. The total cross-sectional areas (left and right) segmented manually and automatically averaged 1799 ± 734 mm2 (manual) and 1818 ± 723 mm2 (automated) for the L3; and 2284 ± 867 mm2 (manual) and 2293 ± 856 mm2 (automated) for the L4, achieving mean absolute differences of 19 ± 83 mm2 and 8 ± 77 mm2 and error rates of 1.1% and 0.4%, respectively. The total psoas muscle volume measurements and the total area measurements for the L3 and L4 cross-sections from manual and automated segmentation were very strongly correlated, with Pearson’s correlation coefficient values of 0.994, 0.994 and 0.996, respectively (figure 7). Furthermore, comparing the measurements between the left and right regions revealed high symmetry, as evidenced by Pearson’s correlation coefficient values of 0.985 for the volume, 0.973 for the L3 area and 0.977 for the L4 area (figure 8).
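The reported error rates can be reproduced from the reported means if the error rate is taken to be the mean difference divided by the mean manual measurement. That reading is an inference from the numbers, not a definition confirmed by the text. A plain Pearson correlation function is included for completeness; all names are hypothetical.

```python
import math

def error_rate_percent(difference, manual_mean):
    """Relative error: difference as a percentage of the manual (reference) mean."""
    return 100.0 * abs(difference) / manual_mean

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# (mean difference, manual mean) as reported, each pair in the same units.
reported = {"volume": (10, 349), "L3 area": (19, 1799), "L4 area": (8, 2284)}
for name, (diff, manual_mean) in reported.items():
    print(name, round(error_rate_percent(diff, manual_mean), 1))
# volume 2.9
# L3 area 1.1
# L4 area 0.4
```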

Figure 6

Bland-Altman plots for (A) psoas muscle volume, (B) L3 area and (C) L4 area. Each plot shows the difference between the measurements from manual and automated segmentation as a function of their mean value.

Figure 7

Comparison of the measured (A) total psoas muscle volume and total psoas muscle areas at the (B) L3 and (C) L4 image cross-sections using manual segmentation and automated segmentation (blue markers). Ideal measurements are expected to lie on the diagonal (gray dashed line). The best fit line is drawn over the measurements (black solid line) with the corresponding R2 value.

Figure 8

Comparison between left and right sides of the automatically segmented measurements of the (A) psoas muscle volume and (B, C) psoas muscle areas at the L3 and L4 cross-sections. Symmetric measurements are expected to lie on the diagonal (gray dashed line). The best fit line is drawn over the measurements (black solid line) with the corresponding R2 value.

Measurement time comparison

The average time for the manual segmentation process was 111 min 6 s ± 25 min 25 s, whereas it was 2 min 20 s ± 20 s when the segmentation was performed using the automated segmentation system. Hence, the AI system was found to be 48 times faster than the manual process.
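The speed-up factor follows directly from the two mean times; a short check of the arithmetic (the helper name is hypothetical):

```python
def to_seconds(minutes, seconds):
    """Convert a duration given in minutes and seconds to seconds."""
    return 60 * minutes + seconds

manual_s = to_seconds(111, 6)     # 6666 s
automated_s = to_seconds(2, 20)   # 140 s
print(round(manual_s / automated_s, 1))  # 47.6
```

The ratio of about 47.6 rounds to the reported factor of 48.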

Discussion

This study confirmed the accuracy and reliability of the automated segmentation software for segmenting the psoas muscle and measuring its volume. The evaluation metrics in our study were comparable to those reported previously.30 31 Furthermore, the small difference between the volumes calculated from the manually and automatically segmented images suggests that the automated segmentation method is highly accurate and reliable. Complementing these findings, our analysis revealed that the total psoas muscle volume and cross-sectional area measurements are consistent with the range of previously reported measures (table 3). This consistency underscores the robustness of our approach and its applicability across different populations.

Table 3

Reference values for psoas muscle volume and cross-sectional area compared with our results

Several previous studies have demonstrated that the nnU-Net method can achieve high segmentation accuracy. In our study, the average Sorensen−Dice index was 0.927, indicating clinically significant accuracy compared with other studies.32 Van Erck et al used deep learning-based software for the automated segmentation of the psoas muscle.33 However, their method was limited because it could only measure the psoas muscle area at the L3 slice. Similarly, Islam et al used a U-Net-based fully connected neural network method for the psoas major muscle segmentation, providing results only at the slice level.34

However, to achieve a more accurate clinical diagnosis of sarcopenia, measurement of the total psoas muscle volume is preferred over that of only the cross-sectional area.21–23 In this study, we successfully developed an automated segmentation method that measures the cross-sectional area and calculates the psoas muscle volume. This provides a basis for a more accurate diagnostic tool for detecting sarcopenia than methods that only measure the cross-sectional area. Using automated segmentation, we can deliver quick and more reliable psoas muscle volume results in approximately 2 min (figure 3), compared with almost 2 hours with manual segmentation.

Previous studies have reported the simultaneous segmentation of multiple muscle volumes, such as those of skeletal muscle and rectus abdominis.35–37 However, this approach is associated with low segmentation performance and long segmentation time. Although many studies have examined muscle segmentation, the focus on the measurement of the psoas muscle volume has been relatively limited. For instance, Duong et al applied a convolutional neural network method to measure the psoas major muscle volume; however, their training data set comprised only 34 CT scans,38 demonstrating limitations in the approach, with the requirement of manual measurements for accuracy. In contrast, this study employed a training data set consisting of 320 CT scans, leading to significantly higher accuracy in both qualitative and quantitative comparative analyses.

The results of this study suggest the potential of using CT, acquired for various clinical purposes, for opportunistic sarcopenia screening. Moreover, the method demonstrated in this study represents a significant advancement towards enabling sarcopenia diagnosis using psoas muscle volume measurements. The automated calculation system significantly reduces segmentation time for experts and enables non-experts to obtain quick and accurate results. This approach can be used to investigate the normal range of sarcopenia through the measurement of a large number of subjects, opening new avenues for research in this field.

The use of four different CT modalities with different slice thicknesses, namely abdomen and pelvis CT (2 mm), CT pelvis (2 mm or 8 mm), CT venography (2 mm) and CT angiography (8 mm), introduced variability in the psoas muscle volume calculations. Although AI segmentation accurately handled cross-sectional images regardless of slice thickness, volume calculations relying on interpolation between slices may result in errors, especially for thicker slices. Better interpolation techniques can mitigate these issues but are beyond the scope of our work. Furthermore, manual segmentation skills vary among people and could affect the final volume measurements. Large-scale studies based on manual segmentation would carry additional uncertainty due to human error and would be very time-consuming. The technique presented in this paper will allow a more objective study of large populations based on existing imaging data.

On the other hand, the measurement of muscle volume may be less accurate in the CT scans of patients with metal artefacts. The streak phenomenon caused by artefacts distorts the CT slices, making it difficult even for experts to identify the muscle area. Despite numerous studies in the past decades, the issue of metal artefacts in CT scans remains a significant problem.39–41 Nevertheless, to achieve clinical sarcopenia diagnosis using CT scans, it is necessary to establish psoas muscle volume data for the general population and compare them with existing sarcopenia measurement methods. The AI software described in this study is expected to make a substantial contribution towards achieving these objectives.

As an initial study, this work presented results and a direction based mostly on abdomen and pelvis CT. For the method to be widely applied to the reuse of existing CT data and to opportunistic screening, various types of CT protocols and other deep learning models need to be studied. Furthermore, research on slices affected by metal artefacts, a challenging problem that many researchers are currently addressing, should be advanced and applied. If more CT data are secured and used for training, the test accuracy can be improved. The method could then be used to study the normal range distribution of the psoas muscle area and volume for accurate sarcopenia diagnosis.


BMJ OPEN
