Machine and deep learning-based clinical characteristics and laboratory markers for the prediction of sarcopenia

Introduction

Sarcopenia is a progressive age-related skeletal muscle disorder involving the loss of muscle mass or strength and physiological function.[1] A multicenter cohort study showed that the prevalence of sarcopenia among older adults living in the community in China ranged from 3.3% to 17.5%.[2] With the continuing growth of the aging population in China and adverse health outcomes associated with sarcopenia, including frailty, falls, disability, and mortality, early diagnosis and intervention are needed. Although a series of promising imaging approaches are employed to measure skeletal muscle mass and strength, such as magnetic resonance imaging (MRI), computed tomography (CT), dual-energy X-ray absorptiometry scans (DXA), and bioelectrical impedance analysis (BIA), there are drawbacks and limitations. On the one hand, MRI adds to the financial burden. On the other hand, CT exposes people to radiation. In addition, although BIA is portable and affordable, the diagnostic cutoff points are restricted to population-specific and device-specific methods, which limits its accuracy.[3,4] In terms of the evaluation of physical function, grip strength and gait speed may not be available for elderly individuals with complications. Therefore, additional methods for diagnosing sarcopenia are needed. Recently, with the rapid development of smart medical care, the use of artificial intelligence technology to assist medical diagnosis has become a trend, and machine learning models for the disease have been established.[5–10] With the aging of society, the diagnosis of sarcopenia needs to be universal, and a less expensive, convenient, and fast machine learning model to diagnose sarcopenia is needed. In this study, we aimed to develop a machine learning model for sarcopenia diagnosis using clinical characteristics and laboratory indicators of aging cohorts.

Methods

Ethical approval

This study was approved by the Medical Ethics Committee of West China Hospital of Sichuan University (No. 2017445). All participants provided written informed consent for study participation.

Sarcopenia assessment

We used the Asian Working Group for Sarcopenia (AWGS) 2019 consensus as the diagnostic criteria. It is widely used for diagnosing sarcopenia in Asia because it considers loss of muscle mass, loss of muscle strength, and a decline in physical performance. According to the AWGS, a height-adjusted appendicular muscle mass measured by BIA of <7.0 kg/m² in men or <5.7 kg/m² in women is considered a loss of muscle mass. The AWGS also defines low muscle strength as a handgrip strength of <28 kg for men and <18 kg for women. A 6-m walking speed of <1.0 m/s or a Short Physical Performance Battery (SPPB) score of ≤9 is recommended as the indicator of low physical performance. Patients are diagnosed with sarcopenia when low muscle mass and low muscle strength are confirmed. If low physical performance is also observed, sarcopenia is considered severe.
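The cut-offs in this paragraph can be written as a small decision function. The following is only a sketch: the `Assessment` record and its field names are hypothetical, and the decision rule assumes that sarcopenia requires low muscle mass together with low muscle strength, with low physical performance marking the severe form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    """Hypothetical record of one participant's measurements."""
    sex: str                                  # "male" or "female"
    muscle_mass_kg_m2: float                  # BIA appendicular muscle mass, height-adjusted
    grip_kg: float                            # handgrip strength
    gait_speed_m_s: Optional[float] = None    # 6-m walking speed
    sppb: Optional[int] = None                # Short Physical Performance Battery score


def classify(a: Assessment) -> str:
    male = a.sex == "male"
    # Cut-offs quoted in the text above.
    low_mass = a.muscle_mass_kg_m2 < (7.0 if male else 5.7)
    low_strength = a.grip_kg < (28.0 if male else 18.0)
    low_performance = (
        (a.gait_speed_m_s is not None and a.gait_speed_m_s < 1.0)
        or (a.sppb is not None and a.sppb <= 9)
    )
    # Assumed rule: mass + strength -> sarcopenia; plus performance -> severe.
    if not (low_mass and low_strength):
        return "no sarcopenia"
    return "severe sarcopenia" if low_performance else "sarcopenia"


print(classify(Assessment("female", 5.2, 16.0, gait_speed_m_s=0.8)))
```

Missing performance measurements are treated as "not low" here, which is a design choice of the sketch rather than something the text specifies.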

Data selection and measurement

Trained interviewers collected questionnaire data through face-to-face, one-on-one personal interviews. Trained technicians performed the anthropometric and BIA measurements. Each participant record contains 109 features, including basic information, underlying diseases, assessments of muscle strength, and serum biomarkers.

Study participants

The main data were taken from the baseline of the West China Health and Aging Trend (WCHAT) study, which was conducted from July to December 2018 and included 7536 people aged ≥50 years in Sichuan, Yunnan, Guizhou, and Xinjiang provinces. Multistage cluster sampling was applied, and the total response rate was 50.2%.[11,12] Participants with cognitive impairment, a recent history of malignancy, or missing laboratory measurements were excluded, leaving 4057 participants enrolled in this study, 772 of whom had been diagnosed with sarcopenia. For holdout validation, we split the data into training (80%) and testing (20%) datasets, stratified by diagnosis. The external validation data came from the Xiamen Aging Trend (XMAT) study, which was conducted from March to April 2022 and consisted of 1874 people aged ≥50 years in the city of Xiamen. After applying the same exclusion criteria used in WCHAT, 553 participants were enrolled in this study, 149 of whom had been diagnosed with sarcopenia.
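The stratified 80/20 holdout split described above can be illustrated as follows. This is a sketch under stated assumptions, not the study's code: the function name `stratified_split`, the seed, and the rounding of each class's test share are illustrative choices; only the class sizes (772 and 3285) come from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with the class ratio preserved in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Each class contributes its own share to the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Label vector with the WCHAT class sizes: 772 sarcopenia, 3285 non-sarcopenia.
labels = [1] * 772 + [0] * 3285
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

Splitting within each class keeps the proportion of sarcopenia cases close to 19% in both the training and testing parts.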

Feature selection

We included samples from WCHAT and XMAT. Each sample comprises 109 features (details can be found in Supplementary Table 1, https://links.lww.com/CM9/B466). We compared clinical features between groups using the Mann–Whitney U test for continuous variables and the chi-squared test for categorical variables, with P <0.05 considered statistically significant. Finally, the 12 most relevant features were obtained by dimensionality reduction via principal component analysis (PCA).
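As an illustration of the univariate screening step, the sketch below applies a Pearson chi-squared test to one binary questionnaire item. It covers only the categorical case (the Mann–Whitney U test for continuous variables is not shown), it uses no continuity correction, and the counts are hypothetical rather than taken from the study.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = sarcopenia / non-sarcopenia, columns = symptom yes / no.
stat, p = chi2_2x2(40, 60, 20, 80)
keep_feature = p < 0.05   # significance threshold used in the text
print(round(stat, 3), keep_feature)
```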

Development of a machine learning model based on Wide and Deep (W&D)

In this study, the W&D model was the foundation for building our sarcopenia model. A W&D model is a mixed model that comprises two parts: the wide model consists of a single layer, and the deep model consists of multiple layers. The wide model is usually implemented as a linear model, and the deep model is realized by a deep neural network (DNN), so that the combined model benefits from both the memorization of linear models and the generalization of DNNs.[13]

The input features are first fed into the wide model, which is a generalized linear model. The deep model is designed as three fully connected layers in which each feature is represented as a low-dimensional real-valued vector. In particular, the input layer has 12 features, the middle part has three fully connected hidden layers, and the last layer is the fusion layer. The final output log odds are obtained by combining the outputs of the wide model and the deep model; they are used for prediction and are fed into a logistic loss function. The W&D model is trained on the training dataset for feature reorganization and validation. More specifically, the samples in the training dataset are fed into the wide and deep models until the logistic loss function converges, i.e., until the loss is less than or equal to the preset threshold.
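The forward pass described above (wide logit plus deep logit, then a sigmoid and a logistic loss) can be sketched in plain Python. This is not the study's implementation: all function names are hypothetical, the weights are random placeholders, training is omitted, and the hidden sizes 30, 50, and 50 are taken from the "Dense Layer(30,50, 50)" entry in Table 2.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def wide_and_deep_prob(x, wide_w, hidden_layers, deep_out_w, bias):
    # Wide part: a single linear layer on the raw features.
    wide_logit = sum(w * xi for w, xi in zip(wide_w, x))
    # Deep part: stacked fully connected layers with ReLU activations.
    h = x
    for weights, b in hidden_layers:
        h = relu(dense(h, weights, b))
    deep_logit = sum(w * hi for w, hi in zip(deep_out_w, h))
    # Fusion: the two logits are summed before the sigmoid.
    return 1.0 / (1.0 + math.exp(-(wide_logit + deep_logit + bias)))


def logistic_loss(p, y):
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


rng = random.Random(0)
sizes = [12, 30, 50, 50]  # 12 input features, then three hidden layers
hidden_layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
wide_w = [rng.uniform(-0.1, 0.1) for _ in range(sizes[0])]
deep_out_w = [rng.uniform(-0.1, 0.1) for _ in range(sizes[-1])]

x = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]  # placeholder standardized features
p = wide_and_deep_prob(x, wide_w, hidden_layers, deep_out_w, bias=0.0)
loss = logistic_loss(p, y=1)
print(0.0 < p < 1.0, loss > 0.0)
```

In a real training loop, the wide and deep weights would be updated jointly by an optimizer until the loss falls below the chosen threshold.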

Performance evaluation of models

The performance of the W&D model was assessed using the area under the receiver operating characteristic curve (AUC). The accuracy (ACC), error, precision, and recall values are also reported. Figure 1 shows the workflow of this study. In the training dataset, our model was evaluated using five-fold stratified cross-validation. Subsequently, the diagnostic performance was evaluated on the testing dataset. In addition, the external diagnostic performance was evaluated independently using the isolated XMAT dataset. Moreover, we compared the W&D model with support vector machine (SVM), random forest (RF), and eXtreme Gradient Boosting (XGB) models.
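For readers unfamiliar with the AUC, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy scores are illustrative only.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Runs in O(n_pos * n_neg), which is acceptable for a small illustration.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```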

Figure 1: Workflow chart. SVM: Support vector machine; WCHAT: West China health and aging trend; XGBoost: eXtreme gradient boosting.

Results

Feature selection

The training and testing datasets from the WCHAT cohort included a total of 4057 participants, 772 of whom had sarcopenia, and the external validation dataset from XMAT included a total of 553 participants, 149 of whom had sarcopenia [Table 1]. We identified 12 relevant features from the 109 candidates: age, weight, triceps skinfold thickness (TST), calf circumference (CC), mid-upper arm circumference (MAC), the ratio of aspartate aminotransferase to alanine aminotransferase (AST/ALT), itchy skin (SBITCS), syncope (SBAPSY), have you had any illness in the last year that required care (MCANEE), did you see a doctor in the last two weeks because of illness or discomfort (MCAILL), has your physical activity decreased in the last month (LFPHY), and have you engaged in household chores of moderate activity in the past two weeks (LFIACT). In the WCHAT cohort, the mean age of the sarcopenia group (67.89 ± 8.66 years) was higher than that of the non-sarcopenia group (61.01 ± 7.54 years), P <0.001, and the clinical indicators, including weight, TST, CC, and MAC, were lower in the sarcopenia group than in the non-sarcopenia group, P <0.001. According to the questionnaires about daily physical activity, patients with sarcopenia were less active than those without sarcopenia, P <0.001. In the XMAT cohort, the mean age of the sarcopenia group (68.03 ± 7.23 years) was higher than that of the non-sarcopenia group (64.22 ± 6.91 years), P <0.001, and the clinical indicators, including weight, TST, CC, and MAC, were lower in the sarcopenia group than in the non-sarcopenia group, P <0.001. The feature importance values of all 109 features are summarized in Supplementary Table 1, https://links.lww.com/CM9/B466.

Table 1 - Clinical characteristics and anthropometric measures of study participants.

| Items | WCHAT: Non-sarcopenia (n = 3285) | WCHAT: Sarcopenia (n = 772) | WCHAT: P-value | XMAT: Non-sarcopenia (n = 404) | XMAT: Sarcopenia (n = 149) | XMAT: P-value |
| --- | --- | --- | --- | --- | --- | --- |
| Age (years) | 61.01 ± 7.54 | 67.89 ± 8.66 | <0.001 | 64.22 ± 6.91 | 68.03 ± 7.32 | <0.001 |
| Weight (kg) | 64.40 ± 10.51 | 51.84 ± 8.91 | <0.001 | 60.22 ± 8.95 | 58.52 ± 10.71 | <0.001 |
| TST (cm) | | | | | | |
| Right | 25.12 ± 8.37 | 19.90 ± 7.68 | <0.001 | 22.10 ± 15.59 | 20.05 ± 16.85 | <0.001 |
| Left | 25.07 ± 8.38 | 19.88 ± 7.69 | <0.001 | 22.05 ± 15.65 | 20.10 ± 16.89 | <0.001 |
| CC (cm) | | | | | | |
| Right | 35.53 ± 2.88 | 31.72 ± 2.75 | <0.001 | 34.20 ± 3.09 | 33.35 ± 2.92 | <0.001 |
| Left | 35.51 ± 2.85 | 31.72 ± 2.67 | <0.001 | 34.20 ± 3.11 | 33.36 ± 2.91 | <0.001 |
| MAC (cm) | | | | | | |
| Right | 29.59 ± 3.02 | 25.79 ± 2.73 | <0.001 | 27.16 ± 2.92 | 26.64 ± 2.72 | <0.001 |
| Left | 29.61 ± 2.99 | 25.78 ± 2.67 | <0.001 | 27.12 ± 2.86 | 26.64 ± 2.71 | <0.001 |
| AST/ALT | 1.15 ± 0.39 | 1.44 ± 0.50 | <0.001 | 1.71 ± 0.96 | 1.91 ± 0.82 | <0.001 |
| SBITCS (%) | | | <0.001 | | | 0.029 |
| No | 86.70 | 82.10 | | 89.56 | 82.25 | |
| Yes | 13.30 | 17.90 | | 10.44 | 17.75 | |
| SBAPSY (%) | | | <0.001 | | | 0.019 |
| No | 96.90 | 94.60 | | 90.72 | 96.63 | |
| Yes | 3.10 | 5.40 | | 9.28 | 3.37 | |
| MCANEE (%) | | | <0.001 | | | 0.003 |
| No | 33.80 | 44.20 | | 43.93 | 58.59 | |
| Yes | 66.20 | 55.90 | | 56.07 | 41.41 | |
| MCAILL (%) | | | <0.001 | | | 0.012 |
| No | 77.84 | 71.95 | | 89.81 | 81.88 | |
| Sick, to see a doctor | 16.92 | 20.25 | | 8.98 | 17.45 | |
| Sick, not to see a doctor | 5.24 | 7.80 | | 1.21 | 0.67 | |
| LFPHY (%) | | | <0.001 | | | 0.001 |
| Regular activity | 78.30 | 72.50 | | 89.14 | 82.23 | |
| Slightly reduced | 15.50 | 16.62 | | 10.19 | 11.73 | |
| Significantly reduced | 6.20 | 10.88 | | 0.67 | 6.04 | |
| LFIACT (%) | | | <0.001 | | | 0.005 |
| Yes | 72.19 | 58.44 | | 75.15 | 63.09 | |
| No | 27.48 | 40.55 | | 24.61 | 36.24 | |
| Refuse to answer | 0.09 | 0.63 | | 0.00 | 0.00 | |
| Do not know | 0.24 | 0.38 | | 0.24 | 0.67 | |

Data are presented as percentage or mean ± standard deviation. AST/ALT: The ratio of aspartate aminotransferase to alanine aminotransferase; CC: Calf circumference; LFIACT: Have you engaged in household chores of moderate activity in the past 2 weeks; LFPHY: Has your physical activity decreased in the last month; MAC: Mid-upper arm circumference; MCAILL: Did you see a doctor in the last 2 weeks because of illness or discomfort; MCANEE: Have you had any illness in the last year that required care; SBAPSY: Syncope; SBITCS: Itchy skin; TST: Triceps skinfold thickness; WCHAT: West China health and aging trend; XMAT: Xiamen aging and trend.


Implementation

The implementation methods and details of the W&D model, SVM, RF, and XGB are shown in Table 2.

Table 2 - The parameters of the comparison algorithm.

| Methods | Parameters |
| --- | --- |
| SVM | cost = 1, gamma = 2^-8, kernel = 'rbf' |
| XGB | learning_rate = 0.01, n_estimators = 5, max_depth = 3 |
| RF | n_estimators = 15, min_weight_fraction_leaf = 0.0001, max_depth = 4, min_samples_split = 10 |
| W&D | Dense Layer(30, 50, 50) activation = 'relu', Output Layer activation = 'sigmoid', loss = "binary_crossentropy", optimizer = "adam" |

RF: Random forest; SVM: Support vector machine; W&D: Wide and deep; XGB: eXtreme Gradient Boosting.


Performance of different models and cross-validation on the training dataset WCHAT

We investigated the cross-validation performance using the metrics of AUC, ACC, error, precision, and recall. Table 3 summarizes the cross-validation results on the training dataset. Among the four models, W&D provided the highest values of AUC and ACC (AUC = 0.916 ± 0.006, ACC = 0.882 ± 0.006), followed by SVM (AUC = 0.907 ± 0.004, ACC = 0.877 ± 0.006), XGB (AUC = 0.877 ± 0.005, ACC = 0.868 ± 0.005), and RF (AUC = 0.843 ± 0.031, ACC = 0.836 ± 0.024) in the training dataset [Figure 2]. The AUC and Decision Curve Analysis (DCA) details of the five-fold stratified cross-validation with W&D are shown in Figure 3. W&D had the best performance on both the AUC and the precision/recall curve [Figures 2 and 4].

Table 3 - Comparison of prediction performance among prediction models in the training dataset.

| Items | Dataset | AUC | ACC | Error | Precision | Recall |
| --- | --- | --- | --- | --- | --- | --- |
| SVM | Train | 0.907 ± 0.004 | 0.877 ± 0.006 | 0.123 ± 0.006 | 0.888 ± 0.009 | 0.406 ± 0.031 |
| RF | Train | 0.843 ± 0.031 | 0.836 ± 0.024 | 0.164 ± 0.024 | 0.810 ± 0.083 | 0.187 ± 0.165 |
| XGB | Train | 0.877 ± 0.005 | 0.868 ± 0.005 | 0.132 ± 0.005 | 0.730 ± 0.009 | 0.484 ± 0.044 |
| W&D | Train | 0.916 ± 0.006 | 0.882 ± 0.006 | 0.118 ± 0.006 | 0.749 ± 0.026 | 0.577 ± 0.054 |

ACC: Accuracy; AUC: Area under the receiver operating characteristic curve; FN: False negative; FP: False positive; RF: Random forest; SVM: Support vector machine; TN: True negative; TP: True positive; W&D: Wide and deep; XGB: eXtreme Gradient Boosting. ACC is computed based on the total number of correct predictions, defined as:

$$\mathrm{ACC} = \frac{TP + TN}{TP + FN + TN + FP}$$

Precision is the ratio of correctly predicted positive observations to total predicted positive observations, defined as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Figure 2: The precision/recall curve for the training and testing datasets for the SVM (A and B), RF (C and D), XGB (E and F), and W&D models (G and H). RF: Random forest; SVM: Support vector machine; W&D: Wide and deep; XGB: eXtreme Gradient Boosting.

Figure 3: The five-fold stratified cross-validation of W&D in the training dataset of WCHAT (A–E). WCHAT: West China health and aging trend; W&D: Wide and deep.

Figure 4: The ROC curves of the SVM, RF, XGB, and W&D models in WCHAT and XMAT. (A) The ROC curves of SVM, RF, XGB, and W&D in the training dataset of the WCHAT cohort; (B) the ROC curves of SVM, RF, XGB, and W&D in the testing dataset of the WCHAT cohort; (C) the ROC curves of SVM, RF, XGB, and W&D in the validation dataset of the XMAT cohort. AUC: Area under the receiver operating characteristic curve; RF: Random forest; ROC curve: Receiver operating characteristic curve; SVM: Support vector machine; W&D: Wide and deep; WCHAT: West China health and aging trend; XGB: eXtreme Gradient Boosting; XMAT: Xiamen aging and trend.

Performance of models on testing dataset WCHAT and external validation dataset XMAT

In the testing dataset, the models ranked in descending order of AUC were W&D (AUC = 0.881, ACC = 0.862), XGB (AUC = 0.858, ACC = 0.861), RF (AUC = 0.843, ACC = 0.836), and SVM (AUC = 0.829, ACC = 0.857). On the isolated external validation dataset, W&D (AUC = 0.970, ACC = 0.911) showed the best performance among the four models, followed by RF (AUC = 0.830, ACC = 0.769), SVM (AUC = 0.766, ACC = 0.738), and XGB (AUC = 0.722, ACC = 0.749) [Table 4 and Supplementary Table 2, https://links.lww.com/CM9/B549]. Figure 4 shows the receiver operating characteristic (ROC) curves for model comparison.

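Table 4 - Comparison of average prediction performance among prediction models in the testing and validation datasets.

| Items | Dataset | AUC | TP | TN | FP | FN | ACC | Error | Precision | Recall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SVM | Test | 0.829 | 50 | 645 | 11 | 105 | 0.857 | 0.143 | 0.819 | 0.323 |
| SVM | Validation | 0.766 | 1 | 407 | 0 | 145 | 0.738 | 0.262 | 1.000 | 0.007 |
| RF | Test | 0.843 | 29 | 650 | 7 | 126 | 0.836 | 0.164 | 0.810 | 0.187 |
| RF | Validation | 0.830 | 20 | 406 | 1 | 126 | 0.769 | 0.231 | 0.941 | 0.136 |
| XGB | Test | 0.858 | 70 | 628 | 29 | 84 | 0.861 | 0.139 | 0.714 | 0.456 |
| XGB | Validation | 0.722 | 22 | 392 | 15 | 124 | 0.749 | 0.251 | 0.595 | 0.151 |
| W&D | Test | 0.881 | 78 | 622 | 36 | 76 | 0.862 | 0.138 | 0.696 | 0.507 |
| W&D | Validation | 0.970 | 103 | 400 | 7 | 43 | 0.911 | 0.089 | 0.944 | 0.708 |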

ACC: Accuracy; AUC: Area under the receiver operating characteristic curve; FN: False negative; FP: False positive; RF: Random forest; SVM: Support vector machine; TN: True negative; TP: True positive; W&D: Wide and deep; XGB: eXtreme Gradient Boosting.
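The threshold-based metrics in Table 4 follow directly from the confusion-matrix counts. The sketch below recomputes them for the SVM row of the testing dataset; the function name `confusion_metrics` is illustrative, and recall is taken as TP / (TP + FN), the standard definition.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, error, precision and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    return {
        "accuracy": acc,
        "error": 1 - acc,
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
    }


# Counts from the SVM row of the testing dataset in Table 4.
m = confusion_metrics(tp=50, tn=645, fp=11, fn=105)
print({name: round(value, 3) for name, value in m.items()})
```

For these counts the accuracy, error, and recall round to the tabulated 0.857, 0.143, and 0.323, and the precision comes out at about 0.82. Because the table reports averaged performance, small differences in the third decimal place for other rows are to be expected.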


Discussion

In this study, sarcopenia was predicted using machine learning and deep learning techniques, namely SVM, RF, XGB, and W&D. Comparing the AUC and ACC across models, the W&D model was superior. It had the highest AUC (0.916, 0.881, and 0.970) and ACC (0.882, 0.862, and 0.911) in the training, testing, and external validation datasets, respectively. The features ultimately included in the model comprised one serum biomarker, anthropometric measures, and case-finding questionnaire items evaluating physical activity. These features are inexpensive and easy to obtain, so the procedure could be used for sarcopenia screening in primary hospitals.

Sarcopenia is an age-related skeletal muscle disorder involving the loss of muscle mass, strength, and physiological function. Age is the most relevant feature of sarcopenia. Furthermore, with the aging process, skeletal muscle deteriorates quantitatively and qualitatively.[14] Muscle mass is therefore an essential parameter in the diagnosis of sarcopenia, and it is reflected by weight, MAC, CC, and TST, all of which were measured using standardized procedures. In the last century, CC was first used to estimate the loss of skeletal muscle mass; now, CC is widely used in the diagnosis of sarcopenia.[15] A study comparing four standard screening instruments, including CC, handgrip, six-meter walking speed, and questionnaires, showed that CC was an accurate and inexp
