Deep-Learning-Based Radiomics to Predict Surgical Risk Factors for Lumbar Disc Herniation in Young Patients: A Multicenter Study

Introduction

Low back pain (LBP) is one of the most disabling musculoskeletal conditions worldwide.1,2 It has many causes, of which lumbar disc herniation (LDH) is the most common.3 LDH can compress the spinal nerve roots and cause radiculopathy, which is often accompanied by severe lower-limb pain, numbness, and reduced muscle strength.4 Although most patients with LDH obtain symptomatic relief from conservative treatment, some develop progressive neurological dysfunction and need early surgery to relieve short-term symptoms and improve long-term outcomes. LDH commonly affects working adults aged 20–40 years, so prolonged activity restriction or disability places a substantial burden on families and society.5,6 Assessing surgical risk from the medical history and clinical signs, together with interpreting lumbar MRI findings, is complex and labor-intensive. With the growing number of young patients with LDH, methods are urgently needed to identify patients suitable for surgical treatment quickly and accurately.

Diagnosing LDH requires a comprehensive assessment of the patient’s medical history, clinical signs, and imaging findings. Lumbar MRI is crucial for evaluating the location, size, and direction of the herniated disc and the presence of free (sequestered) nucleus pulposus, and it provides essential information for surgical decision making. The Philips mDixon sequence is a rapid, noninvasive technique that generates four images (water, fat, in-phase, and opposed-phase) from a single acquisition. It substantially shortens scanning time while improving image quality through water–fat separation and B0 correction, and it has been widely adopted in clinical practice.7

Radiomics extracts large numbers of quantitative features from regions of interest (ROIs) in medical images and relates them to clinical outcomes using machine learning (ML). In addition to shape, intensity (first-order), and texture (second-order) features computed from the original images, radiomics uses higher-level features derived from filtered images, such as wavelet features.8,9 Radiomics is widely used in the diagnosis of spinal diseases, including the differential diagnosis of metastatic tumors and osteoporosis.10

Deep learning, a subset of artificial intelligence (AI), provides techniques for extracting features from images to detect and classify objects, and neural networks are particularly well suited to medical imaging problems.11 Large language models such as ChatGPT have also shown potential to supplement and enhance neurosurgical practice.12 In spinal imaging, a neural network can be trained to detect vertebrae and intervertebral discs through feature extraction and image segmentation. The deep-learning radiomics nomogram (DLRN) is a graphical representation of a model that integrates radiomic and clinical features to improve diagnostic performance. It can also incorporate deep-learning features learned automatically by a convolutional neural network (CNN), further improving model performance.13 Zhao et al7 developed a deep-learning radiomics (DLR) model based on 222 radiomics features extracted from lumbar MR mDixon images to predict vertebral osteoporosis and reported excellent performance. Zhang et al14 reported that a model combining CT-derived DLR features with traditional radiomics features distinguished acute from chronic vertebral compression fractures, and that the DLRN significantly outperformed traditional radiomics models.

The evaluation of surgical indications for LDH is complex, and to date, radiomics and deep-learning fusion has not been applied to assess surgical risk factors for LDH in young patients. In this study, we therefore extracted and fused radiomics and deep-learning features from sagittal and axial lumbar MR mDixon sequences, combined them with clinical features to develop a DLRN, and externally validated the model, with the aim of identifying young adult patients suitable for surgical treatment quickly and precisely.

Materials and Methods

Patients

A retrospective cohort of 1066 patients aged 16–44 years with LBP who were treated at Shengjing Hospital of China Medical University between January 1, 2022, and January 1, 2024, served as the training cohort; 404 of these patients were diagnosed with LDH and underwent surgery. An additional 191 patients with LBP treated at the China Medical University Shenyang Fourth People’s Hospital formed the test cohort, of whom 47 were diagnosed with LDH and underwent surgery. The inclusion criteria were (1) age 16–44 years and (2) complete clinical data, or missing data that could be supplemented by telephone follow-up. The exclusion criteria were (1) blurred lumbar MR images (n = 28); (2) congenital diseases or spinal deformities (n = 22); (3) history of spinal fracture, tuberculosis, or metastatic spinal tumor (n = 21); and (4) previous spinal surgery (n = 9). The workflow of this study is illustrated in Figure 1. This study was approved by the Shengjing Hospital Ethics Committee (approval number KYCS2024040), and the requirement for informed consent was waived given the retrospective nature of the study and the minimal risk involved.

Figure 1 (A) Flowchart of this multicenter study and (B) workflow of deep learning radiomic nomogram (DLRN) modeling; MR: magnetic resonance.

Dataset Collection

Clinical data for surgical inpatients and outpatients were extracted from the Neusoft Hospital Information Management System (HIS, version 5.0), and missing clinical information was completed through telephone follow-up. Images were retrieved from the picture archiving and communication system (PACS, version 5.5.0.20072). Both hospitals used the same 3.0-T MR system model (Ingenia, Philips Healthcare, Netherlands) with a spine coil. The MRI sequence parameters are listed in Table 1.

Table 1 Parameters of the MRI mDixon Sequences

ROI Segmentation

The lumbar MR images were imported into ITK-SNAP (version 3.8.0, www.itksnap.org). Two senior radiologists and one experienced spine surgeon (with over 10 years of experience) independently delineated the intervertebral disc at the surgical level on sagittal images and, on axial images, the lumbar spinal canal, including the intervertebral disc or nucleus pulposus tissue and the dura mater. For the non-surgical group, the level with the most evident LDH was segmented. Disagreements were resolved through discussion, and intra- and interobserver consistency was assessed using intraclass correlation coefficients (ICCs); features with an ICC greater than 0.75 were considered to have good consistency.
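The consistency filter can be sketched in Python as below. The paper does not state which ICC form was used; this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss) computed feature by feature from two readers' segmentations, and keeps features above the 0.75 threshold. The function names are hypothetical.

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) for ratings of shape (n_subjects, k_raters).

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-patient means
    col_means = ratings.mean(axis=0)   # per-reader means
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((ratings - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

def reproducible_features(reader_a, reader_b, threshold=0.75):
    """Column indices whose ICC between two readers exceeds the threshold.

    reader_a, reader_b: (n_patients, n_features) arrays of the same features
    extracted from each reader's ROI.
    """
    keep = []
    for j in range(reader_a.shape[1]):
        pair = np.column_stack([reader_a[:, j], reader_b[:, j]])
        if icc_2_1(pair) > threshold:
            keep.append(j)
    return keep
```

Absolute agreement (rather than consistency) is the stricter choice here because it also penalizes a systematic offset between readers.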

Radiomics and Deep-Learning Feature Extraction

Feature extraction in this study encompassed traditional handcrafted features (geometric shape, intensity, and texture) and deep-learning features obtained by transfer learning from a CNN pretrained on the ImageNet database.

All handcrafted features were extracted using PyRadiomics (http://pyradiomics.readthedocs.io) within an in-house feature analysis program. The software processes both the original images and the ROIs and groups the features into three categories: (I) geometric shape, (II) intensity, and (III) texture. Geometric shape features describe the three-dimensional form of the ROI. Intensity features describe the first-order statistical distribution of voxel intensities within the ROI. Texture features describe the second- and higher-order spatial distribution of intensities and were computed from the gray-level co-occurrence matrix (GLCM), gray-level dependence matrix (GLDM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighborhood gray-tone difference matrix (NGTDM).
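To make the texture-matrix idea concrete, the following is a simplified, hypothetical sketch of a GLCM for a single 2D pixel offset and its contrast feature. PyRadiomics applies its own discretization rules, aggregates several angles, and works in 3D; this sketch assumes an already-masked 2D slice and a fixed bin width, so its numbers will not match PyRadiomics output.

```python
import numpy as np

def discretize(image, mask, bin_width):
    """Map intensities to integer gray levels 0, 1, 2, ... relative to the ROI minimum."""
    lowest = image[mask].min()
    return np.floor((image - lowest) / bin_width).astype(int)

def glcm_2d(levels, mask, n_levels, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix for one pixel offset.

    n_levels must exceed the largest gray level inside the mask.
    """
    dr, dc = offset
    rows, cols = levels.shape
    counts = np.zeros((n_levels, n_levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            inside = 0 <= r2 < rows and 0 <= c2 < cols
            # only count pairs where both pixels belong to the ROI
            if inside and mask[r, c] and mask[r2, c2]:
                counts[levels[r, c], levels[r2, c2]] += 1
    counts = counts + counts.T  # count each neighbor pair in both directions
    total = counts.sum()
    return counts / total if total > 0 else counts

def glcm_contrast(p):
    """Contrast = sum over (i, j) of (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))
```

A checkerboard of two gray levels gives a contrast of 1, while a uniform region gives 0, which is the intuition behind using contrast as a heterogeneity measure.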

We used ResNeXt50_32x4d as the CNN framework for extracting deep-learning features. The model was first pretrained on the ImageNet database; the ROIs were then extracted and fed into ResNeXt50_32x4d, and the output of its penultimate (global average pooling) layer was taken as the deep-learning feature vector through transfer learning.
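Where the 2048-dimensional feature vector comes from can be shown with a small numpy sketch of the global average pooling step. It assumes a 224 × 224 input (the ROI crop size is not reported), for which the last convolutional stage of ResNeXt50_32x4d produces 2048 feature maps of 7 × 7; pooling averages each map to a single number. The maps below are random placeholders, not real network activations. In torchvision, a common way to obtain the same pooled vector is to load `torchvision.models.resnext50_32x4d` with ImageNet weights and replace its final `fc` layer with `torch.nn.Identity()`; this is an assumption about tooling, not a description of the authors' code.

```python
import numpy as np

def global_average_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Average each channel over its spatial grid: (C, H, W) -> (C,)."""
    return feature_maps.mean(axis=(1, 2))

# Random stand-in for the last convolutional stage of ResNeXt50_32x4d applied to
# one 224 x 224 ROI crop: 2048 channels on a 7 x 7 spatial grid.
rng = np.random.default_rng(0)
last_stage = rng.random((2048, 7, 7))
deep_features = global_average_pool(last_stage)
print(deep_features.shape)  # (2048,)
```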

Feature Selection and Fusion

All features were first screened with the Mann–Whitney U test, and features with P < 0.05 were retained. Among the retained features with good reproducibility, Spearman’s rank correlation coefficient was calculated, and of any pair of features with a correlation coefficient greater than 0.9, only one was kept. To preserve as much descriptive power as possible, we used a greedy recursive deletion strategy, removing at each step the feature with the greatest redundancy in the current set.
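The correlation-based redundancy step can be sketched as follows. The sketch assumes that "greatest redundancy" means the feature with the most partners above |ρ| = 0.9 (ties broken by column order); the paper does not define the measure, and the preceding Mann–Whitney screening is omitted. Function names are hypothetical, and constant columns should be removed beforehand because their correlation is undefined.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_matrix(X):
    """Spearman correlation = Pearson correlation of the column ranks."""
    ranked = np.column_stack([average_ranks(X[:, j]) for j in range(X.shape[1])])
    return np.corrcoef(ranked, rowvar=False)

def greedy_redundancy_filter(X, threshold=0.9):
    """Repeatedly drop the feature with the most |rho| > threshold partners."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        rho = np.abs(spearman_matrix(X[:, keep]))
        np.fill_diagonal(rho, 0.0)
        n_partners = (rho > threshold).sum(axis=1)
        if n_partners.max() == 0:
            break  # no remaining pair is highly correlated
        keep.pop(int(np.argmax(n_partners)))  # first most-redundant feature
    return keep  # column indices of the retained features
```

Recomputing the correlations after every deletion is what makes the procedure greedy: removing one feature can resolve several redundant pairs at once, so fewer features are discarded than with a one-pass rule.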

The least absolute shrinkage and selection operator (LASSO) regression model was used for signature construction. Depending on the regularization weight λ, LASSO shrinks all regression coefficients toward zero and sets the coefficients of many irrelevant features exactly to zero. To find the optimal λ, 10-fold cross-validation with the minimum criterion was used; that is, the λ yielding the minimum cross-validation error was chosen. Features with nonzero coefficients were retained, used to fit the regression model, and combined into a radiomics signature. A radiomics score (rad-score) was then calculated for each patient as a linear combination of the retained features weighted by their model coefficients. LASSO regression was implemented with the Python scikit-learn package.
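The LASSO step can be sketched without dependencies using cyclic coordinate descent. The objective mirrors the one minimized by scikit-learn's `Lasso`, (1 / 2n) * ||y - b0 - Xb||^2 + lambda * ||b||_1, with the binary surgery label treated as a numeric response; features are assumed to be already Z-score standardized. This is an illustrative reimplementation, not the scikit-learn code the authors used (scikit-learn's `LassoCV(cv=10)` performs the same kind of λ search), and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - b0 - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering absorbs the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += Xc[:, j] * beta[j]  # remove feature j's current contribution
            rho = Xc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= Xc[:, j] * beta[j]  # add back the updated contribution
    return beta, y_mean - x_mean @ beta

def lasso_cv(X, y, lambdas, n_folds=10, seed=0):
    """Choose the lambda with the lowest mean squared error across folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_mse = []
    for lam in lambdas:
        errs = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([f for i, f in enumerate(folds) if i != k])
            beta, b0 = lasso_fit(X[train], y[train], lam)
            errs.append(np.mean((y[test] - (b0 + X[test] @ beta)) ** 2))
        cv_mse.append(np.mean(errs))
    best = lambdas[int(np.argmin(cv_mse))]
    beta, b0 = lasso_fit(X, y, best)  # refit on the whole training cohort
    return best, beta, b0

def rad_score(X, beta, b0):
    """Linear combination of the retained features, weighted by LASSO coefficients."""
    return b0 + X @ beta
```

The retained features are `np.flatnonzero(beta)`; features whose coefficients were set exactly to zero contribute nothing to the rad-score.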

Because the deep-learning feature vector has 2048 dimensions, we used principal component analysis (PCA) to reduce it to 32 dimensions, which improves the model’s generalization and reduces the risk of overfitting. The radiomics features extracted from the ROIs and the selected deep-learning features were then fused. Each feature was standardized by the Z-score method: its mean and standard deviation were calculated, and each feature column was transformed to zero mean and unit variance by subtracting the mean and dividing by the standard deviation, yielding the DLR features. Feature selection was then performed with the same process as for the radiomics features to obtain the optimal subset of fused features.
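A minimal numpy sketch of the dimension reduction and fusion steps follows. It assumes that the PCA axes and the Z-score means and standard deviations are estimated on the training cohort and then reused unchanged for the test cohort (the paper does not state this explicitly); the random arrays stand in for real features, and 35 radiomics columns are used only as an example size.

```python
import numpy as np

def fit_pca(X, n_components=32):
    """Training mean and the top principal axes, obtained via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    return (X - mean) @ components.T

def fit_zscore(X):
    return X.mean(axis=0), X.std(axis=0)

def apply_zscore(X, mu, sigma):
    # subtract the mean and divide by the standard deviation (not the variance)
    return (X - mu) / sigma

rng = np.random.default_rng(0)
deep_train = rng.normal(size=(100, 2048))                   # placeholder CNN features
rad_train = rng.normal(loc=5.0, scale=3.0, size=(100, 35))  # placeholder radiomics

mean, comps = fit_pca(deep_train, 32)
deep_32 = apply_pca(deep_train, mean, comps)      # 2048 -> 32 dimensions
fused = np.hstack([rad_train, deep_32])           # early fusion by concatenation
mu, sigma = fit_zscore(fused)
fused_z = apply_zscore(fused, mu, sigma)
```

For a test cohort, the same `mean`, `comps`, `mu`, and `sigma` would be passed to `apply_pca` and `apply_zscore`, so that no information from the test patients leaks into preprocessing.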

Clinical features were first screened by baseline statistical analysis, and univariate and multivariate logistic regression were used to identify variables with P < 0.05. The subsequent feature selection mirrored that used for radiomics. To select the optimal clinical features, receiver operating characteristic (ROC) curves were drawn, and the five features with the highest area under the curve (AUC) values were included in the final model.
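The AUC-based ranking of clinical candidates can be sketched as below. AUC is computed as the probability that a randomly chosen surgical patient has a higher value than a randomly chosen non-surgical patient, with ties counted as one half, which is equivalent to the Mann–Whitney formulation. The sketch assumes each feature is coded so that higher values indicate greater severity; the feature names in any example are hypothetical.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """AUC = P(score of a positive > score of a negative), ties counted as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]   # every positive-negative pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def top_k_by_auc(features: dict, labels, k=5):
    """Names of the k features with the highest single-feature AUC."""
    aucs = {name: pairwise_auc(values, labels) for name, values in features.items()}
    return sorted(aucs, key=aucs.get, reverse=True)[:k]
```

Ordinal clinical variables such as grades produce many ties, which is why the tie rule matters here.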

Model Construction and Validation

After feature screening and fusion, six feature sets were obtained: Clinic, Rad_SAG, Rad_AXI, DL_SAG, DL_AXI, and DLR. Each set was input into machine-learning models, namely logistic regression (LR), support vector machine (SVM), k-nearest neighbor (KNN), decision tree, random forest (RF), extremely randomized trees, eXtreme gradient boosting (XGBoost), multilayer perceptron (MLP), and light gradient boosting machine (LightGBM), to construct risk models. We compared the performance of these models and used five-fold cross-validation to reduce overfitting when obtaining the final DLR model. On the basis of model performance, the optimal rad-score was selected and combined with clinical features to construct a nomogram. Model performance was assessed with ROC curves and the AUC, precision, sensitivity, and specificity. The DeLong test was used to compare the AUCs of different models, and decision curve analysis (DCA) was used to evaluate clinical utility.
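The DeLong comparison of two correlated AUCs (both models scored on the same patients) can be sketched in numpy as below. It follows the structural-components (placement value) formulation of the DeLong test, with a direct pairwise comparison that is adequate for cohorts of a few hundred patients. It is not the routine used in the study, and it assumes the variance of the AUC difference is nonzero.

```python
import math
import numpy as np

def _placements(pos, neg):
    """Per-case placement values: fraction of the other class each case beats."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)        # 1 win, 0.5 tie, 0 loss
    return psi.mean(axis=1), psi.mean(axis=0)   # V10 (positives), V01 (negatives)

def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    y = np.asarray(labels, dtype=bool)
    v10, v01, aucs = [], [], []
    for s in (np.asarray(scores_a, float), np.asarray(scores_b, float)):
        a, b = _placements(s[y], s[~y])
        v10.append(a)
        v01.append(b)
        aucs.append(a.mean())                   # AUC = mean placement of positives
    m, n = y.sum(), (~y).sum()
    # 2 x 2 covariance of the two AUC estimates
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p-value
    return aucs[0], aucs[1], z, p
```

The covariance term is what distinguishes this from comparing two independent AUCs: because both models score the same patients, their errors are correlated, and ignoring that correlation would usually overstate the variance of the difference.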

Statistical Analysis

Statistical analyses were conducted using SPSS (version 26.0; IBM Corp., Armonk, NY, USA), R (version 4.2.0), and GraphPad Prism 8.2 (GraphPad Software Inc., San Diego, CA, USA). The Mann–Whitney U test and Kruskal–Wallis test were used to compare non-normally distributed data, and the t-test was used for normally distributed data. Categorical data were compared using the χ2 test or Fisher’s exact test. Univariate and multivariate logistic regression analyses were used to identify relevant clinical parameters. ROC curves were generated to compare predictive performance (including sensitivity and specificity), and the threshold maximizing the Youden index was taken as the optimal cutoff. A nomogram combining radiomics and clinical features was developed. Calibration was evaluated with calibration curves and the Hosmer–Lemeshow goodness-of-fit test. Statistical significance was defined as P < 0.05.
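Choosing the cutoff that maximizes the Youden index (J = sensitivity + specificity - 1) can be sketched as follows. The sketch assumes a case is called positive when its score is at or above the threshold and scans every observed score as a candidate cutoff; the function name is hypothetical.

```python
import numpy as np

def youden_threshold(scores, labels):
    """Return (threshold, J) for the cutoff with the largest Youden index."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        sensitivity = np.mean(pred[labels])
        specificity = np.mean(~pred[~labels])
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j
```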

Results

Clinical Baseline Characteristics

Data from 1257 patients were included in this study: 1066 for internal training and 191 for external testing. There were no significant differences between the two groups in sex, age, body mass index (BMI), occupation, anxiety score, or LBP (all P > 0.05). Significant differences were found in the visual analogue scale (VAS) score, Oswestry Disability Index (ODI), lower-limb muscle strength, reflexes, straight-leg raising test (SLRT), Michigan State University (MSU) classification of the intervertebral disc, Pfirrmann grade, and fat infiltration of the multifidus muscle (MMFI) (all P < 0.05; Table 2). After univariate and multivariate logistic regression analyses, factors that might introduce bias because of small sample sizes, such as diabetes, hypertension, adverse drug use, alcohol abuse, and education, were excluded. ROC curves were drawn for the remaining factors, and the five features with the highest AUC values (ODI, Pfirrmann grade, SLRT, MMFI, and MSU classification) were included in the final clinical model (Table 3 and Figure 2).

Table 2 Clinical Characteristics and Lumbar MR Imaging Parameters (Baseline Analysis)

Table 3 Univariate and Multivariate Analysis of Clinical Features and Lumbar MR Imaging Parameters

Figure 2 ROC curves for the clinical characteristics. The final selected 5 features had the following AUC values: ModA was ODI (AUC=0.628), ModB was SLRT (AUC=0.593), ModC was Pfirrmann grading (AUC=0.61), ModD was MSU classification (AUC=0.894), and ModE was MMFI (AUC=0.623).

Feature Selection

A total of 107 radiomics features were initially extracted from the sagittal T2W_mDixon_TSE images. After LASSO with 10-fold cross-validation, the penalty coefficient was set at λ = 0.0015, and 35 features were retained. Similarly, 95 radiomics features were initially extracted from the axial T2W_mDixon_TSE images; with the same method, λ = 0.0063 was determined, and 18 features were retained.

Using the ResNeXt50_32x4d architecture pretrained on the ImageNet database, the ROI slices with the largest area were fed into the network, yielding 2048-dimensional deep-learning features. After PCA reduced these to 32 dimensions, LASSO with 10-fold cross-validation determined penalty coefficients of λ = 0.0020 (sagittal) and λ = 0.0036 (axial), and 31 sagittal and 30 axial deep-learning features were finally retained (Figure 3 and Supplementary Figure 1).

Figure 3 (A) Coefficient distribution and (B) mean standard error of 10-fold cross validation in the LASSO feature selection, also showing the optimal penalty coefficient λ of 0.0041 for deep-learning radiomics (DLR).

The DLR signature was obtained by fusing these features and applying LASSO with 10-fold cross-validation (λ = 0.0041), yielding 114 feature parameters (Figure 4 and Supplementary Figure 2).

Figure 4 Histogram of the deep-learning radiomics (DLR) feature importance score.

Predictive Performance of the Models

Each feature set was modeled with multiple machine-learning algorithms. In the external test cohort, the three best-performing models were LR (AUC = 0.879), MLP (AUC = 0.874), and SVM (AUC = 0.863) for Rad_SAG; SVM (AUC = 0.85), XGBoost (AUC = 0.842), and MLP (AUC = 0.833) for Rad_AXI; SVM (AUC = 0.818), LR (AUC = 0.8), and MLP (AUC = 0.792) for DL_SAG; MLP (AUC = 0.856), LR (AUC = 0.831), and SVM (AUC = 0.811) for DL_AXI; SVM (AUC = 0.939), LR (AUC = 0.932), and MLP (AUC = 0.918) for DLR; and LightGBM (AUC = 0.922), XGBoost (AUC = 0.914), and SVM (AUC = 0.904) for the clinical features (Supplementary Figure 3 and Supplementary Tables 1–7). Because no single algorithm performed best across all feature sets, we counted how often each algorithm appeared among the top three and chose the most frequent, SVM, as the framework for model construction. The nomogram achieved an AUC of 0.941 (95% CI 0.894–0.989) in the test cohort, and this voting procedure avoided bias from an arbitrary choice of algorithm (Figures 5 and 6 and Table 4).
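The voting step can be reproduced from the top-three lists reported above. The sketch assumes one vote per appearance in a top-three list, without weighting by rank; the paper does not specify the weighting.

```python
from collections import Counter

# Top three algorithms per feature set in the external test cohort, as reported above.
top3 = {
    "Rad_SAG": ["LR", "MLP", "SVM"],
    "Rad_AXI": ["SVM", "XGBoost", "MLP"],
    "DL_SAG": ["SVM", "LR", "MLP"],
    "DL_AXI": ["MLP", "LR", "SVM"],
    "DLR": ["SVM", "LR", "MLP"],
    "Clinic": ["LightGBM", "XGBoost", "SVM"],
}

# one vote per appearance in any top-three list
votes = Counter(alg for ranking in top3.values() for alg in ranking)
print(votes.most_common())
# [('SVM', 6), ('MLP', 5), ('LR', 4), ('XGBoost', 2), ('LightGBM', 1)]
```

SVM is the only algorithm that appears in the top three for all six feature sets, so it wins under this counting rule regardless of how ties among the other algorithms would be broken.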

Table 4 Performance of Each Model in the Training and Test Cohorts Under the SVM Algorithm

Figure 5 Visualization of the deep-learning models. (A) DL_SAG and (D) DL_AXI: local entropy images. (B) DL_SAG and (E) DL_AXI: ROIs on lumbar MR; (B) shows the intervertebral disc, and (E) shows the intervertebral disc protruding into the spinal canal and the dura within the canal. (C) DL_SAG and (F) DL_AXI: clustering results, indicating a good feature-fusion effect.

Figure 6 Comparison of AUC values for each group of features under the SVM algorithm in the (A) training and (B) test cohort; in both cohorts, the AUC of DLR was higher than any other radiomics feature models, and DLRN was the optimal model.

SVM was judged the optimal algorithm for the radiomics models. Among all feature sets, DLR performed best (Figure 7 and Supplementary Figure 3), with an AUC of 0.991 (95% CI 0.985–0.997) in the training cohort and 0.939 (95% CI 0.891–0.987) in the test cohort. The nomogram, which additionally incorporated clinical features, had an AUC of 0.994 (95% CI 0.989–0.999) in the training cohort and 0.941 (95% CI 0.894–0.989) in the test cohort. The DeLong test showed significant differences between the nomogram and each of the radiomics, DL, and DLR models in both the training and test cohorts (P < 0.05). The difference between the clinical model and the nomogram was significant in the training cohort (P < 0.05) but not in the test cohort (P = 0.338) (Figure 8). The calibration curves showed close agreement between predicted and observed values in both cohorts, and the Hosmer–Lemeshow test (P > 0.05) indicated good fit (Figure 9). DCA showed that the nomogram provided a clinical net benefit across a wide range of threshold probabilities, greater than that of the other models (Figure 10).

Figure 7 ROC curves of the DLR features for different machine-learning algorithms in the test cohort; SVM was the optimal algorithm.

Figure 8 DeLong test for the (A) training and (B) test cohorts. (A) The nomogram outperformed the models constructed with the other feature sets (P < 0.05). (B) The nomogram outperformed the radiomics, deep-learning, and DLR models (P < 0.05), but its improvement over the clinical model was not significant (P = 0.338).

Figure 9 Calibration curves of different models in the (A) training and (B) test cohorts; the curves demonstrate close agreement between model predictions and actual observations. The nomogram showed the best performance, with P > 0.05 in the Hosmer–Lemeshow test. The horizontal axis represents the predicted probability, and the vertical axis the actual probability. The diagonal dotted line represents perfect agreement between predicted and actual probabilities.

Figure 10 DCA in the (A) training and (B) test cohorts; the X-axis represents the threshold probability, and the Y-axis the net benefit. The black line represents the assumption that all patients are positive (treat all), and the dashed line the assumption that all are negative (treat none). The nomogram achieved a clinical net benefit in both the training and test cohorts and, compared with the clinical features or DLR, remained beneficial over a wider range of threshold probabilities.
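The net-benefit curves in a DCA plot can be computed as below, using the standard definition net benefit = TP/n - FP/n × pt/(1 - pt) at threshold probability pt, where a patient is treated when the predicted probability is at least pt. The treat-none strategy has a net benefit of 0 by definition. The function names are hypothetical and the sketch is illustrative only.

```python
import numpy as np

def net_benefit(probs, labels, pt):
    """Net benefit of treating patients whose predicted probability is >= pt."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n = len(labels)
    treat = probs >= pt
    tp = np.sum(treat & labels)
    fp = np.sum(treat & ~labels)
    # false positives are weighted by the odds of the threshold probability
    return tp / n - fp / n * pt / (1 - pt)

def treat_all_benefit(labels, pt):
    """Net benefit of the treat-all strategy at threshold pt."""
    prevalence = np.mean(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

Evaluating both functions over a grid of thresholds (for example `np.linspace(0.01, 0.99, 99)`) and plotting them against each other gives the curves described in the caption.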

Nomogram Construction

The five selected clinical features (ODI, Pfirrmann grade, SLRT, MMFI, and MSU classification) were combined with DLR to construct the DLRN, which presents the model intuitively and supports individualized risk prediction and clinical application (Figure 11).

Figure 11 The deep-learning radiomics nomogram. For each variable, a vertical line is drawn from the patient’s value to the Points axis to obtain the points for that variable. The points for all variables are summed to obtain the patient’s total points, and a vertical line drawn down from the corresponding position on the Total Points axis gives the predicted probability.
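The point arithmetic behind a nomogram of this kind can be sketched for a logistic model. The sketch assumes a common convention: the predictor with the widest effect range on the linear-predictor scale spans 0–100 points, every other predictor is scaled by the same factor, and total points are converted back to a linear predictor before applying the logistic function. The coefficients, variable ranges, and variable names below are hypothetical and are not the fitted DLRN.

```python
import math

# Hypothetical logistic model; values are illustrative only, not the study's fit.
INTERCEPT = -4.0
COEFS = {"dlr_score": 3.0, "odi": 0.04, "msu_class": 0.9}
RANGES = {"dlr_score": (0.0, 1.0), "odi": (0.0, 100.0), "msu_class": (0.0, 3.0)}

def points_scale(coefs, ranges):
    """Points per unit of linear predictor: the widest-effect variable spans 100 points."""
    spans = [abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items()]
    return 100.0 / max(spans)

def patient_points(patient, coefs, ranges):
    """Points contributed by each variable for one patient."""
    scale = points_scale(coefs, ranges)
    points = {}
    for k, b in coefs.items():
        lo, hi = ranges[k]
        base = lo if b >= 0 else hi  # the value that earns 0 points
        points[k] = scale * b * (patient[k] - base)
    return points

def total_points_to_probability(total, coefs, ranges, intercept):
    """Convert total points back to a predicted probability."""
    scale = points_scale(coefs, ranges)
    # linear predictor when every variable sits at its zero-point value
    lp0 = intercept + sum(
        b * (ranges[k][0] if b >= 0 else ranges[k][1]) for k, b in coefs.items()
    )
    lp = lp0 + total / scale
    return 1.0 / (1.0 + math.exp(-lp))
```

Because the points are a linear rescaling of the model's linear predictor, reading probabilities off the nomogram gives the same answer as evaluating the logistic model directly; the graphic only makes each variable's contribution visible.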

Discussion

This study used an innovative approach in which traditional radiomic features were extracted from separately hand-delineated ROIs on sagittal and axial lumbar MR T2W_mDixon images. Deep-learning features were extracted using the ResNeXt50_32x4d architecture pretrained on the ImageNet database, LASSO with 10-fold cross-validation was used for feature screening, multiple machine-learning algorithms were used for modeling, and external testing was conducted to verify model performance and build the DLRN. The nomogram showed excellent predictive performance and good calibration for surgical risk in young adults with LDH, with an AUC of 0.994 (95% CI 0.989–0.999) in the training cohort and 0.941 (95% CI 0.894–0.989) in the test cohort, and DCA indicated a greater clinical net benefit.

Currently, AI in lumbar disease diagnosis is mainly used for automatic localization and measurement of the intervertebral discs and the spinal canal. Zheng et al15 used BianqueNet to automatically segment vertebrae and discs and to predict the degree of disc degeneration with an accuracy of 92%. Zhou et al16 applied a CNN with transfer learning to automatically locate the lumbar vertebrae from L1 to S1; the method did not require labeled MR images for training and achieved 98.6% accuracy. Hallinan et al17 developed a deep-learning model based on ResNet-101 that reliably diagnosed central canal stenosis (AUC = 0.82), lateral recess stenosis (AUC = 0.72), and foraminal stenosis (AUC = 0.75). Won et al18 reported no significant difference between spinal stenosis grading by a deep-learning model and by spine surgeons. However, these studies focused mainly on imaging, did not include clinical signs, and could not predict risk factors for lumbar diseases.

Studies have also shown that AI can help predict the prognosis of LDH. Harada et al19 used XGBoost to develop a model predicting postoperative recurrence of lumbar disc herniation that included seven factors, such as ODI, symptom duration, and BMI (AUC = 0.72); they found that young patients with LDH had a higher risk of postoperative recurrence, consistent with the findings of Abdu et al on the effect of surgery for recurrent lumbar disc herniation.20 Saravi et al4 extracted preoperative radiomics features from lumbar MR and combined them with clinical features in SVM, XGBoost, and random tree models to predict the postoperative outcome of LDH; the combined models were more accurate than the clinical prediction model alone. Nevertheless, these studies focused mainly on clinical signs. Here, we built models on fused radiomics and DL features and included the patients’ clinical characteristics to predict surgical risk in young patients with LDH. DLR outperformed the single-source models (radiomics or DL alone), and the DLRN, which adds clinical characteristics, achieved the strongest predictive performance.

In this study, SVM was chosen by voting to build the model, and the nomogram performed well, with an AUC of 0.941 in the test cohort. Gaonkar et al21 also used an SVM to segment the spinal canal area on axial lumbar MRI and a Deep-U-Net model to segment the disc on sagittal images to assist in diagnosing spinal stenosis. Hashia et al22 used texture features (GLRLM) from sagittal lumbar MR and an SVM model to identify disc herniation, with an accuracy of 0.833. Thus, SVM has been widely used in diagnosing spinal diseases with good performance. We used ResNeXt50_32x4d pretrained on the ImageNet database to extract DL features. ImageNet, one of the largest labeled image databases, yields pretrained features that generalize well and has been used extensively in clinical domains. For instance, Zhang et al23 used three DL models (ResNet50, ResNet101, and DenseNet121) pretrained on ImageNet to develop a model for predicting sacroiliac arthritis, achieving an AUC of 0.91 in the test cohort. ResNeXt combines the Inception design philosophy with ResNet and inherits the strengths of both. It introduces cardinality (the number of parallel transformation paths in each block) as an additional dimension, surpassing the classification accuracy of ResNet while avoiding the complex structural design of Inception. It improves accuracy without increasing model complexity, requires fewer hyperparameters to adjust in transfer learning, and scales well.24

Our results showed that clinical factors, including ODI, Pfirrmann grade, MSU classification, SLRT, and MMFI, objectively reflect LDH severity. The ODI comprehensively assesses the patient’s pain, self-care ability in daily life, and other factors, and the VAS is less effective than the ODI.25 The Pfirrmann grade is widely used to assess disc degeneration clinically. It divides the degeneration process into five grades, with Grades I–III indicating mild degeneration with normal disc height and Grades IV and V indicating severe degeneration with collapse of disc height. Various studies have used radiomics or DL to automate the identification of disc degeneration, underscoring the importance of the Pfirrmann grade in disc disorders. The MSU classification of discs on axial MRI is regarded as a crucial indicator of surgical indication. It grades the herniation (degree and size) as 1, 2, or 3 and its location as zone A, B, or C. Surgical validation showed 98% reliability, with improvement in postoperative ODI, indicating wide clinical applicability.26 Zhang et al27 categorized the MSU classification into grades 0, I, II, and III, used Faster R-CNN to localize the effective LDH area, and then developed a prediction model with ResNeXt101 that performed well. The straight-leg raise test and the reinforced straight-leg raise test, particularly their contralateral versions, are critical indicators of LDH severity.

For patients with progressive neurological dysfunction, early surgery is advised,28 as it yields greater functional recovery than non-surgical treatment; six months may serve as a threshold.29,30 The paraspinal muscles play a key role in maintaining spinal stability and dynamic regulation, and the multifidus, the innermost of these muscles, has a particularly prominent role.31 Current research indicates that MMFI is associated with LBP in young patients.32,33 Studies have also demonstrated a correlation between MMFI and lumbar disc degeneration (LDD). Faur et al34 assessed LDD at L4/5 and L5/S1 on lumbar MRI of 35 patients with chronic low back pain (CLBP) and found a strong correlation between LDD and MMFI. In this study, we graded MMFI qualitatively using the Goutallier grading system, a widely accepted standard for evaluating fat infiltration in the paraspinal muscles: 0 points, almost no fat in the muscle; 1 point, some fatty streaks; 2 points, less fat than muscle; 3 points, roughly equal amounts of fat and muscle; and 4 points, more fat than muscle.35,36 We further classified 0–2 points as mild and 3–4 points as severe. MMFI grades were higher in patients who underwent surgery for LDH than in the non-surgical group, indicating that MMFI is correlated with LDH severity. In summary, we combined ODI, representing the patient’s subjective symptoms; SLRT, representing clinical signs; the lumbar MR parameters Pfirrmann grade, MSU classification, and MMFI; and DLR to construct the nomogram.
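The MMFI grading and the mild/severe dichotomy described above amount to a small lookup, sketched below for clarity; the function and constant names are hypothetical.

```python
# Goutallier grades as described in the text (points 0-4).
GOUTALLIER_GRADES = {
    0: "almost no fat in the muscle",
    1: "some fatty streaks",
    2: "less fat than muscle",
    3: "roughly equal amounts of fat and muscle",
    4: "more fat than muscle",
}

def mmfi_severity(grade: int) -> str:
    """Dichotomize the Goutallier grade: 0-2 mild, 3-4 severe."""
    if grade not in GOUTALLIER_GRADES:
        raise ValueError(f"Goutallier grade must be 0-4, got {grade}")
    return "mild" if grade <= 2 else "severe"
```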

MR mDixon uses the chemical shift effect to separate water and fat signals from in-phase and opposed-phase images, enabling quantitative fat analysis that has proved highly effective.37 mDixon is increasingly used to assess fat infiltration of the lumbar paraspinal muscles and sarcopenia. Khil et al35 used mDixon to evaluate fat infiltration of the lumbar and back muscles in asymptomatic volunteers, with promising results. Chen et al38 reported that fat infiltration of the multifidus muscle increases with age. mDixon TSE provides T1-, T2-, and PD-weighted imaging with accurate water–fat separation, which is particularly useful where field homogeneity is difficult to achieve, anatomy is irregular, or metal implants are present. It also produces superior fat-suppressed images, especially in structurally complex regions such as the spine, providing accurate imaging data that aid feature extraction and improve model performance.
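The basic water–fat separation idea can be illustrated with the idealized two-point Dixon equations, water = (IP + OP) / 2 and fat = (IP - OP) / 2, with fat fraction = fat / (water + fat). This is a simplification: it assumes water-dominant voxels and no phase errors, whereas the commercial mDixon technique uses flexible echo times and B0 correction, so the sketch does not reproduce the scanner's reconstruction. The function name is hypothetical.

```python
import numpy as np

def two_point_dixon(in_phase, opposed_phase):
    """Idealized two-point Dixon: water, fat, and fat-fraction maps."""
    ip = np.asarray(in_phase, dtype=float)
    op = np.asarray(opposed_phase, dtype=float)
    water = (ip + op) / 2  # water and fat signals add in phase
    fat = (ip - op) / 2    # and subtract when opposed
    with np.errstate(divide="ignore", invalid="ignore"):
        # water + fat equals the in-phase signal; background voxels get 0
        ff = np.where(ip > 0, fat / (water + fat), 0.0)
    return water, fat, ff
```

A fat-fraction map of this kind is the quantitative counterpart of the qualitative Goutallier grading used for MMFI in this study.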

Limitations: 1. Although this was a multicenter study, the study population had regional characteristics and may be biased; future testing with large samples from multiple centers and ethnic groups is needed. 2. This study focused on young adults, whereas elderly patients mainly have lumbar spinal stenosis, which has a more complex etiology than LDH and could be a focus of future research. 3. Factors such as smoking, underlying diseases, adverse medication history, income, and education level did not contribute to the final clinical features, probably because they were uncommon among the young patients in this study. Prospective multicenter studies are recommended to improve the model’s performance.

Conclusion

The results of this study indicate that the DLR prediction model outperforms standalone clinical models in assessing surgical indications in young patients with LDH. In addition, a comprehensive nomogram was developed to help clinicians manage surgical indications, thereby guiding patients toward early treatment, symptom relief, and a better long-term prognosis.

Data Sharing Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics Approval and Consent to Participate

This study was approved by the Shengjing Hospital Ethics Committee (approval number KYCS2024040) and was conducted in accordance with the Declaration of Helsinki. A waiver of informed consent was granted given the retrospective nature of the study and the minimal risk involved.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This work was supported by grants from the Liaoning Province Key Research and Development Project (JH2/202, 1686036606770), 345 Talent Project and Outstanding Scientific Fund of Shengjing Hospital.

Disclosure

The authors report no conflicts of interest in this work.

References

1. Buchbinder R, van Tulder M, Öberg B, et al. Low back pain: a call for action. Lancet. 2018;391(10137):2384–2388. doi:10.1016/S0140-6736(18)30488-4

2. Angarita-Fonseca A, Pagé MG, Meloto CB, et al. The Canadian version of the National Institutes of Health minimum dataset for chronic low back pain research: reference values from the Quebec low back pain study. Pain. 2023;164(2):325–335. doi:10.1097/j.pain.0000000000002703

3. Özüdoğru A, Canlı M, Ceylan İ, Kuzu Ş, Alkan H, Karaçay BÇ. Five times sit-to-stand test in people with non-specific chronic low back pain-a cross-sectional test-retest reliability study. Ir J Med Sci. 2023;192(4):1903–1908. doi:10.1007/s11845-022-03223-3

4. Saravi B, Zink A, Ülkümen S, et al. Clinical and radiomics feature-based outcome analysis in lumbar disc herniation surgery. BMC Musculoskelet Disord. 2023;24(1):791. doi:10.1186/s12891-023-06911-y

5. Arts MP, Kuršumović A, Miller LE, et al. Comparison of treatments for lumbar disc herniation: systematic review with network meta-analysis. Medicine (Baltimore). 2019;98(7):e14410. doi:10.1097/MD.0000000000014410

6. Matsuyama Y, Chiba K, Iwata H, Seo T, Toyama Y. A multicenter, randomized, double-blind, dose-finding study of condoliase in patients with lumbar disc herniation. J Neurosurg Spine. 2018;28(5):499–511. doi:10.3171/2017.7.SPINE161327

7. Zhao Y, Zhao T, Chen S, et al. Fully automated radiomic screening pipeline for osteoporosis and abnormal bone density with a deep learning-based segmentation using a short lumbar mDixon sequence. Quant Imaging Med Surg. 2022;12(2):1198–1213. doi:10.21037/qims-21-587

8. Song Y, Zhang J, Zhang YD, et al. FeAture Explorer (FAE): a tool for developing and comparing radiomics models. PLoS One. 2020;15(8):e0237587. doi:10.1371/journal.pone.0237587

9. Mayerhoefer ME, Materka A, Langs G, et al. Introduction to Radiomics. J Nucl Med. 2020;61(4):488–495. doi:10.2967/jnumed.118.222893

10. Song MX, Yang H, Yang HQ, Li SS, Qin J, Xiao Q. MR imaging radiomics analysis based on lumbar soft tissue to evaluate lumbar fascia changes in patients with low back pain. Acad Radiol. 2023;30(11):2450–2457. doi:10.1016/j.acra.2023.02.038

11. Kocaman H, Yıldırım H, Gökşen A, Arman GM. An investigation of machine learning algorithms for prediction of lumbar disc herniation. Med Biol Eng Comput. 2023;61(10):2785–2795. doi:10.1007/s11517-023-02888-x

12. Bečulić H, Begagić E, Skomorac R, Mašović A, Selimović E, Pojskić M. ChatGPT’s contributions to the evolution of neurosurgical practice and education: a systematic review of benefits, concerns and limitations. Med Glas. 2024. doi:10.17392/1661-23

13. Dong D, Fang MJ, Tang L, et al. Deep learning radiomic nomogram can predict the number of lymph node metastasis in locally advanced gastric cancer: an international multicenter study. Ann Oncol. 2020;31(7):912–920. doi:10.1016/j.annonc.2020.04.003

14. Zhang J, Liu J, Liang Z, et al. Differentiation of acute and chronic vertebral compression fractures using conventional CT based on deep transfer learning features and hand-crafted radiomics features. BMC Musculoskelet Disord. 2023;24(1):165. doi:10.1186/s12891-023-06281-5

15. Zheng HD, Sun YL, Kong DW, et al. Deep learning-based high-accuracy quantitation for lumbar intervertebral disc degeneration from MRI. Nat Commun. 2022;13(1):841. doi:10.1038/s41467-022-28387-5

16. Zhou Y, Liu Y, Chen Q, Gu G, Sui X. Automatic lumbar MRI detection and identification based on deep learning. J Digit Imaging. 2019;32(3):513–520. doi:10.1007/s10278-018-0130-7

17. Hallinan JTPD, Zhu L, Yang K, et al. Deep learning model for automated detection and classification of central canal, lateral recess, and neural foraminal stenosis at lumbar spine MRI. Radiology. 2021;300(1):130–138. doi:10.1148/radiol.2021204289

18. Won D, Lee HJ, Lee SJ, Park SH. Spinal stenosis grading in magnetic resonance imaging using deep convolutional neural networks. Spine. 2020;45(12):804–812. doi:10.1097/BRS.0000000000003377

19. Harada GK, Siyaji ZK, Mallow GM, et al. Artificial intelligence predicts disk re-herniation following lumbar microdiscectomy: development of the “RAD risk profile”. Eur Spine J. 2021;30(8):2167–2175. doi:10.1007/s00586-021-06866-5

20. Abdu RW, Abdu WA, Pearson AM, Zhao W, Lurie JD, Weinstein JN. Reoperation for recurrent intervertebral disc herniation in the spine patient outcomes research trial: analysis of rate, risk factors, and outcome. Spine. 2017;42(14):1106–1114. doi:10.1097/BRS.0000000000002088

21. Gaonkar B, Villaroman D, Beckett J, et al. Quantitative analysis of spinal canal areas in the lumbar spine: an imaging informatics and machine learning study. AJNR Am J Neuroradiol. 2019;40(9):1586–1591. doi:10.3174/ajnr.A6174

22. Hashia B, Mir AH. Texture features’ based classification of MR images of normal and herniated intervertebral discs. Multimedia Tools Appl. 2020;79(21):15171–15190. doi:10.1007/s11042-018-7011-4

23. Zhang K, Liu C, Pan J, et al. Use of MRI-based deep learning radiomics to diagnose sacroiliitis related to axial spondyloarthritis. Eur J Radiol. 2024;172:111347. doi:10.1016/j.ejrad.2024.111347

24. Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2017. doi:10.1109/CVPR.2017.634

25. Hara S, Andresen H, Solheim O, et al. Effect of spinal cord burst stimulation vs placebo stimulation on disability in patients with chronic radicular pain after lumbar spine surgery: a randomized clinical trial. JAMA. 2022;328(15):1506–1514. doi:10.1001/jama.2022.18231

26. Beyaz SG, Ülgen AM, Kaya B, et al. A novel combination technique: three points of epiduroscopic laser neural decompression and percutaneous laser disc decompression with the Ho:YAG laser in an MSU classification 3AB herniated disc. Pain Pract. 2020;20(5):501–509. doi:10.1111/papr.12878

27. Zhang W, Chen Z, Su Z, et al. Deep learning-based detection and classification of lumbar disc herniation on magnetic resonance images. JOR Spine. 2023;6(3):e1276. doi:10.1002/jsp2.1276

28. Costa F, Oertel J, Zileli M, Restelli F, Zygourakis CC, Sharif S. Role of surgery in primary lumbar disk herniation: WFNS spine committee recommendations. World Neurosurg X. 2024;22:100276. doi:10.1016/j.wnsx.2024.100276

29. Petr O, Glodny B, Brawanski K, et al. Immediate versus delayed surgical treatment of lumbar disc herniation for acute motor deficits: the impact of surgical timing on functional outcome. Spine. 2019;44(7):454–463. doi:10.1097/BRS.0000000000002295

30. Siccoli A, Staartjes VE, de Wispelaere MP, Schröder ML. Association of time to surgery with leg pain after lumbar discectomy: is delayed surgery detrimental? J Neurosurg Spine. 2019;32(2):160–167. doi:10.3171/2019.8.SPINE19613

31. Bailey JF, Fields AJ, Ballatori A, et al. The relationship between endplate pathology and patient-reported symptoms for chronic low back pain depends on lumbar paraspinal muscle quality. Spine. 2019;44(14):1010–1017. doi:10.1097/BRS.0000000000003035

32. Özcan-Ekşi EE, Ekşi MŞ, Turgut VU, Canbolat Ç, Pamir MN. Reciprocal relationship between multifidus and psoas at L4-L5 level in women with low back pain. Br J Neurosurg. 2021;35(2):220–228. doi:10.1080/02688697.2020.1783434

33. Fan Z, Wang T, Wang Y, Zhou Z, Wu T, Liu D. Risk factors in patients with low back pain under 40 years old: quantitative analysis based on computed tomography and magnetic resonance imaging mDIXON-quant. J Pain Res. 2023;16:3417–3431. doi:10.2147/JPR.S426488

34. Faur C, Patrascu JM, Haragus H, Anglitoiu B. Correlation between multifidus fatty atrophy and lumbar disc degeneration in low back pain. BMC Musculoskelet Disord. 2019;20(1):414. doi:10.1186/s12891-019-2786-7

35. Khil EK, Choi JA, Hwang E, Sidek S, Choi I. Paraspinal back muscles in asymptomatic volunteers: quantitative and qualitative analysis using computed tomography (CT) and magnetic resonance imaging (MRI). BMC Musculoskelet Disord. 2020;21(1):403. doi:10.1186/s12891-020-03432-w

36. Hu X, Feng Z, Shen H, et al. New MR-based measures for the evaluation of age-related lumbar paraspinal muscle degeneration. Eur Spine J. 2021;30(9):2577–2585. doi:10.1007/s00586-021-06811-6

37. Kukuk GM, Hittatiya K, Sprinkart AM, et al. Comparison between modified Dixon MRI techniques, MR spectroscopic relaxometry, and different histologic quantification methods in the assessment of hepatic steatosis. Eur Radiol. 2015;25(10):2869–2879. doi:10.1007/s00330-015-3703-6

38. Chen P, Zhou Z, Sun L, et al. Quantitative multi-parameter assessment of age- and gender-related variation of back extensor muscles in healthy adults using Dixon MR imaging. Eur Radiol. 2024;34(1):69–79. doi:10.1007/s00330-023-09954-w

JOURNAL OF MULTIDISCIPLINARY HEALTHCARE
