Development of machine learning models aiming at knee osteoarthritis diagnosing: an MRI radiomics analysis

Patients

This retrospective study consecutively enrolled 148 patients with single-knee MRI examinations acquired during September 2021. The subjects were divided into KOA and non-KOA groups according to the KOA diagnostic codes in the Guidelines (of China) for Diagnosis and Management of Osteoarthritis (2018 edition) (Table 1) [30]. In total, 78 left knees and 70 right knees were included. The KOA group included 72 cases (34 males, 38 females; 39 left, 33 right; mean age, 52.32 ± 13.95 years; range, 23–83 years). The non-KOA group included 76 cases (61 males, 15 females; 39 left, 37 right; mean age, 33.16 ± 11.24 years; range, 20–81 years). Body mass index (BMI, 24.30 ± 1.98 kg/m², derived from body weight [67.85 ± 7.84 kg] and height [1.67 ± 0.09 m]) was also collected, but it was available for only 53 of the 148 subjects (35.8%) because these measurements are not routinely recorded at the clinic of our centre.

Table 1 KOA diagnostic codes in Guidelines (of China) for Diagnosis and Management of Osteoarthritis (2018 edition) [30]

Image data acquisition

All MR images were obtained with 1.5 T MR scanners (EchoStar 16-channel head coil, Alltech Medical Systems, Chengdu, China; Signa Highspeed 8-channel head coil, GE Healthcare, Milwaukee, USA). The MR protocol included fast spin-echo (FSE) T1-weighted images (T1WI) plus FSE T2-weighted images (T2WI) in the axial, coronal and sagittal planes.

Image segmentation

A flow chart depicting image preparation, feature extraction, feature selection and model construction is presented in Fig. 1. To obtain the volumes of interest (VOIs) for further analysis, we uploaded all data to the Radcloud platform (Huiying Medical Technology Co., Ltd). The VOIs were delineated manually by a radiologist with 10 years of experience in knee imaging (radiologist 1). They covered the cartilage of three regions: the medial tibiofemoral compartment, the lateral tibiofemoral compartment and the patellofemoral joint. The medial and lateral VOIs were drawn on sagittal and coronal views of the tibiofemoral surfaces, and the patellar VOIs on sagittal and transverse views of the patellofemoral surfaces. Regions of interest (ROIs) were delineated manually on the MR images of all 148 patients, and each VOI was constructed by stacking the corresponding ROIs slice by slice. Thirty patients were then randomly selected from all subjects, and their VOIs were delineated again by a senior radiologist with 15 years of experience in knee imaging (radiologist 2). For these 30 patients, the intraclass correlation coefficient (ICC) between the two radiologists was calculated for each of the 1049 features of each sequence. An ICC greater than 0.80 was considered to indicate good agreement; radiomic features with an ICC below 0.80, which are generally considered not reproducible between radiologists, were removed [31,32,33]. The segmentations of radiologist 1 were used for all further analysis. Both radiologists were blinded to the information of each subject. An example of the manual segmentation is shown in Fig. 2.
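
The reproducibility filter described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) is assumed here, and the function names `icc_2_1` and `reproducible_features` are hypothetical.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects, n_raters) array.

    Assumption: two-way random effects, absolute agreement, single rater.
    The paper does not state which ICC form was actually used.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Note: a feature that is constant for both raters gives a 0/0 division.
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def reproducible_features(reader1, reader2, threshold=0.80):
    """Indices of feature columns whose ICC between two readers exceeds threshold."""
    keep = []
    for j in range(reader1.shape[1]):
        pair = np.column_stack([reader1[:, j], reader2[:, j]])
        if icc_2_1(pair) > threshold:
            keep.append(j)
    return keep
```

Here `reader1` and `reader2` would be arrays of shape (30, 1049) holding the features computed from the two radiologists' VOIs for the re-delineated patients.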

Fig. 1

A flow-chart presenting raw-image preparation, feature extraction, feature selection and model construction

Fig. 2

An example of manual segmentation: MR images of a 69-year-old female patient seen at the clinic. Images (a), (b) and (c) are the original DICOM images in the axial, coronal and sagittal views, respectively; (d), (e) and (f) are the manual annotations of (a), (b) and (c), respectively

Feature extraction

In total, 1049 radiomic features were extracted from the MR image data using the Features Calculation tool of the Radcloud platform (https://mics.huiyihuiying.com/#/subject). The extracted features fell into four categories: first-order statistical features, shape features, texture features and higher-order statistical features. First-order statistics describe the intensity information of the ROIs in the MR images, such as the maximum, median, mean, standard deviation, variance and range. Shape features reflect the shape and size of the region, such as volume, compactness, maximal diameter and surface area. Texture features quantify regional heterogeneity. Higher-order statistical features are the texture and intensity features computed after applying filter transformations (exponential, square, square root and logarithm) and wavelet transformation to the original MR images. The features comply with the definitions of the Imaging Biomarker Standardisation Initiative (IBSI) [34].
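
To make the first-order category concrete, the statistics named above can be computed from the voxels inside a mask as in the sketch below. It uses plain NumPy rather than the Radcloud tool, the function name `first_order_features` is hypothetical, and the population convention (division by N) for the variance and standard deviation is an assumption.

```python
import numpy as np


def first_order_features(image, mask):
    """First-order intensity statistics of the voxels where mask is true."""
    voxels = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    return {
        "maximum": float(voxels.max()),
        "median": float(np.median(voxels)),
        "mean": float(voxels.mean()),
        # Assumption: population statistics (ddof=0).
        "standard_deviation": float(voxels.std()),
        "variance": float(voxels.var()),
        "range": float(voxels.max() - voxels.min()),
    }
```

For a 3D image array `img` and a binary VOI array `voi` of the same shape, `first_order_features(img, voi)` returns one value per statistic.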

Feature selection

The full dataset was split into a training cohort (80%) and a validation cohort (20%). Optimal features were selected in the training cohort. Before feature selection, all radiomic features were standardised with the StandardScaler function (in Python), which subtracts the mean and divides by the standard deviation, so that each feature had a mean of 0 and a variance of 1. Although radiomic features with an ICC lower than 0.80 had been removed, a large number of features remained. To improve the accuracy of model prediction and reduce the influence of feature redundancy, redundant features were removed in three steps. First, the variance threshold method (variance threshold = 0.8) was applied. Second, the Select-K-Best method retained the features associated with KOA at P < 0.05. Third, the least absolute shrinkage and selection operator (LASSO) regression method was used to further reduce redundancy and irrelevance. The optimal \(\alpha\), the regularisation coefficient of the LASSO method, was chosen by inner tenfold cross-validation in the training cohort as the value that minimised the average mean square error (MSE), with a maximum of 5000 iterations. The LASSO model was then fitted on the whole training cohort with the optimal \(\alpha\), and the radiomic features with nonzero coefficients were selected.
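
The selection steps above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the Radcloud implementation. Several details are assumptions that the text does not specify: the stratified split, the random seeds, the ANOVA F-test (`f_classif`) as the univariate score, and the helper name `select_features`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold, f_classif
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def select_features(X_train, y_train, var_threshold=0.8, p_threshold=0.05):
    """Return the fitted scaler, the selected column indices and the LASSO alpha."""
    scaler = StandardScaler().fit(X_train)
    Xs = scaler.transform(X_train)
    idx = np.arange(Xs.shape[1])

    # Step 1: variance threshold. Applied after standardisation, every
    # non-constant feature has unit variance, so in this order the step
    # can only drop constant features.
    vt = VarianceThreshold(threshold=var_threshold).fit(Xs)
    idx = idx[vt.get_support()]

    # Step 2: univariate filter at P < 0.05 (assumed score: ANOVA F-test).
    _, p_values = f_classif(Xs[:, idx], y_train)
    idx = idx[p_values < p_threshold]

    # Step 3: LASSO with alpha chosen by tenfold CV (minimum mean MSE);
    # LassoCV refits on the full training data with the chosen alpha.
    lasso = LassoCV(cv=10, max_iter=5000, random_state=0)
    lasso.fit(Xs[:, idx], y_train)
    idx = idx[lasso.coef_ != 0]
    return scaler, idx, lasso.alpha_


# Synthetic stand-in for the radiomic feature matrix (148 cases).
X, y = make_classification(
    n_samples=148, n_features=60, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
scaler, selected, alpha = select_features(X_train, y_train)
# The validation cohort reuses the scaler and columns learned on training data.
X_train_sel = scaler.transform(X_train)[:, selected]
X_val_sel = scaler.transform(X_val)[:, selected]
```

Fitting the scaler and all selectors on the training cohort only, and then applying them unchanged to the validation cohort, keeps validation data from leaking into feature selection.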

Model construction

The selected features were used as inputs for model construction to differentiate KOA from non-KOA cases. Models were constructed in the training cohort with three ML algorithms: logistic regression (LR), K-nearest neighbour (KNN) and support vector machine (SVM). During model building, the hyperparameters of every classifier were tuned to maximise diagnostic performance. For SVM, the tuned hyperparameters were C (0.1, 0.5, 0.8, 1, 3, 5) and kernel (‘rbf’, ‘linear’, ‘sigmoid’); for KNN, n_neighbors (from 2 to 10) and algorithm (‘auto’, ‘ball_tree’, ‘kd_tree’); and for LR, penalty (‘l1’, ‘l2’) and C (0.1, 0.5, 0.8, 1, 3, 5). Classification results were evaluated with the receiver operating characteristic (ROC) curve and the associated area under the ROC curve (AUC), accuracy, sensitivity and specificity.
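
The hyperparameter grids listed above map directly onto a scikit-learn grid search. The sketch below is illustrative only: the text does not describe the search procedure, so the use of `GridSearchCV`, the fivefold cross-validation, the `roc_auc` scoring and the `liblinear` solver (chosen because it supports both the l1 and l2 penalties) are assumptions, and `tune_classifiers` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

C_VALUES = [0.1, 0.5, 0.8, 1, 3, 5]

# One (estimator, grid) pair per algorithm, using the values given in the text.
SEARCH_SPACES = {
    "SVM": (SVC(), {"C": C_VALUES, "kernel": ["rbf", "linear", "sigmoid"]}),
    "KNN": (
        KNeighborsClassifier(),
        {"n_neighbors": list(range(2, 11)),
         "algorithm": ["auto", "ball_tree", "kd_tree"]},
    ),
    "LR": (
        LogisticRegression(solver="liblinear"),  # assumed solver
        {"penalty": ["l1", "l2"], "C": C_VALUES},
    ),
}


def tune_classifiers(X_train, y_train, cv=5):
    """Grid-search every classifier and return the fitted searches by name."""
    fitted = {}
    for name, (estimator, grid) in SEARCH_SPACES.items():
        search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=cv)
        fitted[name] = search.fit(X_train, y_train)
    return fitted


# Synthetic stand-in for the selected radiomic features of the training cohort.
X_demo, y_demo = make_classification(
    n_samples=118, n_features=10, n_informative=4, random_state=0
)
searches = tune_classifiers(X_demo, y_demo)
for name, search in searches.items():
    print(name, search.best_params_, round(search.best_score_, 3))
```

Each fitted search exposes `best_estimator_`, which can then be evaluated once on the validation cohort.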

For each algorithm, 11 models were constructed for comparative analysis. Three models were built from the medial tibiofemoral VOIs: a sagittal model (M-S model), a coronal model (M-C model) and a combined sagittal-coronal model (M-S-C model). Similarly, three models were built from the lateral tibiofemoral VOIs: a sagittal model (L-S model), a coronal model (L-C model) and a combined sagittal-coronal model (L-S-C model). Three models were built from the patellar VOIs: a sagittal model (P-S model), a transverse model (P-T model) and a combined sagittal-transverse model (P-S-T model). In addition, all features were combined to build a comprehensive model (Final model, Final-M). After training, the generalisation performance of each model was estimated in the validation cohort. Finally, the clinical data of age, gender and BMI were used to construct a separate model for clinical statistics analysis (Clnc model) rather than being mixed into the former 10 models, mainly because BMI data were missing for a large proportion of subjects.

Statistical analysis

Normalisation of features, selection of features and model construction were undertaken using Python 3.7.0, Scikit-learn package 0.19.2 and Pyradiomics package 2.2.0. Other statistical analyses were performed using R software version 3.3.0. ROC curve analysis was used to evaluate the diagnostic performance of the ML classifiers, and 95% confidence intervals (CIs), specificity and sensitivity were also calculated. Four further indicators were used to evaluate classifier performance: precision (P = true positives/(true positives + false positives)), recall (R = true positives/(true positives + false negatives)), f1-score (f1-score = P*R*2/(P + R)) and support (total number in test set). The analysis was carried out on the Radcloud platform (https://mics.huiyihuiying.com/).
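
The indicators defined above follow directly from the confusion-matrix counts, as the plain-Python sketch below shows. The function name `classification_indicators` is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the text specifies.

```python
def classification_indicators(y_true, y_pred, positive=1):
    """Precision, recall, f1-score, specificity, accuracy and support."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall equals sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "support": len(pairs),  # total number in the test set, as defined above
    }


# Toy labels for illustration: 1 = KOA, 0 = non-KOA.
demo = classification_indicators(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 0, 1, 0, 1, 0, 0, 1],
)
```

In this toy example there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so precision, recall, f1-score, specificity and accuracy all equal 0.75.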
