Automatic measurement of lower limb alignment in portable devices based on deep learning for knee osteoarthritis

This study was approved by the Ethics Review Board of the First Medical Center of Chinese PLA General Hospital.

We retrospectively reviewed 1041 digital LLR X-rays (from 623 patients) in the dataset of the General Hospital of the People’s Liberation Army. These X-rays were captured using the uDR 780i Pro Fully Automatic Ceiling-mounted DR (UNITED IMAGING, Shanghai, China) between January and December 2021. Each patient may have undergone multiple pre- and post-operative radiographs. All X-rays were performed for clinical or perioperative measurements.

The inclusion criteria were as follows: (1) patient age ≥ 40; (2) knee OA diagnosis established based on the Chinese Guidelines for the Diagnosis and Treatment of Osteoarthritis (2019 edition) [21]; (3) standing LLR X-rays of both lower limbs; (4) for the same patient, over 90 days between examinations; (5) previous total knee arthroplasty history or no surgery history. The exclusion criteria were as follows: (1) non-standard standing LLR X-rays; (2) severe deformity of the femur or tibia; (3) comorbidities of other diseases that may cause severe knee deformities or joint fusion; (4) comorbidities including infectious arthritis or postoperative joint infection; (5) LLRs showing other implants like uni-compartmental knee prosthesis or internal fixation; (6) incorrect posture (such as rotation or flexion of lower limbs); (7) poor image quality.

Model architecture

We divided the included LLRs into a training set (837), a validation set (101), and a test set (204), in a ratio of approximately 80%:10%:20%. The training-set LLRs, stored with an average size of 2021 × 2021 pixels and a pixel spacing of 0.2 mm, were annotated with landmarks by orthopaedic specialists in MATLAB (Mathworks, Natick, MA). All images were then resized to 640 × 320 pixels with an isotropic spacing of 0.79 mm using bilinear interpolation and normalized for model development.

Following Liu et al. [22], who studied preoperative planning of total hip arthroplasty, we formulated alignment measurement as a task with two branches: (1) a landmark detection branch and (2) an edge prediction branch. Each resized LLR was first fed into the backbone network to extract high-level features for the two branches.

For the backbone network, we used the HRNet model [23] for both branches (the proposed model is shown in Fig. 1). For landmark detection, each landmark was converted into a heatmap with a 2D Gaussian distribution centered at its coordinates (the Gaussian hyperparameter σ was set to 2), and the distribution was normalized to [0, 1]. For edge prediction, each edge connecting two landmarks was represented as a vector and normalized for faster convergence; this served as a constraint that implicitly helped correct detection deviations during training.
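The heatmap encoding described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `gaussian_heatmap` and `heatmap_argmax` are hypothetical, the peak is assumed to be scaled to exactly 1 so that values fall in [0, 1], and a small grid is used instead of the full 640 × 320 input.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Build a height x width heatmap with a 2D Gaussian centered at (cx, cy).

    Assumption: the peak is scaled to 1.0, so all values lie in [0, 1].
    sigma=2 follows the value stated in the text.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def heatmap_argmax(heatmap):
    """Decode a landmark as the (x, y) position of the heatmap maximum."""
    best_xy, best_val = (0, 0), float("-inf")
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_val:
                best_xy, best_val = (x, y), value
    return best_xy

# Encode a hypothetical landmark at x=5, y=20 and decode it again.
hm = gaussian_heatmap(32, 16, cx=5, cy=20)
decoded = heatmap_argmax(hm)
```

Decoding by taking the maximum mirrors the way predicted coordinates are read from the heatmaps at evaluation time.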

Fig. 1

The framework of the proposed method and examples of corresponding landmarks. v1, Centre of the femoral head; v2, Centre of the femoral diaphysis; v3, Lowest point of lateral femoral condyle or prosthesis; v4, Centre of the knee joint or prosthesis on the femoral side; v5, Lowest point of medial femoral condyle or prosthesis; v6, Lowest point of lateral tibial plateau or prosthesis; v7, Centre of the knee joint or prosthesis on the tibial side; v8, Lowest point of medial tibial plateau or prosthesis; v9, Centre of the tibial diaphysis; v10, Centre of the ankle joint. The LLR was from a 65-year-old female patient diagnosed with knee OA, and consent was obtained from the patient before its use.

The proposed model was implemented using PyTorch and run on a machine with 4 Nvidia P100 GPUs. The parameters of the network were initialized from a model pre-trained on the large public dataset ImageNet [24]. In addition, the training set was augmented using several commonly used methods [25]. The backbone network was optimized with Adam [26] at an initial learning rate of 1e-3, which was decreased to 1e-4 and 1e-5 at the 120th and 170th epochs, respectively. Optimization was carried out for 200 epochs with a batch size of 32. During model development, the training loss curve decreased steadily and stabilized without overfitting.
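The step-wise learning-rate schedule described above can be written as a small function. This is a sketch under assumptions: the name `learning_rate` is hypothetical, epochs are taken to be 0-indexed, and each drop is assumed to take effect at the start of the stated epoch. In PyTorch, the same behaviour is commonly obtained with `torch.optim.lr_scheduler.MultiStepLR` using `milestones=[120, 170]` and `gamma=0.1`.

```python
def learning_rate(epoch, base_lr=1e-3, milestones=(120, 170), gamma=0.1):
    """Return the learning rate for a given epoch under a step schedule.

    Assumption: epochs are 0-indexed and the rate is multiplied by `gamma`
    once for every milestone that has been reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

# Learning rate for each of the 200 training epochs.
schedule = [learning_rate(e) for e in range(200)]
```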

Surgeons’ evaluation

To test the accuracy of the tool, we used 204 images as a test set, drawn from knee OA patients who had not undergone surgery and from those who had undergone unilateral or bilateral total knee replacement (some patients may have multiple pre- and post-operative images). There was no overlap between the test set and either the training or validation sets. To assess the measurements of the proposed model, after patient information was removed, each LLR was measured independently and in a blinded manner by two senior surgeons in PACS. The measurements comprised HKA, JCLA (joint line convergence angle), AMA (anatomical mechanical angle), mLDFA (mechanical lateral distal femoral angle), and mMPTA (mechanical medial proximal tibial angle). The surgeons obtained them by manually identifying and connecting the corresponding landmarks. The averages of the two surgeons' measurements of the landmarks and alignment angles were taken as the ground truths. For the portable device, all LLRs were photographed using mobile phones (iPhone X, Apple Inc.). During the photo session, the phone was mounted on a tripod, positioned 40 cm from the LLRs, and centered on them under constant indoor lighting. The images were saved in JPG format at a resolution of 4000 × 3000 pixels. They were not optimized or pre-processed before being fed into the model, which then output its predictions.
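As an illustration of how an alignment angle follows from connected landmarks, the sketch below computes an HKA-style angle from four points. The text does not give the exact formula used, so this rests on an assumption: the angle is taken between the femoral mechanical axis (knee centre to femoral head centre, v4 to v1 in Fig. 1) and the tibial mechanical axis (knee centre to ankle centre, v7 to v10), giving 180° for a perfectly straight limb. The function names and the example coordinates are hypothetical.

```python
import math

def angle_between(u, v):
    """Unsigned angle in degrees between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(abs(cross), dot))

def hka_angle(hip, knee_femoral, knee_tibial, ankle):
    """HKA-style angle from four (x, y) landmarks.

    Assumption: angle between the vector knee->hip and the vector knee->ankle,
    so a straight limb yields 180 degrees.
    """
    femoral_axis = (hip[0] - knee_femoral[0], hip[1] - knee_femoral[1])
    tibial_axis = (ankle[0] - knee_tibial[0], ankle[1] - knee_tibial[1])
    return angle_between(femoral_axis, tibial_axis)

# Hypothetical pixel coordinates: a straight limb and a deviated one.
straight = hka_angle((100, 0), (100, 300), (100, 305), (100, 600))
deviated = hka_angle((100, 0), (100, 300), (100, 305), (120, 600))
```

The unsigned angle does not distinguish varus from valgus; a signed convention would require knowing which limb (left or right) is shown.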

Statistical analysis

To compare the performance of landmark detection, we adopted the mean radial error (MRE) for quantitative comparison, defined as

\[\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n} R_i\]

where \(n\) denotes the number of detected landmarks and \(R_i\) is the Euclidean distance between the coordinates of the \(i\)-th predicted landmark, obtained by extracting the maximum of its heatmap, and the corresponding ground truth. To assess the results of the proposed model, we adopted the Chi-Square test. P ≥ 0.05 was considered to represent no significant difference between manual measurements and model calculations. Bland-Altman analysis was adopted to assess the consistency between the two methods. For angles, previous studies considered a difference of > 2° to be clinically relevant [3], so we adopted thresholds of > 1° and > 2°. The data were analyzed using R (4.0.0) and SPSS Version 25.0.0.2 (SPSS, Inc., Chicago, Ill.).
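The two agreement measures above can be sketched in a few lines. This is an illustration only: the study's analysis was run in R and SPSS, the function names are hypothetical, the optional `pixel_spacing` factor for converting pixels to millimetres is an assumption, and the Bland-Altman limits use the conventional bias ± 1.96 standard deviations of the differences.

```python
import math
import statistics

def mean_radial_error(predicted, truth, pixel_spacing=1.0):
    """Mean Euclidean distance between predicted and ground-truth landmarks.

    `pixel_spacing` (assumed, in mm per pixel) converts pixel distances to mm.
    """
    distances = [math.dist(p, t) for p, t in zip(predicted, truth)]
    return pixel_spacing * sum(distances) / len(distances)

def bland_altman(model, manual):
    """Return (bias, lower limit, upper limit) of agreement for paired values."""
    diffs = [a - b for a, b in zip(model, manual)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def fraction_exceeding(model, manual, threshold_deg):
    """Share of paired angle measurements differing by more than a threshold."""
    diffs = [abs(a - b) for a, b in zip(model, manual)]
    return sum(d > threshold_deg for d in diffs) / len(diffs)

# Made-up values purely to exercise the functions.
mre = mean_radial_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])
bias, lower, upper = bland_altman([1.0, 2.0, 4.0], [0.0, 2.0, 2.0])
over_1_deg = fraction_exceeding([1.0, 2.0, 4.0], [0.0, 2.0, 2.0], 1.0)
```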
