This study was approved by the Institutional Review Board of Shanghai Changzheng Hospital (2022SL071) before patient information was accessed, and the requirement for informed patient consent was waived due to the retrospective nature of the analysis and the anonymity of the data.
Data Collection

We retrospectively collected 2212 knee joint plain radiographs from 1208 patients from the Picture Archiving and Communication System (PACS) of Shanghai Changzheng Hospital (also referred to as Center 1) to train and validate the proposed AI model. Of these radiographs, 910 were anteroposterior (AP) radiographs and 1302 were lateral (LAT) radiographs. Specifically, 1638 plain radiographs from 796 patients (597 AP and 1041 LAT radiographs) were randomly selected as the training cohort, while the remaining 574 radiographs from 412 patients (313 AP and 261 LAT radiographs) were used as the internal validation cohort. Notably, we used a patient-wise partitioning strategy for the training and validation cohorts, ensuring that all images from a single patient were included in either the training or the validation dataset, but never both.
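For illustration, the sketch below shows one way to implement such a patient-wise partitioning, assuming a pandas DataFrame `df` with one row per radiograph and a `patient_id` column (both hypothetical names); the actual split used in the study may have been performed differently.

```python
# A minimal sketch of patient-wise partitioning: GroupShuffleSplit guarantees
# that all images from one patient fall into exactly one of the two cohorts.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def patient_wise_split(df: pd.DataFrame, val_fraction: float = 0.3, seed: int = 0):
    """Split radiographs into training/validation cohorts without patient overlap."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=val_fraction, random_state=seed)
    train_idx, val_idx = next(splitter.split(df, groups=df["patient_id"]))
    train_df, val_df = df.iloc[train_idx], df.iloc[val_idx]
    # Sanity check: no patient appears in both cohorts.
    assert set(train_df["patient_id"]).isdisjoint(val_df["patient_id"])
    return train_df, val_df
```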
To further validate the generalizability of the proposed AI-based QC model across different hospitals, an independent external validation cohort of 1572 knee radiographs (912 AP and 660 LAT) from 753 patients was collected from six other hospitals (referred to as Centers 2–7), as shown in Fig. 1. Because this study focused on performing QC for individual images rather than on patient-level disease diagnosis, QC performance was evaluated at the individual-image level rather than at the patient level.
Fig. 1 Inclusion and exclusion criteria for this study. A total of 3784 knee plain radiographs were used to train and validate the generalization performance of the proposed AI-based QC model
The data collected for this study adhered to the following inclusion and exclusion criteria. Radiographs were included if they (1) were taken from patients over 18 years old; (2) were plain knee joint radiographs; and (3) were obtained in accordance with standard guidelines [29]. Radiographs were excluded if (1) they were not AP or LAT projections of the knee joint; (2) they were blurred or occluded, thus affecting the observation of knee joint structures; (3) the knee joint depicted on the radiograph exhibited fractures, foreign bodies, postoperative changes, or severe osteoarthritis; or (4) they showed multiple knee joints in a single image.
All images were captured using equipment from Philips, General Electric, or Canon, and all sensitive information was fully anonymized. Table 1 shows the data distribution for all cohorts.
Table 1 Data distribution for different cohorts

Data Annotations

Plain knee radiographs are commonly used to diagnose knee joint diseases because they reveal the structural information of the knee. In this study, we selected three of the most critical and computationally challenging QC criteria for knee radiographs to compare the performance of an AI-based model against that of clinicians. These criteria are defined as follows:
1. Anteroposterior fibular head overlap ratio (AP overlap ratio): measures the overlap ratio between the fibular head and the tibia on AP knee plain radiographs.
2. Lateral fibular head overlap ratio (LAT overlap ratio): measures the overlap ratio between the fibular head and the tibia on LAT knee plain radiographs.
3. Flexion angle of the lateral knee (LAT flexion angle): measures the angle between the femur and the tibia on LAT knee plain radiographs.
To ensure the accuracy of the annotations, two associate chief musculoskeletal (MSK) radiologists with 10 and 13 years of experience first annotated all plain knee radiographs with key points. A committee of two chief MSK radiologists with 26 and 36 years of experience then reviewed all annotations and corrected any misplaced key points. Two other experts simultaneously reviewed all annotations, and any ambiguous labels were discarded. All annotations were then confirmed to be consistent and indisputable.
Preprocessing

All AP/LAT knee radiographs were converted from raw DICOM format to NumPy (.npy) format using Python and SimpleITK [30]. To enhance the visualization of skeletal features and remove redundant information, we adjusted the displayed detail using a window width and window level computed by contrast-limited adaptive histogram equalization (CLAHE).
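As a rough sketch of this preprocessing step (assuming the CLAHE implementation from scikit-image and illustrative file paths; the study's exact windowing computation is not specified beyond the description above):

```python
# A minimal preprocessing sketch: read the raw DICOM with SimpleITK, apply
# contrast-limited adaptive histogram equalization, and save as .npy.
import numpy as np
import SimpleITK as sitk
from skimage.exposure import equalize_adapthist

def preprocess_radiograph(dicom_path: str, npy_path: str) -> None:
    image = sitk.ReadImage(dicom_path)                     # raw DICOM
    array = np.squeeze(sitk.GetArrayFromImage(image)).astype(np.float32)  # (H, W)
    # Normalize to [0, 1] before CLAHE, which expects a float image in that range.
    array = (array - array.min()) / (array.max() - array.min() + 1e-8)
    enhanced = equalize_adapthist(array, clip_limit=0.01)  # contrast-limited AHE
    np.save(npy_path, enhanced.astype(np.float32))
```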
Computing of QC Criteria

Computing QC results for the overlap ratios or the flexion angle directly from images is challenging. To address this problem, we defined key points that describe the important positions of the knee joint in an image. According to the QC requirements, we used 5 key points for the AP knee plain radiographs and 9 key points for the LAT knee plain radiographs. Table 2 describes the definitions of these key points.
Table 2 Detailed description of key points

Figure 2 shows examples of predefined key points (A–I) and their corresponding auxiliary lines on AP and LAT knee plain radiographs. The line connecting key points A and B represents the diaphyseal orientation of the fibula and is defined as L1. The distances from key points C, D, and E to line L1 are defined as \(S_c\), \(S_d\), and \(S_e\), respectively. The overlap ratio is calculated as \((S_e-S_d)/(S_e-S_c)\), as shown in Fig. 2a, if key points E and C are located on the same side of line L1; otherwise, it is calculated as \((S_e-S_d)/(S_e+S_c)\), as shown in Fig. 2b. The line connecting key points F and G represents the diaphyseal orientation of the femur and is defined as L2. The line connecting key points H and I represents the diaphyseal orientation of the tibia and is defined as L3. The LAT flexion angle is defined as the angle between lines L2 and L3.
Fig. 2 Example annotations of predefined key points and their corresponding auxiliary lines. a AP knee plain radiograph. b LAT knee plain radiograph. The auxiliary lines L1, L2, and L3, the perpendicular distances \(S_c\), \(S_d\), and \(S_e\), and the flexion angle are all shown
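To make the geometry concrete, the following minimal sketch (with illustrative function names, and sign conventions matching the same-side/opposite-side cases described above) computes the overlap ratio and flexion angle from key-point coordinates:

```python
# Geometric QC computations from 2D key-point coordinates (x, y).
import numpy as np

def _cross2(u, v):
    """2D scalar cross product."""
    return u[0] * v[1] - u[1] * v[0]

def point_line_distance(p, a, b):
    """Unsigned distance from point p to the infinite line through a and b."""
    p, a, b = (np.asarray(x, dtype=float) for x in (p, a, b))
    return abs(_cross2(b - a, p - a)) / np.linalg.norm(b - a)

def same_side(p, q, a, b):
    """True if points p and q lie on the same side of the line through a and b."""
    p, q, a, b = (np.asarray(x, dtype=float) for x in (p, q, a, b))
    return _cross2(b - a, p - a) * _cross2(b - a, q - a) > 0

def overlap_ratio(A, B, C, D, E):
    """Fibular head/tibia overlap ratio from key points A-E (L1 = line A-B)."""
    s_c = point_line_distance(C, A, B)
    s_d = point_line_distance(D, A, B)
    s_e = point_line_distance(E, A, B)
    if same_side(E, C, A, B):               # E and C on the same side of L1
        return (s_e - s_d) / (s_e - s_c)
    return (s_e - s_d) / (s_e + s_c)        # E and C on opposite sides of L1

def flexion_angle(F, G, H, I):
    """Angle (degrees) between femoral line L2 (F-G) and tibial line L3 (H-I).
    The value depends on the direction convention of the key points."""
    v1 = np.asarray(G, dtype=float) - np.asarray(F, dtype=float)
    v2 = np.asarray(I, dtype=float) - np.asarray(H, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```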
It is important to note that key points A, B, F, G, H, and I are used only to determine the diaphyseal orientations of the fibula (A, B), femur (F, G), and tibia (H, I). These key points are therefore not unique: slight movement along the diaphyseal axis does not change the resulting orientation. For instance, key points A and B can be shifted slightly along line L1, provided that each point remains at the center of the diaphyseal cross-section (perpendicular to L1).
The Proposed AI-Based QC Model

In this study, we used an HR-Net-based framework [31] to design our automatic QC model for knee joint radiographs, as shown in Fig. 3. Because precise values for the knee flexion angle and overlap ratios are not directly available from an image, the model was trained to detect a set of predefined key points, and auxiliary lines were drawn to aid in the interpretation of key measurements. Finally, a set of simple but effective geometric calculations was used to compute the overlap ratio of the fibular head with the tibia on AP and LAT projections, as well as the flexion angle on LAT projections.
Fig. 3 Pipeline of the proposed AI-based QC model
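Conceptually, the pipeline can be summarized by the following sketch, which assumes a trained key-point model `model` returning a stack of nine heatmaps (a hypothetical interface) and reuses the geometry helpers sketched earlier; the detection threshold is illustrative:

```python
# End-to-end QC sketch: heatmap argmaxes give key points, the view is
# identified from the number of confidently detected points, and the
# geometric QC values are computed.
import numpy as np

def run_qc(model, image):
    heatmaps = model(image)                      # (9, H, W) probability maps
    points, confident = [], []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        points.append((x, y))
        confident.append(hm.max() > 0.1)         # illustrative threshold
    A, B, C, D, E = points[:5]
    result = {"overlap_ratio": overlap_ratio(A, B, C, D, E)}
    if sum(confident) > 5:                       # LAT view: 9 key points detected
        F, G, H, I = points[5:9]
        result["flexion_angle"] = flexion_angle(F, G, H, I)
    return result
```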
More specifically, we first applied an HR-Net [31] model pretrained on ImageNet [32] as a feature extraction backbone to detect the predefined key points (A–E for AP knee radiographs and A–I for LAT knee radiographs). Auxiliary lines were then drawn to interpret key measurements, such as the diaphyseal orientations of the tibia, femur, and fibula and the overlap between the fibular head and the tibia. Geometric calculations were then performed to compute the overlap ratio between the fibular head and the tibia and the angle between the femur and the tibia.
As shown in Fig. 3, HR-Net is a parallel multiresolution, multibranch network framework that ensures semantic information interaction between different branches and maintains a high-resolution representation throughout the whole process. Here, semantic information refers to the computed image features at different scales. The model starts with a stem block that decreases the input resolution to 1/4 using two stride-2 3 × 3 convolutions; the resulting feature map then serves as the input to the multiresolution, multibranch network. A high-resolution subnetwork forms the first stage (S1 in Fig. 3), and this high resolution (1/4 of the original input resolution) is maintained throughout the whole process. At each new stage, an additional lower-resolution stream is added in parallel and connected to the existing multiresolution streams, so each later stage contains all the resolutions of the previous stage plus one extra lower-resolution stream. Four stages are applied in total, and the number of channels C doubles each time the resolution halves (i.e., C = 32, 64, 128, and 256 for feature maps F1, F2, F3, and F4, respectively).
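For reference, a minimal PyTorch sketch of such a stem block appears below (assuming a single-channel radiograph input and a width of 64, as in common HR-Net implementations; both are assumptions):

```python
# Stem block: two stride-2 3x3 convolutions reduce the input to 1/4
# resolution before the multiresolution stages.
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```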
To make better use of multiresolution information, an exchange module is used to exchange information across the parallel subnetworks and is repeated several times (e.g., every 4 residual units; only 2 residual units are shown in Fig. 3). In the exchange module, information from the different subnetworks is downsampled or upsampled to the same resolution, and 3 × 3 convolutions with stride 1 are used to maintain channel consistency. For example, if the features \(I_r\), \(r=1,2,3\), in stage 3 (S3) are associated with the output features \(O_r\), \(r=1,2,3\), after an exchange module, then each output is the sum of the three transformed inputs, \(O_r=f_1^r(I_1)+f_2^r(I_2)+f_3^r(I_3)\), where r is the resolution index and \(f_s^r\) maps the input from resolution s to resolution r; an extra output \(O_4=f_1^4(I_1)+f_2^4(I_2)+f_3^4(I_3)\) is obtained across stages (from S3 to S4). The model repeats this information exchange across the multiresolution subnetworks, with S2, S3, and S4 containing 1, 4, and 3 exchange modules, respectively. This enables more effective multiscale fusion learning and allows subnetworks with different resolutions to contribute different pieces of semantic information, leading to a more expressive final feature map. Subsequently, features F2–F4 are upsampled and passed through 1 × 1 convolutions to match the shape of F1 (H × W × C; F1 itself passes through only a 1 × 1 convolution), and features F1–F4 are then concatenated as O1. Finally, a 1 × 1 convolution produces the final output of shape H × W × 9. The location with the highest probability (maximum activation) in each output probability map is taken as the detected key point.
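A simplified sketch of one exchange module for three parallel branches appears below; it follows the description above (transform each source branch with a stride-1 3 × 3 convolution for channel consistency, resample to the target resolution, and sum the contributions), while the full HR-Net implementation differs in detail (e.g., it uses strided convolutions for downsampling):

```python
# Simplified exchange module for three branches with widths (32, 64, 128),
# matching the channel progression described in the text.
import torch.nn as nn
import torch.nn.functional as F

class ExchangeUnit(nn.Module):
    def __init__(self, channels=(32, 64, 128)):
        super().__init__()
        # convs["s->r"]: transform f_s^r from source branch s to target branch r.
        self.convs = nn.ModuleDict()
        for s, cs in enumerate(channels):
            for r, cr in enumerate(channels):
                if s != r:
                    self.convs[f"{s}->{r}"] = nn.Conv2d(cs, cr, 3, stride=1, padding=1)

    def forward(self, inputs):
        outputs = []
        for r, target in enumerate(inputs):
            out = target                          # identity path f_r^r
            for s, source in enumerate(inputs):
                if s == r:
                    continue
                x = self.convs[f"{s}->{r}"](source)
                # Resample to the target branch's spatial resolution.
                x = F.interpolate(x, size=target.shape[-2:],
                                  mode="bilinear", align_corners=False)
                out = out + x
            outputs.append(out)                   # O_r = sum_s f_s^r(I_s)
        return outputs
```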
Implementation Details

In this study, we used the mean squared error (MSE) loss to measure the deviation between the regressed heatmaps and the ground-truth heatmaps, which were generated using a 2D Gaussian distribution with sigma = 2. Recall that the LAT knee radiograph has four more key points than the AP knee radiograph. To manage this difference, we set the regression target to an all-zero heatmap for these four key points on AP knee radiographs. This approach offers two benefits: a single model can handle both AP and LAT knee radiographs, and the input image can be automatically identified as an AP or LAT radiograph based on the number of detected key points.
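The target construction and loss can be sketched as follows (the 72 × 72 heatmap resolution is an assumption based on the 288-pixel input and the 1/4-resolution output; key points absent from a view map to all-zero targets, as described above):

```python
# Gaussian heatmap targets (sigma = 2) and MSE loss for key-point regression.
import numpy as np
import torch
import torch.nn.functional as F

def gaussian_heatmap(shape, center, sigma=2.0):
    """(H, W) heatmap with a 2D Gaussian at center=(x, y); all zeros if center is None
    (e.g., one of the four LAT-only key points on an AP image)."""
    h, w = shape
    if center is None:
        return np.zeros((h, w), dtype=np.float32)
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    cx, cy = center
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return g.astype(np.float32)

def heatmap_mse_loss(pred, keypoints, shape=(72, 72)):
    """MSE between predicted heatmaps (B, 9, H, W) and Gaussian targets.
    `keypoints` is a batch of 9-element lists of (x, y) tuples or None."""
    targets = torch.stack([
        torch.from_numpy(np.stack([gaussian_heatmap(shape, kp) for kp in sample]))
        for sample in keypoints
    ])
    return F.mse_loss(pred, targets)
```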
We trained the model using stochastic gradient descent (SGD) with an initial learning rate of 0.002, which was decayed by a factor of 10 at epochs 50 and 56. The momentum was set to 0.9, and the weight decay was set to 0.0001. We used a mini-batch size of 4 and trained the model for a total of 60 epochs. The short side of each input image was resized to 288 pixels while keeping the original aspect ratio. To increase the diversity of the data, data augmentation strategies including random flips and random inversions, each with a probability of 0.5, were used. Experiments were implemented using the open-source toolbox mmdetection and PyTorch [33]. To speed up training, we used four NVIDIA GTX 1080 Ti GPUs.
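In raw PyTorch, this training configuration corresponds roughly to the following sketch (`model` and `train_loader` are assumed to exist, and `heatmap_mse_loss` is the loss sketched above; the authors used mmdetection rather than a hand-written loop):

```python
# SGD with lr 0.002, momentum 0.9, weight decay 1e-4; 10x decay at epochs
# 50 and 56; 60 epochs total with mini-batches of 4 from train_loader.
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.002,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[50, 56], gamma=0.1)

for epoch in range(60):
    for images, keypoints in train_loader:      # mini-batch size 4
        optimizer.zero_grad()
        pred = model(images)                    # (B, 9, H, W) heatmaps
        loss = heatmap_mse_loss(pred, keypoints)
        loss.backward()
        optimizer.step()
    scheduler.step()                            # per-epoch learning-rate decay
```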