AI for detection, classification and prediction of loss of alignment of distal radius fractures: a systematic review

Of the included studies, fourteen described detection [15, 18, 20, 26, 28, 30–38], one both detection and classification [29], two both detection and localization [13, 21], and one localization and classification [39] of DRFs. No studies on the prediction of loss of threshold alignment were found. Four studies used posterior-anterior (PA) and lateral radiographs [15, 32, 36, 40], five used anterior-posterior (AP) and lateral radiographs [18, 28–30, 38], and three [26, 34, 37] used an additional oblique projection. Three studies used only lateral [33], AP [31], or PA [39] radiographs, and in three studies [20, 21, 35] the projection was not clearly described. As the ground truth, fifteen studies [15, 18, 20, 21, 26, 28, 29, 31–33, 36–40] used the expertise of one or more radiologists or surgeons to detect DRFs. In addition, one study [34] used radiological reports checked and verified by a competent radiology registrar, and one study [30] used the clinical diagnosis of orthopaedic surgeons. In one study [35], the ground truth was not reported. The number of included radiographs ranged from 221 [21] to 31,490 [15] for training sets and from 32 [21] to 3500 [15] for testing sets. Validation sets were used in six studies [15, 20, 21, 26, 28, 30], ranging from 54 [20] to 1461 [28] radiographs. The total number of fractures on the radiographs used in the studies ranged from 221 [21] to 4452 [34] DRFs.

Detection

The sensitivity of fracture detection was reported in fourteen studies [15, 18, 22, 26, 28, 30–35, 37, 38, 40], ranging from 80% [13] to 99% [18]. Specificity was also reported, ranging from 73% [28] to 100% [13, 18]. The AUC was reported in twelve studies [15, 18, 27–33, 36, 37, 40], ranging from 0.87 [13] to 0.99 [30]. The accuracy was reported in nine studies [18, 29–32, 34, 35, 37, 38], ranging from 82% [22] to 99% [18]. In addition, Raisuddin et al. [36] reported a balanced accuracy of 76%. See Table 1.
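
For reference, and assuming the individual studies follow the standard confusion-matrix definitions (with TP, FP, TN, and FN denoting true positives, false positives, true negatives, and false negatives), these detection metrics are:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Balanced accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2}.$$

The AUC is the area under the receiver operating characteristic curve, summarizing the trade-off between sensitivity and specificity across all classification thresholds.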

Kim et al. [34] compared two CNN models, for which the sensitivity, specificity, AUC, and accuracy were similar. Lindsey et al. [15] reported performance on different test sets separately, with AUCs of 0.97, 0.98, and 0.99 for the internal, external, and clinical data test sets, respectively.

Classification

Two studies reported the performance of the classification of DRFs [29, 39]. Tobler et al. [29] assessed the AUC separately for fragment displacement, joint involvement, and detection of multiple fractures, reporting 0.59, 0.68, and 0.84, respectively; the corresponding accuracies were 60%, 64%, and 78% [29]. Min et al. [39] reported an AUC of 0.82, an accuracy of 81%, a sensitivity of 83%, a specificity of 72%, and an F1-score of 0.86.
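
For context, the F1-score reported by Min et al. is conventionally the harmonic mean of precision (positive predictive value) and sensitivity; assuming the standard definition was used:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}, \qquad \text{Precision} = \frac{TP}{TP + FP}.$$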

AI versus clinicians

Among the included studies, eight [15, 18, 26, 29, 31, 36, 37, 40] compared the performance of AI with that of clinicians. According to Blüthgen et al. [40], radiologists' performance was comparable to the model's on internal data and better on external data. Cohen et al. [26] found the sensitivity of AI to be significantly higher than that of initial radiology reports (IRR), with combined AI and IRR showing even greater sensitivity. Gan et al. [31] demonstrated that AI outperformed radiologists in accuracy, sensitivity, specificity, and Youden index; comparisons with orthopaedic surgeons showed similar results. Lindsey et al. [15] found comparable sensitivity and AUC for emergency medicine clinicians aided and unaided by the CNN; notably, the model showed higher specificity than the unaided clinicians. Raisuddin et al. [36] reported higher radiologist performance than the model on normal cases and similar performance on hard cases.
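
The Youden index reported by Gan et al. is, in its standard form, a single summary of discriminative performance combining sensitivity and specificity:

$$J = \text{Sensitivity} + \text{Specificity} - 1,$$

ranging from 0 (no discriminative value) to 1 (perfect discrimination).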

Suzuki et al. [18] showed equal or better accuracy, sensitivity, and specificity for the CNN compared with orthopaedic surgeons, though without statistically significant differences.

In Lee et al. [37], the sensitivity, specificity, accuracy, and AUC of two reviewers all increased when aided by AI compared with unaided reading. In addition, this study showed a decrease in mean interpretation time when aided by AI. Lastly, Tobler et al. [29] reported a higher AUC for radiology residents than for the AI in the assessment of DRFs without osteosynthetic material or a cast.
