Diagnostics, Vol. 13, Pages 104: Adaptive IoU Thresholding for Improving Small Object Detection: A Proof-of-Concept Study of Hand Erosions Classification of Patients with Rheumatic Arthritis on X-ray Images

1. Introduction

Rheumatoid arthritis (RA) is a chronic inflammatory autoimmune disease that mainly affects the joints of the hands and feet and can lead to irreversible damage of the affected joints [1,2]. Early detection of erosions is therefore essential, as drug treatment strategies can delay potential joint destruction [3].

Due to its wide availability, relatively low cost, and high spatial resolution, conventional radiography in particular has emerged in recent decades as the essential tool for assessing RA stages and for early diagnosis [2,4]. While imaging has become increasingly time-efficient thanks to technological advances, and biosensitive sequences such as Chemical Exchange Saturation Transfer (CEST) [5,6,7], delayed gadolinium-enhanced MRI of cartilage (dGEMRIC) [8], and T1rho have shown impressive performance in detecting bony lesions in current research [9], the subsequent clinical assessment and documentation remain time-consuming and highly subjective.

Deep Learning (DL)-based algorithms have proven to be an excellent and time-saving alternative to human assessment [10,11]. Numerous studies have investigated and successfully validated the potential of DL for medical image classification and segmentation [10,11,12,13,14], and some previous works have explored these concepts for assessing RA patients [13,15]. In many of these studies, however, detecting finger joints and erosions proved difficult and not yet clinically applicable [13,15,16]. Either two-stage approaches such as cascade classifiers were used [16], in which joint localization and classification were developed separately and which are therefore prone to errors, especially in destroyed joints, or the models failed at localizing the carpal bones, which could not be assessed separately because of their close spatial proximity [16]. There is therefore considerable scientific and clinical interest in new concepts that provide both reliable classification of destructions and accurate, complete joint localization.

In the past two years, Retina networks (RetinaNet) [17] in particular have emerged as a powerful tool for object detection in medical imaging and other fields [18,19,20]. Despite this, small object detection remains a challenging task at which many DL frameworks fail [13,21]: due to low resolution and background interference, negative areas (where no joint is shown) far outweigh positive areas (where a joint is shown) [22]. Yan et al. showed that the detection of small objects is especially poor when a high Intersection over Union (IoU) threshold is chosen, whereas a low IoU threshold leads to poor localization accuracy [22]. Compared to other medical areas, RA research is particularly concerned with small objects in close spatial relationship, so this trade-off is especially relevant here.
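To make this trade-off concrete, the following minimal sketch (our own illustration, not the study's code; the function names and example thresholds are assumptions) labels an anchor box as positive, negative, or ignored based on its IoU with a ground-truth box. For a small joint, even a slightly shifted anchor can fall below a strict positive threshold:

```python
# Minimal IoU sketch (illustrative only; names and thresholds are assumptions, not the study's code).
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt_box, pos_thresh=0.5, neg_thresh=0.4):
    """Assign an anchor as positive, negative, or ignored for training."""
    value = iou(anchor, gt_box)
    if value >= pos_thresh:
        return "positive"
    if value < neg_thresh:
        return "negative"
    return "ignored"  # between the two thresholds: excluded from the loss

# A small joint box shifted by only a few pixels already drops to IoU ~0.29:
print(label_anchor((100, 100, 112, 112), (104, 104, 116, 116)))  # -> "negative"
```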

Therefore, this study aimed to investigate whether a novel approach based on adaptive IoU thresholds would provide better localization and detection accuracy. Our hypotheses were that (1) lower IoU thresholds would, analogous to previous studies, result in higher erosion detection performance in RA, and (2) detection performance as well as joint localization accuracy could be substantially increased with a new adaptive IoU method.

4. Discussion

In this study, we presented a new method that substantially improves the detection accuracy of small objects by adaptively adjusting the IoU thresholds during the training process.
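The exact schedule used in the study is not reproduced here; as a minimal sketch, assuming a linear ramp from lenient start values to the final end-Pos-IoU/end-Neg-IoU values over a fixed number of adaptive epochs (the start values and the linear form are our assumptions), the per-epoch thresholds could be computed as follows:

```python
def adaptive_iou_thresholds(epoch, adaptive_epochs,
                            start_pos=0.2, start_neg=0.1,
                            end_pos=0.4, end_neg=0.3):
    """Linearly ramp the positive/negative IoU matching thresholds over the first
    `adaptive_epochs` epochs, then hold them at their end values.
    Start values and the linear schedule are illustrative assumptions."""
    t = min(epoch / max(adaptive_epochs, 1), 1.0)
    pos = start_pos + t * (end_pos - start_pos)
    neg = start_neg + t * (end_neg - start_neg)
    return pos, neg

# Example: thresholds used in epoch 25 of a 50-epoch adaptive phase.
print(adaptive_iou_thresholds(25, 50))  # (0.3, 0.2) under these assumed start values
```

In such a scheme, training begins with lenient matching so that even coarsely localized anchors on small joints receive positive labels, and the matching requirements tighten as the detector's localization improves.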

In recent years, the research field of artificial intelligence has expanded into all areas of medicine, including radiology and rheumatology [11]. Especially in image processing, innovative techniques have shown great promise in improving and speeding up clinical workflows and reducing the workload of medical staff. Given clinical experience, requirements, and the human need to monitor the decisions of DL pipelines, we decided to implement joint erosion classification using a RetinaNet. In contrast to feedforward neural networks (FNN), which output only a classification vector after the convolutional layers, for example to decide whether a patient's CT image shows COVID-19 or not [14], the RetinaNet provides a visual representation of the region on which each decision is based. The radiologist or rheumatologist is thus able to verify and confirm the results.

Although joint damage is increasingly assessed with ultrasound and MRI examinations, radiographs still provide a comprehensive, panoramic view of the joints and are the clinical standard for classifying RA stages [16]. Deep learning algorithms could greatly improve the clinical assessment of radiographs in many settings, e.g., pneumonia detection in chest X-ray images or bone lesion detection in musculoskeletal radiographs [34,35]. In addition, new radiographic findings of joint destruction could be discovered. Recently, numerous studies have used Deep Learning or CNNs to assess joints or bones: different types of osteoarthritis have been investigated, including osteoarthritis of the hip [36] and osteoporosis of the knees [37], and the assessment of bone age [38] has also been a focus. However, these previous studies consider large joints and therefore have limited applicability to RA, a polyarthritis with central involvement of the small joints. Our study addresses the difficult task of identifying small joints, thus closing this gap in RA joint classification.

In our study, we observed a dependence of detection accuracy on the IoU thresholds used, analogous to Yan et al. [22]. Furthermore, no RetinaNet trained with fixed IoU values over the training epochs achieved sufficient accuracy for clinical applicability.

In particular, many joints of different sizes lie close to each other in the carpal region. Hirano et al., who achieved a localization accuracy of 95.3% for the finger joints using a two-stage approach [16], which is comparable to our study, reported that the intercarpal joints tended to be neglected because these areas are complex and in close spatial relationship. We observed similar results with a single-stage RetinaNet without adaptive adjustment of the IoU threshold. The final model with adaptive IoU adjustment, however, was able to capture these complex regions through adaptive adjustment during training. It can therefore be assumed that adaptive adjustment is suitable not only for small objects but also of interest for complex structures with close spatial relationships.
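As a hedged illustration of why a detector's region-based output supports visual verification, the following sketch runs a generic torchvision RetinaNet (not the trained model from this study; it assumes torchvision ≥ 0.13 and uses a random placeholder tensor in place of a preprocessed radiograph) and prints the boxes, labels, and confidence scores that could be overlaid on an image:

```python
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

# Generic torchvision RetinaNet as a stand-in for the study's trained model (illustrative only).
model = retinanet_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 896, 896)  # placeholder for a preprocessed hand radiograph
with torch.no_grad():
    prediction = model([image])[0]

# Each detection comes with a bounding box and a score, so a reader can check
# which image region a given classification is based on.
for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
    if score > 0.5:
        print(label.item(), round(score.item(), 2), [round(v, 1) for v in box.tolist()])
```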

Due to the lack of localization and the low accuracy in erosion detection, none of the models we tested without adaptive IoU values achieved sufficient accuracy for routine clinical use. In comparison, with the proposed adaptive approach and end-Pos-IoU/end-Neg-IoU values of 0.4/0.3 over 50 adaptive epochs, or end-Pos-IoU/end-Neg-IoU values of 0.5/0.3 over 100 adaptive epochs, an accuracy of more than 94% was achieved, with an mAP of 0.81 ± 0.18 (50 adaptive epochs) and an mAP of 0.79 ± 0.22 (100 adaptive epochs). These results are comparable to the intra-rater repeatability of an experienced rheumatologist (mAP = 0.79, accuracy 88.5%).
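If one wanted to reproduce such an adaptive schedule on top of torchvision's RetinaNet, one possible, unofficial way is to overwrite the model's anchor-matcher thresholds at the start of each epoch. The attribute names below follow current torchvision internals (which are not a stable public API) and the class count is a placeholder, so this is a sketch under those assumptions rather than the authors' implementation:

```python
from torchvision.models.detection import retinanet_resnet50_fpn

# Illustrative sketch: tighten the anchor-matching IoU thresholds epoch by epoch.
# Relies on torchvision's internal Matcher attributes (high_threshold / low_threshold);
# verify against your torchvision version before use.
model = retinanet_resnet50_fpn(weights=None, num_classes=5)  # class count is a placeholder

adaptive_epochs = 50
for epoch in range(100):
    t = min(epoch / adaptive_epochs, 1.0)
    pos_iou = 0.2 + t * (0.4 - 0.2)  # ramp toward an assumed end-Pos-IoU of 0.4
    neg_iou = 0.1 + t * (0.3 - 0.1)  # ramp toward an assumed end-Neg-IoU of 0.3
    model.proposal_matcher.high_threshold = pos_iou
    model.proposal_matcher.low_threshold = neg_iou
    # ... run one training epoch with these matching thresholds ...
```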

Similar results were observed by Wang et al. in their study on JSN classification in patients with RA [22]. Using the classical You Only Look Once (YOLO) version 4 approach, they achieved a maximum mAP of only 0.71. Their proposed adjustments of the error functions, based on the distance to the GT box, loss generalization, consideration of aspect ratios, and separation of hand and finger joints, increased the performance to an mAP of 0.87 for two-hand radiographs. However, their performance was determined on validation data rather than test data, and hands in advanced stages of destruction were excluded, which makes a direct comparison difficult. Nonetheless, our proposed approach is straightforward, requires no prior assumptions, and is extensible to any data set. In addition, our models could classify both wrist and finger joints simultaneously without requiring additional computational steps. This significantly reduces the required computational power and thus broadens the range of usable hardware in the clinical setting: on a standard clinical workstation without a GPU, a complete evaluation requires only about 5 s, which allows a substantial acceleration of the clinical routine.

Our study, as well as the study by Wang et al., shows that recognition accuracy can be significantly improved by adjusting the loss as a function of spatial relationship: whereas previous RA studies achieved recognition accuracies of only 70.6 to 77.5% [16,39], the two approaches reached mAP values of 0.81 and 0.87, respectively.

In addition, it was notable that the best model we trained showed higher agreement with the rater than the rater had with himself after a delay of six months. This could be because the model generalized the subjective decision-making of the rater's first reading. Nevertheless, the RetinaNet sometimes differed from the rheumatologist by more than one SvH score, whereas at six months the rheumatologist differed from his previous evaluation by no more than one score. Here, the model tended to classify joints with a score of 1–4 as score 0, which could be due to the class imbalance among the individual scores: while score 0 was present in 75.26% of the joints, scores 1–4 were each present in only 3.13–6.9% of the joints.

Furthermore, our study shows that the medical care of RA patients can be optimized in terms of time by using deep learning frameworks. Experienced rheumatologists need 9 ± 13 min for a complete erosion assessment and documentation, whereas the RetinaNet we used took about 5 s for equivalent documentation. These time savings could allow physicians to spend less time on documentation in the years to come. In this way, rheumatologists can spend more time with their patients and perform tasks that cannot be performed equivalently by DL frameworks, such as face-to-face discussions with patients about clinical problems and limitations.

Nevertheless, some limitations have to be mentioned. First, the number of patients was limited, mainly because no freely available data set existed; consequently, we had to prepare our own dataset, which is time-consuming work. Second, we only examined conventional radiographs from Siemens Healthineers. Compared to MRI, for which numerous different scanner and coil configurations are available, X-ray images can be considered comparatively uniform; nevertheless, this study did not consider the effects of variability between different vendors, platforms, and institutes. In addition, the effects of rings or other interfering objects on the accuracy of the assessment were not investigated. Further studies are therefore needed to validate the applicability across multiple institutions and other X-ray manufacturers. Third, our proposed approach was only studied for one RetinaNet. Although RetinaNets have been shown in numerous studies to achieve higher accuracy than other network configurations such as YOLO or single-shot multi-box detectors (SSD) [18,40], further studies are needed to investigate the benefits of adaptive adjustment of IoU thresholds and to evaluate different model types for assessing erosion scores. Fourth, the proposed approach was applied exclusively to images in which all objects were small but of comparable size; its usefulness for classification tasks in which objects of different sizes are to be detected must therefore be investigated in subsequent studies.
