Opportunistic Identification of Vertebral Compression Fractures on CT Scans of the Chest and Abdomen, Using an AI Algorithm, in a Real-Life Setting

Performance of the Algorithm and Fracture Prevalence

Our objective was to validate the HealthVCF algorithm within a Danish hospital environment and to explore the impact of its implementation on osteoporosis diagnosis and management over a 6-month follow-up period. The sensitivity for detecting moderate/severe VCFs was 68%, and the specificity was 91%. The sensitivity in particular is well below that reported in the FDA approval of the HealthVCF, where sensitivity and specificity were 90.20% (95% CI [86.35%;93.05%]) and 86.89% (95% CI [82.63%;90.22%]), respectively, whereas our specificity was somewhat higher.

A recent meta-analysis comparing the performance of similar algorithms reported better average performance across 14 algorithms, with a sensitivity of 85.7% (95% CI 78.6–90.7) and a specificity of 93.5% (95% CI 89.5–96.1) [25]. Compared with the results of the meta-analysis, the tested HealthVCF performed worse than could be expected, mainly because of its lower sensitivity.

Comparing our results with those of Kolanu et al. [23], who tested the same algorithm, we found almost identical performance. Together with results from Australia, our results point to poorer performance of the HealthVCF than shown in the regulatory data. This subpar performance suggests limited generalizability of the algorithm to the Danish population.

The accuracy of the HealthVCF was 88.9%, and it demonstrated a Youden Index of 0.59.
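Both figures follow directly from the reported sensitivity, specificity and prevalence: the Youden Index is sensitivity + specificity − 1, and accuracy is the prevalence-weighted average of sensitivity and specificity. The minimal Python sketch below recomputes them; the function names are hypothetical, and because the inputs are the rounded values 0.68, 0.91 and 0.095, the recomputed accuracy (about 88.8%) may differ slightly from the reported 88.9%.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1 (0 = chance level, 1 = perfect test)."""
    return sensitivity + specificity - 1.0


def accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of all scans classified correctly: a prevalence-weighted mix of
    sensitivity (on fracture scans) and specificity (on fracture-free scans)."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


if __name__ == "__main__":
    # Rounded values from the text (assumption: the study's unrounded values may differ slightly).
    sens, spec, prev = 0.68, 0.91, 0.095
    print(f"Youden index: {youden_index(sens, spec):.2f}")
    print(f"Accuracy: {accuracy(sens, spec, prev):.3f}")
```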

A Youden Index of 0 corresponds to a test no better than a coin toss, and a Youden Index of 1.0 represents a perfect test. With a Youden Index of 0.59, the HealthVCF performs fairly well. To determine its usefulness, however, sensitivity and specificity must be considered separately. With a sensitivity of 0.68, approximately 32% of scans with a moderate/severe VCF will not be flagged by the algorithm. At the same time, because fractures are relatively rare (prevalence 9.5%), the 9% of fracture-free scans that are incorrectly flagged make up a large share of all positive flags: per 1000 scans, only about 65 of the 146 flagged scans (roughly 45%) contain a true fracture. The HealthVCF therefore cannot be used to diagnose vertebral fractures without human oversight, as stated by the manufacturer in their FDA application. This over-calling of fractures means that all positive scans must be reviewed by a specialist, which increases expenditure. In our study, radiographers spent on average 5–10 min on each image generated by the HealthVCF, including lookup, analysis, and potential reporting. Per 1000 scans, this translates into 12–24 h of radiographer work to correctly diagnose fractures in 65 patients out of 146 flagged scans.
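The per-1000-scan figures in the paragraph above (146 flagged scans, 65 of them true fractures, 12–24 h of review) can be reproduced from sensitivity, specificity and prevalence. The sketch below is a minimal illustration, assuming the rounded values 0.68, 0.91 and 9.5% and a fixed review time per flagged scan; `expected_counts` and `review_hours` are hypothetical helpers, not part of the HealthVCF software.

```python
from dataclasses import dataclass


@dataclass
class ExpectedCounts:
    tp: float  # fracture scans correctly flagged
    fn: float  # fracture scans missed
    fp: float  # fracture-free scans flagged (over-calls)
    tn: float  # fracture-free scans correctly not flagged


def expected_counts(n_scans: int, sensitivity: float,
                    specificity: float, prevalence: float) -> ExpectedCounts:
    """Expected 2x2 table for a cohort of n_scans (values are expectations, not integers)."""
    positives = n_scans * prevalence
    negatives = n_scans - positives
    tp = sensitivity * positives
    fp = (1.0 - specificity) * negatives
    return ExpectedCounts(tp=tp, fn=positives - tp, fp=fp, tn=negatives - fp)


def review_hours(flagged: float, minutes_per_scan: float) -> float:
    """Radiographer time needed if every flagged scan is reviewed (assumed constant per scan)."""
    return flagged * minutes_per_scan / 60.0


if __name__ == "__main__":
    c = expected_counts(1000, 0.68, 0.91, 0.095)
    flagged = c.tp + c.fp
    ppv = c.tp / flagged  # positive predictive value: share of flags that are true fractures
    print(f"flagged: {flagged:.0f}, true fractures: {c.tp:.0f}, PPV: {ppv:.0%}")
    for minutes in (5, 10):
        print(f"{minutes} min per scan -> {review_hours(flagged, minutes):.1f} h")
```

The positive predictive value, not the sensitivity, determines how many flagged scans are false positives, which is why the review workload is driven mainly by the false positive flags among the large fracture-free majority.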

With regard to the specificity of 0.91, the picture is more favorable. About 9 out of 10 scans without a fracture will not be flagged by the HealthVCF, and because of the low prevalence, roughly 96% of unflagged scans (about 824 of 854 per 1000) are true negatives. This means that the radiographic specialist can be fairly sure that no moderate/severe vertebral fracture is present when a scan is not flagged by the HealthVCF, although around 30 fractures per 1000 scans will still be missed. Though this is not perfect, the HealthVCF might be used by a Department of Radiology to increase its reporting of vertebral fractures without the need to analyze the spine on every CTAB, reducing the resources spent on opportunistic identification of vertebral fractures.
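The share of unflagged scans that are truly fracture-free is the negative predictive value (NPV), which depends on prevalence as well as on specificity. A minimal sketch, again assuming the rounded values 0.68, 0.91 and 9.5%; `npv_and_missed` is a hypothetical helper name.

```python
def npv_and_missed(n_scans: int, sensitivity: float,
                   specificity: float, prevalence: float) -> tuple[float, float, float]:
    """Return (NPV, expected missed fractures, expected unflagged scans) for n_scans."""
    positives = n_scans * prevalence
    negatives = n_scans - positives
    fn = (1.0 - sensitivity) * positives  # fractures the algorithm does not flag
    tn = specificity * negatives          # fracture-free scans correctly not flagged
    unflagged = tn + fn
    return tn / unflagged, fn, unflagged


if __name__ == "__main__":
    npv, missed, unflagged = npv_and_missed(1000, 0.68, 0.91, 0.095)
    print(f"unflagged: {unflagged:.0f}, missed fractures: {missed:.0f}, NPV: {npv:.1%}")
```

Because missed fractures grow with prevalence while true negatives shrink, the same algorithm would have a lower NPV in a population with more moderate/severe VCFs.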

The variability in performance compared with previous studies may be explained by differences in population fracture risk and in the method of evaluation [17, 21]. Morphometric evaluation methods, whether quantitative or semi-quantitative, have been thoroughly documented [17, 26]. Diagnosing Grade 1 (mild) VCFs is a diagnostic challenge because of notable discrepancies in interpretation among radiologists, which leads to an elevated risk of false positive results.

Both clinical and non-clinical VCFs are associated with future fracture risk, although the association is weaker for grade 1 than for grades 2 and 3. This might be explained by the lower specificity of the methods in diagnosing mild fractures [8, 10, 27,28,29]. Based on this knowledge, we focused only on identifying moderate/severe VCFs in this study.
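This section does not state the grading thresholds used. As an assumption for illustration only, the sketch below uses the widely cited Genant semi-quantitative scheme, in which an estimated vertebral height reduction of roughly 20–25% is mild (grade 1), 25–40% moderate (grade 2) and more than 40% severe (grade 3); `genant_grade` and `is_moderate_or_severe` are hypothetical helpers, and the exact handling of the boundaries is also an assumption.

```python
def genant_grade(height_reduction_pct: float) -> int:
    """Map an estimated vertebral height reduction (%) to a Genant-style grade.

    Assumed thresholds (not taken from this study): <20% -> 0, 20-25% -> 1 (mild),
    25-40% -> 2 (moderate), >40% -> 3 (severe). Borderline (grade 0.5) is not modelled.
    """
    if not 0.0 <= height_reduction_pct <= 100.0:
        raise ValueError("height reduction must be between 0 and 100 %")
    if height_reduction_pct < 20.0:
        return 0  # no fracture
    if height_reduction_pct < 25.0:
        return 1  # mild: excluded from the moderate/severe target in this study
    if height_reduction_pct <= 40.0:
        return 2  # moderate
    return 3      # severe


def is_moderate_or_severe(height_reduction_pct: float) -> bool:
    """True for grades 2 and 3, the fracture categories targeted in this study."""
    return genant_grade(height_reduction_pct) >= 2


if __name__ == "__main__":
    for pct in (10.0, 22.0, 30.0, 45.0):
        print(pct, genant_grade(pct), is_moderate_or_severe(pct))
```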

The prevalence of patients with moderate/severe VCFs in the validation study was 9.5%, which is low compared with previous studies. The prevalence of VCFs varies between studies and populations, which may in part be explained by differences in population risk and evaluation method, but also by the grades of fracture included in the studies. One study reported the prevalence of morphometric VCFs to be 26% in Scandinavia and 18% in Eastern Europe [30]. Another study found a prevalence of 12% [31] and a third a prevalence of 9.5% [20]. Although age ranged from 20 to 88 years in the third study, no VCFs were identified in the younger patients (< 44 years). A different study found an overall prevalence of 35%; 73% of these were grade 1 fractures, 19% grade 2 and 9% grade 3 [18]. The prevalence of moderate/severe VCFs in that study was 9.5%, equivalent to ours, indicating that part of the variability in prevalence between studies is related to the diagnosis of mild fractures.

In our study, 48% of patients in the retrospective cohort had known osteoporosis at baseline; of these, 74.4% were women. This is expected given the higher incidence of osteoporosis in women and the tendency for osteoporosis in men to go unnoticed [32, 33]. Many studies have evaluated incidental findings of VCF on chest radiographs or CTAB, but few report on patients’ osteoporosis and treatment status at the time of identification. Barton et al. evaluated the clinical outcome of VCF identification and reported that at the time of VCF diagnosis, 21% were receiving anti-osteoporosis treatment [34]. In comparison, we found that 31% of the retrospective cohort were receiving treatment at baseline. Patients in this study were labeled as known with osteoporosis based on either an osteoporosis diagnosis code in the EPJ, DXA results with a diagnostic T-score in the EPJ, or current treatment with an anti-osteoporosis drug. This method may lack precision due to the existence of different EPJ systems in Denmark. If patients have previously resided in other geographical regions, their DXA results may go unnoticed. In addition, patients diagnosed by their GP may not have a diagnosis in the EPJ, resulting in an underestimation of patients with an osteoporosis diagnosis. Conversely, information about current treatment and diagnosis is nationally centralized. Our analysis revealed that 48% of patients with known osteoporosis did not have an official diagnosis but were either undergoing treatment or had a diagnostic T-score. To provide an accurate representation of disease prevalence, proper registration of diagnoses is essential.

Effect of Intervention

At the 6-month follow-up, 97 of the patients in the retrospective cohort had died, corresponding to a mortality rate of 18%. Mortality was higher among patients not known with osteoporosis, who accounted for 68% of the deaths, than among patients known with osteoporosis (32% of the deaths). This was not explained by differences in the reason for referral (indication for the CT scan). Comorbidities such as cancer, and previous osteoporotic fractures, especially hip fractures, may be confounding factors explaining the difference in mortality between the two groups, but we have no data on these variables. Bisphosphonates were used in 79% of the patients receiving current osteoporosis treatment in the cohort. Mortality in patients treated with bisphosphonates was significantly lower than in the rest of the cohort, i.e. patients on other treatments or not currently treated. Bisphosphonates were only registered if the indication was osteoporosis. Some studies have suggested that treatment with bisphosphonates reduces overall mortality, but there are not yet sufficient data to support this hypothesis [35,36,37,38]. Another explanation for the higher mortality in the group not known with osteoporosis may be a lack of attention to osteoporosis in the presence of other chronic or severe diseases. A large treatment gap is known to exist in osteoporosis, and it is assumed to be linked to both a gap in diagnosis and a lack of awareness. Whether the presence of other chronic or severe diseases causes the gap in osteoporosis diagnosis is not known [39].

The aim of opportunistic identification of VCFs is to ensure appropriate assessment and treatment of the patients to reduce the risk of future fractures and mortality. At 6 months, 18% of the retrospective cohort had been referred for a DXA scan and 11% had either started anti-osteoporosis treatment or changed to a more potent anti-osteoporosis treatment. As noted earlier, few studies report on clinical outcomes of fracture identification, but Barton et al. reported that within 2 years after a clinical VCF, 2% of patients had a DXA scan and 7% initiated anti-osteoporosis treatment [34]. Thus, our results show a higher DXA referral rate than the study by Barton et al., despite a shorter follow-up period.

Workflow of Action

Analysis of the workflow data showed that GPs and medical specialists most frequently acted on the reporting of a VCF. This may be because both GPs and medical specialists have knowledge of the diagnosis and treatment of osteoporosis.

The analysis also showed that 91% of detected VCFs were described in the primary radiology report, with the highest rate at the end of the study. This indicates increased awareness among radiologists during the study, but it also demonstrates that a high rate of VCF detection is only part of the solution to closing the treatment gap, given that only 25 of the 233 patients not known with osteoporosis received an osteoporosis diagnosis within 6 months. Higher efficiency could be achieved by integrating the HealthVCF with a fracture prevention program such as a Fracture Liaison Service (FLS), an internationally validated method of systematically identifying, evaluating and initiating anti-osteoporosis treatment for patients with recent fractures [40]. Such programs have been shown to increase the likelihood of subsequent BMD testing [41, 42].

The rate of VCF reporting in this study is higher than in the literature, where the reporting of incident VCFs on CT scans has been found to be 9% [18], 13% [19], 14.6% [20], 32.56% [43] and 24.7% [13]. The high reporting rate in our study is likely due to the radiologists’ knowledge of the ongoing study, whereas the underreporting noted in other studies might be due to radiologists’ lack of focus on bone status, as VCFs are non-acute findings on exams often performed in an acute setting [44]. The problem of underreporting might be solved by using a dedicated team to evaluate the HealthVCF findings outside the clinical context, as done in this study. With such a workflow, radiologists’ scarce time and resources are preserved for more acute findings.

In this study, 46.7% of the deaths that occurred during the 6-month follow-up period were in patients whose CTAB was performed as an acute exam or an oncological follow-up. This suggests that opportunistic screening might be inefficient in this subset of the population. Additionally, we observed that the radiographers chose to exclude certain patients from the cohort based on their assessment of a short life expectancy. However, mortality in this group did not differ from that of the rest of the cohort, illustrating that mortality is difficult to predict. The high mortality rate of 18% in the studied population underscores the need for a way to avoid ineffective use of resources and to spare very ill patients the time and energy of attending consultations, undergoing scans and initiating unnecessary treatment. This could be done by developing a clinical decision tool to help the FLS team identify which patients to refer for further assessment and treatment, based on their risk of dying and the predicted benefit of attending an FLS. Such a tool has previously been developed by Ong et al. [45].

Strengths and Limitations

One limitation of this study is the use of only one radiographer instead of two to determine the presence of fractures in each of the 1000 scans used to evaluate performance. Ideally, two radiographers would analyze each scan independently to ensure validity, although the use of only one radiographer does not seem to have affected the results, as our sensitivity and specificity are similar to those found in a recent study of the same algorithm [23]. Another limitation is the short follow-up period, which may be too short for every physician to act on the radiologist’s report, especially in the presence of other severe diseases. Finally, the number of patients diagnosed with osteoporosis during the follow-up period might be underestimated because osteoporosis diagnoses given by a GP are not registered in the EPJ.
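If a second, independent reader were added, as suggested in the limitation above, agreement between the two readers on fracture presence is commonly summarized with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below is a minimal, dependency-free illustration; `cohens_kappa` is a hypothetical helper, and the ratings in the usage example are invented labels (1 = moderate/severe VCF, 0 = none), not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(reader_a: Sequence[Hashable], reader_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two readers rating the same items (any categorical labels)."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(reader_a)
    # Observed agreement: share of items given the same label by both readers.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of each reader's marginal label frequencies.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return float("nan")  # both readers used one identical label throughout: kappa undefined
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical ratings for 10 scans (illustration only).
    a = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    b = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    print(f"kappa = {cohens_kappa(a, b):.2f}")
```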

A strength of this study is the use of a retrospective cohort, which provided data on indication for scan, diagnosis, medications and referral to DXA. These data make it possible to determine the real-life impact of an algorithm such as the HealthVCF, as the end goal is not only to detect but to prevent fractures.
