Assessing the Image Quality of Digitally Reconstructed Radiographs from Chest CT

In this pilot study, we compared four DRR techniques to explore which best approaches the diagnostic image quality of CXRs. No significant differences were observed in CheXNet performance between DRRs and CXRs (0.75–0.82 vs 0.80 AUC, p > 0.25). This aligns with previous findings by Mortani Barbosa et al. [14], who demonstrated DRRs to be non-inferior to CXRs for COVID-19 disease classification in 86 patients, suggesting that this non-inferiority may extend to other disease classes.

While the quantitative analysis using CheXNet is not novel in isolation, it serves as an important benchmark for situating our results within the broader body of DRR research. Unlike prior studies, we extend this analysis by focusing on DRRs generated from ULDCT scans and complementing it with qualitative evaluations by radiologists. Radiologists in the qualitative evaluation expressed uncertainty about the diagnostic image quality of the DRRs, rating them as neutral (3.0 to 3.5) compared to CXRs. Key differences in resolution, noise, and overall look-and-feel were noted across DRRs and between DRRs and CXRs, which may impact both AI model performance and human interpretation. In the following discussion, we explore the specific issues raised during the qualitative evaluation, highlighting areas for improvement to better align DRRs with clinical expectations.

Image Quality Factors: CXR Versus DRR

First, the different DRR techniques use varied attenuation calculations, affecting pixel values and contributing to the look-and-feel of the DRR. Techniques optimised for specific applications, such as emphysema quantification or pelvis image registration, may not generalise well to general chest imaging. For example, methods focused on lung visualisation might underrepresent other anatomical regions. Conversely, the SoftMip technique, despite not being originally developed for chest imaging, showed the highest performance in our evaluations, potentially because it was designed for ULDCT data, which aligns well with our study conditions.
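The effect of the projection operator on a DRR's look-and-feel can be illustrated with a minimal sketch. The `softmip` branch below is an illustrative blend of mean and maximum, not the published SoftMip kernel, and the projection axis is chosen arbitrarily:

```python
import numpy as np

def project(ct_volume, mode="mean"):
    """Collapse a CT volume of shape (z, y, x) along one axis into a
    2D DRR. The choice of projection operator is what gives each DRR
    technique its characteristic appearance."""
    if mode == "max":       # maximum intensity projection (MIP)
        return ct_volume.max(axis=2)
    if mode == "mean":      # simple average attenuation
        return ct_volume.mean(axis=2)
    if mode == "softmip":   # illustrative mean/max blend; NOT the
                            # published SoftMip weighting function
        return 0.5 * ct_volume.mean(axis=2) + 0.5 * ct_volume.max(axis=2)
    raise ValueError(f"unknown mode: {mode}")
```

Because the blended projection always lies between the mean and maximum projections, its pixel values sit between the flatter average image and the high-contrast MIP.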

Second, DRR resolution is inherently limited by that of the source CT scan. DRRs constructed from a standard 512 × 512 grid have a resolution of 512 × Z, where Z, typically between 300 and 400, is the number of reconstructed slices. In contrast, CXRs are acquired at resolutions exceeding 3000 × 3000 pixels. All radiologists noted the higher resolution of CXRs favourably. However, this disparity did not affect CheXNet performance, as the model down-samples all input images to 224 × 224 pixels, effectively neutralising the difference in original resolution between DRRs and CXRs. Nonetheless, applying established super-resolution algorithms [15] to either the DRRs or the source CT images could improve radiologists' perception of DRR image quality.
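The neutralising effect of the fixed input size can be shown in a short sketch. The `resize_nn` helper is hypothetical; CheXNet's preprocessing uses a standard bilinear resize, but the consequence for resolution is the same:

```python
import numpy as np

def resize_nn(img, out_h=224, out_w=224):
    """Nearest-neighbour resize to a fixed network input size
    (illustrative stand-in for the model's bilinear transform)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

cxr = np.zeros((3000, 3000))  # typical CXR acquisition resolution
drr = np.zeros((512, 350))    # 512 x Z DRR with Z = 350 slices
# Both images enter the network at the same 224 x 224 resolution,
# so the original resolution gap is invisible to the model.
assert resize_nn(cxr).shape == resize_nn(drr).shape == (224, 224)
```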

Third, noise levels in DRRs, particularly in soft tissue regions, were noted by most radiologists. In CXRs, noise in these regions is inherently lower than in ULDCT-derived DRRs. Given that no pathology was present in these regions within the dataset, we could not assess the impact on CheXNet performance. Although the SoftMip technique was designed to address the high noise levels in ULDCT datasets, this study did not quantify noise levels. To mitigate noise in DRRs, de-noising networks [16] can be applied either to the ULDCT data or to the DRRs themselves. Alternatively, constructing DRRs from regular-dose CT data may reduce noise levels.
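As a conceptual stand-in for the learned de-noisers cited above, even a simple box filter reduces pixel-wise noise variance. Real CT de-noising networks are CNNs trained on paired low- and regular-dose data; the sketch below only illustrates the principle:

```python
import numpy as np

def box_denoise(img, k=3):
    """Smooth an image with a k x k box filter (edge padding).
    Illustrative only; not a learned de-noising network."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

rng = np.random.default_rng(0)
noisy = rng.normal(size=(64, 64))          # synthetic zero-mean noise
assert box_denoise(noisy).var() < noisy.var()  # variance is reduced
```

Averaging independent noise over a 3 × 3 neighbourhood reduces its variance roughly ninefold, at the cost of spatial resolution, which is exactly the trade-off learned de-noisers aim to avoid.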

Fourth, the geometry of the DRR technique affects DRR perception. Radiologists viewed point-source projections less favourably due to issues with over-projection and distortion, particularly at the caudal and cranial edges of the DRR. These edges tend to project over the central DRR, as shown in Fig. 4, obscuring the view of anatomical structures at the extremes of the image. This issue is not encountered in conventional CXRs.
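The over-projection follows directly from ray divergence in a pinhole (point-source) model; a one-line calculation shows how off-axis structures are magnified toward the image periphery. The distances below are illustrative, not taken from any specific DRR implementation:

```python
def projected_offset(y_off_mm, depth_mm, source_dist_mm=1000.0):
    """Vertical position at which a point at offset y_off_mm from the
    central axis, lying depth_mm closer to the source, lands in a
    point-source projection. A parallel projection would return
    y_off_mm unchanged."""
    return y_off_mm * source_dist_mm / (source_dist_mm - depth_mm)

# A structure 150 mm above the central axis and 200 mm closer to the
# source projects to 187.5 mm: it over-projects onto more central
# anatomy, as observed at the caudal and cranial edges of the DRR.
assert projected_offset(150.0, 200.0) == 187.5
assert projected_offset(0.0, 200.0) == 0.0  # the central axis is unaffected
```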

Finally, variations in window-level settings and contrast enhancements were investigated to optimise DRR appearance, but none produced a meaningful change in CheXNet performance. Consequently, these variations were not included in our final analysis. Further studies could explore the effects of such adjustments on both machine learning performance and radiologists' evaluations.
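For reference, window-level adjustment is a simple intensity remapping. The sketch below uses illustrative lung-window values (level −600 HU, width 1500 HU), not the specific settings evaluated in this study:

```python
import numpy as np

def apply_window(img_hu, level=-600.0, width=1500.0):
    """Map HU values to [0, 1] display intensities for a given
    window level and width (illustrative defaults)."""
    lo, hi = level - width / 2, level + width / 2
    return np.clip((img_hu - lo) / (hi - lo), 0.0, 1.0)

hu = np.array([-2000.0, -600.0, 500.0])
# Values below the window clip to 0, the window level maps to 0.5,
# and values above the window clip to 1.
assert np.allclose(apply_window(hu), [0.0, 0.5, 1.0])
```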

Clinical Relevance

DRRs can be generated within seconds using only standard CPU resources; on a regular clinical workstation, all four DRRs in this study were produced in under a second. Furthermore, DRRs can be derived from CT images regardless of anatomical region, contrast use, or other imaging parameters, and can be produced from any viewing angle, extending beyond the standard CXR views. Additionally, DRRs can be freely added to standard DICOM image series, serving as a versatile tool with potential value in both acquisition and diagnostic workflows. Further research is necessary to identify the specific workflows that could benefit from DRRs, determine optimal presentation methods, and assess their clinical value.

Limitations

Our study has several limitations. The dataset, in which both an ULDCT and a same-day CXR are available for each patient, is small, limiting the sample size for the quantitative evaluation. Consequently, AUC scores were calculated for as few as ten cases for specific disease classes, which may limit the generalisability of these results. In this pilot study, no formal power analysis was performed. Larger studies with diverse pathologies are necessary to confirm these findings. Additionally, the CheXNet model was trained on a dataset known to contain labelling inconsistencies, which could have affected performance [17]. In the qualitative evaluation, only DRRs without pathology were assessed, and the presence of findings could have influenced radiologists' opinions. On average, radiologists expressed neutral opinions regarding the diagnostic image quality of the DRRs, but the motivations underlying a neutral rating were not surveyed. The study would have benefitted from a more structured survey to better understand the radiologists' indecision about the diagnostic image quality of the DRRs.

In this study, DRRs were compared to CXRs rather than the source CT data because DRRs were intended to simulate the diagnostic utility of CXRs for 2D imaging applications. Comparing DRRs to CTs, which offer 3D volumetric data, would not align with this clinical use case. This does, however, represent a limitation of our study, as DRR utility relative to CT data has not been evaluated.
