Diagnostic performance and image quality of an image-based denoising algorithm applied to radiation dose-reduced CT in diagnosing acute appendicitis

Study design and participants

The institutional review board approved this retrospective study and waived the requirement for informed consent. We extracted clinical data and CT images of patients from a previous multicenter trial (Low-dOse CT for Appendicitis Trial, LOCAT; ClinicalTrials.gov number, NCT01925014), which demonstrated the non-inferiority of low-dose CT (target effective dose, 2 mSv) to standard-dose CT (target effective dose, less than 8 mSv) in diagnosing appendicitis in adolescents and young adults [11]. The trial enrolled patients aged 15–44 years who were referred from emergency departments for CT examination because of suspected appendicitis. The final diagnosis of appendicitis was based on the trial data, including surgical, pathologic, and follow-up results. In this study, we included 180 patients (15 to 44 years of age; 91 female) who underwent 2-mSv CT examinations between February 2014 and August 2016. We randomly selected 15 patients for each type of CT machine (Fig. 1). The characteristics of the study population are summarized in Table 1. This report follows the Standards for Reporting of Diagnostic Accuracy (STARD) guideline [14].
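Selecting a fixed number of patients per scanner type is a stratified random sample. The sketch below illustrates the idea under stated assumptions: the record format `(patient_id, scanner_type)` and the function name `stratified_sample` are hypothetical and not the authors' code. With the 12 scanner configurations listed under CT image acquisition, 15 patients per configuration gives 12 × 15 = 180 patients.

```python
import random
from collections import defaultdict

def stratified_sample(patients, per_stratum=15, seed=None):
    """Randomly draw `per_stratum` patients from each scanner type.

    `patients` is a hypothetical iterable of (patient_id, scanner_type) pairs.
    """
    rng = random.Random(seed)
    by_scanner = defaultdict(list)
    for patient_id, scanner_type in patients:
        by_scanner[scanner_type].append(patient_id)
    selected = []
    # Iterate scanner types in sorted order so a fixed seed is reproducible
    for scanner_type in sorted(by_scanner):
        ids = by_scanner[scanner_type]
        if len(ids) < per_stratum:
            raise ValueError(f"{scanner_type}: only {len(ids)} eligible patients")
        # Sampling without replacement within each stratum
        selected.extend(rng.sample(ids, per_stratum))
    return selected
```

Passing a fixed `seed` makes the selection reproducible, which is useful when the sampled list has to be audited later.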

Fig. 1 Flowchart of the study. ULDCT, ultralow-dose CT; D-ULDCT, denoised ultralow-dose CT

Table 1 Baseline patient characteristics

CT image acquisition

The patients underwent 2-mSv CT on scanners with 16 to 640 channels from various manufacturers (64 and 128 channels from GE Healthcare; 16, 64, 128, and 256 from Philips; 16, 64, 128, and 128 (64 × 2) from Siemens; 64 and 640 from Toshiba). All scans were performed with intravenous contrast enhancement. The abdominopelvic CT images were obtained during the portal venous phase and reconstructed using filtered back projection with a slice thickness of 4 mm and a slice interval of 3 mm. The mean volumetric CT dose index was 2.6 ± 0.8 mGy, and the mean dose-length product was 139.3 ± 45.7 mGy·cm.
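The reported dose-length product (DLP) can be related to the 2-mSv target with the common approximation E ≈ k · DLP. The sketch below assumes k = 0.015 mSv/(mGy·cm), the adult abdomen-pelvis coefficient from AAPM Report 96; how the trial itself estimated effective dose is not described here, so this is only a plausibility check. The constant and function names are hypothetical.

```python
# ASSUMPTION: adult abdomen-pelvis conversion coefficient (AAPM Report 96),
# in mSv per mGy*cm. The trial may have used a different coefficient.
K_ABDOMEN_PELVIS = 0.015

def effective_dose_msv(dlp_mgy_cm, k=K_ABDOMEN_PELVIS):
    """Estimate effective dose (mSv) from DLP (mGy*cm) as E = k * DLP."""
    if dlp_mgy_cm < 0:
        raise ValueError("DLP must be non-negative")
    return k * dlp_mgy_cm

# Mean DLP reported for the 2-mSv examinations
mean_dlp = 139.3
print(f"Estimated mean effective dose: {effective_dose_msv(mean_dlp):.2f} mSv")
```

Because the relation is linear, applying it to the mean DLP gives the mean estimated dose, about 2.1 mSv, which is in line with the 2-mSv target.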

Simulation of ultralow-dose CT and application of denoising algorithm

From the 2-mSv CT images, we simulated ULDCT images with a dose reduction of at least 50% using an image-based reduced-dose CT simulation technique [15]. This technique synthesizes sinograms from the reconstructed CT images and reconstructs reduced-dose images from them, so raw sinogram data are not required. Previous studies have validated this technique, showing that it produces realistic low-dose images in terms of noise and texture [16,17,18].
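The cited technique [15] operates in the sinogram domain. As a much simpler illustration of why lowering the dose raises image noise, the sketch below uses a different, swapped-in approach: it assumes quantum noise whose variance scales inversely with dose and adds white Gaussian noise in the image domain. Real CT noise is spatially correlated and non-stationary, so this ignores the noise texture that the sinogram-based method reproduces. The function `simulate_reduced_dose` and its parameters are hypothetical.

```python
import numpy as np

def simulate_reduced_dose(image_hu, noise_sd_full, dose_fraction, rng=None):
    """Add white Gaussian noise so total noise SD scales as 1/sqrt(dose).

    Simplified image-domain model, NOT the sinogram-based method cited in the text.
    """
    if not 0 < dose_fraction <= 1:
        raise ValueError("dose_fraction must be in (0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    # Target variance: sigma_full^2 / f; the image already carries sigma_full^2,
    # so the extra variance to inject is sigma_full^2 * (1/f - 1).
    extra_sd = noise_sd_full * np.sqrt(1.0 / dose_fraction - 1.0)
    return image_hu + rng.normal(0.0, extra_sd, size=image_hu.shape)
```

For a 50% dose reduction (`dose_fraction=0.5`), the total noise SD rises by a factor of √2, about 1.41.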

Then, we used an image-based deep-learning denoising algorithm (ClariCT.AI™, ClariPI) to generate denoised ULDCT (D-ULDCT) images [19]. This deep-learning algorithm is a vendor-neutral image reconstruction technique based on a modified U-Net-type convolutional neural network model [20]. The algorithm was trained using a dataset of over 1 million CT images, covering 2,100 combinations of scan and reconstruction conditions, including variations in kVp, mAs, automatic exposure control, slice thickness, contrast enhancement, and convolution kernels. The dataset encompassed 24 scanner models from four CT manufacturers (GE Healthcare, Siemens, Philips, and Canon). The algorithm’s performance has been validated in several studies [17,21,22,23,24].

Qualitative image analysis

Six radiologists with different levels of experience (three board-certified abdominal radiologists with 6 to 7 years of clinical experience, one third-year resident, and two second-year residents) independently reviewed the ULDCT and D-ULDCT images. We randomly assigned the 180 patients to two groups of 90 patients each (groups A and B). In the first session, each radiologist assessed the ULDCT images of Group A and the D-ULDCT images of Group B. In the second session, they evaluated the D-ULDCT images of Group A and the ULDCT images of Group B. To reduce recall bias, the two sessions were separated by a washout period of at least 4 weeks, and the order of the CT images was randomized separately for each session. All readers were informed that the patients had undergone CT for suspected appendicitis but were blinded to other patient information, study date, radiation dose, and reconstruction algorithm.
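The two-session crossover design above can be expressed as a small assignment routine. This is a minimal sketch, assuming patients are identified by hypothetical IDs; the function name `crossover_reading_lists` and the string labels are illustrative. Each patient appears once per session, with the reconstruction swapped between sessions, and each session's reading order is shuffled independently.

```python
import random

def crossover_reading_lists(patient_ids, seed=None):
    """Split patients into groups A and B and build two randomized reading sessions."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    group_a, group_b = ids[:half], ids[half:]
    # Session 1: ULDCT for group A, D-ULDCT for group B
    session1 = [(pid, "ULDCT") for pid in group_a] + [(pid, "D-ULDCT") for pid in group_b]
    # Session 2: the reconstructions are swapped
    session2 = [(pid, "D-ULDCT") for pid in group_a] + [(pid, "ULDCT") for pid in group_b]
    # Independent shuffles give each session its own random reading order
    rng.shuffle(session1)
    rng.shuffle(session2)
    return group_a, group_b, session1, session2
```

Because every patient is read with both reconstructions by every reader, the comparison between ULDCT and D-ULDCT is paired within reader and patient.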

First, the radiologists rated appendiceal visualization and the likelihood of appendicitis using a standardized CT report form (Supplementary Table 1). Appendiceal visualization was rated on a 3-point Likert scale (grade 0, not identified; grade 1, unclearly or partially visualized; grade 2, clearly and entirely visualized). If the CT images showed a phlegmon or abscess in clear continuity with the remaining appendiceal base, grade 2 was assigned. The likelihood of appendicitis was rated on a 5-point Likert scale. The primary diagnostic criterion was appendiceal enlargement (diameter greater than 6 mm) with mural thickening and periappendiceal fat stranding. Secondary diagnostic criteria were abnormal mural enhancement, appendicolith, phlegmon, and abscess. For calculating sensitivity and specificity, a likelihood score of 3 or higher was considered positive for appendicitis [25].
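Dichotomizing the 5-point likelihood score at 3 turns each reading into a positive or negative call that can be compared with the reference standard. A minimal sketch follows; the function name `sensitivity_specificity` and the input format (parallel lists of scores and reference-standard booleans) are hypothetical.

```python
def sensitivity_specificity(likelihood_scores, has_appendicitis, threshold=3):
    """Score >= threshold counts as positive; compare with the reference standard."""
    if len(likelihood_scores) != len(has_appendicitis):
        raise ValueError("inputs must have the same length")
    tp = fn = tn = fp = 0
    for score, truth in zip(likelihood_scores, has_appendicitis):
        positive = score >= threshold
        if truth and positive:
            tp += 1
        elif truth:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    return tp / (tp + fn), tn / (tn + fp)
```

For example, scores `[5, 4, 2, 1, 3, 1]` against a reference standard of `[True, True, True, False, False, False]` give two true positives, one false negative, one false positive, and two true negatives, so both sensitivity and specificity are 2/3.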

Secondly, the radiologists independently rated the image quality of ULDCT and D-ULDCT by using a 5-point Likert scale (Supplementary Table 2). The following attributes were evaluated: subjective image noise (defined as the degree of mottling or graininess in the images), diagnostic acceptability (defined as the reader’s confidence in making a reasonable diagnosis from the image), and artificial sensation (defined as the degree of plastic-looking, smooth, paint-brushed, or unnatural texture).

Quantitative image analysis

Image quality was assessed quantitatively using image noise, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR) [26]. A board-certified radiologist with 7 years of abdominal imaging experience, who was not one of the six readers, measured the mean attenuation in Hounsfield units (HU) and its standard deviation (SD) in the hepatic parenchyma, paraspinal muscle, abdominal aorta, and subcutaneous fat on ULDCT and D-ULDCT images, using oval regions of interest (ROIs) of 50 to 200 mm² in size. The attenuation of the hepatic parenchyma was obtained by averaging the values from four liver sections (left lateral, left medial, right anterior, and right posterior). The ROIs were placed in homogeneous areas at the level of the umbilical portion of the left portal vein, avoiding structures such as large vessels, intramuscular fat, and calcified vessel walls. Each ROI was placed at the same location and with the same size on the ULDCT and D-ULDCT images of the same patient.

We calculated the image noise as the mean SD of the ROIs. SNR and CNR for each target region were determined using the following equations [26]:

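$$\mathrm{SNR}_i = \frac{\mathrm{ROI}_i}{\mathrm{SD}_i}$$

$$\mathrm{CNR}_i = \frac{\mathrm{ROI}_i - \mathrm{ROI}_{\mathrm{fat}}}{\mathrm{SD}_{\mathrm{fat}}}$$

where $\mathrm{ROI}_i$ is the mean attenuation of region $i$, $\mathrm{ROI}_{\mathrm{fat}}$ is the mean attenuation of the subcutaneous fat, $\mathrm{SD}_i$ is the image noise of region $i$, and $\mathrm{SD}_{\mathrm{fat}}$ is the image noise of the subcutaneous fat.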
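The SNR and CNR definitions translate directly into code once each ROI is summarized as a (mean HU, SD) pair. This is a minimal sketch with hypothetical function names; averaging the SD of the four hepatic ROIs alongside their mean HU is an assumption, since the text only states that the hepatic HU values were averaged.

```python
from statistics import mean

def average_rois(rois):
    """Average (mean_hu, sd_hu) pairs, e.g. the four hepatic ROIs.

    ASSUMPTION: SDs are averaged the same way as the mean HU values.
    """
    return mean(m for m, _ in rois), mean(sd for _, sd in rois)

def snr_cnr(region, fat):
    """Return (SNR_i, CNR_i) for a region, with subcutaneous fat as the reference."""
    roi_i, sd_i = region
    roi_fat, sd_fat = fat
    snr = roi_i / sd_i                 # SNR_i = ROI_i / SD_i
    cnr = (roi_i - roi_fat) / sd_fat   # CNR_i = (ROI_i - ROI_fat) / SD_fat
    return snr, cnr
```

With hypothetical hepatic ROIs averaging 100 HU (SD 20) and fat at −100 HU (SD 25), the liver SNR is 100/20 = 5 and the CNR is 200/25 = 8.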

Reference standard

In the previous trial, independent outcome assessors (two emergency department physicians and five radiologists) adjudicated the final diagnosis of appendicitis based on the trial data, including surgical findings, pathologic findings, and follow-up results [27]. All final diagnoses of appendicitis included in this study were established on the basis of surgical and pathologic findings [11]. Histopathologic appendicitis was defined as neutrophil infiltration of the appendiceal wall [28].

Statistical analysis

We used the Wilcoxon signed-rank test to compare appendiceal visualization between ULDCT and D-ULDCT for each reader. We used receiver operating characteristic (ROC) analysis to assess the readers' diagnostic accuracy for appendicitis and compared each reader's area under the ROC curve (AUC) between ULDCT and D-ULDCT using DeLong's test for two correlated ROC curves. Additionally, we calculated diagnostic sensitivity and specificity for ULDCT and D-ULDCT. To assess interobserver agreement, we calculated quadratic weighted kappa coefficients for the board-certified radiologists and for the residents on ULDCT and D-ULDCT, and we used z-tests to compare interobserver agreement between ULDCT and D-ULDCT.
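The analyses were run in R; the NumPy sketch below is not the authors' code but an illustrative re-implementation of two of the named methods: DeLong's test for two correlated (paired) AUCs and the quadratic weighted kappa. Function names are hypothetical, and for a real analysis an established implementation (for example, `roc.test` in the pROC R package) is preferable.

```python
import numpy as np
from math import erfc, sqrt

def _auc_components(pos, neg):
    """Empirical AUC and DeLong placement values for one set of scores."""
    # psi = 1 if positive-case score > negative-case score, 0.5 if tied, else 0
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)

def delong_paired_test(scores_a, scores_b, labels):
    """Two-sided DeLong test comparing AUCs of two readings of the same cases."""
    y = np.asarray(labels, dtype=bool)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    auc_a, v10_a, v01_a = _auc_components(a[y], a[~y])
    auc_b, v10_b, v01_b = _auc_components(b[y], b[~y])
    m, n = int(y.sum()), int((~y).sum())
    # Covariance of placement values over positives (S10) and negatives (S01)
    s = np.cov(np.vstack([v10_a, v10_b])) / m + np.cov(np.vstack([v01_a, v01_b])) / n
    var_diff = s[0, 0] + s[1, 1] - 2 * s[0, 1]
    if var_diff <= 0:
        raise ValueError("zero variance; DeLong z statistic is undefined")
    z = (auc_a - auc_b) / sqrt(var_diff)
    return auc_a, auc_b, z, erfc(abs(z) / sqrt(2))  # two-sided normal p-value

def quadratic_weighted_kappa(r1, r2, categories=(1, 2, 3, 4, 5)):
    """Cohen's kappa with quadratic disagreement weights for two raters."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    observed = np.zeros((k, k))
    for x, w in zip(r1, r2):
        observed[index[x], index[w]] += 1
    observed /= observed.sum()
    # Expected agreement under independence: outer product of the marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((k, k))
    weights = (i - j) ** 2 / (k - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

The DeLong test accounts for the pairing because the same patients are scored under both reconstructions, so the covariance term `s[0, 1]` reduces the variance of the AUC difference when the two readings are correlated.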

We used the Wilcoxon signed-rank test to compare subjective image quality scores between ULDCT and D-ULDCT for each reader, and the paired t-test to compare quantitative parameters (image noise, SNR, and CNR) between ULDCT and D-ULDCT.
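Both comparisons are paired tests on the same patients. As an illustration only (the authors used R), the SciPy sketch below wraps `scipy.stats.wilcoxon` for ordinal Likert scores and `scipy.stats.ttest_rel` for continuous measurements; the wrapper names are hypothetical.

```python
from scipy.stats import ttest_rel, wilcoxon

def compare_likert(scores_uldct, scores_duldct):
    """Wilcoxon signed-rank test for paired ordinal ratings (one reader, same cases)."""
    res = wilcoxon(scores_uldct, scores_duldct)
    return res.statistic, res.pvalue

def compare_quantitative(values_uldct, values_duldct):
    """Paired t-test for continuous measurements such as image noise, SNR, or CNR."""
    res = ttest_rel(values_uldct, values_duldct)
    return res.statistic, res.pvalue
```

By default SciPy's `wilcoxon` discards zero differences (`zero_method="wilcox"`), which matters for Likert data where many paired ratings are identical.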

Statistical analyses were performed using R software version 3.6.3 (www.R-project.org, R Foundation for Statistical Computing). A two-sided p-value < 0.05 was considered statistically significant.
