Does deep learning software improve the consistency and performance of radiologists with various levels of experience in assessing bi-parametric prostate MRI?

Acibadem University Review board approved this retrospective study (ID: 2022-05/08) and waived the need for informed consent for the retrospective analysis of anonymized medical data. We reviewed consecutive patients who underwent a prostate MRI scan due to suspicion of PCa (i.e., increased prostate-specific antigen or suspicious digital rectal examination) or active surveillance between January 2019 and December 2020.

The inclusion criteria were as follows: (1) having whole-mount pathology or biopsy for patients with a PI-RADS ≥ 3 score assigned during routine clinical reading; (2) having a prostate MRI scan obtained at 3 T without an endorectal coil following PI-RADS version 2; and (3) ≥ 18 months of follow-up without any clinical, laboratory, or imaging evidence of PCa for patients with a PI-RADS score ≤ 2 [13].

The following patients were excluded from the study: (1) patients who underwent prostate MRI at 1.5 T; (2) patients who underwent prostate MRI with an endorectal coil; (3) patients with PI-RADS ≥ 3 examinations without any histopathological confirmation; and (4) patients with a history of any treatment for PCa.

MRI acquisitions

All patients underwent prostate MRI on one of our 3 Tesla MRI units (Vida or Skyra, Siemens Healthcare) using an 18-channel phased-array surface coil. The MRI protocol was consistent with PI-RADS version 2, as version 2.1 was unavailable during the study period [4]. To minimize bowel movements, butylscopolamine bromide (Buscopan, Boehringer Ingelheim) was given to the patients.

The bi-parametric prostate MRI protocol encompassed tri-planar T2-weighted imaging and diffusion-weighted imaging. The diffusion-weighted imaging was performed with echo-planar imaging in axial planes at b-values of 0, 50, 500, and 1000 s/mm². We excluded dynamic contrast-enhanced images since the DL software could not process them. The detailed parameters of the MRI protocol are given in Table 1.

Table 1 The detailed prostate multiparametric magnetic resonance imaging parameters

DL software

The DL software (Prostate AI, Version Syngo.Via VB60, Siemens Healthcare) used in this study has three modules: (i) preprocessing module, (ii) DL-based lesion detection module, and (iii) DL-based lesion classification module. In this study, we did not perform any model training or fine-tuning and only used the model for performance testing.

Preprocessing module

The preprocessing module parses the DICOM files to select the axial T2-weighted images and the DWI series with various b-values (e.g., 0 s/mm² and 800 s/mm²). Then, the preprocessing module computes the ADC maps and a synthetic DWI with a b-value of 2000 s/mm² using linear least-squares fitting over all acquired b-values (i.e., b-values of 0, 50, 500, and 1000 s/mm² for this study). Afterward, it performs prostate segmentation on T2-weighted images using a DL method proposed by Yang et al. [14] and rigid registration of the T2-weighted images and DWI.
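
The fitting step can be illustrated with a minimal per-voxel sketch. It assumes the common mono-exponential model S(b) = S0 · exp(−b · ADC), which becomes a straight line after taking the logarithm of the signal; the source states only that a linear least-squares fit is used, so the model, the function names `fit_adc` and `synthetic_dwi`, and the example signal values are illustrative assumptions rather than the vendor's implementation.

```python
import math

def fit_adc(b_values, signals):
    """Log-linear least-squares fit for one voxel (assumed mono-exponential model).

    Fits ln S(b) = ln S0 - b * ADC and returns (adc, s0).
    Assumes every signal value is strictly positive.
    """
    xs = list(b_values)
    ys = [math.log(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the fitted line equals -ADC
    intercept = mean_y - slope * mean_x    # intercept equals ln S0
    return -slope, math.exp(intercept)

def synthetic_dwi(s0, adc, b):
    """Extrapolate the fitted model to a b-value that was not acquired."""
    return s0 * math.exp(-b * adc)

# Hypothetical noise-free voxel with ADC = 1.0e-3 mm^2/s and S0 = 1000.
b_acquired = [0, 50, 500, 1000]
voxel = [1000.0 * math.exp(-b * 1.0e-3) for b in b_acquired]
adc, s0 = fit_adc(b_acquired, voxel)
s_2000 = synthetic_dwi(s0, adc, 2000)
```

In practice the same fit would be applied to every voxel of the registered DWI volumes, with some handling of zero or noisy signals that this sketch omits.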

DL-based lesion detection module

Preprocessed images are propagated into the DL-based lesion detection module. This module has two subcomponents: (1) a DL-based lesion candidate detection model and (2) a multi-scale false-positive reduction network.

The DL-based lesion candidate detection model is a simple 2D U-Net consisting of descending and ascending pathways inter-connected with skip connections at different levels and convolutional blocks at the bottom, resembling a U shape. This model takes 3D volumes of ADC, DWI with a b-value of 2000 s/mm², and T2-weighted images but processes them slice by slice. The model outputs 2D heatmaps, which are fused to create 3D connected components (i.e., lesion candidates). The detected lesion candidates are then propagated into the false-positive reduction model.
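
Fusing per-slice heatmaps into 3D lesion candidates can be sketched as a connected-component search over the stacked, thresholded heatmaps. The threshold of 0.5, the 6-neighbour connectivity, and the function name `label_3d` are assumptions made for illustration; the source does not specify how the software performs this step.

```python
from collections import deque

def label_3d(heatmaps, threshold=0.5):
    """Group supra-threshold voxels of stacked 2D heatmaps into 3D components.

    heatmaps: list of slices, each a 2D list of probabilities.
    Returns a list of components, each a list of (z, y, x) voxel indices.
    Threshold and 6-connectivity are illustrative assumptions.
    """
    nz, ny, nx = len(heatmaps), len(heatmaps[0]), len(heatmaps[0][0])
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    components = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if heatmaps[z][y][x] < threshold or (z, y, x) in seen:
                    continue
                # Breadth-first search from an unvisited supra-threshold voxel.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                voxels = []
                while queue:
                    cz, cy, cx = queue.popleft()
                    voxels.append((cz, cy, cx))
                    for dz, dy, dx in neighbours:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        inside = 0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                        if (inside and (pz, py, px) not in seen
                                and heatmaps[pz][py][px] >= threshold):
                            seen.add((pz, py, px))
                            queue.append((pz, py, px))
                components.append(voxels)
    return components

# Toy 3-slice volume: one candidate spanning slices 0 and 1, one isolated in slice 2.
toy = [
    [[0.9, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.8, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.7]],
]
candidates = label_3d(toy)
```

Each returned component corresponds to one lesion candidate that would be cropped into patches for the false-positive reduction step.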

The false-positive reduction model is a 2.5D multi-scale deep network previously trained and validated on 2170 radiologist-annotated bi-parametric prostate MRI scans from 7 institutions. The model takes the patches of ADC, DWI, and T2-weighted images of the lesion candidates provided by the DL-based lesion candidate detection model.

A 2D DL model can assess the in-plane information within an image (i.e., x and y axes), but it cannot capture the out-of-plane information (i.e., z-axis). Given that prostate images contain relevant information along the x, y, and z axes, it is essential to consider all three when evaluating prostate MRI, particularly for eliminating false-positive lesions. Hence, the false-positive reduction model takes the two slices adjacent to a 2D input slice as additional channels, making it a 2.5D network. For instance, a T2-weighted image harboring a lesion is fed to the model along with the slices above and below it. This design allows the network to capture information along the z-axis and improves consistency and performance. At the same time, it avoids the need for fully 3D DL networks, which are resource intensive. In addition, the model is fed prostate images with varying fields of view (i.e., multi-scale) so that it can capture additional contextual information.
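
The 2.5D input construction described above can be sketched as stacking a slice with its two neighbours as channels. The function name `make_25d_input` and the edge handling (repeating the first or last slice when a neighbour is missing) are assumptions for illustration; the source does not describe how boundary slices are treated.

```python
def make_25d_input(volume, index):
    """Return [slice below, centre slice, slice above] as a 3-channel input.

    volume: list of 2D slices ordered along the z-axis.
    At the first or last slice the missing neighbour is replaced by the
    centre slice itself (an assumed convention, not taken from the source).
    """
    last = len(volume) - 1
    below = volume[max(index - 1, 0)]
    above = volume[min(index + 1, last)]
    return [below, volume[index], above]

# Toy volume with three 1x1 slices so the channel order is easy to see.
toy_volume = [[[0]], [[1]], [[2]]]
middle = make_25d_input(toy_volume, 1)
first = make_25d_input(toy_volume, 0)
```

The network still applies 2D convolutions, but each input now carries limited through-plane context, which is cheaper than processing a full 3D patch.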

DL-based lesion classification module

The final module of the DL software is the lesion classification module. This module takes the lesion candidates offered by the preceding lesion detection module and provides the PI-RADS scores of the lesion, if present, as PI-RADS 3, 4, or 5, and highlights the lesions on the axial T2-weighted images. Supplementary Document S1 illustrates the components of the DL software. A further detailed description of the DL software can be found in Yu et al. [15].

Radiologists reading

Four radiologists with varying experience levels interpreted the scans with and without the DL software on a dedicated workstation (Syngo.Via, Siemens Healthcare) equipped with a 6-megapixel diagnostic color monitor (Radiforce RX 660, EIZO). All reviewed images were in Digital Imaging and Communications in Medicine (DICOM) format. The first reader was a radiologist with > 20 years of experience. The remaining three radiologists had 5, 3, and 2 years of prostate MRI experience and routinely interpreted fewer than 50 prostate MRI scans yearly (hereafter, these radiologists are denoted as less-experienced radiologists 1, 2, and 3, respectively). All radiologists were briefly instructed about the software before the reading.

The radiologists evaluated the scans following PI-RADS version 2, as the DL software used in this study was developed following PI-RADS version 2. With multiparametric prostate MRI, PI-RADS 3 lesions of the peripheral zone showing focal or early contrast-enhancement are upgraded to PI-RADS 4 (i.e., PI-RADS 3 + 1) following PI-RADS version 2 [4]. However, as the contrast-enhanced sequences are not available in bi-parametric MRI, lesions of the peripheral gland are scored using only the diffusion-weighted sequences. Thus, in this study, none of the PI-RADS 3 lesions of the peripheral zone were upgraded to a higher score.

In the initial readings, the radiologists were provided with bi-parametric MRI scans, including high b-value DWI, and asked to identify the index lesion (i.e., the lesion with the highest PI-RADS score or the largest lesion if there were ≥ 2 lesions with the same score). First, the radiologists marked the index lesion with its PI-RADS score using the standard prostate reading template [4]. Then the radiologists were provided with the output of the DL software overlaid on a T2-weighted image and asked to re-evaluate the scans and decide whether to change their initial PI-RADS score. Likewise, the PI-RADS scores assigned by the radiologists with the DL software were recorded in the same template. Supplementary Document S2 shows step by step how the radiologists read the cases with and without the DL software.
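
The index-lesion rule stated above (highest PI-RADS score, with ties broken by lesion size) can be written as a short selection function. The `Lesion` record, its field names, and the use of diameter in millimetres as the size measure are hypothetical choices for this sketch; the source does not say how size was measured.

```python
from dataclasses import dataclass

@dataclass
class Lesion:
    name: str            # hypothetical label for the lesion
    pirads: int          # PI-RADS score, 1 to 5
    diameter_mm: float   # assumed size measure used for tie-breaking

def index_lesion(lesions):
    """Pick the lesion with the highest PI-RADS score; the largest wins ties."""
    if not lesions:
        return None
    return max(lesions, key=lambda lesion: (lesion.pirads, lesion.diameter_mm))

# Two PI-RADS 4 lesions: the larger one becomes the index lesion.
scan = [
    Lesion("left peripheral zone", 4, 9.0),
    Lesion("right transition zone", 3, 15.0),
    Lesion("right peripheral zone", 4, 12.0),
]
chosen = index_lesion(scan)
```

Because scoring was compared at the scan level, each reader contributes one such index-lesion score per scan, once without and once with the DL software.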

Whole-mount histopathology and biopsy

All biopsy procedures involved a combination of transrectal 12-core systematic and 3–4-core MRI/ultrasound fusion-guided biopsies (Artemis, Eigen) following up-to-date evidence [16]. Biopsy and whole-mount specimens were prepared and evaluated by a genitourinary pathologist with over 20 years of experience following international guidelines [16]. The lesion with the highest Gleason score was defined as the index lesion. A lesion with a Gleason score ≥ 3 + 4 was defined as a clinically significant PCa following the 2014 International Society of Urological Pathology consensus [17].

Statistical analysis

The statistical analyses were performed using the SciPy library of the Python programming language. The continuous variables are presented using the mean and standard deviation with the minimum and maximum; the categorical and ordinal variables are presented with frequencies and percentages. The PI-RADS scores of the radiologists were calculated and compared on a scan level. The inter-rater agreement among the radiologists in PI-RADS scoring with and without the DL software was evaluated using Fleiss’ kappa [18]; the pair-wise inter-rater agreements were investigated using linearly weighted Cohen’s kappa [19]. The kappa scores, expressed as percentages, were interpreted as follows: a kappa score of < 20, a poor agreement; 21–40, a fair agreement; 41–60, a moderate agreement; 61–80, a good agreement; and 81–100, an excellent agreement. The kappa scores were compared following prior work [20]. We calculated the area under the receiver operating characteristic curve (AUROC) in assessing csPCa and compared the AUROCs using DeLong’s test. A p value less than 0.05 was accepted as significant.
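
The agreement and discrimination measures named above can be illustrated with textbook formulas in plain Python. This is a sketch under stated assumptions, not the authors' analysis code: the source does not say which routines were called, the function names are hypothetical, the example ratings are made up, and DeLong's test for comparing AUROCs is not implemented here.

```python
def weighted_cohen_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa for two raters on ordinal categories."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)          # linear disagreement weight
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row[i] * col[j]
    return 1.0 - weighted_obs / weighted_exp

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa; ratings holds one list of rater scores per scan."""
    n_scans, n_raters = len(ratings), len(ratings[0])
    totals = {c: 0 for c in categories}
    p_bar = 0.0
    for scores in ratings:
        counts = {c: scores.count(c) for c in categories}
        for c in categories:
            totals[c] += counts[c]
        agree = sum(v * v for v in counts.values()) - n_raters
        p_bar += agree / (n_raters * (n_raters - 1))
    p_bar /= n_scans
    p_exp = sum((t / (n_scans * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_exp) / (1.0 - p_exp)

def auroc(scores, labels):
    """AUROC as the probability that a positive scan outscores a negative one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))

# Made-up example data (not from the study).
pairwise = weighted_cohen_kappa([1, 2, 3, 1], [1, 2, 3, 2], categories=[1, 2, 3])
overall = fleiss_kappa([[1, 1, 5, 5], [1, 1, 5, 5]], categories=[1, 2, 3, 4, 5])
area = auroc(scores=[2, 3, 3, 5], labels=[0, 0, 1, 1])
```

Multiplying a kappa value by 100 gives the percentage scale used in the interpretation bands above.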
