Estimated diagnostic performance of prostate MRI performed with clinical suspicion of prostate cancer

This HIPAA-compliant retrospective study was approved by our institutional review board with an informed consent waiver. Thousands of patients overlapped with published works regarding CDR of prostate MRI [7, 8, 18,19,20].

Study population

Patients who underwent prostate MRI at three facilities of a single institution from 2018 to 2022 were included. Patients who had known prostate cancer (Grade group ≥ 1) at the time of MRI or had incomplete examinations were excluded (Fig. 1).

Fig. 1 Patient flowchart. PI-RADS, Prostate Imaging-Reporting and Data System

MRI

All examinations were performed under PI-RADS version 2.0 or 2.1 technical specifications. Most studies were performed on 3-T scanners (GE Medical Systems, Illinois, U.S., or Siemens Healthineers, Erlangen, Germany), but 1.5-T scanners were used when the 3-T magnet was contraindicated. Contrast material was used unless contraindicated. Board-certified, fellowship-trained abdominal radiologists interpreted the MRI based on PI-RADS criteria using the same standardized report template. Prostate segmentation and 3-dimensional lesion markings were made on DynaCAD (Philips Healthcare, Best, the Netherlands) for a targeted biopsy when PI-RADS ≥ 3 lesions existed.

Prostate biopsy

Transrectal or transperineal ultrasound-guided targeted biopsy (3–5 cores per lesion) was performed by urologists using fusion software (UroNav, Philips Healthcare). Systematic biopsies (10–12 cores) were performed at the same time. For PI-RADS 1–2 examinations, systematic biopsies were performed if clinical suspicion of clinically significant prostate cancer (csPCa) was high.

Estimation of csPCa proportion

In brief, the csPCa proportion in patients without pathological confirmation within 1 year after the MRI was estimated, separately for each PI-RADS score, using logistic regression models developed from patients with pathological confirmation.

Data collection

The target variable was the presence of csPCa within 1 year after the MRI, which was extracted from pathology reports dated between January 2018 and October 2023. csPCa was defined as Grade group ≥ 2 or prostate cancer pathologically diagnosed only from metastatic foci without a Gleason score. The following ten predictive variables at the time of MRI were extracted from clinical notes or MRI reports: age; facility; prostate-specific antigen (PSA) value; presence of previous benign prostate biopsy [16]; patient-level highest PI-RADS score; prostate volume measured on MRI; family history of prostate cancer; family history of breast cancer (a surrogate for possible BRCA gene mutation [17, 21]); presence of a prostate nodule on digital rectal exam [22]; and race [13]. Additionally, PSA density (PSAD) was calculated by dividing the PSA value by the prostate volume.

To extract clinical notes, in-house software, MedTagger (https://github.com/OHNLP/MedTagger/) [23], was used. Patient-level categorization was then performed by applying a natural language processing pipeline based on Bidirectional Encoder Representations from Transformers [18].

Data preprocessing and feature selection

First, PSA and prostate volume were log-transformed to make their skewed distributions closer to normal. Second, the continuous variables (age and the two log-transformed variables) were standardized to have a mean of 0 and a standard deviation of 1, and facility, a categorical variable with three classes, was one-hot encoded. Third, missing values of transformed PSA and transformed prostate volume were imputed using multiple imputation by chained equations [24]. Fourth, PSAD was calculated, log-transformed, and standardized. Features to be included in the logistic regression model were then selected on a subset of the data with forward feature selection. The most parsimonious set of features with low error was chosen using the area under the receiver operating characteristic curve and the one standard error rule [25]. The final included features were age, previous history of benign prostate biopsy, facility, PI-RADS score, PSAD, and prostate volume. The details of these steps will be reported separately.
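As an illustration of the one standard error rule, the following minimal sketch picks the smallest feature count whose mean cross-validated AUC lies within one standard error of the best mean AUC. The function name `one_se_rule`, the input layout, and the per-fold AUC values are assumptions made for this example and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def one_se_rule(cv_auc_by_size):
    """Return the smallest feature count within one SE of the best mean AUC.

    cv_auc_by_size maps a feature count to the per-fold AUCs obtained with
    the best subset of that size (hypothetical input layout).
    """
    summary = {}
    for size, aucs in cv_auc_by_size.items():
        standard_error = stdev(aucs) / sqrt(len(aucs))
        summary[size] = (mean(aucs), standard_error)

    # Best-performing size and the tolerance derived from its standard error.
    best_size = max(summary, key=lambda s: summary[s][0])
    best_mean, best_se = summary[best_size]
    threshold = best_mean - best_se

    # Among sizes that are "good enough", prefer the most parsimonious one.
    eligible = [s for s, (m, _) in summary.items() if m >= threshold]
    return min(eligible)


# Made-up per-fold AUCs for models with 1 to 4 selected features.
example = {
    1: [0.70, 0.72, 0.71],
    2: [0.78, 0.80, 0.79],
    3: [0.80, 0.82, 0.78],
    4: [0.81, 0.82, 0.79],
}
chosen = one_se_rule(example)
```

With these illustrative numbers, the four-feature model has the highest mean AUC, but the three-feature model falls within one standard error of it and would therefore be preferred.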

Bootstrap aggregation

A thousand prediction models were created using different subsets of data generated by random sampling with replacement [26] from each PI-RADS population with pathological confirmation. The models were then applied to the data of patients without pathological confirmation to estimate the csPCa proportion, defined as the average of the models' outputs across all patients at each PI-RADS score. Similarly, the degree of estimation bias, the difference between the observed and estimated csPCa proportions, was calculated in patients with pathological confirmation who were not selected in the random sampling (out of bag). Further details of this process are shown in Fig. 2.
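The resampling logic can be sketched as follows. This is a simplified illustration and not the study's implementation: the "model" is replaced by the csPCa proportion in the bag, whereas the study fitted calibrated logistic regression models. The labels, the helper names `bootstrap_split` and `percentile_ci`, and the random seed are assumptions.

```python
import random


def bootstrap_split(n, rng):
    """Return (bag, out_of_bag) index lists for one bootstrap repetition."""
    bag = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
    chosen = set(bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return bag, out_of_bag


def percentile_ci(values, lower=2.5, upper=97.5):
    """Percentile interval of a bootstrap distribution (nearest rank)."""
    ordered = sorted(values)

    def pick(p):
        return ordered[round(p / 100 * (len(ordered) - 1))]

    return pick(lower), pick(upper)


rng = random.Random(0)
# Made-up labels (1 = csPCa) for patients with pathology at one PI-RADS score.
labels = [1] * 30 + [0] * 70

estimates, biases = [], []
for _ in range(1000):
    bag, oob = bootstrap_split(len(labels), rng)
    if not oob:  # practically never happens, but avoids division by zero
        continue
    # Stand-in for a fitted model: the csPCa proportion observed in the bag.
    estimated = sum(labels[i] for i in bag) / len(bag)
    observed_oob = sum(labels[i] for i in oob) / len(oob)
    estimates.append(estimated)
    biases.append(observed_oob - estimated)  # observed minus estimated

mean_estimate = sum(estimates) / len(estimates)
ci_low, ci_high = percentile_ci(estimates)
mean_bias = sum(biases) / len(biases)
```

Each repetition leaves roughly one-third of the patients out of the bag, which is what makes an internal bias check possible without a separate validation set.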

Fig. 2 The schema of estimating the proportion of clinically significant prostate cancer in patients without pathological confirmation. The proportion of clinically significant prostate cancer (csPCa) in patients without pathological confirmation was estimated through a thousand repetitions of a process called bootstrap aggregating, a commonly used machine learning technique for reducing variance. For each repetition, a model was independently created using a subset of the dataset called the “bag”, which was chosen by sampling with replacement. The remaining data not chosen in the sampling process were called “out of bag” and were used to calculate the degree of estimation bias in the population with pathological confirmation. Each model consisted of three calibrated logistic models created through threefold cross-validation and output the average of their outputs. In each fold, a logistic regression model was developed using two-thirds of the “bag”. Its predictions on the remaining third were then used to calibrate the model against the observed csPCa proportion with a sigmoid regressor. The thousand estimated results were aggregated to calculate the mean with the 95% confidence interval using the percentiles of the bootstrap distribution. This process was performed separately for the data of each PI-RADS score.

The preprocessing steps described above were performed independently for each bootstrap repetition to avoid data leakage. Standardization parameters were derived from the training set and then applied to the other data. The bootstrap statistics were averaged to obtain a more stable estimate together with its degree of uncertainty.
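A minimal sketch of leakage-free standardization is given below: the mean and standard deviation are learned from the training data only and then reused, unchanged, for the remaining data. The PSA and volume values are made up, the helper names are hypothetical, and the use of the population standard deviation is an assumption of this sketch.

```python
from math import log
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn the mean and standard deviation from training values only."""
    return mean(values), pstdev(values)


def apply_scaler(values, params):
    """Standardize values with parameters learned elsewhere."""
    mu, sd = params
    return [(v - mu) / sd for v in values]


def log_psad(psa_values, volumes):
    """Log-transformed PSA density (PSA divided by prostate volume)."""
    return [log(p / v) for p, v in zip(psa_values, volumes)]


# Made-up PSA values (ng/mL) and prostate volumes (mL).
train_psa, train_vol = [4.0, 6.5, 10.0, 5.2], [40.0, 55.0, 35.0, 80.0]
other_psa, other_vol = [7.1, 3.3], [60.0, 30.0]

train_feature = log_psad(train_psa, train_vol)
params = fit_scaler(train_feature)  # fitted on the training set only
train_scaled = apply_scaler(train_feature, params)
# The same parameters are applied to data the scaler has never seen.
other_scaled = apply_scaler(log_psad(other_psa, other_vol), params)
```

Refitting the scaler on the held-out data would let information from those patients influence the model inputs, which is the leakage this step is meant to prevent.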

Analyses

The primary analysis estimated the PI-RADS score-level csPCa proportion in patients who did not have pathological confirmation of the prostate within 1 year after the MRI. Then, the AUC and the following statistics were calculated:

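$$\text{Sensitivity}=\frac{\text{Number of patients }(\text{PI-RADS}\ge 3\cap \text{csPCa}+)}{\text{Number of patients }(\text{csPCa}+)}$$

$$\text{Specificity}=\frac{\text{Number of patients }(\text{PI-RADS}<3\cap \text{csPCa}-)}{\text{Number of patients }(\text{csPCa}-)}$$

$$\text{Positive predictive value (PPV)}=\frac{\text{Number of patients }(\text{PI-RADS}\ge i\cap \text{csPCa}+)}{\text{Number of patients }(\text{PI-RADS}\ge i)}$$

$$\text{Negative predictive value (NPV)}=\frac{\text{Number of patients }(\text{PI-RADS 1–2}\cap \text{csPCa}-)}{\text{Number of patients }(\text{PI-RADS 1–2})}$$

$$\text{Cancer detection rate (CDR)}=\frac{\text{Number of patients }(\text{PI-RADS}\ge 3\cap \text{csPCa}+)}{\text{Number of patients }(\text{all})}$$

$$\text{Abnormal interpretation rate (AIR)}=\frac{\text{Number of patients }(\text{PI-RADS}\ge 3)}{\text{Number of patients }(\text{all})}$$

$$\text{csPCa proportion at PI-RADS }i=\frac{\text{Number of patients }(\text{PI-RADS }i\cap \text{csPCa}+)}{\text{Number of patients }(\text{PI-RADS }i)}$$

$$\text{Prevalence}=\frac{\text{Number of patients }(\text{csPCa}+)}{\text{Number of patients }(\text{all})}$$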

The “observed” diagnostic performance metrics were defined for patients with pathological confirmation, whereas the “estimated” metrics were for patients with and without pathological confirmation, if appropriate. The CDR and AIR were calculated at PI-RADS ≥ 3. Note that the “estimated” items in a whole population, regardless of the presence of prostate pathology, were derived from the sum of the observed number of patients with csPCa in the population with pathology and the estimated number of patients with csPCa in the population without pathology.
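The derivation of the whole-population "estimated" metrics can be sketched as follows, assuming per-category counts are available. All counts and proportions below are made up for illustration, and the function name and dictionary layout are hypothetical. The key step is that the csPCa count in each PI-RADS category is the observed count among patients with pathology plus the estimated proportion multiplied by the number of patients without pathology.

```python
def estimated_metrics(strata):
    """Combine observed and estimated csPCa counts per PI-RADS category.

    strata maps a PI-RADS category (2 stands for PI-RADS 1-2) to:
      n_path   - patients with pathological confirmation
      cs_path  - observed csPCa among them
      n_nopath - patients without pathological confirmation
      est_prop - estimated csPCa proportion among those patients
    """
    totals = {}
    for score, s in strata.items():
        n = s["n_path"] + s["n_nopath"]
        cs = s["cs_path"] + s["est_prop"] * s["n_nopath"]
        totals[score] = (n, cs)

    n_all = sum(n for n, _ in totals.values())
    cs_all = sum(cs for _, cs in totals.values())
    # A positive MRI is defined here as PI-RADS >= 3.
    n_pos = sum(n for score, (n, _) in totals.items() if score >= 3)
    cs_pos = sum(cs for score, (_, cs) in totals.items() if score >= 3)
    n_neg, cs_neg = n_all - n_pos, cs_all - cs_pos

    return {
        "sensitivity": cs_pos / cs_all,
        "specificity": (n_neg - cs_neg) / (n_all - cs_all),
        "ppv": cs_pos / n_pos,
        "npv": (n_neg - cs_neg) / n_neg,
        "cdr": cs_pos / n_all,
        "air": n_pos / n_all,
        "prevalence": cs_all / n_all,
    }


# Made-up counts; not taken from the study.
strata = {
    2: {"n_path": 100, "cs_path": 10, "n_nopath": 400, "est_prop": 0.05},
    3: {"n_path": 100, "cs_path": 20, "n_nopath": 100, "est_prop": 0.10},
    4: {"n_path": 180, "cs_path": 90, "n_nopath": 20, "est_prop": 0.50},
    5: {"n_path": 95, "cs_path": 80, "n_nopath": 5, "est_prop": 0.80},
}
metrics = estimated_metrics(strata)
```

Under these definitions the CDR equals the product of the PPV and the AIR, which offers a quick consistency check on any reported set of values.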

For comparison with our estimated statistics, published studies that evaluated the diagnostic performance of prostate MRI were searched. Although most studies involve patients who were already scheduled for prostate biopsy, we considered that two multi-center prospective studies evaluating the diagnostic performance of MRI-guided biopsy may reflect the entire population undergoing prostate MRI for clinical suspicion of csPCa [27, 28]. Van der Leest et al enrolled 626 biopsy-naive patients with clinical suspicion of csPCa and performed biopsies in all cases, including 309 (49.3%) with PI-RADS 1–2 examinations. Rouvière et al enrolled 251 biopsy-naive patients with clinical suspicion of csPCa and performed biopsies in all cases, including 53 (21.1%) with PI-RADS 1–2 examinations. Since those studies focused on biopsy-naive patients, the above statistics in the current study were reported separately for all patients, biopsy-naive patients, and those with previous benign prostate biopsies. We considered the AUC a more suitable metric for comparison than the other metrics because it is theoretically independent of disease prevalence and is invariant to shifts in PI-RADS assignments. The AUCs of the current study cohort and the published studies [27, 28] were compared without a statistical test because of the limited number of studies available for comparison (n = 2).
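For an ordinal score such as PI-RADS, the AUC can be read as the probability that a randomly chosen patient with csPCa has a higher score than a randomly chosen patient without csPCa, with ties counted as one half. The sketch below computes it this way; the scores are made up and the function name is hypothetical. It also illustrates one reading of the invariance property, namely that an order-preserving relabelling of the scores leaves the AUC unchanged.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Pairwise (Mann-Whitney) AUC for ordinal scores; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up PI-RADS scores for patients with and without csPCa.
pos = [5, 4, 4, 3, 2]
neg = [1, 2, 2, 3, 4]

auc = auc_from_scores(pos, neg)
# Relabelling every score by the same order-preserving map keeps all
# pairwise comparisons, and therefore the AUC, unchanged.
relabelled = auc_from_scores([s + 1 for s in pos], [s + 1 for s in neg])
```

Because the calculation depends only on comparisons between positive and negative patients, changing the ratio of the two groups does not by itself alter the result, which is why the AUC is described as independent of prevalence.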

The secondary analysis evaluated the association between the presence of pathological confirmation and age, PSAD, and the presence of previous benign biopsy for each PI-RADS score. The PI-RADS score-level breakdown of csPCa was shown using the Gleason grading system [29]. All analyses were performed at the exam level and summarized at each facility. Python 3.11 was used with an alpha level of 0.05.
