Physics-Informed Discretization for Reproducible and Robust Radiomic Feature Extraction Using Quantitative MRI

Radiomics, the high-throughput mining of medical images for use in statistical and machine learning methods, has led to considerable growth in quantitative image analysis.1–3 By providing objective, numerical metrics of image intensity, morphometry, and texture,1 radiomics offers significant potential value in numerous clinical contexts such as musculoskeletal disease,4 as well as cancer diagnosis and treatment.5,6 However, recent phantom7,8 and in vivo9–11 studies have demonstrated that differences in image acquisition and processing methods may negatively impact the reproducibility of extracted radiomic features12 and downstream models.13 This issue is particularly severe in magnetic resonance imaging (MRI) radiomics, as almost all MRI-based radiomic models use weighted image sequences (eg, T1- or T2-weighted), which provide qualitative, relative intensity values that can vary greatly across scan acquisitions.3,11,14,15

Quantitative MRI can address poor radiomic feature reproducibility stemming from relative MRI intensities by measuring intrinsic tissue properties such as T1 and T2 relaxation times or diffusion. Unlike the relative image intensities in weighted MRI, quantitative MRI values correspond to an absolute physical measurement: for example, T1-weighted MRI voxels are unitless, whereas a T1 relaxation time (or T1) map has units of time.15 As a result, studies14,16 have found quantitative MRI data to be more consistent than weighted MRI across distinct scan protocols, scanners, and time points.14,17 Of these, the quantitative MRI framework known as magnetic resonance fingerprinting (MRF)18 possesses several advantages that make it particularly suitable for reproducible radiomics. The MRF acquisition allows for rapid and robust multiparametric3 tissue property mapping in a clinically acceptable timeframe.5,18 Notably, MRF acquisition includes a B1+ field measurement step19 that corrects intensity variations from field inhomogeneities. Multiple phantom20 and in vivo21–23 studies report MRF18 maps to have excellent (coefficient of variation < 10%) repeatability and reproducibility, with initial radiomic studies24 suggesting MRF radiomic features to have similarly high (intraclass correlation coefficient [ICC] > 0.85) repeatability. Initial test-retest studies25–27 have found comparable radiomic feature reproducibility and stability trends for quantitative apparent diffusion coefficient (ADC) maps derived from diffusion-weighted MRI.

As existing radiomic feature extraction methods were developed for weighted MRI, they cannot fully use the advantages of quantitative MRI. In particular, image discretization is a key step in radiomic feature extraction that transforms variable image intensities to a common set of gray levels.28 In general, this involves fixed bin number (FBN) or fixed bin size discretization, both of which are predicated on the assumption that image intensities are relative8,29–31 because they were developed for weighted MRI data, where there is no reference point or voxel value for intensity standardization. As a result, even minor differences in discretization method,8,29 intensity range,32,33 and gray level count8,28,29 have been shown to negatively impact the reproducibility of the resulting radiomic features. In addition, there are currently no specific guidelines30 regarding choice of discretization method due to disagreement in the literature29,34 and lack of nonempirical rationale.35,36

By using highly reproducible and quantitative MR images as the new foundation for image analysis, we hypothesize that the image feature extraction steps can be further optimized to improve image feature reproducibility. In particular, the physical and physiological characteristics of quantitative MRI data can be exploited to improve the consistency of image discretization and reproducibility of downstream radiomic features. For example, quantitative MRI values inherently fall within a certain range determined by the physiological properties of the tissue property of interest: the T1 and T2 relaxation times of biological tissue are by definition nonnegative and lower than the relaxation times of pure water.37 Furthermore, by possessing a reference measurement method such as NMR relaxometry,17,38,39 quantitative MRI values from distinct regions of interest (ROIs), acquisitions, and scanners automatically share the same standard intensity range and are thus directly comparable without need for further preprocessing: this inherent data calibration is impossible for weighted MRI data that require imperfect intensity standardization or harmonization.35,40

In this study, we introduce a novel, physics-informed discretization (PID) method that leverages physical characteristics of quantitative MRI data to improve discretization consistency and radiomic feature reproducibility and robustness. To evaluate the utility of PID, we prospectively acquired multiscanner, scan-rescan 3D whole-brain quantitative and weighted MRI data from healthy subjects and compared the interscanner radiomic feature reproducibility of the quantitative and weighted sequences. To our knowledge, this study is the first to directly compare the radiomic feature reproducibility of quantitative and weighted MRIs acquired in the same dataset. We first assessed baseline image sequence reproducibility (ie, independent of discretization) as measured by first-order feature reproducibility. We then compared the reproducibility of second-order texture features extracted from quantitative MRI using PID and multiple conventional FBN discretization settings. To measure the potential benefit of using quantitative MRI sequences for radiomics, the reproducibility of texture features extracted from quantitative and weighted MRI sequences was then compared. Lastly, we measured the robustness of PID and FBN discretization to simulated segmentation errors.

MATERIALS AND METHODS

Physics-Informed Discretization

Briefly, image discretization can be formalized using the following discretization function28:

$$I = \varphi(I') \tag{1}$$

where the original image I′ with voxel intensities with range R : [a, b] is transformed using the discretization function φ:

$$\varphi : [a, b] \rightarrow [1, N] \tag{2}$$

to the discretized image I containing N total gray levels ranging from [1, N]. Weighted MRIs are most commonly discretized using FBN discretization,29,30,41 which assigns image intensities to a predetermined number of gray levels or bin number BN that uniformly span the intensity range R of the original image I′ or ROI. To achieve this, a bin size BS is defined as

$$\mathrm{BS} = \frac{R}{\mathrm{BN}} \tag{3}$$

Gray levels in the discretized image I are then generated according to

$$n(k) \in \left[\, 1 + (k - 1) \times \mathrm{BS},\; k \times \mathrm{BS} \,\right] \tag{4}$$

where each gray level n(k) contains voxels with values within the defined range. As R varies across ROIs and images, FBN discretization adjusts the bin size BS on an individual ROI and image basis (Eq. 3), which may cause equivalent voxel intensities from distinct image acquisitions to be assigned to different gray levels. For example, a voxel with value 100 may be discretized to gray level 5 in ROI 1 and gray level 8 in ROI 2 because the ROIs have unequal intensity ranges and therefore unequal bin sizes. The reverse situation, where differently valued voxels (eg, 50 and 100) are discretized to the same gray level, is similarly problematic. This can be seen in Figure 1A, where FBN discretization generates overlapping discretized histograms for gray matter (GM) and white matter (WM) ROIs, despite GM and WM having distinct MRI intensities (WM is hyperintense on T1-weighted MRI; T1 values of WM are lower).
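The ROI dependence of FBN discretization can be illustrated with a minimal Python sketch. This is not the study's MATLAB implementation: the function name `fbn_discretize`, the two toy ROIs, and the half-open binning convention are assumptions made only for illustration.

```python
import numpy as np

def fbn_discretize(values, bin_number):
    """Fixed bin number discretization over the ROI's own intensity range."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Degenerate ROI with a single intensity: everything is gray level 1.
        return np.ones(values.shape, dtype=int)
    bin_size = (hi - lo) / bin_number  # BS = R / BN, recomputed for every ROI
    levels = np.floor((values - lo) / bin_size).astype(int) + 1
    return np.clip(levels, 1, bin_number)  # the ROI maximum falls into bin BN

# Two hypothetical ROIs that both contain a voxel with value 100.
roi_1 = np.array([0.0, 40.0, 100.0, 160.0])   # intensity range 160
roi_2 = np.array([60.0, 80.0, 100.0, 140.0])  # intensity range 80

levels_1 = fbn_discretize(roi_1, bin_number=8)
levels_2 = fbn_discretize(roi_2, bin_number=8)

# Gray level assigned to the shared value 100 in each ROI.
level_of_100_in_roi_1 = int(levels_1[2])
level_of_100_in_roi_2 = int(levels_2[2])
```

Because the bin size is derived from each ROI's own range, the same voxel value lands in a different gray level in each ROI, which is the inconsistency described above.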

FIGURE 1:

Physics-informed discretization (PID) rationale and advantages. A, Conventional fixed bin number (FBN) discretization of magnetic resonance fingerprinting (MRF) T1 gray matter (GM) and white matter (WM) tissues results in distinct T1 values from GM and WM being assigned to equivalently arbitrary gray levels. By assigning specific T1 ranges to discretized bins, PID accurately reflects tissue-specific T1 distribution characteristics, such as the T1 of WM being lower and more uniform compared with that of GM. B, FBN discretization is sensitive to outliers arising from measurement or segmentation error. In this scan-rescan comparison of 2 pons regions of interest (ROIs), the presence of a single outlier (denoted by the yellow arrow) drastically distorts the FBN discretized T1 histogram. In comparison, PID appropriately bins the outlier voxel and retains the shared T1 distribution between ROIs.

The PID leverages characteristics of quantitative MRI in order to determine standardizable image discretization settings. For the sake of clarity, consider the application of PID for MRF T1 map discretization. The quantitative nature of MRF provides information regarding 2 key discretization parameters: intensity range R and bin size BS. First, because MRF measures underlying tissue properties such as T1 or T2 relaxation time, the MRF map intensity range is determined by the physical reference range of the tissue property of interest. For example, T1 relaxation time is always nonnegative and has an absolute upper limit equal to the T1 of pure water (3000 milliseconds at 3 T)37 with standard reference to NMR relaxometry.17,38,39 As a result, MRF map values will always fall within a standardized intensity range spanning a known minimum value (zero for nonnegative properties such as T1 or T2) to the absolute upper measurement value (eg, 3000 milliseconds for T1 acquired at 3 T37). Second, based on reported21,22 relative deviations of MRF relaxometry, a standardized bin size can be determined. For effective discretization, the bin size should be determined by the scan sequence's measurement sensitivity. For example, Körzdörfer et al21 reported a relative deviation half-width of 3.4% for interscanner MRF T1 values, corresponding to a maximal scan-rescan T1 variation and sensitivity of approximately 20 milliseconds. To avoid discretizing equivalent voxel values to different gray levels based on scan variation or noise, the bin size should be no smaller than the original measurement sensitivity (20 milliseconds for MRF T1). Once a standardized intensity range and bin size are determined, the proper bin number can be calculated. Applying a bin size of 20 milliseconds for MRF T1 discretization in solid tissue results in the following bin number:

$$\mathrm{BN} = \frac{R_{\mathrm{T1}}}{\mathrm{BS}} = \frac{3000}{20} = 150 \tag{5}$$

Importantly, the determined bin number and bin size (BN = 150, BS = 20 milliseconds) are applied for discretization of all MRF T1 maps, independent of individual ROI intensity information and without need for intensity standardization or image harmonization. As a result, every discretized MRF T1 map shares an explicitly defined and identical mapping of original intensity values to discretized gray levels. As shown in this study, the PID method is generalizable for consistent discretization (standardized bin number and bin size) of other quantitative MRI sequences such as MRF T2 and ADC.
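A minimal Python sketch of PID for MRF T1 follows, using the range (0 to 3000 milliseconds) and bin size (20 milliseconds) given in the text. The function name `pid_discretize`, the half-open bins [(k − 1) × BS, k × BS), and the clipping of out-of-range values to the reference range are assumptions of this sketch, not details confirmed by the study.

```python
import numpy as np

T1_RANGE_MS = (0.0, 3000.0)  # physical reference range for T1 at 3 T (from the text)
BIN_SIZE_MS = 20.0           # MRF T1 measurement sensitivity (from the text)

def pid_discretize(values, value_range=T1_RANGE_MS, bin_size=BIN_SIZE_MS):
    """Map quantitative values to gray levels with a fixed range and bin size."""
    lo, hi = value_range
    bin_number = int(round((hi - lo) / bin_size))  # 3000 / 20 = 150
    # Assumption: values outside the reference range are clipped to it.
    values = np.clip(np.asarray(values, dtype=float), lo, hi)
    levels = np.floor((values - lo) / bin_size).astype(int) + 1
    return np.clip(levels, 1, bin_number), bin_number

# The mapping does not depend on which ROI or scan the voxels come from.
levels, bin_number = pid_discretize([0.0, 850.0, 1300.0, 3000.0])
```

Unlike FBN, no quantity in this mapping is computed from the ROI itself, so a T1 of 850 milliseconds receives the same gray level in every ROI, scan, and scanner.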

The PID offers 2 specific advantages over conventional image discretization, which contribute to more reproducible feature extraction. First, because the mapping of intensity values to gray levels is standardized and independent of individual ROI data, key intensity distribution characteristics, including shape, range, and measures of central tendency, are preserved after discretization. This is clearly seen when comparing PID and FBN discretization of GM and WM ROIs (Fig. 1A) where PID gray level histograms retain distinguishing features between GM and WM: T1 values of WM are lower and more uniform.22 Second, radiomic features extracted using PID are robust to intensity distribution changes. As shown in Figure 1B, an outlier in the pons ROI from a repeated scan completely distorts the FBN gray level histogram. This extreme change results from the increased intensity range due to the outlier, which greatly increased the bin size and led to the majority of voxels being assigned to the first few gray levels. In comparison, the PID gray level histogram is unaffected by the outlier. Because PID discretization settings are determined independently of individual ROI data, the outlier is correctly binned and the overall gray level histogram shape is well preserved.
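The outlier effect described for Figure 1B can be reproduced with synthetic data. The sketch below is illustrative only: the T1 values, the single 2900-millisecond outlier, the choice of 16 FBN bins, and the helper names `fbn_levels` and `pid_levels` are all assumptions.

```python
import numpy as np

def fbn_levels(values, bin_number):
    """FBN: bin size is derived from the ROI's own minimum and maximum."""
    lo, hi = values.min(), values.max()
    levels = np.floor((values - lo) / ((hi - lo) / bin_number)).astype(int) + 1
    return np.clip(levels, 1, bin_number)

def pid_levels(values, lo=0.0, hi=3000.0, bin_size=20.0):
    """PID: range and bin size are fixed in advance (T1 settings from the text)."""
    bin_number = int(round((hi - lo) / bin_size))
    levels = np.floor((np.clip(values, lo, hi) - lo) / bin_size).astype(int) + 1
    return np.clip(levels, 1, bin_number)

scan = np.linspace(900.0, 1100.0, 201)  # hypothetical tissue T1 values (ms)
rescan = np.append(scan, 2900.0)        # same voxels plus one outlier voxel

# Compare gray levels of the shared tissue voxels only (outlier dropped).
fbn_scan = fbn_levels(scan, 16)
fbn_rescan = fbn_levels(rescan, 16)[:-1]
pid_scan = pid_levels(scan)
pid_rescan = pid_levels(rescan)[:-1]

fbn_used_scan = len(np.unique(fbn_scan))      # gray levels occupied by tissue
fbn_used_rescan = len(np.unique(fbn_rescan))  # collapses once the range widens
pid_unchanged = bool(np.array_equal(pid_scan, pid_rescan))
```

In this toy case the outlier stretches the FBN range tenfold, so the tissue voxels collapse into the first few gray levels, whereas their PID gray levels are identical between scan and rescan.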

Study Design

This study aimed to evaluate the reproducibility of quantitative and weighted MRI features obtained from distinct scan acquisitions, scanners, and discretization methods. To accomplish this, a multiscanner, scan-rescan dataset was prospectively acquired comprising both quantitative (MRF T1, MRF T2, and ADC) and weighted MRI sequences (T1w MPRAGE, T2w SPACE, and T2w FLAIR).

The complete study design and workflow is detailed in Figure 2. Scan-rescan quantitative and weighted MRI was acquired from 5 healthy subjects scanned over 3 scanners. Each subject was scanned twice per scanner, for a total of 30 scans (5 subjects × 3 scanners × 2 scans). After a standard image processing workflow consisting of interscanner registration, brain extraction, bias field correction (weighted MRI only), and intensity standardization (weighted MRI only), 56 brain tissue ROIs from the LONI Probabilistic Brain Atlas42 were identified for radiomic analysis. First-order statistical (n = 23) and second-order texture (n = 74) features were extracted from each ROI after PID (for quantitative MRI) and FBN discretization (for quantitative and weighted MRI): a total of 97 radiomic features were analyzed for interscanner reproducibility and segmentation robustness.

FIGURE 2:

Image processing workflow and analysis overview. For each subject, the scan-rescan image dataset including quantitative (MRF and apparent diffusion coefficient [ADC]) and weighted magnetic resonance imaging (MRI) (T1w MPRAGE, T2w SPACE, and FLAIR) data was coregistered to a subject-specific common image space for combined brain extraction. After N4 bias field correction, weighted MRI further underwent min-max and z-score normalization. The SRI24 atlas with corresponding LPBA40 brain tissue ROIs was registered to the common image space for atlas-based segmentation. A total of 56 brain tissue ROIs were used for radiomic analysis.

Our data analysis is organized into 3 steps. We first evaluated first-order feature reproducibility to assess baseline image sequence reproducibility independent of discretization. We then evaluated the effect of PID and FBN discretization on texture feature reproducibility of quantitative MRI. Lastly, we compared the texture reproducibility and segmentation robustness of quantitative and weighted MRI sequences to determine the overall reproducibility and robustness advantage of using quantitative MRI and PID discretization for radiomics.

MRF and MRI Acquisition

Whole-brain 3D MRF and MRI data from 5 healthy subjects (21–25 years old at study start) were acquired using 3 distinct 3 T scanners: 2 MAGNETOM Vida and 1 MAGNETOM Skyra Fit (Siemens Healthcare, Erlangen, Germany). Subjects underwent 2 repeated acquisitions per scanner, for a total of 6 scans per subject (30 total scans). The median time difference between scan-rescan acquisitions was 4 days. All imaging was performed using a 20-channel head coil and took place between December 2021 and June 2022, with written informed consent obtained under local ethical approval. No subject had a history of neurological or psychological disorders.

Each scan included whole-brain 3D MRF acquisitions along with T1-weighted gradient echo (T1w MPRAGE), T2-weighted fast spin echo (T2w SPACE), T2w FLAIR, and diffusion-weighted imaging (DWI RESOLVE) with calculated ADC maps. As previously described,18,21 an MRF scan applies varying acquisition settings, including variable flip angles and repetition times (TRs), on a steady-state free precession sequence framework.43 The acquisition parameters of 3D MRF were the following: field of view, 300 × 300 × 144 mm3; matrix size, 300 × 300 × 144; spatial resolution, 1.0 × 1.0 × 1.0 mm3; and acquisition time, 5 minutes 31 seconds. Before MRF acquisition, a radiofrequency transmit field (B1+) map was acquired (acquisition time, 50 seconds) to correct for bias from B1+ inhomogeneities.

The T1w MPRAGE acquisition parameters were the following: TR/echo time (TE)/inversion time (TI), 2000/2.48/900 milliseconds; FOV, 300 × 300 × 144 mm3; matrix size, 288 × 288 × 144; spatial resolution, 1.0 × 1.0 × 1.0 mm3; flip angle, 8 degrees; and acquisition time, 5 minutes 44 seconds. The T2w SPACE acquisition parameters were the following: TR/TE, 3200/412 milliseconds; FOV, 300 × 300 × 144 mm3; matrix size, 256 × 256 × 144; spatial resolution, 1.2 × 1.2 × 1.0 mm3; echo train length, 282; echo spacing, 3.46 milliseconds; and acquisition time, 3 minutes 22 seconds. The T2w FLAIR acquisition parameters were the following: TR/TE/TI, 5000/195/1600 milliseconds; FOV, 300 × 300 × 144 mm3; matrix size, 256 × 256 × 144; spatial resolution, 1.2 × 1.2 × 1.0 mm3; and acquisition time, 4 minutes 46 seconds. The DWI RESOLVE acquisition parameters were the following: TR/TE1/TE2, 11,380/77/135 milliseconds; FOV, 300 × 300 × 112 mm3; matrix size, 268 × 268 × 56; spatial resolution, 1.1 × 1.1 × 2.0 mm3; b-values, 0 and 1000 s/mm2; and acquisition time, 4 minutes 12 seconds.

The 6 image acquisition sessions for each subject were performed on different dates: the subject was repositioned each time, and all scanner adjustments were newly set. Each individual image acquisition session was completed within 25 minutes.

Image Reconstruction and Processing

Magnetic resonance fingerprinting images were reconstructed using the nonuniform fast Fourier transform18 with the designed spiral trajectory corrected using a 1-time calibration and a generalized eddy current model. An MRF dictionary was generated through Bloch simulation consisting of fingerprints with T1 values from 10 to 3000 milliseconds and T2 values from 2 to 2000 milliseconds. The following dictionary T1 step sizes were used: 100–1900 milliseconds, 20 milliseconds; 2000–3000 milliseconds, 100 milliseconds. The following dictionary T2 step sizes were used: 2–400 milliseconds, 2 milliseconds; 420–2000 milliseconds, 20 milliseconds. A total of 19,770 dictionary entries, each with 480 time points, were used to simultaneously generate T1, T2, and proton density (PD) maps via template matching.18 Neighborhood quadratic interpolation44,45 was performed during inner product search to eliminate the effect of dictionary step size and generate T1 and T2 maps that were used for downstream analysis. All image reconstruction steps were performed using in-house software implemented in MATLAB R2021b (MathWorks, Natick, MA).
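The template-matching step can be sketched in Python with a toy signal model. This is not the in-house reconstruction software: the inversion-recovery curve stands in for a Bloch-simulated fingerprint, the dictionary varies T1 only, the time axis and grid are hypothetical, and neighborhood quadratic interpolation is omitted.

```python
import numpy as np

def toy_signal(t1_ms, times_ms):
    """Toy inversion-recovery curve; a stand-in for a simulated fingerprint."""
    return 1.0 - 2.0 * np.exp(-times_ms / t1_ms)

def match_template(measured, dictionary, t1_grid):
    """Return the dictionary T1 whose normalized entry best matches the signal."""
    dict_norm = dictionary / np.linalg.norm(dictionary, axis=1, keepdims=True)
    meas_norm = measured / np.linalg.norm(measured)
    scores = np.abs(dict_norm @ meas_norm)  # inner product with every entry
    return float(t1_grid[int(np.argmax(scores))])

times_ms = np.linspace(10.0, 5000.0, 480)   # 480 time points, hypothetical spacing
t1_grid = np.arange(100.0, 3001.0, 100.0)   # hypothetical T1 grid (ms)
dictionary = np.stack([toy_signal(t1, times_ms) for t1 in t1_grid])

# A voxel signal with arbitrary scaling; normalization removes the scale factor.
measured = 0.7 * toy_signal(1200.0, times_ms)
matched_t1 = match_template(measured, dictionary, t1_grid)
```

Normalizing both the dictionary and the measured signal makes the match independent of overall signal scale, which in MRF is attributed to proton density.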

Quantitative MRI (MRF T1, MRF T2, and ADC) along with weighted MRI (T1w MPRAGE, T2w SPACE, and T2w FLAIR) data were processed following the workflow detailed in Figure 2. For each subject, images across all acquisitions were first rigidly coregistered to a subject-specific common image space using the Greedy46 algorithm included in the Cancer Imaging Phenomics Toolkit.47,48 The subject-specific common image space was the native space belonging to the first MRF acquisition (eg, scan 1 from scanner 1). All images were interpolated to 1.0 mm3 isotropic resolution during coregistration. Brain extraction was performed using an in-house tissue class-based segmentation method, and N4 bias field correction49 was applied to weighted MRI. To assess the impact of intensity standardization on texture feature reproducibility, weighted MRI further underwent min-max and z-score normalization.
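The two intensity standardization methods applied to weighted MRI can be sketched as follows. Computing the statistics inside a brain mask, and the function names, are assumptions of this sketch; the text does not specify those details.

```python
import numpy as np

def min_max_normalize(image, mask):
    """Rescale so that masked intensities span [0, 1]."""
    vals = image[mask]
    lo, hi = vals.min(), vals.max()
    return (image - lo) / (hi - lo)

def z_score_normalize(image, mask):
    """Shift and scale so that masked intensities have mean 0 and unit SD."""
    vals = image[mask]
    return (image - vals.mean()) / vals.std()

# Two synthetic "scans" of the same tissue that differ only by gain and offset.
rng = np.random.default_rng(0)
scan_a = rng.uniform(50.0, 200.0, size=(4, 4, 4))
scan_b = 2.5 * scan_a + 100.0
mask = np.ones(scan_a.shape, dtype=bool)

minmax_agree = bool(np.allclose(min_max_normalize(scan_a, mask),
                                min_max_normalize(scan_b, mask)))
zscore_agree = bool(np.allclose(z_score_normalize(scan_a, mask),
                                z_score_normalize(scan_b, mask)))
```

Both methods remove a purely linear gain and offset difference, as in this synthetic case. Real interscanner differences in weighted MRI need not be linear, so this sketch does not imply that standardization restores reproducibility in practice.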

The SRI24 atlas50 and corresponding LONI Probabilistic Brain Atlas42 brain tissue ROIs were then registered to the common image space and used for atlas-based segmentation.51 To reduce partial volume effects, brain tissue ROIs were eroded (spherical structural element, radius 1) and smoothed by morphological closing (spherical structural element, radius 1) to remove all voxels from a brain tissue ROI that were adjacent to another region. All ROIs were confirmed to have sufficient voxel count (greater than 300) for meaningful radiomic feature extraction. A total of 56 brain tissue ROIs were used for radiomic analysis.

Radiomic Feature Extraction

Three-dimensional radiomic features were extracted using code developed in MATLAB R2021b (MathWorks, Natick, MA) benchmarked to IBSI (Image Biomarker Standardization Initiative) guidelines.30 For each brain tissue ROI, first-order statistical (n = 23) and second-order texture (n = 74) radiomic features were extracted from each image sequence. A list of specific feature names is included in Supplementary Table S1, https://links.lww.com/RLI/A856. First-order features were extracted without image discretization. Texture features were extracted from gray level co-occurrence matrices (GLCMs), gray level run length matrices (GLRLMs), gray level size zone matrices (GLSZMs), neighborhood gray tone difference matrices (NGTDMs), and gray level difference matrices (GLDMs). All texture features were computed from merged 3D texture matrices.30 Texture features from quantitative MRI sequences (MRF T1, MRF T2, and ADC) were extracted using PID and FBN discretization with common bin numbers24,52–56 ranging from 16 to 256 (FBN = 16, 32, 64, 128, and 256), whereas weighted MRI texture features were extracted only with FBN discretization. Feature extraction settings for each discretization method are detailed in Table 1. Definitions for first-order features not defined by IBSI are included in Supplementary Table S2, https://links.lww.com/RLI/A857.
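To make the role of the discretized gray levels concrete, the following Python sketch builds a co-occurrence matrix and computes one texture feature (contrast). It is a simplified 2D, single-offset illustration and not the study's 3D merged-matrix MATLAB code; the function names and the toy image are assumptions.

```python
import numpy as np

def glcm_2d(levels, n_levels, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix for one 2D offset."""
    dr, dc = offset
    rows, cols = levels.shape
    glcm = np.zeros((n_levels, n_levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i = levels[r, c] - 1    # gray levels start at 1
                j = levels[r2, c2] - 1
                glcm[i, j] += 1
                glcm[j, i] += 1         # count each pair in both directions
    return glcm / glcm.sum()

def glcm_contrast(glcm):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    i, j = np.indices(glcm.shape)
    return float(np.sum(glcm * (i - j) ** 2))

# A toy discretized image with 3 gray levels.
levels = np.array([[1, 1, 2],
                   [1, 2, 3],
                   [2, 3, 3]])
p = glcm_2d(levels, n_levels=3)
contrast = glcm_contrast(p)
```

Because every entry of the matrix is indexed by gray level, any change in how intensities map to gray levels changes the matrix and every feature derived from it, which is why the discretization settings matter for texture reproducibility.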

GLCM, gray level co-occurrence matrix; GLRLM, gray level run length matrix; GLSZM, gray level size zone matrix; NGTDM, neighborhood gray tone difference matrix; GLDM, gray level difference matrix; PID, physics-informed discretization; MRF, magnetic resonance fingerprinting; ADC, apparent diffusion coefficient; FBN, fixed bin number.

Simulation of Segmentation Annotation Differences

To evaluate the robustness of PID and FBN discretization to outliers from segmentation variation, the brainstem ROI of 1 subject underwent segmentation simulation. Simulated ROIs were generated by rigidly translating the ground truth ROI by a maximum of 6 mm (taxicab distance) in 3D image space. Simulated ROIs with taxicab distance shifts of 1 mm (n = 6), 3 mm (n = 38), and 6 mm (n = 68) were generated. The brainstem was chosen to simulate outlier inclusion from segmentation differences due to its proximity to surrounding CSF, which has higher T1 and T2 values37 (eg, hypointense on T1w imaging). Texture features were extracted from ground truth and simulated ROIs using PID for quantitative MRI and FBN discretization for weighted MRI.
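One way to generate such translations is to enumerate every integer voxel offset with a given taxicab length, assuming 1.0 mm isotropic voxels as used after coregistration. The function names below are hypothetical. The enumeration yields the 6 and 38 offsets reported for 1 mm and 3 mm; how the 6 mm set was selected is not stated in the text, so the sketch does not attempt to reproduce that count.

```python
from itertools import product

def taxicab_shifts(distance):
    """All integer (dx, dy, dz) offsets whose taxicab length equals `distance`."""
    span = range(-distance, distance + 1)
    return [s for s in product(span, repeat=3)
            if sum(abs(v) for v in s) == distance]

def translate_roi(mask_voxels, shift):
    """Rigidly translate a set of voxel coordinates by an integer shift."""
    dx, dy, dz = shift
    return {(x + dx, y + dy, z + dz) for (x, y, z) in mask_voxels}

n_shifts_1mm = len(taxicab_shifts(1))
n_shifts_3mm = len(taxicab_shifts(3))

# Example: shift a tiny hypothetical ROI by one of the 3 mm offsets.
roi = {(10, 10, 10), (10, 10, 11), (10, 11, 10)}
shifted_roi = translate_roi(roi, (1, -2, 0))
```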

FIGURE 3:

Whole-brain scan-rescan comparison between MRF T1 and T1w MPRAGE. A, The scan-rescan intensity variations of MRF T1 and T1w MPRAGE acquired over 3 distinct scanners were compared. All MRF T1 maps shown here are visualized using the same display scale (0–3000 milliseconds), whereas T1w MPRAGE display ranges were determined based on the minimal and maximal values for each image. There is a clear contrast difference between T1w MPRAGE images acquired from different scanners (ie, scanner 1 vs scanner 2), whereas MRF T1 maps show identical overall contrast. B, Six-way histogram intersection shows high agreement in MRF T1 maps (0.87) compared with T1w (0.43), which demonstrates a large scanner-dependent difference.

Statistical Analysis

The agreement of whole-brain scan-rescan intensity distributions was evaluated via histogram intersection,57 with 0 indicating no agreement and 1 indicating complete agreement. The MRF T1 scan-rescan agreement was compared with T1w MPRAGE across acquisitions from 3 scanners (2 MAGNETOM Vida and 1 MAGNETOM Skyra Fit). Interscanner radiomic feature reproducibility was evaluated using the intraclass correlation coefficient (ICC; 2-way random effects; absolute agreement; single rater). Radiomic features obtained from the first image acquisition session (eg, scan 1 from scanner 1) were used as reference values. The ICC of a given radiomic feature was calculated for each subject over all brain tissue ROIs and scan acquisitions: distinct scan acquisitions were considered independent raters of brain tissue ROI features. The ICC distributions were then reported after aggregating the ICC reproducibility values from all subjects. The following ICC reproducibility thresholds were defined in accordance with literature guidelines58: “excellent” (greater than 0.90), “good” (between 0.75 and 0.90), “moderate” (between 0.50 and 0.75), and “poor” (less than 0.50). The effect of image sequence, image discretization method, and simulated segmentation differences on radiomic feature ICC values was assessed using repeated measures analysis of variance. For segmentation robustness analysis, texture features with ICCs greater than 0.75 were determined to be robust to segmentation. To account for multiple comparison testing, Bonferroni correction was applied for all statistical tests; results were considered significant if adjusted P values were below the significance threshold of 0.05. Significance levels are denoted as follows: *P < 0.05; **P < 0.01; ***P < 0.001. All statistical analyses were performed in MATLAB R2021b (MathWorks, Natick, MA).
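The ICC form named above (2-way random effects, absolute agreement, single rater) is commonly written as ICC(2,1) and can be computed from the mean squares of a 2-way layout. The Python sketch below is a generic implementation, not the study's MATLAB code. The rows are assumed to be brain tissue ROIs and the columns scan acquisitions, and the handling of values that fall exactly on a category boundary is an assumption.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1) for an array of shape (n_targets, k_raters)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between targets
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))

def icc_category(icc):
    """Reproducibility category using the thresholds stated in the text."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"

# Classic 6-target, 4-rater example data (Shrout and Fleiss).
example = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]])
example_icc = icc_2_1(example)
example_label = icc_category(example_icc)
```

Because the absolute agreement form penalizes systematic offsets between raters, a feature that is shifted consistently on one scanner will score lower than it would under a consistency-type ICC.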

Data Availability

The anonymized volunteer image data analyzed in this study are openly available at https://doi.org/10.5281/zenodo.8234100.

RESULTS

Scan-Rescan Intensity Variation in MRF T1 and T1w MPRAGE

Before radiomic feature reproducibility analysis, we investigated scan-rescan intensity variations in quantitative and weighted MRI sequences: MRF T1 and T1w MPRAGE were chosen for direct comparison as image contrast in both is governed by T1 relaxation time. Visual evaluation in Figure 3 demonstrates that all MRF T1 maps show identical overall contrast (displayed using same colormap), whereas there is a clear contrast difference between T1w MPRAGE images acquired on scanners 1 and 2 (MAGNETOM Vida) and those acquired on scanner 3 (MAGNETOM Skyra Fit). This is supported by the high 6-way histogram intersection for whole-brain MRF T1 intensity distributions (0.87) compared with T1w MPRAGE (0.43). Although all MRF T1 intensity histograms show high agreement and overlap, T1w MPRAGE demonstrates a scanner-dependent difference both between similar (scanners 1 and 2) and different (scanner 3) scanner hardware.
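A multi-way histogram intersection can be sketched in Python as the sum, over bins, of the smallest normalized count across all histograms. This generalization of the pairwise definition, the function name, and the bin settings are assumptions; the exact formulation used in the study is not given in the text.

```python
import numpy as np

def histogram_intersection(samples, bins, value_range):
    """N-way intersection of normalized histograms (0 = none, 1 = complete)."""
    hists = []
    for values in samples:
        counts, _ = np.histogram(values, bins=bins, range=value_range)
        hists.append(counts / counts.sum())  # assumes some values fall in range
    # Take the bin-wise minimum across all histograms, then sum over bins.
    return float(np.minimum.reduce(hists).sum())

t1_values = np.array([100.0, 200.0, 300.0, 400.0])
identical = histogram_intersection([t1_values] * 6, bins=150,
                                   value_range=(0.0, 3000.0))
disjoint = histogram_intersection([np.array([100.0, 200.0]),
                                   np.array([2000.0, 2100.0])],
                                  bins=150, value_range=(0.0, 3000.0))
```

Six identical distributions give an intersection of 1 and fully non-overlapping distributions give 0, matching the interpretation of the scale described in the Statistical Analysis section.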

FIGURE 4:

Reproducibility of first-order features from quantitative and weighted MRI. A, First-order feature ICCs measured over the entire study population and all scans (5 subjects, 30 total scans) for each image sequence are shown in box plots with individual outlier features marked by circles. Quantitative MRI ICC distributions were significantly higher than all weighted MRI ICCs. B, The reproducibility of individual first-order features is shown via heat map for each image sequence. The MRF T1 had the highest number (n = 10/23, 44%) of excellently reproducible (ICC > 0.90) first-order features followed by ADC (n = 4/23, 17%).

Reproducibility of First-Order Features From Quantitative and Weighted MRI

Figure 4A shows the first-order feature ICC distributions for each image sequence, computed over the entire study population (30 total scans) and across all ROIs (56 total ROIs). The median (interquartile range [IQR]) first-order reproducibility ICCs for MRF T1, MRF T2, and ADC were 0.90 (0.87–0.93), 0.79 (0.70–0.86), and 0.88 (0.78–0.95) compared with T1w MPRAGE, T2w SPACE, and FLAIR ICCs of 0.56 (0.27–0.69), 0.53 (0.19–0.68), and 0.43 (0.13–0.76), respectively. First-order features were more reproducible (Bonferroni-adjusted P < 0.05) in quantitative MRI than weighted MRI, with MRF T1 first-order features being most reproducible overall.

FIGURE 5:

Impact of image discretization method on reproducibility of texture features extracted from quantitative MRI. Gray level co-occurrence matrix (GLCM) texture feature ICCs measured for each quantitative MRI sequence using both PID and FBN discretization are shown in box plots. All noted statistical comparisons were performed between an image's PID feature ICC distribution and FBN feature ICC distributions. For both MRF T1 and MRF T2, PID yielded the most reproducible GLCM features, whereas ADC GLCM feature reproducibility was not significantly different between PID and FBN discretization with the exception of FBN 16 (less reproducible).

The reproducibility of individual first-order features for each image sequence is visualized using the heat map in Figure 4B, with darker colors indicating higher ICC. The MRF T1 had the highest number (n = 10/23, 44%) of excellently reproducible (ICC > 0.9) first-order features followed by ADC (n = 4/23, 17%). The following features (n = 7/23, 30%) were more reproducible in MRF T1 and MRF T2 than in all weighted MRI sequences: maximum, mean, median, 90th percentile, robust mean, robust median, and root mean square, whereas the following features (n = 4/23, 17%) were more reproducible in ADC than weighted MRI: maximum, mean, 90th percentile, and root mean square.

Impact of Physics-Informed Discretization on Reproducibility of Texture Features From Quantitative MRI

The reproducibility of texture features (n = 74) extracted from quantitative MRI using PID and multiple FBN discretization settings (FBN = 16, 32, 64, 128, and 256) was compared. As a representative analysis, the ICC distribution of GLCM texture features (n = 23) is shown for each quantitative MRI sequence in Figure 5: the ICC distributions of GLRLM, GLSZM, NGTDM, and GLDM texture features are included in Supplementary Figure S1, https://links.lww.com/RLI/A858. All statistical comparisons in Figure 5 were performed between an image sequence's PID texture feature ICCs and FBN texture feature ICCs. The median (IQR) reproducibility ICCs of GLCM features extracted using PID were 0.91 (0.85–0.95), 0.89 (0.84–0.92), and 0.83 (0.77–0.91) for MRF T1, MRF T2, and ADC, respectively. In comparison, the highest median ICCs extracted using FBN discretization (across all tested FBN settings) were 0.84 (0.77–0.88), 0.81 (0.76–0.86), and 0.82 (0.75–0.86). For both MRF T1 and MRF T2, GLCM features extracted using PID were significantly more reproducible than features extracted using FBN (P < 0.001), whereas ADC GLCM feature reproducibility was not significantly different between PID and FBN discretization (P = 0.53–1.37) with the exception of FBN 16 (PID reproducibility higher; P < 0.001). The magnitude of improved feature reproducibility using PID, measured by median ICC change, was larger for MRF T1 (0.07) and MRF T2 (0.08) than ADC (0.01). Physics-informed discretization yielded a much higher number of excellently reproducible (ICC > 0.9) GLCM features for MRF T1 (n = 69/115, 60%), MRF T2 (n = 40/115, 35%), and ADC (n = 30/115, 26%) than FBN discretization: across all evaluated FBN settings, the highest number of excellently reproducible features was MRF T1 (n = 18/115, 16%), MRF T2 (n = 6/115, 5%), and ADC (n = 18/115, 16%).

FIGURE 6:

Weighted MRI texture feature reproducibility extracted across discretization and intensity standardization strategies. A, GLCM feature ICCs measured for each weighted MRI sequence across all tested FBN discretization settings are shown in box plots. All noted statistical comparisons were performed between an image's FBN 64 feature ICC distribution and all other FBN feature ICC distributions. FBN 16 features had uniformly poor texture reproducibility for T1w MPRAGE, T2w SPACE, and FLAIR; overall, no statistically significant trend was observed between increasing FBN discretization and texture reproducibility across all weighted MRI sequences. B, GLCM texture feature ICCs for each weighted MRI sequence extracted before and after intensity standardization are shown in box plots. All features were extracted using FBN 64. Neither min-max nor z-score normalization significantly improved overall feature reproducibility for any weighted MRI sequence.

Weighted MRI Texture Feature Reproducibility

The reproducibility of weighted MRI (T1w MPRAGE, T2w SPACE, and FLAIR) texture features extracted across FBN discretization settings (FBN = 16, 32, 64, 128, and 256) was similarly evaluated, with the ICC distribution of GLCM texture features displayed via box plot in Figure 6A. All noted statistical comparisons in Figure 6A were performed between an image's FBN 64 feature ICC distribution and all other FBN feature ICC distributions; FBN 64 was chosen as the baseline because it represents an intermediate and commonly used bin count.28,29,35 For T1w MPRAGE, there was no significant difference (P = 0.17–1.11) between FBN 64 ICCs and either FBN 32 or FBN 128 ICCs, whereas FBN 64 ICCs were significantly higher (P < 0.001) than FBN 16 ICCs and significantly lower (P < 0.01) than FBN 256 ICCs. For T2w SPACE, no significant difference was observed between FBN 64 ICCs and either FBN 128 or FBN 256 ICCs (P = 6.21–7.25), whereas FBN 64 ICCs were higher than FBN 16 (P < 0.001) and FBN 32 ICCs (P < 0.05). For FLAIR, FBN 64 ICCs were higher than FBN 16 ICCs (P < 0.01); no differences (P = 0.47–11.56) were observed for any other FBN setting. For all weighted MRI sequences, no significant difference between FBN 64 and FBN 128 feature reproducibility was observed, whereas FBN 16 features were less reproducible than features from all other FBN discretization settings. Across all tested FBN settings, the highest median ICCs for T1w MPRAGE (FBN 256), T2w SPACE (FBN 256), and FLAIR (FBN 32) were 0.85 (0.78–0.89), 0.83 (0.80–0.88), and 0.80 (0.62–0.85), respectively.
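For readers unfamiliar with fixed bin number (FBN) discretization, the sketch below shows the commonly used formulation, in which intensities inside a region of interest are rescaled by that region's own minimum and maximum and mapped to integer bins 1..N. It is a minimal illustration that assumes this standard formulation; the function name is hypothetical, and the study's actual extraction pipeline is not reproduced here.

```python
import math


def discretize_fbn(values, n_bins):
    """Map ROI intensities to integer bins 1..n_bins (fixed bin number)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant ROI collapses to a single bin.
        return [1 for _ in values]
    bins = []
    for v in values:
        b = math.floor(n_bins * (v - lo) / (hi - lo)) + 1
        bins.append(min(b, n_bins))  # the ROI maximum is placed in the top bin
    return bins


# Hypothetical intensities for illustration only.
roi = [0, 10, 20, 30, 40]
print(discretize_fbn(roi, 4))  # [1, 2, 3, 4, 4]

# Because bin edges follow the ROI's own range, a linear rescaling of the
# intensities (here: multiply by 3, add 7) gives the same bin assignments.
print(discretize_fbn([3 * v + 7 for v in roi], 4))  # [1, 2, 3, 4, 4]
```

Under this formulation the bin width depends on each region's intensity range, so the same bin index can correspond to different intensity intervals in different scans.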

FIGURE 7: Texture feature reproducibility comparison between quantitative and weighted MRI. Texture feature reproducibility ICCs from each image sequence are directly compared using thermometer plots, with each bar representing the proportion of features within each ICC reproducibility range (excellent, good, moderate, and poor). For each image sequence, the best performing discretization setting that resulted in the greatest number of highly reproducible (ICC > 0.9) features was chosen for comparison (Table 2). For all quantitative MRI sequences (MRF T1, MRF T2, and ADC), PID resulted in the greatest number of highly reproducible texture features. Across all evaluated texture features, MRF T1 had the most excellently reproducible features (n = 225/370, 61%). In comparison, fewer than 30% of FLAIR (n = 102/370, 28%) and T2w SPACE (n = 98/370, 27%) texture features were highly reproducible across all tested FBN settings.
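The proportions shown in a thermometer plot of this kind can be computed by binning each feature's ICC into a reproducibility range, as sketched below. Only the "excellent" cutoff (ICC > 0.9) is stated in this section; the cutoffs used here for good (0.75–0.9), moderate (0.5–0.75), and poor (< 0.5) follow a widely used convention and are an assumption, as are the function names and example values.

```python
from collections import Counter

BANDS = ("excellent", "good", "moderate", "poor")


def icc_band(icc):
    """Assign an ICC to a reproducibility range (cutoffs partly assumed)."""
    if icc > 0.9:    # stated in the text
        return "excellent"
    if icc > 0.75:   # assumed cutoff
        return "good"
    if icc >= 0.5:   # assumed cutoff
        return "moderate"
    return "poor"


def band_proportions(iccs):
    """Fraction of features falling in each reproducibility range."""
    counts = Counter(icc_band(v) for v in iccs)
    return {band: counts[band] / len(iccs) for band in BANDS}


# Hypothetical ICC values for illustration only (not study data).
print(band_proportions([0.95, 0.92, 0.80, 0.60, 0.30]))
```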

Next, the effect of 2 common intensity standardization techniques, min-max and z-score normalization, on weighted MRI texture feature reproducibility was evaluated; because quantitative MRI values are on an absolute measurement scale, quantitative MRI sequences were excluded from this analysis. As shown in Figure 6B, neither min-max nor z-score normalization significantly changed T1w MPRAGE (P = 5.70–5.79), T2w SPACE (P = 2.11–5.46), or FLAIR (P = 5.52–5.86) feature reproducibility. Because neither technique improved feature reproducibility, downstream analyses were performed using weighted MRI radiomic features extracted without intensity standardization.
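The two standardization techniques named above have simple closed forms, sketched here. This is a minimal illustration with hypothetical function names; it assumes the statistics are computed over whatever set of voxels is passed in and uses the population standard deviation for the z-score, neither of which is specified in this section.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale intensities linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Center intensities to zero mean and scale to unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)  # population SD (assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical intensities for illustration only.
voxels = [2, 4, 6]
print(min_max_normalize(voxels))  # [0.0, 0.5, 1.0]
print(z_score_normalize(voxels))
```

Both transforms are linear in the input intensities; they change the scale and offset of the values but not their ordering.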

Comparison of Texture Feature Reproducibility Extracted From Quantitative and Weighted MRI

To directly compare the overall texture feature reproducibility of quantitative and weighted MRI, reproducibility ICCs from the best performing discretization setting were evaluated for each image sequence. Table 2 summarizes the discretization setting that resulted in the greatest number of excellently reproducible (ICC > 0.90) features across all evaluated texture families (GLCM, GLRLM, GLSZM, NGTDM, and GLDM) for each image sequence. As reported previously, PID yielded the highest number of excellently reproducible texture features for all quantitative MRI sequences (MRF T1, MRF T2, and ADC). For weighted MRI sequences, T1w MPRAGE and FLAIR features were most reproducible with FBN 256 discretization, whereas FBN 64 resulted in the most reproducible T2w SPACE features.
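The selection rule described above, choosing for each image sequence the discretization setting with the most excellently reproducible (ICC > 0.90) features, can be sketched as follows. The function names, setting labels, and ICC values are hypothetical, and tie handling (the first setting encountered wins) is an assumption not addressed in the source.

```python
def count_excellent(iccs, threshold=0.9):
    """Number of features whose ICC exceeds the 'excellent' threshold."""
    return sum(1 for v in iccs if v > threshold)


def best_setting(iccs_by_setting, threshold=0.9):
    """Setting with the most excellently reproducible features.

    Ties resolve to the first setting in iteration order (assumption).
    """
    return max(
        iccs_by_setting,
        key=lambda s: count_excellent(iccs_by_setting[s], threshold),
    )


# Hypothetical ICC values for one image sequence (not study data).
iccs = {
    "FBN 16": [0.50, 0.60, 0.95],
    "FBN 64": [0.92, 0.93, 0.70],
    "PID": [0.95, 0.96, 0.91],
}
print(best_setting(iccs))  # PID
```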

TABLE 2 - Best Performing (Most Reproducible) Discretization Settings for Each Image Sequence

Image Sequence   Discretization Setting   GLCM (n = 115)   GLRLM (n = 80)   GLSZM (n = 80)   NGTDM (n = 25)   GLDM (n = 70)
MRF T1           PID                      69 (60%)         54 (68%)         38 (48%)         16 (64%)         48 (69%)

INVESTIGATIVE RADIOLOGY
