J. Imaging, Vol. 8, Pages 303: Harmonization Strategies in Multicenter MRI-Based Radiomics

Image preprocessing is a significant part of the analysis pipeline and a major source of variability in radiomics. It can be broken down into four consecutive steps: (i) interpolation, (ii) bias field correction, (iii) normalization, and (iv) discretization. Each step involves several parameters that considerably affect both the robustness and the absolute values of the features (e.g., attenuation of noise-induced distortions/non-uniformities, gray-level and pixel-size standardization) [2,20]. Indicatively, image preprocessing had a significant impact in MRI studies where phantom [20] and glioblastoma data [21,53] were utilized to improve radiomics stability. From a technical perspective, image preprocessing is usually performed in computing environments such as Python and MATLAB and is often combined with software packages for radiomics feature extraction (PyRadiomics, CERR, IBEX, MaZda and LIFEx) [2,48].

2.2.1. Interpolation
Image interpolation, divided into upsampling and downsampling, resizes an image from its original pixel grid to an interpolated grid. Several interpolation algorithms are commonly used in the literature, including, among others, nearest-neighbor, trilinear, tricubic convolution, and tricubic spline interpolation; a thorough review is given in [54]. According to the image biomarker standardization initiative (IBSI) guidelines [54], interpolation is a prerequisite in radiomics studies since it enables texture feature extraction from rotationally invariant three-dimensional (3D) images [54]. In addition, it ensures that spatially related radiomics features (e.g., texture features) will be unbiased, especially in the case of MRI, where images are often non-isotropic [2]. To this end, there is evidence that interpolating images onto a consistent isotropic voxel space can potentially increase radiomics reproducibility in multicenter studies (e.g., CT and PET studies have shown dependencies between feature reproducibility and the selected interpolation algorithm [55,56,57,58]).

This was also shown in [53], where three distinct measures assessed the impact of image resampling to reduce voxel-size variations between images acquired at 1.5 T and 3 T; these measures were based on: (i) differences in feature distribution before and after resampling, (ii) a covariate shift metric, and (iii) overall survival prediction. In that study, isotropic resampling through linear interpolation decreased the number of features dependent on the magnetic field strength from 80 to 59 out of 420 features, according to the two-sided Wilcoxon test. However, models built on the resampled images failed to discriminate between high and low overall survival (OS) risk (p-value = 0.132 using Cox proportional hazards regression analysis). Phantom and brain MRI were isotropically resampled to 1 mm voxels in a study where combinations of several preprocessing and harmonization processes were applied to compensate for differences in the acquisition settings (magnetic field strength and image resolution) [22]. Radiomics reproducibility was evaluated at the feature level using the “DiffFeatureRatio”, calculated as the ratio of the radiomics features with a p-value of less than 5% (different feature distributions due to scanner effects) to the total number of radiomics features. In most cases, isotropic voxel spacing decreased the “DiffFeatureRatio”, indicating a reduced impact of the scanner effect.
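For illustration, a minimal sketch of resampling an MR volume onto a 1 mm isotropic grid with SimpleITK is given below (assuming SimpleITK is installed; the linear interpolator and the 1 mm target spacing are illustrative choices mirroring the studies above, not a recommendation):

```python
import SimpleITK as sitk

def resample_isotropic(image, new_spacing=(1.0, 1.0, 1.0),
                       interpolator=sitk.sitkLinear):
    """Resample an MR volume onto an isotropic voxel grid."""
    original_spacing = image.GetSpacing()
    original_size = image.GetSize()
    # Choose the new grid size so that the physical extent of the volume is preserved.
    new_size = [int(round(osz * ospc / nspc))
                for osz, ospc, nspc in zip(original_size, original_spacing, new_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(), interpolator,
                         image.GetOrigin(), new_spacing, image.GetDirection(),
                         0.0, image.GetPixelID())

# Hypothetical usage:
# image = sitk.ReadImage("patient_T1w.nii.gz")
# iso_image = resample_isotropic(image)
```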
However, no clear recommendation can yet be made about the most effective interpolation technique in multicenter MRI radiomics [54]. In addition, although isotropic interpolation enables radiomics feature extraction in the 3D domain, a per-slice (2D) radiomics analysis is recommended when the slice thickness is significantly larger than the pixel size of the image (e.g., a slice thickness of 5 mm and a pixel size between 0.5 and 1 mm) [6].

2.2.2. Bias Field Correction
The bias field is a low-frequency signal that may degrade the acquired image [59] and lead to an inhomogeneity effect across the image. Its degree of variation differs not only between clinical centers and vendors but also at the patient level, even when a single vendor or acquisition protocol is used. Bias field correction can be implemented using gradient-distribution-based methods [60], Expectation Maximization (EM) methods [61], and Fuzzy C-Means-based methods [62]. However, N4 bias field correction [63], an improved version of N3 bias field correction [64], has been one of the most successful and widely used techniques. It has been used extensively in various anatomical sites (e.g., brain tumor segmentation [65] and background parenchymal enhancement [66]), and the reason for its success is that it allows faster execution and a multiresolution scheme that leads to better convergence compared to N3 [63] (a usage sketch is given at the end of this subsection). In this direction, a multicenter MRI study showed that when N4 bias field correction was applied prior to noise filtering, an increase in the total number of reproducible features was achieved according to the concordance correlation coefficient (CCC), dynamic range (DR), and intra-class correlation coefficient (ICC) metrics [21]. Indicatively, in the case of necrosis, the number of robust features increased when radiomics was performed on the bias-field-corrected images rather than on the raw imaging data (32.7%, CCC and DR ≥ 0.9). Interestingly, when bias field correction was applied prior to noise filtering, the necrotic regions of the tumor had the highest number of extremely robust features (31.6%, CCC and DR ≥ 0.9). Another study explored the stability of radiomics features with respect to variations in the image acquisition parameters (repetition and echo times, voxel size, random noise, and intensity non-uniformity) [20]. The MRI phantoms represented an average of 27 co-registered images of real patients (i.e., with different image acquisition parameters), and features with an ICC higher than 0.75 were reported as stable. The study showed that N4 bias field correction, coupled with resampling to a common isotropic resolution, had a significant impact on radiomics stability (particularly on first-order and textural features). In conclusion, bias field correction is strongly recommended as a preprocessing step in multicenter studies.
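As referenced above, a minimal sketch of the N4 step using SimpleITK (assuming a recent SimpleITK version; the Otsu-based foreground mask and the shrink factor are illustrative choices, not a prescribed protocol):

```python
import SimpleITK as sitk

def n4_bias_correction(image, shrink_factor=2):
    """Apply N4 bias field correction to an MR volume."""
    image = sitk.Cast(image, sitk.sitkFloat32)
    # Rough foreground mask; an anatomical mask can be supplied instead.
    mask = sitk.OtsuThreshold(image, 0, 1, 200)
    # Fit the bias field on a shrunken copy for speed, then correct the full-resolution image.
    small_image = sitk.Shrink(image, [shrink_factor] * image.GetDimension())
    small_mask = sitk.Shrink(mask, [shrink_factor] * image.GetDimension())
    corrector = sitk.N4BiasFieldCorrectionImageFilter()
    corrector.Execute(small_image, small_mask)
    log_bias_field = corrector.GetLogBiasFieldAsImage(image)
    return image / sitk.Exp(log_bias_field)

# Hypothetical usage:
# corrected = n4_bias_correction(sitk.ReadImage("patient_FLAIR.nii.gz"))
```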
2.2.3. Intensity Normalization
To compensate for scanner-dependent and inter-subject variations, signal intensity normalization has been deployed to change the range of the signal intensity values within the ROI. This is achieved by calculating the mean and the standard deviation of the signal intensity gray levels within the predefined ROI, or by transforming the ROI histogram to match a reference one-dimensional signal intensity histogram [16,22,23,24,25,26,27,67]. The importance of intensity normalization has been emphasized in the literature, but no principal guidelines have been established yet.

On the other hand, seven principles for image normalization (a.k.a. SPIN) were proposed [68] in order to produce intensity values that: (i) have a common interpretation across regions within the same tissue type, (ii) are reproducible, (iii) maintain their rank, (iv) share similar distributions for the same ROI within and across subjects, (v) are not affected by biological abnormalities or population heterogeneity, (vi) are minimally sensitive to noise and artifacts, and (vii) do not lead to loss of information related to pathology or other phenomena. Noting that $I_{\mathrm{norm}}(x)$ and $I(x)$ are the intensities of the normalized and the raw MRI, respectively, the most commonly used intensity normalization techniques are outlined below (publicly available repositories are summarized in Table 2).

Z-score normalizes the original image $I(x)$ by centering the intensity distribution at a mean μ of 0 and a standard deviation σ of 1 [68]. Computationally, the Z-score is not time consuming and can be applied easily by subtracting the mean intensity of either the entire image or a specific ROI from each voxel value, followed by dividing the result by the corresponding standard deviation [26].

WhiteStripe is a biologically driven normalization technique, initially deployed in brain radiomics studies, which applies a Z-score normalization based on the intensity values of the normal-appearing white matter (NAWM) region of the brain [68]. The NAWM is used as a reference tissue, since it is the most contiguous brain tissue and is, by definition, not affected by pathology (leading to conformity with SPIN 5) [68]. To this end, WhiteStripe normalizes the signal intensities by subtracting the mean NAWM intensity μ from each signal intensity $I(x)$ and dividing the result by the NAWM standard deviation σ.

Min–Max standardizes the image by rescaling the range of values to [0, 1] using Equation (2), where $\min(I)$ and $\max(I)$ are the minimum and maximum signal intensity values per patient, respectively [26].

$$I_{\mathrm{norm}}(x) = \frac{I(x) - \min(I)}{\max(I) - \min(I)} \qquad (2)$$
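For illustration, a minimal NumPy sketch of the Z-score and Min–Max normalizations described above (the function names and the toy volume are ours, for illustration only):

```python
import numpy as np

def zscore_normalize(image, roi_mask=None):
    """Z-score normalization: zero mean and unit standard deviation.
    If an ROI mask is given, the statistics are computed within the ROI only."""
    values = image[roi_mask > 0] if roi_mask is not None else image
    mu, sigma = values.mean(), values.std()
    return (image - mu) / sigma

def minmax_normalize(image):
    """Min-Max rescaling of the intensities to the [0, 1] range (Equation (2))."""
    i_min, i_max = image.min(), image.max()
    return (image - i_min) / (i_max - i_min)

# Toy example; in practice `image` would be the MRI volume loaded as a NumPy
# array (e.g., via SimpleITK or nibabel).
image = np.random.default_rng(0).normal(300.0, 50.0, size=(64, 64, 32))
z_img = zscore_normalize(image)
mm_img = minmax_normalize(image)
```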

Normalization per healthy tissue population is performed by dividing the signal intensity values of a given image by the mean intensity value of a healthy tissue (e.g., adipose tissue or muscle in musculoskeletal imaging), as in Equation (3) [26].

$$I_{\mathrm{norm}}(x) = \frac{I(x)}{\mathrm{mean}\left(I_{\mathrm{healthy\ tissue}}\right)} \qquad (3)$$
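A corresponding one-function sketch, assuming a binary mask of the healthy reference tissue is available (hypothetical names):

```python
import numpy as np

def healthy_tissue_normalize(image, healthy_mask):
    """Divide every voxel by the mean intensity of a healthy reference tissue
    (e.g., a muscle or adipose-tissue mask), as in Equation (3)."""
    reference_mean = image[healthy_mask > 0].mean()
    return image / reference_mean
```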

Fuzzy C-means (FCM) uses fuzzy c-means clustering to compute a mask of a specified tissue (e.g., gray matter, white matter, or the cerebrospinal fluid) in the image [27]. This mask is then used to normalize the entire image based on the mean intensity value μ of the specified region, according to Equation (4), where $c \in \mathbb{R}_{>0}$ is a constant that determines the mean of the specified tissue after normalization.

$$I_{\mathrm{norm}}(x) = \frac{c \cdot I(x)}{\mu} \qquad (4)$$
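A minimal, self-contained sketch of this idea is given below; it uses a didactic hand-rolled fuzzy c-means on the masked intensities rather than any particular library, and the choice of the brightest cluster as the reference tissue is an illustrative assumption (e.g., white matter on T1-weighted images):

```python
import numpy as np

def fuzzy_cmeans_1d(values, n_clusters=3, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Didactic fuzzy c-means on 1-D intensities.
    Returns cluster centers and the (n_clusters x n_values) membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, values.size))
    u /= u.sum(axis=0, keepdims=True)            # memberships sum to 1 per voxel
    for _ in range(n_iter):
        um = u ** m
        centers = um @ values / um.sum(axis=1)   # fuzzy-weighted cluster centers
        dist = np.abs(values[None, :] - centers[:, None]) + 1e-12
        u_new = dist ** (-2.0 / (m - 1.0))       # standard FCM membership update
        u_new /= u_new.sum(axis=0, keepdims=True)
        if np.max(np.abs(u_new - u)) < tol:
            return centers, u_new
        u = u_new
    return centers, u

def fcm_normalize(image, brain_mask, c=1.0):
    """FCM-based normalization (Equation (4)): divide by the mean intensity of a
    chosen tissue cluster (here, the cluster with the highest center)."""
    values = image[brain_mask > 0].astype(float)
    centers, u = fuzzy_cmeans_1d(values)
    tissue = np.argmax(centers)
    mu = values[u.argmax(axis=0) == tissue].mean()
    return c * image / mu
```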

Gaussian mixture model (GMM) normalization assumes that: (i) a certain number of Gaussian distributions exist in the image, and (ii) each distribution represents a specific cluster [27]. Subsequently, the GMM clusters together the signal intensities that belong to a single distribution. Specifically, the GMM attempts to find the mixture of multi-dimensional Gaussian probability distributions that best models the histogram of signal intensities within an ROI. The mean of the mixture component associated with the specified tissue region is then used in the same way as in the FCM-based method, according to Equation (4), with a constant $c \in \mathbb{R}_{>0}$.

Kernel Density Estimate (KDE) normalization estimates the empirical probability density function (pdf) of the signal intensities of an image I over a specified mask using kernel density estimation [27]. The KDE of the pdf of the signal intensities is calculated as follows:

$$\hat{p}(x) = \frac{1}{N \cdot M \cdot L \cdot \delta} \sum_{i=1}^{N \cdot M \cdot L} K\!\left(\frac{x - x_i}{\delta}\right), \qquad (5)$$

where x is an intensity value, the $x_i$ are the intensities of the voxels of the $N \times M \times L$ image, K is the kernel (usually a Gaussian kernel), and δ is the bandwidth parameter that scales the kernel K. The kernel density estimate provides a smooth version of the histogram, which allows the maximum associated with the reference mask to be picked robustly via a peak-finding algorithm. The peak ρ is then used to normalize the entire image, in the same way as in the FCM-based method. Specifically,

$$I_{\mathrm{norm}}(x) = \frac{c \cdot I(x)}{\rho} \qquad (6)$$
where $c \in \mathbb{R}_{>0}$ is a constant that determines the reference mask peak after normalization.
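A minimal sketch of the GMM- and KDE-based variants using scikit-learn and SciPy (assuming both libraries are installed; picking the component with the largest mean, or the highest density peak, as the reference is an illustrative assumption):

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.mixture import GaussianMixture

def gmm_normalize(image, mask, c=1.0, n_components=3):
    """GMM-based normalization: fit a mixture to the masked intensities and
    divide by the mean of a chosen component (here, the largest mean)."""
    values = image[mask > 0].reshape(-1, 1).astype(float)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(values)
    mu = gmm.means_.max()
    return c * image / mu

def kde_normalize(image, mask, c=1.0):
    """KDE-based normalization (Equations (5) and (6)): estimate the intensity pdf,
    locate its peak rho, and divide the image by rho."""
    values = image[mask > 0].astype(float)
    kde = gaussian_kde(values)                       # Gaussian kernel, automatic bandwidth
    grid = np.linspace(values.min(), values.max(), 512)
    rho = grid[np.argmax(kde(grid))]                 # intensity at the mode of the pdf
    return c * image / rho
```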

Histogram matching was proposed by Nyúl and Udupa to address the normalization problem by first learning a standard histogram from a set of images and then mapping the signal intensities of each image onto this histogram [32,70]. The standard histogram is learned by averaging pre-defined landmarks of interest (i.e., intensity percentiles at 1, 10, 20, …, 90, and 99 percent [32]) over the training set. The intensity values of the test images are then mapped piecewise-linearly onto the learned standard histogram along these landmarks (a sketch is given after Equation (7) below).

Ravel (Removal of Artificial Voxel Effect by Linear regression) is a modification of WhiteStripe [69]. It attempts to improve WhiteStripe by removing unwanted technical variation, e.g., scanner effects. The Ravel-normalized image is defined as:

$$I_{\mathrm{Ravel}}(x) = I_{\mathrm{WS}}(x) - \gamma_x Z^{T} \qquad (7)$$
where $I_{\mathrm{WS}}$ is the WhiteStripe-normalized image, $\gamma_x Z^{T}$ represents the unknown technical variation, and $\gamma_x$ are the coefficients of the unknown variation associated with voxel x.
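As referenced above, a simplified sketch of the Nyúl–Udupa piecewise-linear landmark mapping (an illustration of the idea, not the reference implementation; the landmark percentiles follow those quoted above):

```python
import numpy as np

LANDMARK_PERCENTILES = [1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99]

def learn_standard_histogram(training_images):
    """Average the landmark intensities (percentiles) over the training images."""
    landmarks = [np.percentile(img, LANDMARK_PERCENTILES) for img in training_images]
    return np.mean(landmarks, axis=0)

def histogram_match(image, standard_landmarks):
    """Map the image intensities piecewise-linearly onto the standard landmarks."""
    image_landmarks = np.percentile(image, LANDMARK_PERCENTILES)
    # np.interp performs the piecewise-linear mapping between corresponding landmarks.
    return np.interp(image, image_landmarks, standard_landmarks)

# Hypothetical usage with training volumes loaded as NumPy arrays:
# standard = learn_standard_histogram(training_volumes)
# normalized = histogram_match(test_volume, standard)
```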

Several multicenter studies have examined the influence of signal intensity normalization on MRI radiomics variability. Fortin et al. compared the Ravel, histogram matching, and WhiteStripe normalization methods using T1-weighted brain images [70]. Ravel had the best performance in distinguishing between mildly cognitively impaired and healthy subjects (area under the curve, AUC = 67%) compared to histogram matching (AUC = 63%) and WhiteStripe (AUC = 59%). Scalco et al. evaluated three different normalization techniques applied to T2w-MRI before and after prostate cancer radiotherapy [43]. They reported that, based on the ICC metric, very few radiomics features were reproducible regardless of the selected normalization process; specifically, first-order features were highly reproducible (ICC = 0.76) only when intensity normalization was performed using histogram matching. A brain MRI study reported that Z-score normalization, followed by absolute discretization, yielded robust first-order features (ICC and CCC > 0.8) and increased performance in tumor grading prediction (accuracy = 0.82, 95% CI 0.80–0.85, p-value = 0.005) [16]. A radiomics study in head and neck cancer explored the intensity normalization effect in: (i) a heterogeneous multicenter cohort comprising images from various scanners and acquisition parameters, and (ii) a prospective trial derived from a single vendor with the same acquisition parameters [26]. Statistically significant differences (according to the Friedman and the Wilcoxon signed-rank tests) in signal intensities before and after normalization were observed only in the multicenter cohort. Additionally, Z-score computed within the ROI and histogram matching performed significantly better than Min–Max and Z-score computed over the entire image, which indicates that the addition of a large background area to the Z-score calculation can adversely affect the normalization process. Intensity normalization should also be performed cautiously in cases where a healthy tissue is delineated as the reference ROI (e.g., WhiteStripe, where the Z-score normalization is based on the NAWM), since significant changes can occur in this area due to pathological tissue changes and/or treatment (e.g., structural and functional changes from radiation therapy) that can potentially alter the signal intensity values within the reference ROI [26].

Summing up, to the best of our knowledge, there is no clear indication of whether to use intensity normalization with or without a reference tissue. On the one hand, a normalization such as the Z-score is simple to implement, as it requires only the voxels within the ROI. On the other hand, WhiteStripe and its modifications can potentially perform better when the reference ROI is accurately segmented and the corresponding area is known to be unassociated with the disease status and/or other clinical covariates.

Deep Learning (DL) methods (Table 2) are also used, in lieu of the conventional normalization methods described above [4]. These methods rely on Generative Adversarial Networks (GANs) [37,41] and Style Transfer (ST) techniques [30,74]. With GANs, the idea is to construct images with more similar properties so that the extracted radiomics features become comparable.
Despite their novelty, the phenomenon of vanishing gradients makes GAN training challenging, because it slows down learning in the initial layers or even stops it completely [19]. Furthermore, GANs are also prone to generating images with a similar appearance as an effect of mode collapse, which occurs when the generator produces only a limited set or a single type of output to fool the discriminator [19]. As a result, the discriminator does not learn to come out of this trap, leading to GAN failure. Last but not least, GAN-based models can also add unrealistic artifacts to the images.

A solution to this problem is Style Transfer, where two images, called the Content Image (CI) and the Style Image (SI), are used to create a new image that has the content of the CI rendered according to the style of the SI. This can help to overcome the variability of scanner acquisition and reconstruction parameters. The Style Transfer approach is used for image harmonization by either image-to-image translation or domain transformations [30] (Table 2), [74,75]. Although it can be achieved via Convolutional Neural Networks (CNNs) [76], there are other choices for style transfer, such as GANs, which have been used for PET–CT translation and MRI motion correction [42]. Due to the aforementioned disadvantages of GANs, which (in the case of vanishing gradients) can be unpredictable, the use of Style Transfer with CNNs is recommended.
