Multimodal tract-based MRI metrics outperform whole brain markers in determining cognitive impact of small vessel disease-related brain injury

Study sample

The data included in this study comprise the UMC Utrecht participants of the TRACE-VCI study (Boomsma et al. 2017), a cohort of patients visiting the memory clinic with cognitive complaints and visible vascular lesions on their brain MRI. During their assessment at the memory clinic, participants underwent a 3 Tesla MRI scan with a standardized protocol including T1-weighted imaging with a resolution of 1 × 1 × 1 mm³, a fluid-attenuated inversion recovery (FLAIR) acquisition with a resolution of 0.96 × 0.96 × 3.00 mm³, and a diffusion MRI scan with a resolution of 2.5 × 2.5 × 2.5 mm³ including 45 gradient directions at b = 1200 s/mm² in addition to one b = 0 s/mm² volume averaged three times. Next to MRI, all participants underwent a standardized neuropsychological evaluation to assess their cognitive status. The study was approved by the institutional review board of the UMC Utrecht. All patients provided informed consent prior to research-related procedures.

The severity of the WMH burden was rated with the Fazekas score, considering only deep and not periventricular lesions, as follows: 0 = absence, 1 = punctate foci, 2 = beginning confluence of foci, 3 = large confluent areas. Out of the 196 available subjects, we selected only patients exhibiting manifestations of cSVD, operationalized as having a Fazekas score ≥ 2, or the presence of (small) subcortical or lacunar infarcts. Patients were excluded in the presence of infarct(s) or hemorrhage(s) with a volume above 4.2 mL (i.e., the equivalent of a spherical lesion with a diameter > 2 cm) or of incidental findings (e.g., brain cancer, cysts) on MRI affecting the analyses. This arbitrary volume cut-off was chosen primarily because larger lesions are by themselves more likely to affect cognition.

Of the 116 subjects selected with the abovementioned criteria, 14 were further discarded because of incomplete cognitive assessment (n = 11) or poor MRI quality (n = 3), resulting in the final selection of 102 subjects reported in Table 1. A flow chart summarizing our inclusion and exclusion criteria is shown in the Supplementary Material, Figure S1.

Table 1 Demographic characteristics, clinical diagnosis, and cognitive evaluation of the study participants

Cognitive performance

A detailed explanation of the cognitive evaluation of the study sample can be found in previous work (Boomsma et al. 2017). In short, the level of education was defined according to a 7-point rating scale [Verhage scale (Verhage 1964), 1–7; low to high education]. Cognition was first screened with the Dutch version of the Mini Mental State Examination (MMSE, max. score 30). The severity of cognitive symptoms was assessed with the Clinical Dementia Rating (CDR, 0–3). Patients received a multidomain cognitive assessment. For the present study, we considered the domains memory and information processing speed.

The domain memory was assessed with the Dutch version of the Rey Auditory Verbal Learning Test (RAVLT). For the RAVLT, we used the total number of words remembered across the five learning trials, as well as the delayed recall and recognition tasks. Furthermore, the Visual Association Test (VAT) part A was included to assess visuospatial association learning.

The domain information processing speed was assessed with the Trail Making Test part A (TMT-A), the Stroop Color Word Test I and II, and the Digit Symbol-Coding Test (DSCT) of the WAIS-III or the Letter Digit Substitution Test (LDST). For each individual test, z-scores were created (reversed z-scores for the TMT-A and the Stroop Color Word Test).

Individual test scores of all subjects were transformed to z-scores, i.e., by subtracting the average and dividing by the standard deviation of all subjects, and then averaged to create domain z-scores. Accordingly, a z-score equal to 0 indicates the average cognitive performance in the whole cohort, and not an intact average cognitive score.
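For illustration, a minimal sketch of this cohort-referenced standardization and domain averaging, assuming the individual test scores are stored in a pandas table with hypothetically named columns (one row per subject); the sign flip for the reversed tests follows the description above:

```python
import pandas as pd

# Hypothetical file and column names; one row per subject.
raw = pd.read_csv("neuropsych_scores.csv")

def cohort_z(scores: pd.DataFrame) -> pd.DataFrame:
    """Standardize each test against the whole cohort (mean 0, SD 1)."""
    return (scores - scores.mean()) / scores.std()

speed = cohort_z(raw[["tmt_a", "stroop_1", "stroop_2", "dsct"]])
speed[["tmt_a", "stroop_1", "stroop_2"]] *= -1   # reversed z-scores (higher = better)
processing_speed_domain = speed.mean(axis=1)     # domain z-score (0 = cohort average)
```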

Participants were clinically classified as follows:

No objective cognitive impairment, when patients had cognitive complaints but no objective impairment on neuropsychological testing.

Mild cognitive impairment (MCI), when observing deterioration in cognitive function as compared to a previous time point, and objective impairment in at least one cognitive domain.

Dementia, when observing objective impairment in two or more cognitive domains. Dementia was further classified based on its main etiology using internationally established criteria in the following subtypes: vascular (Roman et al. 1993), Alzheimer’s disease (McKhann et al. 1984), other neurodegenerative etiology (McKeith et al. 2005; Rascovsky et al. 2011), or unknown origin.

Data processing

MRI data were processed with an automated pipeline based on CAT12 (http://www.neuro.uni-jena.de/cat/), ExploreDTI (Leemans et al. 2009) and the in-house developed toolbox “MRIToolkit” (Guo et al. 2020) (https://github.com/delucaal/MRIToolkit).

T1-weighted and FLAIR images were processed with CAT12 to derive automatic segmentations of white matter, gray matter, cerebrospinal fluid and white matter hyperintensities (Tohka et al. 2004). All segmentations were individually inspected to ensure that they were of sufficient quality and did not contain major errors. Additionally, the cortical thickness (CTH) was evaluated (Yotter et al. 2011; Dahnke et al. 2013). Next, micro-bleeds, infarcts, and hemorrhages were evaluated by a trained rater using the FLAIR images, as previously described (Boomsma et al. 2020).

dMRI data were corrected for signal drift (Vos et al. 2016) and Gibbs ringing (Perrone et al. 2015); then motion, eddy current and echo-planar imaging (EPI) corrections with b-matrix rotation were performed in a single step, using the T1-weighted image resampled to 2 × 2 × 2 mm³ as target. Next, a robust fit of the diffusion tensor was performed with REKINDLE (Tax et al. 2015). Visual inspection was performed to verify the effectiveness of motion correction and registration to the T1-weighted image, as well as to check for major data artifacts in the DTI fit residuals.

Constrained spherical deconvolution (CSD) (Tournier et al. 2007) was performed using spherical harmonics of order 6 and recursive calibration of the response function (Tax et al. 2014) to determine the fiber orientation distribution; deterministic fiber tractography was then applied using each brain voxel as a seed, with an angle threshold of 30° and a step size of 1 mm. Streamlines shorter than 30 mm or longer than 500 mm were discarded (default values in ExploreDTI). Subsequently, the white matter analysis clustering approach (Zhang et al. 2018) was applied to automatically reconstruct 73 brain tracts based on known anatomy. A list of the reconstructed tracts and their abbreviations is reported in Supplementary Material Table S1. Spatial probability maps of the reconstructed tracts in the whole dataset are reported in Supplementary Information Videos S1–S3.

Study design

First, we aimed to characterize the maximum amount of variance that conventional linear models can explain at the group level. To this end, we used linear regression to characterize the amount of variance in cognitive scores (i.e., information processing speed, memory performance) explained by models of increasing complexity considering (1) demographics only (i.e., age, sex, level of education); (2) demographics + whole brain markers from sMRI (i.e., WMH burden, brain parenchymal fraction (BPF), presence of lacunes, presence of micro-bleeds); (3) model 2 + the average value of a diffusion metric (i.e., FA or MD or PSMD) in the whole WM. In this analysis, all subjects were used simultaneously (N = 102).
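To illustrate this first analysis, the following is a minimal sketch of the nested regression models, assuming the demographic, sMRI and dMRI markers are stored in a table with hypothetically named columns (statsmodels is used here for convenience; it is not prescribed by the study):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study_data.csv")   # hypothetical table, one row per subject (N = 102)

# Models of increasing complexity for one cognitive domain
formulas = {
    "1: demographics":       "processing_speed ~ age + sex + education",
    "2: + whole-brain sMRI": "processing_speed ~ age + sex + education"
                             " + wmh_volume + bpf + lacunes + microbleeds",
    "3: + whole-brain dMRI": "processing_speed ~ age + sex + education"
                             " + wmh_volume + bpf + lacunes + microbleeds + mean_fa",
}

for label, formula in formulas.items():
    fit = smf.ols(formula, data=df).fit()
    print(f"Model {label}: R2 = {fit.rsquared:.2f} (adjusted {fit.rsquared_adj:.2f})")
```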

Subsequently, we evaluated two methods to predict individualized cognitive function using a leave-one-out validation strategy, which is arguably a harder task than regression and allows us to evaluate the generalizability of a prediction model to unseen data. This implies that the prediction was performed 102 times, removing 1 subject each time and re-training the prediction on the remaining 101 subjects. In addition, in the supporting information we evaluated the generalizability of the method by repeating the prediction with a leave-5-out cross-validation scheme.
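A minimal sketch of the leave-one-out scheme, assuming a feature matrix X (102 subjects × predictors) and a vector y of domain z-scores stored in hypothetical NumPy files; the linear model used here is only a stand-in for either prediction method:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

X = np.load("predictors.npy")      # shape (102, n_predictors), hypothetical
y = np.load("domain_scores.npy")   # shape (102,), hypothetical

pred = np.empty_like(y)
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # re-train on 101 subjects
    pred[test_idx] = model.predict(X[test_idx])                 # predict the held-out subject

mae = np.mean(np.abs(pred - y))
r2 = np.corrcoef(pred, y)[0, 1] ** 2
print(f"Leave-one-out MAE = {mae:.3f}, R2 = {r2:.3f}")
```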

The first prediction method is based on linear models, the current standard in cSVD research, and serves as the prediction benchmark. The same metrics considered for linear regression (demographics + whole brain markers from sMRI and dMRI) were used as input for this prediction strategy. For each input metric, we evaluated its standardized coefficient, its significance (p-value) and the amount of variance it explained as quantified by the R-squared (R2). In the supporting information, we also report an evaluation of the prediction performance of linear models when considering tract-based metrics as input.

The second prediction strategy is based on a novel tract-based ANN to predict individualized cognitive function, and is presented in the following section. To compare our proposed strategy to the benchmark, we evaluated the mean absolute error (MAE) of the prediction and its R2 value. To assess whether the tract-based ANN predicted cognitive performance significantly better than conventional models, F-tests were performed. Because conventional F-tests weight the residual sum of squares (RSS) by the number of parameters, they are unsuited for evaluating ANNs, which have a large number of parameters. Accordingly, we applied a modified F-test considering the number of input predictors #K in place of the number of parameters, as follows:

$$F = \frac{\left( \text{RSS}_{\text{linear}} - \text{RSS}_{\text{ANN}} \right) / \left( \#K_{\text{ANN}} - \#K_{\text{linear}} \right)}{\text{RSS}_{\text{ANN}} / \left( N_{S} - \#K_{\text{ANN}} \right)},$$

where $N_{S}$ is the number of subjects (102), $\text{RSS}_{\text{linear}}$ and $\text{RSS}_{\text{ANN}}$ are the residual sums of squares of the linear and ANN models, and $\#K_{\text{linear}}$ and $\#K_{\text{ANN}}$ the corresponding numbers of input predictors.
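For clarity, a minimal sketch of this test as reconstructed above (the nesting direction, with the ANN using at least as many input predictors as the linear model, is an assumption):

```python
from scipy.stats import f

def modified_f_test(rss_linear, rss_ann, k_linear, k_ann, n_subjects=102):
    """Modified F-test with the number of input predictors (#K) used in
    place of the number of fitted parameters, following the formula above."""
    F = ((rss_linear - rss_ann) / (k_ann - k_linear)) / (rss_ann / (n_subjects - k_ann))
    p = f.sf(F, k_ann - k_linear, n_subjects - k_ann)  # right-tailed p-value
    return F, p
```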

Tract-based ANN prediction

ANN features sampling

In our prediction framework, we integrated multi-modal MRI metrics sampled both at the whole brain level and in 73 WM tracts, as depicted in Fig. 1. At the whole brain level, we considered the same markers used as input for the linear prediction. Additionally, we considered the average CTH and the mean squared error of the DTI fit residuals, which informs on both data quality and the appropriateness of the model. Residuals assume high values in the presence of outliers in the data, but also in the case of non-Gaussian diffusion effects (van Rijn et al. 2020) owing to, for example, microstructural alterations (Jensen et al. 2005; Goghari et al. 2021).

Fig. 1 An overview of the framework used in this work. Multi-modal metrics computed from the diffusion tensor (FA, MD, PSMD, RESIDUALS), T1-weighted imaging (CTH) and FLAIR (WMH) are derived (i) at the whole brain level and (ii) for each of the 73 major white matter tracts obtained with an automatic tractography clustering method. The considered measures are used as input to a linear multivariate prediction model and an artificial neural network (ANN) with leave-one-out cross-validation

To extract tract-specific metrics, the volumetric representation of the streamlines of each white matter tract was derived and used as a region of interest. For each tract, we computed the average FA, MD, WMH burden volume (WMHV) and DTI fit residuals without distinguishing between WM and GM. Next, we evaluated the peak width of mean diffusivity (PWD) for each tract in analogy to the whole brain PSMD, i.e., by determining the difference between the 75th and 25th percentiles of MD within each tract mask. Additionally, the average CTH of each tract was calculated as the average thickness of the cortex adjacent to a WM tract.
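A minimal sketch of this per-tract sampling, assuming the FA and MD maps and the tract mask are available as co-registered NumPy arrays (function and variable names are illustrative):

```python
import numpy as np

def tract_metrics(fa_map: np.ndarray, md_map: np.ndarray, tract_mask: np.ndarray) -> dict:
    """Sample diffusion metrics within one tract mask (boolean 3-D array).

    PWD is computed, in analogy to PSMD, as the difference between the
    75th and 25th percentiles of MD within the tract, as described above."""
    fa_in_tract = fa_map[tract_mask]
    md_in_tract = md_map[tract_mask]
    return {
        "mean_FA": float(fa_in_tract.mean()),
        "mean_MD": float(md_in_tract.mean()),
        "PWD": float(np.percentile(md_in_tract, 75) - np.percentile(md_in_tract, 25)),
    }
```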

All metrics were transformed to z-scores, as is common practice in machine learning (More et al. 2021), to optimize their use as predictors in subsequent analyses.

ANN architecture

Our ANN framework consists of a feed-forward network with 1 hidden layer of 20 nodes (empirically chosen). The input of each layer was normalized (“BatchNorm1d”), and the rectified linear unit (ReLU) was used as the non-linear activation function between layers. The network was implemented in Python using the PyTorch library and trained with the ADAM optimizer using the mean squared error cost function with an L1 penalty (“L1Lasso”). The learning rate was empirically set to 0.01 after experimentation in the range 0.0001–0.1, and a dropout rate of 30% was used. The training dataset (N = 101) was split into a training (90%) and a validation set (10%) to implement an early stopping strategy, i.e., to interrupt the training once the error on the validation set starts to increase. The minimum number of training epochs was 30, and the maximum 300. For each subject, the training and prediction were repeated 30 times to account for non-deterministic processes in ANNs, and the median of all predicted values was taken as the final prediction.
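A minimal PyTorch sketch consistent with this description (the exact layer ordering and the weight of the L1 penalty are assumptions, not taken from the original implementation):

```python
import torch
from torch import nn

class TractANN(nn.Module):
    """Feed-forward network with one hidden layer of 20 nodes, input
    normalization, ReLU activation and 30% dropout, as described above."""

    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm1d(n_features),
            nn.Linear(n_features, 20),
            nn.BatchNorm1d(20),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(20, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def mse_with_l1(pred, target, model, l1_weight=1e-3):
    """Mean squared error with an L1 (lasso-like) penalty on the weights;
    the penalty weight here is a placeholder value."""
    l1 = sum(p.abs().sum() for p in model.parameters())
    return nn.functional.mse_loss(pred, target) + l1_weight * l1

model = TractANN(n_features=12)  # 12 input predictors is a placeholder
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```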

ANN features selection

We designed a feature selection strategy to integrate multimodal MRI metrics and predict cognitive performance while minimizing the potential risk of over-fitting. An overview of our strategy is presented in Fig. 1.

To reduce the number of metrics to be considered for ANN feature selection, we implemented a first filtering step based on linear prediction. To this end, we repeatedly performed a leave-one-out prediction using a single metric (WMHV, FA, MD, PWD, residuals, CTH) sampled for all 73 WM tracts. For each metric, the 7 most significant tracts (i.e., approximately 10% of the total) and their contralateral pathways were selected for the next phase. For the prediction of memory, the superior longitudinal fasciculus and the frontal-thalamic projections were additionally included if not selected at the previous stage, given their previously reported relevance in memory-related tasks (Bolkan et al. 2017; Biesbroek et al. 2018).
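A minimal sketch of this filtering step, assuming that for a given metric each tract's significance is summarized as a p-value from the leave-one-out linear prediction and that a mapping to contralateral tracts is available (all names are illustrative):

```python
def select_top_tracts(p_values: dict, contralateral: dict, n_top: int = 7) -> set:
    """Keep the n_top tracts with the smallest p-values for a given metric,
    plus their contralateral counterparts when a mapping exists."""
    ranked = sorted(p_values, key=p_values.get)[:n_top]
    selected = set(ranked)
    selected |= {contralateral[t] for t in ranked if t in contralateral}
    return selected

# Illustrative usage with hypothetical tract names and p-values:
# select_top_tracts({"AF_left": 0.002, "AF_right": 0.20, "CST_left": 0.04},
#                   {"AF_left": "AF_right", "AF_right": "AF_left"}, n_top=1)
# -> {"AF_left", "AF_right"}
```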

Once a set of candidate tracts was determined, these were given as input to an iterative ANN optimization procedure repeated 10 times on random subsets of 51 subjects (50%). The procedure determined the optimal combination of features to predict cognition in the given random subset with a bottom-up strategy. At the first iteration, age and education were the only predictors. Subsequently, the procedure evaluated which of the available metrics improved the prediction performance (R2) in the random subset and added it to the list of predictors. Given the stochastic nature of ANN training, each prediction was repeated three times and the average prediction was considered as the outcome. The procedure continued until the prediction performance did not improve further.
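A minimal sketch of this bottom-up search, where evaluate_r2 is a hypothetical helper that trains the ANN three times on the current random subset with the given features and returns the average R2:

```python
def greedy_feature_selection(candidates, evaluate_r2, base=("age", "education")):
    """Bottom-up selection: starting from age and education, repeatedly add
    the candidate feature that most improves R2, stopping when none helps."""
    selected = list(base)
    best_r2 = evaluate_r2(selected)
    remaining = set(candidates)
    while remaining:
        trial_r2 = {feat: evaluate_r2(selected + [feat]) for feat in remaining}
        best_feat = max(trial_r2, key=trial_r2.get)
        if trial_r2[best_feat] <= best_r2:
            break                      # no further improvement: stop
        selected.append(best_feat)
        best_r2 = trial_r2[best_feat]
        remaining.remove(best_feat)
    return selected, best_r2
```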

The feature selection procedure was repeated 10 times to obtain the candidate predictors. We then evaluated the final performance of the ANN at predicting processing speed and memory performance in the complete dataset using (1) the features corresponding to the feature selection iteration achieving the highest R2, and (2) all candidate predictors determined in the 10 repetitions.
