CSF protein ratios with enhanced potential to reflect Alzheimer’s disease pathology and neurodegeneration

Sample collection

The discovery cohort included patients from the Karolinska University Hospital Medical Unit Aging Memory clinic (GEDOC database and biobank) in Solna, examined between 2019 and 2021. The cohort consisted of 241 individuals diagnosed with probable Alzheimer’s disease (AD) (n = 44), mild cognitive impairment (MCI) (n = 65) or subjective cognitive decline (SCD) (n = 132), according to the national guidelines established by the Swedish Board of Health and Welfare [21]. The diagnostic examination process has been described in detail previously [22]. Patients referred to the memory clinic underwent extensive examinations, including clinical examinations, neuropsychological assessments, blood chemistry analyses, CSF biomarker measurements and MRI. The diagnosis of each patient was set by a multidisciplinary team. The cognitive examinations included the Mini Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA) total points, Rey Auditory Verbal Learning Test (RAVLT) learning, Rey Complex Figure (RCF) memory and Digit Symbol-Coding from WAIS-IV (KOD).

The samples included in the validation cohort were selected from the Amsterdam Dementia Cohort, and consisted of 26 probable AD patients and 26 SCD patients. All patients underwent extensive dementia screening at baseline, including physical and neurological examination, EEG, MRI and laboratory tests. Neuropsychological assessments were performed and included the MMSE for global cognition. Diagnoses were made by consensus in a multidisciplinary meeting. Probable AD was diagnosed according to the core clinical National Institute on Aging–Alzheimer's Association (NIA-AA) criteria. All AD patients were A+T+ in CSF. Diagnosis of SCD was determined when the results of all clinical examinations were normal, and there was no psychiatric diagnosis. All SCD patients were A-T- in CSF.

Sample classification

To explore the association between the measured proteins and amyloid and tau pathology in the discovery cohort, the individuals were classified based on CSF Aβ42/40 ratio and CSF p-tau levels. Individuals with Aβ42/40 ratio × 10 < 0.68 were classified as amyloid positive and those with p-tau levels ≥ 58 pg/ml as tau positive. Based on their combined amyloid and tau status, the individuals were divided into four groups: A-T- (n = 148), A-T+ (n = 9), A+T- (n = 19) and A+T+ (n = 65) (Supp. Fig. 1). Only A-T- and A+T+ individuals were further included in this study; A-T+ and A+T- individuals were excluded due to insufficient sample representation. Unless otherwise stated, the A+T+ group included all A+T+ individuals regardless of clinical diagnosis (probable AD, MCI or SCD; n = 65), whereas the A-T- group was restricted to patients with SCD (n = 106) (Table 1).
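As an illustration only, the classification rule for the discovery cohort can be written as a short Python sketch. The study's analyses were run in R; the function name classify_at, the constant names and the example values below are hypothetical and are not study data. Only the two cut-offs are taken from the text.

```python
from collections import Counter

# Cut-offs quoted in the text for the discovery cohort.
AB_RATIO_X10_CUTOFF = 0.68  # amyloid positive if (Abeta42/40 x 10) is below this
PTAU_CUTOFF_PG_ML = 58.0    # tau positive if p-tau is at or above this


def classify_at(ab42_40_ratio_x10, ptau_pg_ml):
    """Return the combined amyloid/tau label, e.g. 'A+T-'."""
    amyloid = "A+" if ab42_40_ratio_x10 < AB_RATIO_X10_CUTOFF else "A-"
    tau = "T+" if ptau_pg_ml >= PTAU_CUTOFF_PG_ML else "T-"
    return amyloid + tau


# Hypothetical (ratio x 10, p-tau in pg/ml) pairs, invented for the example.
samples = [(0.95, 31.0), (0.52, 84.0), (0.60, 40.0), (0.90, 58.0)]
labels = [classify_at(ratio, ptau) for ratio, ptau in samples]
counts = Counter(labels)

# Only concordant profiles are carried forward, as described in the text.
kept = [label for label in labels if label in ("A-T-", "A+T+")]
```

Note that the tau cut-off is inclusive (≥ 58 pg/ml) while the amyloid cut-off is strict (< 0.68), so a sample sitting exactly on a threshold is tau positive but amyloid negative.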

Table 1 Sample demographics

To validate the findings, the external validation cohort was classified in a similar manner. However, Aβ42 levels were used instead of the Aβ42/40 ratio to determine the presence of amyloid pathology, as Aβ40 levels were not available for this cohort. Here, samples with Aβ42 levels < 813 pg/ml were classified as amyloid positive and those with p-tau levels > 55 pg/ml as tau positive [23]. Final classification in the validation cohort based on the combined amyloid and tau status resulted in two groups, A-T- (n = 26) and A+T+ (n = 26) (Table 1, Supp. Fig. 1). These groups were identical to the diagnostic groups (Healthy controls and AD, respectively) as the AT status was used to determine the diagnosis.

Protein analysis with suspension bead array

A multiplex antibody-based suspension bead array was used to measure the levels of 73 proteins, pre-selected based on previously published and in-house unpublished neuroproteomic studies, complemented with targets from the literature. Each antibody was immobilized onto the surface of color-coded magnetic beads (MagPlex, Luminex Corp.) using NHS-EDC chemistry, as described previously [24]. The beads were subsequently pooled to form a multiplex bead array. All antibodies used in this study were polyclonal rabbit antibodies produced within the Human Protein Atlas (www.proteinatlas.org), except for the angiotensinogen (AGT) antibody (AF3156-SP, R&D Systems).

The CSF samples were transferred into 96-well PCR plates using stratified randomisation based on diagnosis, age and sex. Next, the crude samples (1/2 dilution) were directly labelled with an approximately tenfold molar excess of biotin (NHS-PEG4-biotin, A39259, ThermoFisher Scientific), as previously described [25]. The labelled samples were further diluted to a final dilution of 1/25 and heat-treated for 30 min at 56 °C before incubation with the prepared bead array at room temperature overnight. After unbound proteins had been washed away, the antibody-bound protein targets were labelled with a streptavidin-bound fluorophore and quantified using the Flexmap 3D instrument (Luminex Corp.). Data were acquired as the median fluorescence intensity per bead ID and per sample (relative quantification).

CSF amyloid, tau, NfL and CSF and serum albumin measurements

For samples in the discovery cohort analysed before August 22, 2019 and for all samples in the validation cohort, AD biomarkers Aβ40 (only in discovery cohort), Aβ42, t-tau and p-tau were measured in CSF by commercially available ELISAs (Innotest AMYLOID (1–40), Innotest AMYLOID (1–42), Innotest hTAU-Ag and Innotest Phosphotau (181P); Fujirebio), following the manufacturer’s instructions. Samples in the discovery cohort analysed after August 22, 2019 were measured using the Lumipulse G-series chemiluminescent enzyme immunoassay (Fujirebio Europe). NfL measurements were performed using a commercial ELISA (Uman Diagnostics, 10-7001). CSF and serum albumin concentrations were measured using the BN ProSpec/Atellica NEPH platform (Siemens Healthineers). All analyses in the discovery cohort were performed at the Karolinska University Hospital Laboratory, and in the validation cohort at the Neurochemistry Laboratory, Amsterdam UMC.

Data analysis and visualizations

Data processing, analysis and visualizations were performed using the open-source R statistical software (4.2.2) with extra packages vroom, tidyverse, ggpubr, ggbeeswarm, ggrepel, pheatmap, stats, scales, and patchwork. Additional packages and functions used in this study are stated in the respective data analysis sections. The figures were further adjusted for clarity (e.g., figure legends) using the vector graphic editor Affinity Designer (1.8.6) (Serif, West Bridgford, UK).

Data adjustment and quality control

The raw data generated in the multiplex protein profiling were adjusted for technical variation in two steps. First, the data were adjusted to minimize the effect of delayed instrument readout. For this, a robust linear model (rlm, MASS) was constructed for each protein, with the protein fluorescence intensity as the response variable and the sample position in the plate as the predictor. The model residuals were then added to the median protein signal intensity to obtain the adjusted values for each protein and sample. Second, the data were adjusted for potential differences between the sample plates using MA-individual normalization [26]. To evaluate the technical variation of each protein assay, three sample pool replicates were included in each sample plate to assess intra-assay reproducibility. The data adjustment steps were followed by further quality control based on inter-assay correlation (required Spearman rho ≥ 0.7) and background evaluation. Finally, data analysis was conducted on 49 proteins (Supp. Table 2), which had a median intra-assay CV of 3.5% (range 1.1–10.9%), with only one protein, CHIT1, having a CV over 10%.
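The first adjustment step (fit a robust line of intensity against plate position, then add the residuals back to the median intensity) can be sketched in Python as follows. This is a simplified stand-in, not the authors' code: it assumes a Huber M-estimator fitted by iteratively reweighted least squares with a MAD-based scale, which is the default behaviour of rlm in MASS, but the source does not state which rlm settings were used. The function names and the example intensities are hypothetical.

```python
from statistics import median


def huber_line(x, y, k=1.345, n_iter=50):
    """Fit y = a + b*x by iteratively reweighted least squares with Huber weights."""
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(n_iter):
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # MAD-based scale estimate; the tiny floor avoids a zero scale on perfect fits.
        scale = median(abs(r) for r in res) / 0.6745 or 1e-12
        # Huber weights: full weight for small residuals, down-weighting beyond k*scale.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in res]
    return a, b


def adjust_for_position(position, intensity):
    """Remove the position trend: median intensity plus the robust-fit residual."""
    a, b = huber_line(position, intensity)
    centre = median(intensity)
    return [centre + (yi - (a + b * xi)) for xi, yi in zip(position, intensity)]


# Hypothetical protein: signal drifts down with plate position, one true high sample.
positions = list(range(1, 11))
intensities = [100.0 - 2.0 * p for p in positions]
intensities[-1] += 50.0
adjusted = adjust_for_position(positions, intensities)
```

In this toy example the nine drifting samples collapse onto the median intensity after adjustment, while the genuinely elevated sample keeps its offset. An ordinary least-squares fit would instead tilt the line towards the elevated sample, which is presumably why a robust model is preferred for this step.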

Univariate analysis

The differences in CSF levels of the proteins measured with the suspension bead array between A-T- individuals with SCD and A+T+ individuals were tested using the non-parametric Wilcoxon rank-sum two-sided test (wilcox.test, stats). The obtained p-values were adjusted for false discovery rate (FDR) using the Benjamini-Hochberg correction for multiple hypothesis testing. Proteins with adjusted p-value < 0.05 were considered significantly different between the tested groups. The same statistical approach was used to compare ROC AUC values between protein pairs from different clusters in both discovery and validation cohorts, and to compare GAP43/PTPRN2 ratio, SNCB/PTPRN2 ratio, GAP43 and PTPRN2 between the different diagnostic groups in A-T- and A+T+ individuals in the discovery cohort, without p-value adjustment.
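The Benjamini-Hochberg step can be illustrated with a few lines of Python. This sketch only reproduces the standard adjustment procedure; the p-values and the function name are hypothetical, and the study itself used R.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins (not study results).
raw_p = [0.001, 0.02, 0.04, 0.30]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

In this example the third protein has a raw p-value of 0.04 but an adjusted value of about 0.053, so it would not pass the adjusted threshold of 0.05 used in the text.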

Protein clustering

Correlation between the measured proteins and CSF AD markers (Aβ40, Aβ42, Aβ42/40, t-tau, p-tau, NfL) was calculated using Spearman correlation (cor, stats). Hierarchical clustering (hclust, stats) was performed to cluster the measured proteins based on their correlation to CSF AD markers. Euclidean distance was used as the distance measure and Ward’s method (ward.D2) was used for the clustering. To visualize the clustering result, a heatmap was created (pheatmap) using the same clustering method. The heatmap was further annotated with Spearman correlations between the individual protein levels and the albumin quotient. The same approach was used to cluster proteins based on their correlation to each other for both A-T- and A+T+ individuals, and to visualise the correlation between cognitive scores and the CSF AD markers (amyloid beta peptides, tau and NfL).

The network graph visualisation of correlations between all measured proteins in the A-T- individuals with SCD and the A+T+ individuals was generated using the tidygraph and ggraph R packages. Only correlations with |rho| > 0.5 were included in the graph, which was created using the Fruchterman-Reingold layout algorithm.
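The edge selection behind such a network can be sketched in Python: compute Spearman's rho for every protein pair and keep the pairs with |rho| > 0.5. The layout itself (Fruchterman-Reingold in ggraph) is not reproduced here. All function names, protein labels and values below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_edges(profiles, threshold=0.5):
    """Return (protein_a, protein_b, rho) for pairs with |rho| above the threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        rho = spearman(profiles[a], profiles[b])
        if abs(rho) > threshold:
            edges.append((a, b, rho))
    return edges


# Hypothetical intensity profiles across five samples.
profiles = {
    "protein_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "protein_b": [2.0, 4.0, 5.0, 9.0, 20.0],
    "protein_c": [5.0, 4.0, 3.0, 2.0, 1.0],
    "protein_d": [2.0, 5.0, 1.0, 4.0, 3.0],
}
edges = correlation_edges(profiles)
```

Because the filter uses the absolute value of rho, strongly negative correlations (here protein_c against protein_a and protein_b) become edges just like positive ones, while the weakly correlated protein_d is left unconnected.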

Support vector machine modelling

Support vector machine (SVM) modelling was used to assess the capability of CSF protein pairs to differentiate A+T+ from A-T- (SCD) individuals. Separate models were created to test all possible protein pairs within and between the amyloid- and tau-associated clusters. The SVM models were constructed using a training set comprising 70% of the discovery cohort samples, and each model was evaluated using the remaining 30% of samples from the same cohort. To counter group size bias, the A-T- sample group was randomly undersampled to the size of the A+T+ group (n = 65) prior to the split. The split into training and test sets was repeated 101 times with different seeds, resulting in 101 models per protein pair, with the same seed used for undersampling. To optimise the models, the “cost” parameter was tuned on the training set with 10-fold cross-validation over the values 0.1, 1, 10 and 100. A linear kernel was applied in all models, and the protein data underwent log transformation, scaling, and centering to the median (scale, base R with colMedians, matrixStats). Model performance was compared using the area under the curve (AUC) from receiver operating characteristic (ROC) analysis (roc, pROC). The confidence interval for the median model AUC was estimated using bootstrap resampling with 1000 iterations (ci, pROC). The same modelling procedure was used to construct SVM models with the albumin CSF/serum ratio included as a predictor variable, and for single proteins from the tau-associated cluster as single predictor variables (GAP43, SNCB, NRGN and AMPH).
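The resampling scheme and the preprocessing can be sketched in Python without fitting any SVM. Several details are assumptions rather than statements from the text: one seed per repetition drives both the undersampling and the split, the split is not stratified by group, the natural logarithm is used, and scaling divides by the root mean square of the median-centred values (with n - 1 in the denominator). All identifiers and sample IDs are hypothetical.

```python
import random
from math import log, sqrt
from statistics import median


def undersample_and_split(pos_ids, neg_ids, seed, train_fraction=0.7):
    """Balance the two groups, then split the balanced set into train and test."""
    rng = random.Random(seed)
    n = min(len(pos_ids), len(neg_ids))
    # Randomly undersample so that both groups contribute n samples.
    balanced = rng.sample(pos_ids, n) + rng.sample(neg_ids, n)
    rng.shuffle(balanced)
    cut = round(train_fraction * len(balanced))
    return balanced[:cut], balanced[cut:]


def log_median_scale(values):
    """Log-transform, centre to the median, and scale by the root mean square."""
    logged = [log(v) for v in values]
    centre = median(logged)
    centred = [v - centre for v in logged]
    rms = sqrt(sum(v * v for v in centred) / (len(centred) - 1))
    return [v / rms for v in centred]


# Group sizes from the text: 65 A+T+ and 106 A-T- (SCD) individuals.
pos_ids = [f"pos_{i}" for i in range(65)]
neg_ids = [f"neg_{i}" for i in range(106)]

# 101 repetitions, each with its own seed.
splits = [undersample_and_split(pos_ids, neg_ids, seed) for seed in range(101)]
train_ids, test_ids = splits[0]
```

With 65 samples per group, each repetition yields 91 training and 39 test samples. An odd number of repetitions such as 101 also means that the median AUC belongs to one actual model rather than to an average of two.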

To validate the robustness of the modelling results, we replicated the model with the median AUC from the 101 constructed models for each protein pair in the independent validation cohort. The sample sizes of the diagnostic groups within this cohort were equal, allowing for passing all samples into the model. The resulting AUC with confidence interval for each protein pair was recorded and evaluated.
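Selecting the model with the median AUC can be illustrated as follows. The sketch computes the AUC as the proportion of positive/negative pairs ranked correctly, which equals the area under the empirical ROC curve, and then picks the model whose AUC equals the median. How the authors resolved ties between models with the same AUC is not stated; taking the first match is an assumption, and the scores and AUC values are hypothetical.

```python
from statistics import median


def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def median_auc_model(aucs):
    """Index of the first model whose AUC equals the median (needs an odd count)."""
    if len(aucs) % 2 == 0:
        raise ValueError("an odd number of models is needed for a unique median")
    return aucs.index(median(aucs))


# Hypothetical decision scores for one model on a small test set.
example_auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])

# Hypothetical test AUCs from five repetitions (the study used 101).
aucs = [0.81, 0.90, 0.86, 0.78, 0.88]
chosen = median_auc_model(aucs)
```

The chosen model would then be applied unchanged to the validation cohort samples, and the AUC would be computed there in the same way.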

Cognitive data correlation analysis

The correlation between single proteins or protein pair ratios and cognitive scores was calculated using Pearson correlation (cor, stats). All A-T- individuals with SCD and A+T+ individuals with available cognitive score data were included in the correlation analysis (Discovery cohort – MMSE: n = 118, MoCA: n = 145, KOD: n = 100, RAVLT: n = 111, RCF: n = 105; Validation cohort – MMSE: n = 51). Correlations were compared between protein pair ratios from different clusters, and between the tau-associated protein/amyloid-associated protein ratios and single proteins from the amyloid- and tau-associated clusters, using the two-sided Wilcoxon rank-sum test (wilcox.test, stats) on the absolute correlation values.
