Comprehensive proteomics of CSF, plasma, and urine identify DDC and other biomarkers of early Parkinson’s disease

Stanford research cohorts

We included participants from five different studies of aging and neurodegeneration at Stanford University: (1) Biomarkers in PD Study (BPD), (2) the Pacific Udall Center (PUC), (3) Stanford Alzheimer’s Disease Research Center (ADRC), (4) Stanford Center for Memory Disorders Cohort Study (SCMD), and (5) Stanford Aging and Memory Study (SAMS). The combined cohort is hereafter referred to as Stanford-5x. Data were collected between 2012 and 2018. Inclusion criteria for these analyses were (1) age between 40 and 90 years, (2) English or Spanish fluency for comprehensive neuropsychological testing, and (3) no contraindications to lumbar puncture. All participants provided written informed consent to participate in the parent studies following protocols approved by the Stanford Institutional Review Board.

A consensus panel consisting of one board-certified movement disorders neurologist or behavioral neurologist, one board-certified neuropsychologist, and other study personnel adjudicated the diagnosis for each participant. PD diagnosis was based on UK PD Society Brain Bank clinical diagnostic criteria [23]. We defined Early PD as PD diagnosed less than three years before the time of CSF collection. Participants on the AD spectrum (AD-s) included those with dementia or mild cognitive impairment likely due to AD based on the NIH Alzheimer’s Disease Diagnostic Guidelines [30, 40]. Participants with mild cognitive impairment who have decreased CSF Aβ-42 concentration are more likely to have cognitive impairment due to AD [18]. To exclude participants without AD from the AD-s group, we excluded participants with mild cognitive impairment whose CSF Aβ-42 concentration was more than two standard deviations above the mean in AD [54, 62]. Participants with PD included those with no cognitive impairment, those with mild cognitive impairment [24], and those with dementia due to PD. Healthy controls (HC) were older individuals without a neurological diagnosis who were adjudicated as cognitively normal for age at the consensus meeting.
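
The Aβ-42 exclusion rule for the AD-s group can be illustrated with a short Python sketch. It is not the study code: the field names, the example concentrations, the choice of the sample standard deviation, and the choice of which AD participants form the reference group are all assumptions made for illustration.

```python
from statistics import mean, stdev

def abeta42_cutoff(ad_values):
    """Upper cutoff: mean + 2 SD of CSF Abeta-42 in the AD reference group.

    Assumption: sample SD (n - 1 denominator); the text does not specify.
    """
    return mean(ad_values) + 2 * stdev(ad_values)

def keep_in_ad_spectrum(participant, cutoff):
    """Non-MCI cases are kept; MCI cases are kept only at or below the cutoff."""
    if participant["diagnosis"] != "MCI":
        return True
    return participant["abeta42"] <= cutoff

# Hypothetical concentrations, for illustration only.
ad_reference = [400.0, 450.0, 500.0, 550.0, 600.0]
cutoff = abeta42_cutoff(ad_reference)

candidates = [
    {"id": "P1", "diagnosis": "MCI", "abeta42": 520.0},
    {"id": "P2", "diagnosis": "MCI", "abeta42": 900.0},
    {"id": "P3", "diagnosis": "AD dementia", "abeta42": 480.0},
]
ad_s = [p["id"] for p in candidates if keep_in_ad_spectrum(p, cutoff)]
print(ad_s)
```

In this toy example only the mild cognitive impairment participant whose Aβ-42 value lies above the cutoff is dropped; dementia cases are not subject to the rule.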

Some participants were excluded from the current study because of incomplete metadata on age, sex, or disease status. After filtering based on inclusion criteria, 201 CSF samples (71 PD, 78 HC, and 52 AD-s) and 250 blood plasma samples (68 PD, 105 HC, and 77 AD-s) were sent to Olink Proteomics AB (http://www.olink.com) for proteomics using a multiplex proximity extension assay [1], described in detail in a separate methods section. In addition, 385 CSF samples (71 PD, 253 HC, 61 AD-s) and 1164 plasma samples (249 PD, 652 HC, 263 AD-s) were sent to SomaLogic Inc. (somalogic.com) for proteomics using a multiplex aptamer affinity assay, described in detail in a separate methods section.

Neurologic, motor and cognitive assessments

All participants completed a general neurological exam. PD participants completed the Movement Disorders Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS III) [13] in the Off- and On-medication states, according to published criteria [34]. We calculated the Levodopa Equivalent Daily Dose (LEDD) using previously reported conversion factors [63, 66].

Global cognitive function was assessed using the Montreal Cognitive Assessment (MoCA) [33] in the ADRC, PUC and BPD, and the Mini-Mental State Exam [32] in the SAMS and SCMD studies. PD participants underwent neuropsychological testing in the on-medication state in order to assess cognitive function without interference by motor deficits.

CSF collection and assessment

A neurologist performed a lumbar puncture to collect CSF samples according to procedures standardized across all Stanford-5x cohorts [10]. Briefly, a 20–22 G spinal needle was inserted in the L4–L5 or L5–S1 interspace and CSF was collected in polypropylene tubes. The tubes were immediately frozen at −80 °C in a centralized freezer in the Neuropathology Core of the Stanford ADRC.

PPMI cohort

Data used in the preparation of this article were obtained on August 8, 2023, from the Parkinson’s Progression Markers Initiative (PPMI) database (http://www.ppmi-info.org/access-dataspecimens/download-data), RRID:SCR_006431. For up-to-date information on the study, visit http://www.ppmi-info.org.

The Parkinson's Progression Markers Initiative (PPMI) is an ongoing observational, international study conducted in the United States, Europe, Israel, and Australia. The study has enrolled approximately 4000 participants to date, including healthy adults (HC), de novo PD, prodromal (age 60 or older with DAT deficit and REM sleep behavior disorder (RBD) or hyposmia), and non-manifesting LRRK2 and GBA carrier participants. Participants undergo extensive clinical assessment, imaging, and molecular phenotyping. Here, we have used data from two PPMI sub-studies: project 190 and project 196. Both studies were performed by industry research groups in collaboration with PPMI and shared in the online PPMI portal as part of the PPMI data use agreement. While each study uses samples from PPMI participants, they have few overlapping participants (42 PD, 92 HC, and 5 genetic carriers) and no overlapping samples, since they focused on different tissues and proteomics methods. We have labeled them as PPMI 1 (project 196) and PPMI 2 (project 190) in the main text for clarity.

PPMI patient nomenclature

PPMI has recruited multiple participant groups and a detailed description of groups is available at https://www.ppmi-info.org/study-design/study-cohorts/.

PD cohort: all participants of the PD cohort have a clinical diagnosis of PD and a positive dopamine transporter (DAT) SPECT. The PD cohort is comprised of several subgroups, which include the following key inclusion criteria:

De novo PD: people with untreated PD and within 2 years of diagnosis at enrollment. The initial phase of PPMI enrolled 423 untreated PD participants.

Genetic PD: people with PD and pathogenic genetic variant(s) in LRRK2 or GBA, within 7 years of diagnosis. Treatment with medication was allowed at enrollment; therefore, some of the participants were medicated and some were still de novo at the baseline visit. The initial phase of PPMI enrolled 294 genetic PD participants (across variants).

Prodromal cohort: participants who are at risk of Parkinson’s based on clinical features, genetic variants, or other biomarkers.

Prodromal (RBD or anosmia): the initial phase of PPMI enrolled 67 prodromal volunteers age 60 or older with DAT deficit and REM sleep behavior disorder (RBD) or hyposmia.

Prodromal (non-manifesting genetic carriers): the initial phase of PPMI enrolled 445 prodromal volunteers with a genetic risk variant (SNCA, LRRK2, GBA).

Project 196 (PPMI 1)

A study document can be found at https://ida.loni.usc.edu/download/files/study/4bb082de-fa77-40ce-9dc5-494ec7fc0a1f/file/ppmi/PPMI_Project_196_Methods_Explore_20221212.pdf.

Briefly, project 196 is a longitudinal Olink proteomics study of de novo PD and non-genetic prodromal participants age 60 or older with RBD or hyposmia and DAT deficit, with repeat sampling over 4 years. Data from study 196 were downloaded, totaling 924 CSF samples and 1160 plasma samples. The data are provided in Olink NPX units, a normalized arbitrary unit derived from sequence counts. The Olink NPX calculation and QC process are described in a separate section for clarity. The project 196 documents state that data generation occurred in two separate experimental batches more than one year apart, so we performed a batch effect analysis using principal component analysis (PCA). A batch effect corresponding to the experimental batch was observed via PCA. The study design utilized bridging samples to allow for correction of batch effects. We noted 89 bridging samples in both CSF and plasma datasets. We performed a batch correction based on PLATEID, using the removeBatchEffect function in the limma package for R 4.0.3. We confirmed via PCA that this removed the experimental batch effect. After batch correction, NPX values for replicate bridging samples were averaged to avoid duplicates in downstream analysis. The dataset was then further filtered to include only samples in which DDC was detected, since a fraction of samples were missing data from the Olink Cardiometabolic panel, which contains DDC. After merging the filtered and QC’d data with participant-level metadata, there were 765 CSF samples from 257 participants, consisting of 180 PD, 439 HC, and 146 prodromal samples. There were 859 plasma samples from 274 participants, consisting of 193 PD, 439 HC, and 227 prodromal samples. These data were used for all downstream CSF and plasma analyses in PPMI. For analysis of baseline samples, there were 243 CSF samples (69 de novo PD, 130 HC, 44 prodromal) and 262 plasma samples (74 de novo PD, 130 HC, 58 prodromal).
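
The two processing steps described here, plate-based batch correction followed by averaging of replicate bridging samples, can be sketched in Python for a single protein. This is a simplified stand-in for the covariate-free case and not a reimplementation of limma's removeBatchEffect, which was the tool actually used; the function names, sample IDs, plate labels, and NPX values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def center_batches(values, batches):
    """Remove per-batch shifts for one protein.

    Each value has its batch mean subtracted and the unweighted mean of the
    batch means added back, so the overall level is preserved.
    """
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_means = {b: mean(vs) for b, vs in by_batch.items()}
    grand = mean(list(batch_means.values()))
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]

def average_replicates(sample_ids, values):
    """Collapse repeated sample IDs (e.g. bridging samples) to their mean."""
    grouped = defaultdict(list)
    for sid, v in zip(sample_ids, values):
        grouped[sid].append(v)
    return {sid: mean(vs) for sid, vs in grouped.items()}

# Hypothetical NPX values; "B1" is a bridging sample run on both plates.
sample_ids = ["S1", "S2", "B1", "S3", "S4", "B1"]
plates = ["A", "A", "A", "B", "B", "B"]
npx_values = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]

corrected = center_batches(npx_values, plates)
averaged = average_replicates(sample_ids, corrected)
print(corrected)
print(averaged)
```

In this toy example plate B reads two NPX units higher than plate A. After centering, the two measurements of the bridging sample agree, and averaging them leaves one value per sample.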

Project 190 (PPMI 2)

A study document can be found at the PPMI website, https://ida.loni.usc.edu/download/files/study/0a00d6e1-cd85-4340-9913-b2436a94acc1/file/ppmi/PPMI_190_Methods_Targeted_Untargeted_MS-based_proteomics_of_urine.pdf.

Briefly, project 190 is an LC–MS/MS proteomics study of urine samples from PD and non-manifesting GBA and LRRK2 mutation carriers, with a small amount of longitudinal data in LRRK2 carriers. Data from study 190 were provided in a normalized and QC’d form. After merging with patient-level metadata, there were 1156 samples from 983 participants, consisting of 549 PD, 140 HC, and 467 non-manifesting carrier samples, which were then used in this study.

Mass Spec proteomics (excerpted from project 190 study documents)

Proteins were extracted from neat urine samples and digested into peptides by using the MStern blotting sample preparation protocol. To determine urinary proteome profiles, purified peptides were loaded on Evosep Evotips, separated via an online-coupled Evosep One HPLC and analyzed on a Bruker timsTOF Pro mass spectrometer with a data-independent acquisition method and a gradient length of 44 min. Analysis of raw spectra was performed with DIA-NN.

Olink CSF and plasma proteomics

Plasma and CSF samples in the Stanford-5x and PPMI cohorts were sent to Olink Proteomics AB (Uppsala, Sweden; http://www.olink.com) for the quantification of up to 1536 proteins using a multiplex proximity extension assay [1]. This technology has been extensively vetted in biomarker studies, and detailed methodology of the assay has been previously published [1]. Briefly, the proximity extension assay uses DNA oligonucleotide-labeled polyclonal antibodies which bind to each protein target. When two antibodies targeting different epitopes bind the same protein target, a proximity-dependent DNA ligation and elongation reaction can occur. The requirement for coincident binding leads to high specificity. The target protein levels can then be read out using quantitative PCR (qPCR) or next generation sequencing (NGS). This technology enables multiplex measurement of up to 96 or up to 384 protein targets in a single assay, depending on assay version. In the Stanford-5x cohorts, proteins from 13 different 96-protein panels were measured, resulting in quantification of 1196 proteins in both CSF and plasma samples. CSF and plasma protein levels were analyzed using the Cardiometabolic (v.3602), Cardiovascular II (v.5005), Cardiovascular III (v.6112), Cell Regulation (v.3701), Development (v.3512), Immune Response (v.3201), Inflammation (v.3012), Metabolism (v.3402), Neuro Exploratory (v.3901), Neurology (v.8011), Oncology II (v.7002), Oncology III (v.4001) and Organ Damage (v.3301) 96-plex immunoassay Olink panels. In the PPMI cohort, four 384-protein panels from the Olink Explore 1536 platform (Cardiometabolic 384, Neurology 384, Oncology 384, and Inflammation 384) were run. Details on the exact assay version were not provided in the study documents.

Olink data processing and quality control

A detailed description of the Olink data normalization and QC process can be found at https://olink.com/application/data-normalization-and-standardization/. Exact processing differs depending on assay version (qPCR-based readout or NGS-based readout).

In Stanford-5x, a qPCR-based readout was used. As previously described [1], eight control samples are run on each plate: two are external pooled plasma samples, which are used to assess potential intra-plate/run variation, three are Inter-Plate Controls (IPCs) and three are buffer blanks. The IPCs are formed from a pool of 92 antibodies. The median of the IPCs is used to normalize each assay and compensate for potential variation between runs and plates.

Briefly, protein expression data are reported in Normalized Protein eXpression (NPX), which is a normalized unit on a log2 scale. Calculation of NPX differs based on assay version, depending on whether a qPCR readout or a next generation sequencing readout was used. In Stanford-5x cohorts, a qPCR readout was used, and the NPX values are derived from the Ct or “threshold cycle”. This is the number of qPCR cycles needed for the signal to pass a fluorescence signal threshold. NPX is calculated from the Ct values using the following equations:
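
The equations themselves are not reproduced in this text. As a hedged illustration, the Python sketch below follows the three-step calculation commonly described in Olink's public documentation for the qPCR readout: subtract the extension control Ct, subtract the inter-plate control value, then invert the scale against an assay-specific correction factor. The exact form, the function names, and all numeric values are assumptions and are not taken from this study.

```python
from statistics import median

def delta_ct(ct_analyte, ct_extension_control):
    """Step 1: normalize the analyte Ct against the extension control."""
    return ct_analyte - ct_extension_control

def npx(ct_analyte, ct_extension_control, ipc_delta_cts, correction_factor):
    """Steps 2 and 3: subtract the median inter-plate control (IPC) dCt, then
    invert the scale so that a higher NPX means a higher protein level."""
    d_ct = delta_ct(ct_analyte, ct_extension_control)
    dd_ct = d_ct - median(ipc_delta_cts)
    return correction_factor - dd_ct

# Hypothetical inputs: three IPC dCt values from one plate.
ipc = [8.0, 8.5, 9.0]
value = npx(22.0, 12.0, ipc, correction_factor=10.0)
one_cycle_earlier = npx(21.0, 12.0, ipc, correction_factor=10.0)
print(value, one_cycle_earlier)
```

Because the unit is on a log2 scale, reaching the fluorescence threshold one cycle earlier raises NPX by one, which corresponds to roughly a doubling of signal.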

In the PPMI cohort, an NGS readout was used. Detailed information on normalization and the calculation of NPX from NGS reads can be found in the PPMI project 196 study documents and on the Olink website.

SomaScan proteomics

SomaScan assay

The SomaLogic SomaScan assay, which uses slow off-rate modified DNA aptamers (SOMAmers) to bind target proteins with high specificity, was used to quantify the relative concentration of 4979 protein targets in CSF and 7288 protein targets in plasma samples from Stanford-5x. The assay has been used in hundreds of studies and described in detail previously [15, 35]. Two versions of the SomaScan assay were used in this study. The v4 assay (4979 protein targets) was used in CSF samples from Stanford-5x, and the v4.1 assay (7288 protein targets) was used in plasma samples from Stanford-5x. All v4 probes are included in the v4.1 assay. Since protein levels across assay versions were not directly compared to each other, no bridging procedure was needed in the current study to harmonize values.

SomaScan normalization and QC

Standard SomaLogic normalization, calibration, and quality control were performed on all samples [59, 60, 61, 69] by SomaLogic Inc. Detailed documentation can be found at https://somalogic.com/tech-notes/. Briefly, pooled reference standards and buffer standards are included on each plate to control for batch effects during assay quantification. Samples are normalized within and across plates using median signal intensities in reference standards to control for both within-plate and across-plate technical variation. Samples are further normalized to a pooled reference using an adaptive maximum likelihood procedure. Samples are additionally flagged by SomaLogic if signal intensities deviate significantly from the expected range; these samples were excluded from analysis. The resulting expression values are the data provided by SomaLogic and are considered “raw” data. Raw data were then log10 transformed to reduce heteroscedasticity and increase power in downstream statistical modeling.
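
The final two steps, excluding flagged samples and log10-transforming the provided values, can be sketched in Python as follows. The record layout, the field names ("flagged", "rfu"), and the values are hypothetical; the vendor-side normalization steps are not reproduced.

```python
import math

def log10_transform(samples):
    """Drop flagged samples, then log10-transform each remaining value."""
    transformed = {}
    for sample in samples:
        if sample["flagged"]:
            continue  # flagged by the vendor QC, excluded from analysis
        transformed[sample["id"]] = {
            protein: math.log10(value)
            for protein, value in sample["rfu"].items()
        }
    return transformed

# Hypothetical signal intensities for two proteins in three samples.
samples = [
    {"id": "S1", "flagged": False, "rfu": {"DDC": 100.0, "ALB": 10000.0}},
    {"id": "S2", "flagged": True, "rfu": {"DDC": 5.0, "ALB": 90000.0}},
    {"id": "S3", "flagged": False, "rfu": {"DDC": 1000.0, "ALB": 100000.0}},
]
log_data = log10_transform(samples)
print(log_data)
```

On the log10 scale, a tenfold difference is one unit for both the low-abundance and the high-abundance protein in this example, which is the sense in which the transform reduces heteroscedasticity.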

Statistical analyses

To examine demographic and clinical group differences, we used a non-parametric Wilcoxon signed-rank test or a non-parametric one-way ANOVA on ranks (Kruskal–Wallis H test).

When using longitudinal sample data, we ran differential expression analysis on protein levels using a multilevel linear mixed-effects model controlling for age, sex, race, ethnicity, and sample-relatedness. Sample-relatedness refers to longitudinally collected samples from a single individual, which we expect to be more correlated than samples from different individuals. When analyzing samples from a single timepoint, we used a linear model controlling for age, sex, race, and ethnicity.

We used Benjamini–Hochberg false discovery rate control across the number of detected proteins to account for multiple testing. We studied the association between DDC levels and clinical measures of disease severity (MDS-UPDRS III, LEDD, MoCA) using linear regression analyses adjusted for age and sex. We used principal component analysis to explore the relationship between global differences in protein expression profile and clinical/demographic variables. We tested correlations between principal components using a Spearman correlation test with Bonferroni correction for multiple testing. All statistical analyses were done in R 4.0.3. We used the package lmerTest [19] and the dream function from the R package variancePartition [17] for mixed-effects models. We used the glm function with a binomial family (logit link) in R 4.0.3 to perform binary logistic regression. We used the pROC [46] and multiROC [68] packages in R 4.0.3 to generate and visualize receiver operating characteristic (ROC) curves and calculate the area under the ROC curve.
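
For readers unfamiliar with the Benjamini–Hochberg adjustment, a minimal Python sketch of the standard step-up procedure is shown below. It illustrates the general method only and is not the R code used in this study; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-protein p-values.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print(adj)
```

Each raw p-value is scaled by the number of tests divided by its rank, so the number of detected proteins sets how strict the correction is.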
