Untargeted blood serum proteomics identifies novel proteins related to neurological recovery after human spinal cord injury

Patients and healthy control subjects

A cohort of thirty patients with motor complete (American Spinal Injury Association Impairment Scale, AIS, grades A and B) traumatic spinal cord injury (SCI) was selected among those recruited at the Trauma Center Murnau (Bavaria, Germany) or at Hospital Nacional de Paraplejicos (Toledo, Spain) as part of the Autoantibodies in Spinal Cord Injury study [26] (Table 1). Blood samples included in this study were collected during the subacute phase (31 ± 1 days post-injury). The study was approved by the Ethics Committee of Toledo Health Care Area and by the Ethics Committee of the Bavarian Medical Board (registry number 15,046). The study adheres to the World Medical Association Declaration of Helsinki and is registered in the public database ClinicalTrials.gov (NCT02493543). All patients fulfilled the inclusion and exclusion criteria and gave their informed consent to participate.

Table 1 Clinical and demographical characteristics of patients with spinal cord injury

Inclusion criteria were:

- Males and females.

- At least 18 years old.

- Any neurological level of injury, except cauda equina syndrome.

- Complete and incomplete lesions.

- If the patient was treated with glucocorticoids, the last dose should have been administered at least 7 days before study onset.

Exclusion criteria were:

In addition, patients with traumatic brain injury (Glasgow Coma Scale < 14) were excluded from this study. Polytrauma patients were recruited as long as their injuries did not interfere with neurological examination.

Sensorimotor function of patients was evaluated following the International Standards for Neurological Classification of Spinal Cord Injury scale (ISNCSCI) at an average time of 31 ± 1 days after injury (hereafter referred to as 4 weeks after injury). All evaluations were performed by experienced personnel. On the same dates, a blood sample was obtained from each patient. Clinical and demographical characteristics of patients are summarized in Table 1.

For the ELISA studies, the levels of some proteins were compared with those of control individuals. Two different control groups were included: (i) a healthy control group (HC), formed by healthy volunteers recruited at Murnau Trauma Hospital (n = 41; Supplementary Table S1), and (ii) a spine fracture control group (SPFC), formed by patients with spine fracture between C1 and L1 but no neurological damage, recruited at Murnau Trauma Hospital (n = 9; Supplementary Table S1). In the control groups, a single blood sample was taken after the subject had signed the informed consent and fulfilled the inclusion and exclusion criteria detailed above (with the obvious exception of not suffering a SCI). Age and sex of these individuals are summarized in Supplementary Table S1. Neither HC nor SPFC subjects had a previous history of neurological trauma or neurological deficits. In addition, HC subjects were asymptomatic at the time of recruitment.

A scheme of the workflow followed in the current study can be found in Fig. 1.

Fig. 1

Experimental design. (A) Scheme depicting the workflow used in this study. Sera were collected from SCI patients with no (NR) or strong (SR) recovery and depleted of high- and medium-abundance proteins for subsequent tagging and detection of low-abundance proteins by mass spectrometry. After enrichment analysis (NR vs. SR), some of these proteins were further validated by ELISA. (B) Box plot showing the distribution of SR (green) and NR (red) patients according to their Integrated Neurological Change Score (INCS) between 30 and 120 days after injury. (C) Distribution of SR (green) and NR (red) patients according to their AIS grade conversion (-1 to +3) between 30 and 120 days after injury.

Patient stratification: strong vs. no-recovery

Patients were classified as showing no recovery or strong recovery based on AIS grade conversion and the Integrated Neurological Change Score (INCS; [27]), a score of overall change in the neurological function of patients assessed according to the International Standards for the Neurological Classification of Spinal Cord Injury (ISNCSCI; [28]). Patients who did not convert their AIS grade from 30 to 120 days post-injury (dpi) and who showed INCS values close to zero or negative for the same period (that is, patients who experienced no significant change, or even some worsening, in overall neurological function) were classified as no recovery (NR; Fig. 1B, C). Patients with AIS grade conversion and INCS values significantly higher than those of the NR group were classified as strong recovery (SR; Fig. 1B, C). Indeed, the median INCS value of the SR group is close to 0.5, which may be interpreted as recovering, by 120 dpi, half of the overall neurological function that was absent at 30 dpi (Fig. 1B). Based on these criteria, 10 patients were classified as SR and 20 as NR.

Serum samples collection and processing

Peripheral blood was collected by venipuncture of the median cubital vein. Blood was allowed to clot by keeping the tubes for 45 min at room temperature (RT) followed by 1 h at 4 °C. Blood was then centrifuged at 1500 × g for 20 min at 4 °C, and serum was aliquoted and stored at -80 °C until use.

A tandem IgY14/Supermix depletion method, following that described in Keshishian et al. [29], was used in this study. Serum volumes corresponding to a starting mass of ~12.5 mg were immunoaffinity-depleted of the 14 most abundant proteins, followed by the next ~50 moderately abundant proteins, using Seppro® IgY14 (LC10) and Seppro® Supermix (LC5) columns (both from Sigma-Aldrich, St. Louis, MO, USA). In an HPLC-assisted manner (Waters Alliance 2695), serum was first loaded on the IgY14 column and the flow-through was directed onto the Supermix column. The dilution, stripping and neutralization buffers provided by the manufacturer were used and the manufacturer’s instructions were followed (Sigma-Aldrich). Flow-through fractions containing protein, as judged by UV absorbance, were collected and concentrated with spin filters (Amicon 3 kDa MWCO; Millipore) to a volume of ~500 µl. The protein concentrations of the samples post-depletion were determined by Bradford protein assay. Furthermore, the protein content was visualized on Coomassie-stained (Imperial Stain, Pierce) SDS-PAGE 4–20% gradient gels (Criterion, Biorad; Supplementary Figure S1). Collectively, this procedure yields a ~100-fold enrichment of low abundant proteins (LAPs) and a reduction of at least two orders of magnitude in the dynamic range of protein concentrations, allowing LAPs to be detected by mass spectrometry [22, 30]. The proteins depleted by both columns are listed in Supplementary Table S2.

Serum depletion, tandem mass tagging (TMT) and mass spectrometry (including peptide and protein identification and scoring) were performed by Proteome Sciences (Proteome Sciences plc, Surrey, UK) and included several quality control steps (detailed below).

TMT labeling and Mass spectrometry (TMT MS2)

Depleted samples were combined into three tandem mass tags (TMT) 11plexes which were processed and analyzed by the TMT®MS2 workflow (Proteome Sciences, UK). The workflow was the following: in each TMT® 11plex, ten experimental samples were combined with one reference pool; the three TMT® 11plexes were separated using basic reverse phase (bRP) chromatography and 30 fractions collected; each fraction was subjected to LC-MS2 analysis using a high-performance Orbitrap Fusion mass spectrometer (Thermo Scientific).

In more detail, all samples were adjusted to the same protein concentration by adding the dilution buffer used for depletion (Sigma). Equal volumes from all samples were taken to prepare the pooled reference sample. Volumes equivalent to 50 µg of protein per sample were reduced (dithiothreitol), alkylated (iodoacetamide), and digested (trypsin) to generate peptides, then desalted (SepPak tC18 cartridges) and lyophilised to dryness. For TMT® labeling, peptides in each sample were re-suspended in KH2PO4 buffer, mixed with TMT® 11plex reagents (1 tag per sample according to the labeling plan) and incubated for 1 h at RT. The TMT® reactions were stopped by adding hydroxylamine, and the samples were pooled according to the labeling plan to generate three TMT® 11plex samples, which were purified by solid-phase extraction.

Each of the three purified TMT® 11plex samples (~250 µg per sample) was fractionated by HPLC-assisted basic reversed phase (bRP) chromatography (EC 250/4.6 Nucleodur C18 Gravity column, Macherey-Nagel; Waters Alliance 2695 HPLC system). In total, 54 tubes were collected at regular time points along the main elution profile of the separation. These were combined to generate 30 fractions per TMT® 11plex sample. Fractions were lyophilised to completion and stored at -80 °C prior to mass spectrometry.

Each of the 30 fractions generated per TMT® 11plex sample was analysed by LC-MS/MS using the EASY-nLC-1000 system coupled to an Orbitrap Fusion™ Tribrid™ Mass Spectrometer (both Thermo Scientific). Re-suspended peptides were loaded onto a nanoViper C18 Acclaim PepMap 100 pre-column (Thermo Scientific) and resolved using an increasing gradient of ACN in 0.1% formic acid through a 50 cm PepMap RSLC analytical column (Thermo Scientific) at a flow rate of 200 nL/min. Peptide mass spectra were acquired throughout the entire chromatographic run (120 min) using a top-speed higher-energy collisional dissociation (HCD) method with Fourier-transform mass spectrometry (FTMS) detection. MS2 scans were acquired at 30,000 resolving power at 400 m/z, following each FTMS scan (120,000 resolving power at 400 m/z).

MS quality controls

As reported, the Proteome Sciences (UK) workflow includes several quality controls of the MS procedures, as follows:

Basic Reverse Phase (bRP) Fractionation: chromatograms of the three TMT® 11plex experiment samples were consistent and passed internal quality assessment. Fraction collection (30 fractions per plex) was successfully conducted, and 90 fractions were generated from the three TMT® 11plexes.

TMT®MS2 Analytical QC: Analysis of the TMT® labeling reaction efficiency of the three TMT® 11plex experiment samples showed that > 98% of N-terminal amino groups were labelled, which indicates that labeling was essentially complete. All MS runs of the fractionated samples passed internal quality assessments based on the total ion counts (TICs) and numbers of peptide spectral matches (PSMs) (data not shown).

MS instrument performance quality control: Quality controls (commercial digest of bovine serum albumin (BSA)) were run before and after samples to check the analytical reproducibility of the MS performance. Retention time stability, intensity values extracted from six monitored BSA peptides, and the numbers of peptides and PSMs obtained by HCD fragmentation were within quality requirements (data not shown).

Computational Mass Spectrometry

In total, 90 separate raw mass spectrometry data files (30 fractions per TMT plex) were submitted to Proteome Discoverer (PD) v2.1 (Thermo Scientific) using the Spectrum Files node. The Spectrum Selector was set to its default values, while the SEQUEST HT node was set up to search data against the human FASTA UniProtKB/Swiss-Prot database (version October 2018). The reporter ions quantifier node was set up to measure the raw intensity values for TMT® 11plex mono-isotopic ions (126, 127N, 127C, 128N, 128C, 129N, 129C, 130N, 130C, 131, 131C). The SEQUEST HT search engine was configured to search for tryptic peptides (with up to two missed cleavages) with static modifications of carbamidomethyl (C), TMT6plex (K), and TMT6plex (N-Term). Dynamic modifications were set to deamidation (N/Q) and oxidation (M). Precursor mass tolerance was set to 20 ppm and fragment (b and y ions) mass tolerance to 0.02 Da. All raw intensity values were exported to tab-delimited text files for later processing and filtering. Grouped protein results were exported to tab-delimited “Multi-consensus.txt” files, filtered at 1% (High confidence) false discovery rate (peptide spectral match, PSM, level) and 1 × Rank 1 peptide per protein. Protein grouping was performed using the Parsimony Principle option in the Protein Grouping area within PD. More information about the protein grouping algorithm can be found in the Proteome Discoverer (PD) Version 2.1 User Guide (version A, July 2015).
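To make the two search tolerances concrete, the sketch below converts the relative 20 ppm precursor tolerance and the absolute 0.02 Da fragment tolerance into acceptance windows. It is only an illustration of the arithmetic, not the SEQUEST HT implementation; the function names and the example masses are hypothetical.

```python
def ppm_window(mass, tol_ppm=20.0):
    """Acceptance window for a relative (ppm) tolerance around `mass`."""
    delta = mass * tol_ppm / 1e6  # ppm = parts per million of the mass itself
    return mass - delta, mass + delta


def dalton_window(mass, tol_da=0.02):
    """Acceptance window for an absolute (Da) tolerance around `mass`."""
    return mass - tol_da, mass + tol_da


# Hypothetical masses, chosen only for illustration
precursor_low, precursor_high = ppm_window(1000.0)
fragment_low, fragment_high = dalton_window(500.0)
```

A ppm tolerance scales with the mass (20 ppm of 1000 Da is 0.02 Da), whereas a Da tolerance is the same width at every mass.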

The steps of data assembly were:

(i) Only non-redundant PSMs with protein accession number annotation were used for quantification.

(ii) PSMs were filtered using the isolation interference information from the input Proteome Discoverer multi-consensus file. The threshold of 45% was selected based on analysis of the isolation interference density distribution.

(iii) Isotope impurity correction was applied to PSM-level data.

(iv) Intensities of the reporter ions of each sample were median-scaled. Then, ratios of reporter ion intensities were calculated for experimental samples relative to the reference sample and log2-transformed.

(v) Data belonging to identical peptide sequences were summarized by the median of PSM ratio values to transform the PSM data matrix into a peptide matrix.

(vi) The Laboratory Information Management Systems (LIMS) entries were then combined with this peptide matrix to generate a table of peptide identification information (including assigned protein group) and quantitative peptide data (given as median PSM ratios in log2 range), together with the sample IDs.

(vii) The final peptide data matrix comprised 54,770 peptides and 30 samples.
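Steps (iv) and (v) can be sketched on toy data as follows. This is a minimal illustration, not the Proteome Sciences pipeline: it assumes that "median-scaled" means dividing each reporter channel by its own median intensity, and the channel names, peptide sequences and intensities are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def median_scale(channels):
    """Divide each reporter-ion channel by its own median intensity (assumed meaning)."""
    scaled = {}
    for name, values in channels.items():
        centre = median(values)
        scaled[name] = [v / centre for v in values]
    return scaled


def log2_ratios(scaled, reference="ref"):
    """log2(sample / reference) for every PSM of every non-reference channel."""
    ref = scaled[reference]
    return {
        name: [math.log2(v / r) for v, r in zip(values, ref)]
        for name, values in scaled.items()
        if name != reference
    }


def summarize_by_peptide(sequences, psm_ratios):
    """Collapse PSM-level log2 ratios into one median value per peptide sequence."""
    grouped = defaultdict(list)
    for seq, ratio in zip(sequences, psm_ratios):
        grouped[seq].append(ratio)
    return {seq: median(vals) for seq, vals in grouped.items()}


# Toy data: five PSMs from two peptides, one sample channel plus the reference pool
sequences = ["PEPTIDEK", "PEPTIDEK", "PEPTIDEK", "SAMPLER", "SAMPLER"]
channels = {
    "sample_1": [50.0, 100.0, 200.0, 400.0, 800.0],
    "ref": [100.0, 100.0, 100.0, 100.0, 100.0],
}
ratios = log2_ratios(median_scale(channels))
peptide_matrix = summarize_by_peptide(sequences, ratios["sample_1"])
```

Using the median at both stages keeps a single aberrant PSM from dominating the peptide-level value.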

MS Data Pre-processing

The following data-dependent pre-processing steps were applied:

(i) Peptides with more than ~36% missing values per treatment group were removed, resulting in a reduced peptide data matrix of 30,263 peptides and 30 samples. Remaining missing values were imputed by iterative principal component analysis (iPCA) [31].

(ii) Peptide ratios were quantile normalized.

(iii) Exploratory analysis of the resulting data set revealed that the strongest factor driving non-biological variance within the data was the TMT plex effect. Besides this, a minor effect of medical centre (Toledo vs. Murnau) was detected. Therefore, batch correction for TMT plex and medical centre effects was applied using the LIMMA R package [32]. Exploratory analysis after batch effect removal showed that the clustering of the samples was mainly driven by the clinical outcome.

(iv) Peptides belonging to non-unique protein groups were filtered out, resulting in a peptide matrix of 27,311 peptides (30 samples). This peptide matrix was further used for biological function enrichment analysis.

(v) To obtain a quantitative protein data matrix, the peptide values from unique protein group peptides were summarized by trimmed mean into a protein value. The resulting protein data matrix of 2649 proteins (30 samples) was further used for statistical analysis.
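Three of the pre-processing steps above (missing-value filtering, quantile normalization and trimmed-mean summarization) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the code used in the study: missing values are represented as `None`, the quantile normalization assumes complete columns without ties, and the trimming fraction (0.2) is hypothetical because the text does not report it. iPCA imputation and LIMMA batch correction are not shown.

```python
from statistics import mean


def keep_peptide(values, groups, max_missing=0.36):
    """True if no group exceeds the allowed fraction of missing (None) values."""
    for group in set(groups):
        in_group = [v for v, g in zip(values, groups) if g == group]
        missing = sum(v is None for v in in_group) / len(in_group)
        if missing > max_missing:
            return False
    return True


def quantile_normalize(columns):
    """Give every sample (column) the same distribution: the mean of the sorted columns."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction (hypothetical value)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


groups = ["SR", "SR", "SR", "NR", "NR", "NR"]
kept = keep_peptide([1.0, 1.0, None, 1.0, 1.0, 1.0], groups)       # 1/3 missing in SR
dropped = keep_peptide([1.0, None, None, 1.0, 1.0, 1.0], groups)   # 2/3 missing in SR
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
protein_value = trimmed_mean([1.0, 2.0, 3.0, 4.0, 100.0])
```

In the last call the trimmed mean discards the outlying value 100.0, which is the usual reason for preferring it over the plain mean when rolling peptides up to proteins.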

Data Quality Control: Quality parameters were monitored throughout the data pre-processing workflow. Data matrices were also analysed using principal component analysis (PCA) to identify outlier samples. No outlier samples were detected.

Differential expression statistical analysis

Linear models were fitted using the LIMMA R package [32] to identify peptides and proteins related to neurological recovery (NR vs. SR, as explained before). Models included neurological recovery, level of injury (tetraplegia vs. paraplegia), medical centre (Murnau or Toledo), patient’s sex and age, the time the patient’s sample was stored frozen and, in the case of proteins, also the patient’s AIS grade at 30 dpi (A or B). Log2 fold change (logFC) and moderated t-statistics of NR vs. SR were calculated with the LIMMA R package from the fitted linear models.

Log fold change thresholds (FCT) for peptides and proteins were set based on the distribution of the standard deviations (SD) of every peptide/protein across the 30 patients. Thresholds were adjusted to the median variance level within the data as

log2(FCT(peptide/protein)) = 1.47 × median SD(peptide/protein)    (1)

As a result, an FCT of 1.8 for peptides and an FCT of 1.6 for proteins were applied in the data analysis. Similarly, a p-value threshold of 0.01 was established for peptides and 0.05 for proteins. Multiple testing corrections were performed using the Benjamini-Hochberg procedure.
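As a worked example of Eq. (1) and of the Benjamini-Hochberg correction, the sketch below recomputes the protein-level FCT from the median SD reported in the statistical power analysis section (0.4375) and adjusts a small set of p-values. The function names and the example p-values are hypothetical; the study itself used R.

```python
from statistics import median


def fold_change_threshold(sds, factor=1.47):
    """Linear-scale FCT from Eq. (1): log2(FCT) = factor * median(SD)."""
    return 2 ** (factor * median(sds))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Median SD across the 2649 proteins, as reported for the power analysis
fct_protein = fold_change_threshold([0.4375])

# Hypothetical p-values, only to show the adjustment
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

With a median SD of 0.4375, Eq. (1) gives log2(FCT) ≈ 0.64, that is an FCT of about 1.56, consistent with the rounded protein threshold of 1.6.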

Statistical power analysis

One of the most prominent features of the limma package is that inference remains reliable even in experiments with small sample sizes, owing to the use of empirical Bayes posterior variance estimation. Nevertheless, we performed an a posteriori estimation of the statistical power achieved for the protein-level analysis, based on the method for calculating sample size while controlling the false discovery rate developed by Liu et al. [33] and implemented in the ssize.fdr R package [34]. For this estimation, the effect size was set to 0.68 (log2 FCT, Eq. (1)), the FDR to 0.1 (value used to filter differentially expressed proteins, Table 2) and the statistical test was two-sided. The standard deviation (SD) was fixed to the median SD of all 2649 proteins across the 30 patients (0.4375), and π0 (proportion of null p-values) was estimated from the limma analysis as 0.66 using the qvalue R package [35]. Based on these parameters, the statistical power achieved by including 10 patients (the sample size of the smallest group, strong recovery) is 0.91, well above the value of 0.8 commonly used in sample size calculation. Indeed, based on the same parameters, a statistical power of 0.8 (actually 0.81) is expected to be reached by including 8 patients in each experimental group.

Table 2 Blood serum low abundant proteins differentially enriched in strong recoverers (SR) or non-recoverers (NR)

Functional analysis

Functional analysis was performed at the peptide level to identify biological processes that are significantly altered between the different samples. The thresholds applied were the same as during statistical testing (for peptides, p < 0.01 and |FC| > 1.8).

A significance of enrichment analysis, based on Fisher’s exact test, was performed by means of a tool developed by Proteome Sciences (Functional Analysis Tool; FAT v1.2.0). Enrichment of functional terms (Gene Ontology Biological Processes and Biological Pathways) was assessed within FAT. A two-sided p-value was generated by Fisher’s exact test and the Benjamini-Hochberg method was used for multiple test correction. A minimum of two matched identifiers (e.g. gene names) was required, and terms with an adjusted significance value < 0.3 were considered significant. All functional results were visualized using volcano plots (enrichment vs. adjusted p-value).

Gene Ontology (GO) term and pathway enrichment were performed using the background of all non-regulated peptides identified in the study. FAT calculates an enrichment or depletion of annotation terms among the regulated peptides/proteins, where “regulated” implies those passing the fold change thresholds, as broadly used in gene set analysis [36].
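The enrichment test can be illustrated with a self-contained two-sided Fisher's exact test on a 2×2 table (regulated vs. non-regulated identifiers, annotated vs. not annotated with a given term). This is a generic sketch of the test, not the FAT implementation, and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: regulated / non-regulated (background).
    Columns: annotated with the term / not annotated.
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x annotated identifiers in the regulated row
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    p_value = sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: 8 of 10 regulated identifiers carry the term,
# against 1 of 6 identifiers in the non-regulated background
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

For these counts the p-value is 400/11440 ≈ 0.035; in the workflow described above, such p-values would then be adjusted across all tested terms with the Benjamini-Hochberg method.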

Validation of selected proteins by ELISA

We used specific ELISA kits to validate proteins that were enriched in patients with strong recovery or with no recovery at a false discovery rate (FDR) < 0.1. The following kits were used according to the manufacturers’ instructions (the serum dilution used for each kit is also detailed): AGER (R&D Systems, Minneapolis, MN, USA; #DRG00; 1:1 dilution), ANGPT1 (RayBiotech, Peachtree Corners, GA, USA; #ELH-Angiopoietin1-1; 1:35 dilution), ARHGAP35 (Abbexa, Cambridge, UK; #abx384958; 1:15 dilution), CALU (Aviva Systems Biology, San Diego, CA, USA; #OKEH04727), CD300A (RayBiotech, Peachtree Corners, GA, USA; #ELH-CD300A-1; 1:10 dilution), CTSG (Aviva Systems Biology, San Diego, CA, USA; #OKEH01241; 1:750 dilution), DEFA1/DEFA3 (Aviva Systems Biology, San Diego, CA, USA; #OKBB01048; 1:500 dilution for SCI and SPFC groups, 1:300 for HC), OLR1/LOX-1 (RayBiotech, Peachtree Corners, GA, USA; #ELH-LOX1-1), PIN1 (Aviva Systems Biology, San Diego, CA, USA; #OKCD06255; 1:100 dilution), RACK1/GNB2L1 (RayBiotech, Peachtree Corners, GA, USA; #ELH-GNB2L1-1; 1:6 dilution), SERPINE1/PAI1 (Aviva Systems Biology, San Diego, CA, USA; #OKCD06428; 1:100 dilution), TCN2 (Aviva Systems Biology, San Diego, CA, USA; #OKEH02273; 1:500 dilution). Plates were read at 450 nm in a Spark® Multimode Microplate Reader (Tecan Austria GmbH, Grödig, Austria). The serum dilution stated above for each ELISA was chosen after testing a range of dilutions (up to eight) for each kit, to ensure that the analyte concentration was within the range of the standard curve.

Standard curves were run on every plate according to the manufacturer’s instructions, and optical densities were fitted with the most appropriate model (the one with the highest coefficient of determination, r2) to interpolate the analyte concentration. For all cases but CTSG and SERPINE1, a 5-parameter logistic regression curve was fitted to the standard curve. For CTSG and SERPINE1, a simple linear regression was fitted. In all cases, r2 > 0.99.
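Once a 5-parameter logistic (5PL) curve has been fitted, sample concentrations are obtained by inverting it. The sketch below shows one common 5PL parameterization and its inverse; the parameterization, the parameter values and the function names are assumptions made for illustration, and the fitting step itself (a non-linear least-squares problem) is not shown.

```python
def five_pl(x, a, b, c, d, g):
    """5PL curve: optical density as a function of concentration x.

    a: response at zero concentration, d: response at saturation,
    c: mid-range concentration, b: slope, g: asymmetry factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Concentration that yields optical density y (valid for y between a and d)."""
    return c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters, for illustration only
params = dict(a=0.05, b=1.2, c=500.0, d=3.0, g=0.9)

od = five_pl(250.0, **params)               # optical density expected at 250 units
concentration = inverse_five_pl(od, **params)  # interpolating back recovers 250
```

The inverse is only defined for optical densities strictly between the two asymptotes, which is why samples have to be diluted into the range of the standard curve.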

All samples were measured in duplicate, and technical reproducibility (inter- and intra-assay) was checked by calculating the coefficient of variation (CV; mean value across all samples and ELISAs, 8.5%).
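For reference, the coefficient of variation of a set of replicate readings is the standard deviation expressed as a percentage of the mean. The snippet below is a minimal sketch that assumes the sample standard deviation (n − 1 denominator); the readings are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(replicates):
    """Coefficient of variation (%) of replicate measurements."""
    return 100.0 * stdev(replicates) / mean(replicates)


# Hypothetical duplicate readings of one sample
cv = cv_percent([0.95, 1.05])
```

For these duplicate readings the CV is about 7.1%.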

A limit of detection (LoD) was established independently for every plate as suggested by the guidelines of the Clinical and Laboratory Standards Institute [37]. First a limit of blank (LoB) was established as:

LoB = Mean(blank) + 1.645 × SD(blank)    (2)

Then,

LoD = LoB + 1.645 × SD(lowest concentration standard)    (3)

Whenever a sample was below the LoD, its values were discarded and the ELISA was repeated with a more concentrated sample.
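Equations (2) and (3) translate directly into code. The sketch below assumes that the limits are computed on replicate readings of the blank and of the lowest-concentration standard from one plate, using the sample standard deviation; the readings and function names are hypothetical.

```python
from statistics import mean, stdev


def limit_of_blank(blank_readings):
    """Eq. (2): LoB = mean(blank) + 1.645 * SD(blank)."""
    return mean(blank_readings) + 1.645 * stdev(blank_readings)


def limit_of_detection(blank_readings, low_standard_readings):
    """Eq. (3): LoD = LoB + 1.645 * SD(lowest concentration standard)."""
    return limit_of_blank(blank_readings) + 1.645 * stdev(low_standard_readings)


# Hypothetical replicate readings from a single plate
blanks = [0.050, 0.054, 0.046]
low_standard = [0.080, 0.090]

lob = limit_of_blank(blanks)
lod = limit_of_detection(blanks, low_standard)


def is_detectable(reading):
    """A sample reading below the plate-specific LoD would be re-assayed."""
    return reading >= lod
```

Because the limits are recomputed for every plate, the same raw reading can be acceptable on one plate and below the LoD on another.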

Levels of analytes measured by ELISA were tested for normality with the Shapiro-Wilk test. Whenever normality could not be assumed, levels of analytes in NR and SR patients were compared with the Mann-Whitney U test; otherwise, Student’s t-test was applied. The same procedure was applied to compare levels in patients with those in control subjects. The precise statistical test is stated in the corresponding figure legend. All statistical analyses were performed in the R statistical programming language [38] using RStudio [39].
