Aqueous humor proteomics analyzed by bioinformatics and machine learning in PDR cases versus controls

Subjects

Written informed consent was obtained from all enrolled patients in the study. The study was performed in compliance with tenets of the Declaration of Helsinki for biomedical research and was approved by the Ethics Review Committee of Peking Union Medical College Hospital (FW-HXKT2018103102421S2). The enrollment criteria of the PDR group were as follows: (1) clinical diagnosis of PDR [15]; (2) absence of other ocular diseases, pregnancy, or severe systemic conditions (except diabetes mellitus); and (3) absence of ocular treatment, such as photodynamic therapy, surgery, or intravitreal injection. In the control group, each patient should be diagnosed with senile cataract and scheduled for phacoemulsification cataract surgery for the insertion of an intraocular lens. They should have no history of other ocular diseases, prior intraocular treatment, or severe systemic conditions. All patients underwent pre-treatment ocular examinations, testing intraocular pressure (IOP), axial length, corneal endothelial cell counts, best-corrected visual acuity, B-ultrasonography, and biomicroscopy of anterior and posterior segments.

Sample collection and preparation

The collection of AH was performed before treatment in both groups, regardless of whether it was pharmacological or surgical. AH was collected during microscope-aided surgery using a sterile 1-mL insulin injection syringe with a needle. The samples were collected in a 1.5-mL microcentrifuge tube and centrifuged at 13,000 rpm for 10 min at 4 °C, then transfer to a new 1.5-mL microcentrifuge tube and stored at -80 °C until subsequent analyses.

AH samples were sonicated three times on ice, using a high-intensity ultrasonic processor (Scientz, Ningbo, China), in lysis buffer (2 M Thiourea [Sigma-Aldrich, USA] + 7 M Urea [Amresco 0568-1Kg, USA] + 0.1% 3-[(3-Cholamidopropyl) dimethylammonio]-1-propanesulfonate [CHAPS] + protease inhibitors).

The remaining debris was removed by centrifugation at 12,000 g and 4 °C for 10 min. Furthermore, 10 µL of supernatant was collected and utilized by the Bradford Protein Assay Kit (Thermo 23,236, USA) for protein quantification. Proteins were then trypsin digested using the modified filter-aided sample preparation (FASP) technique [16, 17]. Briefly, lysate sample reduction was accomplished by incubating in 25 mM dithiothretitol (DTT) (Bio-Rad, USA) for 30 min at 60 °C, followed by 10 min of 50 mM iodoacetamide alkylation in the dark. After loading the samples onto a 10 kDa cutoff ultrafiltration membrane (Sartorius, Germany), they were incubated overnight at 37 °C with trypsin at a 1:50 enzyme-to-protein ratio. Following three 50 mM triethylammonium bicarbonate buffer (TEAB) (Sigma T7408, USA) rinses, the samples were treated with 10 min of spinning at 12,000 g. Peptide desalting was performed according to Ziptip C18 pipette tips in the manufacturer’s instructions.

After the C18 solid phase extraction column was activated and balanced with acetonitrile (CAN; Thermo A955-4, USA) and 2% ACN 0.1% formic acid (FA; Thermo A117-50, USA), the sample was loaded, followed by 10 times of pipetting, 2% ACN 0.1% FA desalination, and elution in 50% ACN 0.1% FA. The eluent was then collected into a rotary vacuum drier and refrigerated at -80 °C until use.

To build a data-independent acquisition (DIA) Spectral Library, dried peptides were subjected to resuspension in 0.1% FA and then collected and divided into samples with equal lysate quantities. The rest of the samples were used with the Biognosys iRT kit, including the preparation of a 10 × iRT buffer and the subsequent addition of it to each sample at 9:1.

High-pH reversed-phase fractionation

Additional high-pH reversed-phase chromatographic separation of digest samples was performed. The reverse chromatography column (RIGOL, Beijing, China) was utilized for the separation of mixed peptides in a 30 µg digest sample. After the dissolution of peptides in mobile phase A (100 µL; 2% (v/v) ACN, 98% (v/v) ddH2O, pH 10), the mixture was spun down for 20 min at 14,000 g.

Then the mobile phase B (98% (v/v) acetonitrile, 2% (v/v) ddH2O, pH 10) was injected into the supernatants at 1 mL/min to achieve stepwise elution in the column. Mobile phase B step gradients were used to acquire individual 1.5-minute eluant fractions.

MS acquisition

For MS analysis, we used an internally prepared analytical column (150 μm×150 mm, 1.9 μm) to evaluate each sample with a volume of 1 µg on an EASY-nLC1000 connected to an Orbitrap Fusion™ Tribrid™ MS instrument (Thermo Scientific). A binary solvent system, which was prepared by 0.1% FA in H2O (A) and 0.1% FA in ACN (B), was adopted, and the following linear gradient settings were used: 3–8% B/4 min, 8–22% B/65 min, 22–35% B/12 min, 35–90% B/4 min, 90% B/5 min.

Then the direct introduction of eluents into the MS instrument was performed using an EASY-Spray ion source, with the spray voltage and capillary temperature set at 2.3 kV and 320 °C, respectively. For data-dependent acquisition (DDA)-MS runs, the whole MS scanning ranged from 300 to 1400 m/z. The MS had a resolution of 60,000, with under 3-s top-speed mode for 15,000 resolution MS/MS scans. HCD had an isolation window and a normalized collision energy of 1.6 m/z and 32%, respectively. For DIA analyses, MS1 scans (automatic gain control (AGC) target 4e5 or 50 ms injection time) were performed from 300 to 1300 m/z, with DIA segmentation resolution of 30,000 (AGC target 5e5; for injection time). The collision energy was 32%, and the spectra were collected in profile mode.

Identification and quantification of proteins

DIA data analyses adopted Biognosys’ Spectronaut pulsar programme and the ID picker algorithm [18]. The default software settings were employed for targeted data analyses, which included dynamic iRT for retention time prediction types with window-based correction factors. The enzyme specificity was configured to target the C-terminal of arginine and lysine residues, permitting a maximum of two missed cleavages during the database search. Peptide identification was performed with an allowed initial precursor mass deviation up to 10 ppm and an allowed fragment mass deviation 0.02 Da. The search criteria comprised carbamidomethylation of cysteine as a fixed modification, along with oxidation of methionine and acetylation at the protein N-terminus as variable modifications. The peptide-level false discovery rate (FDR) was set to 1% at both the protein and peptide precursor levels. Local mass calibration was utilized, along with limitless scrambled decoy generation. We also employed an MS2-level interference connection for fragment elimination based on interference signals while retaining ≥ 3 fragments for measurement. When conducting spectral library-based studies, RAW images were converted to the Spectronaut file format and calibrated according to the global spectral library’s retention time dimension. After that, the files were used for spectrum analysis without any further retention time-based recalibration.

Then, Proteome Discoverer 2.3 was used with default settings (Trypsin/P (Promega, V5111, USA), two missed cleavages). The fixed modification and the variable modifications in the search criteria were consistent with DIA data analyses. The mass tolerances for precursor and fragment ions were also set at 10 ppm and 0.02 Da, respectively [19]. DDA data searches used UniProt human (uniprot_human_73940_20190731_iRT.fasta) and Biognosys iRT peptides fasta (uploaded to the public repository) databases as references.

Proteomic analysesStatistical analysis of the quantitative data

After minimizing biases between experiments through median normalization, protein expression differences were then evaluated using a Student’s t-test. Statistically significant DEPs were defined using p adjust < 0.05 and fold-change (FC) cut-offs of |log2(FC)| > 0.58. Data normalization and identification of DEPs were performed in the ‘Wu Kong’ platform (URL: https://www.omicsolution.com/wkomics/main/) [20].

The enrichment analysis

To provide an intuitive and comprehensive visualization and direct comparison of DEPs data, heatmap was performed. The heatmap clustering analysis parameters were as follows: the scale direction is set to genes, gene clustering is performed using the complete method, distance calculation method is Euclidean, the callback function is set to pheatmap, and rows with completely identical expression values are removed.

Gene Ontology (GO) analysis has been used extensively to identify the characteristic biological attributes of genes, gene products, and sequences, including the biological process (BP), cell components (CC), and molecular function (MF) [21]. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis provides a comprehensive set of bio-interpretation of genomic sequences and protein interaction network information [22].

In this study, GO terms and KEGG pathway enrichment analyses were automatically completed and visualized using the clusterProfiler V3.14.0 [23], pathview V1.36.0 [24], and the Goplot V1.0.2 package [25] in the R software statistical analysis platform (significance was p < 0.05 and a q-value < 0.05).

PPI network construction and hub proteins identification

The protein-protein interaction (PPI) network of the DEPs was established using the Search Tool for the Retrieval of Interacting Genes (STRING) [26]. Cytoscape was used to build the visual network of molecular interactions with a combined score > 0.15 [27]. The molecular complex detection (MCODE) plugin was applied to detect closely correlated modules from the PPI network [28]. The most significant protein module of this PPI network was visualized and displayed through the MCODE plug-in. The filtering criteria were as follows: MCODE score > 5, node score cutoff = 0.2, degree cutoff = 2, k-score = 2, and max depth = 100. In addition, the degree, edge percolated component (EPC), betweenness, and maximum neighborhood component (MNC) algorithms were useful methods for selecting hub genes or proteins from PPI network [29]. Scores of the degree, EPC, betweenness, and MNC of all nodes of the PPI network were calculated via the CytoHubba plugin. The top 10 nodes with the highest degree, EPC, betweenness, and MNC scores were selected. Finally, to increase the reliability of hub proteins, their overlapping proteins were considered to be hub proteins related to PDR.

Machine learning-based inference of optimal biomolecular combinations

Identification of optimal biomolecular combinations using iBM that included mutual DEPs selection (MDS), candidate combination generation (CCG), and final combination prioritization (FCP) were carried out as previously described [14]. First, the true negative (TN), true positive (TP), false positive (FP), false negative (FN), sensitivity (Sn), and specificity (Sp) values were calculated. Then, 5-fold cross-validation was performed. The receiver operating characteristic (ROC) curve was illustrated and the area under curve (AUC) value was calculated based on Sn and 1-Sp scores. Third, the root mean squared error (RMSE) was calculated to estimate the prediction bias of a model.

CCG was adopted to select different sets of combinations with ≤ 5 proteins. Candidate combinations were randomly generated for the proteomic data. For each candidate combination, we randomly generated a training data set and a testing data set with a ratio of 4:1. The testing data set was only used to test the performance but not for training. The least absolute shrinkage and selection operator (LASSO, L1 regularization) penalty and the ridge regression (L2 regularization) penalty in penalized logistic regression (PLR) [30,31,32], were iteratively used to optimize the weight values of the 5 proteins. To simplify the combination, proteins with a weight of 0 in the model training results were deleted. All combinations with a total AUC equal to 1 were reserved for the optimal biomolecular combinations pool, respectively. The algorithm was implemented in Python 3.7 with Scikit-learn 0.22.1.

Validation study by PRM analysis

All the hub proteins determined above and the proteins of the top 25 combinations with the smallest root mean squared error (RMSE) values and AUC value of 1 were validated by PRM in independent samples.

First, the proteins were extracted, digested and mixed samples were prepared, and the full spectrum was scanned by the “label-free” method using the EASY-nLC1200 connected to the Orbitrap Q-Exactive HF mass spectrometer (Thermo, Scientific, USA). Second, the Proteome Discoverer 2.2 software was used to search the library. The search results were imported into Skyline(version 20.1.0.155) software [33] to obtain the target protein peptide information. Then, the PRM method can be established, and the obtained data were imported into Skyline software for quantification. The parameters of PRM were set as follows: the primary resolution was 12,000 (at 300–1400 m/z) with an automatic gain control (AGC) target value of 3e6, a maximum injection time of 80 ms, and a Normalized Collision Energy (NCE) of 27%; the secondary resolution was 15,000 with an AGC target value of 2e4, a maximum injection time of 19 ms. The mass tolerances for precursor and fragment ions were also set at 10 ppm and 0.02 Da, respectively.

The ROC curve analyses of the validated combinations were examined for PDR, and the AUC of each ROC curve was calculated.

Exploration for BCVA and early PDR-related proteins

To explore whether the hub proteins and the biomolecular combinations were associated with the best corrected visual acuity (BCVA) and the early occurrence of PDR, we used Spearman’s correlation analysis to evaluate the relations between the alteration of these proteins and the clinical parameters. A p-value of less than 0.05 was considered statistically significant.

留言 (0)

沒有登入
gif