Investigating causal relationships between plasma proteins and lung adenocarcinoma: result from proteomics and Mendelian randomization study

2.1 Patient cohort

All cases came from Tongde Hospital, including 28 healthy individuals (H) and 56 LUAD patients. The basic information of all individuals is presented in Table 1. All subjects signed informed consent, and the study was permitted by the ethics committee of Tongde Hospital. A flowchart was presented in Fig. 1.

Table 1 The basic information of all participantsFig. 1figure 1

A flowchart of the study. DEPPs: differential expression plasma proteins; LUAD: Lung adenocarcinoma; H: healthy individuals; PPI: Protein–Protein Interaction

Inclusion criteria: Pathological or cytological diagnosis of LUAD; 18 ≤ Age ≥ 75; ECOG ≤ 2. and KPS ≥ 60; Life span > six months.

Exclusion criteria: There are acute and chronic diseases that affect the composition of plasma proteins; Pregnant or lactating women; Abnormal blood test: hemoglobin < 100 g/L, white blood cell < 4*109/L, neutrophils < 2*109/L, platelet < 100 g/L; Abnormal kidney function: Serum creatinine > 1.5 ULN (upper limit of normal value); Abnormal liver function: Hemobilirubin > 1.5 ULN, ALT/AST > 2.5 ULN.

2.2 Blood sample collection

The anticoagulation tube (Ethylene Diamine Tetraacetic Acid) was used to collect 5 mL blood samples from each individual. After centrifuging at 3500 rpm (15 min, 4 ℃), the plasma samples were gathered and stored at − 80 ℃.

2.3 Sample preparation

An appropriate amount of SDT lysis buffer (50 mM TEAB, 5% SDS) was added to the sample, sonicated, and soaked in boiling water for 15 min. Centrifuged at 14,000g for 15 min, filtered the supernatant using a 0.22 μm centrifuge tube, and collected the filtrate, determined with the BCA method for protein quantification. Took 200 μg of protein solution for each sample, added dithiothreitol (DTT) to a final concentration of 100 mM, and mixed and added 200 μL of UA buffer. Centrifuged at 12,500g for 25 min and discard the filtrate. Repeated this step twice. Add 100 μL IAA buffer, centrifuged at 600 rpm for 1 min, and after leaving it in the dark for 30 min, centrifuged at 12,500g for 25 min. Added 100 μL of UA buffer and centrifuged at 12,500g for 15 min twice. Added 100 μL of 0.1 M TEAB solution, centrifuged (12500 g, 15 min) twice. Added 40 μL Trypsin buffer (4 μg Trypsin, 40 μL 0.1 M TEAB solution), shaked at 600 rpm for 1 min, and placed at 37 °C for 16–18 h. Centrifuged (12,500, 15 min) and added 20 μL of 0.1 M TEAB solution. Centrifuged at 12500 g for 15 min and collected the filtrate. The peptide segments were desalinated using a C18 Cartridge, and after freeze-drying, they were re-suspended in 40 μL of 0.1% formic acid solution for quantification.

2.4 LC–MS/MS analysis

The sample was separated using Liquid chromatography (LC). LC was performed at Agilent 1260 infinity II HPLC (Palo Alto, CA, USA) with XBridge Peptide BEH C18 Column (4.6 mm X 100 mm; 5 μm). Buffer solution A: 10 ml HCOONH (4, 5% ACN, pH 10.0); Buffer solution B: 10 ml HCOONH4 (85% ACN, pH 10.0). The flow rate was set as 1 mL/min. Liquid phase gradient: 5% B to 45% B within 40 min at 30 °C temperature. Dry each sample in a vacuum concentrator for later use. Freeze dry the sample and dissolve it in a 0.1% formic acid aqueous solution to form 6 fractions.

Took 1 μg peptide segment from each Fraction, added iRT peptide segment, mixed, injected, separated with nano LC, and analyzed by online electrospray tandem mass spectrometry (MS). Liquid phase system: Easy nLC system (Thermo Fisher Scientific); MS system: Orbitrap Lumos (Thermo Fisher Scientific). Buffer solution A is a 0.1% formic acid aqueous solution, and buffer solution B is a 0.1% formic acid acetonitrile aqueous solution (acetonitrile is 80%). The sample was separated in a non-linear gradient at a flow rate of 300 nL/min through an analytical column (Thermo Fisher Scientific, Acclaim PepMap RSLC 50um X 15 cm): 0–5 min, 1% B; 5–95 min, 1% B to 28% B; 95–110 min, 28% B to 38% B; 110–115 min, 38% B to 100% B, 115–120 min, 100% B. Electric spray voltage: 2.0 kV.

2.5 Bioinformatics analysis

MS analysis of raw data was conducted using the iProteome one-stop data analysis cloud platform for qualitative and quantitative inventory analysis. Normalize the quantitative information of the target protein set with Z-score. The missing data were filled out using the K-nearest neighbor method, The basic principle of which is to calculate K observations with the nearest missing values based on Euclidean distance, and then use the weighted average of these observations to perform imputation.The Wilcoxon rank sum test was used to identify differential expression plasma proteins (DEPPs). Log fold change (FC) > 1 and P < 0.05 were set as cut-off values. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) functional annotation analysis were conducted using the R package: “org.Hs.eg.db” with adjusted P < 0.05 set as the threshold. Protein–Protein Interaction Networks (PPI) was analyzed with GeneMANIA (http://genemania.org/).

2.6 Two sample MR analysis

Genetic variants for LUAD were collected from the TRICL consortium in a genome-wide association study (GWAS) (https://gwas.mrcieu.ac.uk/) (GWAS ID: ieu-a-984), including 65 864 cases (11 245 LUAD cases and 65 864 cases) and 10 345 176 single-nucleotide polymorphism (SNP). GWAS data for Plasma protein of 35 559 cases (34 857 953 SNPs) were gathered from deCODE GRNETICS (https://www.decode.com). All participants in the study were from Europe.

A strict screening criterion (P < 5e × 10−8) was used to select genetic instrumental variables (IVs) for Plasma protein and LUAD. R2 < 0.001 within ± 10 000 kb distance was set as the thresholds of IVs clumping for independence by linkage disequilibrium (LD). The F-statistics was calculated via β divided by the square of the standard error. The IVs with F statistics > 10 would selected for further analysis. PhenoScanner V2 (http://www.phenoscanner.medschl.cam.ac.uk/) [13] was used to remove SNPs that may have violated the second and third critical hypothesis and have potential confounders and bypassing (e.g., Age, sex, smoking, other disease).

The R packages: “TwoSampleMR,” “VariantAnnotation,” and “ieugwasr” were used to conduct the two-sample MR analysis. Five different approaches were presented: “MR Egger,” “Weighted median,” “Inverse variance weighted (IVW),” “Simple mode,” and “Weighted mode.” Since it produced more reliable estimates over a wider variety of scenarios, the principal method for causal estimation—the random-effect IVW method—was chosen to represent the results of MR analysis by combining the Wald ratio of individual SNPs if the null heterogeneity was observed [14]. P < 0.05 was set as the threshold. Based on the IVW and MR Egger techniques, the degree of heterogeneity was assessed using Cochran's Q statistic. The MR-Egger intercept test was used to determine the horizontal pleiotropy, and the R package “MR-PRESSO” was used to carry out the outlier method (MR-PRESSO) [15, 16]. It was considered a statistical difference when P < 0.05. The MR-PRESSO approach will adjust for horizontal pleiotropy if it exists by eliminating outliers and comparing the significance of the differences between the original and corrected data. Leave-one-out analysis was also operated to determine if a single SNP drove or influenced the MR analysis.

2.7 Construction of a diagnostic model

The selected DEPPs were applied to develop a diagnostic model using logistic regression analysis. LUAD represents positive results, H represents negative results. Bootstrap resampling was applied for internal validation purposes. The model's receiver operating characteristic curve (ROC) was depicted. The discrimination power of the model was assessed by calculating the area under the ROC (AUC) and the calibration plot. The nomogram was constructed with the package “rms” in R 4.2.3 (https://www.r-project.org).

留言 (0)

沒有登入
gif