TropicalMed, Vol. 7, Pages 397: Screening and Analysis of Serum Protein Biomarkers Infected by Coronavirus Disease 2019 (COVID-19)

1. Introduction

According to the World Health Organization (WHO), COVID-19 is a rapidly spreading disease caused by infection with a novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1,2]. Its continued spread makes it the third major coronavirus outbreak in the past two decades. SARS-CoV-2, SARS-CoV, and Middle East respiratory syndrome coronavirus (MERS-CoV) all belong to the genus Betacoronavirus [3]. However, SARS-CoV-2 has distinctive infection characteristics, including high infectivity [4], a long incubation period [5], general susceptibility of the population [6], a lack of obvious symptoms in the early stage, and asymptomatic carriers [7]. These features favor the spread of SARS-CoV-2 among people. Without human intervention, the basic reproduction number (R0) of SARS-CoV-2 is higher than those of SARS-CoV and MERS-CoV [8]. Furthermore, SARS-CoV-2 can be transmitted through close contact, droplets, the fecal-oral route, and other routes [9,10]. In-depth research on SARS-CoV-2 to identify differentially regulated biomarkers and target proteins is therefore essential.

Serum and plasma contain thousands of proteins produced by various tissues and cells of the body, and changes in the concentration, structure, or function of serum proteins can indicate an abnormal pathophysiological state [11]. Serum proteome studies of virus-infected patients therefore help discover biomarkers of viral infection, establish early diagnostic methods, monitor disease progression, and predict therapeutic effects [12]. Proteomics is widely used to study the interaction between viruses and human plasma. For example, Pang et al. reported two SARS-CoV diagnostic markers, the complement C3c α chain N-terminal fragment and the fibrinogen α-E chain internal fragment, with specificity and sensitivity of over 95% [13]. Kang et al. used four differentially expressed proteins (DEPs) in the serum of patients with acute SARS as variables to build a diagnostic method based on acute SARS serum proteome fingerprints with a decision-tree classification algorithm; in a double-blind study, its accuracy in distinguishing acute SARS from non-acute SARS samples exceeded 95% [14]. Wan et al. compared the plasma proteomes of SARS patients in the progression and recovery stages with those of healthy individuals [15]; their results indicated that SARS-CoV infection triggers an excessive immune response and that enhanced inflammation may play a crucial role in disease progression. The plasma proteomes of SARS patients at different times after onset have also been compared with those of healthy participants [16]. Quantitative proteomic methods can also reveal proteomic changes in the bronchoalveolar lavage fluid of critically ill COVID-19 patients and help screen for candidate protein markers or therapeutic targets, providing new information for research on anti-inflammatory drugs for COVID-19 and for exploring the molecular mechanisms of the host response [17].

In this study, we employed liquid chromatography-tandem mass spectrometry (LC-MS/MS) and high-performance liquid chromatography (HPLC) to probe potential protein changes in the serum of SARS-CoV-2-infected patients compared with healthy controls. DEPs and their functions were identified. These results may help further study of the mechanisms of SARS-CoV-2 infection.

2. Materials and Methods

2.1. Serum Samples

The Ethics Committee of the Beijing Center for Disease Prevention and Control reviewed and approved all experimental protocols (approval code 2020031). Blood samples were collected from six COVID-19 patients and six healthy individuals, and serum was isolated for further analysis. All patients provided signed informed consent. COVID-19 was diagnosed by the real-time RT-PCR assay recommended by China CDC; all six patients were men aged 30 to 40 years, as were the six healthy individuals. All samples were snap-frozen in liquid nitrogen and stored at −80 °C. Samples were first centrifuged at 12,000× g for 10 min at 4 °C to remove cellular debris. The supernatant was transferred to a new centrifuge tube, and the 12 most abundant proteins were removed using Pierce™ Top 12 Abundant Protein Depletion Spin Columns (Thermo Fisher, Waltham, MA, USA). Total protein was then measured with the manufacturer's BCA kit according to the manufacturer's instructions. Because the study attempts to draw general conclusions about SARS-CoV-2 infection, and the virus produces highly variable disease in humans, subjects were screened for homogeneity. All six patients therefore had the same severity: all had mild disease, and none required oxygen therapy.
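Total protein from a BCA assay is read off a standard curve. The sketch below illustrates that step under stated assumptions: a simple linear least-squares fit, hypothetical BSA standards and absorbance readings, and a hypothetical 2× sample dilution, none of which come from this study. Kit instructions may recommend a different curve fit.

```python
# Sketch: estimate total protein from BCA absorbance via a linear standard curve.
# ASSUMPTION: standards, readings, and the 2x dilution are hypothetical values.

def fit_standard_curve(conc, absorbance):
    """Ordinary least-squares fit of absorbance = slope * conc + intercept."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_concentration(a562, slope, intercept, dilution_factor=1.0):
    """Invert the curve, then correct for the dilution applied before reading."""
    return (a562 - intercept) / slope * dilution_factor

# Hypothetical BSA standards (µg/mL) and blank-corrected A562 readings
standards = [0, 250, 500, 1000, 2000]
readings = [0.00, 0.20, 0.40, 0.80, 1.60]

slope, intercept = fit_standard_curve(standards, readings)
sample = estimate_concentration(0.50, slope, intercept, dilution_factor=2.0)
print(f"Estimated total protein: {sample:.0f} µg/mL")
```

Because the trypsin amounts in the digestion step are defined as mass ratios, this concentration estimate is what fixes how much enzyme is added.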

2.2. Trypsin Digestion

The protein sample was reduced with 5 mM dithiothreitol at 56 °C for 30 min and then alkylated with 11 mM iodoacetamide for 15 min at room temperature in the dark. The sample was then diluted with 100 mM TEAB so that the urea concentration fell below 2 M. Finally, trypsin was added in two steps: first at a 1:50 trypsin-to-protein mass ratio for overnight digestion, and then at a 1:100 mass ratio for a further 4 h.
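The enzyme amounts and the dilution needed to bring urea below 2 M follow from simple mass-ratio and C1V1 = C2V2 arithmetic. The sketch assumes a hypothetical 100 µg protein input in 50 µL of buffer containing 8 M urea; the source reports neither value, and the volumes added with dithiothreitol and iodoacetamide are ignored.

```python
# Worked arithmetic for the two-step trypsin digestion.
# ASSUMPTION: 100 µg protein in 50 µL containing 8 M urea (hypothetical values).

def trypsin_amount_ug(protein_ug, ratio):
    """Trypsin mass for a 1:ratio trypsin-to-protein mass ratio."""
    return protein_ug / ratio

def teab_volume_ul(sample_ul, urea_m, target_m=2.0):
    """TEAB volume at which urea reaches target_m (C1*V1 = C2*V2).

    Adding more than this brings urea below the target; returns 0 if the
    sample is already at or below it.
    """
    return max(sample_ul * urea_m / target_m - sample_ul, 0.0)

protein_ug, sample_ul, urea_m = 100.0, 50.0, 8.0  # hypothetical inputs

first = trypsin_amount_ug(protein_ug, 50)    # overnight digestion at 1:50
second = trypsin_amount_ug(protein_ug, 100)  # 4 h digestion at 1:100
teab = teab_volume_ul(sample_ul, urea_m)

print(f"Trypsin: {first:.1f} µg, then {second:.1f} µg")
print(f"Add more than {teab:.0f} µL of 100 mM TEAB to bring urea below 2 M")
```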

2.3. HPLC Analysis

The tryptic peptides were fractionated by high-pH reversed-phase HPLC on an Agilent 300Extend C18 column (4.6 mm ID, 250 mm length, 5 μm particles). Peptides were first separated into 60 fractions with a gradient of 8% to 32% acetonitrile (pH 9.0) over 60 min. These fractions were then combined into 18 fractions and dried by vacuum centrifugation.
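The gradient and the 60-to-18 pooling can be made concrete as follows. The linear 8% to 32% acetonitrile gradient over 60 min is taken from the Methods; the round-robin ("concatenated") pooling rule is an assumption for illustration, since the source does not say how fractions were combined.

```python
# Sketch of the high-pH fractionation scheme. The linear 8-32% acetonitrile
# gradient over 60 min is from the Methods; the round-robin pooling of 60
# fractions into 18 is an ASSUMPTION (the pooling rule is not reported).

def acetonitrile_percent(t_min, start=8.0, end=32.0, duration=60.0):
    """Acetonitrile (%) at time t_min for a linear gradient, clamped to the run."""
    t = min(max(t_min, 0.0), duration)
    return start + (end - start) * t / duration

def concatenate_fractions(n_fractions=60, n_pools=18):
    """Assign fraction i (1-based) to pool ((i - 1) mod n_pools) + 1."""
    pools = {p: [] for p in range(1, n_pools + 1)}
    for i in range(1, n_fractions + 1):
        pools[(i - 1) % n_pools + 1].append(i)
    return pools

print(acetonitrile_percent(30))  # acetonitrile at the gradient midpoint
pools = concatenate_fractions()
print(pools[1], pools[18])
```

Under this rule each pool mixes early, middle, and late fractions, a common way to spread peptides of different hydrophobicity across all pools before the low-pH LC-MS/MS separation.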

2.4. LC-MS-MS Analysis

Tryptic peptides were dissolved in solvent A (0.1% formic acid) and loaded onto a reversed-phase analytical column. Peptides were eluted with an increasing gradient of solvent B (also containing 0.1% formic acid): the proportion of solvent B rose from about 6% to 23% and then to 80% within 3 min, at a constant flow rate of 400 nL/min on an EASY-nLC system. The eluted peptides were ionized by nanospray and analyzed by MS/MS on a Q Exactive™ Plus mass spectrometer (Thermo Scientific), with the electrospray voltage set to 2.0 kV. Intact peptides were detected in the Orbitrap at a resolution of 70,000, with the full scan extending to m/z 1800. Peptides were then selected for MS/MS using the normalized collision energy (NCE) setting, and fragment ions were detected in the Orbitrap at a resolution of 17,500. A data-dependent procedure alternated between full MS scans and MS/MS scans, with a dynamic exclusion of about 15 s. The fixed first mass was set to m/z 100.
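Dynamic exclusion keeps the instrument from repeatedly fragmenting the same abundant precursor. The toy model below illustrates the idea with the 15 s window from the Methods; the top-N value, the 10 ppm matching tolerance, and the scan data are hypothetical, and real acquisition software also considers charge state and isotope patterns.

```python
# Toy model of data-dependent acquisition (DDA) with dynamic exclusion.
# EXCLUSION_S comes from the Methods; TOP_N, TOLERANCE_PPM, and the scans
# below are HYPOTHETICAL values chosen for illustration.
from dataclasses import dataclass, field

TOP_N = 2
EXCLUSION_S = 15.0
TOLERANCE_PPM = 10.0

@dataclass
class Scan:
    time_s: float
    peaks: list  # (m/z, intensity) pairs from one full MS scan

@dataclass
class ExclusionList:
    entries: list = field(default_factory=list)  # (m/z, expiry time in s)

    def is_excluded(self, mz, now):
        # Drop expired entries, then look for a match within the ppm tolerance
        self.entries = [(m, exp) for m, exp in self.entries if exp > now]
        return any(abs(mz - m) / m * 1e6 <= TOLERANCE_PPM for m, _ in self.entries)

    def add(self, mz, now):
        self.entries.append((mz, now + EXCLUSION_S))

def select_precursors(scans):
    """Return (time, m/z) for each precursor picked for MS/MS."""
    excl = ExclusionList()
    picked = []
    for scan in scans:
        chosen = 0
        # Most intense candidates first; skip anything still on the list
        for mz, _ in sorted(scan.peaks, key=lambda p: p[1], reverse=True):
            if chosen == TOP_N:
                break
            if excl.is_excluded(mz, scan.time_s):
                continue
            picked.append((scan.time_s, mz))
            excl.add(mz, scan.time_s)
            chosen += 1
    return picked

scans = [
    Scan(0.0, [(500.2500, 9e6), (650.3300, 5e6), (720.4100, 1e6)]),
    Scan(5.0, [(500.2502, 8e6), (650.3300, 4e6), (720.4100, 2e6)]),
    Scan(20.0, [(500.2500, 7e6), (810.9000, 3e6)]),
]
for t, mz in select_precursors(scans):
    print(t, mz)
```

In this toy run, the less intense ion at m/z 720.41 is only sampled at 5 s because the two dominant precursors are still excluded, and m/z 500.25 becomes eligible again once its 15 s window has expired.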

2.5. Database Search

The MS/MS data were processed with MaxQuant (v1.5.2.8). Tandem mass spectra were searched against the SwissProt human database together with a reverse decoy database. Trypsin/P was specified as the cleavage enzyme, allowing up to two missed cleavages. The precursor mass tolerance was set to 20 ppm for the first search and 5 ppm for the main search, and the fragment mass tolerance was set to 0.02 Da. Carbamidomethylation of Cys was specified as a fixed modification and oxidation of Met as a variable modification. The minimum score was set to >40, and the FDR was set to 1%.
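Three of these search settings can be illustrated directly: Trypsin/P cleaves after every Lys or Arg, including before Pro; "two missed cleavages" means up to three adjacent fragments may remain joined; and precursor tolerances are relative (ppm) windows. The sketch also shows a basic target-decoy estimate of a score cutoff at 1% FDR. It is a simplified illustration with a hypothetical toy sequence and hypothetical PSM scores, not MaxQuant's implementation.

```python
# Sketch: Trypsin/P digestion with missed cleavages, ppm matching, and a
# target-decoy score cutoff. ASSUMPTION: the toy sequence and PSM scores are
# hypothetical; this is not MaxQuant's implementation.

MAX_MISSED = 2  # up to two missed cleavages, as in the Methods

# Monoisotopic residue masses (Da); C includes fixed carbamidomethylation
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def trypsin_p_digest(protein, max_missed=MAX_MISSED):
    """Cleave after every K/R (Trypsin/P also cuts before P)."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    # Joining up to max_missed + 1 consecutive fragments models missed cleavages
    peptides = set()
    for i in range(len(fragments)):
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptides.add("".join(fragments[i:j + 1]))
    return peptides

def peptide_mass(seq):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE[aa] for aa in seq) + WATER

def within_ppm(observed, theoretical, tol_ppm):
    """True if the mass error is within tol_ppm parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm

def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Lowest score at which decoys/targets <= max_fdr (None if never met).

    psms: (score, is_decoy) pairs.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff

print(sorted(trypsin_p_digest("MKPEPTIDERAGCK"), key=len))
theo = peptide_mass("PEPTIDER")
observed = theo * (1 + 8e-6)  # hypothetical 8 ppm mass error
# Passes the 20 ppm first search but not the 5 ppm main search
print(within_ppm(observed, theo, 20.0), within_ppm(observed, theo, 5.0))
```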

2.6. Gene Ontology Analysis

The UniProt Gene Ontology Annotation (GOA) program (https://www.ebi.ac.uk/GOA/index, accessed on 26 January 2021) provides GO annotations for proteins in UniProt. Identified proteins were first converted to UniProt IDs and then mapped to GO IDs. Proteins without GOA annotations were annotated with InterProScan, which infers protein function by sequence alignment. The proteins were then classified by GO into three categories: cellular component, biological process, and molecular function. A two-tailed Fisher's exact test was used to test the enrichment of the DEPs against all identified proteins, and GO terms with p < 0.05 were considered significant.

2.7. Pathway Analysis

To predict the functional roles of the identified proteins, the Kyoto Encyclopedia of Genes and Genomes (KEGG) (https://www.genome.jp/kegg/, accessed on 26 January 2021) was used. Proteins were annotated with the KEGG online service tool KAAS (KEGG Automatic Annotation Server), and the annotations were mapped to the KEGG pathway database with KEGG Mapper. The pathways were organized into hierarchical categories according to the KEGG website. A two-tailed Fisher's exact test was used to assess the significance of pathway enrichment, and pathways with p < 0.05 were considered significant.

2.8. Protein Domain Analysis

InterProScan, a sequence analysis application, was used to annotate the functional domains of the identified proteins based on sequence alignment against the InterPro domain database (http://www.ebi.ac.uk/interpro/, accessed on 26 January 2021). For each category, the enrichment of the DEPs against all identified proteins was tested with a two-tailed Fisher's exact test, and domains with p < 0.05 were considered significant.

2.9. Subcellular Localization

The subcellular localization of the identified proteins was predicted with WoLF PSORT, a protein subcellular localization prediction tool that makes its predictions from the amino acid sequence. WoLF PSORT is an updated version of the earlier PSORT/PSORT II programs. For prokaryotic species, subcellular localization was predicted with CELLO.

2.10. Enrichment-Based Clustering

To compare functional enrichment across the protein clusters by hierarchical clustering, all categories obtained from the enrichment analyses of GO terms, protein complexes, pathways, and domains were collated together with their p-values. Only categories enriched in at least one cluster with p < 0.05 were retained. The resulting p-value matrix was transformed as x = −log10(p), and the x values were z-transformed within each category. The z-scores were then clustered by one-way hierarchical clustering in Genesis, and cluster membership was visualized as a heatmap with the R package gplots.
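The p-values collated in this step come from the two-tailed Fisher's exact tests used in the GO, KEGG, and domain enrichment analyses (Sections 2.6 to 2.8). The sketch below computes that test from a 2×2 table with only the Python standard library and then applies the x = −log10(p) transform and per-category z-scoring described above. The counts are hypothetical, using the population standard deviation for the z-score is an assumed choice, and the hierarchical clustering performed in Genesis is not reproduced.

```python
# Sketch: two-tailed Fisher's exact test -> x = -log10(p) -> per-category z-scores.
# ASSUMPTION: the counts are hypothetical; pstdev is an assumed z-score choice.
from math import comb, log10
from statistics import mean, pstdev

def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: DEPs in the category       b: DEPs not in the category
    c: non-DEPs in the category   d: non-DEPs not in the category
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that x of the row1 DEPs fall in the category
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Relative tolerance so floating-point ties with p_obs are counted
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def zscores_of_neglog10(pvalues):
    """z-score the -log10(p) values of one category across clusters."""
    x = [-log10(p) for p in pvalues]
    mu, sd = mean(x), pstdev(x)
    return [0.0 if sd == 0 else (xi - mu) / sd for xi in x]

# Hypothetical 2x2 counts for one GO term in three DEP clusters
tables = {"Q1": (6, 4, 20, 270), "Q2": (1, 9, 25, 265), "Q3": (2, 8, 24, 266)}
pvalues = [fisher_two_tailed(*t) for t in tables.values()]

# Keep the category only if it is enriched (p < 0.05) in at least one cluster
if min(pvalues) < 0.05:
    for name, z in zip(tables, zscores_of_neglog10(pvalues)):
        print(name, round(z, 2))
```

The −log10 transform makes smaller p-values larger and roughly comparable, and z-scoring within each category puts all categories on one scale so that clustering groups categories by their enrichment pattern across clusters rather than by absolute significance.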

2.11. Protein–Protein Interaction Network

To explore protein–protein interactions, all identified proteins were searched against the STRING database (version 10.5). Only interactions among the searched proteins were retained, and external proteins were excluded. STRING quantifies each interaction with a confidence score, and a score above 0.7 was considered high confidence; all interactions meeting this threshold were selected. The interaction network was visualized with the R package networkD3.
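The filtering described above amounts to keeping edges whose endpoints are both in the identified set and whose confidence passes the cutoff. A minimal sketch follows, with hypothetical protein IDs and scores. STRING download files report combined scores as integers from 0 to 1000, so 0.7 corresponds to 700; the strict "above 0.7" comparison follows the threshold as stated in the Methods.

```python
# Sketch: keep high-confidence STRING-style interactions among identified
# proteins. ASSUMPTION: protein IDs and scores below are hypothetical.
from collections import defaultdict

MIN_SCORE = 700  # 0.7 on STRING's 0-1 scale; scores must be strictly above it

def high_confidence_network(edges, identified, min_score=MIN_SCORE):
    """Adjacency map of edges between identified proteins with score > min_score.

    Interactions involving proteins outside the identified set are dropped.
    """
    adjacency = defaultdict(set)
    for p1, p2, score in edges:
        if p1 in identified and p2 in identified and score > min_score:
            adjacency[p1].add(p2)
            adjacency[p2].add(p1)
    return dict(adjacency)

def rank_by_degree(adjacency):
    """Sort proteins by number of high-confidence partners (hub candidates first)."""
    return sorted(adjacency, key=lambda p: (-len(adjacency[p]), p))

identified = {"PROT_A", "PROT_B", "PROT_C", "PROT_D"}
edges = [
    ("PROT_A", "PROT_B", 950),
    ("PROT_B", "PROT_C", 720),
    ("PROT_C", "PROT_D", 400),    # below the high-confidence cutoff
    ("PROT_A", "EXTERNAL", 990),  # partner not in the identified set
]
network = high_confidence_network(edges, identified)
print(rank_by_degree(network))
```

The resulting adjacency map is the kind of node and edge list a network visualization package would take as input; ranking by degree is one simple way to flag hub candidates.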

4. Discussion

At present, the novel coronavirus has spread worldwide, seriously affecting global public health and the economy. Its emergence has also prompted researchers to make sustained efforts to combat it. Data show that severe cases account for most COVID-19 deaths, but current clinical indicators do not reveal deterioration in a timely manner, so early detection (prediction) and effective treatment of severe patients are crucial. Researchers have found many distinctive molecular changes in the serum of patients with severe COVID-19 and identified a series of biomarkers that are expected to help predict progression from mild to severe disease.

The rapid progress of proteomics provides a new approach to searching for serum molecular markers in patients with viral infections. Tan et al. reported more than 20 proteins encoded by the SARS-CoV genome [17]. Jiang et al. were the first to use DIGE to analyze SARS-CoV-infected Vero E6 cells and identified 355 candidate DEPs [18], of which 186 were significantly differentially expressed, providing clues to the mechanisms of SARS-CoV infection and pathogenesis [19]. SILAC-based quantitative analysis of SARS-CoV-positive BHK21 cells showed that BAG3 restricts SARS-CoV replication [20]. Moreover, analysis of the serum proteome of SARS-CoV-positive patients has helped identify biomarkers for diagnosis, prognosis, and treatment [21]. We therefore used proteomics in this study to probe potential protein expression changes between SARS-CoV-2-infected patients and healthy people.

In serum and plasma, high-abundance proteins such as serum albumin, transferrin, haptoglobin, immunoglobulins, and lipoproteins interfere with the identification of low-abundance biomarkers [22]. Protein concentrations in serum vary greatly, and the 22 most abundant proteins account for 99% of total serum protein. Human serum is estimated to contain more than 10,000 proteins, most of which are present at low concentrations [23]. Reducing the complexity of serum samples, for example by removing high-abundance proteins, is therefore necessary to identify potential low-abundance disease-associated proteins. In this analysis, dyes and protein A were used to remove the high-abundance protein albumin from the human serum samples.
Additionally, we used Y3 ultrafiltration tubes to ultrafiltrate the samples by centrifugation, reducing the concentrations of salt ions and lipids and thereby eliminating their interference with isoelectric focusing.

The study identified a total of 24 DEPs, of which 10 were up-regulated and 14 were down-regulated. S100A9, which was up-regulated in this analysis, is a member of the S100 calcium-binding protein family. The gene encoding S100A9 is located at 1q21, a chromosomal region of poor stability that is prone to various rearrangements. Previous reports suggest that S100A9 is involved in the terminal differentiation of epithelial cells [24]. S100A9 often forms a heterodimeric complex with S100A8 with fold–fold symmetry, yielding the immunogenic protein calprotectin [25]. The complex was initially known to be secreted by neutrophils, and subsequent studies showed that it plays a crucial role in chronic and acute inflammation. It participates in a variety of inflammatory reactions that help the host clear tumor and diseased cells [26]. PIGR (polymeric immunoglobulin receptor), which was down-regulated in this analysis, is an important component of mucosal immunity. It is mainly distributed on the epithelial surfaces of the respiratory, gastrointestinal, and reproductive tracts. PIGR can be transported into the mucosal lumen by transcytosis to form secretory IgA, which can bind bacteria, viruses, parasites, and protoxins [27]. One study showed that secretory IgA binds HIV surface proteins, hindering adhesion between HIV and cells and thereby halting intracellular viral replication [28].
Another study suggested that the secretory component can resist degradation by neutrophil elastase and further enhance humoral immunity in the respiratory tract, which is suited to combating viral infection [29].

The C4B subunit of complement C4 was also down-regulated in the serum of infected patients. C4 is a major component of the classical complement pathway. C4B preferentially reacts with hydroxyl groups to form ester bonds, which is important for forming complement immune complexes with soluble protein antigens [30]. Genetic defects in C4B have been associated with human immune complex diseases [31], and aberrations in the complement system are associated with disorders such as coagulation dysfunction [32]. Complement components are thus important regulators of inflammatory and immune responses, and the reduction in serum C4B associated with SARS-CoV-2 infection warrants detailed study. Furthermore, IL6R protein expression was up-regulated in the infection group. IL-6 is a key inducer of the synthesis of acute-phase proteins such as C-reactive protein. Consistent with our results, another study showed that serum IL-6 levels increase in acute and chronic inflammatory diseases [33]. Pathogens can stimulate endothelial cells and other vascular components to produce IL-1β and/or TNF-α, which induce large amounts of cytokines such as IL-6, in turn leading to proteolytic cleavage of IL-6R. IL-6 and IL-6R then activate IL-6 signaling in tissue cells, suppressing leukocyte-recruiting chemokines (CXCL1, CXCL8, and CX3CL1) and enhancing the activity of others (CCL2, CCL8, and CXCL5) [34]. IL-6 and IL-6R therefore help control and alleviate acute neutrophil exudation in the human immune system.

Notably, IGLV3-19, IGLV3-1, and IGLV5-45 were also up-regulated in this analysis. These proteins derive from immunoglobulin lambda light chain variable regions, which are closely related to antigen–antibody reactions and to drug target development.
