A total of 17,162 variants were retrieved from the three databases: 1,514 from gnomAD, 802 from COSMIC, and 14,846 from Ensembl (Fig. 1A). Variants were then grouped by consequence into categories that include missense variants, frameshift variants, 5′ UTR variants, 3′ UTR variants, and splice variants (Fig. 1B). From these data, 674 unique non-synonymous ADGRE2 variants were selected for further analysis. The ADGRE2 gene comprises 21 exons. Exon 16 encodes the highest number of amino acid residues, whereas exon 9 contained the highest number of non-synonymous variants (Fig. 1C).
Fig. 1 Identification of ADGRE2 genetic variants. (A) ADGRE2 variants retrieved from Ensembl, COSMIC, and gnomAD. (B) Total count of the various ADGRE2 variants retrieved from the three databases. (C) Frequency of amino acids per exon and variants per exon
Variant pathogenicity analysis
The pathogenicity of the filtered non-synonymous variants was evaluated with the computational tools described in the methodology. The pathogenicity scores returned by the tools were collected for each variant, and a pathogenicity percentage was calculated per variant (Fig. 2A). A pathogenicity percentage of > 80% was set as the threshold for this study. Of the 674 initial non-synonymous variants, only one had a pathogenicity percentage of > 80% (Fig. 2B). This variant, rs765071211, had the highest pathogenicity percentage (100%) and was therefore selected for further in silico analysis. It causes an amino acid substitution (D67N) at residue 67 in the Egfca_6 domain of EMR2, the protein encoded by ADGRE2.
Fig. 2 Analysis of pathogenicity of ADGRE2 non-synonymous variants. (A) Filtration of SNPs through various algorithms. (B) Pathogenicity of non-synonymous SNPs predicted through computational tools (SIFT, PolyPhen, CADD, REVEL, MetaLR)
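The pathogenicity percentage described above can be illustrated with a minimal sketch. It assumes the percentage is the share of tools that call a variant deleterious; the per-tool cut-offs and the example scores below are hypothetical and are not taken from this study.

```python
# Hypothetical per-tool cut-offs (assumptions for illustration only;
# the cut-offs actually used in the study are not stated in this section).
DELETERIOUS_RULES = {
    "SIFT": lambda s: s < 0.05,      # lower SIFT scores indicate more damage
    "PolyPhen": lambda s: s > 0.908,
    "CADD": lambda s: s >= 20,
    "REVEL": lambda s: s > 0.5,
    "MetaLR": lambda s: s > 0.5,
}


def pathogenicity_percentage(scores):
    """Percentage of tools with an available score that call the variant deleterious."""
    calls = [
        DELETERIOUS_RULES[tool](value)
        for tool, value in scores.items()
        if tool in DELETERIOUS_RULES and value is not None
    ]
    if not calls:
        raise ValueError("no usable tool scores")
    return 100.0 * sum(calls) / len(calls)


def passes_threshold(scores, threshold=80.0):
    """Apply the strict '> 80%' filter described in the text."""
    return pathogenicity_percentage(scores) > threshold


# Made-up scores, not real predictions for any ADGRE2 variant.
all_damaging = {"SIFT": 0.0, "PolyPhen": 0.99, "CADD": 28.0, "REVEL": 0.9, "MetaLR": 0.8}
four_of_five = dict(all_damaging, SIFT=0.2)

print(pathogenicity_percentage(all_damaging), passes_threshold(all_damaging))
print(pathogenicity_percentage(four_of_five), passes_threshold(four_of_five))
```

Under this assumed definition, a variant flagged by four of five tools scores exactly 80% and would not pass a strict > 80% filter, which is consistent with only a variant flagged by every tool being retained.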
Effect of rs765071211 polymorphism on RNA secondary structure
The impact of the selected SNP on the secondary structure of ADGRE2 mRNA was also predicted. For this purpose, the minimum free energy (MFE) values of the variant and wild-type mRNA were determined and compared. The secondary structure of the variant mRNA differed from that of the wild-type mRNA, indicating that the altered allele affects the structure and stability of ADGRE2 mRNA. The reference allele C was stabilizing, with a low MFE value (-37 kcal/mol), whereas the corresponding variant allele T destabilized the mRNA secondary structure and had a higher MFE value of 36 kcal/mol (Fig. 3). Decreased mRNA stability can alter the levels of the translated protein [40].
Fig. 3 Effect of ADGRE2 variant on its RNA secondary structure. The lower the value of MFE, the higher the stability of mRNA secondary structure and vice versa
Structure prediction and validation
EMR2 is the protein encoded by the ADGRE2 gene. Its structure was predicted with I-TASSER, which uses a threading approach. I-TASSER returned five models of the EMR2 protein, and the model with the highest C-score was selected for further analysis. The three-dimensional structure was visualized in PyMOL and validated with ERRAT, which gave an overall quality factor of 94.4809. Ramachandran plot analysis with PROCHECK showed that 88.4% of residues were in the most favored regions, 10.4% were in the additional allowed regions, and only 0.3% were in the disallowed regions (Fig. 4A). The 3D structure of the D67N variant was obtained by introducing the amino acid substitution with the PyMOL mutagenesis plugin. Superimposition of the wild-type and variant proteins gave an RMSD of 6.34 Å, indicating a high degree of structural deviation in the variant (Fig. 4B). Higher RMSD values indicate greater structural dissimilarity between two protein structures [41].
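For reference, the RMSD reported above is the square root of the mean squared distance between paired atoms. The following is a minimal sketch of that calculation, assuming the two structures are already superimposed and their atoms are paired in the same order; it does not perform the alignment step that PyMOL carries out, and the coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented coordinates: every atom is displaced by 5 Å, so the RMSD is 5 Å.
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
displaced = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
print(rmsd(reference, displaced))
```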
EMR2 protein domains were also analyzed with InterPro, and their constituent amino acids were highlighted in PyMOL (Fig. 4C). EMR2 was found to have seven domains. The Egf-like 1 domain contained amino acids 28 to 66. Residues 67–118, 119–162, 163–211, and 212–260 formed the Egfca-like 2, Egfca-like 3, Egfca-like 4, and Egfca-like 5 domains, respectively. The GPS_3 domain comprised 52 amino acid residues, from 478 to 529. Lastly, 7tmB2 EMR was the largest domain, with 262 amino acids from 533 to 795.
Fig. 4 Prediction and validation of EMR2 3D structure. (A) Ramachandran plot of EMR2 predicted structure showing 98.8% of residues in allowed regions. (B) Superimposition of native EMR2 (cyan) and D67N variant (magenta); RMSD was 6.34 Å. (C) ADGRE2 domains including egf_5 (raspberry color), egfca_6 (1) (marine blue), egfca_6 (2) (sand color), egfca_6 (3) (magenta color), EGF_3, GPS_3 (cyan color) and 7tmB2 EMR domain (orange color)
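The domain boundaries listed in the text can be checked with a short sketch that computes inclusive lengths and looks up the domain containing a given residue. The boundaries are those reported above; the function names are illustrative. Note that, counted inclusively, the range 533 to 795 spans 263 positions, one more than the 262 reported, so one of those boundaries may have been treated as exclusive.

```python
# Domain boundaries as reported in the text (1-based, assumed inclusive).
DOMAINS = [
    ("Egf-like 1", 28, 66),
    ("Egfca-like 2", 67, 118),
    ("Egfca-like 3", 119, 162),
    ("Egfca-like 4", 163, 211),
    ("Egfca-like 5", 212, 260),
    ("GPS_3", 478, 529),
    ("7tmB2 EMR", 533, 795),
]


def domain_length(start, end):
    """Number of residues in an inclusive range."""
    return end - start + 1


def domain_of(residue):
    """Return the name of the domain containing the residue, or None."""
    for name, start, end in DOMAINS:
        if start <= residue <= end:
            return name
    return None


for name, start, end in DOMAINS:
    print(name, start, end, domain_length(start, end))

print(domain_of(67))  # residue affected by D67N
```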
Stability analysis
Most pathogenic non-synonymous SNPs are generally considered to affect protein structural stability. The effect of the D67N variant on EMR2 stability was assessed with MUpro, I-Mutant 2.0, and DynaMut 2, and the predicted free energy changes were compared. I-Mutant 2.0 gave a DDG value of -0.44 kcal/mol for the D67N variant, indicating decreased stability. MUpro gave a similar result, with a DDG value of -1.1507558 kcal/mol. DynaMut 2 also predicted a negative DDG value for D67N (-0.04 kcal/mol), indicating lower stability. All three tools therefore predicted the D67N variant of EMR2 to be destabilizing (Fig. 5), which could disrupt the normal structure and function of the protein.
Fig. 5 Stability analysis of D67N through I-Mutant 2.0, MUpro, and DynaMut 2
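The sign convention applied to the three predictions can be summarized in a minimal sketch. It assumes, as the text does, that a negative DDG indicates decreased stability; the values are those reported above, and the function names are illustrative.

```python
# DDG values (kcal/mol) reported in the text for D67N.
PREDICTIONS = {
    "I-Mutant 2.0": -0.44,
    "MUpro": -1.1507558,
    "DynaMut 2": -0.04,
}


def classify_ddg(ddg):
    """Assumed convention: negative DDG = destabilizing, positive = stabilizing."""
    if ddg < 0:
        return "destabilizing"
    if ddg > 0:
        return "stabilizing"
    return "neutral"


def all_destabilizing(predictions):
    """True when every tool predicts a destabilizing effect."""
    return all(classify_ddg(v) == "destabilizing" for v in predictions.values())


for tool, ddg in PREDICTIONS.items():
    print(tool, ddg, classify_ddg(ddg))
print(all_destabilizing(PREDICTIONS))
```

The magnitudes differ considerably between tools, so the sketch reports only the direction of the predicted effect rather than averaging the values.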
Molecular characteristics analysis
The molecular characteristics of the non-synonymous SNP D67N were also investigated. Intramolecular interactions of the wild-type and variant amino acids were determined and compared with DynaMut 2. The wild-type amino acid formed nine intramolecular interactions (two hydrophobic, four polar, and three hydrogen bonds), whereas the variant amino acid formed five (two hydrogen bonds and three polar bonds) (Fig. 6A). Project HOPE showed that the D67N substitution changes the net charge of the protein: the negatively charged wild-type residue is replaced by a neutral amino acid. The analysis also indicated that this SNP can lead to loss of interactions with other molecules and abolish the function of the protein (Fig. 6B).
Fig. 6 Molecular characteristics of wildtype and D67N EMR2 proteins. (A) Analysis and comparison of intramolecular interactions of wildtype and variant amino acids with neighboring residues. (B) Structural analysis through Project HOPE indicating that negatively charged aspartic acid (D) mutates into electrochemically neutral amino acid asparagine (N)
Evolutionary conservation analysis
Amino acids located in biologically active regions show high sequence conservation, and variation at these residues can disrupt the normal biological activity of the protein. The ConSurf server was used to evaluate the evolutionary conservation of individual amino acid residues of the EMR2 protein. Although the tool analyzed the complete protein, attention was restricted to the residue affected by the most pathogenic non-synonymous SNP. ConSurf analysis revealed that the aspartic acid at residue 67 was highly conserved, with a conservation score of 9. ConSurf also predicted this residue to be conserved and exposed, with high functional significance (Fig. 7). Highly conserved amino acids on the protein surface are likely to be structurally or functionally important.
Fig. 7 Illustration of evolutionary conservation of EMR2 amino acid residues
Sub-cellular localization
The subcellular localization of the ADGRE2-encoded protein was predicted with DeepLoc 1.0, which returned the potential localization sites together with their likelihood scores. EMR2 was predicted to be a membrane-associated protein located primarily in the plasma membrane, with a likelihood score of 0.9995. This agrees with the literature, as ADGRE2 is known for its role in cell-cell interaction. The predicted localization is shown in Fig. 8.
Fig. 8 Subcellular localization of EMR2 protein
Genotype analysis
Genotype analysis was performed on DNA samples extracted from both controls and CML patients to determine the presence of rs765071211 (C/T) in the ADGRE2 gene, using ARMS PCR. The genotype distribution of rs765071211 in CML-positive samples and controls is given in Table 3. The wild-type genotype CC showed no statistically significant association with either the control or the disease group. In contrast, the variant genotype TT was statistically significant (P < 0.005) and was associated with an elevated risk of CML, with an odds ratio (OR) of 7.278 and a relative risk (RR) of 2.381. The heterozygous genotype CT was also statistically significant (P < 0.005) but had a protective effect (RR = 0.3000; OR = 0.1250).
Table 3 Genotype analysis of rs765071211 (C/T) through ARMS PCR
Allele frequencies for the ADGRE2 variant rs765071211 were also determined (Table 3). The frequency of the reference allele C was higher in healthy controls than in the CML group, and the allele had a protective effect (P < 0.005, OR = 0.3245, RR = 0.5759). Allele T, on the other hand, was more abundant in the CML group (72.41%) than in the control group and was significantly associated with the disease (OR = 3.082, RR = 1.737, P < 0.005).
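The odds ratios and relative risks quoted above follow from a 2×2 table of carriers and non-carriers among cases and controls. The sketch below shows the standard formulas under that layout; the counts are hypothetical and are not the study's data, since Table 3 is not reproduced in this section.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table.

    a: cases with the genotype/allele      b: controls with it
    c: cases without the genotype/allele   d: controls without it
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when b or c is zero")
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """RR = proportion of cases among carriers / proportion among non-carriers."""
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("relative risk undefined for this table")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts for illustration only (not taken from Table 3).
a, b, c, d = 20, 10, 10, 20
print(odds_ratio(a, b, c, d))     # 4.0
print(relative_risk(a, b, c, d))
```

Values above 1 indicate an association with the disease group and values below 1 a protective effect, which is how the TT and CT results are interpreted in the text. Swapping the carrier and non-carrier rows inverts the odds ratio, which matches the reported allele-level values (1 / 3.082 ≈ 0.3245).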
The ADGRE2 polymorphic variant was also compared between CML patients and controls with respect to gender (Table 4). The variant genotype TT was statistically significant in both males and females and was associated with the disease, whereas the heterozygous genotype CT had a significant protective effect in both sexes (Table 4).
Table 4 Genotype analysis on the basis of gender: comparison of ADGRE2 polymorphism rs765071211 (C/T) in CML patients and controls