$$\left(\frac{O}{E}\right)_{XpY} = \frac{f(XpY)}{f(X)\,f(Y)} \times G \tag{1}$$
where f(XpY) is the observed frequency of dinucleotide XpY, f(X) is the frequency of nucleotide X, f(Y) is the frequency of nucleotide Y, and G is the genome length.

ZAP-binding motifs were identified in the sequences using Python's re module (v 2.2.1). These motifs have the form C(n_m)G(n)CG: a C, then m nucleotides of any identity (m = 4, 5, 6, 7, or 8), then a G, any single nucleotide, and a CG. GC content was calculated using Biopython's (v 1.79) built-in method.
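The quantities in Eq. (1), the motif search and the GC calculation can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code. The function names `dinucleotide_oe`, `find_zap_motifs` and `gc_content` are hypothetical. The sketch makes four further assumptions:

- the f() values are raw counts, so multiplying by G gives the usual O/E ratio;
- sequences use the DNA alphabet ACGT;
- overlapping motif occurrences are reported separately for each spacer length m;
- `gc_content` is a plain-Python stand-in for Biopython's GC helper.

```python
import re


def dinucleotide_oe(seq: str, x: str = "C", y: str = "G") -> float:
    """Observed/expected ratio of dinucleotide XpY as in Eq. (1).

    Assumption: f() values are raw counts, so
    O/E = count(XpY) / (count(X) * count(Y)) * G, with G = sequence length.
    """
    seq = seq.upper()
    g = len(seq)
    f_x = seq.count(x)
    f_y = seq.count(y)
    if f_x == 0 or f_y == 0:
        return float("nan")  # ratio undefined unless both nucleotides occur
    # Scan every adjacent pair so overlapping dinucleotides (e.g. CC in CCC) are counted
    f_xy = sum(1 for i in range(g - 1) if seq[i] == x and seq[i + 1] == y)
    return f_xy / (f_x * f_y) * g


def find_zap_motifs(seq: str, spacers=range(4, 9)) -> list:
    """Return sorted (start, m) pairs for every C(n_m)G(n)CG match.

    Assumptions: ACGT alphabet; a zero-width lookahead reports overlapping
    matches; each spacer length m is searched separately.
    """
    seq = seq.upper()
    hits = []
    for m in spacers:
        # e.g. m = 7 gives the pattern (?=C[ACGT]{7}G[ACGT]CG)
        pattern = re.compile(rf"(?=C[ACGT]{{{m}}}G[ACGT]CG)")
        hits.extend((match.start(), m) for match in pattern.finditer(seq))
    return sorted(hits)


def gc_content(seq: str) -> float:
    """GC percentage (stand-in for Biopython's GC helper; ambiguity codes are ignored)."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


example = "CAAAAGACG"
print(dinucleotide_oe(example))  # 2.25
print(find_zap_motifs(example))  # [(0, 4)]
```

In the toy sequence, the single CpG occurs among two Cs and two Gs in 9 nt, giving 1 / (2 × 2) × 9 = 2.25. The lookahead pattern is used because a plain `finditer` would skip motifs that overlap an earlier match.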
The multiple sequence alignment (MSA) of the 2616 sequences was created in a single step using MAFFT v7.490 [19], with each sequence aligned individually to the H77 reference genome (Accession ID: NC_004102.1). A Python script was used to map each reference sequence position before alignment to its corresponding position after alignment. The resulting MSA of 2616 full-length sequences was used to determine, for each ZAP-binding motif site in the H77 HCV reference sequence (n = 258 sites), the number of sequences that had lost the motif.

The violin plots, line plots, scatter plots, and bar plots were created using seaborn (v 0.11.2). The moving average plots of the number of CpGs and the number of ZAP-binding motifs along the reference sequence were generated in seaborn (v 0.11.2). The averages were calculated using the rolling function of pandas (v 1.3.5) with a window size of 500 bp, with each window labelled at its centre and no points within the window excluded from the calculation. Percentages of mononucleotides and dinucleotides were plotted along a time axis with 1-year intervals. Because multiple sequences are reported each year, a 95% confidence interval band was plotted alongside the mean values. The boxplots within the violin plots depict the lower quartile, the median, and the upper quartile. scipy (v 1.7.3) was used to compare the medians in the violin plots and to determine Pearson's correlation coefficient between the numbers of CpGs and ZAP-binding motifs. Bar plots of the loss of CpGs and ZAP-binding motifs were created in seaborn (v 0.11.2) from the median values extracted from the violin plots. The numbers of CpGs and ZAP-binding motifs were plotted as found in the reference sequence. The code used for the analysis in this study is available in the GitHub repository (https://github.com/iamakhilverma/hcv_seqs_analysis.git; uploaded on 7 December 2022).
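The position mapping, the per-site loss count and the 500 bp moving average can be sketched with the standard library. This is a hypothetical reconstruction, not the code in the study's repository. The names `map_reference_positions`, `count_sequences_losing_site`, `cpg_indicator` and `centred_rolling_mean` are illustrative. The sketch assumes that:

- gaps are written as `-`;
- a site counts as lost when any of the motif's four fixed bases (C, G, C, G) differs at its aligned column.

The second point is an assumed criterion. The centred window follows the intent of pandas' `Series.rolling(window=500, center=True).mean()`. For an even window, however, its exact centre convention may differ from pandas by one position. Edge positions where the full window does not fit are returned as `None`.

```python
def map_reference_positions(aligned_ref: str) -> dict:
    """Map each ungapped (pre-alignment) reference position to its MSA column, both 0-based."""
    mapping = {}
    ungapped = 0
    for col, base in enumerate(aligned_ref):
        if base != "-":  # gap columns do not consume a reference position
            mapping[ungapped] = col
            ungapped += 1
    return mapping


def count_sequences_losing_site(aligned_ref, aligned_seqs, start, m):
    """Count aligned sequences that lost the ZAP motif C(n_m)G(n)CG found at
    ungapped reference position `start` with spacer length `m`.

    Assumed criterion: the motif is lost if any of its four fixed bases
    (C at 0, G at m+1, C at m+3, G at m+4) differs in the aligned sequence.
    """
    pos_to_col = map_reference_positions(aligned_ref)
    fixed = {0: "C", m + 1: "G", m + 3: "C", m + 4: "G"}
    cols = {pos_to_col[start + offset]: base for offset, base in fixed.items()}
    return sum(
        1
        for seq in aligned_seqs
        if any(seq[col].upper() != base for col, base in cols.items())
    )


def cpg_indicator(seq: str) -> list:
    """1 at each position where a CpG dinucleotide starts, else 0."""
    seq = seq.upper()
    return [1 if seq[i:i + 2] == "CG" else 0 for i in range(len(seq))]


def centred_rolling_mean(values, window=500):
    """Mean over positions i - window//2 up to (not including) i - window//2 + window;
    None where the full window does not fit inside the sequence."""
    n = len(values)
    half = window // 2
    prefix = [0]
    for v in values:  # prefix sums make each window sum an O(1) lookup
        prefix.append(prefix[-1] + v)
    out = [None] * n
    for i in range(n):
        lo, hi = i - half, i - half + window
        if lo >= 0 and hi <= n:
            out[i] = (prefix[hi] - prefix[lo]) / window
    return out


ref = "CAAAAGACG"
seqs = ["CAAAAGACG", "CAAAAGATG", "TAAAAGACG"]
print(count_sequences_losing_site(ref, seqs, start=0, m=4))  # 2
```

Applying `centred_rolling_mean(cpg_indicator(seq))` gives CpGs per base in each 500 bp window. Multiplying by the window length gives the CpG count per window.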
The chi-square tests for the bar plots were performed in GraphPad.

4. Discussion

High mutation rates, recombination, and mutations in the viral polymerase have been associated with the high genetic diversity of the HCV genome [22]. As a result, HCV genotypes may show up to 30% genetic diversity at specific genomic regions [23]. In addition to the subtypes within each genotype, HCV quasispecies (the viral variants within an infected host) add to its diversity [24]. Studies of CpG depletion in other RNA viruses have provided interesting insights into virus evolution, pathogenesis, and adaptation to the host [25,26,27,28]. The evolution of CpGs in HCV in humans over the last four decades therefore represents an interesting but as yet unexplored question. We analysed a total of 2616 HCV genomes from 1977 to 2021 and found a significant reduction in CpG numbers, CpG O/E ratios, and ZAP-binding motifs over time. Contemporary HCV genomes have significantly lower CpG content and fewer ZAP-binding motifs than historical HCV sequences. These findings suggest a role for CpG depletion in shaping the evolution of HCV. Previous studies have shown that CpG depletion in virus genomes is pronounced during host adaptation [7,27]. In contrast, CpG content remains stable in well-adapted human viruses such as influenza B virus [27]. Our findings indicate that the CpG content of HCV genomes still appears to be evolving, suggesting ongoing adaptation to the human host. This is consistent with a report suggesting that the most recent common ancestor of HCV subtype 1b infections in humans may date back to the early 1900s [29].

The declining trend in CpG numbers, CpG O/E ratios, and the number of ZAP-binding motifs in the HCV genome was briefly reversed during 2013–2015, when CpG content and the number of ZAP-binding motifs increased (Figure 1D–F).
Interestingly, this period overlaps with the approval of combination direct-acting antiviral (DAA) therapy [30]. The introduction of antiviral drugs may limit the genetic diversity of viruses in the host, because only the small subset of the virus population carrying resistance mutations is able to survive. This genetic bottleneck may also lead to the emergence of new drug-resistant variants [31]. We speculate that evolutionary constraints associated with the introduction of combination DAA therapy for HCV may have affected the evolution of CpGs and ZAP-binding motifs from 2013 to 2015. Previous reports indicate that ribavirin (an anti-HCV agent) leads to the accumulation of mutations at specific genomic locations [32]. Furthermore, some DAA anti-HCV drugs target the HCV RNA-dependent RNA polymerase [30], which may directly affect the types of mutations occurring in the HCV genome.

Apart from ZAP-mediated selection pressures, other selection pressures may be associated with the depletion of CpGs in RNA viruses. These include TLR7-mediated immune selection [7], host-specific selection [12], and tissue-specific selection [26]. Therefore, the number of CpGs need not correlate with the number of ZAP-binding motifs in a given RNA virus. Nonetheless, we found a good correlation between CpG numbers and the number of ZAP-binding motifs in HCV genomes (Figure 2). This suggests that CpG numbers in HCV genomes may serve as surrogates for the number of ZAP-binding motifs. The role of ZAP-mediated selection pressures in shaping RNA virus evolution has not been well studied. In SARS-CoV-2, the depletion of CpGs has been attributed primarily to pressures acting outside ZAP-binding motifs. By contrast, we find that ZAP-mediated selection pressure is a major driver of HCV evolution (Figure 3). Together, these observations highlight that CpG depletion in different RNA viruses infecting humans may reflect fundamental differences in evolutionary pressures.
The underlying reasons for the contrasting roles of ZAP-mediated selection among human viruses remain elusive. The liver is one of the human tissues in which ZAP is highly expressed (Tissue Atlas) [33]. The tissue-specific expression of ZAP, and of the co-factors required for ZAP-mediated restriction, may play a role and merits further investigation.

Among the HCV genes, the core gene is enriched for both CpGs and ZAP-binding motifs. ZAP-binding motifs across the HCV genome are depleted over time owing to selection pressures. However, those within the HCV core gene appear to be well conserved (Figure 4). The HCV core protein is a basic protein that interacts with HCV RNA, oligomerizes, and facilitates virus assembly [34]. In addition, the HCV core protein is a nucleic acid chaperone [35]. Mutations and deletions in the N-terminus of the HCV core have been shown to affect virus assembly [36]. We have not identified the specific reasons for the conservation of ZAP-binding motifs in the HCV core gene. Nonetheless, the selective conservation of ZAP-binding motifs in specific viral genes may indicate as yet unknown constraints that minimize the loss of CpGs/ZAP-binding motifs. Importantly, this finding also suggests that the benefits of retaining CpGs/ZAP-binding motifs may outweigh the survival/replication advantages associated with escaping ZAP-mediated restriction in the host. We also found that the loss of CpGs within the HCV core gene occurs primarily outside ZAP-binding motifs (Figure 5), suggesting gene-specific differences in selection pressures.