Pathogens, Vol. 12, Pages 43: Selective Depletion of ZAP-Binding CpG Motifs in HCV Evolution

1. Introduction

Hepatitis C infection is a common cause of chronic liver disease and cirrhosis worldwide [1,2]. Current estimates put the global prevalence of hepatitis C virus (HCV) infection at 0.7% of the population, accounting for 56.8 million cases as of January 2020 [3]. The WHO estimated that in 2019, around 290,000 people died from hepatitis C, mostly from cirrhosis and hepatocellular carcinoma [4]. HCV is a positive-sense, single-stranded RNA virus belonging to the family Flaviviridae. Although there are 7 genotypes with over 65 subtypes, genotype 1 is the most prevalent, followed by genotypes 2 and 3 [5]. Other genotypes are geographically restricted [6].

The abundance of CpG dinucleotides in viruses and their association with virus evolution and host-evasion strategies have been explored for both RNA [7,8] and DNA viruses [9,10]. CpG content differs significantly amongst viruses [11,12]. The depletion of CpG dinucleotides in DNA viruses is attributed to several mechanisms, including the stimulation of TLR-9 immune responses and the deamination of cytosines that are methylated by DNA methyltransferases [10,11]. Although CpG dinucleotides in RNA virus genomes are not methylated, most RNA viruses infecting humans are CpG depleted [12]. A recent report identified selective binding of Zinc-finger Antiviral Protein (ZAP) to CpG-enriched motifs in viral genomes, leading to the restriction of virus replication [13]. Furthermore, the sequence of specific ZAP-binding motifs has also been identified [14]. The underlying mechanisms of ZAP-mediated restriction of RNA viruses are not well understood. CpG depletion in SARS-CoV-2 has been linked to ZAP-mediated selection pressure [7].

The majority of individuals infected with HCV progress to chronic HCV infection. Therefore, HCV has long-term interactions with the human host.
Although the impact of HCV on the methylation of host genes has been studied [15], the loss of CpG dinucleotides from HCV genomes during the last four decades of evolution in humans has not. Furthermore, the role of ZAP-mediated pressures, if any, in the evolution of HCV remains unknown. The treatment landscape for HCV infection has changed rapidly since the introduction of direct-acting antivirals (DAAs), which were first approved for use in 2011 for genotype 1 virus variants; second-generation DAAs were introduced in December 2013 [16]. Most patients on DAA regimens attained a sustained virological response (SVR) [17]; however, treatment failure was seen in about 5–10% of patients [18]. Moreover, DAA-mediated SVR does not eliminate the risk of developing hepatocellular carcinoma (HCC). The availability of over 3000 complete HCV genome sequences allows us to investigate CpG depletion, ZAP-mediated selection pressures, and the effect of DAAs, if any, on virus evolution.

2. Materials and Methods

We retrieved 3983 complete HCV genomes from the LANL HCV database (https://hcv.lanl.gov/; accessed on 28 September 2022; last GenBank update: 1 July 2022). We used the default conservative criteria of the database to exclude sequences with too many Ns (high content of non-ACTG characters), contaminants (likely contamination with a laboratory strain), synthetic sequences, sequences containing an artifactual deletion of >100 nt, and tiny sequences (Supplementary Table S1).

Mononucleotide and dinucleotide frequencies were calculated as percentages of the sequence length, excluding gaps (-) and Ns, from the MSA of HCV sequences. The dinucleotide observed/expected (O/E) ratio normalizes the abundance of a dinucleotide against that of its constituent mononucleotides, which helps distinguish changes in dinucleotide frequency from changes in the frequencies of the constituent mononucleotides. The dinucleotide O/E ratios were calculated using the formula:

$$(O/E)_{XpY} = \frac{f(XpY)}{f(X)\,f(Y)} \times G \tag{1}$$

where f(XpY) = observed frequency of the dinucleotide XpY, f(X) = frequency of nucleotide X, f(Y) = frequency of nucleotide Y, and G = genome length.
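
As an illustration of Equation (1), the following is a minimal Python sketch, not the authors' script. It assumes that f denotes raw counts (with counts, multiplying by G gives the usual normalized ratio), that gaps are removed before counting, and that G is the number of unambiguous bases. The function name `dinucleotide_oe` is hypothetical.

```python
def dinucleotide_oe(seq, x="C", y="G"):
    """O/E ratio of dinucleotide XpY per Equation (1), assuming f = raw counts."""
    # Assumption: alignment gaps are dropped so neighbours in the real genome are adjacent.
    s = seq.upper().replace("U", "T").replace("-", "")
    # Assumption: G is the number of unambiguous bases (Ns and other codes excluded).
    bases = [b for b in s if b in "ACGT"]
    g = len(bases)
    n_x = bases.count(x)
    n_y = bases.count(y)
    # Count every position where X is immediately followed by Y.
    n_xy = sum(1 for a, b in zip(s, s[1:]) if a == x and b == y)
    if n_x == 0 or n_y == 0:
        return float("nan")  # ratio undefined when a constituent base is absent
    return n_xy * g / (n_x * n_y)


# Toy sequence: 2 CpGs, 2 Cs, 2 Gs, length 6 -> 2 * 6 / (2 * 2)
print(dinucleotide_oe("ACGCGT"))  # 3.0
```

A value below 1 indicates that the dinucleotide is rarer than its base composition alone would predict, which is the sense in which CpG depletion is reported here.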

ZAP-binding motifs, i.e., C(n_m)G(n)CG, where n is any nucleotide and the spacer length m = 4, 5, 6, 7, or 8, were located in the sequences using Python’s re module (v 2.2.1). GC content was calculated using Biopython’s (v 1.79) built-in method.
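
The motif search described above can be sketched with a single regular expression. This is an illustrative reconstruction rather than the code from the study's repository: the pattern, the use of a lookahead to allow overlapping hits, and the function name `find_zap_motifs` are all assumptions. With this pattern, each start position is reported at most once, even if more than one spacer length would match there.

```python
import re

# C, a spacer of 4-8 nucleotides, G, one nucleotide, then CG.
# The lookahead (?=...) lets overlapping motifs be found (an assumption, see above).
ZAP_MOTIF = re.compile(r"(?=(C[ACGT]{4,8}G[ACGT]CG))")


def find_zap_motifs(seq):
    """Return (0-based start, matched motif) for each candidate ZAP-binding motif."""
    s = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    return [(m.start(), m.group(1)) for m in ZAP_MOTIF.finditer(s)]


print(find_zap_motifs("CAAAAGACG"))  # [(0, 'CAAAAGACG')]  spacer of 4
print(find_zap_motifs("CAAAGACG"))   # []  spacer of 3 is too short
```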

The multiple sequence alignment (MSA) of 2616 sequences was created using MAFFT v7.490 [19] in a single step. Each sequence was individually aligned to the H77 reference genome (Accession ID: NC_004102.1). A Python script was used to generate a mapping between pre-alignment and post-alignment reference sequence positions. The resulting MSA, with a total of 2616 full-length sequences, was used to count the sequences that had lost a ZAP-binding motif at each ZAP-binding motif site (i.e., n = 258 ZAP-binding motif sites present in the H77 HCV reference sequence).

The violin plots, line plots, scatter plots, and bar plots were created using seaborn (v 0.11.2). The moving-average plots for the number of CpGs and the number of ZAP-binding motifs in the reference sequence were generated in seaborn (v 0.11.2), with the calculations done using the pandas (v 1.3.5) rolling function with a window size of 500 bp (window labels were set at the centre of the window, and no points in the window were excluded from the calculations). Percentages of mononucleotides and dinucleotides were plotted along a time axis of 1-year intervals. Since multiple sequences are reported each year, a 95% confidence interval band was plotted alongside the mean values. The boxplots within the violin plots depict the lower quartile, the median, and the upper quartile. scipy (v 1.7.3) was used to compare the medians in the violin plots, and Pearson’s correlation coefficient was determined for the numbers of CpGs and ZAP-binding motifs using scipy (v 1.7.3). Bar plots showing the loss of CpG motifs and ZAP-binding motifs were created using seaborn (v 0.11.2) from the median values of the violin plots, and the numbers of CpGs and ZAP-binding motifs were plotted as they occur in the reference sequence. The code used for the analysis in this study is available in the GitHub repository (https://github.com/iamakhilverma/hcv_seqs_analysis.git; uploaded on 7 December 2022).
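
The mapping between pre-alignment and post-alignment reference positions mentioned above can be illustrated with a short sketch. It is not taken from the study's repository; the function name `reference_to_alignment_map`, the 1-based numbering, and the use of "-" as the gap character are assumptions made for the example.

```python
from typing import Dict


def reference_to_alignment_map(aligned_ref: str, gap: str = "-") -> Dict[int, int]:
    """Map 1-based ungapped reference positions to 1-based alignment columns."""
    mapping: Dict[int, int] = {}
    ref_pos = 0
    for col, ch in enumerate(aligned_ref, start=1):
        if ch != gap:
            # Each non-gap character advances the ungapped reference coordinate.
            ref_pos += 1
            mapping[ref_pos] = col
    return mapping


# Toy aligned reference with gaps introduced by insertions in other sequences.
print(reference_to_alignment_map("AC--GT-A"))
# {1: 1, 2: 2, 3: 5, 4: 6, 5: 8}
```

With such a map, a motif site defined on the ungapped H77 coordinates can be looked up as a column range in the MSA, so that every aligned sequence is inspected at the same site.
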
The statistical calculations for the bar plots (Chi-square tests) were performed in GraphPad.

4. Discussion

High mutation rates, recombination, and mutations in the virus polymerase have been associated with the high genetic diversity of the HCV genome [22]. As a result, HCV genotypes may have up to 30% genetic diversity at specific genomic regions [23]. In addition to subtypes within a genotype, HCV quasispecies, or viral variants within an infected host, add to its diversity [24]. Studies on CpG depletion in other RNA viruses have provided interesting insights into virus evolution, pathogenesis, and adaptation to the host [25,26,27,28]. Therefore, the evolution of CpGs in HCV in humans over the last four decades represents an interesting but as yet unexplored opportunity. We analysed a total of 2616 HCV genomes from 1977 to 2021 and found a significant reduction in CpG numbers, CpG O/E ratios, and ZAP-binding motifs over time. Contemporary HCV genomes have significantly reduced CpG content and fewer ZAP-binding motifs than historical HCV sequences. These findings suggest a role for CpG depletion in shaping the evolution of HCV. Previous studies have shown that CpG depletion in virus genomes is pronounced during host adaptation [7,27]. In addition, CpG content remains stable for well-adapted human viruses such as influenza B virus [27]. Our findings indicate that the CpG content of HCV genomes still appears to be evolving, suggesting ongoing adaptation to the human host. This is consistent with a report suggesting that the most recent common ancestor of HCV (subtype 1b) infections in humans may date back to the early 1900s [29].

The trend of declining CpG numbers, CpG O/E ratios, and numbers of ZAP-binding motifs in the HCV genome over time was briefly reversed during 2013–2015, when an increase in CpG content and ZAP-binding motifs is evident (Figure 1D–F).
Interestingly, this period overlaps with the timeline for the approval of combination DAA therapy [30]. The introduction of antiviral drugs may limit the genetic diversity of viruses in the host, as only a small subset of the virus population with resistance mutations is able to survive. This genetic bottleneck may also lead to the emergence of new drug-resistant variants [31]. We speculate that evolutionary constraints associated with the introduction of combination DAA therapy for HCV may have impacted the evolution of CpGs and ZAP-binding motifs from 2013 to 2015. Previous reports indicate that ribavirin (an anti-HCV agent) leads to the accumulation of mutations at specific genomic locations [32]. Furthermore, some of the anti-HCV DAAs target the HCV RNA-dependent RNA polymerase [30], which may directly impact the type of mutations occurring in the HCV genome.

Apart from ZAP-mediated selection pressures, other selection pressures, including TLR7-mediated immune selection pressures [7], host-specific selection pressures [12], and tissue-specific selection pressures [26], may be associated with the depletion of CpGs in RNA viruses. Therefore, the number of CpGs may not necessarily correlate with the number of ZAP-binding motifs for a given RNA virus. Nonetheless, we found a good correlation between the CpG numbers and the number of ZAP-binding motifs in HCV genomes (Figure 2). This finding suggests that CpG numbers in HCV genomes may be surrogates for the number of ZAP-binding motifs. The role of ZAP-mediated selection pressures in shaping RNA virus evolution has not been well studied. In SARS-CoV-2, the depletion of CpGs has been primarily attributed to pressures acting outside the ZAP-binding motifs. Our finding that ZAP-mediated selection pressure is a major driver of HCV evolution (Figure 3) highlights fundamental differences in the evolutionary pressures underlying CpG depletion among RNA viruses infecting humans.
The underlying reasons for the contrasting roles of ZAP-mediated selection among human viruses remain elusive. The liver is one of the human tissues in which ZAP is highly expressed (Tissue Atlas) [33]. A potential role for tissue-specific expression of ZAP and of the co-factors necessary for ZAP-mediated restriction merits further investigation.

Among the HCV genes, the HCV core gene is enriched for both CpGs and ZAP-binding motifs. Although ZAP-binding motifs in the HCV genome are depleted with time due to selection pressures, ZAP-binding motifs within the HCV core gene appear to be well conserved (Figure 4). The HCV core protein is a basic protein that interacts with HCV RNA, oligomerizes, and facilitates virus assembly [34]. In addition, the HCV core protein is a nucleic acid chaperone [35]. Mutations and deletions in the N-terminus of the HCV core have been shown to impact virus assembly [36]. We have not identified the specific reasons for the conservation of ZAP-binding motifs in the HCV core gene. Nonetheless, the selective conservation of ZAP-binding motifs in specific genes in virus genomes may indicate the existence of as yet unknown constraints that minimize the loss of CpGs/ZAP-binding motifs. Importantly, this finding also suggests that, in such genes, the benefits of retaining CpGs/ZAP-binding motifs outweigh the survival/replication advantages associated with escaping ZAP-mediated restriction in the host. We also found that the loss of CpGs within the HCV core gene occurs primarily outside ZAP-binding motifs (Figure 5), suggesting the existence of gene-specific differences in selection pressures.
