Integration of SARS-CoV-2 RNA in infected human cells by retrotransposons: an unlikely hypothesis and old viral relationships

The COVID pandemic that started at the end of 2019 led to a remarkable mobilization of scientific efforts, as evidenced by > 175,000 publications to date. Among these, the work by Zhang et al. triggered an animated debate in the scientific community [1]. Based on studies performed in cultured cells transfected with DNA encoding the retrotransposon L1 (long interspersed nuclear element 1), the authors proposed that SARS-CoV-2 RNA, in particular the subgenomic RNA encoding the nucleocapsid (NC), can be converted into dsDNA and integrated into the cellular genome by the L1 retrotransposition machinery [1]. According to this model, the integrated SARS-CoV-2 sequences can be expressed in patients as chimeric cellular-viral transcripts, which could explain the long-term PCR positivity for viral RNA in patients who recovered from COVID. A similar hypothesis was proposed by Yin and co-workers, who observed that infection by SARS-CoV-2 (as well as other human coronaviruses) causes upregulation of retrotransposon expression, leading to the formation of chimeric virus-retrotransposon transcripts [2].

These original reports opened a heated debate on the correctness of the findings and their relevance for recovered COVID patients, and subsequent work was initiated to test alternative explanations. It was proposed early on that the observed chimeric RNAs could be artifacts generated during cDNA library preparation. Two findings hint at this possibility. First, the directionality of the observed chimeric transcripts is unexpected: a large fraction of the SARS-CoV-2 RNA derives from the (–) strand, in contrast to the predominance of (+) strand RNAs in natural SARS-CoV-2 infection. Second, the chimeric sequences lack the 3’ end and polyA tail of the viral genome, which are commonly present in integrated sequences processed by L1 elements.

The origin of the chimeric human-SARS-CoV-2 reads in RNA-seq libraries was subsequently investigated in a dedicated study [3], which showed that such hybrid sequences also arose between SARS-CoV-2 RNA and transcripts encoded by mitochondrial DNA or episomal adenoviral DNA in transfected cells, and are thus unlikely to be the result of genuine SARS-CoV-2 integration. Other studies focused on detecting SARS-CoV-2 retrotransposition events in deep sequencing data, confirming the absence of genuine L1-mediated integration events and suggesting that the observed chimeric transcripts had emerged during RNA-seq library construction [4,5,6,7]. Importantly, such chimeric reads were also identified when RNA from infected human cells was mixed before library preparation with RNA from uninfected or unrelated vertebrate cells [4, 7]. In addition, the lack of reproducibility of the observed host-virus chimeric transcripts across SARS-CoV-2 patient samples corroborated the idea that these sequences arose from stochastic, artifactual events at the RNA-seq level (e.g. random ligations, template switching and/or sequence alignment errors). Consistent with this notion, the chimeric reads are mostly composed of abundantly expressed cellular and viral transcripts. Taken together, the evidence for genuine SARS-CoV-2 DNA formation and integration remains sparse [7].

To put these recent reports in a broader context, some consideration should be given to the molecular biology of L1 elements and their interplay with viruses. L1 elements represent the most abundant subfamily of non-LTR retrotransposons, accounting for ∼17% of the human genome. L1 elements are autonomous: they encode two proteins (ORF1 and ORF2) that together mediate reverse transcription of their own RNA and subsequent integration of the resulting dsDNA into the cellular genome [8]. This process shows some cross-activity on non-autonomous retrotransposons. Despite the accumulation of inactivating mutations, a subset of 80–100 L1 elements remains active in the human genome. Accordingly, L1 retrotransposition has been observed at early stages of embryonic development, and > 100 de novo L1 insertions have been linked to heritable genetic disorders [9]. Beyond the germ line and pluripotent stem cells, L1 activity has been reported at the somatic level in neuronal progenitors and various human tumors, where it is possibly responsible for mutagenic events [9]. For these reasons, L1 elements are being studied intensively in diverse diseases, and they were reported to be upregulated in different pathologies, especially cancer. However, there is no direct evidence for retrotransposition as a cause of disease. This also holds true for the multi-step process of tumorigenesis, where putative LINE contributions could be due to indirect effects, e.g. non-specific epigenetic changes in cancer cells.

There are few reports on L1-mediated mobilization of viral transcripts. In hepatocellular carcinoma (HCC) induced by hepatitis B virus (HBV), recurrent integration of HBV subgenomic RNAs was reported to yield a chimeric long non-coding RNA between the HBV mRNA for the X antigen (HBx) and L1 RNA in > 23% of patient samples [10]. Of note, this HBx-L1 chimeric RNA is reported to promote malignant transformation and hepatic injury [11]. Unlike for retroviruses, integration is not a mandatory step in the HBV replication cycle, and the mechanism of HBV integration in HCC cells remains poorly characterized. The observation that ∼90% of HBV-induced HCC cells contain at least one integrated HBV-DNA fragment, combined with the preferential localization of these fragments in or near repetitive elements, could cautiously suggest a possible role of L1 elements in the mobilization of short HBV transcripts [12]. This scenario is consistent with the fact that HBV replication occurs in the nucleus and is corroborated by the presence of HBV integrations in most HCC samples, whose abundance seems to correlate negatively with patient survival [12]. Of note, ∼40% of viral breakpoints observed upon HBV integration are restricted to an 1800-bp genome portion including the viral enhancer, X gene and core gene, which may contain features that are recognized by the L1 machinery. It may be a coincidence, but the size of this mobilized HBV genome portion is comparable to that of the mobilized SARS-CoV-2 RNA fragment (1,662 bp) reported by Zhang et al. [1]. Specific breaks in the viral genome also occur during SV40-BK virus oncogenesis, leading to upregulated expression of the viral oncogene. It is important to stress that integration as detected in tumor cells does not occur during normal virus replication.

The most remarkable case of L1-virus interplay does not, however, involve “modern” human viruses, but rather a group of human endogenous retroviruses (HERVs) that were acquired by the primate genome some 20–43 million years ago through infection of the germ line by now-extinct retroviruses [13]. The hallmark of all retroviruses is reverse transcription of their RNA genome into dsDNA that integrates into the genome of the infected cell. Hence, germline integration of these ancestral retroviruses allowed their inheritance as Mendelian genes and vertical transmission to the offspring. HERV retrotransposons currently constitute ∼8% of our genome and have occasionally been co-opted for novel and important physiological processes such as placenta formation [14, 15]. The HERV-W group is unique for its colonization dynamics: among the 213 members, 135 (63%) are not direct retroviral integrations, but rather processed pseudogenes that were generated through mobilization of HERV-W transcripts by the L1 machinery [13, 16]. Only this HERV group shows such L1-dependency, although the determinants for the specific interaction with L1 remain unclear. Sequence analyses indicated that mobilization is 2.5-fold more efficient for subgroup 1 HERV-W members, suggesting the presence of preferential sequence signatures for L1 recognition [13]. Besides retroviruses, which have reverse transcription and integration as a stable biological feature, an example of human endogenous viral elements (EVEs) whose formation likely involved L1 is provided by the bornavirus-like elements, i.e. the only non-retroviral RNA virus-derived EVEs [17]. This scenario is supported by the fact that most such elements originate from reverse transcription and integration of the mRNA coding for the ancient bornavirus nucleoprotein, with genomic localization and flanking sequences being consistent with L1 action [18].
