The COVID-19 pandemic caused by SARS-CoV-2 currently totals more than 105 million cases and 2 million deaths worldwide. Many prophylactic and therapeutic regimens1, 2 have been tested in randomised controlled trials (RCTs), but to date only dexamethasone3 and remdesivir4 have shown evidence of clinical benefit. The Spike protein drives SARS-CoV-2 infectivity: 30–40 Spike homotrimers are exposed on the envelope of each virion,5, 6 and each monomer consists of two domains (S1 and S2). The S1 domain includes the receptor-binding domain (RBD), whose most relevant region is the receptor-binding motif (RBM) (Figure 1). Anti-SARS-CoV-2 Spike antibodies can be grouped into 11 clusters according to epitope or into 4 classes according to mechanism of action (Table 1). Passive immunotherapies based on anti-Spike neutralising antibodies (nAbs), which develop in close to 90% of patients and persist for at least 5 months,7 have been widely explored. The nAbs isolated from SARS-CoV-2 patients are preferentially encoded by certain heavy-chain germline genes, and the two most frequently elicited antibody families (IGHV3-53/3-66 and IGHV1-2) can each bind the receptor-binding site (RBS) in two different binding modes.8 The first nAb-based therapeutic to be manufactured was COVID-19 convalescent plasma (CCP), whose efficacy seems promising9 but for which randomised controlled trials are still pending.10 Antiviral monoclonal nAbs have entered the market,11 and polyclonal IgG formulations (i.e., hyperimmune serum) will likely follow.12 All these antibody-based therapeutics, as well as vaccines, share one major risk: mutational escape of the Spike protein.13 Changes in the Spike protein might also increase transmissibility, leading to increased re-infection rates and reduced efficacy of vaccination campaigns.14 Please note that many of the references in this manuscript are preprints that have not yet been through the peer review process.
FIGURE 1. Linearised representation of single nucleotide polymorphisms (SNPs) and deletions commonly detected in the S1 and S2 domains of the Spike protein, with a focus on the receptor-binding domain (RBD) and receptor-binding motif (RBM). Circle size represents the relative abundance of each mutation in worldwide genome repositories as of January 2021. Mutations within the RBD are shown on a grey background.
TABLE 1. Competition clusters for anti-SARS-CoV-2 Spike monoclonal antibodies referred to in the text

| mAbs | Target | Cluster (adapted from Ref79) | Representative mAbs | Class (adapted from Ref [201]) | Representative mAbs |
|---|---|---|---|---|---|
| Neutralising | RBD | I | COVA2-16, COVA2-31, COVA2-23, COVA2-11, COVA3-06, COVA3-09, COVA2-29, COVA2-45, COVA1-18, COVA2-20, COVA2-39, COVA2-15 | 1 (block ACE2, accessibility to RBD epitope only in ‘up’ conformation) | C102, C105, B38, CC12.3 |
| Neutralising | RBD | III | COVA2-04, COVA2-13, COVA2-07, COVA2-24, COVA2-44, COVA1-16 | 2 (block ACE2, accessibility to RBD epitope in ‘up’/‘down’ conformations) | C002, C104, C119, C121, C144, COVA2-39, 5A6, P2B-2F6, Ab2-4, BD23 |
| Neutralising | RBD | VI | COVA1-01, COVA1-02, COVA1-27, COVA2-34, COVA1-12 | 3 (does not overlap with ACE2 binding site; accessibility to RBD epitope in ‘up’/‘down’ conformations) | C135, S309, C110, REGN10987 |
| Neutralising | RBD | VII | COVA2-02, COVA2-46, COVA2-05 | 4 (does not overlap with ACE2 binding site; accessibility to RBD epitope only in ‘up’ conformation) | CR3022, COVI1-6, EY6A, S304, S2A4 |
| Neutralising | | IX (NTD) | COVA2-25, COVA2-03, COVA2-22, COVA2-30, COVA1-06, COVA2-17, COVA3-07, COVA1-20, COVA2-06, COVA3-05, COVA1-09, COVA2-37, COVA1-22 | | |
| Neutralising | Not against RBD | IV | COVA2-40, COVA1-25 | | |
| Neutralising | Not against RBD | X | COVA1-03 | | |
| Neutralising | Not against RBD | XI | COVA1-21 | | |
| Nonneutralising | | II, V and VIII | Many | | |

2 CURRENTLY CIRCULATING SARS-CoV-2 CLADES

Coronaviruses belong to the order Nidovirales, which includes the viruses with the longest known RNA genomes.15 The SARS-CoV-2 genome consists of 29,903 ribonucleotides, which encode 29 proteins.
Although coronaviruses have a proofreading apparatus,16 their genomes remain subject to recombination as well as other copy-choice transcriptional errors.17 Because SARS-CoV-2 emerged recently, its observed diversity is lower than that of other RNA viruses.18 Most SARS-CoV-2 proteins exhibit little mutational variability, the proteins with the highest mutation rate (MR) being the Spike, NSP12 (RNA-dependent RNA polymerase [RdRp]) and NSP9c.19 The average MR of the SARS-CoV-2 genome has been estimated from the related mouse hepatitis virus (MHV) at 10⁻⁶ nucleotides per cycle, and the substitution rate at 4.83 × 10⁻⁴ subs/site/year, which is similar to, or slightly lower than, what is observed for other RNA viruses.20 Heterogeneous mutation patterns mainly reflect host antiviral mechanisms mediated by apolipoprotein B mRNA editing catalytic polypeptide-like proteins (APOBEC), adenosine deaminase acting on RNA proteins (ADAR) and ZAP proteins, together with probable adaptation against reactive oxygen species (ROS).21 Two particular mutation types, G→U and C→U, possibly the result of APOBEC and ROS activity, cause the majority of mutations in the genome and occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are highly homoplasic).22
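As a rough sanity check, the per-site yearly rate quoted above can be converted into an expected number of substitutions per genome per year by multiplying it by the genome length. The sketch below is illustrative only: the function name is hypothetical, and it assumes the rate applies uniformly to all 29,903 sites, which the uneven mutation patterns described above suggest is a simplification.

```python
# Values quoted in the text.
GENOME_LENGTH_NT = 29_903          # SARS-CoV-2 genome length (ribonucleotides)
RATE_SUBS_PER_SITE_YEAR = 4.83e-4  # substitutions per site per year


def expected_substitutions_per_year(rate_per_site_year: float, genome_length: int) -> float:
    """Expected substitutions per genome per year, assuming a uniform rate across sites."""
    return rate_per_site_year * genome_length


per_year = expected_substitutions_per_year(RATE_SUBS_PER_SITE_YEAR, GENOME_LENGTH_NT)
print(f"{per_year:.1f} substitutions per genome per year")  # 14.4
print(f"{per_year / 12:.1f} substitutions per genome per month")  # 1.2
```

Under this assumption, a lineage would be expected to accumulate roughly one substitution per month.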
The nomenclature of genetic diversity within a given species is not regulated by the International Committee on Taxonomy of Viruses. Historically, genetic diversity has been variably grouped into ‘clades’, ‘subtypes’, ‘genotypes’, ‘groups’ or ‘lineages’. The main repositories for SARS-CoV-2 genomic sequences are listed in Table 2.
TABLE 2. Main SARS-CoV-2 gene sequence repositories and analysis tools

| Type | Name | URL |
|---|---|---|
| Repository | China National Center for Bioinformation (CNCB) - National Genomics Data Center (NGDC) | https://bigd.big.ac.cn/ncov/release_genome?lang=en |
| Repository | China National Microbiology Data Center (NMDC) | http://nmdc.cn/nCov/en |
| Repository | COVID-19 Genomics Consortium UK (CoG-UK) | https://www.cogconsortium.uk/ |
| Repository | Global Initiative on Sharing All Influenza Data (GISAID) | https://www.gisaid.org/epiflu-applications/phylodynamics/ |
| Repository | NCBI SARS-CoV-2 GenBank | https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049 |
| Repository | NextStrain | https://nextstrain.org/sars-cov-2 |
| Analysis tool | Virus Pathogen Resource (ViPR) | https://www.viprbrc.org/brc/vipr_genome_search.spg?Method=SubmitForm&decorator=corona&searchId=44742&runFrom=persistent |
| Analysis tool | Global Evaluation of SARS-CoV-2/hCoV-19 Sequences (GESS) | https://wan-bioinfo.shinyapps.io/GESS/ |
| Analysis tool | SARS-CoV-2 Mutation Browser v-1.3 [203] | http://covid-19.dnageography.com/ |
| Analysis tool | Microbial Genome Mutation Tracker (MicroGMT) | https://github.com/qunfengdong/MicroGMT |
| Analysis tool | Coronapp | http://giorgilab.unibo.it/coronannotator/ |
| Analysis tool | Ensembl Variant Effect Predictor | https://www.ensembl.org/info/docs/tools/vep/index.html |
| Analysis tool | Infection Pathogen Detector 2.0 | http://ipd.actrec.gov.in/ipdweb |
| Analysis tool | Pangolin COVID-19 Lineage Assigner | https://pangolin.cog-uk.io/ |
| Analysis tool | US SARS-CoV-2 variant dashboard | https://janieslab.github.io/sars-cov-2.html |
| Analysis tool | NextClade | https://clades.nextstrain.org/ |
| Analysis tool | CovRadar | https://gitlab.com/dacs-hpi/covradar |

In April 2020, a preliminary study by the London School of Hygiene & Tropical Medicine on 5300 sequences from 62 countries identified two clusters (C1 and C2), further classified into 6 main clades (C1, C1.1, C2, C2.1, C2.1.1 and C2.1.2).23 These findings were replicated by a Chinese study in June 2020 using only 103 isolates, which first introduced the L and S lineage nomenclature.24
The Global Initiative on Sharing All Influenza Data (GISAID) repository contained more than 400,000 full SARS-CoV-2 proteome sequences (mostly from Europe, and in particular the UK) as of 20 December 2020, and names clades with letters. In winter 2020, the main clades were L, O, V and S. Later, clade G (with the associated D614G mutation in the Spike protein) emerged, followed by the related GR and GH clades.25 An eighth clade, named GV, was described in the following months.
Nextstrain26 sources data from public repositories such as NCBI, GISAID and ViPR, as well as GitHub repositories and other sources of genomic data. Nextstrain supports a year-letter clade nomenclature alongside the dynamic Phylogenetic Assignment of Named Global Outbreak LINeages (PANGOLIN) lineage nomenclature27 (https://github.com/nextstrain/ncov/blob/master/docs/naming_clades.md). Clades originally needed a frequency of at least 20% globally for two or more months, and are named with the year in which they were first identified and the next available letter of the alphabet. The parent clade is reported with the ‘.’ notation (e.g., 19A.20A.20C to indicate clade 20C). Then, in January 2021, it was acknowledged that the lack of international travel made it slower for new clades to move past 20% global frequency, and consequently two alternative requirements were added, so that a clade is now named when any of the following holds: it reaches more than 20% global frequency for two or more months; it reaches more than 30% regional frequency for two or more months; or a VOC (‘variant of concern’) is recognised.28
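The naming criteria described above can be sketched as a simple rule check. This is a minimal illustration, not Nextstrain's implementation: all function names are hypothetical, the thresholds are the ones quoted in the text, and "two or more months" is assumed here to mean consecutive monthly frequency estimates.

```python
from typing import Sequence


def longest_run_above(freqs: Sequence[float], threshold: float) -> int:
    """Longest run of consecutive months with frequency strictly above threshold."""
    best = run = 0
    for f in freqs:
        run = run + 1 if f > threshold else 0
        best = max(best, run)
    return best


def qualifies_for_clade_name(
    global_freqs: Sequence[float],
    regional_freqs: Sequence[float],
    is_recognised_voc: bool,
    global_threshold: float = 0.20,    # threshold quoted in the text
    regional_threshold: float = 0.30,  # threshold quoted in the text
    min_months: int = 2,
) -> bool:
    """True if any one of the three alternative criteria is met."""
    return (
        longest_run_above(global_freqs, global_threshold) >= min_months
        or longest_run_above(regional_freqs, regional_threshold) >= min_months
        or is_recognised_voc
    )


def clade_name(year_first_identified: int, letter: str) -> str:
    """Year-letter label, e.g. (2020, 'C') -> '20C'."""
    return f"{year_first_identified % 100:02d}{letter.upper()}"


# A lineage that stays rare globally but dominates one region for two months.
print(qualifies_for_clade_name([0.05, 0.08], [0.35, 0.40], False))  # True
print(clade_name(2020, "C"))  # 20C
```

Keeping the thresholds as parameters makes it easy to compare the original single criterion (global frequency only) with the relaxed January 2021 rules.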
All the above-mentioned SARS-CoV-2 phylogenies are reconciled in Table 3, which details the separating (barcoding) SNPs. Globally, Jacob et al.29 showed positive selection of the D614G, S477N (clade 20A.EU2), A222V (20A.EU1) and V1176F SNPs, and an expansion of the B.1 clade, especially of strains containing Q57H (B.1.X), R203K/G204R (B.1.1.X), T85I (B.1.2-B.1.3), G15S + T428I (C.X) and I120F (D.X). None of the SARS-CoV-2 variants described so far has been shown to increase infection severity; on the contrary, a clade 19B variant with lower severity was detected in Singapore in the spring and then disappeared.30
TABLE 3. Summary of main clades/lineages according to different naming schemes

| London clade | GISAID clade | PANGOLIN lineage | NextStrain clade | Country of origin | Separating (barcoding) nonsynonymous single nucleotide mutations and deletions | Corresponding effects on protein sequence | Max frequency |
|---|---|---|---|---|---|---|---|
| C1 | L | B | 19A | Asia: China/Thailand | Root clade | | 65%–47% globally in Jan 2020, now disappearing |
| C1.1 | n.a. | n.a. | n.a. | | C18060T; A17858G | orf1ab:nsp14:S7F ?; orf1ab:nsp13:M541V | |
| C2 | S | A | 19B | Asia: China | C8782T; T28144C | NSP4:S76S (synonymous); ORF8:L84S | 28-33% globally in Jan 2020; now in some restricted areas in the US and Spain, but resurging thanks to convergent evolution [210] |
| n.a. | V | B.2 | 19A | | G11083T; G26144T | NSP6:L37F; ORF3a:G251V | Now disappearing |
| C2.1 | G | B.1 | 20A | N America/Europe/Asia: USA, Belgium, India | C14408T; A23403G; (C241T); (C3037T) | NSP12b:P314L; S:D614G; 5′UTR; NSP3:F106F | Found in Germany, Australia and China in Jan 2020; basal pandemic lineage bearing S 614G that is globally distributed |
| C2.1.1 | GH | B.1.2 | 20C (US) | N America: USA | C14408T; A23403G; (C241T); (C3037T); G25563T; C1059T | NSP12b:P314L; S:D614G; 5′UTR; NSP3:F106F; ORF3a:Q57H; orf1ab:nsp:T85I | Derived from 20A since Feb 2020; southern US in late May of 2020 [211]; globally distributed |
| C2.1.2 | GR | B.1.1.1 | 20B | Europe: UK, Belgium, Sweden | C14408T; A23403G; (C241T); (C3037T); GGG28881AAC | NSP12b:P314L; S:D614G; 5′UTR; NSP3:F106F; N:RG203KR | Derived from 20A since Feb 2020; globally distributed |
| n.a. | n.a. | B.1.2 | 20G | USA | Many | ORF1b 1653D; ORF3a 172V; N 67S; N 199 | Derived from 20C, main strain in USA in second wave |
| n.a. | n.a. | B.1.1.7 | 20I/501Y.V1 | South-East UK | ORF1ab:C3267T; ORF1ab:A1708D; ORF1ab:I2230T; ORF1ab:Δ11288-11296; S:21765-21770 deletion; S:21991-21993 deletion; S:A23063T; S:C23271A; S:C23604A; S:C23709T; S:T24506G; S:G24914C; Orf8:C27972T; Orf8:G28048T; Orf8:A28111G; N:28280 GAT→CTA; N:C28977T | ORF1ab:T1001I; ORF1ab:A1708D; ORF1ab:I2230T; ORF1ab:ΔSGF3675-3677; S:ΔHV69-70; S:ΔY144; S:N501Y; S:A570D; S:P681H; S:T716I; S:S982A; S:D1118H; ORF8:Q27stop; ORF8:R52I; ORF8:Y73C; N:D3L; N:S235F | 10% in UK Dec 2020; derived from 20B |
| n.a. | GV | B.1.177 | 20E (EU1) (formerly 20A/EU.1) | Spain | Many | N 220V; ORF10 30L; ORF14 67F; A222V; many | Main strain in second wave in EU; derived from 20A |
| n.a. | n.a. | B.1.351 | 20H/501Y.V2 | South Africa | Many | D80A; D215G; K417N; E484K; N501Y; A701V | Derived from 20C, concentrated in South Africa |
| n.a. | n.a. | B.1.1.1 | 20D | | Many | ORF1a 1246I; ORF1a 3278S | Derived from 20B, concentrated in South America, Southern Europe and South Africa |
| n.a. | n.a. | D.2 | 20F | | Many | ORF1a 300F; S 477N | Derived from 20B, concentrated in Australia |
| n.a. | O | n.a. | n.a. | Others | | | |

Abbreviation: n.a., not available.

Viruses carrying both the S:D614G and RdRp:P323L mutations have lower dN/dS ratios (the ratio of nonsynonymous mutations per nonsynonymous site to synonymous mutations per synonymous site) than those without the two mutations, particularly in the RdRp coding region and the ORF8 gene. By contrast, the S gene had higher dN/dS ratios in the mutant genomes. While the S gene was under stronger negative selection in wild-type genomes during the early stages, selection is at almost equal levels in mutant and wild-type genomes in the later stages. RdRp, in turn, is under stronger overall negative selection in the mutant genomes, particularly during the early stages.31
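To make the dN/dS comparison above concrete, the ratio can be computed from four counts. The sketch below is a simplified illustration with a hypothetical function name and made-up counts; it uses raw proportions and omits the correction for multiple substitutions at the same site that standard estimators apply, so it is not the method used in the cited study.

```python
def dn_ds(nonsyn_subs: int, nonsyn_sites: float, syn_subs: int, syn_sites: float) -> float:
    """Uncorrected dN/dS: (nonsynonymous subs per nonsynonymous site)
    divided by (synonymous subs per synonymous site)."""
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_subs == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    return dn / ds


# Hypothetical counts for one gene: 3 nonsynonymous changes over 300
# nonsynonymous sites, 2 synonymous changes over 100 synonymous sites.
omega = dn_ds(3, 300, 2, 100)
print(f"dN/dS = {omega:.2f}")  # dN/dS = 0.50
```

A value below 1, as in this example, is conventionally read as negative (purifying) selection, and a value above 1 as positive selection.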
3 MECHANISM OF IMMUNE ESCAPE: SINGLE NUCLEOTIDE MUTATIONS VERSUS DELETIONS

Single-nucleotide polymorphisms (SNPs) and deletions, such as those reported in Table 3 and discussed below, can occur in individual patients and then expand on a global scale. As of February 2021, there were 2592 distinct SARS-CoV-2 variants.32 It has been reported that 95% of patients show within-host diversity, mostly due to mutational hotspots.33 High-confidence subclonal variants were found in about 15.1% of the NGS data sets with mutant Spike protein, which might indicate coinfection with various SARS-CoV-2 strains and/or intrahost evolution.32 SNPs are rare because of the proofreading efficiency of the SARS-CoV-2 RNA-dependent RNA polymerase (nsp12) and the error-correcting exonuclease non-structural protein 14 (nsp14): the P203L mutation in nsp14 almost doubles the genomic MR (from 20 to 36 SNPs/year).34
Deletions are another mechanism that can drive sudden evolution: in antigenic terms, deletions can drive antigenic drift. McCarthy et al.35 showed two recurrent deletions in the Spike glycoprotein that compromise binding of a nAb: deletions in the N-terminal domain (such as ΔH69/ΔV70 and ΔY144) are becoming increasingly prevalent.36 There is both putative33 and in vivo37 evidence of superinfection by SARS-CoV-2 strains belonging to different clades. While studies relying on clade assignment and statistics such as linkage disequilibrium have found that recombination occurs at very low levels37, 38 (or is unlikely to be occurring at all24, 39-43) even when analysing vast quantities of sequencing data, a new method detected multiple recombination events using relatively small samples.44
Of interest, all three major variants of concern (VOCs) discussed in detail below and summarised in Table 4 (i.e., B.1.1.7, B.1.351 and P.1) harbour the deletion in ORF1ab (del11288-11296 [3675-3677 SGF]).45 Positive selection has been detected for 21 Spike signature mutation sites (convergent for 16 sites and nonconvergent for 5 sites) and 90 nonsignature mutation sites in these VOCs.46 Given this consistent convergent evolution, we will discuss individual mutations first and then focus on VOCs.
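The two notations used for the ORF1ab deletion above (nucleotides 11288-11296 and codons 3675-3677) are related by simple coordinate arithmetic, sketched below. The gene start coordinates are an assumption that does not appear in the text: they are taken to be the 1-based start positions on the reference genome NC_045512.2. The function name is hypothetical, and the mapping ignores the programmed ribosomal frameshift in ORF1ab, so for that gene it only holds upstream of the frameshift site.

```python
# Assumed 1-based start coordinates of coding regions on the reference
# genome NC_045512.2 (not given in the text).
CDS_START = {"ORF1ab": 266, "S": 21563}


def codon_number(gene: str, nt_position: int) -> int:
    """1-based codon (amino acid) number that contains a genomic nucleotide position."""
    start = CDS_START[gene]
    if nt_position < start:
        raise ValueError(f"position {nt_position} lies upstream of {gene}")
    return (nt_position - start) // 3 + 1


# ORF1ab deletion of nucleotides 11288-11296 -> codons 3675 to 3677 (SGF)
print(codon_number("ORF1ab", 11288), codon_number("ORF1ab", 11296))  # 3675 3677

# Spike substitutions listed in Table 3
print(codon_number("S", 23403))  # 614 (A23403G, D614G)
print(codon_number("S", 23063))  # 501 (A23063T, N501Y)
```

The same arithmetic explains why one deletion or substitution can be reported under either a genomic coordinate or a protein-level label in the tables.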
TABLE 4. Comparison of B.1.1.7, B.1.1.28-derived clades, B.1.1.33 (E484K), B.1.351 and CAL.20C lineages with regard to mutations in Spike, other SARS-CoV-2 genes and evidence for reinfection

| Clade | B.1.1.298 | B.1.1.7 (20I/501Y.V1) | P.1 (20J/501Y.V3) | P.2 | B.1.1.33 (E484K) | B.1.351 (20H/501Y.V2) | B.1.429 | B.1.526 | B.1.258Δ |
|---|---|---|---|---|---|---|---|---|---|
| a.k.a. | Cluster V | VUI/VOC 202012/01 | B.1.1.28.1; B.1.1.248; VOC 202101/02 | B.1.1.28.2; B.1.1.28 (E484K) | − | 501Y.V2 | CAL.20C | − | − |
| Country of first detection | Denmark | South-East England, UK | Amazonas, Brazil | Rio de Janeiro, Brazil | São Paulo and Amazonas, Brazil | South Africa | Southern California, USA | New York, USA | Czech Republic, Slovakia |
| L5F | − | − | − | − | − | − | − | + | − |
| S13I | − | − | − | − | − | − | + | − | − |
| L18F | − | − → + | + | − | − | − → + | − | − | − |
| T20N | − | − | + | − | − | − | − | − | − |
| P26S | − | − | + | − | − | − | − | − | − |
| ΔH69/ΔV70 | + | − → + | − | − | − | − | − | − | + |
| D80A | − | − | − | − | − | + | − | − | − |
| T95I | − | − | − | − | − | − | − | + | − |
| D138Y | − | − | + | − | − | − | − | − | − |
| ΔY144 | − | + | − | − | − | − | − | − | − |
| W152C | − | − | − | − | − | − | + | − | − |
| R190S | − | − | + | − | − | − | − | − | |