Astrovirus infections were identified from a dataset of viruses detected in samples from a longitudinal study of fruit bats across Madagascar. Methodological details on bat field sampling and subsequent RNA extraction, library preparation, and Illumina sequencing have been reported in previous work [13, 14]; here, we give only a brief overview.
Between 2018 and 2019, monthly bat captures were carried out at three species-specific locations: Ambakoana roost (-18.513 S, 48.167 E, Pteropus rufus); adjacent Angavobe/Angavokely caves (-18.944 S, 47.949 E, and -18.933 S, 47.758 E, Eidolon dupreanum); and Maromizaha cave (-18.9623 S, 48.4525 E, Rousettus madagascariensis). Bats were identified to species, sex, and age (adult vs. juvenile), and throat, fecal, and urine samples were collected.
Following field collection, throat, fecal, and urine samples underwent RNA extraction in the Virology Unit at the Institut Pasteur de Madagascar (IPM) using the Zymo Quick DNA/RNA Microprep Plus Kit (Zymo Research, Irvine, CA). In total, RNA from 285 fecal, 143 throat, and 196 urine swab samples was extracted, then stored in -80 °C freezers at IPM, prior to final transport on dry ice to the Chan Zuckerberg Biohub San Francisco (CZB-SF) for eventual library preparation and subsequent mNGS.
Aliquots of each sample were arrayed into a 384-well plate for mNGS library preparation. Samples were evaporated using a GeneVac EV-2 (SP Industries, Warminster, PA, USA) to enable miniaturized library preparation with the NEBNext Ultra II RNA Library Prep Kit (New England Biolabs, Beverly, MA, USA). Library preparation was performed per the manufacturer’s instructions, with the following modifications: 25 pg of External RNA Controls Consortium Spike-in mix (ERCCS, Thermo-Fisher) were added to each sample prior to RNA fragmentation; the input RNA mixture was fragmented for 8 min at 94 °C prior to reverse transcription; and a total of 14 cycles of PCR with dual-indexed TruSeq adaptors was applied to amplify the resulting individual libraries. Samples were assessed for quality and quantity, then submitted to an Illumina NextSeq 2000 (Illumina, San Diego, CA, USA) for paired-end sequencing (2 × 146 bp). The pipeline used to separate the sequencing output of the individual libraries into FASTQ files of 146 bp paired-end reads is available on GitHub at https://github.com/czbiohub/utilities.
Detection
Raw reads from Illumina were host-filtered, quality-filtered, and assembled on the Chan Zuckerberg Infectious Diseases (CZID) bioinformatics platform (v3.10 NR/NT 2019-12-01) [33], using a host background model of “bat” compiled from all publicly available full-length bat genomes in GenBank at the time of sequencing (July 2019). Samples were marked positive for astrovirus infection if at least two contigs with an average read depth > 2 reads/nucleotide were assembled that showed significant nucleotide or protein BLAST alignment(s) (alignment length > 100 nt/aa and E-value < 0.00001 for nucleotide BLAST/bit score > 100 for protein BLAST) to astroviruses present in the NCBI NR/NT database (v12-01-2019). Additionally, all non-host contigs assembled in CZID were manually BLASTed against all full-length and protein reference sequences for astroviruses available in NCBI Virus.
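The positivity rule above can be sketched in Python as follows. This is a minimal illustration under one reading of the stated thresholds, not the CZID implementation: the `ContigHit` record, its field names, and the function names are hypothetical, and we assume the alignment-length cutoff applies to both nucleotide and protein hits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContigHit:
    """Hypothetical record for one assembled contig and its best astrovirus hit."""
    mean_depth: float                  # average reads per nucleotide
    aln_length: int                    # alignment length in nt (BLASTn) or aa (BLASTx)
    evalue: Optional[float] = None     # nucleotide BLAST E-value, if any
    bit_score: Optional[float] = None  # protein BLAST bit score, if any


def is_significant(hit: ContigHit) -> bool:
    """Alignment length > 100 and either E-value < 1e-5 (nt) or bit score > 100 (aa)."""
    if hit.aln_length <= 100:
        return False
    nt_ok = hit.evalue is not None and hit.evalue < 1e-5
    aa_ok = hit.bit_score is not None and hit.bit_score > 100
    return nt_ok or aa_ok


def sample_is_positive(hits: List[ContigHit],
                       min_contigs: int = 2,
                       min_depth: float = 2.0) -> bool:
    """A sample is positive if at least `min_contigs` contigs pass both filters."""
    supporting = [h for h in hits
                  if h.mean_depth > min_depth and is_significant(h)]
    return len(supporting) >= min_contigs
```

For example, a sample with one well-covered nucleotide hit and one well-covered protein hit would be called positive, whereas a sample with a single supporting contig would not.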
To test for differences in astrovirus prevalence, we performed four Pearson’s chi-squared tests on different subsets of the data: one comparing total prevalence across the three species, and three comparing adults and juveniles within each species.
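For readers unfamiliar with the test, the sketch below computes Pearson’s chi-squared statistic for a contingency table of positive and negative counts, using only the Python standard library. It illustrates the general technique, not the software used in this study. The function names are hypothetical, no continuity correction is applied, and p-values are given only for 1 and 2 degrees of freedom, which covers a 2 × 2 adult-versus-juvenile table and a 3 × 2 species table.

```python
import math


def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_p_value(stat, dof):
    """Upper-tail p-value using closed forms valid for dof = 1 or dof = 2."""
    if dof == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if dof == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("closed form implemented only for dof 1 or 2")


# Made-up counts for illustration only: rows are age classes,
# columns are (positive, negative). These are not data from this study.
example = [[10, 20],
           [20, 10]]
stat, dof = pearson_chi2(example)
p = chi2_p_value(stat, dof)
```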
Molecular confirmation
Samples identified as positive via mNGS described above were subjected to conventional PCR. Among the 4 fecal and 7 urine samples identified as AstV positive by mNGS, only one urine sample did not have enough leftover nucleic acid material and was not included in the analysis. Original fecal and urine samples were syringe filtered through a 0.45 µm filter, and 80 µL of supernatant was re-extracted using the Zymo Quick-RNA Miniprep Kit (Zymo Research, Orange, CA, USA), then eluted in 15 µL of nuclease-free water according to manufacturer’s instructions. Reverse transcription was carried out on the eluted RNA using SuperScript IV Reverse Transcriptase (Thermo Fisher Scientific, Waltham, MA, USA) under the following thermal conditions: random priming at 65 °C for 5 min, followed by first strand synthesis at 25 °C for 10 min, 42 °C for 50 min, and 80 °C for 10 min. The resulting cDNA was then tested for presence of astrovirus using a heminested assay targeting the gene ORF1b, specifically the RdRp gene, using primers and cycling conditions described in Chu et al. 2008 [34]. We included water as negative control, and a synthetic gBlock (IDT, Newark, NJ, USA) based on a bat-derived Mamastrovirus (NCBI taxID: NC_043102) as a positive control. PCR products were visualized using gel electrophoresis, and any bands appearing at the target size of 422 bp were cut out and purified using a Zymoclean Gel DNA recovery kit (Zymo Research, Orange, CA, USA). Purified PCR amplicons were then subject to Sanger sequencing through the University of Chicago DNA Sequencing Core.
Genome annotation and % identity analysis
To annotate coding sequences, we downloaded all available bat astrovirus full genomes from NCBI Virus at the time of analysis (August 2022). We aligned positive astrovirus contigs from our dataset to these background sequences using the MAFFT [35] algorithm (v7.450) with default parameters in Geneious Prime (v08-18-2022). We then annotated open reading frames and genes in our novel sequence by identifying stop and start codons in regions adjacent to those identified in the homologs.
To investigate similarity to other published sequences, we performed BLASTn (nucleotide-nucleotide) and BLASTx (translated nucleotide-protein) searches within the NCBI database (Table S1). Additionally, we created amino acid and nucleotide identity plots using the program pySimplot [36] with input alignment generated using MAFFT [35] (v7.450) with default parameters in Geneious Prime (v08-18-2022).
From our initial mNGS screen, we identified one near-full length astrovirus genome which we characterized and used in a full-genome phylogeny; all other astrovirus hits were short fragments which aligned with AstV regions with limited phylogenetic relevance given the lack of corresponding PCR targets in GenBank. From our confirmatory PCR analysis, we successfully generated three RdRp sequences which were used in a Southwest Indian Ocean Region bat RdRp phylogeny.
Phylogenetic analysis
To perform phylogenetic analysis, we combined our novel sequences with those publicly available on NCBI. We carried out three major phylogenetic analyses, building (a) a full-genome Mamastrovirus maximum likelihood (ML) phylogeny, (b) a time-resolved Bayesian phylogeny corresponding to a selection of full genome Mamastrovirus sequences available on NCBI Virus, and (c) a Mamastrovirus ML phylogeny corresponding to a conserved 410 bp fragment of the RdRp gene encapsulated in the AstV ORF1b, with a focus on the Southwest Indian Ocean region. Detailed methods for the construction of each phylogeny are available on GitHub (see Data Availability).
Sequence compilation
Our full genome ML phylogeny consisted of one novel full length Mamastrovirus sequence from our study, combined with 41 unique Mamastrovirus sequences from NCBI, and one full length Avastrovirus sequence as an outgroup, for a total of 43 sequences. For comparison, we compiled Mamastrovirus sequences from NCBI through three queries, selecting: [A] all complete RefSeq Genomes under Virus: Mamastrovirus (taxid:249588) and Virus: unclassified Mamastrovirus (taxid:526119) greater than 6,000 bp (N = 36), [B] Mamastrovirus nucleotide genomes under Virus: Astroviridae (taxid:39733) and Virus: unclassified Astroviridae (taxid:352926) with Host: Chiroptera (bats) (taxid:9397) over 6,000 bp (N = 2), and [C] manual searching of Mamastrovirus nucleotide genomes > 6,000 bp identified in the literature (N = 3).
Our Bayesian timetree consisted of the same set of full length Mamastrovirus sequences, removing the Avastrovirus outgroup, for a total of 42 sequences.
Our Mamastrovirus RdRp ML phylogeny consisted of our three PCR-detected RdRp sequences, one from Rousettus madagascariensis and two from Eidolon dupreanum, combined with 122 unique bat Mamastrovirus sequences from NCBI, and one Avastrovirus RdRp fragment as an outgroup, for a total of 126 sequences. NCBI sequences were restricted to those from bat hosts sampled in the Southwest Indian Ocean (SWIO) region. They were compiled through one query in NCBI Virus: Virus: Astroviridae (taxid:39733), Host: Chiroptera (taxid:9397), and Geographic Region: Madagascar (64), Mozambique (31), and Reunion (27). No sequences were available from the other SWIO landmasses: the Comoros, Mayotte, Mauritius, and the Seychelles. Sequences were confirmed to be RdRp fragments via alignment, and metadata such as host taxa and sampling location were verified in the source literature.
Alignment and substitution model
Following dataset compilation for each phylogenetic analysis, sequences were aligned via the MAFFT [35] (v7.450) algorithm in Geneious Prime (v 2022-08-18) using default parameters. Alignments were visually examined and trimmed to match the shortest sequence in the dataset. We then used Modeltest-NG [37] (v0.1.7) to determine the best-fit nucleotide substitution model for each alignment. All sequences, subsets, and alignments are available in our open-source GitHub repository (see Data Availability).
Phylogenetic tree assembly
Both the full genome and RdRp ML trees were constructed in RAxML-NG [38] (v1.1.0), using the best nucleotide substitution model from Modeltest-NG (TVM + I + G4) [37]. Following best practice recommendations in RAxML-NG [38], twenty ML inferences were made, followed by bootstrap replicate trees inferred using Felsenstein’s method [39]. The MRE-based bootstrapping test was performed every 50 replicates, and bootstrapping was terminated when the diagnostic result was below the threshold value. Support values were compiled onto the best-scoring tree.
The Bayesian timetree was built in BEAST2 [40] (v2.6.7), using the best nucleotide substitution model from Modeltest-NG (TVM + I + G4)[37]. We used a Bayesian Skyline Coalescent Model with a strict lognormal clock rate with prior mean 0.001 substitutions/site/year [41], and a constant population size. Sampling date for each sequence was inferred from NCBI ‘Collection Date’ or through reading source literature; if day was not available, the sampling date was set to the 15th of the month listed; if day and month were not available, the sampling date was set to July 15th of the collection year. Markov Chain Monte Carlo (MCMC) chains were run for > 700,000,000 iterations and terminated when we identified convergence at ESS values > 200 using TRACER (v1.7), with 10% burn-in. We used TreeAnnotator (v2.6.3) to examine mean posterior densities at each node.
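The rule used to fill in incomplete sampling dates can be expressed as a short Python function. This is a minimal sketch of the rule as described above; the function name and its argument layout are hypothetical and do not come from the study’s repository.

```python
from datetime import date
from typing import Optional


def impute_sampling_date(year: int,
                         month: Optional[int] = None,
                         day: Optional[int] = None) -> date:
    """Fill in missing date parts following the rule described in the text."""
    if month is None:
        # Neither month nor day known: use July 15th of the collection year
        return date(year, 7, 15)
    if day is None:
        # Month known but day missing: use the 15th of that month
        return date(year, month, 15)
    # Full collection date available: use it unchanged
    return date(year, month, day)
```

For instance, a record listing only “2018” would be assigned 2018-07-15, and a record listing “March 2019” would be assigned 2019-03-15.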
All phylogenies were visualized in RStudio (v2022.07.01), using the package ‘ggtree’ [42].
Nucleotide sequence accession numbers
One annotated near full-length mNGS genome sequence from a Rousettus madagascariensis was submitted to NCBI and is available under accession number OQ606244. In addition, three PCR-detected RdRp sequences were submitted to NCBI, two from different Eidolon dupreanum individuals and one from a Rousettus madagascariensis (the same individual as OQ606244), and are available under accession numbers PQ038332, PQ038333 and PQ038344.