Chloroquine resistance evolution in Plasmodium falciparum is mediated by the putative amino acid transporter AAT1

Ethics approval and consent to participate

The study was performed in accordance with the Guide for the Care and Use of Laboratory Animals of the US National Institutes of Health (NIH). The Seattle Children’s Research Institute (SCRI) has an Assurance from the Public Health Service through the Office of Laboratory Animal Welfare for work approved by its Institutional Animal Care and Use Committee. All of the work carried out in this study was specifically reviewed and approved by the SCRI Institutional Animal Care and Use Committee.

Project design

The project design is summarized in Supplementary Fig. 6. In brief, we use (1) population genomic analyses, (2) genetic crosses and quantitative genetics analysis followed by (3) functional analyses to investigate the role of additional loci in CQ resistance.

Gambia population analysis P. falciparum genome sequences

P. falciparum-infected blood samples collected from central (Farafenni) and coastal (Serrekunda) Gambia in 1984 and 2001, were processed for whole blood DNA and P. falciparum genomes and deep sequenced at the Wellcome Trust Sanger Institute. Data from isolates collected from coastal Gambia in 2008 and 2014 had been published previously44,45 (Supplementary Table 1). Before sequencing, P. falciparum genomes were amplified from whole blood DNA of each sample from 1984 and 2001 using selective whole genome amplification (WGA) and then sequenced (paired-end reads) on the Illumina HiSeq platform46. Reads were mapped to the P. falciparum 3D7 reference genome using bwa mem (http://bio-bwa.sourceforge.net/). Mapping files (Binary Alignment Map) were sorted and deduplicated by Picard tools v2.0.1 (http://broadinstitute.github.io/picard/), and SNP and indel were called with GATK HaplotypeCaller (https://software.broadinstitute.org/gatk/) following best practices (https://www.malariagen.net/data/pf3K-5). Variant call format (VCF) files were generated by chromosome, merged using bcftools (https://samtools.github.io/bcftools/bcftools.html) and filtered using vcftools (https://vcftools.sourceforge.net/). After filtration, only biallelic SNP variants with a VQSLOD score of ≥2, a map quality >30 and supported ≥5 reads per allelic variant were retained. SNPs with minor allele frequency <2% were removed from our analysis. We also removed samples with >10% genotypes missing. In the final dataset, there were in total 16,385 biallelic SNP loci and 321 isolates (1984 (134), 1990 (13), 2001 (34), 2008 (75) and 2014 (65)). The complexity of infection (monogenomic or polygenomic) was estimated as the inbreeding coefficient Fws from the merged VCF file using R package Biomix. The short-read sequence data analysed are listed in Supplementary Table 1.

Allele frequencies and pairwise differentiation

For each sample with a complexity of infection greater than 1, the allele with most reads was retained for mixed-allele genotypes to create a virtual haploid genome variation dataset. Allele frequencies were calculated in plink, and pairwise differences between temporal populations and genetic clusters were estimated by Fst using Weir and Cockerham’s method applied in the hierfstat package in R. The likelihood ratio test for allele frequency difference pFST was further calculated using vcflib. For a combined pFST P value, the fisher method was performed in R metaseq package. The summary P values were corrected for multiple testing using Benjamini–Hochberg (BH) method. To examine haplotype sharing at pfaat1 (Pf3D7_06_v3:1,213,102-1,217,313) and pfcrt (Pf3D7_07_v3:403222-406317) between isolates from the different years of sampling in Gambia, we extracted the IBD matrix using isoRelate R package18 for all pairs of isolates for gene regions spanning an additional 25 kb on each flank. We generated relatedness networks using the R package igraph following the scripts in the isoRelate R package18. Isolates are connected if they show >90% IBD.

Genome scans for selection

We considered samples collected in the same year as a single population irrespective of the location of collection. We used the hapFLK approach to detect signatures of positive selection through haplotype differentiation following hierarchical clustering of Gambian temporal population groups compared with an outgroup from Tanzania, as previously described47. P values were computed for each SNP-specific value using the Python script provided with the hapFLK program, and values were corrected for multiple testing using the BH method. Secondly, we used pairwise relatedness based on identity by descent to derive an iR statistic for each SNP as implemented by the IsoRelate18 package in R. Regions with overlapping iR and hapFLK −log10P values >5 were considered as regions of interest.

Population analysis on pfaat1 and pfcrt evolutionDatasets

We included two datasets in this study: (1) genotypes of 7,000 worldwide P. falciparum samples from MalariaGEN Pf community project (version 6.0) (ref. 21). This dataset includes samples from south America (SM), west Africa (WAF), Central Africa (CAF), East Africa (EAF), South Asia (SA), the western part of southeast Asia (WSEA), the eastern part of southeast Asia (ESEA) and the Pacific Oceania (PO) region. (2) We also included 194 Thailand samples with whole genome sequencing data available from Cerqueira et al.15, and merged them into the WSEA population. Duplicate sequences were removed according to the sample’s original ID (Hypercode). Only samples with single parasite infections (within-host diversity FWS > 0.90) and >50% of SNP loci genotyped were included for further analysis. A total of 4,051 samples remained after filtration (Supplementary Table 2). Non-biallelic SNPs and heterozygous variant calls were further removed from the dataset. We then extracted genotype data at pfaat1 and pfcrt gene regions and calculated the allele frequencies (Fig. 2a).

p faat1 haplotypes and evolutionary relationships

To minimize the effect of recombination, we extracted 1,847 SNPs distributed within 25 kb upstream and 25 kb downstream of the pfaat1 gene. Only samples with all 1,847 SNPs genotyped (581/4,051) were used for evolutionary analysis. To visualize the population structure, we calculated the pairwise genetic distance between samples and generated a minimum spanning network (MSN; Fig. 2b and Extended Data Fig. 5), using R package poppr. We compared genome sequences (PlasmoDB, version 46) between P. falciparum and Plasmodium reichenowi and extracted genotypes at 1,803/1,847 common loci. We then built an unweighted pair group method with arithmetic mean (UPGMA) tree rooted by P. reichenowi using the 581 haplotypes and 1,803 SNPs (Extended Data Fig. 5), using the R packages ape and phangorn under default parameters. MSN network and unweighted pair group method with arithmetic mean tree were plotted with ggplot2.

Genetic cross and BSAGenetic cross preparation

We generated genetic crosses between parasite 3D7 and NHP4026 (ref. 48), using FRG NOD huHep mice with human chimaeric livers and Anopheles stephensi mosquitoes as described previously23,24,25,49,50. 3D7 is a parasite of African origin51 that has been maintained in the lab for decades and is CQ sensitive, while NHP4026 was cloned from a patient visiting the Shoklo Malaria Research Unit clinic on the Thailand–Myanmar border (2007) and is CQ resistant (Supplementary Table 3). We generated three recombinant pools using independent cages of infected mosquitoes: these are independent pools of recombinants48. The estimated number of recombinant genotypes in each pool was ~2,800 (ref. 48). We used two pools (pool 1 and pool 2) maintained in AlbuMAX-based culture medium for this study.

Drug treatment and sample collection

For each recombinant pool, the parasite culture was expanded under standard culture conditions25. Briefly, cultures were maintained in complete medium at 5% haematocrit in O+ red blood cells (RBCs) (Biochemed Services) at 37 °C, pH of 7.0–7.5, 5% CO2, 5% O2 and 90% N2. Medium changes were performed every 48 h and cultures were expanded to keep the parasitaemia at ~1%. Once expanded, each recombinant pool was divided into 16 0.5 ml aliquots while diluting to 1% parasitaemia. The aliquots were maintained in 48-well plates and treated with CQ (Supplementary Fig. 7). In total, we had 32 cultures: 2 pools × 4 CQ concentrations (0 (control), 50, 100 or 250 nM) × 2 drug duration time (48 h or 96 h) × 2 technical replicates. We define the day when drug was applied as day 0. After 2 days (48 h) of drug treatment, the infected RBCs were washed with phosphate-buffered saline solution twice to remove residual drug. For the plate assigned for 48 h CQ treatment (48-well plate 1), cultures were maintained in complete medium; and samples were collected at days 0, 4 and 7. For the plate assigned for 96 h CQ treatment (48-well plate 2), fresh CQ was added back to the culture medium and treated for another 48 h; and after a total of 96 h CQ treatment, drug was removed and samples were collected at days 0, 5 and 10. CQ was dissolved in H2O and diluted in incomplete medium (Gibco, Life Technologies). Culture medium was changed every 48 h. Parasitaemia was monitored using 20% Giemsa-stained slides, and cultures were diluted to 1% parasitaemia if the parasitaemia was higher than 1%. Approximately 15 μl packed RBCs was collected per sample.

Library preparation and sequencing

We prepared Illumina libraries and sequenced both parents and the 96 segregant pools collected. We extracted genomic DNA using the Qiagen DNA mini kit and quantified DNA with Quant-iT PicoGreen Assay (Invitrogen). For samples with <50 ng DNA obtained, we performed WGA52. WGA products were cleaned with KAPA Pure Beads (Roche Molecular Systems) at a 1:1 ratio. We prepared sequencing libraries using 50–100 ng DNA or WGA product using KAPA HyperPlus Kit following the instructions with three cycles of PCR. All libraries were sequenced at 150 bp pair-end using Illumina Novaseq S4 or Hiseq X sequencers, to obtain >100× genome coverage per sample.

Mapping and genotyping

We mapped the sequencing reads against the 3D7 reference genome (PlasmoDB version 46) using BWA mem (http://bio-bwa.sourceforge.net/), and deduplicated and trans-formatted the alignment files using picard tools v2.0.1 (http://broadinstitute.github.io/picard/). We recalibrated the base quality score based on a set of verified known variants53 using BaseRecalibrator, and called variants through HaplotypeCaller. Both functions were from Genome Analysis Toolkit GATK v3.7 (https://software.broadinstitute.org/gatk/). Only variants located in the core genome regions (defined in ref. 53) were called and used for further analysis.

Genotype of parents

We merged calls from the two parents using GenotypeGVCFs in GATK, and applied standard filtration to the raw variant dataset as described in ref. 54. We recalibrated the variant quality scores and removed loci with variant quality score <1. The final variants in VCF format were annotated using snpEff v4.3 (https://pcingola.github.io/SnpEff/) with 3D7 (PlasmoDB, release 46) as the reference. After filtration and annotation, we selected SNP loci that are distinct in the two parents and used those SNPs for further BSA.

BSA

We used statistical methods described in refs. 25,48,50 for BSA. The variant calls from segregant progeny pools were merged together. Additionally, SNP loci with coverage <30× were removed. We counted reads with genotypes of each parent and calculated allele frequencies. Allele frequencies of 3D7 were plotted across the genome, and outliers were removed following Hampel’s rule55 with a window size of 100 loci. We performed the BSA using the R package QTLseqr56. Extreme QTLs were defined as regions with G′ > 20 (ref. 57). Once a QTL was detected, we calculated an approximate 95% confidence interval using Li’s method58 to localize causative genes.

Progeny cloning and phenotypingProgeny cloning

Individual progeny were cloned via limiting dilution at 0.3 cells per well from bulk cultures on day 10 after 96 h of control/250 nM CQ treatment. Individual wells with parasites were determined by qPCR (as previously described49) and expanded to larger cultures under standard culture conditions to obtain enough material for both cryopreservation and genome sequencing.

Sequencing and genotyping

Cloned progeny were sequenced and genotyped as described in the ‘Genetic cross and BSA’ section, with these modifications: (1) the cloned progeny were sequenced at 25× genome coverage; (2) SNP calls were removed if the coverage was more than three reads per sample.

Cloned progeny analysis

Unique recombinant progeny were identified from all cloned progeny using a previously described pipeline49. Non-clonal progeny were identified on the basis of the number and distribution of heterozygous SNP calls. Selfed progeny were identified as having greater than 90% sequence similarity to either parent. Unique recombinant progeny that were sampled multiple times were identified as clusters of individual clonal progeny with greater than 90% sequence similarity. We plotted frequencies of 3D7 alleles across the genome in progeny populations with and without CQ treatment. Heatmaps were generated to visualize inheritance patterns in individual unique recombinant progeny (Fig. 4a). We selected 16 unique recombinant progeny with different allele combination at chromosome 6 and chromosome 7 QTL regions for further CQ IC50 vvalues measurement (Supplementary Fig. 5).

Genome-wide linkage analysis on pfaat1 in cloned progeny

Fisher’s exact test was used to test for linkage between all inter-chromosomal pairs of loci across the set of 109 unique recombinant progeny. The distribution of the −log of the resulting P values were plotted in Fig. 4c, and the significance cut-off was calculated on the basis of a Bonferroni correction for the number of loci.

IC50 measurement for cloned progeny

Cryopreserved stocks of 3D7, NHP4026, 3D7×NHP4026 progeny were thawed and grown in complete medium under standard culture conditions as described above. Cultures were kept below 3% parasitaemia with medium changes every 48 h. Parents and progeny IC50 values were assessed via a standard 72 h SYBR Green 1 fluorescence assay59. Cultures were assessed daily for parasitaemia and stage. Cultures that were at least 70% ring were loaded into CQ dose–response assays of a series of two-fold drug dilutions across ten wells at 0.15% parasitaemia. Drug stocks (1 mg ml−1) for CQ were prepared in H2O as single-use aliquots and stored at −20 °C until use. Drug dilutions were prepared in incomplete medium. Biological replicates were conducted with at least two cycles of culturing between load dates. IC50 values were calculated in GraphPad Prism 8 using a four-parameter curve from two technical replicates loaded per plate.

CRISPR–Cas9 editing at pfaat1 and parasite phenotypingCRISPR–Cas9 editing

We designed plasmids for CRISPR–Cas9 editing as previously described60. The guide RNA (GAAATTAAATACATAAAAGA) was designed to target pfaat1 in NHP4026. Edits (258L/313F, 258S/313S and 258S/313F, Fig. 5a) were introduced to NHP4026 through homology arm sequence with target and shield mutations. Binding-site control mutants were not generated, as P. falciparum lacks error-prone non-homologous end joining61. The parasites were transfected at ring stages with 100 µg plasmid DNA, and successful transfectants were selected by treatment with 24 nM WR99210 (gift from Jacobus Pharmaceuticals) for 6 days. The parasites were recovered after ~3 weeks. To determine whether recovered parasites contained the expected mutations, we amplified the target region (forward primer, AGTACGGTACTTTTTATATGTACAGCT; reverse primer, TGCATTTGGTTGTTGAGAGAAGG) and confirmed the mutation with Sanger sequencing. We cloned parasites from successful transfection experiments: independent edited parasites (from different transfection experiments) were recovered for each pfaat1 genotype. Edited parasites were genome sequenced to identify off-target edits elsewhere in the genome. We were not able to find any SNP or indel changes between the original NHP4026 and any CRISPR-edited parasites other than the target and shield mutations.

IC50 measurement for CRISPR–Cas9-edited parasites

Parasite IC50 values for CQ, amodiaquine, lumefantrine, mefloquine and quinine were measured for two to four clones per CRISPR–Cas9-modified line and for NHP4026 across multiple load dates as described above for cloned progeny, except that each plate included two NHP4026 technical replicates as controls. This replication of genotype within each load date allowed for detection of batch effects due to load date.

Batch correction for IC50 data

Analysis of variance was used to account for batch effects and to test for differences in IC50 values between all genotype groups and for each contrast between each CRISPR–Cas9-modified line and NHP4026 for each drug tested62. A linear model with load date (batch) and genotype as explanatory variables was utilized to generate batch-corrected IC50 values for visualization of the impact of CRISPRCas9 modifications (Fig. 5b and Extended Data Fig. 7).

Measurement of parasite fitness using competitive growth assays

Parasites were synchronized to late-stage schizonts using a density gradient63. The top layer of late-stage schizonts was removed and washed twice with Roswell Park Memorial Institute (RPMI) medium. Synchronized cultures were suspended in 5 ml of complete medium at 5% haematocrit and allowed to re-invade overnight with gentle shaking. Parasitaemia and parasite stage were quantified using flow cytometry. Briefly, 80 μl of culture and an RBC control were stained with SYBR Green I and SYTO 61 and measured on a Guava easyCyte HT (Luminex). A total of 50,000 events were recorded to determine relative parasitaemia and stage. When 80% of parasites were in the ring stage, the head-to-head competition experiments were set up64. Competition assays were set up between CRISPR–Cas9-edited parasites and NHP4026 in a 1:1 ratio at a parasitaemia of 1% in a 96-well plate (200 μl per well) and maintained for 30 days. Each of the assays contained three biological replicates (three independent clones from different CRISPR–Cas9 editing experiments) and two technical replicates (two wells of culture). Every 2 days, the parasitaemia was assessed by microscopy using Giemsa-stained slides, samples were taken and stored at −80 °C and the cultures were diluted to 1% parasitaemia with fresh RBCs and medium. The proportion of parasites in each competition (Extended Data Fig. 10) was measured using a rhAmp SNP Assay (Integrated DNA Technologies) with primers targeting the CRISPR–Cas9-edited region in pfaat1.

Selection coefficient

We measured selection coefficient (s) by fitting a linear model between the natural log of the allele ratio (freq (allele-edited parasite)/freq (NHP4026)) and time (measured in 48 h parasite asexual cycles). The slope of the linear model provides a measure of the driving s of each mutation65. To compare relative fitness of parasites carrying different pfaat1 alleles, we normalized the fitness of NHP4026 to 1 and used slope + 1 to quantify the fitness of CRISPR–Cas9-edited parasites (Fig. 5c).

Overexpression of pfAAT1 in yeast

To generate pfAAT1 expressing yeast, plasmid carrying the pfaat1 coding sequence was transformed into yeast Saccharomyces cerevisiae (BY4743) as previously described27. The doubling time (h) was measured for strains carry empty vector, WT pfAAT1 or S258L mutant pfAAT1. We measured doubling time under two culture conditions: control or with 1 mM CQ. Three independent experiments were performed for each assay.

pfAAT1 protein structure analysis

Three-dimensional homology models for pfAAT1 were predicted using AlphaFold29,66 and I-TASSER30,67,68 and analysed with PyMol software (v2.3.0; Schrödinger, LLC). At the primary sequence level, we used TOPCONS32 to predict transmembrane helix topology for comparison. We plotted a cartoon version of the protein transmembrane topology based on the computationally predicted structures and membrane topology (Extended Data Fig. 9). Models were truncated to exclude amino-terminal residues 1–166, probably positioned outside of the membrane, because AlphaFold assigns low confidence to this N-terminal stretch. Furthermore, mutations of interest map only to transmembrane helices according to both 3D models and TOPCONS. I-TASSER generated models with topology similar to AlphaFold with the highest variations in AlphaFold low-confidence regions 1–166 and 475–516. The top five I-TASSER models superimpose on the AlphaFold model with a root mean square deviation range of 2.4–2.8 Å over 303–327 of 440 aligned residues using the PDBeFold Server (http://www.ebi.ac.uk/msd-srv/ssm). The four common SNPs (S258L, F313S, Q454E and K541L) overlay closely between the homology models. We evaluated the effect of different mutations on protein stability using the mutagenesis function in PyMol.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

留言 (0)

沒有登入
gif