Hybrid allele-specific ChIP-seq analysis identifies variation in brassinosteroid-responsive transcription factor binding linked to traits in maize

Plant material and growth conditions

Construction of the ZmBES1/BZR1-YFP transgenic line was previously described [17] and obtained in the HiII background from A.W. Silvester. B73 and Mo17 wild-type inbred seeds were obtained from the Germplasm Resources Information Network (GRIN). To backcrosse ZmBZR1-YFP from its original HiII (B73xA188) background, transgenic lines were used as pollen and B73 as ear donor to eliminate cross contamination. Six backcross lines were independently backcrossed six times, using heterozygous plants to avoid gene silencing. Lastly, B73-BC5 lines were used as pollen and ear donors for Mo17xB73BZR1-YFP and B73BZR1-YFPxMo17 crosses, respectively. For allele-specific analysis, tissues from 12 plants were pooled per replicate. The residual HiII regions cannot be completely removed but were minimized by the backcrosses, sample pooling and further addressed by deep sequencing of the input. Wild-type and ZmBES1/BZR1-YFP, and BR-deficient mutant (brd1) plants were grown side by side in greenhouses, under long-day conditions (16h day/8h night, 28–30°C), and in the 2013–2016 Carnegie Institution for Science summer fields (Stanford, California, USA).

B73 ChIP-seq; B73 / Mo17 HASCh-seq, and ChIP-qPCR

ZmBZR1-YFP/B73, Mo17 inbred as well as ZmBZR1-YFP/B73xMo17 and Mo17xZmBZR1-YFP/B73 F1 hybrid plants and their non-YFP-carrying siblings (as negative control) were grown under greenhouse conditions for 26 days. The oldest 2 leaves were removed and 2 cm of meristem-enriched tissue was used (Additional file 2: Fig. S1). Per replicate n=12 plants were pooled. Tissues were first treated with 1 µM BL for 4 h at room temperature in water. After BR treatment, tissues were cross-linked with 2% formaldehyde for 10 min under vacuum with 5 min incubation after release. Tissues were homogenized to a fine powder in liquid nitrogen, and nuclei extraction was performed as described in [25]. Nuclear extracts were sonicated using a Branson 250 Sonifier (2× 4 min on time, 20 s on/off cycle with 10 min rest between repeats, 20 % amplitude), and after removing an input aliquot, incubated for 2 h with 10 µg polyclonal Anti-GFP antibody [25] (Additional file 2: Fig. S1). Protein-DNA complexes were captured on Dynabeads-Protein G (Life Technologies, #10003D), and the beads were washed with low-salt buffer (50 mM Tris-HCl at pH 8.0, 2 mM EDTA, 150 mM NaCl, 0.5% Triton X-100), with high-salt buffer (50 mM Tris-HCl at pH 8.0, 2 mM EDTA, 500 mM NaCl, 0.5% Triton X-100), with LiCl buffer (10 mM Tris-HCl at pH 8.0, 1 mM EDTA, 0.25 M LiCl, 0.5% NP-40, 0.5% deoxycholate) and twice with TE buffer (10 mM Tris-HCl at pH 8.0, 1 mM EDTA) and eluted with elution buffer (1% SDS, 0.1 M NaHCO3) at 65°C overnight. After a column purification (Quiagen, PCR purification kit), ChIP-seq libraries were generated using the Ultra II kit (NEB), following the manufacture’s recommendations using 10 ng per sample as starting material. ChIP-qPCR was performed using the Bioline SensiFAST SYBR Kit following the manufacturer’s recommendations on a Roche LightCylcer 480 at 63°C annealing temperature. Primers used for the analysis are listed in Additional file 15: Table S14 and F1-sequencing information in Additional file 16: Table S15.

Enzymatic methyl-seq

Leaf tissue from BZR1-YFP/B73xMo17 F1s was harvested (n=6 plants, 3 replicates) and treated the same way (including BL treatment) as described for ChIP but without crosslinking. Tissues were homogenized in liquid nitrogen, and DNA was isolated with the DNeasy Plant Mini Kit (Qiagen). Libraries were prepared using the NEBNext Enzymatic Methyl-seq Kit (NEB) following the protocol for large DNA inserts. Therefore, 200ng genomic DNA was combined with 0.002 ng CpG methylated pUC19 DNA and 0.04 ng unmethylated lambda DNA. Fragmentation was done by using the Diagenode Bioruptor NGS in three rounds, 30s on, 90s off. Agilent Technologies 4200 Tape Station was used to determine the size distribution and concentration of the libraries.

ChIP-seq data analysis

Quality-filtered ChIP-seq reads were aligned to the B73 AGPv4 genome using bwa-mem (v. 0.7.16a) [53] with default parameters, followed by removal of PCR duplicates using samtools (v. 1.3.1.) [54]. To determine BZR1 binding peaks, IP and negative control samples, after normalization for read depth, were analyzed using the GEM package (v. 3.0) [55] (using parameters: --fold 5, --k_min 5, --k_max 8). After samples were analyzed individually, peaks reproducible in all 3 replicas, using the GEM peak summits +/− 200 bp around, were determined using R (v. 3.3.2) and considered high-confidence peaks.

HASCh-seq data analysis

To analyze the HASCh-seq data, we created a diploid genome concatenating the recently released B73 V5 genome with the Mo17 CAU genome [20, 56]. Potential adapter contamination and low-quality reads were removed using Seqpurge (v2019-03-26). Reads were then mapped to the B73xMo17 genome using STAR [57] (v.2.7.10a), with the options --alignIntronMax 1 to allow DNA mapping. Only uniquely mapping reads (MAPQ 255) were retained and duplicates removed with samtools (v1.9). Bam files were converted to normalized bedgraph and bigwig formats using bamCoverage (deeptools v3.5.1) with parameters --effectiveGenomeSize 3491781308 (determined using unique-kmers.py -q -k average readlength), --normalizeUsing RPGC, --exactScaling, --smoothLength 0, --binSize 1).

BZR1 binding peaks were determined using the GEM (v3.4) pipeline described using IP samples against the negative control obtained from ChIP on non-YFP sibling plants. First high-quality peaks were called using the merged file of all ChIP-seq replicas (parameters --k_min 6, --k_max 8, qval 0.001, 10:1 IP:control cutoff). To obtain enough coverage for GEM peak calling in the replicates, we combined the 6 into 3 replicates (1x B73xMo17 and 1x Mo17xB73 replicate) (parameters changed: qval 0.01, 5:1 IP:control cutoff). Only high-quality peaks that overlap with peaks in all 3 replicates were retained using bedtools (v2.29.0). Although our focus was to perform allele-specific analysis, we also include a peak file of the merged hybrid BZR1-YFP ChIP-seq data compared to the negative control in all replicates that contains not only unique, but also shared peaks between the B73 and Mo17 genome (Additional file 17: Table S16).

Whole genome alignment between B73 AGPv5 and Mo17 CAU was performed with progressive cactus [58]. SNPs between B73 and Mo17 and their respective matching coordinates were determined with the halSnps function of progressive cactus, using parameters “unique” and “noDupes”. At those SNP positions, reads were counted per allele using bedtools and awk.

ZmBZR1-YFP ASBs were determined using custom R (v. 3.3.2) scripts. In order to accurately access allele frequencies of all homozygous SNPs, we set a minimum read coverage cutoff of ≥ 1 reads for both alleles and ≥ 25 for at least one of the alleles, neglecting SNPs located on scaffolds (n=429,236). Of the remaining 429,236 SNPs, we determined significant variation of median allele frequency of 0.494 using a binomial test with a p-value cutoff of ≤ 0.001 adjusted for multiple testing using Bonferroni correction (n=57,414 SNPs). To focus on ASBs with potential biological relevance, we further restricted ASBs to those located in high-confidence BZR1-binding peaks reproducible in all the biological replicates (n=33,267 ASBs). While TFs usually bind small DNA regions of ~10 bp [59], we used 75 bp paired-end sequencing with an average insert size of ~200 bp achieved after sonication. Therefore, SNPs in close proximity with causative polymorphisms will show biased allele frequency due to linkage. To address this, we identified SNPs with significant bias within a 150-bp rolling window of each other and defined the lead SNP with the smallest distance to the peak summit as ASB (n=7817). Finally, we excluded ASBs that showed a significant bias (p<0.05) in their surrounding region (+/− 1 kb) in the ChIP-input data (i.e., before immunoprecipitation), to avoid potential artifacts (e.g., mapping artifacts or errors in whole genome alignment). To measure bias significance of ASBs and establish an empirical significance threshold, a specific set of control background SNPs were proportionally sampled (excluding ASBs) per chromosome in order to establish a distribution of biases in their surrounding regions (+/− 1000 bp per background SNP). Hence, ASBs whose biases were beyond the upper and lower 5 percentiles (i.e., empirical p-value < 0.05) were excluded (n=6143 ASBs) from further analysis.

Gene ontology analysis

Functional enrichment analysis for Arabidopsis and Maize BZR1 target genes was performed using the Bingo plugin of Cytoscape (V 3.7.2). Arabidopsis identifiers of Maize homologs were used to avoid any bias of the annotation state of the two species.

Control background SNP sampling

Functional GWAS variants have been shown to be significantly enriched in gene proximal regions [2]. Therefore, control bgSNPs were proportionally sampled (excluding ASBs) per chromosome and genomic location (i.e., 5 - 1 kb upstream, 1 kb upstream - TSS, 5′UTR, exon, intron, 3′UTR, TTS - 1 kb downstream, intergenic) to match the genomic distribution of the ASB dataset. Additionally, we checked that ASBs and bgSNPs showed a similar minor allele frequency (Additional file 2: Fig. S7). In total, 317,094 bgSNPs were sampled, yielding approximately 50 times as many background SNPs per genome location, compared to the number of ASBs within each location.

Fluorescence imaging

Heterozygous BZR1-YFP plants were grown in the dark at RT for 10 days in vermiculite with and without 10 µM PPZ. Prior to imaging, 1-cm root tip segments were removed from the mock and PPZ-treated seedlings. The root tip of PPZ-treated plants again was treated for 15 min with 10 µM PPZ with or without 1 µM 24epi-BL.

RNA extraction, RNA sequencing, and differential expression analysis

BR-deficient brd1 mutant siblings were grown in soil under greenhouse conditions for 26 days as described above. The oldest 2 leaves were removed, and 2 cm of meristem-enriched tissue (Additional file 2: Fig. S1) was placed in 1 µM BL for 4 h at room temperature (RT) in water. Total RNA was isolated using acidic phenol extraction as described previously [60]. Purification of poly-adenylated mRNA using oligo(dT) beads, construction of barcoded libraries, and sequencing using Illumina HiSeq 2500 technology (75 bp paired-end reads) performed by Novogene Co. using the manufacturer’s recommendations. Trimmed and QC (Seqpurge v. 2019-02-11) filtered sequence reads were mapped to B73 AGPv4 using STAR (v. 2.54) [57] in two-pass mode (with parameters: --outFilterScoreMinOverLread 0.3, --outFilterMatchNminOverLread 0.3, --outSAMstrandField intronMotif, --outFilterType BySJout, --outFilterIntronMotifs RemoveNoncanonical, --quantMode TranscriptomeSAM GeneCounts). Unique reads were filtered by mapping quality (q255) and PCR duplicates removed using Samtools (v. 1.3.1). Gene expression was analyzed in R (v. 3.4.1) using the DEseq2 software (v. 1.16.1) [61]. Genes were defined as differentially expressed by a 1.5-fold expression difference with a p-value, adjusted for multiple testing, of < 0.05.

For the analysis of gene transcript differences between B73 and Mo17, two parallel data sets were analyzed. First, a previously published RNA-seq data set was used [34] including their differentially expressed genes. Secondly, B73xMo17 F1 plants were grown under greenhouse conditions for 21 days as described above. Leaf tissues of three replicates (n 12 plants each) were harvested and total RNA extracted using the RNeasy kit (incl. DNAse treatment, Qiagen). The NEB directional Ultra II RNA library kit was used to construct poly-A enriched, barcoded libraries. The default fragment insert size was increased to ~400 bp + adapters, to enhance the yield of reads containing B73/Mo17 variants. Sequencing reads were mapped to the concatenated B73 and Mo17 genome described above using STAR (v.2.7.10a, --outSAMmultNmax 1, --outFilterMultimapNmax 1, --winAnchorMultimapNmax 100, --sjdbOverhang 149, --outFilterIntronMotifs RemoveNoncanonical, --outFilterType BySJout, --twopassMode Basic, --quantMode GeneCounts). The maizegdb pan-gene dataset was used to determine orthologous B73 and Mo17 genes. In contrast to B73v5, almost all Mo17 CAU gene models lacked 5′ and 3′ UTRs. In addition, the increased use of PacBio long-read technology for the B73v5 compared to the Mo17 CAU annotation may explain the B73 bias in our initial allele-specific transcript quantification. To reduce this bias, we standardized 5′ and 3′ UTRs of both B73 and Mo17 genes to 500 bp from the translation start/stop and removed orthologous transcripts with > 50 bp cds length differences from the analysis. Only reads mapping uniquely to B73 or Mo17 and only those overlapping with a single gene model and at least 20 reads in total and at least one read per allele were considered for further analysis. For comparison with ASB genes, genes with 1.5-fold variations in transcript allele frequencies were considered. To further avoid annotation differences in Mo17 and B73 due to the missing UTR annotations in Mo17, ASB genes were annotated to the B73 allele only in this case and then their mRNA abundance in the respective B73/Mo17 pair was considered.

Genomic feature profiling of ASBs (methylation, motifs, and enhancers)

Inbred methylation levels for CG, CHG, and CHH for B73 and Mo17 were extracted from Regulski et al. [27]. Low-quality reads and eventual adapter contaminants were filtered from inbred and hybrid data by Trimmomatic (version 0.39) [62] and Trim Galore (https://github.com/FelixKrueger/TrimGalore), respectively. Both inbred and hybrid methylation data was mapped to the respective genomes (B73 v5, Mo17 CAU and the diploid hybrid genome) using Bismark (v0.22.3) [63] with bowtie2 (2.4.4) [64] as mapper allowing only unique mapping, methylation counts were extracted with Bismark using the -CX option of the bismark methylation extractor. Resulting CX reports were manually converted into bedgraph format using awk and converted to bigwig format using bedGraphToBigWig [65]. Methylation frequency versus distance (up to +/− 2 kbp) around each ASB were averaged over 10-bp bins, and visualized by regions bound by BZR1 with either high or low affinity levels depending on the inbred line. For B73, high- and low-affinity bound regions were defined by a post frequency of ≥ 0.85 or ≤ 0.15, respectively and oppositely for Mo17 by a %B73 binding frequency (B73/(B73+Mo17)) ≤ 0.15 and ≥ 0.85, respectively.

For local motif enrichment analysis (Fig. 2E), we extracted +/− 100 bp of the high-affinity BZR1 bound allele surrounding ASBs. The MEME CentriMo suite (v. 5.3.3) was used to determine the local distribution along the 201-bp fragments for the canonical BZR1 motifs BRRE (CGTG[T/C]G, C[G/A]CACG) or G-box (CACGTG) and a control motif, with SBP ("GTACGG", "CCGTAC") [51], with a similar GC content.

To identify ASBs overlapping with motif variation (Fig. 2h), we extracted the +/− 5 bp of the high-affinity BZR1 bound allele surrounding ASBs. Using Meme-suit Centrino (v.5.3.3), we scanned those 11-bp fragments for canonical BRRE (CGTG[T/C]G, C[G/A]CACG) or G-box (CACGTG) motifs and determined ASBs where the SNP changed a BRRE or G-box motif into an altered (non BRRE or G-box) motif.

To identify ASBs which overlapped with significant variation in either CG, CHG, or CHH methylation between B73 and Mo17, we first, per ASB, assigned averaged methylation levels of Mo17 and B73 methylation levels (separately for the CG, CHG, or CHH methylation datasets) within a given window of +/− 20 bp around the ASB position. Differentially methylated alleles were defined as described previously [27]. Accordingly, we defined ASBs as overlapping with differentially methylated regions if the B73 or Mo17 methylation level in F1 hybrids or inbreds, depending on the analysis, of one allele was ≥ 70% while the level of the corresponding allele was ≤ 10%.

Putative B73 enhancer (Fig. 4a) regions were extracted from Oka et al. [29] and uplifted to B73 AGPv5 using NCBI Remap. To determine potential enrichments, +/− 10 kbps surrounding enhancer regions were intersected with ASBs and bgSNPs.

GWAS enrichment, kinship matrices, and variance components analysis

We tested association of ASBs with the curated 4041 significant GWAS hits for 41 different phenotypes of the NAM population [2]. Per trait, we performed an enrichment between the ASBs compared to the control bgSNPs (with the same average genomic distribution and minor allele frequency) (Additional file 2: Fig. S7). GWAS hits were counted if they were located within 2 kb of ASBs or bgSNPs, and the subsequent enrichment analysis was based on a hypergeometric test.

We estimated the variance components explained by different ASB SNP subsets and the remaining SNPs using the maize NAM population [38]. To conduct the analysis, we downloaded the phenotypic data (/iplant/home/glaubitz/RareAlleles/genomeAnnos/VCAP/phenotypes/ NAM/familyCorrected), consisting of Best Linear Unbiased Predictors (BLUPs) for different traits [2], and the imputed genotypic data (/iplant/home/glaubitz/RareAlleles/genomeAnnos/VCAP/genotypes/NAM/namrils_projected_hmp31_MAF02mnCnt2500.hmp.txt.gz) [66] from CyVerse database as described in Panzea (www.panzea.org). In the analysis, we mapped ASB SNPs and randomly sampled bgSNPs that shared the similar genomic patterns to upstream 5 kb - TSS, within the CDS, TTS - 5 kb downstream of genes as well as intergenic regions. Since the Mo17 CAU annotation often lacks 5′ and 3′ UTRs, both the ASB and bgSNP positional annotation for VCAP was based on their B73 coordinates. The bgSNPs were resampled based on the ASB distribution for a total of 254,017 VCAP-bgSNPs. For each SNP subset, we calculated an additive kinship matrix using the variance component annotation pipeline (VCAP) implemented in TASSEL5 [67]. We then fed these kinship matrices along with the NAM phenotypic data to estimate the variance components explained by the ASB subsets, using a Residual Maximum Likelihood (REML) method implemented in LDAK [68].

Co-localization of candidate genes within joint linkage QTLs for resistance to NLB and SLB was represented graphically using R version 3.2.3 (R Core Team, 2015). The start and end sites of NAM QTL were identified in AGPv2 for NLB [69] and SLB [70]. Corresponding AGPv2 locations of candidate genes were identified via maizegdb.org [71].

Northern leaf blight assay

A 2 × 2 factorial experimental design included three replications, B73 and Mo17 genotypes, and two treatments (with or without PPZ) within each replication. B73 and Mo17 were planted (n>7) in 10 cm pots (1 plant per pot) in the greenhouse. At the V6-stage, the plants were treated with either 400 µM PPZ or mock solution through soil fertilizer drenching every 3 to 4 days. Four days after the initial PPZ treatment, the plants were inoculated with a 500-µl liquid spore suspension of Setosphaeria turcica isolate NY001 (race1), containing 4000 spores/ml, into the whorl during the late afternoon. An overhead mister provided water for 10 s every 10 min for approximately 15 h. Disease phenotype was measured as the incubation period (IP), the number of days following inoculation when a necrotic lesion was first observed. Forty-eight days following inoculation, the disease measurements were concluded; a survival analysis was performed in JMP® Pro Version 13.1.0 (SAS Institute, Inc, Cary, NC, 1989–2019) to test for differences within each genotype for the effect of PPZ on IP using a log-rank test statistic and visualized in a Kaplan-Meier plot.

留言 (0)

沒有登入
gif