Tissue-Specific Transcription Footprinting Using RNA PoI DamID (RAPID) in Caenorhabditis elegans [Communications]

Abstract

Differential gene expression across cell types underlies development and cell physiology in multicellular organisms. Caenorhabditis elegans is a powerful, extensively used model to address these biological questions. A remaining bottleneck relates to the difficulty to obtain comprehensive tissue-specific gene transcription data, since available methods are still challenging to execute and/or require large worm populations. Here, we introduce the RNA Polymerase DamID (RAPID) approach, in which the Dam methyltransferase is fused to a ubiquitous RNA polymerase subunit to create transcriptional footprints via methyl marks on the DNA of transcribed genes. To validate the method, we determined the polymerase footprints in whole animals, in sorted embryonic blastomeres and in different tissues from intact young adults by driving tissue-specific Dam fusion expression. We obtained meaningful transcriptional footprints in line with RNA-sequencing (RNA-seq) studies in whole animals or specific tissues. To challenge the sensitivity of RAPID and demonstrate its utility to determine novel tissue-specific transcriptional profiles, we determined the transcriptional footprints of the pair of XXX neuroendocrine cells, representing 0.2% of the somatic cell content of the animals. We identified 3901 candidate genes with putatively active transcription in XXX cells, including the few previously known markers for these cells. Using transcriptional reporters for a subset of new hits, we confirmed that the majority of them were expressed in XXX cells and identified novel XXX-specific markers. Taken together, our work establishes RAPID as a valid method for the determination of RNA polymerase footprints in specific tissues of C. elegans without the need for cell sorting or RNA tagging.

DIFFERENTIAL gene expression across cell types encompasses both key determinants and markers of cell identity. Cataloging these differences can provide critical insights and entry points in research aiming at elucidating the mechanisms controlling fundamental biological processes, such as organismal development and cell/tissue physiology. Caenorhabditis elegans is a widely used model animal, particularly well-suited for integrative studies bridging our understanding across the molecular, cellular, and organismal levels. With its transparent body, C. elegans was the first animal used to analyze tissue-specific transcription in vivo with GFP reporters (Chalfie et al. 1994), an approach still extensively used. In contrast to the versatility of the model for individual gene expression analysis, more holistic approaches such as tissue-specific transcriptomics still remain relatively challenging, in particular in postembryonic animals due to the tough cuticle and the difficulty to isolate intact tissue or cell types.

Two main general strategies have been developed to analyze specific tissues/cell types in C. elegans. A first general strategy involves the purification of tissue-specific messenger RNAs (mRNAs) from whole animals, which we will call “RNA tagging/pulling” hereafter. The most widely used RNA tagging/pulling method rely on the immunoprecipitation of a tagged poly-A binding protein-1 (FLAG::PAB-1) expressed in a specific cell type and cross-linked to RNA (Roy et al. 2002; Kunitomo et al. 2005; Pauli et al. 2006; Hrach et al. 2020). One of the latest versions of this method is referred to as polyA tagging and sequencing (PAT-seq; Blazie et al. 2015, 2017). The PAB-based method is technically demanding, in particular for the analysis of a small number of cells (Takayama et al. 2010) and has been shown to be associated with significant background noise (Von Stetina et al. 2007; Ma et al. 2016). Two more recent alternatives to PAB-based mRNA tagging are tissue-specific translating ribosome affinity purification (TRAP; Gracida and Calarco 2017; Rhoades et al. 2019) and trans-splicing-based RNA tagging (SRT; Ma et al. 2016). TRAP analysis focuses on ribosome-engaged mRNAs recovered after cross-linking and immunoprecipitation of a tagged ribosome subunit. TRAP allowed the identification of genes expressed in a specific neuron type representing only two cells per animal (Rhoades et al. 2019). SRT uses a modified SL1 splice leader expressed in target tissues, which is trans-spliced to the transcripts by the cellular machinery (Ma et al. 2016). While SRT bypasses the noise inherent in immunoprecipitation procedures, it has so far only been applied to large tissues and the approach is limited to SL1-associated transcripts (62% of the C. elegans genes; Yang et al. 2017).

The second general strategy relies on animal disruption and cell isolation, or dissociation, followed by the in vitro culturing or sorting of labeled cells (Von Stetina et al. 2007; Spencer et al. 2011), or nuclei (Haenni et al. 2012; Steiner et al. 2012) before transcriptomic analysis, and which we will call “dissociation-based” methods. The tough cuticle in larval stages and a fortiori in adults constitutes a significant obstacle, and initial studies focused on embryonic cells which were more easily dissociated. More recent protocols combining FACS and RNA-seq were successfully used to analyze major tissues (Kaletsky et al. 2018) as well as neuronal subsets in adults (Wang et al. 2015; down to six neurons per animal; Kaletsky et al. 2016). Combined with single-cell RNA-sequencing (scRNA-seq), large-scale dissociation-based studies can address transcript profiles in multiple cell types at the same time (Cao et al. 2017; Packer et al. 2019), including the analysis of individual neuron types (Hammarlund et al. 2018; Lorenzo et al. 2020). However, how efficiently specific cell types, especially rare cells, can be purified varies depending on their morphologies, how fragile they are, and the developmental stage considered, suggesting that dissociation-independent methods could be useful complementary approaches.

We investigated whether we could use DNA adenine methylation identification (DamID) to footprint actively transcribed genes in specific cell types in whole worms. This approach has been successfully used in Drosophila to determine transcribed genes in rare brain cell types (Southall et al. 2013). DamID relies on the low-level expression of a fusion between the Escherichia coli Dam methyltransferase and a protein of interest, here a subunit of the RNA polymerase (Pol). Binding of the latter to DNA leads to the methylation of GATC sites in the vicinity of the binding site (Figure 1A). After DNA extraction, methylated GATCs are specifically cleaved by the restriction enzyme DpnI and amplified using adapter-mediated PCR before sequencing (Supplemental Material, Figure S1A). The adaptation of the method to C. elegans seemed promising since endogenous adenine methylation/demethylation is very rare in worms and not targeted to GATC motifs (Greer et al. 2015). Moreover, the GATC site frequency is expected to provide good spatial resolution. Indeed, C. elegans has 269,049 GATC sequences per haploid genome, corresponding to an average of one site for every 374 bp and a median of 210 bp (Gómez-Saldivar et al. 2016a). In C. elegans, DamID has been used to study the genomic footprint of the DAF-16 transcription factor (Schuster et al. 2010), and large-scale interactions between the genome and the nuclear periphery (Towbin et al. 2012; Sharma et al. 2014; Cabianca et al. 2019; Harr et al. 2020).

Figure 1Figure 1Figure 1

RAPID: RNA Pol II DamID scheme. (A) Dam::rpb-6 bound to all three RNA polymerase types modifies proximal GATC motifs. DamID analysis provides RNA polymerase footprints in vivo. (B) Gene-level correlations in young adult whole animals of RAPID and RNA polymerase ChIP-seq using different antibodies [a: Garrido-Lecca et al. (2016); b: Kalinava et al. (2017); c: Miki et al. (2017)]. (C) Comparison of RAPID signal with mRNA expression profiling. All genes are represented on the x-axis, ranked from left to right, based on RAPID signal in the entire animal (shown on left y-axis). Averages transcripts per million (tpm; calculated with salmon) using data set from study c above (ribosomal RNA depleted) were calculated, using the genes falling into each bin of 690 genes on the x-axis (values on right y-axis). (D and F). Profiles at large scale (D) and gene scale (housekeeping genes, F) of RAPID and the different RNA polymerase II ChIP-seq studies cited in B. (E) Metagene plot of the RNA polymerase II ChIP-seq and RAPID signals on WS270 genes. TES, transcription end site; TSS, transcription start site.

Here, we describe the RNA Polymerase DamID (RAPID) approach for transcriptional footprinting in specific tissues in C. elegans in both embryonic blastomeres and in young adults. Using a fusion between a small subunit present in all three RNA Pols, we show that the technique can be used on both fluorescently sorted blastomere cells and DNA isolated from entire young adults using cell-type-specific expression generated by Cre/lox. To test the versatility of the method, we determined the Pol footprints in three different tissues at each stage. In young adults, these tissues represent between 10 and 0.2% of the somatic cells of the animal. We show that meaningful transcriptional patterns can be recovered using this technique, in line with previously used RNA tagging/pulling and dissociation-based methods. We further explore how this technique can be used to discover tissue-specific markers and report the identification of eight new reporters expressed in adult XXX cells. Additionally, as a result of the phasing-out of older sequencers able to sequence amplicons of different sizes as produced by DamID-sequencing, we show that new long-read sequencing methods can be used to sequence DamID libraries, which makes it possible to carry out the experiments, from DNA extraction to amplicon sequencing, in less than a week, without any external sequencing facility.

Materials and MethodsPlasmids and transgenic strains

All plasmids and worm strains are listed in Tables S1 and S2. Tissue-specific DamID plasmids were generated using Gibson assembly. All DamID plasmids were integrated as single copies using MosSCI either on chromosome II or IV (see details in Table S1; Frøkjaer-Jensen et al. 2008). Strains and plasmids are available upon request.

Worm growth

Worms were grown on solid NGM, seeded with OP50 bacteria for maintenance culture and genetic crosses. All worm cultures were grown at 20°. For cell sorting, animals were grown on peptone plates seeded with HB101, synchronized twice by hypochlorite treatment. Young adults containing up to 10 embryos were recovered 66 hr after plating synchronized L1 larvae. For DamID experiments in young adults, worms were grown on NGM seeded with Dam-negative E. coli GM48 for at least two generations. Around 4000 synchronized L1s were seeded onto 100-mm plates (1000–1200 per plate) and collected 53 hr later. Worms were washed extensively with M9 (at least 10 times) and distributed in aliquots of 30 μl, removing the excess liquid. Samples were snap-frozen and stored at −80°.

Sorting of embryonic blastomeres

Synchronized gravid adult hermaphrodites containing 8–10 eggs were treated with hypochlorite. Eggs were then incubated for 3 hr in M9 at 25° until they reached the 1.5-fold stage. Eggs were transferred to 500 μl egg buffer (25 mM HEPES pH 7.3, 118 mM NaCl, 48 mM KCl, 2 mM Cacl2, 2 mM MgCl2) and pelleted 1 min at 2000 rpm. The supernatant was then aspirated, leaving 100 μl of buffer with the pellet. Then, 500 U of chitinase (C8241-25U; Sigma-Aldrich, St. Louis, MO) was added and the mixture was resuspended and further incubated for 1 hr at room temperature. Chitinase was neutralized with 800 μl Leibovitz medium. Digested embryos were recovered by centrifugation at 3000 rpm for 5 min at 4°. Embryos were then dissociated into isolated blastomeres by pipetting up and down with a P1000 pipette, up to 150 times until dissociation was complete. The cell population was filtered with a Millex-SV syringe 5-μm filter (SLSV025LS; Millipore, Bedford, MA). A total of 500 fluorescent cells (one technical replicate) were sorted using a BD FACSAria Fusion (8000 events/second; 85-µm nozzle) in sterile PCR tubes containing 1 μl of pick buffer (50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2, 137 mM NaCl). Collected samples were then frozen in liquid nitrogen before further processing. After DamID processing and PCR (see below), two technical replicates were pooled for sequencing library preparation.

DamID amplification, library preparation, and sequencing

For sorted cells, frozen samples were lysed by addition of 2 μl of lysis buffer (10mM TrisAc, 10 mM MgAc, 50 mM KAc, 0.67% Tween 20, 0.67% Igepal + 1 mg/ml Proteinase K) and incubated for 2 hr at 60° before Proteinase K inactivation at 95° for 15 min. DamID amplicons were obtained as previously described (Gómez-Saldivar et al. 2016b), using 30 PCR cycles.

For young adults, DamID was performed on 500 ng gDNA extracted from the animals using DNeasy Blood and Tissue Kit (#69504; QIAGEN, Valencia, CA). Two replicates for each stage and cell type were processed. DamID amplicons were obtained as previously described (Gomez-Saldivar et al. 2016a), with 20 PCR cycles for worm-wide DamID, 22 PCR cycles for muscle DamID, 22–24 PCR cycles for intestine DamID, and 24–26 PCR cycles for DamID of XXX cells. New Illumina patterned flow cell technology does not allow sequencing of amplicons larger than 600 bp. Both types of DamID PCR amplicons were therefore sequenced using nanopore sequencing. After AMPure XP (Beckman Coulter) purification with 1.8 bead volume, DamID PCR amplicons were directly used for nanopore library barcoding and preparation using NBD-104 and LSK-109 kits (Oxford Nanopore Technologies). Libraries were sequenced to obtain at least 1 million reads per library on MinION, using R9.4.1 flow cells. We compared the performance of both sequencing approaches by sequencing the same libraries with both old paired-end HiSeq2500 Illumina flow cells and nanopore sequencers. Coverage obtained by both approaches were similar (see snapshot of mapped reads in Figure S1E). However, nanopore sequencing allowed the direct sequencing of longer amplicons (Figure S1F), most likely because Illumina cluster amplification does not work well on molecules longer than 800 bp, even on old-generation (HiSeq2500) flow cells. We compared whole animal DamID sample libraries generated using previously described strains expressing either a free GFP-Dam fusion or a perinuclear lamin fusion (Sharma et al. 2014). Statistical comparison of both techniques at the single restriction fragment level gave a Pearson correlation coefficient between short paired-end (PE) and long reads between 0.84 and 0.86 (Figure S1G).

Bioinformatic analysis

Nanopore sequences were base-called and demultiplexed using guppy 3.6 with the high-accuracy model, before mapping to the ce11 genome using minimap2 (Li 2018). Reads were considered as DamID amplicons when both ends mapped ±8 bp from a genomic GATC motif. Filtered libraries were then used to call Pol footprinting values using the damidseq_pipeline package (Marshall and Brand 2015), using bam files (parameters:–bamfiles–extend_reads = 0). The damidseq pipeline normalizes for GATC accessibility using a free Dam (in our case Dam::GFP) profile. Per-gene values were extracted using polii.gene.call (https://owenjm.github.io/damidseq_pipeline/) with WS270 gene annotations. Correlations between libraries and methylation footprinting values were made using ad hoc R scripts available upon request. Normalized footprinting tracks were visualized using IGV. For the comparison with chromatin immunoprecipitation sequencing (ChIP-seq) tracks, fastq files were downloaded from the Gene Expression Omnibus (GEO), mapped to ce11, normalized for sequencing depth with a pseudocount of 8 and log2-normalized to the input. Final figure construction was made using IGV and Adobe Illustrator.

Array construction and microscopy

Reporter plasmids were assembled using the three-fragment MultiSite Gateway system (Thermo Fisher Scientific). Promoters of candidate genes were PCR amplified from N2 genomic DNA. Primer sequences and promoter lengths are described in Table S3. Candidate promoters were cloned into pDONR-P4-P1R vector (Invitrogen, Carlsbad, CA) by Gateway cloning, generating slot 1 entry vectors (Table S2). Final expression plasmids were generated by Gateway cloning between promoters (slot 1), mNeonGreen fluorescent ORF (slot 2; dg353; Hostettler et al. 2017), unc-54 3′ UTR (slot 3; pMH473, gift from Marc Hammarlund), and pDEST-R4-R3. The control localization/expression plasmid [sdf-9p::NLS::wrmScarlet] was generated through a Gateway recombination reaction between dg801 (slot 1), dg651 (slot 2; Marques et al. 2019), pMH473 (slot 3), and pDEST-R4-R3. Finally, promoter::mNeonGreen and sdf-9p::NLS::wrmScarlet plasmids were co-injected into N2 young adult hermaphrodites at 20 ng/μl each, with 20 ng/μl of dg9 (unc-122p::RFP; red coelomocyte) as a co-injection marker (#8938; Addgene; Miyabayashi et al. 1999). Two stable lines of each candidate promoter were selected (Table S1). For imaging, young adult animals identified based on the red coelomocyte reporter expression were transferred to a 2% agarose pad containing 0.01% sodium azide. Worms were then imaged on a Zeiss Axiovert microscope with a ×20 objective driven by Visiview, using a Photometrics Coolsnap Myo CCD camera with GFP and RFP settings (60 planes spaced by 2 μm), plus a middle plane section in DIC settings. Using Fiji, the acquired optical stacks were then partially z-projected to capture the worm section containing the XXX cells, identified using the sdf-9p::NLS::wrmScarlet signal. Final figure construction was made using Adobe Illustrator. For each promoter except asp-9, two independent extrachromosomal arrays were scored in at least five animals.

Identification of detected and unique tissue-specific genes and validation

Gene coverage of RAPID profiles are listed in Table S4 (blastomeres) and Table S5 (young adults) (Sheet 1–2). Detected genes in a tissue were defined as genes that are significantly expressed [false discovery rate (FDR) < 0.05] in one replicate of one tissue (Table S4, sheet 3 and Table S5, sheet 3). Genes consistently detected in a tissue were defined as genes that are significantly expressed (FDR <0.05) in both replicates of one tissue (Table S4, sheet 4 and Table S5, sheet 4). Representative examples of consistently detected genes used in Figure 2C were selected from the CeNGEN project (C. elegans Neuronal Gene Expression Map & Network; https://cengen.shinyapps.io/SCeNGEA/; Hammarlund et al. 2018) and visualized using IGV (https://igv.org/).

Figure 2Figure 2Figure 2

Tissue-specific expression profiles using RAPID by Cre/lox recombination in young adult animals. (A) Experimental system for tissue-specific expression of Dam fusions and RAPID analysis. (B) Venn diagram of overlap between expressed genes identified using RAPID in three different tissues. (C) RAPID profiles for previously characterized genes expressed in a tissue-specific manner in intestine, muscle, and the XXX neuroendocrine pair.

Unique tissue-specific genes were defined as genes significantly and differentially expressed relative to the other two tissues (Table S4, sheet 5 and Table S5, sheet 5). Analysis of the uniqueness of the genes from each tissue was performed using the JavaScript library jvenn (http://bioinfo.genotoul.fr/jvenn).

To ensure an unbiased comparison between genes identified as expressed in muscle with RAPID and the set of genes identified with SRT (Ma et al. 2016), only SL1 trans-spliced genes were selected, based on the intersection with the 10,589 mRNAs annotated as SL1-trans-spliced by modENCODE (Allen et al. 2011), generating a subset of 3477 genes. For the comparison between the genes identified in the intestine with RAPID and the set of genes identified with fluorescence-activated nuclear sorting (FANS; Haenni et al. 2012), downregulated genes were eliminated from the list and only genes found as expressed and upregulated (total 9169 genes) were analyzed. For the comparison between the genes identified with RAPID and the set of genes identified with FACS (Kaletsky et al. 2018) in muscle and intestine, we directly used the expressed genes list from Table S1, which contained 7690 and 9603 genes, respectively. Chi-squared statistical tests were performed using the GraphPad QuickCalcs (https://www.graphpad.com/quickcalcs/contingency1/; accessed August 2020).

Tissue-specific prediction analysis

Tissue expression prediction analyses were performed using the top 500 statistically significant genes that were uniquely enriched in muscle and intestine (Table S5, sheet 6), selecting a multigene search within the Tissue-specific Expression Predictions for C. elegans program, version 1.0 (http://worm-tissue.princeton.edu/search/multi). For the tissue expression prediction test using the RAPID muscle-SL1 group, the parameters published by Ma et al. (2016) were used.

Gene ontology analysis

Gene ontology (GO) analyses were performed on unique tissue-specific gene lists. GO terms and q-values were obtained using gProfiler (https://biit.cs.ut.ee/gprofiler/gost; version e100_eg47_p14_7733820) with the g:SCS multiple testing correction method, applying a significance threshold of 0.05 (Table S5, sheet 7; Raudvere et al. 2019).

Functional enrichment analysis for genes detected with different methods was performed using WormCat (http://www.wormcat.com/index). Annotations classified into category 2 were visualized with heatmap diagrams.

Candidate selection for reporter analysis

Candidate genes were selected for follow-up promoter analyses with two goals in mind: demonstrating that they are indeed transcribed in XXX cells, and identifying markers with expression restricted to XXX cells. Since XXX cells represent only 0.2% of the C. elegans cellular content, we reasoned that XXX-specific markers should produce very little or no signal in the whole animal samples. Therefore, out of the 862 XXX-enriched genes, with a significant RAPID signal in XXX, but not in muscle or in intestine samples, we further removed genes detected in at least one worm-wide sample. We obtained a refined list of 275 XXX-marker gene candidates, which still included the known XXX cell markers daf-9 and eak-4 (Table S5, sheet 8). From this candidate list, we selected dhs-17 as an uncharacterized gene with a plausible link to XXX cell function, plus 11 random candidates. To limit caveats related to operons, we excluded a candidate if another transcript was located <200 bp upstream. The XXX-specific RAPID signals (RNA Pol occupancy values) in the 12-gene subset (average = 1.35, SD = 0.59) were similar to those in the starting 275-gene set (average = 1.47, SD = 0.64; P = 0.16 by Student’s t-test).

ResultsRNA Pol footprinting using RAPID

The RAPID approach relies on the expression of trace levels of the E. coli Dam fused to a subunit of the RNA Pols, and the analysis of its DNA occupancy to evaluate the transcriptional state of genes. We first tested whether we could use AMA-1, the largest catalytic subunit of RNA Pol II, as reported in targeted DamID, done in neuronal lineages in Drosophila (Southall et al. 2013). However, the Dam signal was too weak, possibly due to the fact that the expression level of this very large fusion protein under transcriptional control of the uninduced hsp-16.2 promoter was too low (not shown), or that the localization of the Dam domain within the C. elegans Pol complex was not favorable to access its DNA substrate. We therefore replaced AMA-1 with RPB-6, the C. elegans homolog of the RNA Pol subunit F present in all three RNA Pols (D. Katsanos, M. Ferrando-Marco, I. Razzaq, T. Southall, and M. Barkoulas; unpublished data). RPB-6 together with AMA-1 and RPB-2, forms a “clamp” that retains DNA near the active center of Pol II (Cramer et al. 2000), stabilizing the transcription on the DNA template. In contrast to AMA-1 attempts, we obtained a strong DamID amplicon PCR signal in animals carrying a single-copy transgene of the rpb-6::Dam fusion. As these animals had not been subjected to heat shock, they therefore expressed only low levels of the Dam fusion, yet the methylation levels were sufficient to perform DamID. Amplicons were sequenced using an ONT MinION nanopore device (see Materials and Methods for details and experimental validation of the technique). After signal normalization with GFP::Dam data to control for overall chromatin accessibility, we could detect general patterns of methylation that were consistent with Pol-dependent methylation (Figure 1, D–F) and a good correlation at the gene level between replicates (Figure 1, B, D, and F). We compared the RAPID profiles with published recent RNA Pol II ChIP-seq data sets in young adult animals (Garrido-Lecca et al. 2016; Kalinava et al. 2017; Miki et al. 2017). Visually, the patterns appeared similar (see examples of profiles in Figure 1, D and F). Even if the nature of the data are not identical, we performed a comparative analysis between the data sets for all genes, downscaling the resolution of the ChIP-seq data to GATC fragment (Figure 1B). RAPID and RNA Pol ChIP-seq show a significant correlation (R = 0.38–0.46). Overall, interreplicate correlations were higher between the RAPID replicates (R = 0.86) and between the RNA Pol II ChIP-seq experiments (R = 0.63–0.97). The difference between the RAPID and RNA Pol II ChIP-seq profiles most likely results from the different approach to recover DNA and from the fact that RAPID in adults represent a picture of cumulative transcriptional activity, while the RNA Pol II ChIP-seq represents a Pol occupancy snapshot at the young adult stage.

To get a more quantitative understanding of the relation between RAPID and transcript abundance, we compared RAPID with RNA-seq data, using one of the published young adult RNA-seq data sets (Miki et al. 2017). We ranked all genes based on their RAPID signal and calculated the average RNA-seq signal across 30 equally sized bins. We observe a clear correlation between RAPID and RNA-seq: genes with high RAPID signal are highly expressed as determined by RNA-seq, while lowly expressed genes harbor low RAPID levels (Figure 1C).

To understand how the methylation signal spreads along the body of the genes, we constructed metagene plots over all 20,000 C. elegans genes for RAPID and RNA Pol II ChIP-seq experiments. ChIP-seq signals show a characteristic profile, as the Pol accumulates at the transcription start site (TSS) and the transcription end site (TES) (Figure 1E). RAPID signal increases steadily from the TSS to the TES, where it peaks before decreasing from the TES onward (Figure 1E). This 3′ end accumulation is similar to the one observed in ChIP-seq. In contrast, the difference between ChIP-seq and RAPID profiles (absence of a 5′ peak in RAPID) is very likely due to the localization of the RPB-6::Dam fusion inside the RNA Pol complex. Located on the opposite side of the AMA-1/RPB-2 complex relative to the DNA strands, the presence of the preinitiation complex at the promoter greatly restricts access to DNA [for review, see Cramer (2004); Schier and Taatjes 2020]. Once the Pol switches to elongation, leaving the preinitiation complex on the promoter, Dam is likely to gain access to the DNA and efficiently methylates the transcribed region, as observed in the RAPID profiles.

We also examined the profile in genes transcribed by Pol I and Pol III. As expected from the differential inclusion of AMA-1 and RPB-6 subunits in the different Pols, RAPID with RPB-6 also labeled genes transcribed by RNA Pol I and III. RAPID showed high enrichments on the ribosomal RNA genes at the end of chromosome I (Figure S2A), as well as a majority of previously characterized small nucleolar RNAs (snoRNAs) transcribed by RNA Pol III (46 out of 57; Ikegami and Lieb 2013). snoRNA are sometimes located in introns of Pol II-transcribed genes, making it difficult to differentiate whether the RAPID signal originates from the snoRNA transcription or from the overlapping gene. Nevertheless, 24 out of 44 RAPID-positive snoRNA genes could be unambiguously assigned to Pol III transcription as there was no overlap (Figure S2B). In contrast, the corresponding signal was markedly weaker in the young adult RNA Pol II ChIP-seq data sets and very high for the RNA Pol III subunit RPC-1 (performed in embryos, Figure S2B; Ikegami and Lieb 2013).

Taken together, these data show that the RAPID method is suitable to reveal RNA Pol footprints, serving as indirect indications of transcriptional activity by the three different RNA Pols.

RAPID in FACS-sorted embryonic cells

Since RAPID could label transcribed genes in whole animals, our next goal was to adapt the method for tissue-specific analysis. We first explored whether RAPID could be used to footprint transcription in specific cell types of the embryo purified using fluorescence sorting. Using strains constitutively expressing Dam fusions, we sorted 1000 cells for 3 different tissues of various abundance: body wall muscle, intestine, and the rectal Y cell (80, 20, and 1 cell per embryo, respectively; Figure S3A and Table S4). RAPID could produce footprints for all samples (Figure S3C), and the reproducibility of the RAPID signal at the gene level between replicas was good for the intestinal sample (r = 0.78) but lower for muscle and Y cell samples (r = 0.46 and 0.24, respectively; Figure S4A). When using an FDR < 0.05, we detected a total of 4986, 3165, and 4819 genes with a high RAPID signal in the muscle, intestinal, and Y data sets, respectively; and 1566, 1570, and 840 consistently detected genes (FDR < 0.05 in both replicates) in these tissues (Table S4). In these experimental conditions, RAPID appears suitable to identify transcription footprints of cell-type-specific genes, such as the intestinal genes cpr-6 (cathepsin B; Pauli et al. 2006) and elo-2 (fatty acid elongase ELOVL; Han et al. 2017), or the Glutathione S-Transferase gene gst-4 expressed in muscle, as well as the muscle-specific WDR1 homolog unc-78 (Figure S3C; Mohri and Ono 2003; Hasegawa et al. 2007). Comparison of the footprints obtained for all three tissues allowed us to define 395 Y-specific genes (Figure S3B), and among them, shallow Y-specific RAPID signal was visible for ceh-6 and sox-2, two transcription factors previously shown to be involved in Y-to-PDA transdifferentiation initiation (Figure S3C; Kagias et al. 2012). In contrast, we observed some unexpected RAPID footprints for genes that we expected to be expressed in a tissue-specific manner. For example, the myosin myo-3 was observed in all three tissues (Figure S3D). Conversely, some genes which are known to be expressed in tissue-specific manner did not show high RAPID values, e.g., the intestine-specific asp-1 and spp-5 genes or the muscle-specific pat-10 gene (Figure S3D). These differences might have a biological significance: most transcription patterns have been defined in older animals, yet those genes might be expressed more broadly, or not at all, in embryonic blastomeres. Alternatively, the lack of a robustly established transcription pattern might hinder faithful footprinting. Indeed, RAPID requires the enzymatic action of the methyltransferase to build up the signal, while footprints are removed at each replication round in rapidly dividing blastomeres. Together with the stochasticity of single methylation events, this might decrease the precision of the profiles. In agreement with this, intestinal cells, most of which are born and remain postmitotic early, display high reproducibility in the RAPID signal, while late-born muscle and the Y cell are less reproducible (Figure S3, C and D). In conclusion, RAPID allows transcriptional footprinting of sorted cells, provided cells have been born long enough for the Dam fusions to methylate reproducibly their target genes. These results thus suggest that RAPID may be more suited for the study of later developmental stages.

Tissue-specific RAPID in young adult animals

We next asked whether we could use the RAPID approach for the analysis of tissue-specific transcription profiles in adult worms. To circumvent the technical difficulties associated with mature animal dissociation and cell sorting, we sought to implement a transgenic approach for the cell-specific targeting of the RPB-6::Dam fusion. Several strategies have been developed to target Dam fusion expression to specific tissues, using either bicistronic constructs attenuating Dam translation (Southall et al. 2013), tamoxifen-induced nuclear translocation (Pindyurin et al. 2016; Redolfi et al. 2019), or tissue-specific recombination cassettes (Muñoz-Jiménez et al. 2017). We used the latter strategy using an insertion of a floxed mCherry ORF located between the hsp-16.2 promoter and the Dam fusion (Figure 2A). Tissue-specific expression of the Cre recombinase leads to low-level expression of the Dam fusion (Ruijtenberg and van den Heuvel 2015). We used three different Cre drivers expressed in muscle, intestine, and a pair of head neuroendocrine cells called XXX, using myo-3, elt-2, and sdf-9 promoters, respectively. We conducted cell-specific RAPID with only 4000 young adult hermaphrodites as starting material for DNA extraction. DamID signal across genes was highly correlated between replicates, and less correlated between different Cre drivers, suggesting tissue-specific DamID footprinting (Figure S4B).

Among the genes with high RAPID signal, we defined sets of detected genes in each tissue (with an FDR < 0.05 in at least one replicate), representing 4379, 4583, and 3901 genes with high RAPID footprint in intestine, muscle, and XXX cells, respectively. Among these, we also defined sets of consistently detected genes (with an FDR < 0.05 in both replicates), encompassing 2052, 2296, and 2362 genes in intestine, muscle, and XXX cells, respectively. Out of these, 1011, 747, and 862 were detected in only one tissue and defined as tissue-specific (Figure 2B, Table S5, sheet 5). A total of 536 genes harbored high RAPID footprints in all three sampled tissues (Figure 2B, Table S5, sheet 9), and a GO term analysis highlighted many very general terms consistent with the function of housekeeping genes (Table S5).

We validated these results focusing first on individual genes with well-characterized, tissue-specific expression (Figure 2C). The intestine-specific progastricsin aspartyl protease homolog gene asp-1 and the saposin-like channel defense gene spp-5 were exclusively footprinted in intestinal samples (Tcherepanova et al. 2000; Roeder et al. 2010). myo-3, the muscle myosin gene and pat-10, a troponin gene, both exclusively expressed in muscle tissues, harbored high RAPID footprints in muscle samples (Miller et al. 1986; Kagawa et al. 1997). Finally, the XXX-specific genes daf-9 (encoding a P450 cytochrome homolog) and eak-4 (encoding a membrane protein) were marked with RAPID exclusively in XXX cells (Gerisch et al. 2001; Hu et al. 2006).

Comparison of tissue-specific RAPID with available data sets in muscle and intestine

To expand the validation of our RAPID footprint gene lists beyond selected tissue markers, we next engaged three more comprehensive approaches. These approaches focused on muscle and intestine, since no large-scale gene expression data are available in XXX cells.

First, we focused on a narrower set of genes with high RAPID footprints present exclusively in one of the tissues. These gene lists with the top 500 unique genes of each tissue, were analyzed with a computational method that predicts tissue-specific enrichment, using a multigene query (http://worm-tissue.princeton.edu/; Chikina et al. 2009). The tissue prediction tool showed that muscle-only and intestine-only RAPID footprinted genes gave a high prediction score for muscle and intestine, respectively (Figure 3, A and B). In contrast, their scores were much lower for nontargeted tissues (including neurons, hypodermis, and germline).

Figure 3Figure 3Figure 3

Validation of RAPID muscular and intestinal transcriptomes. (A and B) Tissue enrichment prediction of tissue-specific RAPID transcriptomes. The top 500 hits of unique tissue-enriched genes for muscle (A) and intestine (B) were analyzed by the online tool Tissue-specific Expression Predictions for C. elegans (http://worm-tissue.princeton.edu/search). Tissue enrichment scores (mean and SEM) showed that the top genes of each tissue identify their corresponding original tissue. (C and D) Semiquantitative comparison between RAPID and FACS/RNA-seq in intestine and muscle (Kaletsky et al. 2018). All genes are represented on the x-axis, ranked from left to right, based on the RAPID signal in the considered tissue (shown on left y-axis). Averages of transcripts per million (tpm) for those genes were calculated, binning genes by pools of 690 (30 bins) on the x-axis (values on right y-axis). The muscular and intestinal marker genes presented in Figure 2C are indicated in green and blue, respectively. (E) Overlap of genes footprinted using RAPID with muscle-expressed genes detected by FACS (left; Kaletsky et al. 2018) and SRT (right; Ma et al. 2016). For the comparison to SRT data, the subsets of SL1-trans-spliced genes detected using RAPID and FACS were used. (F) Overlap of genes footprinted using RAPID with intestine-expressed genes detected by FACS (Kaletsky et al. 2018) and FANS (Haenni et al. 2012).

Second, we compared the genome-wide RAPID footprint signal levels with expression levels of genes determined by RNA-seq in FACS-sorted body wall muscle and intestinal cells (Kaletsky et al. 2018). As for whole animal samples (Figure 1C), we observed a correlation between RNA-seq levels and RAPID signals for both tissues (Figure 3, C and D). Similar correlations were observed when comparing RAPID signal with scRNA-seq data (Figure S8), with the caveat that the scRNA-seq experiment was performed in the second larval stage, not in young adults (Cao et al. 2017).

Third, we compared the RAPID-detected gene sets with those reported for adult muscle and intestine in RNA-seq-based studies [RNA-seq of FACS-sorted cells by Kaletsky et al. 2018), SRT (Ma et al. 2016), FANS (Haenni et al. 2012), or PAT-seq (Blazie et al. 2017)]. A majority of genes with detectable RAPID footprints were also identified with these tissue-specific transcriptomic methods. We found that 90% of the muscle RAPID hits (Figure 3E and Figure S5) and 87% of the intestinal RAPID hits were detected by at least one of the other methods (Figure 3F and Figure S5). Remarkably, the overlap between the genes detected by RAPID and other methods is in the same range as that found between other methods [our own data in Table S5, and similar analyses in Haenni et al. (2012); Ma et al. 2016]. Although these results are encouraging and suggest that RAPID hits are genuinely expressed in these tissues, we note that the number of genes detected with RAPID is overall lower than that detected with other methods, especially in the intestine, where FACS-sequencing and PAT-seq detect twice as many genes as RAPID (Figure S5C). To evaluate what kind of genes were not identified by RAPID under our experimental conditions, we performed GO analyses on the common gene sets (identified by two methods) and method-specific gene sets (identified by one but not the other method) (Figures S6 and S7 and Table S6). For this purpose, we used the WormCat analysis tool (Holdorf et al. 2020), specifically developed to categorize and visualize C. elegans GO data. For the RAPID/FACS and RAPID/PAT-seq common gene sets, we mostly found GO categories related to widely expressed genes (Development, Transmembrane proteins, Signaling, Metabolism…) and categories that one would expect in these tissues (Figures S6 and S7). For example, significantly enriched categories in muscle included Muscle function and Cytoskeleton; and in the intestine, Metabolism of lipids, Trafficking of vesicles, Endocytosis, and Lysosomes (Figure S7). GO categories associated with RAPID-missed genes (only-FACS, o

留言 (0)

沒有登入
gif