Expanding the list of sequence-agnostic enzymes for chromatin conformation capture assays with S1 nuclease

Boosting data yield in Hi–C protocol through DNase I substitution with S1 nuclease

Previously, we established a robust and efficient DNase I Hi–C protocol that yields high-quality Hi–C maps [11]. Nevertheless, this method generated more dangling ends (DEs) than the traditional Hi–C approach. DEs are non-chimeric DNA fragments that do not contribute valuable information to Hi–C analysis. These fragments are depleted during Hi–C library preparation through the following process: DNA ends are labeled with biotin after chromatin digestion; the biotinylated nucleotides become internal upon DNA end ligation; biotinylated nucleotides remaining on unligated DNA ends are removed using an exonuclease; and molecules containing internal biotin (i.e., ligation products) are enriched by streptavidin pulldown. We suspected that the surplus of DEs in the DNase I Hi–C protocol is attributable to the nickase activity of DNase I. Following DNase I digestion, the polymerase activity of the Klenow fragment acts at the nicks, incorporating biotin–dCTP into the DNA molecule. Consequently, not only the ligation products are internally labelled with biotin, but also DNA molecules that have not participated in ligation.

Like DNase I, S1 nuclease is a sequence-agnostic nuclease; it is capable of cleaving dsDNA, nicks, and ssDNA. We hypothesized that these activities of S1 nuclease could prevent the accumulation of nicked DNA during chromatin digestion. To test this, we first confirmed that the S1 enzyme could digest formaldehyde-fixed chromatin, producing a discernible digestion pattern (Fig. 1A). Subsequently, we modified the cell lysis and chromatin fragmentation steps (see Methods) to make the Hi–C protocol compatible with S1 chromatin digestion. Using this modified method, we prepared S1 Hi–C libraries for 16 human peripheral blood samples and the K562 human immortalized cell line. During library preparation, we examined the products of the digestion and ligation steps and found that they satisfied Hi–C quality standards (Fig. 1A), although the pattern of chromatin fragmentation by S1 nuclease differed slightly between cell types (Fig. 1A).

Fig. 1

S1 Hi–C protocol allows the generation of high-quality Hi–C maps. A Chromatin digestion and ligation of K562 cells and peripheral blood mononuclear cells. Lanes M show a 100 bp DNA ladder. 1: intact gDNA; 2: S1 digestion of cross-linked chromatin; 3: ligation of S1-digested chromatin from lane 2. B Quality metrics of S1 Hi–C and DNase I Hi–C data sets. Each dot represents an independent Hi–C library preparation; we analyzed 14 DNase I Hi–C libraries [11] (protocol with biotin fill-in) and 16 S1 Hi–C libraries. P-values were calculated using the Mann–Whitney test. (**) indicates p-value < 0.01, (ns) indicates p-value > 0.05. C Representative heatmap of chromatin interactions in K562 cells obtained using the DNase I Hi–C protocol (below the diagonal line) [11] and the S1 Hi–C protocol (above the diagonal line). D Genome-wide read coverage depth histograms. Each histogram shows the distribution of coverage depth for 500 bp genomic windows. Data were obtained by merging all replicates. E Boxplot showing the coverage distribution as in D, but for each replicate independently. Numbers near boxplots give the interquartile range. F Boxplot showing the distribution of Spearman’s correlation coefficient calculated between pairs of Hi–C matrices for replicates. Numbers near boxplots represent the median value.

Next, we performed shallow sequencing of these samples to assess data quality and deeper sequencing of the K562 sample to produce a Hi–C map (Fig. 1B, C). The quality assessment shows that S1 Hi–C produces data of similarly high quality to DNase I Hi–C (Fig. 1B); however, the fraction of DEs was lower for S1 Hi–C, resulting in a higher overall yield of valid Hi–C pairs. This confirms that the excess of DEs is a consequence of DNase I nickase activity, and that replacing DNase I with S1 nuclease can reduce the amount of DE fragments. In addition, S1- and DNase I-based Hi–C assays produce fewer PCR duplicates than MNase and DpnII Hi–C (Additional file 1: Fig. S1).
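The group comparison summarized in Fig. 1B relies on the Mann–Whitney test. As a minimal sketch of how such a comparison of per-library quality metrics (for example, DE fractions) could be computed, the Python code below derives the U statistic and a two-sided p-value. The function name `mann_whitney_u` is hypothetical, and the normal approximation without tie or continuity correction is a simplifying assumption that may differ from the implementation used for the figure.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation).

    Assumption: no tie or continuity correction is applied.
    """
    pooled = list(x) + list(y)
    order = sorted(range(len(pooled)), key=pooled.__getitem__)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(order):
        # Find the run of tied values and give each the average 1-based rank.
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value

# Toy values for illustration only; these are not measurements from this study.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

For small groups an exact test is usually preferable to the normal approximation used here.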

A crucial characteristic of DNase I is its ability to generate Hi–C libraries with relatively even coverage. Although A-compartment sequences are moderately enriched in DNase I Hi–C libraries [28], this enrichment is less pronounced than that observed at the ends of restriction fragments in conventional Hi–C libraries. To evaluate potential coverage bias in S1 Hi–C libraries, we computed the distribution of coverage depth across the genome in S1, DNase I, MNase, and DpnII Hi–C samples from K562 (Fig. 1D). S1, MNase, and DNase I Hi–C maintain a relatively even coverage distribution. The same conclusion follows when the interquartile range of the coverage distribution is used as a measure of its uniformity (Fig. 1E). In contrast, DpnII Hi–C exhibits a bimodal distribution, reflecting the disparity between sequences proximal and distal to restriction sites. Median coverage of loci assigned to the A- and B-compartments was very similar for the DNase I and S1 enzymes. In the MNase Hi–C data, we observed a slight preference towards higher coverage of the A-compartment (Additional file 1: Fig. S2A). We also performed a more detailed analysis of coverage across 15 chromatin states annotated by an HMM tool and did not identify any substantial bias specific to the S1 enzyme (Additional file 1: Fig. S2B).
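The uniformity measure described above (coverage in 500 bp windows, summarized by its interquartile range) can be sketched in a few lines of Python. The function names, the use of read start positions on a single chromosome, and the choice of the "inclusive" quantile method are assumptions made for illustration; the original analysis may have counted coverage and normalized for sequencing depth differently.

```python
from statistics import quantiles

def window_coverage(read_starts, chrom_length, window=500):
    """Count reads whose start position falls into each fixed-size window."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window] += 1
    return counts

def coverage_iqr(counts):
    """Interquartile range (Q3 - Q1) of per-window coverage."""
    q1, _median, q3 = quantiles(counts, n=4, method="inclusive")
    return q3 - q1

# Toy coverage vectors: an even profile and a bimodal one.
even = [10] * 8
bimodal = [2, 2, 2, 2, 18, 18, 18, 18]
print(coverage_iqr(even), coverage_iqr(bimodal))
```

A smaller interquartile range at comparable depth indicates more uniform coverage; a bimodal profile, as described for DpnII, yields a large value.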

Fig. 2

Identification of TADs, Loops and Chromatin Compartments using Hi–C data produced with different enzymes. A Quantification of the compartment strength using saddle plots. Left plot shows how preference of homotypic interactions (i.e., interactions within the same compartment) of the locus scales with its compartment attribution score. Right panel shows the same data in the form of a single score computed as area under the curve. B Aggregate TAD analysis of the Hi–C maps. The average insulation score is shown inset into each corresponding panel. C Aggregate loop analysis of the Hi–C maps. The average strength of loops is shown inset into each corresponding panel

We next assessed how well features of Hi–C data, such as TADs, loops, and compartments, can be detected using different enzymes. For this analysis, we again used Hi–C data for K562 cells generated with different enzymes. As can be seen from Fig. 2A, data obtained using S1 and DNase I provide better contrast between within-compartment and between-compartment interactions than MNase data. On the other hand, TADs and loops are better resolved using MNase (Fig. 2B, C). Enzymes with specific recognition sites, MboI and DpnII, generate data that are comparable with MNase in compartment analysis and show the worst results in the TAD and loop detection benchmarks (Fig. 2A–C). In addition, S1 Hi–C data show the highest reproducibility scores, measured as the average Spearman’s correlation between replicates (Fig. 1F).
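The reproducibility score mentioned above is a Spearman correlation between replicate Hi–C matrices. A minimal sketch, assuming symmetric contact matrices given as nested lists and comparing only the entries above the diagonal, is shown below; the helper names are hypothetical, and the original analysis may have used a different binning, distance range, or normalization.

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def upper_triangle(matrix):
    """Entries strictly above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def spearman_between_maps(map_a, map_b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(upper_triangle(map_a)),
                   average_ranks(upper_triangle(map_b)))

# Toy 3 x 3 contact matrices (illustrative values only).
rep1 = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
rep2 = [[0, 50, 4], [50, 0, 9], [4, 9, 0]]
print(spearman_between_maps(rep1, rep2))
```

Because the correlation is rank-based, it is insensitive to monotone differences in sequencing depth between replicates.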

Finally, we assessed the robustness of the S1 Hi–C method across cell types. To this end, we prepared Hi–C libraries from human fibroblasts and iPS cells using the same chromatin digestion conditions as for K562 cells and PBMCs. In both cases, we obtained high-quality results (Additional file 3: Table S2, Additional file 1: Fig. S1). This indicates that the chromatin digestion conditions presented here are robust and that, although optimization might be required for specific cell types, the released protocol can be used as a starting point for S1 Hi–C experiments.

Altogether, our results show that using S1 nuclease for chromatin fragmentation achieves coverage uniformity comparable to that of DNase I and MNase Hi–C while improving on DNase I Hi–C data quality in terms of the dangling-end fraction.

Evaluating the cut site distribution in chromatin following S1 nuclease digestion

Our data suggest that S1 nuclease can be used to digest chromatin, which opens the possibility of applying S1 digestion in various chromatin profiling applications. Although the Hi–C data analysis shows that S1 digestion is fairly uniform across the genome, the complex structure of Hi–C library molecules precludes precise identification of cut site locations. Therefore, we characterized the profiles of S1 nuclease digestion using fixed chromatin, which we fragmented with the enzyme and sequenced. Analysis of the resulting NGS reads revealed that the genomic fragments generated by S1 nuclease start with guanine at their 5’-end approximately twice as often as expected (Fig. 3A). Thus, S1 nuclease has a slight preference to cut the primary strand immediately upstream of a guanine. Interestingly, we did not observe enrichment of cytosine at the genomic position immediately before the cut site, which would be expected in the case of a symmetric cut (Additional file 1: Fig. S3A, B). This suggests that S1 nuclease probably cleaves the DNA strands asymmetrically to form 3’ sticky ends (Additional file 1: Fig. S3A, B); the length of the overhang cannot be determined from our data. These ends could be degraded by S1 nuclease or during the subsequent end-repair steps of library preparation.
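The enrichment of guanine at fragment 5’-ends can be expressed as the ratio of the observed frequency of each base at the first fragment position to its expected background frequency. The Python sketch below illustrates this calculation on a toy reference sequence. The function names, the half-open fragment coordinates, and the strand-symmetric background estimated from the reference composition are assumptions for illustration, not the procedure used to produce Fig. 3A.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def five_prime_base(reference, start, end, strand):
    """First base of a fragment [start, end), read from its own 5' end."""
    if strand == "+":
        return reference[start]
    return COMPLEMENT[reference[end - 1]]

def end_base_enrichment(reference, fragments):
    """Observed / expected frequency of each base at fragment 5' ends."""
    observed = Counter(
        five_prime_base(reference, s, e, strand) for s, e, strand in fragments
    )
    n_obs = sum(observed[b] for b in "ACGT")
    counts = Counter(reference)
    n_ref = sum(counts[b] for b in "ACGT")
    enrichment = {}
    for base in "ACGT":
        # Fragments come from both strands, so average a base with its complement.
        expected = (counts[base] + counts[COMPLEMENT[base]]) / (2 * n_ref)
        if expected > 0:
            enrichment[base] = (observed[base] / n_obs) / expected
    return enrichment

# Toy reference and fragments (illustrative only).
reference = "ACGTACGT"
fragments = [(2, 6, "+"), (2, 5, "+"), (0, 4, "-"), (1, 3, "+")]
print(end_base_enrichment(reference, fragments))
```

A value close to 2 for guanine would correspond to the roughly twofold over-representation reported above.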

Fig. 3

S1 nuclease chromatin digestion pattern. A Motif logos representing the sequence specificity of S1 nuclease cut sites. Data are shown separately for the 5’- (left) and 3’- (right) ends of the digested fragments. In both cases, we show the same (reference) strand. The arrow indicates the direction of the sequencing read, and the numbers indicate the distance from the sequenced fragment end: positive numbers for internal nucleotides (located within the sequenced fragment), negative numbers for external nucleotides (located outside the sequenced fragment). B Digestion of K562 and peripheral blood mononuclear cell (PBMC) chromatin by different concentrations of S1 nuclease. Lanes M1 and M2 show 1000 bp and 100 bp DNA ladders, respectively. C Fragment size distributions of the mapped paired-end reads for different S1 nuclease conditions in K562 cells. D Fragment size distributions of the mapped paired-end reads for S1 nuclease, different MNase conditions, and DNase I in K562 cells.

To obtain a comprehensive map of DNA accessibility and to understand how enzyme concentration affects the digestion pattern, we treated fixed K562 chromatin with different amounts of S1 nuclease: 10, 200, 500, and 1000 units. Gel-based analysis of the digestion products showed that, as expected, higher enzyme concentrations result in smaller average fragment lengths (Fig. 3B). The digestion pattern differed slightly between K562 cells and peripheral blood mononuclear cells (PBMCs), with the latter showing a pronounced nucleosome-sized ladder.

To better characterize the digestion profiles, we size-selected the digestion products to remove DNA fragments larger than 1000 bp and subjected the remaining DNA to NGS library construction and paired-end sequencing. Under all studied conditions, we observed a clear mononucleosomal peak (~ 150–200 bp) (Fig. 3C). Interestingly, treatment of K562 chromatin with the highest S1 concentration (1000 units) resulted in two peaks, one corresponding to mononucleosomes (~ 150–200 bp) and the other to dinucleosomes (~ 350 bp) (Fig. 3C). Apart from that, the fragment length distributions differed little across the studied S1 concentrations (Fig. 3C). For PBMCs, we profiled a single S1 concentration (200 units) and observed a prominent mononucleosomal peak accompanied by a less pronounced dinucleosomal peak (Fig. 3D). Because the S1 fragment size distribution resembled the nucleosomal pattern observed for MNase, we reanalyzed data from [29] to compare the S1 digestion pattern with the patterns produced by different MNase concentrations (Fig. 3D). MNase digests unprotected linker DNA between nucleosomes, while the DNA protected by nucleosomes remains intact [30]. Low MNase concentrations generate a fragment length distribution corresponding to mononucleosome-bound fragments plus linker DNA. Increasing the MNase concentration reduces the linker DNA through the exonuclease activity of the enzyme and shifts the fragment length towards 147 bp (Fig. 3D). Comparison of the fragment length distributions suggests that S1 nuclease generates longer fragments than MNase under all conditions, arguing that it is more likely to introduce breaks between nucleosomes and has either no exonuclease activity or reduced exonuclease activity compared to MNase.
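The fragment size distributions discussed above can be derived from the coordinates of properly paired reads. As a minimal sketch, the Python code below builds a length histogram and reports the fraction of fragments in a given size range. The function names and the half-open fragment coordinates are assumptions; the mononucleosomal window (150–200 bp) follows the text, whereas the 300–400 bp dinucleosomal window around the reported ~ 350 bp peak is an illustrative choice.

```python
from collections import Counter

def length_histogram(fragments, max_length=1000):
    """Histogram of fragment lengths for (start, end) half-open intervals.

    Fragments longer than max_length are skipped, mirroring the
    size selection that removed DNA larger than 1000 bp.
    """
    hist = Counter()
    for start, end in fragments:
        length = end - start
        if 0 < length <= max_length:
            hist[length] += 1
    return hist

def fraction_in_range(hist, low, high):
    """Fraction of fragments with low <= length <= high."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return sum(n for length, n in hist.items() if low <= length <= high) / total

# Toy fragments (illustrative only): lengths 160, 180, 350 and 1500 bp.
fragments = [(0, 160), (100, 280), (50, 400), (0, 1500)]
hist = length_histogram(fragments)
mono = fraction_in_range(hist, 150, 200)
di = fraction_in_range(hist, 300, 400)
print(mono, di)
```

Comparing these fractions across enzyme concentrations gives a compact numerical counterpart to the curves in Fig. 3C and D.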

Next, we aggregated the DNA break frequencies of S1 nuclease, DNase I, and MNase across annotated open chromatin features: ATAC-seq peaks and DNase I hypersensitive sites (HS) in K562 cells. As DNase I HS and ATAC-seq peaks both align with cis-regulatory elements, such as promoters and enhancers of actively transcribed genes, the aggregated cut-site profiles for these two feature sets display a high degree of concordance (Fig. 4). The location of MNase cut sites depends on enzyme concentration: at low concentrations, the signal is elevated across open chromatin regions, implying that these are the first sites accessible to the enzyme. Conversely, at higher MNase concentrations, open chromatin regions are depleted owing to extensive digestion of accessible chromatin. The pattern observed for the S1 enzyme across DNase I HS or ATAC-seq peaks shows a similar trend. For the low S1 concentration, we detect enrichment around open chromatin regions, whereas for higher concentrations we observe reduced signal in the middle of the peak, followed by a gradual increase with a nucleosomal pattern (waves). This signal resembles the pattern observed for moderate or high MNase concentrations; however, the nucleosomal pattern is less pronounced than for high MNase concentrations, suggesting that S1 nuclease may not have the strong exonuclease activity required to digest linker DNA.

Fig. 4

Fragment size distributions of the mapped paired-end reads and signal distributions at ATAC-seq peaks, DNase I hypersensitive sites, and TSS for different S1 nuclease conditions, different MNase conditions, and DNase I in K562 cells. The blue line shows the observed signal. Red curves and the shaded area between them show the average ± 3 standard deviations of the data obtained from 100 random shuffles of genomic feature locations (i.e., obtained by shuffling ATAC-seq peaks, DNase I hypersensitive sites, or gene promoters).
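The aggregation and shuffling control described in the Fig. 4 legend can be sketched as follows. The Python code sums cut-site counts at each offset around feature centers and builds a background band (mean ± 3 standard deviations) from 100 random relocations of the features. The function names, the single-chromosome coordinates, the single-base resolution, and the uniform random placement of shuffled features are simplifying assumptions; the original analysis may have shuffled features differently.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def aggregate_profile(cut_sites, centers, flank):
    """Sum cut-site counts at each offset in [-flank, +flank] around centers."""
    cuts = Counter(cut_sites)
    profile = [0] * (2 * flank + 1)
    for center in centers:
        for offset in range(-flank, flank + 1):
            profile[offset + flank] += cuts.get(center + offset, 0)
    return profile

def shuffled_band(cut_sites, n_features, genome_length, flank,
                  n_shuffles=100, n_sd=3, seed=0):
    """Lower and upper background curves: mean -/+ n_sd standard deviations."""
    rng = random.Random(seed)
    profiles = []
    for _ in range(n_shuffles):
        # Place the same number of features at uniformly random positions.
        centers = [rng.randrange(flank, genome_length - flank)
                   for _ in range(n_features)]
        profiles.append(aggregate_profile(cut_sites, centers, flank))
    lower, upper = [], []
    for column in zip(*profiles):
        m, sd = mean(column), pstdev(column)
        lower.append(m - n_sd * sd)
        upper.append(m + n_sd * sd)
    return lower, upper

# Toy data (illustrative only).
cut_sites = [100, 100, 101, 250]
observed = aggregate_profile(cut_sites, centers=[100, 250], flank=1)
lower, upper = shuffled_band(cut_sites, n_features=2, genome_length=1000, flank=1)
print(observed, lower, upper)
```

Offsets at which the observed profile leaves the band indicate enrichment or depletion of cuts relative to randomly placed features.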

Nucleosome positioning and chromatin accessibility show a non-random pattern across gene promoters, with the level of accessibility correlating with gene expression. We analyzed the chromatin cut frequency across transcription start sites (TSS), stratifying genes by expression level (Fig. 4). Both S1 nuclease and MNase show a decreased frequency of fragment ends near the TSS of actively expressed genes, presumably because these regions are accessible to the enzyme and over-digested, producing very small fragments (or even individual nucleotides) that cannot be captured by sequencing. The region of reduced signal was broader in the S1 nuclease data than in the MNase data and did not show a clear nucleosomal pattern.

Drawing on these findings, we speculate that S1 nuclease exhibits greater activity on nucleosome-bound DNA than MNase, while its exonuclease activity towards linker DNA is either lower or absent. This results in a more uniform distribution of S1 nuclease cut sites in comparison with MNase (which is additionally supported by the analysis of Hi–C read coverage distribution presented in Fig. 1E). Compared with DNase I, S1 nuclease shows a decreased representation of fragment ends within open chromatin regions, a pattern that may be attributable to exonuclease activity or to high endonuclease activity that reduces these loci to fragments too small for detection. Finally, S1 nuclease shows a slight preference for cleavage immediately 5’ of guanine, leaving a 3’-overhang on the complementary strand. Despite these preferences, the overall pattern of S1 cuts is relatively uniform (compared to DNase I or MNase digestion) and thus allows studying both open and closed chromatin.
