Determining chromatin architecture with Micro Capture-C

Within the nucleus, chromosomal DNA combines with protein and RNA to form highly complex, dynamic structures that play a key role in determining gene expression1,2. Chromatin structure also plays vital roles in other key cellular processes including the DNA damage response3, DNA replication4, V(D)J recombination5 and cell division6, and it can become dysregulated in the context of disease7. Assaying the three-dimensional (3D) contacts of chromatin is thus an invaluable tool for multiple research questions.

Chromatin architecture is organized on a number of different scales8. The chromosomes are largely segregated into chromosomal territories9. Within these, chromatin forms large-scale structures called topologically associated domains (TADs)10. TADs tend to colocalize to form ‘A’ and ‘B’ compartments11. The ‘A’ compartments generally contain more active euchromatin and are preferentially localized to the center of the nucleus in nearly all cell types, whereas the ‘B’ compartments colocalize with heterochromatin, which preferentially localizes toward the nuclear periphery in lamina-associated domains12. Within TADs, there are further layers of complexity, with subcompartments containing genes and enhancers13. Inside these compartments, enhancers, promoters and CTCF sites make highly specific contacts14, which are probably mediated in part by transcription factors. Furthermore, we are now able to resolve contacts between individual transcription factor binding sites, and we have recently identified that complex ordered structures occur on an even smaller scale, within individual enhancers and promoters15.

Our understanding of DNA structure has been informed by several orthogonal techniques including fluorescence microscopy16 and genome architecture mapping (GAM)17 but many of the key discoveries have been made using chromosome conformation capture (3C)-based methods, which remain the best way of determining sequence-specific contacts at high resolution18. Although the 3C field was initiated by the seminal paper by Dekker et al.19, the first restriction enzyme-based proximity ligation assays were undertaken to detect DNA–DNA interactions in plasmids in the late 1980s20 and subsequently in minichromosomal DNA21. However, it was Dekker et al. who described the principle of cutting fixed eukaryotic chromatin with restriction enzymes followed by religation to detect physical proximity.

Over the past two decades, improvements in these techniques have resulted in marked increases in throughput, sensitivity and resolution. Initially, individual ligation junctions were painstakingly detected using polymerase chain reaction (PCR) and gel electrophoresis or quantitative PCR (qPCR)22. The adoption of high-throughput sequencing to analyze these libraries dramatically improved the sensitivity and scale of these assays. Although possible in smaller genomes, such as bacteria23 and fly24, the full human 3C library cannot be sequenced due to the cost of sequencing such large amounts of DNA. Instead, methods have been developed to sample all fragments containing ligation junctions (‘all-versus-all’ methods, e.g., Hi-C25, DNase Hi-C26,27 and Micro-C28) or focus on specific regions of the genome (‘many-versus-all’ methods, e.g., Capture-C29 and 4C-seq30), reducing sequencing requirements and markedly improving data quality.

One limitation of these 3C techniques is that their resolution is inadequate below ~500 bp (Fig. 1a,b). Recently we have developed a new method called Micro Capture-C (MCC)14, which allows DNA structures to be determined down to base pair resolution, thus representing a substantial advancement in the field of 3D genome architecture (Fig. 1b,d). Given that the key proteins that drive the formation of chromatin topology such as insulator elements (e.g., CTCF) and transcription factors bind short sequences of DNA (7–20 bp), this technique is providing unique insights into gene regulation because it allows the sites of binding of the proteins involved in mediating contacts to be determined precisely (Fig. 1d).

Fig. 1: Comparison of MCC with other techniques at the Klf1 locus.

a, Hi-C at 40 kb resolution in murine erythroid cells. The whole of chromosome 8 is shown on the left, and the 5 Mb region, which encompasses the Klf1 locus, on the right75. b, The 400 kb region encompassing the gene-dense Klf1 locus showing a comparison of MCC with Capture-C31 and Promoter Capture Hi-C76 profiles from the promoter of Klf1 (ref. 31). Interestingly, genome editing of the enhancers in the red box has been shown to alter Klf1 expression77. c, The region highlighted with the black box in b, demonstrating contacts with a potential regulatory element very close to the gene promoter (<2 kb), which are not easily visualized with other techniques. d, Boxed region in c showing base-pair resolution data of ligation junctions separated by whether the junction is upstream or downstream of the read.

This paper discusses the experimental design and protocol of an MCC experiment in detail and places the technique in context of other 3C and non-3C methods to analyze chromatin topology. We will discuss expected results and new experimental opportunities arising from the development of MCC.

Development of MCC

We were able to substantially increase the resolution of 3C assays by applying five advances to the Capture-C method29,31,32,33, which previously provided the most sensitive and highest-resolution 3C data from individual viewpoints. Capture-C itself combines 3C library generation with targeted oligonucleotide capture, which allows very deep sequencing from hundreds to thousands of sites in the genome simultaneously from multiple samples. The improvements to Capture-C include replacement of strong detergents to maintain cellular integrity; use of micrococcal nuclease (MNase) in place of conventional restriction enzymes; extremely deep targeted sequencing; direct sequencing of ligation junctions; and new bioinformatic approaches for precisely locating ligation junctions and visualizing data.

First, one of the limitations of 3C methods is that the resolution is dictated by restriction enzymes that cut in a sequence-specific manner. This caps the resolution because data are only generated at the restriction enzyme cut sites; interactions will always be between restriction enzyme cut sites at the end of fragments, rather than between the precise regions making the contact in vivo. Initially, 3C techniques used six-cutter enzymes, which generate data points on average every 5 kb, but this has now been superseded by approaches that use four-cutter enzymes, which generate a mean fragment size of 256 bp. However, restriction enzyme-based approaches struggle to generate data below 500 bp because the fragment sizes follow a geometric distribution, which means that the bin size needs to be substantially larger than the mean fragment size to prevent large numbers of bins without data (for a four-cutter restriction enzyme the bin size has to be >580 bp for over 90% of bins to contain at least one fragment). Recently, several groups have used two restriction enzymes34, which increases the potential resolution, with over 90% of fragments being <295 bp. We adopted the use of MNase because this fragments the genome largely independently of DNA sequence, and it has a propensity to cut in between nucleosomes35, which are the basic building blocks in the chromatin fiber.
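The bin sizes quoted above can be approximated with a short calculation. The sketch below assumes an idealized genome in which bases are independent and equally frequent, so that a four-base recognition site begins at any given position with probability 1/256 (and either of two such sites with probability 2/256). The function name `min_bin_size` is illustrative and is not taken from any published pipeline.

```python
import math


def min_bin_size(site_prob: float, occupied_fraction: float = 0.9) -> int:
    """Smallest bin size (bp) for which at least `occupied_fraction` of bins
    contain one or more cut sites.

    Assumption (idealized model): each base starts a cut site independently
    with probability `site_prob`, so fragment lengths are geometric.
    """
    empty_fraction = 1.0 - occupied_fraction
    # P(no cut site in b bases) = (1 - p)^b, which must be <= empty_fraction
    return math.ceil(math.log(empty_fraction) / math.log(1.0 - site_prob))


one_four_cutter = min_bin_size(1 / 256)   # single four-cutter enzyme
two_four_cutters = min_bin_size(2 / 256)  # two four-cutter enzymes combined

print(one_four_cutter, two_four_cutters)
```

Under these assumptions the single-enzyme threshold comes out at just under 600 bp and the two-enzyme threshold at just under 300 bp, in line with the figures given in the text. Real genomes deviate from this model because base composition is uneven and recognition sites are not uniformly distributed.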

Micro-C was the first 3C method to utilize MNase in place of restriction enzymes, with initial data produced in yeast28 before being applied to human cell types36. This all-versus-all method generates high-resolution contact data across the genome. Nuclease-based approaches are more technically demanding than restriction enzyme-based Hi-C methods, and the challenges with obtaining sufficiently deep data with Micro-C mean that for most applications, Hi-C generally offers higher resolution. However, the signal from MNase-based approaches (Table 1) delineates more precise contacts than restriction enzyme-based approaches. When combined with targeted capture, we find that even subnucleosomal detail can be resolved when MNase digests are carefully titrated to maintain internucleosomal linkers.

Table 1 Summary of the major differences between Micro-C and MCC

Second, we find that the resolution can be substantially improved by minimizing the disruption of nuclear architecture, which we achieve by avoiding the detergents required to make a nuclear preparation from cells. Previous 3C methods have largely used chromatin in solution or purified nuclei, which are usually extracted using detergents such as NP-40 to remove the cytoplasmic membrane33. This is generally required to enable restriction enzymes to cut chromatin. It is possible to permeabilize cells with digitonin and digest the chromatin with MNase, which substantially improves the signal-to-noise ratio by avoiding spurious trans ligations of chromatin between cells. Restriction enzymes such as DpnII are larger proteins, and these do not digest chromatin adequately in cells that have been treated with digitonin.

Third, we generate extremely deep data from individual viewpoints (on average 120,000, and up to 500,000, unique contacts per 120 bp viewpoint), equating to over 1,000-fold the depth of data obtained with all-versus-all approaches such as Hi-C and Micro-C. Over 3 trillion ligation junctions would be required for this depth of coverage genome wide. This is achieved by performing sequencing adaptor ligations in parallel, generating highly complex sequencing libraries.
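The genome-wide figure above follows from simple arithmetic. The sketch below assumes a genome of roughly 3 Gb (a round number that is not stated in the text) tiled with non-overlapping 120 bp viewpoints, each sequenced to the average depth of 120,000 unique contacts. The function name is illustrative.

```python
def genome_wide_junctions(genome_size_bp: int,
                          viewpoint_bp: int,
                          contacts_per_viewpoint: int) -> int:
    """Ligation junctions needed to reach a given per-viewpoint depth at
    every non-overlapping viewpoint-sized window in the genome."""
    n_viewpoints = genome_size_bp // viewpoint_bp
    return n_viewpoints * contacts_per_viewpoint


# Assumed inputs: ~3 Gb genome (assumption), 120 bp viewpoints, and
# 120,000 unique contacts per viewpoint (both taken from the text).
total = genome_wide_junctions(3_000_000_000, 120, 120_000)
print(f"{total:,}")  # 25 million viewpoints x 120,000 contacts each
```

With these inputs the total is 3 × 10^12 junctions, consistent with the "over 3 trillion" quoted above. Using the upper bound of 500,000 contacts per viewpoint would raise the requirement roughly fourfold.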

Fourth, we directly sequence the ligation junctions. This allows us to locate precisely the two regions of interacting DNA and generate contact maps with base pair accuracy. We reconstruct single reads from paired-end (PE) sequencing by sonicating MNase 3C libraries to 200 bp fragments and sequencing with 300 bp reads. Most conventional 3C approaches such as Hi-C and Micro-C sequence the libraries with 50 bp paired-end reads, which is sensible to reduce the cost per read and allow deeper sequencing. However, this means that the position of the ligation junctions is inferred, which limits the resolution for Micro-C.
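The idea of reconstructing a single read from a PE pair can be illustrated with a toy example. The sketch below is not the tool used in the MCC pipeline; it assumes error-free reads, requires an exact overlap and ignores adaptor read-through. The names `reverse_complement` and `merge_pair` are illustrative.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a paired-end read pair into one sequence spanning the fragment.

    read2 is reverse-complemented onto the strand of read1, then the longest
    exact match between the end of read1 and the start of the mate is used
    to join them. Returns None if no overlap of at least `min_overlap` bases
    is found.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# A 40 bp toy fragment sequenced with 28 bp reads from each end.
fragment = "ACGTTGCAAGGCTTACCGATAGGCTCAATGCCGTTAGACT"
read1 = fragment[:28]
read2 = reverse_complement(fragment[-28:])
merged = merge_pair(read1, read2)
print(merged == fragment)
```

Because the merged read covers the whole fragment, a ligation junction anywhere within it is sequenced directly rather than inferred from the positions of two short end reads. Production read mergers additionally tolerate mismatches and use base qualities to resolve them.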

Finally, we have developed a new analysis pipeline on the basis of using nonstringent aligners to identify ligation junctions precisely within reads. In addition, our novel bioinformatic approach allows us to footprint the contacts in a manner analogous to DNase I footprinting37 and reconstruct detailed chromatin interactions.

In addition, by designing adjacently binding oligo probes over a much larger region, MCC can also generate Hi-C (all-versus-all)-like data (Tiled-MCC)38. These datasets are higher-resolution maps than those that can be achieved by Micro-C and Hi-C. Owing to the subnucleosomal level of detail afforded by the MCC methods, they can be used to visualize intraregulatory region contacts, showing the complex topological landscape within promoters, enhancers and insulator elements.

Applications of MCC

MCC can be used to target any region of the genome so long as a unique oligonucleotide probe can be designed. MCC has already been used to study promoters, enhancers, super-enhancers, and insulator and boundary elements14,38. The increased resolution of MCC allows previously impenetrable loci to be investigated, particularly since many genes are clustered in gene-dense loci in the genome39. MCC can produce interpretable tracks at these loci and start to untangle the underlying biology. For example, in mouse erythroid cells, the Klf1 locus contacts 15 other promoters and enhancers within the TAD (Fig. 1).

Many enhancers are located within 5 kb of a gene promoter40. Most 3C techniques are unable to differentiate close contacts, but MCC is able to define specific contacts extremely close to the viewpoint; for example, it shows a specific contact with an enhancer 2 kb upstream of the Pou5F1 promoter14. The high signal-to-noise ratio also allows the absence of 3D contacts to be observed. For example, the silenced embryonic genes at the α and β globin loci are clearly seen not to contact the active enhancers with MCC41.

The largest impact from MCC is likely to be in interpreting the effects of disease-associated variants in the noncoding genome. This is exemplified by the demonstration that the major genetic determinant of death from infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)42 is caused by a gain-of-function variant in an enhancer that specifically contacts the promoter of the LZTFL1 gene43.

Comparison with other methods

All versus all

Hi-C couples the 3C principle with massively parallel sequencing to visualize chromatin contacts genome wide25,34. Hi-C creates contact matrices that sample all interactions from all regions of the genome (all versus all). Hi-C has been pivotal in elucidating key principles in the organization of the genome, such as chromosomal territories44, TADs10, X-inactivation45,46 and even interchromosomal trans-interactions47. The resolution has markedly improved with optimization of the technique and increased sequencing depth34,48. For example, to circumvent the limitation in resolution inherent with sequence-specific restriction endonucleases, MNase (Micro-C28) and DNase I (DNase Hi-C26) can be used in 3C approaches. However, it is challenging to sequence the libraries to sufficient depth to generate higher-resolution data compared with restriction enzyme-based approaches.

The major limitation of this family of techniques remains the cost of sequencing to the depth required for high-resolution maps. At any given region, MCC will produce much greater resolution, and using Tiled-MCC one can generate all-versus-all data across megabase-scale regions. However, if one is interested in global chromatin topology, Hi-C and Micro-C remain the techniques of choice.

Many versus all

Many 3C methods improve data quality by targeting specific regions of the genome. Targeting also reduces the cost per sample, allowing different cell types and experimental conditions to be analyzed. The downside of these approaches is that the protocols are more complex, and great care needs to be taken to minimize loss of library complexity, which can result in large numbers of PCR duplicates. This is particularly problematic for protocols with more than one enrichment step and can even result in worse data quality than nontargeted approaches.

Many-versus-all methods either enrich for specific regions of the genome on the basis of sequence, or they combine 3C with chromatin immunoprecipitation to enrich contacts between specific histone marks or proteins (such as polymerase or CTCF). Techniques such as ChIA-PET49, PLAC-seq50 and HiChIP51 offer the possibility of defining genome structure genome wide from all sites of interest. However, the immunoprecipitation step in these techniques leads to loss of library complexity, which limits the depth of data at individual loci. There is also potential for co-enrichment bias to erroneously identify contacts between regions that are enriched by the antibody. We therefore favor methods that enrich specific sites in the genome.

The first such method developed was 4C30,52, which circularizes small ligated fragments and uses inverse PCR to generate high-resolution interaction profiles. A number of seminal studies have leveraged 4C, including the complex regulation of the Hox locus53, linking disease-associated single nucleotide polymorphisms (SNPs) in the FTO locus to the IRX3 gene54, and monitoring enhancer–promoter interactions during Drosophila development55.

Targeted oligonucleotide capture approaches such as Capture-C33, Tiled Capture-C56 and Capture Hi-C57 have a substantial advantage over 4C because they allow very flexible experimental design, allowing anything from a single viewpoint up to tens of thousands of sites to be assayed in a single experiment. In addition, they have intrinsic PCR duplicate filtering, analogous to UMI-4C52.
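The text does not spell out how this duplicate filtering is implemented. The sketch below assumes one common convention for sonicated libraries: because sonication breaks DNA at effectively random positions, two sequenced molecules with identical mapped coordinates at both ends are treated as PCR copies of the same original molecule. The `Molecule` record and the function name are hypothetical and are used only for illustration.

```python
from typing import Iterable, List, NamedTuple


class Molecule(NamedTuple):
    """Mapped coordinates of the two ends of one sequenced molecule
    (hypothetical record layout, for illustration only)."""
    left_chrom: str
    left_pos: int
    right_chrom: str
    right_pos: int


def remove_pcr_duplicates(molecules: Iterable[Molecule]) -> List[Molecule]:
    """Keep the first molecule seen for each unique pair of end coordinates."""
    seen = set()
    unique = []
    for mol in molecules:
        if mol not in seen:
            seen.add(mol)
            unique.append(mol)
    return unique


reads = [
    Molecule("chr8", 1000, "chr8", 5200),
    Molecule("chr8", 1000, "chr8", 5200),  # same ends: counted as a duplicate
    Molecule("chr8", 1003, "chr8", 5200),  # different break point: kept
]
deduplicated = remove_pcr_duplicates(reads)
print(len(deduplicated))
```

In this scheme the fragment ends play the role that an explicit molecular identifier plays in UMI-based methods, so no additional barcode is needed to count each original ligation event once.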

MCC was initially developed from Capture-C, which is a well-established technique with years of optimization29,31,32,33. Although it is unable to generate data with the same resolution as MCC, Capture-C is more straightforward to undertake and the profiles are interpretable with lower-quality data than MCC. In addition, both Capture-C and Tiled Capture-C can give good-quality 3D data from as few as 2,000 cells56,58, making them the techniques of choice when working with rare primary cell types or patient samples. Since the Capture-C protocol is very similar to MCC but more forgiving, we would recommend that researchers without extensive 3C experience consider gaining experience with Capture-C before undertaking MCC.

Non-3C methods

There are also non-3C methods for the study of chromatin topology. GAM17 takes ultrathin slices of nuclei before sequencing to assay the position of the chromatin in a single-cell manner. GAM and Hi-C contact matrices largely agree when the contacts are compared directly, although GAM seems to better detect contacts in active euchromatin whereas Hi-C has more detail in inactive heterochromatin59. Recently, GAM has been applied to specific mouse brain cell types (immunoGAM)60 and discovered ‘melting’ of long active genes in those cell types, showing that this technique can offer insight not seen by traditional 3C methods. Other non-3C methods (which work at the single cell level) include split-pool recognition of interactions by tag extension (SPRITE)61, DNA-seqFISH+62 and Dip-C63. Although the resolution of these methods is improving, at present these techniques are largely restricted to the study of large-scale chromosome formations.

Expertise needed to implement the protocol

The molecular biology techniques utilized in MCC, such as DNA ligation, library preparation and streptavidin bead pull down, are standard practice in many other protocols. Titration of MNase concentrations improves with experience of the technique and requires careful laboratory practice to minimize sources of noise. This protocol requires a large number of steps and therefore can be challenging for those with limited laboratory experience. Data analysis requires a basic understanding of the Linux command line, and we provide an in-depth protocol below.

Limitations

The major limitation of MCC currently is the large number of input cells required to generate useable libraries (currently ~3 million per replicate, with at least six replicates being required for footprinting quality data). This will make study of rare primary cell populations or patient samples challenging, and conventional Capture-C33 would be a more appropriate method.

MCC has a small but quantifiable bias toward detecting contacts between regions of open chromatin. Extensive work to quantify this bias through sequencing both the digestion controls (which are nonligated) and MNase libraries has shown that this bias mainly results from the ligation step rather than incomplete digestion. We have found that there are ~40% more ligation junctions in regions of open chromatin compared with the inactive chromatin background. Importantly, at hypersensitive sites there was no correlation with the degree of DNase I hypersensitivity. The bias is partially caused by a biological effect: in nucleosome-depleted regions there are greater numbers of potential sites for MNase to cut because transcription factors bind shorter DNA sequences than histones, and this results in shorter fragments and larger numbers of potential ligation junctions. It is possible to correct for this effect by sequencing the raw 3C library, but this probably is not necessary unless it is important to detect very small differences in contact frequencies between a hypersensitive site and one that is not hypersensitive.

Like all 3C techniques, MCC is not dynamic; it takes a static snapshot of chromatin conformation. A time course can be used to increase dimensionality64, but 3C methods can be complemented by microscopy studies of chromatin conformation such as live cell imaging16 to gain a better understanding of spatiotemporal and dynamic elements of chromatin topology.

Experimental design

Samples

The high-throughput and multiplex nature of MCC, through the use of readily available 120 bp biotinylated oligos and captures performed on pooled libraries, makes designing experiments relatively straightforward. Different cell types, conditions, genotypes, etc. can be compared, and batch effects can be minimized by pooling uniquely indexed libraries into one tube before capture.

Interpretable read pileup data is generated from every replicate, and data of this resolution suffices for most applications. The analysis pipeline produces these tracks by default. These data are plotted without windowing but have an effective window size of ~100 bp, compared with plots of base pair resolution, due to the size of the collapsed reads. It is more challenging to generate footprinting quality data and this normally requires merging data from at least six replicates; we usually use a minimum of two technical replicates for each of three biological replicates (Fig. 2c).
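The difference between a read pileup track and a base-pair resolution junction track can be shown with a toy example. The sketch below is not the analysis pipeline's code; it assumes each collapsed read is represented as a half-open interval (start, end) and each ligation junction as a single genomic position. Both function names are illustrative.

```python
from typing import List, Sequence, Tuple


def pileup(intervals: Sequence[Tuple[int, int]],
           region_start: int, region_end: int) -> List[int]:
    """Per-base read depth: every read adds signal along its whole length,
    which smooths the track over roughly one read length."""
    depth = [0] * (region_end - region_start)
    for start, end in intervals:
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth


def junction_track(junctions: Sequence[int],
                   region_start: int, region_end: int) -> List[int]:
    """Per-base junction counts: every read adds signal at one base only."""
    counts = [0] * (region_end - region_start)
    for pos in junctions:
        if region_start <= pos < region_end:
            counts[pos - region_start] += 1
    return counts


# Two overlapping 100 bp reads and three junction positions (toy values).
depth = pileup([(100, 200), (150, 250)], 100, 250)
counts = junction_track([150, 150, 200], 100, 250)
print(max(depth), max(counts))
```

Each read contributes about 100 bases of signal to the pileup but a single base to the junction track, which is why the pileup behaves as if it had a ~100 bp window even though no windowing is applied, and why far more reads are needed before the junction track becomes interpretable.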

Fig. 2: Overview of experimental workflow.

a, The 3C principle involves initially fixing chromatin with formaldehyde, which causes covalent cross-links between proteins and nucleic acids (Steps 1–6). In MCC the cells are permeabilized with digitonin, in contrast to other methods, which generally use stronger detergents to extract nuclei or chromatin. The chromatin is then digested using an endonuclease. In MCC we have adopted the use of MNase, which cuts between nucleosomes and between transcription factor binding sites in nucleosome-depleted regions (Steps 7–14). A blunt ligation reaction is then performed, which requires prior end repair for MNase-digested chromatin (Steps 15–22). This results in ligated fragments that can be used to define which sequences are in proximity in the nucleus. b, Overview of NGS library preparation and oligonucleotide capture process. DNA is first sonicated to 200 bp (Steps 23–32). Sequencing adaptors are added (Steps 33–42). The material is hybridized to a pool of oligonucleotides (Steps 43–54) and pulled down with a streptavidin bead cleanup (Steps 55–78). The material is PCR amplified from the beads (Steps 79–84), and the hybridization reaction is repeated to improve purity (Steps 85–91). Following this, the material is analyzed using PE sequencing (Step 92). Note that the central region of the fragments is sequenced from both sides, which subsequently allows reconstruction of the entire fragment. c, Overview of strategy for generation of lower-resolution data for read pileup tracks compared with footprinting quality data. Figure adapted with permission from ref. 14, Springer Nature Limited.

Viewpoint selection and design

One of the maj
