To develop sciMET-cap we produced a sciMETv2 library using the rapid, splint ligation (sciMETv2.SL) workflow on a human peripheral blood mononuclear cell specimen (PBMC). This preparation performed comparably to previously published sciMETv2.SL data generated on human cortex, achieving a median of 283,317 unique CG sites assessed for 1915 cells at a sequencing depth of 489,303 median raw sequence reads per cell (1.13 billion total raw reads). At this depth of sequencing, a median of 90.09% of reads were unique per cell, suggesting additional depth would yield higher CG coverage before reaching diminishing returns. However, this read depth was sufficient to perform cell type clustering using two strategies (Fig. 1c).
We first leveraged the genome-wide nature of the data, using 50-kbp non-overlapping tiles across the main-series chromosomes as previously described [15]. This produced 6 clusters (Fig. 1c), resulting in 5914 cluster-specific DMRs (Fig. 1d; q-value ≤ 1e − 5, |methylation difference|≥ 25%). We next utilized the set of windows targeted by the Twist Methylome Capture Panel by merging all probe locations across main-series chromosomes (hg38: chr1-22, X, Y) for a total of 636,303 windows spanning 123,043,166 bp (target). We reasoned that these windows, which were designed based on a wealth of bulk-cell methylome literature, represent the most variable regions of the methylome composed of CG islands and shores, tissue-specific differentially methylated regions (DMRs), enhancers, and other putative regulatory elements [21]. This produced a total of 4 clusters without clear separation (Fig. 1c) and a total of 813 cluster-specific DMRs (Fig. 1d).
Importantly, for purposes of comparison between conditions throughout this work, we leveraged the same clustering parameters for consistency. We then aggregated cell profiles within each cluster to perform cluster-specific DMR identification, again using consistent calling parameters and using the same set of 1500-bp windows that slide by 500 bp. For all DMR analyses, we include every 1500-bp window that meets significance criteria for each cluster comparison, meaning some DMR windows may overlap and would typically be collapsed in subsequent analyses. However, for comparison purposes, retention of all significant windows provides a more accurate comparison of DMR identification power. Similarly, windows that are called as significant in more than one cluster when compared to all other clusters are also retained for the same reason.
Evaluation of sciMET-cap target enrichment conditionsThe nature of sciMETv2 enables a high final library yield due to the 96-well plate final PCR amplification, providing ample material for sequencing as well as multiple target capture reactions. We therefore leveraged the input PBMC sciMETv2.SL library to perform three target capture tests: capture using the manufacturer’s recommendations with standard blocking oligos and a wash temperature of 63 °C (stdBlock), adding in a custom blocking oligo targeting the Tn5 recognition sequence that is present in sciMETv2 libraries and not in typical bisulfite sequencing assays, also with a 63 °C wash temperature (custBlock), and finally using standard blockers and an elevated wash temperature of 67 °C to increase stringency (wash67). Each capture library was sequenced separately and downsampled to an identical raw sequence read count of 350 million read pairs along with the no capture control (noCap) to enable a controlled assessment of each method's performance (Additional file 1: Fig. S1).
As anticipated, the raw read count per cell across each condition was comparable with a substantial decrease in unique aligned reads for the capture libraries versus non capture (median = 213,723 noCap, 175,255 stdBlock, 175,004 custBlock, 154,192 wash67) due to an overall reduction in library complexity when enriching for on-target fragments. However, the number of unique CG sites covered was improved for capture libraries with a median ranging from 134,432 to 139,006 versus 93,676 for the non-captured library, resulting in an improved CG capture rate per raw sequence read. Furthermore, the CG sites in capture libraries were enriched for the regions in the panel and therefore may be more informative for cell state due to the variable nature of these loci. Coverage was observed to be uniformly distributed between the Watson and Crick strands as well as the conversion status (C to T or G to A) when comparing conditions, which was true for all reads as well as reads aligning to either on or off target genomic regions, suggesting the capture process does not introduce a strand bias (Additional file 1: Fig. S1). Site-level methylation was then evaluated for all CGs, producing Pearson correlations ≥ 0.96 across all conditions, with similar correlations for on target (≥ 0.97) and off target (≥ 0.93) CG sites, as well as for CGs within the top 25% GC-content of 2-kbp bins (≥ 0.97) and bottom 25% (≥ 0.91), confirming that the capture process does not alter base-level methylation calls (Additional file 1: Fig. S2).
We next compared the three capture conditions for overall target enrichment. Across all metrics, little difference was observed between the standard blocker and standard + custom blocker conditions, both with a wash at 63 °C, suggesting that the addition of blockers specific to the unique sequences present in the sciMETv2 workflow is not necessary. However, the condition with a wash at 67 °C exhibited a greater target enrichment of aligned reads with a median of 10.1-fold (40.1% median on target) versus 9.3 (36.0%) and 9.4 (37.5%) for stdBlock and custBlock, respectively. This improvement was eliminated when assessing only unique reads, with enrichment values dropping to a median of 7.4 (29.4%), 7.7 (30.6%), and 7.1-fold (28.4%) for stdBlock, custBlock, and wash67, respectively, owing to the decreased library complexity incurred from target enrichment that was more pronounced in the wash67 condition. Taken together, the higher wash temperature produces a modest improvement in reads on target, yet at a substantial cost in the number of unique reads that can be obtained, resulting in less overall information content obtained per cell at the same raw sequencing depth. Summary metrics and per-cell values for these assessments can be found in Additional files 2 and 3, respectively.
Clustering and DMR identification using sciMET-cap dataFor each sciMET-cap condition, we next assessed the ability to identify distinct clusters and cluster-specific DMRs using the equivalent raw read count datasets. As with previous assessments, we leveraged either the capture target windows or tiling 50-kbp windows and performed dimensionality reduction, clustering, and UMAP visualization (Fig. 2a). As anticipated based on the full non-capture dataset, the target windows for the downsampled no-capture control performed poorly and produced only 4 clusters, while an improved 6 clusters were resolved using 50-kbp tiling windows. Conversely, each of the capture conditions produced greater cluster counts using the target windows. The custBlock condition identified 7 clusters and the stdBlock and wash67 conditions identified 8 clusters using a target windows. However, with 50-kbp tiling windows, all conditions produced 5 clusters.
Fig. 2Comparison on sciMET-cap conditions on clustering and DMR analysis. a Clustering and UMAP visualization of each of the four capture conditions and no capture control using either target windows (top) or 50-kbp tiling windows (bottom). b Comparison of cluster-specific DMRs identified across conditions for each window set and colored by Genscript and Genehancer annotations. c Genome browser views of CG methylation levels over 1500-bp windows tiling by 500 bp split by clusters across conditions for four regions
To further evaluate clustering performance, we aggregated all read data from each capture experiment as well as the full no capture input sequencing data (Additional file 1: Fig. S3a), to produce an aggregate dataset (allData; 1,629,970 median raw reads, 1,169,756 median unique reads, and 551,685 median unique CG sites covered per cell). We then performed clustering with both window strategies, producing 7 clusters using target windows which resulted in 44,034 cluster-specific DMRs, and 8 clusters using 50-kbp windows which resulted in 68,547 DMRs (Additional file 1: Fig. S3b). We then compared cluster assignments between each downsampled condition and the allData dataset which produced greater concordance for each of the three capture conditions when compared to the no capture control (Additional file 1: Fig. S3c).
We next performed cluster-specific DMR identification for each condition using the downsampled datasets and leveraging both target and 50-kbp window clustering results. Across both clustering sets, the capture conditions produced greater numbers of identified DMRs when compared to the no capture control, at 2497 (custBlock), 6224 (stdBlock), and 9984 (wash67) versus 1164 (noCap) for the 50-kbp window clustering with a far greater difference for the target window clustering at 20,346 (custBlock), 21,034 (stdBlock), and 23,832 (wash67) versus 694 (noCap). We then compared these DMR calls with those present in the aggregated allData dataset (Additional file 1: Fig. S3d). Of the capture experiments, the stdBlock condition identified the largest proportion of DMRs that were present in the allData call sets, with the fewest condition-specific calls. The remaining conditions also identified a majority of DMR calls present in the allData call sets. The no-capture downsampled control (least DMRs) identified a minority of regions from the allData call sets, with some entirely unique regions identified. When evaluating marker gene methylation across clusters, each of the capture datasets produced distinct patterns at cell type specific marker genes in a cluster specific manner. DMRs identified by our prior analysis were frequently found at the promoters of these marker genes, suggesting that our differential methylation analysis is identifying true cell-type-specific patterns of methylation. In contrast, the no capture control lacked specificity, showing a broad pattern of CG methylation across most clusters (Fig. 2c).
sciMET-cap can produce rich single-cell methylome datasets using a benchtop sequencerWe next sought to demonstrate the throughput of sciMET-cap that can be produced from a single sequencing run on a benchtop sequencer, specifically the Illumina NextSeq 2000 benchtop sequencer capable of producing up to 1.2 billion 100-bp paired reads in a single run. We prepared another sciMETv2 library using the rapid splint ligation workflow on PBMCs from the same individual, targeting approximately 4000 cells. We then performed a single capture using the standard blockers and blocking conditions and sequenced the library on a single flowcell. This resulted in 4037 single cell methylomes with a median of 233,400 raw reads per cell, producing a median of 166,270 unique CG sites covered per cell with a median unique read target enrichment of 7.2 (28.7% median on target). Overall capture performance was comparable to the prior work for half the cell count, suggesting capture probe saturation was not a factor, even when doubling the cell count.
The additional cells were then merged with the non-downsampled stdCap dataset detailed previously (501 M raw reads), based on the matching preparation and capture conditions to produce a dataset of 5952 PBMC sciMET-cap profiles. The dataset was then taken through target window clustering and UMAP visualization producing 14 distinct clusters divided into four major groups representing monocytes, B cells, NK cells, and T cells, with two branching groups of clusters representing CD4 and CD8 subsets (Fig. 3a). Cell type assignment was determined by assessing the methylation status of promoter regions at known marker genes and confirmed by comparing profiles to bulk cell WGBS profiles produced on sorted PBMC cell populations as a part of the BLUEPRINT Epigenome project [22] (Fig. 3b). Aggregated coverage from all cells within each cluster produced a median of 60.1% coverage of individual CG sites across both strands, correlating with cell counts within each cluster. For purposes of DMR calling, we calculated the percentage of 1500-bp windows with a minimum CG coverage of 10, resulting in a median of 96.0% of windows passing this cutoff, with only one cluster falling below 90% (P8, one of 4 CD4 + T cell clusters, 71.6%), suggesting nearly the entire methylome can be assessed for differentially methylated regions using aggregated cell coverage within clusters (Fig. 3c). These windows were then used in DMR calling, as with prior analyses [15], which produced a total of 195,483 called regions (Fig. 3d) achieved using a total raw read count of 1.7 billion using a high (P3) and low (P2) capacity flowcell using a benchtop sequencer.
Fig. 3Demonstration of sciMET-cap performance using a benchtop sequencer. a UMAP visualization and cell type clustering of a nearly 6000 cell sciMET-cap PBMC dataset. b Genome browser tracks of DNA methylation at marker genes. Top: Epigenome Blueprint bulk bisulfite sequencing data on sorted cell populations. Bottom: Aggregated cells within each cluster from sciMET-cap dataset. c Percent of individual CG sites covered by aggregate cell coverage for each cluster colored by cluster identity in a (left), for all CG sites on target (middle), and percent of 1500-bp windows used in DMR calling with a minimum coverage of 10 (right). d Cluster-specific DMRs with Genscript and Genehancer annotations
sciMET-cap achieves improved cell type clustering in human cortex with equal read depthWe previously described a high-coverage sciMETv2 workflow that leveraged a linear amplification workflow (sciMETv2.LA) which enables greater library complexity at the expense of increased costs and labor. We had applied sciMETv2.LA to a middle frontal gyrus specimen from a human cortex and produced a total of 927 cells with a median of 1.67 million unique CG sites covered from a median of 5.5 million raw reads per cell for a combined total of 6.98 billion raw sequence reads. This high-depth dataset was combined with two additional preparations using the sciMETv2.SL workflow for a combined total of 2546 cells which were used to identify 14 clusters that were assigned to cell types, providing high-quality cell type classifications for each cell [15]. Cluster identification was performed using 250-kbp tiling windows and assessing non-CG methylation (CH), a context that is only present in abundance in neurons within brain tissue and is highly neuronal subtype-specific [23]. To enable a baseline assessment of the high-coverage sciMETv2.LA dataset on its own, we re-analyzed the 927-cell dataset using CG methylation levels in target windows as well as CH methylation across 250-kbp windows. This was compared to the full dataset cell type clusters and showed good concordance for the 250-kbp CH methylation analysis strategy with decreased granularity at 10 versus 14 clusters. Evaluation of target window CG methylation permitted a coarse resolution of cell types, identifying only 7 clusters (Additional file 1: Fig. S4). We decided that this high-coverage preparation is an ideal control for comparing sciMET-cap, with ample library to carry out capture on the same input material.
We next performed target capture on the sciMETv2.LA library using standard blockers and washing conditions (cap) and sequenced the library with a total of 339 million raw reads (340 M) along with a random selection of the same raw read count from the original non-captured sciMETv2.LA library (noCap) and evaluated performance. Similar to the results of the PBMC experiments, the cap condition produced a median aligned read enrichment of 7.7 (30.6% on target) with a median of 1.29 unique CG sites covered per raw sequence read versus 0.75 for the noCap condition (Additional files 2 and 4). We next used the matched read count data to perform windowed methylation assessment, clustering, and visualization (Fig. 4a). When leveraging CG methylation across target windows, the capture condition produced 10 distinct clusters that matched well with previously annotated cell types in contrast to 4 clusters produced by the noCap control (Fig. 4b, S4a). For the CH windowed analysis, both conditions resolved 7 clusters, suggesting no improvement or detriment in clustering capabilities when incorporating capture. Notably, the ability to discern between excitatory and inhibitory neuron populations was not possible using CH methylation yet CG methylation over target windows produced a clear separation with a single inhibitory and two excitatory clusters (Fig. 4b). In concordance with cell type separation, marker gene methylation levels for the capture dataset were more concordant with the profiles of previously determined cell types (Fig. 4c). We then assessed the ability to call differentially methylated regions in the downsampled datasets, which produced greater numbers of cell-type-specific DMRs (Fig. 4d), with a greater overlap in called loci with those called from the high-coverage dataset (Additional file 1: Fig. S5).
Fig. 4sciMET-cap performance on human brain. a Comparison of sciMETv2 and sciMET-cap using matched read counts, either leveraging CG methylation (left) or CH methylation (right). b UMAPs as in a but colored by previously defined cell type. c Cluster-specific DMR identification comparison between sciMETv2 and sciMET-cap at matched read depth using either CG or CH methylation for cluster identification. d Genome browser views of CG methylation at marker genes comparing the full sciMETv2 dataset and the downsampled dataset with and without capture
Finally, we prepared a sciMET-cap library from the same brain region of a separate individual (cortex: middle frontal gyrus) using a single sciMETv2.SL preparation which was pooled for a single capture and then sequenced on a single P3 flowcell of an Illumina NextSeq 2000 benchtop sequencer, representing a typical experiment with a modest budget. This produced a library containing 6123 cells and produced 12 distinct clusters using both CG methylation over target windows and CH methylation levels in tiling windows across the genome, with clusters clearly splitting into neuronal and non-neuronal cell types based on global CH methylation levels (Fig. 5a). Marker genes were also examined based on promoter methylation patterns (Fig. 5b), though overall genomic coverage was lower per-cluster due to the reduced read counts present across the large cluster count. This suggests increased sequencing depth may be recommended for tissue types with high heterogeneity, either with higher depth per cell or with more cells at a comparable depth, to ensure high coverage genome-wide. Despite the reduced coverage, 64,774 DMRs were identified that broke down into similar annotations as other datasets (Fig. 5c).
Fig. 5Human brain sciMET-cap using a benchtop sequencer. a UMAP visualization of clusters, annotated by cell type (left) or colored by global CH methylation levels (right) for over 6000 methylomes produced in a single experiment with a single capture and sequenced on a single flowcell. b Genome browser tracks of CG methylation levels at marker genes for 1500-bp windows sliding by 500 bp. c Cluster-specific DMRs identified colored by Genscript and Genehancer annotations
留言 (0)