Methylation patterns at the adjacent CpG sites within enhancers are a part of cell identity

A subset of adjacent CpG sites in the human genome does not follow the principle of co-methylation

Identical methylation status of adjacent CpG sites (methylated or not methylated), referred to as co-methylation, is considered essential for the regulatory function of CGIs [9]. We analyzed genome-wide methylation dynamics between adjacent CpG sites at 87 258 CpG site pairs present on the EPIC microarray, spaced less than 50 bp apart and with no additional CpG sites not assayed by the microarray in between. We named these CpG pairs “adjacent CpG loci” and calculated methylation level difference (delta beta-value, see Methods for details) between base and the consecutive CpG site, constituting adjacent CpG loci.

As expected, initial analysis of methylation at adjacent CpG loci in 112 EPIC microarrays from four different studies obtained for peripheral blood samples [10, 20,21,22], showed, that vast majority of CpG sites within adjacent CpG loci are co-methylated and display identical methylation status (delta beta-value close to 0; Fig. 1a—white color in the heatmap).

Fig. 1figure 1

Methylation patterns at adjacent CpG loci in different types of healthy tissues. Unsupervised hierarchical clustering of samples based on methylation differences (delta beta-value) between base and consecutive CpG site of adjacent CpG loci, a in healthy blood samples from four independent studies and b in different types of healthy tissues, including thyroid, bone marrow, colon, breast, skeletal muscle, and whole blood

Interestingly, however, we also identified a subset of adjacent CpG loci, within which base and consecutive CpG site had different statuses of methylation in all analyzed samples and named these, discordantly methylated (Fig. 1a, red or blue color in the heatmap). We then performed identical analysis using methylation profiling data from 306 microarrays from five different types of healthy tissues: bone marrow, thyroid, breast, colon and skeletal muscle [23, 25,26,27]. Similarly, to the previous analysis, a subset of adjacent CpG sites was discordantly methylated in each of the tissue types (Fig. 1b), but more importantly some of these adjacent CpG loci were co-methylated in one tissue type and discordantly methylated in others, suggesting tissue specificity of methylation patterns at the adjacent CpG loci.

Developmentally close cells and tissues display similar discordant methylation patterns at a subset of adjacent CpG sites

Measurement of methylation levels with microarray technology always represents average methylation level of all cells in the sample, and this limits analyses of cell specific methylation patterns (as reviewed in: [35]). Therefore, to elaborate the significance of changes of methylation at adjacent CpG loci of specific cell type, we analyzed EPIC microarrays data obtained for six fractions of sorted white blood cells, including granulocytes, monocytes, B cells, CD4 + T cells, CD8 + T cells, and natural killer (NK) [10, 22].

We measured the methylation difference between base and consecutive dinucleotide within adjacent CpG loci using regression model and considered an adjacent CpG loci as discordantly methylated when that difference was more than 30 percentage points (pp) (|slope|≥ 0.3 and FDR-corrected p value ≤ 0.05, t-test). With technical limitations of the BeadChip array technology in mind, that methylation difference is likely to reflect presence of methylation at one of the alleles of the adjacent CpG loci in the cell.

This analysis identified different numbers of discordantly methylated adjacent CpG loci in each of the cell types, specifically 2049 in granulocytes, 1905 in monocytes, 1909 in B cells, 2066 in CD4 + T cells, 1986 in CD8 + T-cells, and 1927 in NK cells (results of the analysis for each cell type are shown in Supplementary Tables 1–6). Unsupervised clustering, based on delta beta-values of all identified adjacent loci showed a very specific clustering of the samples by cell type, with the majority of the loci displaying changes of methylation status between discordantly methylated and co-methylated in different cell types (Fig. 2a). Only CD4 + T cells and CD8 + T cells mixed together, but those cells are developmentally very close.

Fig. 2figure 2

Methylation patterns at adjacent CpG loci with discordant methylation status in six types of sorted white blood cells. a Unsupervised hierarchical clustering of white blood cells (WBC) based on delta beta-values of CpG pairs identified in different cell types. b Pairwise comparison of identified loci between each two types of WBC. Boxes represent the Jaccard similarity index between the cell types

We then performed pairwise comparison of adjacent CpG loci displaying discordant methylation status, between types of blood cells included in our analysis. The results (Fig. 2b) showed, that CD4 + T cells and CD8 + T cells, together with granulocytes and monocytes, which are the most developmentally close cell lines in our analysis, had the largest number of common discordantly methylated adjacent CpG loci, 0.81 and 0.71, respectively. Whereas, for example, granulocytes and CD8 + T cells, together with monocytes and CD8 + T, which represent the most developmentally distant cells in our analysis, had the lowest number of common adjacent CpG loci with discordant methylation status, 0.40 and 0.41, respectively (Fig. 2b). These results suggested the function of discordant methylation at adjacent CpG loci in cell specialization.

We also compared the number of adjacent CpG loci with discordant methylation status between thyroid, bone marrow, colon, breast, skeletal muscle and whole blood (Supplementary Fig. 1). Again, the number of common adjacent CpG loci was the highest for developmentally close tissues, such as blood and bone marrow (0.71) and the lowest for developmentally distant tissues such as breast and bone marrow (0.21).

A subset of adjacent CpG loci displays highly heterogeneous methylation patterns

Next, we went on to analyze methylation patterns of base and consecutive CpG sites within adjacent CpG loci. Not surprisingly and in line with the definition of CpG island, the base and the consecutive CpG site at co-methylated adjacent CpG loci (n = 40 404) (|slope|≤ 0.05 and FDR-corrected p value ≤ 0.05, t-test test) were predominantly either methylated or not methylated in all analyzed cell types (Fig. 3a, b). Several of the co-methylated CpG loci in this analysis (n = 81; 0.2%), displayed about 50% (beta-value ≤ 0.6 and ≥ 0.4) methylation levels in both CpG sites (Fig. 3c). It is important to mention here that, this level of methylation recorded with BeadChip array can be attributed to the uniform (50–50) mixture of cells sub-populations, with both alleles methylated and not methylated in those cells, or one allele methylated and other non-methylated within one cell (Fig. 3c, right panel). However, because in our analysis, we used data from relatively pure cell populations, and moreover, we did not observe high levels of methylation measurement variance at analyzed adjacent CpG loci, it is most likely that these methylation patterns are attributed to methylation of both CpG sites, on only one of the alleles, possibly consequence of the phenomenon similar to genomic imprinting [36]. Also, a subset of 308 of the co-methylated loci in this analysis, displayed more than 30 percentage points (pp) of methylation level difference in at least one of the analyzed cell types (Fig. 3d). These loci may mark genes monoallelically expressed in different cell types [37] and potentially regulate allele specific binding of proteins such as Methyl-CpG-Binding Domain proteins [38].

Fig. 3figure 3

Types of methylation patterns within adjacent CpG loci displaying identical methylation (co-methylation) at base and consecutive CpG site of adjacent CpG loci. Plots (left panels) illustrate the association of methylation levels between base and consecutive CpG site, with the solid line indicating no methylation level difference and dashed lines ± 0.15 confidence interval, and graphical illustration (right panels) of those methylation levels at single cell level, a loci with base and consecutive CpG sites methylated, b loci with both base and consecutive CpG site not methylated, c loci with 50% methylation level at both base and consecutive CpG site, and d loci displaying different types of co-methylation patterns between the cell types, with 50% methylation level in granulocytes and monocytes, as well as 100% methylation levels in B cells, NK, CD4T and CD8T

Then, we analyzed methylation patterns within adjacent CpG loci with discordantly methylated base and consecutive CpG sites (n = 1890; |slope|≥ 0.3 and FDR-corrected p value ≤ 0.05, t-test) in at least one of the analyzed cells. About half of those loci (n = 850) displayed identical methylation difference between base and consecutive CpG site in all analyzed cell types, and the remaining loci (n = 1040) were discordantly methylated in at least one of the analyzed cells. Again, considering that in general, the methylation level measurements in our analysis displayed a very low level of variance between different cell types, the methylation levels observed at this subset of adjacent CpG loci, may reflect one of the eight variants of simultaneous co-methylation of one and discordant methylation of the other allele illustrated in (Fig. 4a). Interestingly, in this subset of adjacent CpG loci, there were a few loci with almost 100 pp methylation difference between base and consecutive CpG site (five in B cell, five in CD4, five in CD8, 13 in Gran, 13 in Mono, and nine in NK; with |slope|≥ 0.80, FDR < 0.05) (Fig. 4b). This confirms that base and consecutive CpG site at adjacent CpG loci can acquire a different methylation status at each of the alleles in a cell because observed at those loci methylation levels are unlikely to be explained by the mixture of cell subpopulations. Also, a number of discordantly methylated loci had different methylation patterns in specific cell types (Fig. 4c). Overall, these results illustrate the vast heterogeneity of methylation patterns at adjacent CpG sites and remarkable stability of one of those patterns in the specific cell type.

Fig. 4figure 4

Types of methylation patterns within adjacent CpG loci with discordantly methylated base and consecutive CpG site. ac Plots (left panels) illustrate the association of methylation levels between base and consecutive CpG site, with the solid line indicating no methylation level difference and dashed lines ± 0.15 confidence interval, as well as graphical illustration (right) of those methylation levels at single cell level, in a loci with about 50 pp methylation level difference between base and consecutive CpG site in all analyzed cell types, b loci with almost 100 pp methylation level difference between base and consecutive CpG site in all analyzed cell types, and c loci with discordant methylation patterns specific for the cell type

Different patterns of methylation at adjacent CpG loci mark specific regulatory regions of the genome

To understand if discordant methylation patterns are associated with particular biological functions, we subdivided analyzed adjacent CpG loci into four subsets: “common co-methylated loci” (n = 40,404) (Fig. 5a), displaying similar patterns of co-methylation in all cell types included in our analysis; “co-methylated cell specific loci” (n = 308) (Fig. 5b), displaying different co-methylation patterns between the cell types (these loci are in principle differentially methylated regions—DMRs, if we define that region as a region with two consecutive CpG sites with the identical methylation status); “common discordantly methylated loci” (n = 850) (Fig. 5c), displaying significant difference in methylation level between base and consecutive CpG site in all analyzed cell types; and “discordantly methylated cell specific loci” (n = 1040) (Fig. 5d), displaying more than 0.3 difference in delta beta-value between at least two types of blood cell types included in our analysis.

Fig. 5figure 5

Types of adjacent CpG loci in six types of sorted white blood cells. Results of unsupervised hierarchical clustering of six types of sorted white blood cells, based on delta beta-values at four subsets of discordantly methylated adjacent loci, including, a “common co-methylated”, b “co-methylated cell specific”, c “common discordantly methylated”, and d “discordantly methylated cell specific” adjacent CpG loci

Each type of methylation patterns that we identified  could potentially play different role in cell physiology. Therefore, we analyzed association of different  subsets of adjacent CpG loci with specific histone marks and regulatory regions of the genome, as well as performed gene set enrichment analysis (GSEA) of genes annotated to each of the categories of the loci. We also assessed enrichment of each type of adjacent CpG sites within transcription factors (TFs) binding sites.

The association of adjacent CpG loci displaying different patterns of methylation with histone marks in 11 types of cells including hematopoietic stem cells, from Roadmap Epigenomics Core 15-state Model [32], was analysed using Locus Overlap Analysis (LOLA) [31] (Fig. 6a–d). All CpG spaced in up to 50 bp targeted by EPIC microarray were used as background in these analyses. The results showed significant enrichment (FDR corrected p-value ≤ 0.05, Fisher exact test; odds ratio (OR) > 2) of “common co-methylated” loci in most of the analyzed cell types in regions marked with histones containing modifications associated with active transcription states (TssA) and bivalent regulatory states (TssBiv) (Fig. 6a), not surprisingly indicating that identical methylation status of the adjacent CpG sites within CGI is essential for the regulatory function of those genomic regions. The “co-methylated cell specific” adjacent CpG loci, were associated with histone marks, characteristic for transcribed state at the 5′ and 3′ end of genes that show both promoter and enhancer signatures (TxFlnk), and two of the enhancer related histone marks: enhancers (Enh), and genic enhancers (EnhG) (Fig. 6b). The “common discordantly methylated” loci were associated with histones occupying three repressive chromatin states including constitutive heterochromatin (Het) and repressed Polycomb states (ReprPC, ReprPCWk), as well as one of the enhancers related states (Enh) (Fig. 6c). The “discordantly methylated cell specific” loci, which displayed most heterogeneous methylation between the cell types showed significant enrichment in the same regions as “co-methylated cell specific” loci (Fig. 6d).

Fig. 6figure 6

Illustration of the genomic distribution of “common co-methylated”, “co-methylated cell specific”, “common discordantly methylated”, and “discordantly methylated cell specific” loci. ad LOLA-based, region set enrichment analysis of the regions with a “common co-methylated”, b “co-methylated cell specific”, c “common discordantly methylated”, and d “discordantly methylated cell specific” adjacent loci analyzed loci in specific WBC types, using Core 15-state model. e Results of enrichment analysis of each type of adjacent CpG loci  in relation to CpG island elements . f Venn diagram illustrating the overlap between transcription factor binding motifs identified for different subsets of adjacent CpG loci. 

Also using LOLA platform, we analyzed distribution of adjacent CpG loci in relation to CGI. As expected, “common co-methylated” adjacent CpG loci were significantly enriched in CGIs (OR = 4.85). The “co-methylated cell specific” in S-Shelf (OR = 2.65) and OpenSea (OR = 3.67). The “common discordantly methylated” adjacent CpG loci were enriched in N-Shore (OR = 2.19), and S-Shore (OR = 2.17). The “discordantly methylated cell specific” loci were enriched in N-Shelf (OR = 2.46), S-Shelf (OR = 2.14) and OpenSea (OR = 3.82) (Fig. 6e). This indicates  that discordantly methylated adjacent CpG sites are not only an attribute of bordering regions of CGIs. Overall, these results show  that adjacent CpG sites with most dynamic changes of DNA methylation levels in blood cells mark gene enhancers, and are located in OpenSea compartment of the chromatin.

The GSEA were based on genes annotated to each subset of the adjacent CpG loci according to relevant microarray  manifest and performed using GENE2FUNC function of FUMA GWAS (Functional Mapping and Annotation of Genome-Wide Association Studies) [39] with Molecular Signatures Database (MSigDB) [40, 41] used as reference. This analysis showed, that genes annotated to “common co-methylated cell specific” and “discordantly methylated cell specific” adjacent CpG loci displaying cell type specific methylation, were enriched in GO biological categories related to specialized functions of blood cells, such as cell activation, defense response, T-cell activation or adaptive immune system (Supplementary Table 7, 8). At the same time, we did not observe significant enrichment of genes annotated to “common co-methylated” and “common discordantly methylated” loci in any of the interrogated gene set categories. However, this is not surprising because those loci predominantly constitute CGIs and majority of human genes contains CGIs in promoters. Overall, these results suggests that adjacent CpG loci displaying heterogenous methylation between specialized cells, mark genes involved in specific cellular function, and patterns of methylation of those loci may reflect regulation state of those genes in specific cells. Contrary to the genes marked by “common co-methylated” and “common discordantly methylated” loci, which appear to mark genes involved in general cell physiology, expression of which is similar in all analyzed cells.

Finally, we used HOMER platform to assess enrichment of specific categories of adjacent CpG loci within transcription factors binding sites [33]. This analysis showed that “common co-methylated” loci are significantly enriched (q-value ≤ 0.05, hypergeometric test) in regions with 113 TF binding motifs (Supplementary Table 9), what is consistent with the results showing that those loci constitute CGIs that annotate to genes involved in general cell physiology. None of the specific TF binding site motifs was found to be enriched in regions with “common discordantly methylated” loci, but chromatin state association analysis showed that this type of adjacent CpG loci, are unlike any of the other analyzed loci, associated with repressed regions of chromatin. The TF binding motifs enriched in regions harboring “co-methylated cell specific” (n = 10, Supplementary Table 10) and “discordantly methylated cell specific” (n = 48, Supplementary Table 11) adjacent CpG loci were very similar and predominantly different from TF binding motifs annotated to “common co-methylated” adjacent CpG loci (Fig. 6f and Supplementary Table 12). The analysis of function of TFs binding motifs annotated to “co-methylated cell specific” and “discordantly methylated cell specific” loci, showed that these TFs are involved hematopoietic cells development and differentiation. This analysis has a limited power but suggests that adjacent CpG loci displaying cell type specific methylation patterns mark TF binding motifs that bind transcription factors involved in specialized cell functions. That  corroborates the results indicating that those loci also annotate to genes involved in cell function specialization.

Methylation at adjacent CpG loci changes during neoplastic transformation

So far, locally disordered methylation at CpG sites has been reported and considered a stochastic event in cancer development [15]. We lastly analyzed whether this phenomenon affects methylation patterns at the adjacent CpG loci that display stable discordant methylation in all WBC types (Fig. 5c) during carcinogenesis of acute myeloid leukemia (AML) [23] and chronic lymphocytic leukemia (CLL) [24]. We found that the methylation patterns at those loci undergo general deregulation that appears to be stochastic and involves majority of those loci (Fig. 7a and b). Interestingly however, we did identify loci that appeared to maintained methylation pattern observed in healthy cells throughout malignant transformation (Fig. 7c), as well as loci that changed methylation status specifically in one neoplasia (Fig. 7d).

Fig. 7figure 7

Comparison of methylation patterns at adjacent CpG loci between healthy blood, acute myeloid leukemia (AML), and chronic lymphocytic leukemia (CLL). a Unsupervised hierarchical clustering of healthy blood, AML and CLL based on delta beta-values of CpG sites within “common discordantly methylated” loci (Supplementary Table 13). bd Association of methylation levels at base and consecutive CpG site, with the solid line indicating no methylation level difference and dashed lines - ± 0.15 confidence interval, in b loci non co-methylated in whole blood, but displaying stochastic changes of the pattern in CLL and AML samples, c loci discordantly methylated in all sample types, and d loci discordantly methylated in whole blood, but co-methylated (at the level of 0%) in CLL, and stochastically changed in AML samples. e Comparison of standard deviation of delta beta-values of CpG sites within analyzed loci, between healthy blood AML, and CLL

We also compared overall standard deviation (SD) of the beta-values at discordantly methylated CpG sites between CLL, AML and healthy blood. The violin plots in Fig. 7e show that the SD of the delta beta-values in both CLL (0.215 (95% CI 0.211–0.220)), and AML (0.150 (95% CI 0.147–0.153)) is significantly increased as opposed to the rather stable SD in healthy blood (0.058 (95% CI 0.057–0.060). Moreover, the average level of the standard variation was statistically significantly different (FDR corrected p-value ≤ 0.05, Mann–Whitney U test) between all compared groups.

Validation of discordant methylation phenomenon using NGS and Sanger sequencing

We validated our findings using currently available whole-genome bisulfite sequencing (WGBS) data obtained for the pools of FACS sorted cells [14] and Sanger sequencing. Identical to analysis based on EPIC microarray data, using NGS data we were able to unambiguously classify each type of blood type cells (Supplementary Fig. 2).

Sanger sequencing of representative adjacent CpG loci that we found to be discordantly methylated in whole blood using EPIC data (Fig. 1a) also unambiguously confirmed the discordant methylation patterns of those loci in each of the 24 whole blood samples that we used in the validation experiment. The representative Sanger sequencing chromatograms from this analysis are shown in Supplementary Fig. 3a–c, where for example first chromatogram (Supplementary Fig. 3a) depicts the adjacent CpG loci (marked as 1 and 2), in which thymine originating from not methylated cytosine is present at cytosine within base CpG site, while cytosine that must have been methylated before bisulfite modification is present within consecutive CpG site.

留言 (0)

沒有登入
gif