Genome-wide DNA methylation in relation to ARID1A deficiency in ovarian clear cell carcinoma

Whole genome methylation signatures of OCCC tumors and cell lines

Genetic mutations of all samples and clinical characters of primary tumor used in this study are provided in Supplementary Table 1 and 2. Out of the ~ 850,000 methylation CpG probes in the Infinium MethylationEPIC BeadChip arrays, 692,994 CpGs remained after quality control using multi-step filtration. Based on the β-values of these 692,994 CpGs, hierarchical clustering analysis of genome-wide methylation of OCCC primary tumors (Fig. 1A) and cell lines (Supplementary Fig. 3A) was performed. ARID1A/PIK3CA mutations (Fisher exact test, p = 0.01), tumor stage (Fisher exact test, p = 0.02) and TP53 mutations (Fisher exact test, p < 0.001) were significantly related to the methylation-based clustering of OCCC primary tumors. Tumor stage was not related to mutational status. Furthermore, based on multiple linear regression analysis, there was a trend for primary OCCC tumors with more mutations in either ARID1A or PIK3CA (Estimate = 0.19, p = 0.054) and a lower tumor stage (Estimate = −1.95, p = 0.08) in methylation cluster 1. OCCC tumors with mutant TP53 were significantly enriched in cluster 2 (Estimate = −0.74, p = 1.67e-06). Methylation-based clustering of OCCC cell lines revealed a separation between ARID1Amt and ARID1Awt cell lines (Fisher exact test, p = 0.04). Hierarchical clustering of genome-wide methylation of both OCCC primary tumors and cell lines (Supplementary Fig. 3B) showed that tumor methylomes were more similar to each other than to cell lines, and vice versa. Additionally, ES2ARID1A−/− and OVCA429ARID1A−/− (the isogenic ARID1Ako cell lines) did not cluster with the ARID1Amt cell lines, instead they were strongest associated with their parental ES2 and OVCA429 cell lines, respectively.

Fig. 1figure 1

Methylation of OCCC tumors and cell lines. A Unsupervised two-dimensional hierarchical clustering of OCCC tumors based on β-values of 692,994 CpGs. The clinical data and genetic mutations of OCCC tumors are indicated. B Density distribution of methylation β-values of whole genome (up, 692,994 CpGs) and promoter and gene-body CGIs (down) are measured in ARID1Amt/ko vs ARID1Awt OCCC. Solid lines indicate the mean β-values, while dashed lines indicate the mean ± standard deviation of β-values

The distribution of genome-wide DNA methylation (β-value) of both tumors and cell lines showed a clear bi-modal distribution, as shown in Fig. 1B. A shift in distribution was observed when comparing ARID1A deficient (ARID1Amt/ko) and ARID1Awt OCCC cell lines, which was most evident at the highly methylated sites (0.8 < β ≤ 1) and low methylated sites (0 < β ≤ 0.2). The shift in β-value distribution at the highly methylated sites (0.8 < β ≤ 1) was also observed when only promoter and gene-body CGIs were analyzed. No differences in β-value distribution of global and CGI methylation were observed between ARID1Amt and ARID1Awt primary tumors.

To gain more detailed insight in DNA methylation per CpG, we performed one to one comparison of 692,994 CpGs in ARID1Amt/ko OCCC vs ARID1Awt OCCC (Supplementary Fig. 4). CpGs showing methylation changes in ARID1Amt OCCC vs ARID1Awt OCCCs were equally distributed over the chromosomes. In total, methylation of ~ 10% of the CpGs in ARID1Amt primary tumors, ~ 40% of the CpGs in ARID1Amt cell lines and ~ 20% of the CpGs in ARID1Ako models increased or decreased more than 0.1 β-value compared to matched CpGs in ARID1Awt OCCC. In addition, methylation of ~ 5% of the CpGs in ARID1Amt cell lines and ~ 1% of the CpGs in OVCA429 ARID1Ako increased or decreased more than 0.4 β-value. There were almost no CpGs (less than 1%) that showed a change in methylation of more than 0.4 β-value in ARID1Amt primary tumors and ES2 ARID1Ako.

Thus, ARID1A deficiency was related to methylation changes in global CpGs and in promoter and gene body CpGs and was most evident in OCCC cell lines.

Identification of differential methylated (DM) CpGs in promotor and gene body between ARID1Amt/ko and ARID1Awt OCCC

To detect methylation changes that are associated with the ARID1A mutational status, DM CpGs identified in 4 sample sets (“primary tumor”, “cell lines”, “ES2 vs ES2ARID1A−/−”, “OVCA429 vs OVCA429ARID1A−/−”) were compared (Fig. 2A, Supplementary Table 3). Only DM CpGs that were commonly identified in at least two out of four sample sets were selected, comprising 39,859 unambiguous DM CpGs (Fig. 2B). After annotation of these DM CpGs to the human genome, we found that 3627 DM CpGs were located in promoter or gene-body CGIs of 2004 genes (Table 1).

Fig. 2figure 2

Identification of DM CpGs between ARID1Amt/ko vs ARID1Awt OCCC. A Volcano plot showing the identified DM CpGs in OCCC tumor, cell lines and isogenic ARID1Ako models. Scattered dots represent CpGs. The x-axis is the methylation log2fold change based on M-value, whereas the y-axis is -log10 transformed significance p-value of differential methylation obtained from” limma” method. Dots are colored based on the cut-offs they satisfy. The top altered CpGs based on M-value log2fold change were specified. The names of the target genes of specified CpGs are adjacent to the corresponding CpGs. B Venn diagram showing the number and corresponding percentage of DM CpGs in the 4 sample sets (OCCC tumors, cell lines and isogenic ARID1Ako models). Common DM CpGs which identified in at least 2 out of 4 sample sets are marked by the yellow line

Table 1 Identified common DM CpGs located in gene promoter or gene-body CGI

Next, the M-E Spearman coefficients (indicated in green and red) were calculated for the 2004 genes using the β-value for each of the 3627 DM CpGs located in gene promoter or gene-body CGIs, and the expression level of their respective target genes using data from 11 ARID1Amt/wt cell lines. As shown in Fig. 3A, an inverse relation between fold changes in DM CpGs and fold changes in expression of their target genes comparing ARID1Amt and ARID1Awt cell lines, was often observed (hypermethylated-downregulated and hypomethylated-upregulated). A positive relation between the fold changes of DM CpGs and fold changes in expression of their target genes comparing ARID1Amt and ARID1Awt cell lines was also observed (hypermethylated-upregulated and hypomethylated-downregulated). Genes with the largest promoter methylation and expression changes between ARID1Amt vs ARID1Awt cell lines were depicted, such as PPP1R14A and UQCRH.

Fig. 3figure 3

Methylation of DM CpGs and expression of the target genes based on OCCC cell lines. A The x-axis is the methylation log2FC of DM CpGs. The y-axis is expression log2FC of their corresponding target genes. The shape of each CpG point is based on the transcriptional regulatory element it is located in, whereas the color of each CpG dot is based on the Spearman coefficient between its M-value and expression of its target gene (M-E Spearman coefficient). The enlarged points or triangles specify the CpGs that satisfy all of the 3 conditions: (1) methylation log2FC of the CpG is > 2; (2) expression log2FC of the target genes is > 2; (3) the absolute M-E Spearman coefficients between their methylation and expression of their target genes is > 0.75. The names of CpGs and their target genes are adjacent to the corresponding enlarged points. B The methylation changes of DM CpGs and the expression alteration of their target genes. DM CpGs and their target genes are divided based on the pattern of their methylation and expression alteration between ARID1Amt cell lines vs ARID1Awt OCCC cell lines: hypermethylated-upregulated (orange), hypomethylated-downregulated (yellow), hypermethylated-downregulated (light green) and hypomethylated-upregulated (green) genes. The percentage of each group in promoter, alternative promoter and gene-body DM CpGs are labeled on the corresponding bar

The methylation and expression alterations of the genes with absolute M-E Spearman coefficients ≥ 0.25 in association with the genomic location of the DM CpGs were summarized in Fig. 3B. In total, around 65% of the gene promoter DM genes were found to have hypermethylated-downregulated (light green) and hypomethylated-upregulated (green) patterns, indicating that majority of changes in promoter methylation in ARID1Amt cells were inversely related to the changes in expression of target genes. Around 60% of gene-body DM CpGs were found to be hypermethylated-upregulated (orange) and hypomethylated-downregulated (yellow), indicating that changes in methylation of most gene bodies were positively correlated to changes in target gene expression in ARID1Amt cells.

Taken together, ARID1A-dependent changes in promoter methylation correlated negatively with gene expression, while ARID1A-dependent changes in gene-body methylation were mostly positively correlated with gene expression, which is in line with the classical theory how DNA methylation regulates gene expression [32].

EZH2 related gene-set was enriched in ARID1A-related DM genes

We further investigated which genes and related pathways were mostly affected by ARID1Amt related methylation. The effects of ARID1Amt-related methylation on 2004 genes (DM in promoter or gene-body) were evaluated in data obtained from OCCC cell lines using 3 parameters: alterations on the methylation level, alterations on the expression level, and the correlation between DNA methylation and expression. According to the pre-ranked GSEA based on each of the 3 parameters, in total 202 significantly enriched gene-sets were identified (FDR ≤ 0.25, |NES|≥ 2, Supplementary Table 5). Noticeably, “LU EZH2 TARGETS UP” was the only commonly identified gene-set by all 3 pre-ranked GSEA methods (Fig. 4A, Supplementary Table 6). Its negative methylation NES (−2.00), positive expression NES (2.21) and negative M-E spearman coefficient NES (−2.34) followed the classic pattern of DNA methylation regulated gene expression. These results suggest that the methylation and expression of leading-edge genes of this EZH2 related gene-set (indicated by orange arrows) are strongly depending on the ARID1A status in OCCC cells (Fig. 4B). These findings are in line with previous studies, demonstrating the importance of EZH2 in ARID1Amt OCCC cells [26], and gives confidence to the approach taken here.

Fig. 4figure 4

Significant gene-sets identified by the 3 different pre-ranked GSEA analyses. A Venn plot of significant gene-sets derived from methylation log2FC based pre-ranked GSEA, expression log2FC based pre-ranked GSEA and M-E Spearman coefficient based pre-ranked GSEA. B Enrichment curve of “LU EZH2 TARGETS UP” derived from methylation log2FC based pre-ranked GSEA, expression log2FC based pre-ranked GSEA and M-E Spearman coefficient based pre-ranked GSEA. The corresponding normalized enrichment scores (NES) and FDRs of “LU EZH2 TARGETS UP” are annotated. Orange boxes and arrows marked the leading-edge genes of “LU EZH2 TARGETS UP” derived from methylation log2FC based pre-ranked GSEA, expression log2FC based pre-ranked GSEA and M-E Spearman coefficient based pre-ranked GSEA. The corresponding locations of leading-edge genes inside the enrichment curve were indicated by the color bar next to them. Leading-edge genes of “LU EZH2 TARGETS UP” commonly identified by all three pre-ranked GSEA were underlined

Dependency analysis of ARID1A-related DM genes

To identify DM genes that play a central role in various pathways, so-called hubs, we selected the leading-edge genes of significant gene-sets for each pre-ranked GSEA. We found 238 leading-edge genes that were common between the three used GSEA methods (Fig. 5A, Supplementary Table 6 and Supplementary Table 7). The average dependency score for 234 out of 238 genes in ARID1Amt and ARID1Awt OCCC cell lines were calculated using the DepMap dataset (Supplementary Table 8, 4 genes were not present in the OCCC samples from DepMap).

Fig. 5figure 5

ARID1A dependency and expression alteration of leading-edge genes. A Venn plot of leading-edge genes derived from methylation log2FC based pre-ranked GSEA, expression log2FC based pre-ranked GSEA and M-E Spearman coefficient based pre-ranked GSEA. B Dependency and expression alteration of leading-edge genes with consistent differential methylation in ARID1Amt OCCC tumor and cell lines. Leading-edge genes with differential methylation in both OCCC tumor and cell lines are shown in the columns of the heatmap, while the corresponding enriched gene-sets that shared more than 2 DM leading-edge genes are shown in the rows of the heatmap. The distance between every two components in columns (genes) or rows (gene-sets) was calculated based on Spearman coefficients. “ward.D2” method was used to cluster columns (genes) and rows (gene-sets) of the heatmap according to the corresponding Spearman coefficients. A black-colored cell depicts whether a certain gene is presented in a given gene-set. Methylation log2FCs of the DM CpG are indicated in the first panel of row annotations; Average dependency of corresponding leading-edge genes in ARID1Amt cell lines and ARID1Awt cell lines are indicated in the second panel of row annotations; Expression log2FC (9 cell lines) of leading-edge genes from GEO and expression log2FC from CCLE (12 cell lines) are indicated in the third panel of row annotations. The most significant FDR value of each enriched gene-set from three pre-ranked GSEA is indicated in the column annotation

In total, 24 ARID1A-related DM genes showed consistent ARID1A-related methylation alterations in primary tumor and cell lines and were present in 42 gene-sets. Considering that related genes may function in shared pathways and vice versa, the 24 genes and 42 gene-sets were further clustered into modules (Fig. 5B). R1–R7 and C1–C4 were used as horizonal and vertical coordinates of a certain module in the clustering heatmap. Noticeable, the most altered genes depicted in Fig. 3 were excluded from the analysis, since they were not commonly identified by three separate pre-ranked GSEA analyses. The leading-edge genes of “LU EZH2 TARGETS UP” (Fig. 4B), except TRIP6, did not pass the expression-based in-silico validation, because of inconsistent methylation changes between primary tumor and cell lines (Supplementary Table 9).

Possible functions of genes presented in modules were revealed. Tumor suppressor gene IRX1 was present in 3 EZH2 and H3K27me3 related gene-sets, indicating the possible association between IRX1 and EZH2 in OCCC. Additionally, members of module R7-C4 (TRIP6, TMEM101 and BCOR) shown in “HATADA METHYLATED IN LUNG CANCER UP” were detected as promoter hypermethylated in ARID1A deficient OCCC tumor and cell lines. Moreover, compared to ARID1Awt cell lines, TRIP6 and TMEM101 showed relatively low expression in ARID1Amt cell lines. For BCOR, methylation and expression were generally inversely correlated, except in TOV21G and KOC7C cells (Additional File 1). Interestingly, BCOR was also present in 2 TP53 related gene-sets ("PEREZ TP53 TARGETS" and "PEREZ TP53 AND TP63 TARGETS") of module R3-C4, suggesting a role of BCOR in p53 signaling as well.

Validation of ARID1A-related DM genes

At last, to have an in-depth view on the methylation status of the ARID1A-related DM genes, the methylation of the promoter or gene-body CGIs of these 24 genes in primary OCCC and cell line sample sets were visualized using UCSC genome browser online tool. Based on the UCSC visualization, 13 from these 24 genes showed a relatively high CpG level consistency between primary OCCC and cell line sample (Table 2, Supplementary Table 10). Therefore, these 13 DM genes were selected for further validation via BSP and/or RT-qPCR. For 7 genes (AK5, CBLN1, ESRRG, MYD88, NDN, PCDH8 and TCEAL3), we encountered difficulties to design and optimize BSP assays due to the extreme large size and high CpG density of the region of interest. For the other 6 genes (BCOR, IRX1, PCDHA1, TMEM101, TRIP6 and ZIK1), BSP was performed.

Table 2 Methylation, expression change, M-E Spearman coefficients and gene dependency of 13 ARID1A-related DM genes

The location of the BSP products for all 6 gene promoters were visualized using the UCSC genome browser together with the tested CpG probes from Infinium MethylationEPIC BeadChip arrays, presence of CGIs and signals of the histone marks H3K27me3, H3K27Ac, H3K4me3 and H3K4me1 (Fig. 6A, Supplementary Fig. 5A, 6A, 7A, 8A and 9A). The presence of the CGIs and the histone marks suggests that these genes are indeed epigenetically regulated. The significant Spearman correlation between the BSP semi-quantification and the Infinium MethylationEPIC BeadChip array demonstrated successful validation of most genes, whereas PCDHA1 showed a trend (Fig. 6B, Supplementary Fig. 5B, 6B, 7B, 8B and 9B). For IRX1, TRIP6, TMEM101, and ZIK1 differential promoter methylation was also shown in ARID1Amt/ko vs wt by BSP analysis (Fig. 6C, Supplementary Fig. 5C, 6C, 8C).

Fig. 6figure 6

DNA methylation and gene expression of IRX1 in ARID1A deficient OCCCs vs ARID1Awt OCCCs. A DNA methylation of IRX1 promoter in OCCC cell lines. UCSC genome browser (GRCh37/hg19) representation of the genomic organization of IRX1. The thick solid blocks indicate the coding regions, the thinner blocks indicate the 5' and 3'UTRs, blue lines indicate introns and arrows indicate the direction of gene transcription. The CGIs are represented as horizontal green bars. H3K27me3 (green), H3K27Ac (blue), H3K4me3 (black), H3K4me1 (orange) data from ENCODE project depict histone modification status as peaks. CpGs gaining methylation (red), losing methylation (blue), insignificant (gray) in ARID1A deficient vs ARID1Awt OCCC are represented as horizontal solid bars. BSP PCR product is indicated by solid boxes (primers) and green line (analyzed sequence). The BSP-analyzed region shaded in light blue is presented below with CpG located in the BSP-analyzed region depicted as yellow bars. The labeled and green CpGs are mutually analyzed by Infinium MethylationEPIC BeadChip arrays and BSP. B IRX1 BSP methylation ratio vs average β-value from Infinium MethylationEPIC BeadChip array in ARID1Amt (pink) and ARID1Awt (green) OCCC cells. The black solid line represents the regression line. C BSP result of IRX1 in OCCC cells. CpG sites located in the BSP-analyzed region are numbered and showed. CpG mutually analyzed by Infinium MethylationEPIC BeadChip arrays and BSP are specified with green color. Empty circles represent unmethylated CpGs, black circles represent methylated CpGs, half black circles represent hemi-methylated CpGs and empty triangles represent missed CpGs. ARID1A deficient cells are underlined. D IRX1 relative gene expression based on RT-qPCR vs publicly available expression profiles of ARID1Amt (pink) and ARID1Awt (green) OCCC cells. The black solid line represents the regression line. E RT-qPCR result of IRX1 in OCCC cells with (red) or without (blue) DAC treatment. ARID1A deficient cells are underlined. Statistical significance of Student T-test is notified as *p < 0.05; **p < 0.01; ***p < 0. 001

In order to test whether genes were indeed epigenetically silenced by methylation in ARID1Amt vs wt OCCC, we analyzed mRNA expression of IRX1, TRIP6, TMEM101 and BCOR by RT-qPCR. Expression of TRIP6 based on publicly available expression arrays could be validated by RT-qPCR (Supplementary Fig. 5D). The expression of IRX1, TMEM101 and BCOR as determined with RT-qPCR was not significantly correlated to their expression based on publicly available expression arrays (Fig. 6D, Supplementary Fig. 6D and 7D), possibly due to the limited number of cell lines included in the expression arrays. Although IRX1 expression was low in all cell lines, induction of expression by 5-aza-2′-deoxycytidine (DAC treatment) was especially observed in the ARID1Amt and ARID1Ako cell lines, consistent with the higher percentage of hemi or full methylated CpGs of the IRX1 promoter in these cell lines compared to ARID1Awt cell lines (Fig. 6E). Our results thus indicate ARID1A status dependent epigenetic regulation of IRX1. High TRIP6 and TMEM101 promoter methylation was demonstrated in all three ARID1Amt cell lines and one of the ARID1Awt cell lines (RMG1). Low TRIP6 expression and induction of expression after DAC treatment was, however, only found in ARID1Amt TUOC1 cells (Supplementary Fig. 5E). Interestingly, low TMEM101 expression and induction of the expression after DAC treatment was observed in another cell line (TOV21G) (Supplementary Fig. 6E). The clearly detectable BCOR mRNA levels were not elevated in any cell line by DAC treatment (Supplementary Fig. 7E).

Taken together, the results from the MethylationEPIC BeadChip arrays were successfully validated. Our observations indicate that frequent hypermethylation and epigenetic regulation of IRX1 expression occur especially in ARID1Amt and ARID1Ako OCCC cell lines.

留言 (0)

沒有登入
gif