A novel bioinformatic approach reveals cooperation between Cancer/Testis genes in basal-like breast tumors

A custom bioinformatic approach identifies the Cancer/Testis genes most associated with breast tumors

The first step of our study was to establish an exhaustive list of C/T genes, containing those described in three independent publications, for a total of 1350 genes [12, 29, 30]. Our second resource was genomics data, including RNA-seq, from The Cancer Genome Atlas (TCGA), covering 1090 tumor samples and 113 healthy juxta-tumoral mammary samples.

We then established a custom bioinformatic approach to identify C/T genes that show reactivation in breast tumors. An ideal biomarker should have little or no expression in healthy samples but high expression in at least some of the tumors: these properties are reflected mathematically in a zero-centered, single-mode density function in healthy breast samples, and a multi-mode density function with one or more non-zero maxima in tumor samples, reflecting one or more groups of tumors that have activated this gene. Such profiles can be detected automatically by examining changes in the derivative of the density function (Fig. 1A).

Fig. 1: A custom bioinformatic approach identifies the Cancer/Testis genes most associated with breast tumors.

A Schematic description of the bioinformatic pipeline. We depict the expression profile of a gene that passed the screen: it has a unimodal, zero-centered profile in normal tissue, and a multimodal profile in breast tumors. B Chow-Ruskey diagram showing the intersection between previously published C/T gene lists and the C/T genes that were selected for our study.

Implementing this idea, we created a two-step pipeline in which we first determined the distribution of expression for each C/T gene in both healthy mammary samples and breast tumors, then smoothed these distributions using kernel density estimation. As it is crucial to neither overfit nor oversmooth expression values, we systematically tested multiple values for the bandwidth parameter using positive and negative controls (data not shown) and then selected a balanced value (bandwidth = 0.7). By analyzing the derivative of the distribution function, we obtained the number of distinct peaks, allowing us to focus on the C/T genes not expressed in healthy mammary samples (unimodal expression profile centered on 0 according to kernel density estimation), but activated in some breast tumor samples (multimodal expression profile).
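The screening logic described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the Gaussian kernel, the evaluation grid, the `min_rel_height` filter for negligible bumps, and the `zero_tol` tolerance for "centered on 0" are all assumptions made for illustration, as are the function names and the synthetic expression values. Only the bandwidth of 0.7 and the idea of counting peaks from sign changes of the density derivative come from the text.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def find_modes(values, bandwidth=0.7, n_grid=200, min_rel_height=0.05):
    """Return grid positions where the density slope flips from + to -.

    `min_rel_height` is an assumption: bumps lower than 5% of the tallest
    peak are ignored.
    """
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = gaussian_kde(values, grid, bandwidth)
    floor = min_rel_height * max(dens)
    modes = []
    for i in range(1, n_grid - 1):
        slope_before = dens[i] - dens[i - 1]  # finite-difference derivative
        slope_after = dens[i + 1] - dens[i]
        if slope_before > 0 and slope_after <= 0 and dens[i] >= floor:
            modes.append(grid[i])
    return modes


def passes_screen(normal, tumor, bandwidth=0.7, zero_tol=1.0):
    """True if `normal` has one mode near zero and `tumor` has several modes.

    `zero_tol` (hypothetical) sets how close to 0 the normal mode must be.
    """
    normal_modes = find_modes(normal, bandwidth)
    tumor_modes = find_modes(tumor, bandwidth)
    silent_in_normal = len(normal_modes) == 1 and abs(normal_modes[0]) <= zero_tol
    return silent_in_normal and len(tumor_modes) >= 2


# Synthetic, made-up expression values (arbitrary log-scale units).
normal = [0.05 * (i % 5) for i in range(113)]               # silent everywhere
tumor = [0.05 * (i % 5) for i in range(690)]                # silent tumors
tumor += [5.0 + 0.1 * (i % 20) for i in range(400)]         # activated tumors

print(find_modes(normal), find_modes(tumor), passes_screen(normal, tumor))
```

In this toy setting the normal profile yields a single mode near zero and the tumor profile yields two modes, so the gene would pass the screen; a gene that is silent in both groups would not.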

Our method complements previously used approaches (for example: [30, 31]) in that it is orthogonal, less calculation-intensive, flexible, sensitive, and unaffected by the dynamic range of the data. Of note, this unbiased scheme is not restricted to C/T genes and could be broadly used to identify other genes that show abnormal expression in tumor samples compared to matched normal juxta-tumor tissues, such as potential tumor suppressor genes or oncogenes (Fig. S1A–C). With this approach, we defined a highly selective list of 139 C/T genes with abnormal expression profiles in breast tumors compared to the normal breast (Fig. 1B, Supplementary Table 1). The examination of GTEx RNA-seq data confirmed that these 139 genes are expressed in the human germline, but not in the breast (or other healthy tissues, Fig. S1D). Therefore, the reactivation seen in tumors is a pathological event.

Cancer/Testis gene expression accurately discriminates breast cancer subtypes; identification of the 6 most informative genes

To determine whether the expression of certain members of our 139-gene list was associated with specific subtypes of breast tumors, we applied Principal Component Analysis (PCA) on TCGA data using the subtype annotations provided for each tumor (Fig. 2A). A visual inspection suggested that tumor types could indeed be separated based on C/T gene expression (Fig. 2A), with a distinct group of basal-like tumors, for instance. These distinct clusters formed again when the tumors were classified based on their anatomohistological subtype rather than their transcriptome-defined subtype (Fig. S2A) and they remained visible when integrating more informative Principal Components through UMAP analysis (Figs. 2A and S2A). We thus hypothesized that the pattern of expression of the C/T genes in our list might suffice to stratify breast tumors by subtypes.

Fig. 2: Cancer/Testis gene expression accurately discriminates breast cancer subtypes; identification of the 6 most informative genes.

A Multidimensional analysis of TCGA breast tumor and healthy samples based on expression of the 139 selected C/T genes. Each dot represents a sample; the color code corresponds to breast cancer subtype. Left: Principal Component Analysis, dot sizes are proportional to the quality of representation in PC1/PC2 space. The C/T genes best correlated to PC1/PC2 are represented. Right: Uniform Manifold Approximation and Projection (UMAP). B Confusion matrix for breast tumor samples in the validation cohort (25% of the samples, randomly selected from the TCGA breast tumors), using the best Random Forest model. This model was established after a 500-tree training on the discovery cohort (75%), based on the expression level of the 139 C/T genes. C Top 15 most important variables in the best Random Forest model for breast cancer subtype prediction. The color of the gene name indicates the tumor type most associated. D Expression levels for 6 subtype-specific C/T genes in the breast TCGA cohort according to breast cancer subtype. E Relapse-free survival curves for ER+ Her2- or ER- Her2+ breast cancer patients according to LRGUK and DMRTC2 expression. The left two panels show survival curves for Luminal A and Luminal B tumors. The right panel shows the survival curve according to Her2-specific C/T gene DMRTC2. F Co-expression of HORMAD1 and CT83 based on RNA-seq analysis (log2 FPKM-UQ) in basal-like breast tumor samples from the TCGA. Thresholds for positive or negative expression are calculated based on the corresponding gene expression profile in tumors at the second inflexion point of the representative curve. The number of tumors belonging to each category is shown. Basal-like subtyping according to Lehmann’s classification is depicted. G Co-expression of HORMAD1 and CT83 based on RNA-seq analysis (FPKM-UQ) in basal-like breast cancer cell lines from the CCLE database. Same analysis as in F.

To test this hypothesis, we used a machine learning approach, establishing a random forest model on a training set of TCGA breast tumors (75% of all samples, n = 817) and testing the best model on the remaining tumors (n = 273). This model could very effectively identify basal tumors with high sensitivity (0.9) and high specificity (1.0), leading to a balanced accuracy nearing 100% (Fig. 2B). Again, similar results were found when the tumors were classified anatomopathologically, rather than transcriptionally (Fig. S2B). The specificity scores for Luminal B and Her2 subtypes were high (1.0 and 0.9, respectively), but the sensitivity was lower (0.4 and 0.2) (Fig. 2B). This discrepancy could be explained by some tumors in these groups not expressing any C/T genes, leading to a lack of available information for the prediction.
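For readers less familiar with these metrics, the sketch below shows how per-subtype sensitivity, specificity, and balanced accuracy are derived from a multi-class confusion matrix in a one-vs-rest fashion. The function name and the toy matrix are hypothetical and are not the counts of Fig. 2B; they were chosen only so that the basal row reproduces a sensitivity of 0.9 and a specificity of 1.0, for which the usual definition (mean of sensitivity and specificity) gives a balanced accuracy of about 0.95.

```python
def one_vs_rest_metrics(confusion, label):
    """Sensitivity, specificity and balanced accuracy for one class.

    `confusion[true_label][predicted_label]` holds sample counts.
    """
    tp = confusion[label][label]
    fn = sum(n for pred, n in confusion[label].items() if pred != label)
    fp = sum(row[label] for true, row in confusion.items() if true != label)
    total = sum(sum(row.values()) for row in confusion.values())
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Made-up counts for illustration only (not the published confusion matrix).
toy_confusion = {
    "Basal": {"Basal": 45, "LumA": 5, "LumB": 0},
    "LumA": {"Basal": 0, "LumA": 130, "LumB": 10},
    "LumB": {"Basal": 0, "LumA": 36, "LumB": 24},
}

basal_metrics = one_vs_rest_metrics(toy_confusion, "Basal")
lumb_metrics = one_vs_rest_metrics(toy_confusion, "LumB")
print(basal_metrics, lumb_metrics)
```

The toy Luminal B row illustrates the pattern discussed above: a class can keep a high specificity while its sensitivity stays low when many of its samples are assigned to another subtype.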

Using the best random forest model, we ranked the 139 C/T genes according to their predictive value; the top 15 C/T genes are depicted in Fig. 2C (and Fig. S2C for the analysis carried out with anatomopathological stratification). The two best predictors, HORMAD1 and CT83, are strongly associated with basal breast tumors: of the 190 basal-like breast tumors, 89% expressed either HORMAD1 or CT83, compared to only 13% of Her2-amplified, 6% of Luminal B, and 2% of Luminal A tumors (Fig. 2D). Using the histological classification of breast tumors, we found the same result: HORMAD1 and CT83 are the two best predictors of triple-negative breast cancers within C/T genes (Fig. S2B–D). These results are consistent with several previous reports that have associated HORMAD1 or CT83 expression with basal tumors [32,33,34,35,36], validating our approach. Analysis of an independent dataset [37] gave additional support to our findings, demonstrating that HORMAD1 and CT83 are the best predictors of triple-negative breast tumor subtype (Fig. S2 E, F). HORMAD1, a gene on human chromosome 1q21.3, is physiologically expressed by preleptotene spermatocytes [38] and regulates meiotic progression. CT83, on the other hand, is located on human chromosome region Xq23 and is expressed in mature sperm according to scRNA-seq data analysis [39], yet its precise reproductive function remains unknown.

The expression of two other markers, DMRTC2 and TDRD1, is associated with Her2-positive tumors (Fig. 2C, D), but the association is looser than that of HORMAD1/CT83 with basal tumors. During spermatogenesis, DMRTC2 has essential functions at the pachytene stage [40], whereas TDRD1 interacts with piRNAs and Piwi proteins to promote silencing [41]. To the best of our knowledge, neither DMRTC2 nor TDRD1 has been previously linked to breast tumors in general, or to the HER2-positive subtype in particular.

Lastly, we found two markers, LRGUK and TEX14, for which expression tends to mark luminal tumors (Fig. 2D). LRGUK is involved in diverse aspects of sperm assembly, including the microtubule-based shaping of spermatozoa [42]; it was more frequently overexpressed in luminal A breast tumors (Fig. 2D). As for TEX14, a factor necessary for intracellular bridges in germ cells [43], it marked luminal B breast cancers, as well as luminal A tumors to a smaller extent (Fig. 2D). While TEX14 was previously linked to basal breast tumors [44], we believe our study presents the first report demonstrating its more prevalent expression in Luminal tumors, especially of the more aggressive B subtype, and we are unaware of any publications linking LRGUK to breast tumors in general, nor to Luminal tumors in particular.

We next tested whether the associations we detected using tumor expression data also held true with cancer cell lines. As tumors are heterogeneous and consist of a mixture of cell types, including tumor cells and cells from the microenvironment, we asked if the expression of C/T genes detected in bulk RNAseq is due to their activation in tumor cells themselves. Detecting high expression of C/T genes in tumor-derived cell lines would be a strong cue for expression of the C/T genes in the tumor cells themselves. For this, we determined the expression level of the six markers described above in all the breast cell lines found in the Cancer Cell Line Encyclopedia (CCLE, Fig. S2G). We observed a good general agreement between tumors and cell lines of the same subtype. For instance, HORMAD1 and/or CT83 were highly expressed in basal cell lines such as MDA-MB-436, MDA-MB-468, and HCC1599, but not in Luminal or Her2-positive cells. DMRTC2 and/or TDRD1 expression marked HER2-positive lines like AU565 or SKBR3. Finally, a typical Luminal A line, MCF7, expressed LRGUK and had the highest TEX14 levels. These results validate our findings, and also suggest that the overexpression of C/T genes detected in bulk RNAseq is due at least in part to the abnormal activation of these genes in tumor cells.

Finally, we asked whether the expression of these C/T genes could distinguish, within a breast cancer subtype, tumors with a different prognosis or therapeutic response. We examined relapse-free survival at more than 10 years, on a large panel of breast tumors of known subtype [45]. The activation of LRGUK in Luminal A or Luminal B tumors was an indicator of a good prognosis (Fig. 2E). Furthermore, this activation correlated with a better response to anthracyclines (used in standard-of-care chemotherapy regimens), although the trend failed to reach significance (Fig. S2H). Still in the luminal subtype, no significant association was found between survival and TEX14 expression (Fig. S2I), raising the possibility that the activation of certain C/T genes may be a neutral event, with no association with a particular phenotype, and not conferring a specific advantage or disadvantage, at least at this stage of tumor development.

For Her2-positive tumors, the expression of TDRD1 was not statistically linked to survival (Fig. S2J). In the same tumors, DMRTC2 expression tended to associate with poorer survival; however, the trend did not reach statistical significance, perhaps because the DMRTC2-negative group was small (n = 20), in line with the prevalent re-expression of DMRTC2 in Her2-positive tumors (Fig. 2E). To detect other potentially useful characteristics of these tumors, we examined their immunological signature with the Immunoscore tool [46] (Fig. S2K): those with high DMRTC2 were more “hot”, i.e. more infiltrated, but also more immunosuppressive (high FOXP3 activation). Therefore, they might be attractive candidates for treatment with immune checkpoint inhibitors [47]. As far as we are aware, these associations are new and may be helpful for prognosis and treatment choice.

In the TCGA cohort, ~90% of basal-like tumors expressed HORMAD1 or CT83 at the RNA level, and ~60% expressed both (Fig. 2F). Basal-like tumors are a heterogeneous ensemble, but tumors expressing both HORMAD1 and CT83 tended to form a more homogeneous set, with fewer distinct anatomopathological groups and a reduced number of molecular signatures (Fig. S2N and Supplementary Table 2). Using the Lehmann classification [48], we found double-positive tumors in all subgroups (Fig. 2F). In breast cancer cell lines as well, 70% of basal-like cell lines from CCLE were positive for HORMAD1 or CT83 and 35% for both (Fig. 2G).
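The bookkeeping behind such co-expression percentages can be sketched as follows. This is an illustration only: the function name, the thresholds, and the ten expression pairs are invented, and the thresholds are simply passed in rather than derived from the expression profile as in Fig. 2F. The toy values were chosen so that the output mirrors the approximate proportions quoted above (about 90% positive for at least one gene, about 60% positive for both).

```python
from collections import Counter


def classify_coexpression(samples, thr_hormad1, thr_ct83):
    """Count samples in each HORMAD1/CT83 category.

    `samples` is a list of (HORMAD1, CT83) expression pairs; both thresholds
    are hypothetical cut-offs supplied by the caller.
    """
    counts = Counter()
    for hormad1, ct83 in samples:
        h_pos = hormad1 >= thr_hormad1
        c_pos = ct83 >= thr_ct83
        if h_pos and c_pos:
            counts["both"] += 1
        elif h_pos:
            counts["HORMAD1 only"] += 1
        elif c_pos:
            counts["CT83 only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Made-up expression pairs (arbitrary units), not TCGA data.
toy_samples = [
    (5.1, 4.2), (6.0, 3.9), (4.8, 5.5), (5.5, 4.0), (6.2, 6.1), (4.9, 3.8),  # both
    (5.3, 0.2), (4.7, 0.0),                                                  # HORMAD1 only
    (0.1, 4.4),                                                              # CT83 only
    (0.0, 0.3),                                                              # neither
]

counts = classify_coexpression(toy_samples, thr_hormad1=2.0, thr_ct83=2.0)
total = sum(counts.values())
frac_either = 1 - counts["neither"] / total
frac_both = counts["both"] / total
print(counts, frac_either, frac_both)
```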

The activation of subtype-specific cancer/testis genes occurs in tumoral cells early during tumorigenesis and persists in metastasis

The association we report between expression of specific C/T genes and breast cancer subtypes was found in an unbiased analysis of the TCGA breast tumor set, but this set primarily contains mid- and late-stage malignancies. A practically and conceptually important question is whether these markers are already expressed at the early stages of tumorigenesis. To further explore this, we used RNA-seq analysis of early tumors (in situ and microinvasive) and invasive breast carcinomas of different subtypes (n = 55, our INVADE cohort, Fig. 3A). Twenty-four of the 35 early tumors (68%) expressed at least one of the markers, while 11 out of 20 invasive tumors (55%) did so. The association between marker and tumor type was generally respected: for instance, LRGUK was expressed in 14 tumors, of which 11 were luminal (p-value = 2 × 10⁻⁷), seven of those being early-stage, and the remaining four were invasive. TDRD1 was expressed in 12 samples, of which seven were HER2-positive (p-value = 2 × 10⁻⁴), and six out of those seven were early-stage. DMRTC2 was not found in any early HER2-positive samples, possibly indicating that its expression is induced later in tumorigenesis. The expression of HORMAD1 and CT83 was rare, which is unsurprising as basal-like tumors are rarely diagnosed at early stages.

Fig. 3: The activation of subtype-specific Cancer/Testis genes occurs in tumoral cells early during tumorigenesis and persists in metastasis.

A Expression of the indicated C/T genes in early (in situ or microinvasive) and late (invasive) tumors of the INVADE cohort. Hierarchical clustering of early and late breast tumor samples based on expression of the top 6 C/T genes previously described. A C/T gene is depicted as activated (black box) if its expression value is above the background expression threshold. Tumor subtypes are differentiated by color. B Expression level of breast-cancer specific C/T genes in matched primary tumor and metastasis. Color code reflects tumor subtype, as in (A). C UMAP representation of a scRNA-seq study on four triple-negative breast tumors (GSE161529). Each dot is either a tumor cell or a cell from the tumor microenvironment. Epithelial clusters of tumor cells are highlighted, HORMAD1 and CT83 expressions are shown. D Immunohistochemistry of HORMAD1 (top) or CT83 (bottom) in normal testis and in two breast cancer samples showing no (left) or positive (right) expression. E UMAP representation of a scRNA-seq study on 4 HER2 breast tumors (GSE161529). Each dot is either a tumor cell or a cell from the tumor microenvironment. Epithelial clusters of tumor cells are highlighted, DMRTC2 and TDRD1 expression is shown. F UMAP representation of a scRNA-seq study on 18 ER breast tumors (GSE161529). Each dot is either a tumor cell or a cell from the tumor microenvironment. Epithelial clusters of tumor cells are highlighted, TEX14 and LRGUK expression is shown.

We next asked whether the expression of these six markers is also present in metastases from breast cancer, depending on the subtype of the primary tumor. For this purpose, we used RNA-seq data from 83 primary tumors matched with the corresponding metastases (Fig. 3B). By separating the tumors according to PAM50 subtype, we observed two things: 1. All six markers are expressed by the corresponding subtype, including when the cancer is metastatic; 2. In the majority of cases, if the primary tumor expresses one of the six C/T markers, the metastases from this tumor will also express it. For HORMAD1 and CT83, we further validated this result on an independent dataset (Fig. S3A). Moreover, in the case of multiple metastatic seeding sites, metastases also maintained their expression of HORMAD1 or CT83 independently of the seeding site (Fig. S3B), implying that the expression of these C/T genes is maintained during metastatic progression. We hypothesized that the clones within the primary tumor that will evolve to form metastases are those which already express these C/T genes, further enhancing the potential of these genes as biomarkers.
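The second observation amounts to a retention rate computed over matched pairs, which can be sketched as below. The function name and the toy pairs are hypothetical; the actual expression calls and thresholds used for Fig. 3B are not reproduced here.

```python
def retention_rate(pairs):
    """Fraction of marker-positive primary tumors whose metastasis is also positive.

    `pairs` is a list of (primary_positive, metastasis_positive) booleans,
    one entry per matched primary/metastasis pair. Returns None when no
    primary tumor is positive.
    """
    met_status = [met for primary, met in pairs if primary]
    if not met_status:
        return None
    return sum(met_status) / len(met_status)


# Made-up matched pairs for one marker (illustration only).
toy_pairs = [
    (True, True), (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
]

rate = retention_rate(toy_pairs)
print(rate)
```

Pairs in which the primary tumor is negative are left out of the denominator, since the question is whether expression present in the primary tumor persists in the metastasis.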

Using single-cell RNA-seq data, we zoomed into intra-tumoral heterogeneity to understand whether C/T genes are indeed expressed by cancer cells (and not by cells from the microenvironment) and to analyze the clonality of this expression. We utilized a complete study on 39 breast tumors of different subtypes [49], selecting high quality cells and analyzing each tumor subtype separately. After dimensional reduction, cells formed clusters according to cell types (Fig. S3C). Of the four analyzed triple-negative breast tumors (Fig. 3C) we found several HORMAD1 (in 3/4 tumors) and CT83 (in 4/4 tumors) positive cells: they fall primarily into the tumor cell cluster. Interestingly, one of the four tumors shows two subclusters of tumor cells: one subcluster is positive for HORMAD1 yet the second is not, revealing that intra-tumoral heterogeneity may exist in some samples. Increasing the resolution of our analysis, we then used full-length scRNA-seq of triple-negative tumors [50] and again identified robust expression of HORMAD1 and CT83 in tumor cells (Fig. S3D). Within any given tumor approximately 20–40% of individual cancer cells express either HORMAD1 or CT83, and around 5–20% express both.

We then sought to confirm and complement these transcriptional analyses with immunohistochemistry (IHC). We screened antibodies and experimental conditions until we arrived at combinations under which the IHC pattern observed on human testis sections matched the results of single-cell RNA-seq in the same organ [39]. With these conditions, we could observe nuclear staining for HORMAD1 specifically in preleptotene spermatocytes, and staining in mature spermatozoa for CT83 (Fig. 3D). Using the same conditions on 99 tumor sections of mixed types, we verified that most triple-negative tumors (34 out of 40, 85%) expressed HORMAD1 and/or CT83 (Fig. S3E), and that this activation is specific to the triple-negative subtype (p-value < 10⁻⁴). In the positive tumors, staining for HORMAD1 was predominantly nuclear, present in most or all tumor cells, and seemed absent from non-tumor cells of the microenvironment. CT83 staining was cytoplasmic but similarly marked most tumor cells, and few or no cells of the microenvironment (Fig. 3D).

For the HER2-related C/T markers DMRTC2 and TDRD1, only one tumor in the analyzed dataset was positive for TDRD1 (Fig. 3E). The expression in this tumor is due to the activation of TDRD1 primarily in tumor cells, with no obvious subclonality. The results for DMRTC2 were inconclusive. Finally, the expression pattern for the ER-related markers was less clear-cut, with only a minority of tumor cells expressing either TEX14 or LRGUK (Fig. 3F), and a significant contribution of cells from the microenvironment to TEX14 expression.

The results from these datasets prompt several important conclusions: (1) the activation of C/T genes can be an early event during tumorigenesis, detectable within in situ tumors, (2) the type of C/T genes activated in a tumor is consistent between early and later-stage tumors, indicating there is no switch in expression, and (3) HORMAD1 and CT83 prove to be the most promising markers due to their association with the most deadly subtype of breast cancer and their robust pattern of expression (RNA and protein) in tumor cells detected at the single-cell level.

Single-cell RNAseq reveals the expression of the C/T markers in rare cells of the normal breast

As our prior results showed that all six C/T genes of interest are present in early tumors, we next investigated whether their activation is tumorigenesis-dependent, or if it occurs in rare cells within healthy tissue. If the latter is true, this activation could be a marker of plasticity of these few cells more likely to be transformed. Another hypothesis, compatible with the first scenario, would be that the early expression of C/T genes and the activity of the resulting proteins could facilitate tumorigenesis.

To delineate the origin of the C/T genes’ activation, we dove further into the different epithelial subtypes that compose the mammary gland (Fig. 4A). Here we utilized RNA expression data obtained on healthy cells sorted from reduction mammoplasties, where markers were used to FACS-sort stem cells, luminal progenitors, and mature luminal cells (Fig. 4B, [51]). Within this data, known genes displayed the expected expression pattern [51, 52]; for example, MSRB3 was expressed in stem cells but not more differentiated cells, whereas ESR1 had the opposite pattern (Fig. 4B). In contrast, none of the six C/T markers were detectably expressed in any of the sorted cell populations (Fig. 4B). In particular, HORMAD1 and CT83 were not detectably expressed in luminal progenitors, which are the proposed cells of origin for basal tumors [53, 54]. Therefore, from this bulk analysis, expression of the six C/T genes of interest in breast tumors does not seem to merely reflect pre-existing expression in any of the canonical cell subtypes from the mammary gland.

Fig. 4: Single-cell RNA-seq reveals the expression of C/T genes in rare cells of the normal breast.

A Schematic representation of the mammary gland, with the different epithelial and non-epithelial cell types indicated. B HORMAD1 and CT83 expression in sorted healthy mammary cells. The red dotted line represents the threshold for gene expression detection. MaSC mammary stem cell, LP luminal progenitor, ML mature luminal cell. C Left: scRNAseq of healthy breast samples from 26 healthy mammary glands (GSE161529) showing cell subtypes. Right: normalized expression of breast-cancer-specific C/T genes. Positive cells are emphasized, the percentage of positive cells in each cluster is indicated.

We investigated this question further using single-cell RNA-seq data from normal human breast samples, after FACS-enrichment for epithelial cells. Using a combination of dimensional reduction, unsupervised clustering approaches, and previously known markers, we were able to separate the three distinct epithelial cell types: one basal and two luminal cell types, called secretory (Luminal 1 (L1), containing luminal progenitors) and hormone-responsive (Luminal 2, or L2) (Fig. S4A). The expression of Luminal (Krt18, LTF, AGR2) or Basal (Krt14) genes marked the expected populations (Fig. S4A), with no contamination from immune and stromal cells (Fig. S4A). Surprisingly, we detected some normal epithelial cells expressing C/T markers (Fig. S4B, red dots); however, these cells were very rare: for example, only 12 out of 23,007 total cells expressed HORMAD1 and/or CT83, which is consistent with the lack of detection in the sorted cell populations of Fig. 4B. Interestingly, some of the C/T genes are expressed by specific cell subtypes (e.g. HORMAD1 and CT83 by L1 luminal cells), while others seem to be activated by several cell types (e.g. LRGUK, TEX14).

As this result was unexpected, it was necessary to strengthen it using an independent dataset, preferably using as many cells as possible. Utilizing a study containing nearly 80,000 cells derived from 15 mammoplasties [49], we were able to again identify the main cell types from the mammary gland (Figs. 4C and S4C, D). With this dataset containing four times as many cells as the previous one, we identified many more cells positive for the six C/T markers. Of these markers, HORMAD1 and CT83 differ in that they are activated primarily by L1 luminal cells, as before (Fig. 4C); however, we did not detect any cell co-expressing these genes. This finding could be explained by the low number of events detected, combined with the dropout probability inherent to scRNAseq, as well as a possible counter-selection of this event in the healthy mammary gland. DMRTC2, on the other hand, is activated only by very few cells, all belonging to the L2 compartment (Fig. 4C). Here we find an association with the “reservoir” subtypes of basal-like and HER2 tumor-origin cells. In contrast, TDRD1, LRGUK, and TEX14 can be activated in virtually any cell of the mammary gland, including non-epithelial cells (Fig. 4C). For TEX14, this result is consistent with the heterogeneous expression patterns observed in tumors, where the tumor microenvironment also expresses this C/T gene (Fig. 3F).

We then tried to identify differentially expressed genes in C/T-expressing cells, compared to negative cells belonging to the same cell type, but we failed to identify any robust changes. This negative result may indicate that C/T gene activation is a neutral event for healthy cells and reflects only some alteration of transcription regulation, but it could also be an artifact due to the low number of positive cells identified.

These bioinformatic analyses provide clear evidence that, of the various C/T genes identified, HORMAD1 and CT83 are the most compelling as potential biomarkers for basal-like breast tumors due to their basal tumor-specific expression. Another surprising finding is their early activation in a small subset of specific cells in the healthy mammary gland, which is not shown for the other four C/T genes. This early activation event prompts the question of whether this expression predisposes cells to become transformed or is a marker of transcriptional and possibly epigenetic abnormalities of more plastic cells. We focused on these two genes to decipher whether their activation is linked to epigenetic alteration, and if their expression can induce functional changes in mammary epithelial cells.

Activation of HORMAD1 and CT83 in tumors involves epigenetic alterations

To begin uncovering the role of HORMAD1 and CT83 expression in basal-like breast tumors, we first wanted to understand how this aberrant expression is induced. Basal-like tumors are genetically unstable [55], so we examined whether HORMAD1 and CT83 overexpression could be due to gene amplification. We found two results arguing against this possibility. First, there was no correlation between Copy Number Variation (CNV) and mRNA levels for HORMAD1 or CT83 in basal tumors (Fig. S5A). Second, if the genes’ overexpression were due to amplification of the locus, then we would expect to see a positive correlation between the expression of HORMAD1 and its two adjoining genes (GOLPH3L, 1 kb away, and CTSS, 9 kb away), and/or between CT83 and its contiguous gene SLC6A14 (250 base pairs away). We failed to detect any such correlation, whereas the expression of a gene known to undergo amplification and used as a positive control in the analysis, ERBB2, correlated positively with the expression of the neighboring gene PGAP3 (Fig. S5B).
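The neighbor-gene argument can be made concrete with a small correlation sketch. The choice of the Pearson coefficient is an assumption (the text does not name the correlation measure used), and all expression values below are invented: one pair mimics a co-amplified locus such as the ERBB2/PGAP3 positive control, the other mimics a gene whose neighbor is unaffected.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up expression values across six tumors (arbitrary units).
target_gene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
coamplified_neighbor = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]   # tracks the target gene
independent_neighbor = [3.0, 1.0, 4.0, 4.0, 1.0, 3.0]   # unrelated to the target

r_coamplified = pearson(target_gene, coamplified_neighbor)
r_independent = pearson(target_gene, independent_neighbor)
print(round(r_coamplified, 3), round(r_independent, 3))
```

A strong positive coefficient between a gene and its immediate neighbors is what locus amplification would predict; a coefficient near zero, as in the second toy pair, argues against it.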

As amplification seemed unlikely to explain the overexpression of HORMAD1 and/or CT83, we next examined epigenetic events. These genes lack CpG islands, but both have promoters with an intermediate CpG density (ICP) (Fig. 5A). These promoters overlap the ATAC-seq peaks which are present in HORMAD1/CT83-expressing basal-like breast tumors, but absent in non-expressing tumors (Fig. 5A). Consistently, we found histone marks associated with promoter activity (H3K27ac, H3K4me3) in sperm and in a basal-like cancer cell line positive for HORMAD1 and CT83 (MDA-MB-436) but not in the normal breast or negative breast cell lines (Fig. S5C, D). We next investigated the DNA methylation status of these promoters using the Illumina 450 K arrays available in TCGA and GEO. As shown in Fig. S5E, we found high levels of methylation on the HORMAD1 and CT83 promoters in normal breast samples (that do not express the genes) and low levels of methylation in the sperm samples (where the genes are on). The tumor data show a strong correlation between expression and promoter demethylation for CT83 (Fig. 5B). The correlation is present but less absolute for HORMAD1, as some tumors overexpress HORMAD1 without displaying demethylation.

Fig. 5: Activation of HORMAD1 and CT83 in tumors involves epigenetic alterations.

A IGV representation of the HORMAD1 and CT83 genomic loci, with CpG density promoter classification according to the Weber/Schübeler criteria [89]. ATAC-seq data are from representative basal-like tumors (TCGA cohort). Differentially accessible regions (DAR) between these two groups of basal tumors were identified. B Inverse correlation between HORMAD1 and CT83 expression and the mean DNA methylation of their promoters (TSS ± 200 bp). Each dot represents a tumor and the color intensity indicates Copy Number Variation of the genomic locus. C Global epigenetic changes in basal-like breast tumors, according to HORMAD1 and CT83 expression status. Left: Total number of hypomethylated or hypermethylated CpG, expressed as percentage of all 450 K CpG with informative measure. Right: Chromatin accessibility, expressed as the number of accessible regions in ATAC-seq (mean ± SD). D Volcano plot of the differential expression of transposable elements, in basal-like tumors positive for HORMAD1 and CT83 expression vs. negative for both genes. E RT-qPCR analysis of HORMAD1 and CT83 expression in non-tumorigenic human mammary cell lines, in control condition or following a 48-hour 5-Aza-dC treatment at the indicated concentrations. F Western Blot analysis of HORMAD1 and CT83 expression in non-tumorigenic human mammary cells, in control condition or following a 48-hour 5-Aza-dC treatment at 0.3 μM. G RT-qPCR analysis of HORMAD1 and CT83 expression at various time points, in the same cell line, after an initial perturbation with 0.3 or 1 μM 5-Aza-dC followed by a recovery period in drug-free medium.

Overall, tumors expressing HORMAD1 and CT83 show more permissive chromatin, with more hypomethylated regions compared to healthy tissue and more accessible chromatin regions (Fig. 5C). This hypomethylation is accompanied by the re-expression of many transposable elements, normally repressed by DNA methylation, in HORMAD1- and CT83- positive basal-like tumors compared to negative ones (Fig. 5D).

To understand whether demethylation is sufficient to induce HORMAD1 and CT83 expression, we used immortalized human mammary epithelial cells (HME and HMLE, [56]) treated in vitro with 5-aza-deoxycytidine (5-aza-dC). In the absence of treatment, we validated by RT-qPCR that these cell lines do not express HORMAD1 and CT83, in contrast to the triple-negative breast cancer line MDA-MB-436 (Fig.
