Analysing DNA methylation and transcriptomic signatures to predict prostate cancer recurrence risk

2.1 Clinical characteristics of patients

After screening and filtering, 490 patients diagnosed with prostate cancer, each with comprehensive methylation and clinical records, were obtained from the TCGA portal. The median age at diagnosis was 60 years (range, 40–80). The recurrence rate within five years was 15% across all patients. The T stage of the patients ranged from stage II to stage IV: 184 patients (37.55%) were stage II, 286 patients (58.37%) were stage III, and 10 patients (2.04%) were stage IV. The racial background of the study participants was as follows: White, 396 (80.82%); Asian, 14 (2.86%); Black, 55 (11.22%); and undetermined, 20 (4.08%). Based on Gleason scores, patients were stratified into three groups: < 7 (43 patients, 8.78%), = 7 (241 patients, 49.18%), and > 7 (196 patients, 40.00%) (Table 1). All patients were divided into training and reference cohorts at a ratio of 70:30 to facilitate analysis. Figure 1 provides an overview of the study's overall design and workflow.
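The 70:30 division into training and reference cohorts can be illustrated with a minimal sketch. The text does not say how patients were assigned, so the seeded random shuffle, the function name `split_cohort`, and the placeholder patient identifiers are all assumptions made for illustration.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs and split them into training and reference cohorts."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers; the real TCGA barcodes are not listed in the text.
patients = [f"patient_{i:03d}" for i in range(490)]
train, reference = split_cohort(patients)
print(len(train), len(reference))
```

A stratified split (for example by recurrence status) may be preferable when the outcome is as unbalanced as a 15% recurrence rate; the sketch above does not stratify.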

Table 1 Clinical characteristics of prostate cancer patients

Fig. 1

Study outline for machine learning-based identification of gene expression and methylation biomarkers for prostate cancer recurrence risk

2.2 Identification of DMGs and DEGs

To identify significantly differentially methylated genes (DMGs) and differentially expressed genes (DEGs), we compared 22,602 DNA-methylated genes and 20,531 genes with RNA expression data from the TCGA database between the recurrence and no-recurrence groups. Volcano plots show the distribution of DNA methylation and gene expression fold changes between the two groups (Fig. 2a, b). Overall, we defined 684 DMGs (p < 0.01; logFC > 0.025) and 691 DEGs (p < 0.01; logFC > 0.4). Ten of the DMGs overlapped with the DEGs: six were hypermethylated and four were hypomethylated. A negative correlation (p < 0.05) between DNA methylation and mRNA expression levels was found for these ten genes (TNNI2, SPIN2, COL5A3, RNF169, CCND1, FGFR1, SLC17A2, FAM71F2, RREB1, AOX1). The recurrence prediction model was then developed based on these ten DNA-methylated genes.

Fig. 2

Selection of DMGs in recurrent and nonrecurrent prostate cancer samples. a Volcano plot displaying DMGs. b Volcano plot showing DEGs

Here we identified ten genes with significant epigenetic alterations, specifically six hypermethylated and four hypomethylated, all of which overlapped with the differentially expressed genes (DEGs). This overlap highlights a crucial relationship between DNA methylation and gene expression levels. Statistical analysis revealed a significant negative correlation (p < 0.05) between DNA methylation and mRNA expression for these genes, indicating that hypermethylation was associated with decreased expression, while hypomethylation correlated with increased expression.
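The selection logic described in this section (threshold on p-value and logFC, then intersect DMGs with DEGs) can be sketched as follows. This is an illustration under stated assumptions: the gene names and values are toy data, the logFC cutoff is applied to the absolute value because both hyper- and hypomethylated genes are reported, and the opposite-direction filter is one simple stand-in for the negative correlation reported in the text.

```python
def select_significant(records, p_cutoff, logfc_cutoff):
    """Return {gene: logFC} for genes passing both the p-value and |logFC| cutoffs."""
    return {
        gene: logfc
        for gene, (p_value, logfc) in records.items()
        if p_value < p_cutoff and abs(logfc) > logfc_cutoff
    }


# Toy (p-value, logFC) pairs for illustration only; not results from the study.
methylation = {"GENE_A": (0.001, 0.08), "GENE_B": (0.20, 0.30), "GENE_C": (0.004, -0.05)}
expression = {"GENE_A": (0.002, -0.9), "GENE_C": (0.003, 0.7), "GENE_D": (0.001, 1.2)}

dmgs = select_significant(methylation, p_cutoff=0.01, logfc_cutoff=0.025)
degs = select_significant(expression, p_cutoff=0.01, logfc_cutoff=0.4)

# Keep overlapping genes whose methylation and expression change in opposite directions.
overlap = sorted(g for g in dmgs.keys() & degs.keys() if dmgs[g] * degs[g] < 0)
print(overlap)
```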

2.3 The SVM-based predictions

We next trained an SVM model to predict recurrence based on the ten genes that overlapped between the differential methylation and differential expression data. Figure 3A shows the receiver operating characteristic (ROC) curve, with an area under the curve (AUC) of 0.773. The formula below gives the recurrence score as a weighted sum of the methylation levels (β values) of the top DNA-methylated genes from the SVM-based recurrence model.
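For readers unfamiliar with the AUC, the following minimal sketch computes it directly from its rank interpretation: the probability that a randomly chosen patient with recurrence receives a higher score than a randomly chosen patient without recurrence. The labels and scores are toy values, and the function name `roc_auc` is hypothetical; it is not the software used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = recurrence, 0 = no recurrence.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
print(round(roc_auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which places the reported 0.773 in context.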

Fig. 3

A ROC curve of SVM-based recurrence prediction with the AUC. B Visualization of the correlation between the DNA methylation-based recurrence score and recurrence-free survival (RFS). Red points represent patients with recurrence, while blue points represent patients without recurrence. C Survival analysis of the prostate cancer patients using the recurrence score from the prediction model

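$$\text{Recurrence Score} = 0.46 \times \beta_{1} - 0.59 \times \beta_{2} + 0.15 \times \beta_{3} + 0.35 \times \beta_{4} + 0.10 \times \beta_{5} + 0.23 \times \beta_{6} + 0.37 \times \beta_{7} + 0.61 \times \beta_{8} + 0.19 \times \beta_{9} + 0.72 \times \beta_{10}$$

where $\beta_{1}, \ldots, \beta_{10}$ are the methylation β values of the ten genes in the model.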
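The recurrence score is a weighted sum, which can be sketched in a few lines. The ten weights are taken from the formula; which gene each weight belongs to is not recoverable from the text, so the order of `beta_values` and the example input are assumptions.

```python
# Weights as they appear in the recurrence-score formula, in order.
COEFFICIENTS = [0.46, -0.59, 0.15, 0.35, 0.10, 0.23, 0.37, 0.61, 0.19, 0.72]


def recurrence_score(beta_values, coefficients=COEFFICIENTS):
    """Weighted sum of methylation beta values (each expected in [0, 1])."""
    if len(beta_values) != len(coefficients):
        raise ValueError("expected one beta value per coefficient")
    return sum(c * b for c, b in zip(coefficients, beta_values))


# Hypothetical patient with every gene half-methylated (beta = 0.5).
example_betas = [0.5] * 10
print(round(recurrence_score(example_betas), 3))
```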

2.4 Regression analyses

Univariate regression analysis was performed on the patients' SVM scores and clinical features. It identified the SVM score, Gleason score, M stage, N stage, and pathologic grade as significant risk factors (p < 0.05). These risk factors were then entered into a multivariate regression analysis, which found that the SVM score (hazard ratio [HR] = 0.45; 95% confidence interval [CI] 0.28–0.69, P < 0.001) and N stage (HR = 2.96; 95% CI 1.21–7.31, P < 0.05) independently predicted clinical recurrence.
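The reported hazard ratios, confidence intervals, and p-value bounds can be checked against each other with a short calculation. The sketch assumes a Wald-type 95% interval that is symmetric on the log-hazard scale, which is the usual output of Cox regression software but is not stated in the text; the function name `wald_from_hr` is hypothetical.

```python
import math


def wald_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Recover the log-hazard coefficient, its standard error, z and two-sided p."""
    coef = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return coef, se, z, p


for name, hr, low, high in [("SVM score", 0.45, 0.28, 0.69), ("N stage", 2.96, 1.21, 7.31)]:
    coef, se, z, p = wald_from_hr(hr, low, high)
    print(f"{name}: coef={coef:.3f} se={se:.3f} z={z:.2f} p={p:.4f}")
```

Under this assumption, the recovered p-values fall below the reported bounds (P < 0.001 for the SVM score, P < 0.05 for N stage), so the reported figures are mutually consistent.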

2.5 Recurrence-free survival by methylation-derived recurrence scores

We tested the effect of the methylation risk score on the RFS of patients with prostate cancer. RFS differed significantly between the low-risk and high-risk score groups. Figure 3B shows that patients classified as low-risk had consistently better RFS than their high-risk counterparts, irrespective of other differences. The Kaplan–Meier curve (Fig. 3C) further shows that the model-defined low-risk group had longer recurrence-free survival than the high-risk group.
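A Kaplan–Meier comparison of risk groups can be sketched as below. The text does not state how the low-risk and high-risk groups were defined, so the median cutoff is an assumption, and the scores, follow-up times, and event flags are toy values rather than study data.

```python
import statistics


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival_probability)."""
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


def split_by_median(scores):
    """Label each score 'high' or 'low' relative to the median (assumed cutoff)."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Toy data: risk score, follow-up in months, and event flag (1 = recurrence, 0 = censored).
scores = [0.2, 0.9, 0.4, 1.3, 0.7, 1.1]
times = [40, 12, 55, 8, 60, 20]
events = [0, 1, 1, 1, 0, 1]

groups = split_by_median(scores)
for label in ("low", "high"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, [(t, round(s, 3)) for t, s in curve])
```

A formal comparison of the two curves would normally use a log-rank test, which this sketch omits.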

2.6 GO and KEGG analysis

To examine the molecular functional landscape of the DMGs, gene ontology (GO) analysis for biological processes (BPs), molecular functions (MFs), and cellular components (CCs) was conducted using DAVID. MFs were enriched in protein binding, GTPase activator activity, poly(A) RNA binding, and protein homodimerization activity. BPs were enriched in regulation of transcription, signal transduction, and positive regulation of activity immune response. CCs were enriched in nucleoplasm, membrane, plasma membrane, and cytosol. Furthermore, KEGG analysis indicated enrichment of DMGs in pathways related to human T-lymphotropic virus 1 infection, cytokine–cytokine receptor interaction, the T-cell receptor (TCR) signaling pathway, and natural killer cell-mediated cytotoxicity (Figure S1).
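The idea behind term enrichment can be illustrated with a plain hypergeometric tail probability. This is not DAVID's exact procedure (DAVID reports an EASE score, a more conservative modified Fisher exact test). The background size and the number of DMGs come from the text, while the term size and overlap count are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_term, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn from n_background
    genes, of which n_term are annotated to the term (hypergeometric tail)."""
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# 22,602 background genes and 684 DMGs are from the text; a term with 300
# annotated genes, 20 of them among the DMGs, is a made-up example.
p = enrichment_p_value(n_background=22602, n_term=300, n_selected=684, n_overlap=20)
print(f"{p:.3g}")
```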

2.7 Findings of promoter methylation and real-time PCR

In this study, we confirmed the differential expression of ten genes (TNNI2, SPIN2, COL5A3, RNF169, CCND1, FGFR1, SLC17A2, FAM71F2, RREB1, and AOX1) through quantitative real-time PCR (qRT-PCR). Gene expression levels in PC3 cells were compared with those in non-cancerous PNT2 cells, and LogFC values were calculated to reflect fold changes in gene expression between the two cell types. The analysis revealed that TNNI2 (LogFC = 1.68), SPIN2 (LogFC = 1.54), COL5A3 (LogFC = 2.42), SLC17A2 (LogFC = 1.92), and AOX1 (LogFC = 1.25) were significantly upregulated in PC3 cells compared with PNT2 cells. In contrast, RNF169 (LogFC = −0.66), CCND1 (LogFC = −1.47), FGFR1 (LogFC = −1.74), FAM71F2 (LogFC = −1.52), and RREB1 (LogFC = −1.43) were downregulated in PC3 cells. These results highlight notable differences in gene expression between prostate cancer (PC3) and non-cancerous prostate (PNT2) cells, suggesting that these genes may play significant roles in the progression or suppression of prostate cancer. The log fold change (LogFC) values reflect the magnitude of the expression differences, with positive values indicating upregulation in PC3 cells and negative values indicating downregulation (Fig. 4A, B).
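One common way to obtain a log fold change from qRT-PCR cycle threshold (Ct) values is the 2^(−ΔΔCt) method, sketched below. The text does not state which method, reference gene, or log base was used, so all of these are assumptions, and the Ct values are toy numbers. The sign follows the convention in the Fig. 4 caption: positive means higher expression in PC3.

```python
def log2_fold_change(ct_target_case, ct_ref_case, ct_target_control, ct_ref_control):
    """log2 fold change (case vs. control) using the delta-delta-Ct method."""
    delta_case = ct_target_case - ct_ref_case          # normalise to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_case - delta_control
    return -delta_delta  # log2(2 ** -ddCt): a lower Ct means more transcript


# Toy Ct values: target and reference gene in PC3 (case) and PNT2 (control).
logfc = log2_fold_change(
    ct_target_case=24.0, ct_ref_case=18.0,
    ct_target_control=26.0, ct_ref_control=18.5,
)
print(logfc, 2 ** logfc)  # log2 fold change and the corresponding linear fold change
```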

Fig. 4

qRT-PCR analysis of gene expression in PC3 and PNT2 cells. A Relative gene expression levels of TNNI2, SPIN2, COL5A3, RNF169, CCND1, FGFR1, SLC17A2, FAM71F2, RREB1, and AOX1 in PC3 and PNT2 cells. Error bars show standard deviation. B Log fold changes (LogFC) of gene expression in PC3 vs. PNT2 cells. Positive values indicate upregulation, and negative values indicate downregulation in PC3 cells

2.8 Validation of differentially expressed and methylated genes in prostate cancer cells using qRT-PCR

In this study, ten genes (TNNI2, SPIN2, COL5A3, RNF169, CCND1, FGFR1, SLC17A2, FAM71F2, RREB1, and AOX1), initially identified through TCGA analysis, were further validated for their differential expression at the transcript level and methylation patterns using quantitative real-time PCR (qRT-PCR). The gene expression trends closely mirrored the LogFC values observed from the microarray data. Specifically, TNNI2, SPIN2, COL5A3, AOX1, and SLC17A2 showed significant upregulation in PC3 cells compared to PNT2 cells, as shown in Fig. 5. In contrast, RNF169, CCND1, FGFR1, FAM71F2, and RREB1 were downregulated. These findings underscore notable gene expression differences between PC3 and PNT2 cells (Fig. 5A), highlighting the differential expression profiles of these genes.

Fig. 5

Gene expression and methylation analysis of PC3 vs. PNT2 cell lines. A LogFC values of gene expression in PC3 vs. PNT2. B LogFC values of methylation in PC3 vs. PNT2. C R-squared values showing the strength of the relationship between methylation and gene expression. D Pearson's correlation coefficients showing the linear relationship between methylation and expression levels

Moreover, methylation profiles, as indicated by CT values, revealed clear hypermethylation of key genes such as COL5A3, FGFR1, and SLC17A2 in PC3 cells (Fig. 2). This pattern aligns well with the microarray LogFC values, further corroborating our earlier findings. In summary, the data emphasize the significant role these genes play in distinguishing between prostate cancer (PC3) and non-cancerous (PNT2) cell lines.

Methylation analysis similarly showed distinct LogFC values between PC3 and PNT2 cells (Fig. 5B), with specific regions exhibiting increased or decreased methylation in PC3 cells. Correlation analysis revealed varying strengths of association between methylation and gene expression, as reflected in the R-squared values. Some gene-methylation pairs showed higher R-squared values, indicating a strong relationship between methylation changes and gene expression. Pearson's correlation coefficients further illustrated the linear relationship between methylation and gene expression levels, with both positive and negative correlations highlighting how alterations in methylation correspond to shifts in gene expression (Figs. 5C, D).
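The two statistics discussed here are closely related: for a simple linear fit with one predictor, R-squared equals the square of Pearson's r. The sketch below computes both for one gene from paired methylation and expression values. The numbers are toy data, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy paired measurements for one gene: methylation level and expression level.
methylation = [0.2, 0.4, 0.5, 0.7, 0.9]
expression = [8.1, 7.0, 6.8, 5.5, 4.9]

r = pearson_r(methylation, expression)
print(f"r={r:.3f} R^2={r * r:.3f}")  # negative r: expression falls as methylation rises
```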
