Comprehensive analysis of bulk and single-cell transcriptomic data reveals a novel signature associated with endoplasmic reticulum stress, lipid metabolism, and liver metastasis in pancreatic cancer

Identification of endoplasmic reticulum stress and lipid metabolism related genes in PDAC

This study was conducted following the procedures shown in Additional file 1: Fig. S1. Persistent ER stress and lipid metabolic disturbance are novel features of malignancies. To investigate the prognostic significance of ER stress and lipid metabolism in pancreatic cancer, lipid metabolic signal pathways and ER stress gene set were analyzed by ssGSEA and consensus clustering (Fig. 1A, C). The consensus matrix heatmap showed that k = 2 was the optimal classification method, dividing PDAC samples into Cluster 1(sample size = 89) and Cluster 2(sample size = 89). The clustering heatmap in Fig. 1C showed that ER stress and lipid metabolism related genes and pathways had higher enrichment scores in Cluster1. WGCNA analysis was then performed to identify co-expressed genes involved in high and low ERS_Lipid subgroups in PDAC. In the process of co-expression network construction, we observed that the soft thresholding power β was 7 when the fit index of scale-free topology reached 0.9 and 18 modules were identified (Fig. 1B, D). The purple module was observed to have most significant correlations with ERS_Lipid in PDAC (Fig. 1D). Ultimately, 1240 ER stress and lipid metabolism related genes from both marker genes and WGCNA module genes were screened out for further analysis.

Fig. 1figure 1

Identification of ER stress and lipid metabolism related genes. A Consensus clustering analysis of ER stress and lipid metabolic pathways, the TCGA data set is divided into two well-differentiated subgroups when k = 2. B This section describes related parameters of WGCNA analysis. C Heatmap and clustering of ssGSEA fractions of ER stress and lipid metabolism-related pathways. D WGCNA analysis screened out ERS_Lipid co-expressed gene modules

Identification of liver metastasis related genes in PDAC

Liver metastasis is a devastating factor for the poor prognosis and high mortality of PDAC. In order to elucidate the specific mechanism and intrinsic driving factors of liver metastasis in PDAC, both single cell and bulk transcriptional analyses were conducted to select the differentially expressed genes between primary PDAC tissues and liver metastatic tissues (|log FC|> 1& FDR < 0.05; Fig. 2 and Additional file 1: Fig. S3). Each cell in the detecting samples is scored based on the cell type reference in SingleR package and the results of cell type were shown by the cell score heatmap (Fig. 2A). Nine cell types were identified in the single cell transcriptional analysis of GSE154778 cohort and 11 cell types were identified in GSE197177 cohort (Fig. 2B and Additional file 1: Fig. S3). The gene enrichment analysis showed that the scores of cell cycle, purine metabolism, pyrimidine metabolism and metabolic pathways were significant higher in liver metastatic epithelial cells than in primary epithelial cells (p < 0.0001, Fig. 2C). Estrogen response, androgen response, protein secretion and other signaling pathways were significantly upregulated in both primary and liver metastatic epithelial cells, while reversed in macrophages, monocytes and T cells. And the unfolded protein response was significantly enriched in liver metastatic epithelial cells, but not significantly changed in primary epithelial cells (Additional file 1: Fig. S3D, E). Liver metastatic co-expressed genes were also screened through WGCNA analysis in GSE71729 and GSE34153 cohorts (Fig. 2D, E). Finally, 722 liver metastatic DEGs from Single cell and bulk transcriptomic analyses, 857 liver metastatic co-expression genes from WGCNA analyses were identified. And 3887 DEGs between normal and tumor tissues in TCGA and GTEx cohorts were screened out through Limma R package with |log FC|> 1 and FDR < 0.05 (Additional file 1: Fig. S2). All the identified genes related to ERS_Lipid, liver metastasis and tumorous DEGs were intersected and 92 signatures were included in the subsequent analysis.

Fig. 2figure 2

Identification of liver metastasis related genes. A Heatmap of cell scores. When the cell scores of a cluster significantly are higher in a reference cell type than other labels, they are annotated as that cell type. B Cell subpopulation clustering and annotation results of GSE154778 data set. C Alterations of cell cycle, purine metabolism, pyrimidine metabolism, and metabolic signaling pathways in epithelial cells between primary and liver metastasis groups. D, E WGCNA analysis screened the gene modules co-expressed with liver metastasis in GSE71729 and GSE34153. F The Venn diagram illustrated the presence of 92 shared genes

Machine learning based integration constructs a prognostic model for PDAC

The 92 consensus genes screened above were subjected to the univariate Cox analysis and 42 prognostic genes were identified (Additional file 4: Table S3). Then the 42 genes were analyzed in a machine learning based integration program to establish a consensus ERS_Lipid signature for predicting the overall survival of PDAC patients. The leave-one-out cross-validation (LOOCV) framework was employed to fit a combination of 10 machine learning algorithms with hyperparameter tuning in the training set, and the C-index of each model was further calculated across the testing set (Fig. 3A). The optimal model was the combination of stepCox[backward] and plsRcox with the highest average C-index (0.691) in all model types (Fig. 3A). Seven consensus genes with leading prognostic value were identified and the gene coefficients were further calculated in the model (Fig. 3B, C). A risk score for each patient was then calculated with the expression of 7 genes weighted by their regression coefficients. According to the optimal threshold determined, all patients were divided into high-risk and low-risk groups. In order to further assess the influence of sample size on model accuracy, mitigate overfitting, and enhance generalizability and reliability, we also validated the model in the TCGA and ICGC datasets and found that the model functioned well in distinguishing the patients into high and low risk groups (Fig. 3D). Univariate cox regression analysis and multivariate cox regression analysis suggested that risk score can be used as an independent risk factor for the prognosis of PDAC patients (Fig. 3E and Additional file 1: Fig. S4). A nomogram was also established to take both clinical variates and risk score into account for clinical use (Fig. 3F). Stratifying PDAC patients based on clinical characteristics, the study validated the prognostic predictive ability of the ERS_Lipid signature across different clinical subgroups. The results demonstrated that the ERS_Lipid signature exhibited good prognostic discrimination in patients of varying ages, genders, grade stages, and T, N, M classifications (Additional file 1: Fig. S4). These findings suggested the prognostic model exhibited robustness and generalizability across diverse patient populations.

Fig. 3figure 3

Construction and validation of a prognostic model for pancreatic cancer. A A combination of machine learning predictive models calculates the C-index for each model on the training set and the validation set. B Forest map of univariate cox regression analysis showed hazard ratios of the seven selected genes. C Coefficients of genes in the prognostic model. D Kaplan–Meier curves of OS in the train, test sets and TCGA, ICGC cohorts based on the model showed longer survival time in low-risk groups. E Univariate and multivariate Cox regression analyses were performed to assess the prognostic value of the risk score in conjunction with additional clinical features. F A nomogram combining risk scores and clinical information was constructed in the TCGA dataset

Validation and evaluation of the novel model and the related gene enrichment analyses

To further evaluate the prognostic value of the established model, receiver operating characteristic (ROC) curves were plotted and the area under the ROC curve (AUC) was calculated at 1, 3 and 5 years, respectively. The results suggested that the prognostic accuracy of risk score was significantly higher than that of age, gender and stage (Fig. 4A, B). The Decision Curve Analysis (DCA) indicated that the clinical benefit of risk score in evaluating prognosis of PDAC was greater than that of age, gender and stage (Fig. 4C). The calibration curves of risk score at 1-, 3- and 5-year showed that risk score was robust in predicting the survival time in the training and testing sets (Fig. 4D). The risk scores derived from this predictive model exhibited a significant positive correlation with tumor grade, with a notably higher proportion of G3-4 patients observed in the high-risk group compared to the low-risk group (Fig. 4E, F). These findings suggest an association between this risk features and the clinical progression of PDAC patients. Patients in high-risk group had shorter survival time than in low-risk group across all the training and validation cohorts (Fig. 4G). The GSEA analysis was also performed to elucidate the signal pathways alteration in high- and low-risk groups. The results showed that focal adhesion, pancreatic cancer, pathways in cancer and ubiquitin mediated proteolysis were enriched in high-risk group, while arachidonic acid metabolism, oxidative phosphorylation and phenylalanine metabolism were enriched in patients with low-risk scores (Fig. 4H, J and Additional file 5: Table S4). And the correlation between risk scores and hallmark signal pathways was analyzed with GSVA algorithm. Risk score was significantly positively correlated with most of the tumor-related signaling pathways, such as MTOR, Insulin, ERBB and Wnt signal pathways, while significantly negatively related to PPAR and Notch signaling pathways (Fig. 4I and Additional file 1: Fig. S5).

Fig. 4figure 4

Evaluation of the novel prognostic model. A The ROC curves showed the 1-year, 3-year, and 5-year survival prediction accuracy of the model in the training and testing sets. B AUC value of risk score was higher than that of clinical characteristics. C The DCA curves showed the respective benefit of risk score and other clinical features in predicting clinical outcomes. D The 1-, 3-, and 5-year calibration curves assessed the predictive robustness of the model. E, F Risk score was significantly associated with the Grade classification of patients in the TCGA cohort. G Survival time in the high- and low-risk groups. H, J The Top5 signal pathways enriched in high- and low-risk groups with adjusted P value < 0.05. I GSVA analysis showed relationship between KEGG pathways and the model genes

Furthermore, the comparison of the constructed model with 7 previously published prognostic models were also conducted [33,34,35,36,37,38,39]. All signatures exhibited excellent prognostic discrimination, with significantly lower survival times observed in the high-risk group compared to the low-risk group (Fig. 5A–H). However, the ERS-Lipid signature demonstrated larger AUC values at 1, 3, and 5 years compared to the seven previously published signatures, indicating higher prognostic accuracy in PDAC (Fig. 5I–P). Additionally, the ERS-Lipid signature displayed higher C-index values in both the entire training cohort and validation cohort, suggesting its superior robustness (Fig. 5Q). The results suggested that the ERS_Lipid signature functioned well in predicting the survival outcomes of patients with PDAC (Figs. 4, 5).

Fig. 5figure 5

Comparison of the ERS_Lipid-related prognostic model with seven published prognostic models. A–H The Kaplan–Meier survival curves demonstrated that both the ERS_Lipid signature and the seven previously published prognostic signatures exhibit significant prognostic discrimination in the integrated training dataset. I–P The area under the ROC curve of the ERS_Lipid signature was greater than that of the other seven previously published signatures. Q ERS_Lipid signature demonstrated a higher C-index value in both the integrated training and validation datasets. “*”p < 0.05, “**”p < 0.01, “***”p < 0.001, “****”p < 0.0001

Analyses of tumor mutation burden and drug sensitivity between high- and low-risk groups

In order to further explore the correlation between tumor mutation burden (TMB) and risk score, the somatic mutation landscape of each PDAC sample in the high- and the low-risk groups were visualized by waterfall diagram, and it was observed that the top 20 mutant genes in the two subgroups were basically the same, but the total mutation load was 93.55% in the high-risk group and 74.77% in the low-risk group (Fig. 6A, B). We found the top 4 genes KRAS, TP53, CDKN2A, and SMAD4 with the highest mutation load in the high- and low-risk groups. For the classification of variation, missense mutation, nonsense mutation and frame shift del are the top 3 across all mutation types. The results also showed that risk score had significantly positive correlation with TMB (Fig. 6C, D). And the survival time of patients in high-risk score and high TMB group was shorter than that of patients with low-risk score and low TMB (Fig. 6E, F). The drug sensitivity analysis in the high- and low-risk groups showed that the risk score was correlated with multiple drug sensitivity (Fig. 6G). The results showed that the sensitivity to 5-Fluorouracil, Afatinib, Irinotecan, Lapatinib and Trametinib in patients with high-risk scores was lower than in low-risk group, while reversed results were observed in the sensitivity to some drugs, such as Docetaxel, Epirubicin and Gefitinib (Fig. 6G).

Fig. 6figure 6

Analyses of tumor mutation burden and drug sensitivity in high-low risk groups. A, B The accumulated alteration of mutated genes in high- and low-risk groups. C, D TMB was positively related to risk scores. E, F Kaplan_Meier curves showed TMB and risk scores correlated with adverse prognosis. G The drug sensitivity in high- and low-risk groups. The Y-axis showed the IC50 value, which was negatively correlated with drug sensitivity

Analysis of tumor immune landscape between high- and low-risk groups

The heterogeneity of tumor immune microenvironment is an important factor affecting tumor development and prognosis. To assess the immune landscape in high- and low-risk groups, we analyzed the alteration of TIDE [30] scores and IPS scores (Fig. 6A–H). The results showed that TIDE score, immune exclusion score and MDSC score are higher in the high-risk group than in low-risk group, indicating higher likelihood of immune escape (Fig. 7A–D). And IPS of CTLA(+)PD1(−) was significantly lower in high risk group, while IPS of CTLA(+)PD1(+), CTLA(−)PD1(+) and CTLA(−)PD1(−) had no significant difference in high- and low-risk groups (Fig. 7E–H). This suggested that patients in the high-risk group may not respond well to immunotherapy. The immune infiltration alteration was also explored through CIBERSORT and ssGSEA analyses, which offered a computational estimation within the tumor microenvironment. The results showed that the infiltration levels of CD8+ T cells, M1 macrophages, T follicular helper cells, T regulatory cells and resting Dendritic cells were down-regulated in high-risk group, while that of plasma cells, resting CD4+ T cells memory, activated CD4+ T cells memory, resting NK cells and activated Dendritic cells were reversed (Additional file 1: Fig. S6). The enriched scores of immune infiltration and immune function showed that immune cells such as activated B cells, activated CD8+ T cells, CD56 bright NK cells, CD4+ central memory T cells and CD8+ effector T cells were significantly down-regulated in high-risk group. And The enrichment scores of activated CD4+ T cells, natural killer T cells, Th1 helper cells and Th2 helper cells was higher in the high-risk group (Fig. 7I, J). The correlation of check-point genes with risk score and prognostic genes was also analyzed and displayed in Fig. 7K. As for the immune functions, APC_co_inhibition, check point and type II IFN response signal pathways were upregulated in high-risk group, while immune pathways such as APC_co_stimulation, CCR, cytolytic activity and type I IFN response signal pathway were decreased in high-risk group (Fig. 7L). Furthermore, Fisher’s test revealed significant differences in the cohort distribution of response to immunotherapy among patients in high- and low-risk groups (Fig. 7M). Submap analysis provided additional validation that patients classified in the low-risk group exhibited a higher responsiveness to immunotherapy targeting CTLA4 compared to those in the high-risk group (adjusted p value < 0.05; Fig. 7N) (Additional file 6: Table S5).

Fig. 7figure 7

Immune landscape in high- and low-risk groups. Alteration of A TIDE score, B immune dysfunction, C immune exclusion and D MDSC in high- and low-risk groups. E–H IPS scores of PD1 and CTLA4 in high- and low-risk groups. I, J The immune infiltration variations in high- and low-risk groups. K Risk scores correlated with immune checkpoints. L Immune function altered in high- and low-risk groups. M Fisher’s test revealed the cohort distribution of immunotherapy response among patients categorized into high and low-risk groups within the TCGA dataset. N The responsiveness of patients to immune checkpoint inhibitor (ICI) treatment based on submap algorithm and the TIDE scores with Bonferroni corrected p values

Cell–cell communication analysis between high- and low-ERS groups

CellChat algorithm [32] was applied to explore and estimate the intercellular signaling communications based on gene expression information from single-cell transcriptomics. Communication patterns between high- and low-ERS groups were compared to predict pathological changes in cell–cell communication of PDAC. The results showed that the number and interaction strength of cell–cell communication were increased in high- ERS group (Fig. 8A, B). The interaction number and strength of epithelial cells (sender) to macrophages and monocytes were augmented, while the interaction number of monocytes, macrophages and T cells (sender) to epithelial cells and the interaction strength of other cells such as epithelial cells (sender) to T cells were down-regulated in high-ERS group (Fig. 8D). The information flows of each signaling pathway were then calculated to identify the communicating probability over all the pairs of cell types and the information flow between epithelial cells and other cells were further quantified (Fig. 8C, E). CLDN pathway was decreased, whereas some others like RESITIN, MIF, and SPP1 were increased in high-ERS group. And PECAM1 signal pathway was turned off, while some pathways, such as CEACAM, DESMOSOME and PTPRM signal pathways were turned on in high-ERS group (Fig. 8F, G). We selected three interesting signaling pathways for further analysis. The communicating interactions in MIF, APP and SPP1 signaling pathways among all cell types in the high- and low-ERS groups were analyzed and shown in Fig. 8G–L. We also investigated alterations of cell–cell communication in the high-low lipid metabolism groups. The number and intensity of cell–cell interactions in the high-lipid metabolism group also increased significantly. Concurrently, some signaling pathways related to tumor proliferation and progression, such as NOTCH, APP, and CEACAM, were also significantly activated in the high-lipid metabolism group (Additional file 1: Fig. S7).

Fig. 8figure 8

Cell–cell communication analysis between high- and low-ERS groups. A The number and strength of cell interaction mediated by individual signal pathways in high and low ERS groups. B Circle plots of communicating number and strength between immune cells and tumor cells. C The ranking bar chart showed the signal axes of interactive networks in high- and low-ERS groups. The signaling pathway with red labels was more abundant in the low-ERS group, the signaling pathway marked black was equally abundant in both groups, and the signaling pathway with green labels was more enriched in the high-ERS group. D Heatmap of cell–cell communication number and strength. Blue indicated reduced intercellular communication in the high-ERS group compared to the control group, while red indicated enhanced intercellular communication. E Bubble map of altered cell–cell communication mediated by individual signaling axes, with the horizontal axis showing the cell class that initiates and receives the signal, and the vertical axis showing receptor-ligand pairs of the signaling pathway. F Heatmaps displayed the overall (both outgoing and incoming) signal flows of each cell population. G, H MIF signal pathway, I, J SPP1 signal pathway and K, L APP signal pathway in low- and high-ERS groups

Metabolism reprogramming analysis based on the single cell transcriptomes

Analysis of the PDAC single-cell dataset using the UCell (Version 2.6.2) R package revealed active ER stress signals across all cellular subtypes, particularly prominent in Epithelial cells. Furthermore, downstream adaptive pathways of endoplasmic reticulum stress, such as the unfolded protein response, exhibited significant activation in all cellular subtypes. Concurrently, lipid metabolism-related pathways, including fatty acid metabolism, cholesterol homeostasis, and bile acid metabolism, were notably enriched in various cellular subtypes, with a pronounced enrichment observed in Epithelial cells. Notably, there was no discernible difference in DNA damage among the different PDAC cellular subtypes. Malignancy-associated gene sets related to metastasis, stemness, and proliferation demonstrated higher enrichment scores in Epithelial cells, tissue stem cells, other stromal cells, and macrophages (Fig. 9A). To further elucidate the metabolic alterations between high- and low-groups of ER stress and lipid metabolism, we investigated the metabolic pathways by scMetabolism R package [27]. Apart from phenylalanine metabolism, taurine and hypotaurine metabolism, the majority of metabolic pathways from primary tumor cells were activated in the high-groups of both ERS and lipid metabolism, such as purine metabolism, pyrimidine metabolism, steroid biosynthesis, nitrogen metabolism and fatty acid biosynthesis (Fig. 9B, D). Most metabolic pathways in liver metastatic cells were also significantly up-regulated in the high-ERS and high-lipid metabolic groups, except oxidative phosphorylation, glycosphingolipid biosynthesis, and one carbon pool by folate (Fig. 9C, E). The metabolic changes of tumor cells represented by epithelial cells and immune cells represented by T cells in the primary and liver metastasis groups were also significant (Fig. 9F, G).

Fig. 9figure 9

Metabolic alterations analyses based on single cell RNA_seq data. A Enrichment strength of signaling pathways related to endoplasmic reticulum stress, lipid metabolism, and malignant behavior in the pancreatic cancer single-cell dataset as depicted in the feature plot. B, C Metabolism reprogramming pathways in the high- and low-ER stress groups in primary and liver metastatic cells. D, E Metabolism reprogramming pathways in the high- and low-lipid metabolism groups in primary and liver metastatic cells. F The metabolic changes of epithelial cells in primary and liver metastatic groups. G The metabolic alterations of T cells in primary and liver metastatic groups

Validation of the expression and prognostic value of ERS_Lipid-related hub genes

In this investigation, a prognostically significant ERS_Lipid signature comprising seven genes was identified through a combination of machine learning algorithms. Notably, among these genes, SOD2, P4HB, and TNFSF10 were identified as hub genes associated with adverse prognostic outcomes in pancreatic cancer. To further validate the predictive value of the ERS_Lipid signature, experimental validation was conducted to assess the expression levels and prognostic relevance of the three risk genes. Single-cell sequencing data indicated widespread expression of SOD2 and P4HB across various cell populations, with TNFSF10 primarily expressed in epithelial cells, and APOE predominantly expressed in macrophages and a small subset of stromal cells. NPC1L1, RAP1GAP, and ADH1C show relatively low expression levels across different cell populations (Fig. 10A). The qPCR results showed that the expression of SOD2 and TNFSF10 was elevated in the PANC1, CAPAN1, BXPC3, and BXPC3-LMT cell lines compared to HPNE. The expression of P4HB was increased in PANC1 and CAPAN1, decreased in BXPC3, while significantly upregulated in BXPC3-LMT (Fig. 10B). The protein expression of the 3 hub genes was examined by IHC to determine significant differences between the PDAC group (n = 80) and the normal group (n = 80). SOD2 exhibited a total positive rate of 75% (60/80) in the PDAC group and 63.75% (51/80) in the normal group. P4HB showed a total positive rate of 96.25% (77/80) in the PDAC group and 100% (80/80) in the normal group. TNFSF10 displayed a total positive rate of 95% (76/80) in the normal group and 100% (80/80) in the PDAC group (Fig. 10C). The positive areas of SOD2 and TNFSF10 in the PDAC group were significantly higher than those in the normal group, while the positive areas of P4HB showed no significant change (Fig. 10D). However, P4HB is predominantly expressed in acinar cells in normal tissues, with significantly lower expression levels observed in pancreatic ductal cells compared to tumor tissues. Furthermore, the correlation between the positive area of each protein in tissues and progress free survival (PFS) time was examined in 35 patients with complete follow-up data. The findings revealed a negative correlation between SDO2, P4HB and TNFSF10 expression levels and prognosis in PDAC patients. (adjusted p value < 0.05, Fig. 10E).

Fig. 10figure 10

Validation of the expression levels and prognostic relevance of hub genes. A Gene expression levels of ERS_Lipid signature in single-cell dataset. B Relative RNA expression of SOD2, P4HB and TNFSF10. C, D The protein expression levels of SOD2, P4HB and TNFSF10 were detected by immunohistochemical staining in 80 pairs of PDAC tumor tissues and non-tumor tissues using multiple tests with adjusted p values. E Correlation analysis of the hub gene expression levels with prognosis in PDAC. “ns” adjusted p value > 0.05, “*” adjusted p value < 0.05, “**” adjusted p value < 0.01, “***” adjusted p value < 0.001, “****” adjusted p value < 0.0001

留言 (0)

沒有登入
gif