Colorectal cancer (CRC) ranks as the third most common malignancy worldwide and the second leading cause of cancer-related death across genders (Sung et al., 2021). GLOBOCAN data from 2020 forecasts an annual incidence of over 1.9 million new cases of CRC, resulting in 0.9 million deaths (Morgan et al., 2023). Projections from 2024 cancer statistics anticipate approximately 152,810 new CRC cases and 53,010 fatalities in the United States (Siegel et al., 2024). Noteworthy is the escalating incidence of early-onset or young-onset cancer (<50 years), projected to account for 23% of rectal cancer and 11% of colon cancer by 2030 (Spaander et al., 2023). Despite various treatment options for early-stage CRCs, such as local endoscopic or surgical excision and systemic chemotherapy, those in advanced stages encounter a less than 40% 5-year overall survival (OS) rate due to metastases and treatment resistance (Ciardiello et al., 2022). Although immunotherapy represents an innovative treatment for colon cancer, only a minority of patients with specific biomarkers, such as microsatellite instability-high and/or mismatch repair (MMR) deficient tumors, benefit from this approach (Bando et al., 2023). Relevant studies have shown that Chinese herbal medicines (CHMs) can inhibit CRC progression through multi-target molecular mechanisms and epigenetic regulation, as well as alleviate chemotherapy side effects. However, their clinical application requires addressing challenges such as complex composition, standardized production, and efficacy validation (Kong et al., 2020). Thus, there is a pressing need to explore novel potential biomarkers and signatures for early CRC diagnosis to facilitate targeted, personalized treatment strategies.
Cancer-associated fibroblasts represent the most common stromal cellular constituents in the tumor microenvironment (TME) and play multifaceted roles in cancer progression. They demonstrate significant heterogeneity in origin and phenotype, influencing distinct tumor biological behaviors, and predominantly facilitating tumor growth (Chen and Song, 2019; Sahai et al., 2020). Cancer-associated fibroblasts modulate cancer metastasis and invasion through synthesizing matrix-crosslinking enzymes to allow the tumor extracellular matrix (ECM) remodeling (Gaggioli et al., 2007; Nguyen et al., 2019; DuFort et al., 2016), releasing growth factors, cytokines, and exosomes that can influence angiogenesis, tumor mechanics, drug delivery, and therapy responses (Calon et al., 2012; Shi et al., 2019; Bruzzese et al., 2014; Straussman et al., 2012). Whereas, specific subtypes of cancer-associated fibroblasts also exhibit tumor inhibitory activities in some cancer types and have been associated with improved treatment outcomes (Ogawa et al., 2021; Bhattacharjee et al., 2021; Chen et al., 2021). In CRC, the prevailing theory posits that cancer-associated fibroblasts significantly contribute to tumor progression, which can promote angiogenesis (Hsu et al., 2023), epithelial-mesenchymal transition (EMT) (Hu JL. et al., 2019), metastasis, immunosuppression (Li et al., 2019), and chemotherapy resistance (Herrera et al., 2021), thereby exacerbating the prognosis for CRC patients. Despite multiple markers like α-smooth muscle actin (α-SMA/ACTA2) (Lazard et al., 1993), fibroblast activation protein (FAP) (Kraman et al., 2010), and periostin (POSTN) (Komura et al., 2024) have been identified in colon cancer, accurately defining cancer-associated fibroblasts remain challenging due to their marked heterogeneity. Consequently, novel methodologies are imperative to classify cancer-associated fibroblasts more precisely and elucidate their specific roles in tumor development. Single-cell transcriptome analyses have become essential resources for understanding the functional activities and heterogeneity of cancer-associated fibroblasts (Grout et al., 2022; Chen et al., 2020; Puram et al., 2017; Cords et al., 2023). Recent studies have identified cancer-associated fibroblasts-associated genes (CAFs) as biomarkers for risk assessment and clinical prognosis in CRC patients (Herrera et al., 2021; Zhao et al., 2022). It reported that CAFs model exhibited robust predictive capabilities for clinical outcomes and immune responses (Wei et al., 2024). These findings, in conjunction with previous research, underscore the potential of identifying CAFs cohorts as a viable method for assessing the effectiveness of immunotherapy and forecasting the clinical prognosis in CRC.
This study utilized The Cancer Genome Atlas (TCGA) RNA-seq dataset and single-cell RNA-sequencing (scRNA-seq) data to detect differentially expressed CAFs in CRC tumor tissues in comparison to normal. Subsequently, we performed a univariate analysis and utilized the LASSO method to screen prognostic CAFs. These genes were then utilized to generate a signature gene-related risk model for prognosis prediction. The accuracy and reliability of the disease model were assessed in both the TCGA set and a validation set. Additionally, we investigated the correlations between the CAFs expression and infiltration of immune cells in CRC. Our findings offer valuable insights and present an effective approach for investigating the role of CAFs during CRC progression.
MethodsData resourcesscRNA-seq data GSE231559 (Hsu et al., 2023) associated with colorectal cancer were retrieved from the GEO public database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE231559). This dataset included 26 samples, focusing exclusively on colorectal cancer-related tumors and normal control samples. Subsequently, we narrowed down the dataset to a total of 9 samples, consisting of 6 tumor samples and three normal control tissue samples based on the quality and completeness of samples. The detection platform utilized was GPL18573 Illumina NextSeq 500 (Homo sapiens).
Data preprocessing and cluster identificationData quality control was conducted applying the “Seurat” (Yu et al., 2022) (version 4.3.0.1, https://cran.r-project.org/web/packages/Seurat/index.html) in R4.1.2. The standard pre-processing workflow included filtering out cells (such as red blood cells, low-quality cells, doublets, mitochondrial, and ribosomal cells) based on quality control (QC) metrics, removing batch effects, normalizing, and scaling the data, and selecting highly variable features. Subsequently, datasets were merged, and integration anchors were defined using the “FindIntegrationAnchors” function (with reduction = ‘rpca’). The integrated objects were then clustered differentially (using the top 40 principal components with resolution = 0.4) and visualized using the UMAP algorithm. Cell types were annotated using “SingleR” (Aran et al., 2019) (version 2.2.0, http://www.bioconductor.org/packages/release/bioc/html/SingleR.html) and canonical cell marker genes from the update database “CellMarker” (Hu et al., 2023) (http://bio-bigdata.hrbmu.edu.cn/CellMarker/). For each cluster, the top gene was selected to generate a distribution diagram illustrating gene expression within each cluster, along with expression plots displaying top marker genes in each cluster.
Screening differentially expressed genesThe FindAllMarkers function in R4.1.2 Seurat package was used to screen differentially expressed genes (DEGs, threshold: min pct = 0.1, |log fold-change|>1 and p value <0.05) in each cluster. Functional enrichment analysis was carried out by using “clusterProfile” version 4.10.0 (Yu et al., 2012) with a threshold of corrected false discovery rate (FDR) value less than 0.05.
Cellular communication analysisCellchat could infer cell-state-specific signal communication within scRNA-seq profiles through analyzing the expression patterns of ligand-receptor among different clusters (Jin et al., 2021). Here, we utilized “iTALK” (Wang et al., 2019), a computational tool for characterizing and visualizing intercellular communication signals. iTALK categorizes receptor-ligands into four main groups: cytokines, growth factors, immune checkpoints, and others.
TCGA datasetGenomic expression profiles associated with colon and rectal cancer were retrieved from the Xena database (https://xenabrowser.net/datapages/). Normalized data was generated from the Illumina HiSeq 2000 RNA Sequencing platform, comprising 438 tumor samples and 41 normal samples, after matching with corresponding clinical information.
Additionally, the GSE39582 (He et al., 2022) dataset from NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/) serving as a validated set, was processed using the GPL570 Affymetrix Human Genome U133 Plus 2.0 Array platform. GSE39582 includes 585 samples, with 519 CRC samples providing prognostic information.
We analyzed the immune cell ratio in each sample based on the whole genomic expression patterns using GSVA (Ye et al., 2019) (version 1.36.3, http://www.bioconductor.org/packages/release/bioc/html/GSVA.html). The Kruskal–Wallis test was employed to assess the immune cell distribution between the tumor group and normal tissue. Subsequently, focusing on fibroblasts related to cancer progression, we integrated the single-cell results.
Using the “limma” package (Ritchie et al., 2015) (version 3.34.7, https://bioconductor.org/packages/release/bioc/html/limma.html), we identified DEGs in the tumor vs normal group from the TCGA profiles, with thresholds set at FDR<0.05 and |log2FC|>1.
Weighted gene co-expression network constructionFor these TCGA-derived DEGs, we adopt WGCNA package (Langfelder and Horvath, 2008) (version 1.61, https://cran.r-project.org/web/packages/WGCNA/index.html) in R4.1.2 to identify modules associated with CAFs. We selected an appropriate soft threshold to ensure that the constructed gene co-expression network exhibited scale-free properties. The cutHeight parameter was set to 0.995, and we employed a clustering algorithm to delineate distinct gene modules. Each module consists of genes that are highly co-expressed, with high connectivity among the genes within the same module. For each identified module, we computed its eigengene, which is the first principal component of the expression data for all genes in that module. Additionally, we calculated the correlation coefficient between the eigengenes of the modules and the CAFs to assess their association. Subsequently, we compared the WGCNA-screened DEGs with previously identified fibroblast-related DEGs from scRNA data. Genes overlapping between these two cohorts were deemed crucial for disease progression. Functional analysis of these overlapping genes was performed using DAVID version 6.8 (Huang da et al., 2009) online tool (https://david.ncifcrf.gov/) to identify GO processes and KEGG pathways.
A risk score model for CRC prognosis predictionThe screened DEGs underwent univariate analysis using the survival package (version 2.41–1, http://bioconductor.org/packages/survivalr/) (Wang et al., 2016), with a p-value less than 0.05 as threshold. Moreover, the LASSO algorithm regression analysis was employed using the “lars” package (version 1.2, https://cran.r-project.org/web/packages/lars/index.html) (Goeman, 2010) to identify the optimal gene combination. The resulting prognostic coefficients from LASSO were utilized to construct the risk score (RS) model, integrating the target gene expression. The RS formula was defined as follows:
Risk score=esumeach gene′s expression levels*corresponding coefficientesumeach gene′s mean expression levels*corresponding coefficientRS value was computed for each sample in both the TCGA set and GSE39582 validation set, followed by their classification. The association between risk grouping and actual clinical outcomes was evaluated using Kaplan-Meier curves generated.
Analysis of somatic mutations and tumor immune microenvironmentsMutation Annotation Format (MAF) files pertaining to CRC tissue were acquired from TCGA, and the “maftools” (Zhang et al., 2019) (version 2.6.05, https://bioconductor.org/packages/release/bioc/html/maftools.html) package in R4.1.2 was utilized to analyze somatic mutation status of signature genes across two risk groups.
Meanwhile, the “estimate” package (Hu D. et al., 2019) (http://127.0.0.1:29606/library/estimate/html/estimateScore.html) was employed to compute the estimate, immune, stromal scores, and tumor purity for all TCGA samples. Subsequently, the Kruskal–Wallis test was conducted to evaluate the immune cell proportion differences among two risk group, and assess the distribution variance of estimate scores. Furthermore, the correlation of model gene expression and immune cell ratio was analyzed.
Cell cultureThe CC cell lines (HCT116 and HT29 cells) and NCM460 cells were procured from the Cell Bank of the Chinese Academy of Sciences. The HCT116 and HT29 cells were maintained in an incubator set to 37°C and 5% CO2, utilizing McCoy’s 5A medium (Servicebio Technology Co., Ltd., China, G4541-500 ML) containing 10% Fetal Bovine Serum (Gibco, United States, 16,000–044) for growth. Additionally, the NCM460 cells were maintained in RPMI-1640 medium (Servicebio Technology Co., Ltd., China, G4532-500 ML).
qRT-PCR validationTotal RNA was isolated from the cells using an RNA extraction kit (Servicebio Technology Co., Ltd., China, product number G3013). Subsequently, a cDNA synthesis kit (TransGen Biotech, China, catalog number AU341-02) was employed for the reverse transcription process. The relative mRNA expression levels were assessed using the 2−ΔΔCq method, normalizing by the expression levels of GAPDH. The CAFs sequences are presented in Table 1.
Table 1. The CAFs sequences.
Statistical analysisStatistical analyses were conducted through R software (version 4.1.2) and GraphPad Prism 8.0 software. Multiple groups differences were assessed using the One-way ANOVA. P < 0.05 was considered significant.
ResultsClassification of cell group in CRC based on scRNA-seq dataThe scRNA-seq data were extracted from nine samples, comprising 6 tumor samples and three normal tissue samples. Following data quality control (minGene = 500, maxGene = 6,000, pctRibo = 50, mitochondrial-encoded genes <30%), a total of 21,992 cells were obtained, including 5,358 normal cells and 16,634 tumor cells. All cells underwent classification using dimensionality reduction algorithms, with median count RNA and feature gene in a single cell observed to be 4,462 and 1,372, respectively, as detailed in Table 2.
Table 2. Counting of cell counts after quality control.
Cell group classifications were annotated using specific gene markers from the CellMarker database, resulting in the identification of 20 different clusters, including naive T-cells, NK cells, epithelial cells, enterocyte progenitor cells, enteroendocrine cells (EECs), Paneth 1/2 cells, cycling B cells, myeloid cells, Treg cells, LGR5+ stem cells, T follicular helper (Tfh) cells, dendritic cells (DCs), enterocytes, plasma cells, fibroblasts, memory B cells, stromal cells, and plasmacytoid DCs (Figures 1A,B). These results indicated that the cancer cells were highly heterogeneous among patients.
Figure 1. Classification of cell groups in CRC utilizing scRNA-seq dataset. (A) Visualization of cell clusters through UMAP plotting. (B) Assessment of gene marker expression levels across various clusters. (C) GO and KEGG analysis of downregulated genes in fibroblast cluster. (D) GO and KEGG analysis of upregulated genes in fibroblast cluster. (E) Cell communication network of 20 cell clusters. (F) The top 20 highly expressed ligand-receptor interactions among different cell clusters.
A total of 3,306 DEGs were identified from tumor fibroblasts in comparison to normal. Functional enrichment analysis showed that upregulated cohorts were mainly associated with nuclease activity, replication, and repair pathways, while downregulated genes were predominantly involved in fatty acid metabolic processes and amino acid metabolism pathways (Figures 1C,D).
Intercellular communication among the 20 cluster groups was analyzed based on the expression of receptor-ligand pairs. Cellular interactions were evaluated across four major modules: growth factor, cytokine, immune checkpoint, and others. Numerous cellular communication signals were identified within these cluster groups, serving as crucial factors in intercellular communication (Figure 1E). The top 20 genes related to cellular communication in each cluster were depicted in Figure 1F. Of note, our focus lay on the communicational signal alterations in cancer-related fibroblasts. We observed that fibroblasts interacted with Tfh cells, myeloid cells, and EECs through ligand-receptor pairs of BMPR2-RGMA, BMPR2-BMP8B, and BMPR2-GDF7. Additionally, fibroblasts could interact with B cells involved in disease progression through the ligand-receptor pairs of CAMP-P2RX7 (Figure 1F).
Screening CAFs-associated genes in the TCGA and scRNA datasetIn the TCGA data, we assessed the immune landscape of CRC samples, wherein the immune cell proportion in tumor tissues was evaluated according to gene expression level. A comprehensive screening process identified 23 immune cell types with notable distribution changes in tumor samples compared to the normal group (Figure 2A), such as, fibroblasts, memory B cells, CD56bright NK cells, activated CD4 T cells, NK cells, immature DCs, and others. Notably, fibroblast proportion was significantly increased in tumor tissues, suggesting its prominent association with disease progression (Figure 2B). Leveraging the TCGA expression profile dataset, we utilized the limma package to screen 1,499 DEGs meeting threshold conditions in the tumor vs normal group (Figure 2C).
Figure 2. Identification of CAFs in CRC samples from TCGA and scRNA dataset. (A) The immune cell distribution in tumor and control group in TCGA database. (B) Comparison of fibroblast distribution between tumor and control groups in TCGA database. (C) The DEGs between tumor and control samples. (D) Selection plot of adjacency matrix weight parameter power (left) and mean connectivity as a function of the soft-threshold power (right). (E) Cluster dendrogram of the network modules. (F) Relationship of gene modules and cancer-associated fibroblast. (G) Venn diagram of the overlapped DEGs linked to CAFs. (H) GO terms and KEGG pathways enriched by overlapping DEGs. ***P < 0.001, NS, not significant.
To pinpoint genes within modules that correlate with CAFs, we analysed 1,499 DEGs using the WGCNA algorithm. To achieve a scale-free network distribution, we explored the parameter value of the adjacency matrix weight, selecting the value of weight power = 8 (Figure 2D). By calculating the dissimilarity coefficient among gene nodes, we obtained a systematic clustering tree (Figure 2E). With a pruning height set at cutHeight = 0.995, we identified seven different modules, different colors represent different modules. We then generated heatmaps to evaluate the correlation between these modules and disease traits, employing the Spearman correlation coefficient (Figure 2F). Heatmap colors indicate the strength of the correlation: red for positive, blue for negative, with deeper hues denoting stronger associations. Focusing on modules with significant associations to CAFs expression within tumors, we identified five modules (blue, brown, green, red, and turquoise) where the correlation values with CAFs exceeded 0.5 (blue module: r = −0.55, P = 3e−117; brown module: r = 0.63, P = 3e−163; green module: r = 0.67, P = 5e−194; red module: r = −0.68, P = 5e−203; and turquoise module: r = −0.52, P = 6e−105). This indicates that 950 genes within these modules are linked to CAFs and the progression of CRC. Notably, while the MEgrey and MEyellow modules exhibited strong correlations with tumor tissue, their correlations with CAFs were comparatively low. This suggests that the gene expression patterns in these modules may not be specific to CAFs but rather more generally associated with tumor tissue characteristics. Consequently, we chose to exclude these modules from further analysis.
Comparison of these modules’ screened DEGs with the previously identified scRNA fibroblast-related DEGs yielded 131 overlapping genes for subsequent analyses (Figure 2G). These candidate genes were associated with 14 GO terms and 7 KEGG pathways, encompassing functions such as B cell proliferation related to immune response, cell adhesion molecule, signal transduction, B cell receptor signal, and peroxisome (Figure 2H).
Construction and verification of CAFs risk modelUnivariate analysis results showed 18 genes exhibited prognostic significance values (Figure 3A). Further LASSO analysis revealed that an optimal combination comprising eight DEGs, namely, CD177, RAB36, ACSL6, PBX4, CLDN11, PLIN1, CCDC78, and KCNJ14, was determined (Figure 3B). Subsequently, the RS model was established based on the formula: RS = (−0.02774 ∗ ExpRAB36) + (−0.00025 ∗ ExpCD177) + (−0.00136 ∗ ExpACSL6) + (0.011975 ∗ ExpPBX4) + (0.014539 ∗ ExpCLDN11) + (0.031381 ∗ ExpPLIN1) + (0.039063 ∗ ExpCCDC78) + (0.078007 ∗ ExpKCNJ14).
Figure 3. Establishment and validation of CAFs prognostic model. (A) The 18 prognostic CAFs by univariate analysis. (B) The lambda values and the associated mean squared error by LASSO analysis. (C) The correlation between the eight gene expression and OS in CRC patients within the TCGA dataset. (D) Kaplan-Meier survival curves, survival status and ROC curves analysis in the TCGA cohort. (E) Kaplan-Meier survival curves, survival status and ROC curves analysis in the GSE39582 cohort.
Samples could be stratified into high- and low-expressed groups according to each gene expression. Kaplan-Meier curve analysis revealed a significant correlation between signature gene expression and patient survival times (Figure 3C).
To assess the predictive capacity of the RS model, samples from TCGA sets and GSE39582 datasets were classified into different risk groups. The association of RS status and survival times of patients were depicted (Figures 3D,E). In the TCGA training set, patients in the low-risk group exhibited a more favorable prognosis compared to individuals in the high-risk group (Figure 3D, left). The AUC values for the ROC curve at 1-, 3-, and 5-years were 0.883, 0.874, and 0.857, demonstrating the high accuracy and efficacy of the risk model (Figure 3D, right). Similarly, the performance of the RS model was verified in the GSE39582 dataset, and AUC values for 1-, 3-, and 5-years were 0.854, 0.803, and 0.805 (Figure 3E, right).
Somatic variation of eight CAFs in TCGA-CRC cohortsThe somatic mutation profiles of tumor samples were obtained from TCGA-CRC cohorts, and the mutational signatures of eight CAFs were visualized among different groups. Our results demonstrated consistent mutation patterns across the eight genes in both groups (Figure 4). Missense mutations were predominant, with single nucleotide polymorphisms occurring more frequently than deletions (Figure 4). Additionally, C>T was identified as the most frequent single nucleotide variant (SNV) in all samples (Figures 4A,B). The mutation profiles of eight CAFs in CRC were further depicted, showing ranked percentages of mutations. Notably, ACSL6 mutations were most prevalent in the low-risk group (23%, Figure 4A), whereas RAB36 mutations accounted for the highest proportion in high-risk group samples (32%, Figure 4B).
Figure 4. Mutation feature of eight genes in the low-risk group (A) and high-risk (B) group samples in the TCGA set.
Association between model gene expression and immune cellThe immune cell proportion was evaluated across different risk group samples using the TCGA dataset. Notably, 8 cell types exhibited significantly different ratios, including activated B cells, neutrophils, central memory CD8 T cells, activated CD4 T cells, regulatory T cells, immature dendritic cells, gamma delta-T cells, and eosinophils (P < 0.05, Figure 5A). Additionally, the association between the RS and tumor indicators were explored (Figure 5B). The analysis revealed significant differences among two groups in the tumor purity (P = 0.007812), estimate score (P = 0.036983), stromal score (P = 0.007996), and immune score (P = 0.006668).
Figure 5. Association of risk model gene expression and tumor immunity in the TCGA cohort. (A) Immune cell infiltration disparity across different risk groups. (B) Differences of tumor indicator between different group patients. (C) Correlation-ship significance between risk model gene and immune cell, and ESTIMATE score. (D) Relationship between model candidate genes and fibroblast proportion. (E) Expression patterns of eight model genes in scRNA-seq derived fibroblasts from normal and tumor samples. *P < 0.05, **P < 0.01, ***P < 0.001.
Furthermore, there was a significant correlation between the most signature gene expression and tumor indicators (P < 0.05, Figure 5C). Specifically, activated B cell was positively related to PLIN1 and CLDN11 expression, but negatively correlated with ACSL6 expression. Regulatory T cell proportion was positively related to PLIN1, CLDN11, and CD177 expression, while negatively associated with PBX4 and KCNJ14 expression (P < 0.05). The proportion of activated CD4 T cells was negatively correlated with CD177, PLIN1, CLDN11, and KCNJ14 expression (P < 0.05).
Additionally, most signature genes showed correlations with fibroblast proportions in TCGA samples (Figure 5D). Furthermore, the model gene expression was validated in scRNA fibroblasts. Consistent with the previous expression analysis, aberrant expression levels of the eight signature genes were observed in CRC samples (Figure 5E).
Identification of prognostic CAFs expressionqRT-PCR results indicated that the expression level of RAB36 was significantly downregulated in the HCT116 and HT29 cell lines compared to the NCM460 cells. Conversely, CD177, PBX4, and CCDC78 were upregulated in the HCT116 and HT29 cell lines, and ACSL6 and KCNJ14 only in HCT116 cells (P < 0.05) (Figure 6). The expression trend of CD177 and CCDC78 was consistent with the results in Figure 5E. Therefore, these prognostic CAFs may be potential therapeutic targets for CRC.
Figure 6. Identification of the mRNA expression of the eight prognostic CAFs in NCM460 cells and CRC cell lines. *P < 0.05, **P < 0.01, ***P < 0.001.
DiscussionThis study entailed the characterization of 131 CAFs, which exhibited differential expression levels between CRC samples and normal tissues. This characterization was achieved through an integrative analysis of TCGA-CRC data and scRNA-seq profiles. These genes demonstrated associations with various biological processes, including B cell proliferation related to immunological response, B cell receptor signal, signal transduction, cell adhesion molecule, and peroxisome. Utilizing univariate regression analysis and LASSO analysis, we identified eight signature genes (CD177, RAB36, ACSL6, PBX4, CLDN11, PLIN1, CCDC78, and KCNJ14) that displayed significant prognostic value in TCGA-CRC patients. Subsequently, we generated a CAFs risk model and validated it using the TCGA set and GSE39582 cohort.
The intricate interplay within the TME significantly impacts cancer progression and therapy resistance. As key constituents of the TME, CAFs actively engage with tumor cells and other TME components, orchestrating various TME activities. Cellular communication results showed CAFs interacted with several other cell types (Tfh cell, B cell, myeloid cells, and enteroendocrine cells) through ligand-receptor signaling to regulate CRC progression. As for these cell clusters, Tfh cells primarily provide the required support to B cell for antibody-mediated immune response. In numerous solid organ tumors of non-lymphocytic origin, an increased ratio of Tfh cells is frequently linked to a more favorable prognosis (Gutiérrez-Melo and Baumjohann, 2023). In the TME, Tfh cells predominantly produce IL-21 cytokines, which facilitate humoral responses through the stimulation of B cell activation, class-switch recombination, and the secretion of anti-tumor IgG1 and IgG3 (Hollern et al., 2019). EECs are derived from the pluripotent stem cells in the gastrointestinal tract, and the changes in the composition and function could affect digestive physiology, potentially correlating with gastrointestinal pathologies (Gunawardene et al., 2011). The cellular compartment of EECs within the normal pancreas expands during early tumorigenesis but diminishes considerably with disease progression, as evidenced by a significant decrease in their proportion as lesions advance (Caplan et al., 2022).
We identified eight prognostic genes by qRT-PCR. The expression level of RAB36 was significantly downregulated in the HCT116 and HT29 cell lines compared to the NCM460 cells. Conversely, CD177, PBX4, and CCDC78 were upregulated in the HCT116 and HT29 cell lines, and ACSL6 and KCNJ14 only in HCT116 cells (P < 0.05). The expression trends of CD177 and CCDC78 were consistent with our predicted results. Therefore, these prognostic CAFs may be potential therapeutic targets for CRC. But the results need further verification. RAB36 is a member of the RAS oncogene family. RAB36 has been implicated in promoting CRC progression and invasion, while its knockdown in cancer cells resulted in reduced metastatic potential (Zhu et al., 2018). Moreover, RAB36 was identified as a target of the oncogenic protein HuR in CRC, and circPPFIA1s could inhibit liver metastasis through modulation of the HuR/RAB36 and miR-155-5p/CDX1 pathways (Ji et al., 2022). The proteins ACSL6 and PLIN1 play distinct roles in the lipid metabolism in CRC cells, supporting their growth and survival. PLIN1 or perilipin one modulates lipid storage within lipid droplets (LDs) and serves as a crucial modulator of human lipid metabolism (Desgrouas et al., 2024). Dysregulation of lipid metabolism is a notable feature of various cancers, and PLIN1 expression has been linked to disease outcomes in multiple cancers (Bombarda-Rocha et al., 2023; Zhang et al., 2021; Straub et al., 2019). ACSL6 regulates lipid synthesis and degradation processes by catalyzing the long-chain fatty acids to transform into the active form, for subsequent beta-oxidation (Fedorchuk et al., 2020). Remarkably, ACSL6 expression is typically diminished in various cancer types, yet it is notably elevated in CRC, where its overexpression is linked to increased cellular proliferation and elevated levels of glycolytic products (Parsazad et al., 2023). Notably, ASCL6 mutation is more common in the low-risk group (23%), while also ranking second in the high-risk group (14%), suggesting that ASCL6 mutations may be related to the biology of CRC, rather than merely being a marker of risk stratification. In the future, we will perform ASCL6 overexpression experiments to determine whether its increased expression correlates with the advancement of CRC.
CD177, predominantly expressed in neutrophils, serves as a valuable biomarker for myeloproliferative diseases (Kissel et al., 2001). CD177 mRNA expression and CD177+ neutrophils prevalence is notably higher in CRC tissues compared to controls (Zhou et al., 2018). CD177 expression is associated with a better prognosis in CRC (Dalerba et al., 2011). Tumor-expressed CD177 exerts tumor-suppressive functions by regulating β-catenin activation (Kluz et al., 2020). Moreover, CD177 influences the function and homeostasis of tumor-infiltrated Treg cells, as demonstrated by reduced tumor growth and decreased tumor-infiltrated Treg frequency upon Treg-specific deletion of CD177 in mice (Kim et al., 2021). Our results showed that CD177 expression was positively correlated to stromal, immune, and estimated scores while exhibiting a negative correlation with the proportion of CAFs. Elevated CD177 levels in CRC patients are associated with a better prognosis, suggesting its potential role in prognosis prediction and immunological regulation in CRC patients.
Moreover, CCDC78 is predominantly expressed in skeletal muscle and is implicated in a unique autosomal-dominant congenital myopathy resulting from mutations (Majczenko et al., 2012). CCDC78 is identified as a prognosis biomarker in colon cancer through the utilization of a prediction-scoring model (Yang et al., 2019). The PBX homologue PBX1-4 regulate haematopoiesis, primarily by interacting with the oncogenic factor HOX, and serving as HOX cofactors (Song and Ma, 2022). PBX4 has been implicated as a potential novel onco-promoter in CRC, as evidenced by its overexpression, which increases cancer cell proliferation and upregulates the expression of markers of epithelial-mesenchymal transition (EMT) and angiogenesis (Martinou et al., 2022). A correlation analysis data revealed that increased PBX4 significantly affects the infiltration of immune cells (Chao and Zhang, 2023). Similarly, we also found PBX4 overexpression in CRC was significantly correlated to infiltrated central memory CD8+ T cells, immature DCs, and activated B cells, further confirming the oncogenic and immune regulatory effect of PBX4 in CRC progression.
Tumor-infiltrating immune cells are recruited to establish a pro-inflammatory microenvironment that fosters cancer progression. Immune cell infiltration has emerged as a prognostic marker for CRC, with CD4+ and CD8+ T cells being particularly favorable prognostic factors, correlating with chemotherapy and immunotherapy sensitivity. Enhanced infiltration of CD4+ T cells contributes to the inhibitory immune microenvironment, thereby leading to a poor prognosis in CRC patients. Comprehensive correlation analyses between multiple prognostic factors and the risk model were performed, and our results demonstrated the CAFs-based prognostic model was significantly correlated with eight immune cell proportion and immune estimate indicators. Moreover, the KCNJ14 expression is positively correlated with CD4 + T cell proportion, and increased KCNJ14 level can lead to poor prognosis in CRCs. KCNJ14 is a type of ATP-sensitive inward rectifier potassium (K+) channels (Töpert et al., 2000). KCNJ14 exhibits abnormal upregulation in CRC and is associated with poor prognosis in CRC patients (Li et al., 2022). KCNJ14 deletion could significantly inhibit colorectal cancer cell growth and migration.
In this research, we developed a CAFs prognostic model and leveraged the integration of single-cell sequencing data with bulk RNA-seq data to assess disease progression and prognostic risks in individuals with CRC. Furthermore, our model enables the stratification of patients into high- and low-risk categories and evaluates their potential responses to immunotherapy, which is essential for crafting tailored treatment strategies and follow-up schedules. Additionally, our study has led to the discovery of novel CRC biomarkers, which can help to identify high-risk patient groups earlier, providing the possibility for early intervention and treatment. To summarize, our study presents unique advantages in terms of prognostic assessment, risk categorization, prediction of immunotherapy responses, and the identification of new biomarkers for CRC, which brings new perspectives and methods for the clinical treatment of CRC.
However, limitations in this study should not be ignored. Firstly, the sample size, derived from a public database, was inadequate, potentially introducing bias into our results. To counteract these limitations, future studies should involve a larger and more diverse cohort of patients. This will be essential for validating the precision and broader applicability of our prognostic model. Moreover, although our research has identified the prognostic significance of CAFs model in CRC, further experimental validation is warranted. To gain a more profound understanding of how these prognostic genes influence CRC progression and outcomes, we intend to undertake a suite of in vivo and in vitro experiments. Conducting gain-of-function or loss-of-function studies will be crucial for assessing the biological relevance of the identified signature genes. Lastly, the interplay between prognostic genes and immune cell infiltration is an area that requires further elucidation. We plan to employ flow cytometry in the future to determine the distribution of these prognostic genes in immune cells in vivo after inhibition or overexpression, especially fibroblasts.
Data availability statementThe original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.
Author contributionsWJ: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Software, Validation, Writing–original draft, Writing–review and editing. YL: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Software, Validation, Writing–original draft, Writing–review and editing. JL: Data curation, Formal Analysis, Investigation, Project administration, Supervision, Writing–original draft. ZW: Conceptualization, Funding acquisition, Methodology, Supervision, Writing–review and editing. YxY: Data curation, Investigation, Resources, Validation, Writing–review and editing. BL: Data curation, Investigation, Resources, Validation, Writing–review and editing. SQ: Data curation, Investigation, Resources, Validation, Writing–review and editing. JN: Data curation, Investigation, Resources, Writing–review and editing. YhY: Data curation, Investigation, Validation, Writing–review and editing. SH: Data curation, Investigation, Writing–review and editing. CH: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing–original draft, Writing–review and editing. HY: Methodology, Project administration, Resources, Supervision, Validation, Writing–original draft, Writing–review and editing, Conceptualization, Data curation, Formal Analysis, Funding acquisition, Investigation.
FundingThe author(s) declare that financial support was received for the research, authorship, and/or publication of this article. Natural Science Foundation of China (NO. 82274531), Yueyang Hospital’s “Revealing the Leader” Translational Medicine Research Fund in 2024(NO.2024YJJB09), and Clinical Effificacy Observation of Lianbai Enema Formula in the Treatment of Active Ulcerative Colitis and Related Transformative Research (No. 2024YJJB09).
Conflict of interestThe authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s noteAll claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
AbbreviationsCAFs: cancer-associated fibroblasts-related genes; CRC: colorectal cancer; DC: dendritic cell; DEGs: differentially expressed genes; ECM: extracellular matrix; EMT: epithelial-mesenchymal transition; GEO: Gene Expression Omnibus; HOX: Homeobox proteins; HuR: Hu antigen R; LASSO: Least Absolute Shrinkage and Selection Operator; scRNA-seq: single-cell RNA sequencing; SNV: single nucleotide variants; TCGA: The Cancer Genome Atlas Stomach Adenocarcinoma; WGCNA: weighted gene co-expression network analysis.
ReferencesAran, D., Looney, A. P., Liu, L., Wu, E., Fong, V., Hsu, A., et al. (2019). Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20, 163–172. doi:10.1038/s41590-018-0276-y
PubMed Abstract | CrossRef Full Text | Google Scholar
Bando, H., Ohtsu, A., and Yoshino, T. (2023). Therapeutic landscape and future direction of metastatic colorectal cancer. Nat. Rev. Gastroenterol. Hepatol. 20, 306–322. doi:10.1038/s41575-022-00736-1
PubMed Abstract | CrossRef Full Text | Google Scholar
Bhattacharjee, S., Hamberger, F., Ravichandra, A., Miller, M., Nair, A., Affo, S., et al. (2021). Tumor restriction by type I collagen opposes tumor-promoting effects of cancer-associated fibroblasts. J. Clin. Invest. 131, e146987. doi:10.1172/JCI146987
PubMed Abstract | CrossRef Full Text | Google Scholar
Bombarda-Rocha, V., Silva, D., Badr-Eddine, A., Nogueira, P., Gonçalves, J., and Fresco, P. (2023). Challenges in pharmacological intervention in perilipins (PLINs) to modulate lipid droplet dynamics in obesity and cancer. Cancers (Basel) 15, 4013. doi:10.3390/cancers15154013
PubMed Abstract | CrossRef Full Text | Google Scholar
Bruzzese, F., Hägglöf, C., Leone, A., Sjöberg, E., Roca, M. S., Kiflemariam, S., et al. (2014). Local and systemic protumorigenic effects of cancer-associated fibroblast-derived GDF15. Cancer Res. 74, 3408–3417. doi:10.1158/0008-5472.CAN-13-2259
PubMed Abstract | CrossRef Full Text | Google Scholar
Calon, A., Espinet, E., Palomo-Ponce, S., Tauriello, D. V. F., Iglesias, M., Céspedes, M. V., et al. (2012). Dependency of colorectal cancer on a TGF-β-driven program in stromal cells for metastasis initiation. Cancer Cell 22, 571–584. doi:10.1016/j.ccr.2012.08.013
PubMed Abstract | CrossRef Full Text | Google Scholar
Caplan, L. R., Vavinskaya, V., Gelikman, D. G., Jyotsana, N., Trinh, V. Q., Olive, K. P., et al. (2022). Enteroendocrine cell formation is an early event in pancreatic tumorigenesis. Front. Physiol. 13, 865452. doi:10.3389/fphys.2022.865452
PubMed Abstract | CrossRef Full Text | Google Scholar
Chao, G., and Zhang, L. (2023). Correlation analysis of PBX family with immune invasion and drug sensitivity in colon adenocarcinoma. Heliyon 9, e17220. doi:10.1016/j.heliyon.2023.e17220
PubMed Abstract | CrossRef Full Text | Google Scholar
Chen, Y., Kim, J., Yang, S., Wang, H., Wu, C. J., Sugimoto, H., et al. (2021). Type I collagen deletion in αSMA(+) myofibroblasts augments immune suppression and accelerates progression of pancreatic cancer. Cancer Cell 39, 548–565.e6. doi:10.1016/j.ccell.2021.02.007
PubMed Abstract | CrossRef Full Text | Google Scholar
Chen, Z., Zhou, L., Liu, L., Hou, Y., Xiong, M., Yang, Y., et al. (2020). Single-cell RNA sequencing highlights the role of inflammatory cancer-associated fibroblasts in bladder urothelial carcinoma. Nat. Commun. 11, 5077.
留言 (0)