The overall study flow scheme is depicted in Fig. 1, which shows the methodology used in this study. Single-cell sequencing data from multiple datasets, such as GSE168652 (one pair of cervical tumor tissues matched with their adjacent normal tissue data), E-MTAB-11,948 (three cervical tumors and three normal adjacent tissues), PRJCA008573 (eight cervical tumors, three normal cervical squamous tissues from the ectocervix, and two high squamous intraepithelial lesions), and E-MTAB-12,305 (two high squamous intraepithelial lesions) underwent quality control and normalization (Table S1). From the initial collection of 215,322 cells, 194,993 met the stringent quality control criteria. After global annotation, cells were categorized into seven distinct cell types (Fig. 2A). The optimal marker gene selection for cell type discrimination was demonstrated in Fig. 2B. The number and the proportion of distinct cell types were listed as follows: epithelial cells (72,226 cells, 37.04%, marked with EPCAM, KRT5, and KRT14), endothelial cells (23,877 cells, 12.25%, marked with PECAM1, SELE, and ACKR1), fibroblasts (54,342 cells, 27.87%, marked with PDGFRA, COL1A1, TAGLN, ACTA2, and RGS5), T/NK cells (28,895 cells, 14.82%, marked with CD3D, CD3E, and NKG7), B cells (3,190 cells, 1.64%, marked with CD79A, CD79B, and CD19), Myeloid cells (11,060 cells, 5.67%, marked with CD163, CD14, FCGR3A, and LYZ), and mast cells (1,403 cells, 0.72%, marked with MS4A2, TPSB2, and TPSAB1). Identified cell clusters grouped by different datasets were further visualized using UMAP (Fig. 2C), and identified cell clusters grouped by disease stage (normal tissue,75,805 cells, 38.88%; precancerous lesions, 30,120 cells, 15.45%; and tumor tissues, 89,068 cells, 45.68%) were also visualized with UMAP (Fig. 2D). Tumor samples exhibited a distinct cell lineage distribution from normal samples, with an increase in epithelial cells in the tumor samples, and a decrease in fibroblast cells (Fig. 2E). Furthermore, by considering the interdependence among subsets, we used the Bayesian scCODA model to examine alterations in the distribution of the seven cell types among the groups. The tumor samples exhibited a significant increase in the number of epithelial and B cells. Conversely, tumor samples showed a marked decline in fibroblast and endothelial cell counts. (Fig. 2F and Table S2).
Fig. 1Study workflow. The collection of single-cell sequencing data from multiple datasets, including GSE168652, E-MTAB-11,948, PRJCA008573, and E-MTAB-12,305, with the last validation set. After data quality control, it is subjected to normalization and clustering for unique cell population control. Annotations are performed by global and subcluster levels within the datasets. Focused on subcluster levels, the analysis was performed using specialized computational tools (scCODA, scCODE, and GSVA). For epithelial cells (Epi), inferCNV is applied to infer copy number variations (CNV), and trajectory analysis in conjunction with ATAC-seq is used to elucidate the gene regulatory trajectories and chromatin accessibility patterns
Fig. 2Single-cell landscape of cervical squamous cell carcinoma (CSCC). (A) All-identified cell clusters using the Uniform manifold approximation and projection (UMAP) method. Cell types are represented by different colors. (B) Heatmap showing expression levels of specific markers in each cell cluster. (C) Identified cell clusters are visualized using UMAP, grouped by dataset. (D) UMAP visualization of all identified cell clusters grouped by disease stage. (E) Cell type distribution in each sample. (F) Boxplots displaying the proportions of cell types at different disease stages, with significance determined by scCODA (P < 0.05)
Single-Cell Expression Profile Characterization for Lymphocytes Across Disease Stage GroupsTo investigate the distribution and functional maps of B cells in CSCC, we isolated 3190 B cells. Notably, we found that B cells were significantly enriched in tumor tissue compared to normal tissue, and in the precancerous state. This suggests that subsequent B cell responses may be influenced by the tumor microenvironment (TME). Within the TME, mature B cells may process and present antigens, thus stimulating T cell activation. Additionally, they secrete cytokines such as IL-10 and TGF-β, which have anti-inflammatory effects in normal and tumor tissues, respectively. At the intersection of these three DEG sets, eight genes were identified as consensus upregulated DEGs: HLA-DPA1, IGHA2, CD52, GNLY, LTB, TRBC2, GZMA, and HLA-DRB1. Two genes, LINC-PINT and SNHG5, were downregulated. In the progression of the disease, we observed that certain apoptosis-related pathways were activated, such as role of mitochondria in apoptotic signaling, apoptotic DNA fragmentation and tissue homeostasis, SODD/TNFR1 Signaling, stress-induced HSP regulation and intrinsic prothrombin activation pathway, while other apoptosis-related pathways were inhibited, such as RB tumor suppressor/checkpoint signaling in response to DNA damage, p53-dependent apoptosis in ontogenically transformed cells, actions of Nitric Oxide and Erk1/Erk2 MAPK signaling pathway (Fig. 3A and B). This suggests a complex regulation of apoptosis during disease progression, where different pathways may be differentially regulated depending on the context and stage of the disease. It was worth noting that consensus upregulated B cell gene signature could be utilized to predict the survival of CSCC (Fig. S1).
Fig. 3T and natural killer (NK) cell characteristics during cervical squamous cell carcinoma (CSCC) progression. (A) Venn diagram presenting the specific and common upregulated differentially expressed genes (DEGs) in B cells from normal, precancerous lesions, and tumor tissues (left panel). Heatmap indicating the expression level variations of upregulated genes in the Pathways for the three groups (right panel). (B) Venn diagram showing the specific and common downregulated DEGs in B cells from normal, precancerous lesions, and tumor tissues (left panel). Heatmap indicating the changes in the expression levels of downregulated genes in the Pathways pathway for the three groups (right panel). (C) UMAP visualization displaying all NK and T cells. Different colors represent cell types. (D) Heatmap showing the expression levels of specific markers in the T and NK cell subclusters. (E) Boxplots showing the proportions of T/NK cell subclusters at different disease stages, with significance determined using scCODA (P < 0.05). (F) Heatmap showing the expression level variations of CD4-C2-Treg, CD8-C4-Tem-PDCD1, and CD8-C5-temra in the Pathways for different disease stages. (G) Venn diagram showing the specific and common DEGs of CD4-C2-Tregs, CD8-C4-Tem-PDCD1, and CD8-C5-temra at different disease stages
T and NK cells also play crucial roles in the TME. Of the 25,244 T/NK cells, 55.04% (13,895 cells) were found in tumor tissues. NK cells are categorized into two subgroups: naive NK cells that highly express NCAM1 and GNLY, and effector NK cells that highly express GZMA, GZMB, NKG7, FCGR3A, and FGFBP2. T cells, characterized by CD3D, CD4, and CD8A, are further classified into nine subclusters: three CD4 T subclusters, including CD4-C1 containing Central Memory CD4 T (Tcm) cells that highly express CCR7, ANXA1, and GPR183; CD4-C2 composed of Regulatory CD4 T (Treg) cells that highly express LEF1, SELL, FOXP3, and CTLA4; and CD4-C3 composed of exhausted CD4 T (Tex) cells that highly express PDCD1 and HAVCR2. CD8 cells are segregated into six subclusters: CD8-C1 with Central Tcm cells highly express CD8A, ANXA1, and GPR183; CD8-C2 containing tissue-resident memory CD8 T (Trm) cells that highly express CD8A, CD69, and ITGAE; CD8-C3 incorporating Resident Memory CD8 T (Trm) cells with effector functions that highly express CD8A, CD69, ITGAE, GZMA, and GZMB; CD8-C4 composed of Effector Memory CD8 T (Tem) cells that highly express PDCD1:CD8A, and PDCD1; CD8-C5 consisting of terminally differentiated effector memory or effector CD8 T (Temra) cells that highly express CD8A, KLRB1, and CD160; and CD8-C6 including Effector Memory CD8 T (Temra) cells that highly express MKi67:CD8A, and MIKI67 (Fig. 3C and D).
Within tumor tissues, there was a high enrichment of CD4 Treg, CD8-Tem-PDCD1, CD8-Temra, and CD4 + Tex cells (Fig. 3E and Table S3). We conducted DEG and gene set variation analysis (GSVA) for the CD4-C2-Treg, CD8-C4-Tem-PDCD1, and CD8-C5-Temra groups. The commonly affected signaling pathways across these groups included the BIOCARTA_MHC_PATHWAY (upregulated), BIOCARTA_GCR_PATHWAY (downregulated), and BIOCARTA_CARM1_PATHWAY (downregulated). The major histocompatibility complex (MHC) pathway manages antigenic peptide recognition and presentation, activates T cells, and initiates immune responses. The glucocorticoid receptor signaling (GCR) pathway is a nuclear receptor signaling pathway primarily involved in endocrine regulation, metabolism-related diseases, and endocrine tumor associations. Similarly, the BIOCARTA_CARM1_PATHWAY is a nuclear receptor signaling pathway mainly responsible for endocrine regulation, metabolic diseases, and endocrine tumor associations.
Both CD8-C5-Temra and CD8-C4-Tem-PDCD1 were enriched in inflammation, apoptosis, and cell proliferation-related pathways. CD4-C2-Treg was associated with T and B cell development, as well as cytoskeleton and cellular movement-related pathways. CD8-C5-Temra is linked with mRNA translation, Th2 cell cytokine gene expression, and macrophage formation. CD8-C4-Tem-PDCD1 is associated with the complement pathway, hypoxic stress, and cancer cells migration and invasion (Fig. 3F). Forty-three genes were identified in MHC, GCR, and CARM1 pathways. Furthermore, when merging the upregulated genes of CD4-C2-Treg, CD8-C5-Temra, and CD8-C4-Tem-PDCD1, nine intersected the GCR /MHC/CARM1 pathway. Among the genes downregulated in these groups, three intersected with the GCR /MHC/CARM1 pathway (Fig. 3G and Table S4).
Single-Cell Expression Profile Characterization for Myeloid Cells Across Disease Stage GroupsIn CSCC, myeloid cells are a pivotal component of the immune cells that penetrate tumors and regulate tumor inflammation and angiogenesis. Of the 9,190 myeloid cells analyzed, 4,744 were normal (51.62%), 2,706 were precancerous (29.45%), and 1,740 were malignant, accounting for 18.93% of the total. Four main lineages, namely mast cells, plasmacytoid dendritic cells (pDCs), conventional dendritic cells (cDCs), and monocytes or macrophages, were distinguished within the TME using standard cellular markers. cDCs and groups encompassing monocytes or macrophages can be further subdivided. Specifically, in the cDC classification, we identified two traditional cDC subtypes (CD1C cDC1s and CLEC9A cDC2s) and one mature cDC subtype (LAMP3 cDC). Further clustering of the monocytes/macrophages yielded a monocyte subgroup (Mono-CD14) CD16 (FCGR3A), and four macrophage groups. Conventional macrophages were differentiated based on surface marker expression of C1QC, INHBA, FN1, and SPP1 (Fig. 4A and B). To determine which of these subclusters was significant in the progression trajectory, we conducted a single-cell composition analysis of deformed cells (scCODA) across various disease stages. An increased proportion of FN1 + macrophage (as labeled in Fig. 4) was observed in different disease statuses from normal stage, precancerous stage, to tumor stage. While decreased proportion of Mast cells was observed in different disease statuses from tumor stage, precancerous stage, to normal stage (Fig. 4C and Table S5). Using GSVA, we enriched and compared divergent pathways. Mast cells were predominantly enriched in inflammatory response, angiogenesis, cellular activation, and proliferation-related pathways. In contrast, Mac-C3-FN1 was mainly enriched in chromosomal damage, cell cycle apoptosis, and nuclear receptor hormone-associated pathways (Fig. 4D and E).
Fig. 4Myeloid cell characteristics during cervical squamous cell carcinoma (CSCC) progression. (A) UMAP visualization displaying all myeloid cells. Different colors represent cell types. (B) Heatmap illustrating the expression levels of particular indicators within subclusters of myeloid cells. (C) Boxplots illustrating the proportions of myeloid cell sub-clusters for different disease stages, with significance determined using scCODA (P < 0.05). (D) Heatmap displaying the changes in the expression levels of Mac-C3-FN1 in the Pathways across various disease stages. (E) Heatmap displaying the changes in the expression levels of mast cells in the Pathways across various disease stages
Single-Cell Expression Profile Characterization for Fibroblast Cells Across Disease Stage GroupsFibroblasts can be broadly categorized into three main types: Normal fibroblasts (FB-CLU-matrix), characterized by the marker CLU, NEAT1, and BASP1, which maintains tissue structural integrity and participates in the wound repair process in normal tissues; and myofibroblasts that are further divided into two subtypes: MyoFB-C1-matrix which is identified by ATAC2, MYH11, TAGLN, PDGFRB, RGS5, and NOTCH3, and MyoFB-C2-activated by ATAC2, MYH11, TAGLN, THY1, PDGFRB, RGS5, and NOTCH3. These factors influence tumor cell migration and invasion. Cancer-associated fibroblasts (CAF) are divided into five subclusters: ACTA2 with markers ATAC2, MYH11, and TAGLN; OGN with markers CLU, NEAT1, BASP1, GADD45G, GAS1, and OGN; OGN/P16 with markers CLU, NEAT1, BASP1, GADD45G, GAS1, OGN, and P16; IGF1 with markers COL1A1, COL1A2, DCN, IGF1, APOD, ABCA8, and NRG1; and MKI67 with MKI67, F3 markers (Fig. 5A and B). The MKI67 subcluster proportions increased substantially with a concurrent decline in the OGN and OGN/P16 subclusters (Fig. 5C and Table S6). Subsequent DEG and KEGG pathway analyses indicated that alterations in the MKI67 sub-cluster were associated with cell cycle regulation, cell adhesion, immune responses, hormonal regulation, and HPV infection (Fig. 5D and E, and Fig. 5F).
Fig. 5Fibroblast cell characteristics during cervical squamous cell carcinoma (CSCC) progression. (A) UMAP visualization of fibroblast cells with various colors indicating different cell types. (B) Heatmap presenting the cell type marker expression levels in the fibroblast subclusters. (C) Boxplots delineating the proportions of fibroblast subclusters among different disease stage groups, significance determined by scCODA (P < 0.05). (D) Venn diagram illustrating the specific and shared differentially expressed genes (DEGs) of the CAF-C5-MKi67 subcluster across the normal, precancerous lesion, and tumor stages. (E) Heatmap displaying the differential expression levels of CAF-C5-MKi67 in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database across the three disease stages. (F) Cell cycle pathway depicted in the KEGG pathway maps
Epithelial Cell Characteristics During CSCC ProgressionWith epithelial cell clustering, we identified cell subtypes, including ciliated epithelial, neuroendocrine, and goblet cells. Epithelial cells(Epi) were subsequently subdivided into six distinct sub-clusters (Fig. 6A and B). In contrast to normal tissues, we observed diminished Epi1 and Epi2 distribution in tumor tissues, whereas those of Epi3, Epi4, Epi5, and Epi6 increased (Fig. 6C and D, Table S7, and Fig.S2, S3). These data suggest that during the differentiation of normal cells into cancerous cells, there was a marked shift in the epithelial cell expression patterns. We used Monocle 2 for the pseudotemporal analysis to delve deeper into these alterations. The findings positioned Epi1 and Epi2 at the starting point of the trajectory, with Epi3, Epi4, Epi5, and Epi6 located towards the end (Fig. 6E and F). This may reflect the progression of development from normal to malignant cells. Further differential gene expression testing using Monocle 2 revealed changes in the expression of 5,648 genes throughout this process. These genes clustered into five groups. Notably, clusters c1 and c2 included genes that consistently showed increased expression over the pseudotemporal trajectory, whereas cluster c4 included genes that consistently exhibited decreased expression. These continually shifting genes may play significant roles in the carcinogenesis process (Fig. 6G, Fig. S4, and Table S8).
Fig. 6Epithelial cell characteristics during cervical squamous cell carcinoma (CSCC) progression. (A) Epithelial cell visualization using UMAP, with colors indicating distinct cell types. (B) Heatmap depicting specific marker expression levels in the epithelial subclusters. (C) UMAP visualization of epithelial sub-clusters categorized by group (upper panel). Inferred copy number variation (CNV) levels of epithelial cells, using B, mast, Mye, natural killer T(NKT), fibroblast, and end cells as references (lower panel). (D) Boxplots showing the proportions of epithelial sub-clusters across different disease stage groups, with significance determined using scCODA (P < 0.05). (E) Trajectory analysis of epithelial cells. Epithelial cell differentiation trajectories inferred using monocle v.2, with cells color-coded by identified clusters to illustrate the progression through various differentiation states over pseudotime. (F) Differential gene expression heatmap. This heatmap displays the top 100 differentially expressed genes (DEGs) along the inferred pseudotime trajectory, highlighting genes with significant expression changes that may contribute to cell state transitions. (G) Gene regulation in identified clusters. Expression regulation patterns within clusters: Clusters c1 and c2 are characterized by upregulated gene expression, indicating active cellular processes or differentiation states, whereas cluster c4 shows a downregulation pattern, which may reflect a distinct cellular function or quiescent state
Sc-RNA Sequencing Trajectory and ATAC-seq Peak Analysis Reveals Dynamic Chromatin AccessibilityATAC-seq analysis provides insights into open chromatin regions, highlighting the active genomic regions in cells. To elucidate the relationship between DEGs over pseudotime (DE_pseudotime) and the chromatin landscape, we integrated the data from DE_pseudotime and ATAC-seq peaks. The distances from the peaks to the nearest transcription start site (TSS) were found to range between − 2,000 and 1, 137 DE_pseudotime upregulated genes showed changes in the ATAC peaks, suggesting a direct correlation between gene upregulation and chromatin accessibility. Conversely, of the DE_pseudotime-downregulated genes, 57 exhibited alterations in the ATAC peaks. (Fig. 7A). To understand the biological importance of these alterations, KEGG pathway analysis was conducted on the shared genes. The upregulated genes were primarily involved in ECM-receptor interaction, cell cycle, glycolysis/gluconeogenesis, arginine and proline metabolism, and focal adhesion. In contrast, the downregulated genes were mainly centered around pathways such as thermogenesis, oxidative phosphorylation, neurotrophin signaling, and cell adhesion molecules (Fig. 7C). We extended our analysis to explore the potential transcriptional regulators of these pseudotime DEGs. By comparing the ATAC-seq peaks to known transcription factors (TFs) and upregulated genes, we identified 95 intersecting genes (Fig. 7B and Table S9). They focus on chromatin regulation/acetylation, Notch signaling pathway, and MAPK Signaling Pathway. A protein-protein interaction (PPI) of 95 genes resulted in an interaction network that regulated the upregulated gene pathways (Fig. 7D and Table S10).
Fig. 7ATAC sequencing characteristics. (A) Venn diagram illustrating the distinct and overlapping genes among the three categories: DE_pseudotime upregulated, DE_pseudotime downregulated, and ATAC-seq peaks. (B) KEGG pathways enriched among shared genes between DE_pseudotime upregulated and ATAC-seq peaks (upper panel) and DE_pseudotime downregulated and ATAC-seq peaks (lower panel). (C) Venn diagram showing specific and overlapping genes in three categories: DE_pseudotime upregulated, transcription factors (TF), and ATAC-seq peaks. (D) Protein-protein interaction (PPI) network associated with 95 intersecting genes
Comprehensive Validation of Cell Clusters and Target GenesThe tumor and normal adjacent tissue cell types from the new data (HRA001742) were also annotated accordingly (Fig. 8A). We used a logistic regression model with elastic net regularization to assess the similarity between the clusters in the original datasets (GSE168652, E-MTAB-11948, PRJCA008573, and E-MTAB-12305) and the new datasets (HRA001742). It was found that the degree of similarity of global cell types across datasets is very high (Fig. 8B). The number of cells in the new dataset is significantly lower than in the original dataset, making it impossible to annotate all cell types present in the original dataset. However, a large proportion of cells are still identifiable when delving into sub-clusters of T/NK cells, myeloid cells, and fibroblast cells. For example, we identified regulatory CD4 T (Treg) cells and memory CD8 T (Temra) cells that highly express MKi67 in the new dataset, both of which showed a clear increase in tumor tissues. The new data also revealed a higher enrichment of mast cells in tumor tissues, and nearly all fibroblast subtypes were identified, except for OGN + CAF (Fig. S5). Attention should be paid to the epi clusters. Epi1 and Epi2 represent the normal epithelium, whereas Epi7 is a unique cancer epithelium found only in the new data. Epi6 is a cancer epithelium exclusive of the original data, demonstrating strong cancer epithelium specificity (Fig. 8C). This indirectly emphasized the pronounced specificity of epithelial cells, which can be influenced by patient-specific characteristics. Our pseudotemporal analysis also confirmed this, indicating that Epi1 and Epi2 were in the early stages, whereas Epi5 and Epi6 were relatively later (Fig. 8D). This further emphasizes the advantages of multi-data integration, which enables smaller subgroup identification that may be overlooked in single or smaller datasets, leading to detailed discoveries.
Fig. 8Validation of our findings by other datasets and methods. (A) Visualization of all cell clusters was performed using the UMAP for new datasets (HRA001742). Cell types are represented by different colors. (B) Similarities analysis of all identified cell clusters from original and new data. (C) UMAP representation of epithelial cells in the new data set (left panel). Similarities between epithelial cells from the original data and new data (middle panel). Boxplots showing the epithelial sub-cluster proportions of disease stage groups in the new data; significance was determined by scCODA (P < 0.05)(right panel). (D) The developmental trajectory of epithelial cells in the new data. (E) Representative example of a CSCC tumor stained by IHC with Immunohistochemical Analysis of KRT8, KRT15, as the cell marker Epi6, and Transcription Factor YY1. All images shown were captured at the same magnification (400x); the scale bars represent 500 microns. (F) Representative examples of the gene expression, such as cell markers KRT1 and KRT10 in the Epi1 subset, cell markers KRT8 and KRT15 in the Epi6 subset, and transcription factor YY1 of Epi1. Heatmap of logFC showing the expression as determined by our scRNA-seq (top), bulk RNA-seq (middle), and RT-qPCR experiments (bottom) across normal, precancerous, and CESC samples. All displayed genes showed significant differences (FDR < 0.05)
On the other hand, a biological experiment was also performed to verify the finding. IHC assay confirmed at the protein level that the relative expression of KRT8 and KRT15 (Epi6 cell markers) increased progressively with the disease stage. Such staining also indicated the increased proportion of Epi6 cells during the development of cervical cancer, which could be the future target. The expression of the transcription factor YY1 was elevated in both the precancerous and tumor groups compared to normal tissues (Fig. 8E). Additionally, the consistency of epithelial cell subpopulation markers was revealed with single-cell sequencing, bulk sequencing, and RT-qPCR detection (Fig. 8F and Table S11). All of these results demonstrated the robustness of our results.
留言 (0)