Integrative analysis with machine learning identifies diagnostic and prognostic signatures in neuroblastoma based on differentially DNA methylated enhancers between INSS stage 4 and 4S neuroblastoma

Enhancer-associated regulatory network with ELMER analysis

Methylation and transcriptomic data of the same patient, particularly in 17 INSS stage 4 NB patients (age < 1.5 years) and 20 INSS stage 4S NB patients from the GSE73518 cohort, were incorporated in ELMER analysis. We plotted a heatmap to visualize the variations of methylation and transcriptomic levels between stage 4 and 4S NB (Fig. 2A). Distal probes were used to detect the region of enhancers. According to the hg38 reference genome, we selected 135,427 distal probes by get.feature.probe function (Supplementary Table S3). We utilized the GetNearGenes function to pinpoint the top ten genes nearest to the upstream and downstream of the distal probes respectively, thus creating probe-gene pairs (Fig. 2B). Subsequently, an unsupervised approach was utilized to detect distal probes by get.diff.meth function. Each probe’s methylation levels were ranked in both stage 4 and 4S groups, and the samples in the lower or higher quintiles (20%) of methylation were analyzed to determine whether the probes were hypo- or hypermethylated in stage 4, leading to the identification of 57 distal probes (Supplementary Table S4) with an FDR < 0.01 and a |Δβ| < 0.3 (Fig. 2C). We then examined the inverse correlation between each probe's methylation level and its associated gene's expression. Samples were classified into methylated (M) and unmethylated (U) groups based on the top and bottom 20% methylation levels of the probes, and gene expression levels between these groups were compared using the Mann–Whitney U test. Unsupervised mode was utilized to screen 956 probe-gene pairs with statistically significant negative correlations with default parameters through get.pair function (Supplementary Table S5). Then, the 250 bp base sequence upstream and downstream of these probes were extracted and aligned to 37 significantly enriched motifs (Supplementary Table S6) by get.enriched.motif function. Distal probes associated with the same motif were then classified into the top 20% M and bottom 20% U groups based on methylation levels. Motif FOXK1_HUMAN.H11MO.0.A. with differential expressions between two stages and negative correlations with its methylation levels was obtained through get.TFs function via the unsupervised mode (FDR < 0.05) (Fig. 2D–F, Supplementary Table S7) (Lambert et al. 2018). To improve the robustness of the identified regulatory networks, we also employed a supervised mode for probe selection and motif identification, which revealed that motif FOXK1_HUMAN.H11MO.0.A. enriched differentially between two stages, with significance (FDR < 0.05) (Supplementary Table S8). Ranking the TFs in motif FOXK1_HUMAN.H11MO.0.A. discovered that EPAS1 and L3MBTL4 may play a vital role in methylated regulation (Fig. 2G). Besides, most of the genes in motif FOXK1_HUMAN.H11MO.0.A. expressed significantly higher in stage 4S than in stage 4 (Fig. 2H). GO and KEGG enrichment analysis showed that motif genes may affect embryonic organ development, muscle tissue development, morphogenesis and signaling pathways regulating pluripotency of stem cells (Fig. 2I, J). The enhancer-associated regulatory network is visualized via Cytoscape software (Fig. 2K).

Fig. 2figure 2

Enhancer-associated regulatory network was constructed with ELMER. A The heatmap visualized the differences in methylation and transcriptomic levels between INSS 4 and 4S stages in the GSE73518 cohort. Color gradient: gene expression levels (blue to red, blue for low, red for high) and corresponding methylation levels (steelblue to yellow, steelblue for low, yellow for high). B Example of top ten genes closest to the upstream and downstream of the differentially methylated distal probes. C Volcano plot of probes that were hypermethylated and hypomethylated in INSS 4 stage NB patients. D Example of correlation plot between the TF expression level and DNA methylation level of one of its probes. E Example of correlation plot between the TF expression level and its corresponding average DNA methylation level. F Odds ratios of the significantly enriched motifs identified by the get.enriched.motif function. G TF sorting plot of motif FOXK1_HUMAN.H11MO.0.A. H Profiles of gene expressions of motif FOXK1_HUMAN.H11MO.0.A. in stage 4S and stage 4. Color gradient: gene expression levels (blue to red, blue for low, red for high). IJ Function enrichment performed by GO and KEGG terms focused on motif FOXK1_HUMAN.H11MO.0.A. K TF regulatory network of motif FOXK1_HUMAN.H11MO.0.A visualized with Cytoscape software

Establishment and validation of ISRS with diagnostic and prognostic value

As mentioned above, five NB cohorts with transcriptome data were utilized in model development and verification, which showed a superior effectiveness of batch effect removal via “sva” R package (Supplementary Figure S1A–B). Incorporating 67 genes in motif FOXK1_HUMAN.H11MO.0.A. based on ELMER analysis, 101 prognostic algorithm combinations and 113 prediction algorithm combinations were then constructed via the LOOCV framework. The C-index of each ML combination was calculated in all validation datasets (Supplementary Table S9). The best diagnostic model was established based on RF algorithm which was utilized in both feature selection and model construction, with the highest average C-index (0.812) across five datasets (Fig. 3A). AUC, PRAUC, accuracy, sensitivity, specificity, precision, cross-entropy and Brier scores were utilized to reveal that RF was the most powerful model in diagnosing INSS 4 NB (Supplementary Figure S1C–D). Finally, a 9-gene diagnostic INSS stage-related signature (ISRS) was accordingly established to diagnose stage 4 NB, with CAMTA2 being the most important variable (Supplementary Figure S1E–F). Confusion matrix in E-MTAB 8248 cohort and other three validation cohorts showed a well accuracy of ISRS (Fig. 3B, Supplementary Figure S2A). ROC curves of five cohorts showed a well discrimination of ISRS (Fig. 3C). We also compared the AUC of ISRS, other clinical variables and the logistic regression nomogram model including several clinical variables and ISRS, which demonstrated that ISRS and the nomogram model performed better (Fig. 3D, Supplementary Figure S2B–C). Calibration curves showed a great alignment between ISRS predicted probability and the observed probability of stage 4 NB (Fig. 3E). DCA curves indicated that utilization of ISRS and the logistic regression nomogram model is more clinically beneficial, compared with other clinical variables (Fig. 3F, Supplementary Figure S2D). The feature importance visualization of 9 variables selected by RF demonstrated that CAMTA2 impacted most in the RF model (Fig. 3G). The best prognostic model was established based on the Enet (alpha = 0.6) algorithm which was utilized in both feature selection and model construction, with the highest average C-index (0.732) across five datasets (Fig. 3H). Logarithmic loss, recall and decision calibration were calculated to prove the well calibration and precision of the Enet (alpha = 0.6) model (Supplementary Figure S1G). Ultimately, a 20-gene prognostic ISRS was accordingly established to forecast the prognosis of NB patients (Supplementary Figure S1H–I). The feature importance visualization of 20 variables selected by Enet demonstrated that CAMTA1 impacted most in the Enet model (Supplementary Figure S1J). In GSE49710, E-MTAB 8248, TARGET and E-MTAB 179 cohorts, the low-risk group owned a relatively longer overall survival (OS) and event-free survival (EFS) than the high-risk group (Fig. 3I, Supplementary Figure S3A). In GSE85047 cohort, the low-risk group owned a relatively longer overall survival (OS) and progression-free survival (PFS) than the high-risk group (Supplementary Figure S3A). ROC curves of 1-, 3- and 5-year OS showed well specificity of ISRS (Fig. 3J, Supplementary Figure S3B). AUC values of 3-year OS proved that ISRS and cox regression model involving ISRS and several clinical variables were more specific and discriminative to forecast the prognosis of NB patients, compared to other clinical variables (Fig. 3K, Supplementary Figure S3C–D). Time dependent ROC curves indicated that ISRS and cox regression model outperformed conventional clinical variables in capability of discrimination (Fig. 3L, Supplementary Figure S3E). Calibration curves (Fig. 3M, Supplementary Figure S3F) and DCA curves (Fig. 3N, Supplementary Figure S3G) showed that ISRS is well-behaved in accuracy and clinical benefit. Multivariate Cox regression analysis in the Cox proportional hazards model indicated that ISRS, age, sex, INSS stage and COG risk stratification were independent prognostic factors in NB patients (P < 0.05) (Fig. 3O, Supplementary Figure S3H). All these metrics collectively indicated that ISRS demonstrated stability and robustness in model performances across five NB queues. The flowchart of machine learning algorithm integration was provided in Supplementary Figure S1K.

Fig. 3figure 3

Construction and validation of a diagnostic and a prognostic signature based on 67 genes selected by ELMER analysis. A A total of 113 kinds of diagnostic models via a leave-one-out cross-validation framework and further calculated the C-index of each model. B Confusion matrix of the diagnostic ISRS in the validation cohort E-MTAB 8248. C ROC curves of the diagnostic ISRS in five cohorts (GSE49710, E-MTAB 8248, TARGET, GSE85047 and E-MTAB 179). D ROC curves of the diagnostic ISRS, the logistic regression model (nomogram) and several clinical variables in the validation cohort E-MTAB 8248. E Calibration curves of the diagnostic ISRS in five cohorts (GSE49710, E-MTAB 8248, TARGET, GSE85047 and E-MTAB 179). F DCA curves of the diagnostic ISRS, the logistic regression model (nomogram) and several clinical variables in the validation cohort E-MTAB 8248. G The feature importance visualization of 9 variables selected by RF, which formed the final diagnostic model. H A total of 101 kinds of prognostic models via a leave-one-out cross-validation framework and further calculated the C-index of each model. I Kaplan–Meier survival curves of OS for high-risk and low-risk groups of NB patients in the GSE49710 cohort. J ROC curves of 1-, 3- and 5-year OS of the prognostic ISRS signature in the GSE49710 cohort. K AUC values of 3-year OS of the prognostic ISRS signature, the cox regression model (nomogram) and several clinical variables in the GSE49710 cohort. L Time-dependent ROC curves of the prognostic ISRS signature, the cox regression model (nomogram) and several clinical variables in the GSE49710 cohort. M 1-, 3- and 5-year calibration curves of the prognostic signature in the GSE49710 cohort. N DCA curves of the prognostic ISRS signature, the cox regression model and several clinical variables in the GSE49710 cohort. O Forest plot visualized the outcome of multivariate Cox regression analysis involving the prognostic ISRS and several clinical variables. Risk: risk scores calculated by prognostic ISRS

Functional enrichment analysis and landscape of model genes

PCA analysis displayed significant variations between two risk groups divided by prognostic ISRS (Fig. 4A). Based on 7325 DEGs identified between two risk groups via “limma” R package, we performed functional enrichment analysis according to GO and KEGG terms, which discovered that DEGs were significantly enriched in dopamine secretion, kinetochore organization, serotonin transport and outer kinetochore in GO terms, and were significantly enriched in Neuroactive ligand-receptor interaction, Cell cycle, GABAergic synapse and Calcium signaling pathway in KEGG terms (Fig. 4B, C). GSEA analysis demonstrated that carbon metabolism and cell cycle were enriched in a high-risk group, and adrenergic signaling in cardiomyocytes, aldosterone synthesis and secretion and cell adhesion molecules were suppressed in a high-risk group (Fig. 4D, E). To understand the biological behavioral variations between two risk groups, we conducted GSVA enrichment analysis based on “h.all.v7.4.symbols.gmt” hallmark gene sets in MSigDB database, which indicated that high-risk group was enriched in myc_targets_v2 and DNA_repair, and suppressed in hedgehog_signaling and apical_surface (Fig. 4F). We then visualized the differential expression of ISRS model genes and the variations of clinical variables between the two risk groups (Fig. 4G). ZEB2, CAMTA1, BAZ2B, CAMTA2 and HOXC9, which constituted both the diagnostic and prognostic ISRS, were significantly higher expressed in a low-risk group than in a high-risk group, and significantly higher expressed in stage 4S than in stage 4, pretending to be protectively prognostic genes (Supplementary Figure S4A-B). Detailed information on types of model genes was provided in Supplementary Table S7. Spearman correlation analysis revealed close relationships (correlation P value < 0.0001) among the 24 ISRS model genes (Fig. 4H). Based on CNV data sourced from the cbioportal website, “bubble plots” visualized that CREB5 attained the most somatic CNV frequencies among diagnostic model genes, and CAMTA1 attained the most somatic CNV frequencies among prognostic model genes (Fig. 4I, J). Besides, “circle plots” visualized the chromosome locations where the mutations of ISRS model genes occurred (Fig. 4K).

Fig. 4figure 4

Functional enrichment analysis and landscape of model genes. A PCA analysis plot of high-risk group and low-risk group. B, C GO and KEGG enrichment analyses of DEGs among two risk groups. DF GSEA and GSVA analyses of DEGs among two risk groups. G Differences in the expression of model genes and differences in the clinical variables of NB patients among the two risk groups. H Molecular interaction network plot visualized the correlation among expressions of model genes and their prognostic prediction value. Significantly positive and negative correlations are shown as red and blue lines, respectively. The color and size of the nodes indicate the type of model genes and P values from Cox regression. I, J The CNV mutation frequency of the diagnostic model genes and the prognostic model genes. D Chromosome position and alteration of all model genes

Tumor immune microenvironment analysis

To appraise the discriminative capability of the prognostic ISRS in immune infiltration, we simultaneously calculated the abundance of immune cell infiltration based on eight distinct immune algorithms. “ComplexHeatmap” R package visualized that various immune cells were significantly fewer infiltrated in a high-risk group than in the low-risk group (Fig. 5A). Besides, the spearman correlation heatmap displayed the correlation between immune cell infiltrations and the risk scores calculated by the prognostic ISRS, as well as the relationship between immune cell infiltrations and the expression of ISRS model genes (Fig. 5B). Besides, ssGSEA analysis of immune function scores demonstrated that the low-risk group owned significantly better immune infiltration abundance (Fig. 5C). Moreover, the activation of six key steps in the cancer immunity cycle appeared to be significantly higher in the low-risk group (Fig. 5D). For immune checkpoints, the low-risk group displayed elevated expression levels of immune checkpoint genes, indicating a susceptibility to immunotherapy (Fig. 5E). Previous literatures have stressed the vital role of cancer-associated fibroblasts (CAFs) in immune modulation of the tumor microenvironment (Boyle et al. 2020; Sahai et al. 2020; Gagliano et al. 2020). We then utilized “circle plot” to visualized the correlation of CAFs among various types of immune cells calculated by the CIBERSORT algorithm in NB patients (Fig. 5F). Additionally, it is well-known that heterogeneous metabolic preferences and dependencies exist across tumor types (Hakimi et al. 2016; Hensley et al. 2016; Kim and DeBerardinis 2019), hence we extracted multiple metabolic pathways from the KEGG database to validate the relationship between risk scores and tumor metabolism in NB patients (Fig. 5G). Two hub model genes (FOXD1 and CAMTA2) also showed significant correlations with cell metabolic pathways.

Fig. 5figure 5

Analysis of the TME in different risk groups. A Differences in immune infiltration status between the two risk groups were evaluated by eight immune algorithms. B Heatmap visualized the correlation between different immune cells and risk scores and the relationship between different immune cells and expressions of model genes. C The differences in immune function scores were calculated by ssGSEA analysis between the two risk groups. D The differences in cancer immunity cycle scores based on ssGSEA analysis between two risk groups. E The differences in expressions of immune checkpoint-related genes between two risk groups. F Network showed the correlation among CAFs and CIBERSORT-derived immune cells. Significantly positive and negative correlations are shown as red and blue lines, respectively. The color and size of the nodes indicate the type of immune cells and P values from Cox regression. G The correlations between the signature risk scores, expressions of FOXD1 and CAMTA2, and metabolic-related pathways based on GSVA analysis of KEGG terms were displayed in the butterfly plot

Comparison of the tumor mutation burdens

We visualized and compared the distribution variations of somatic mutations between the low-risk group (Fig. 6A) and the high-risk group (Fig. 6B) based on mutation data sourced from the cbioportal website. Comparisons of tumor mutation burden (TMB) between the two risk groups revealed no significant difference (Figs. 6C). However, TMB was found to be significantly related to the risk scores obtained by the prognostic ISRS based on spearman correlation analysis (Fig. 6D). Moreover, we classified NB patients with mutation data into the high TMB and the low TMB group according to their median TMB score. After integrating two risk groups and two TMB groups, we found that patients with low TMB in the high-risk group owned the worst OS and EFS, and patients with high TMB in the low-risk group owned the best OS and EFS, without significance (Fig. 6E, F).

Fig. 6figure 6

Landscape of somatic mutation, CNVs, immunotherapy and chemotherapy between high-risk and low-risk groups. A, B Visual summary displayed common genetic alterations in the high-risk and low-risk groups. C Tumor mutation burdens between high-risk and low-risk groups. D Spearman correlation between risk scores calculated by ISRS and TMB scores in NB patients. E, F Comprehensive survival analysis on OS and EFS based on two risk groups and two TMB groups. G Violin diagram illustrated the variance in TIDE scores between high-risk and low-risk groups in the GSE49710 cohort. H Kaplan–Meier survival analysis delineated the OS rates for patients categorized into high-risk and low-risk groups in the IMvigor cohort. I The TIDE algorithm predicted response to immunotherapy between high-risk and low-risk groups in the E-MTAB 8248 cohort. J Comprehensive submap analysis predicted response to immunotherapy between high-risk and low-risk groups in the E-MTAB 8248 cohort. K Box diagram depicted the disparity in ISRS among patients exhibiting immunotherapy responses in the IMvigor210, GSE78220, GSE135222, and GSE91061 cohorts. L Correlation study and differential drug response analysis of CTRP-derived pharmaceuticals and PRISM-derived pharmaceuticals to explore potential drugs for high-risk NB patients

Response of immunotherapy and chemotherapy

Based on TIDE scores and submap algorithm, the possibility of immunotherapy responses was appraised in two risk groups. In the GSE49710 cohort, higher TIDE scores, higher TIDE dysfunction and exclusion scores were discovered in the high-risk group, indicating a bigger probability of immune escape during the immunotherapy process (Fig. 6G). In the IMvigor210 immunotherapy cohort, survival analysis demonstrated that low-risk patients with a response to immunotherapy owned significantly longest OS, whereas high-risk patients with no response to immunotherapy had significantly worst OS (Fig. 6H). Meanwhile, in the E-MTAB8248 cohort, a significantly higher proportion of patients responding to immunotherapy was observed in the low-risk group (Fig. 6I). Besides, the submap analysis results demonstrated that low-risk patients were more sensitive to CTLA4 inhibitors, rather than PD1 inhibitors (Fig. 6J). Despite we assessed individual immunotherapy reaction based on two methods in two NB cohorts, it's crucial to straightly compare treatment effectiveness in immunotherapy-treated cohorts between two risk groups, which called a need to involve four immunotherapy-treated cohorts in subsequent analysis. Ultimately, we found that patients who responded more sensitively to immunotherapy owned significantly lower risk scores across four immunotherapy-treated cohorts (Figs. 6K). Accordingly, to explore potential drugs that might be more effective in high-risk NB patients, we predicted drug response based on drug sensitivity data from CTRP and PRISM. Through the cross-correlation in the two pharmacogenomics databases, we successfully predicted six potential drugs or compounds (BI-2536, daporinad, GSK461364, SB-743921, rubitecan and talazoparib) with therapeutic effects in high-risk NB patients (Fig. 6L).

Identification of ISRS model genes-related subtypes

To further understand the expression pattern of ISRS model genes, 871 patients from GSE49710, E-MTAB 8248 and TARGET cohorts were included for consensus clustering analysis. Respectively, we utilized ISRS model genes to employ unsupervised clustering analysis on each cohort, which demonstrated that k = 2 exhibited the best discrimination (Supplementary Figure S5A–B). Besides, PCA and tSNE analysis showed significant variations between the two clusters, according to the expressions of ISRS model genes (Fig. 7A, Supplementary Figure S5C). Meanwhile, we performed survival analysis to demonstrate that cluster 1 owned significantly worse OS and EFS than cluster 2 (Fig. 7B, C). Moreover, the expressions of ISRS model genes and the clinicopathological characteristics in different clusters displayed significant differences (Fig. 7D). Based on 1868 DEGs identified between two clusters via “limma” R package, we performed functional enrichment analysis, which discovered that DEGs were significantly enriched in neuropeptide signaling pathway, neuropeptide hormone activity, ligand-gated ion channel activity, sodium:chloride symporter activity and GABA receptor activity in GO terms, and were significantly enriched in Neuroactive ligand-receptor interaction, ECM-receptor interaction, GABAergic synapse, Synaptic vesicle cycle and Neomycin, kanamycin and gentamicin biosynthesis in KEGG terms (Fig. 7E, F). GSEA analysis demonstrated that cytokine-cytokine receptor interaction and staphylococcus aureus infection were enriched in cluster 1, and GABAergic synapse and neuroactive ligand-receptor interaction were suppressed in cluster 1 (Supplementary Figure S5D). Moreover, GSVA enrichment analysis, based on “h.all.v7.4.symbols.gmt” hallmark gene sets in the MSigDB database, indicated that cluster 1 was enriched in myc_targets_v2, e2f_targets and g2m_checkpoint, and suppressed in hedgehog_signaling and apical_junction (Supplementary Figure S5E). To further understand variations in tumor immune microenvironment between two clusters, eight immune algorithms were utilized to assess the immune abundance differences between two clusters, and visualized the Cox P value of each immune cell type (Fig. 7G). In terms of mutation landscape, we compared the distribution variations of somatic mutations between cluster 1 (Supplementary Figure S5F) and cluster 2 (Supplementary Figure S5G) based on mutation data sourced from the cbioportal website. We observed that TMB scores are not significantly different between the two clusters (Supplementary Figure S5H). Meanwhile, we classified NB patients with mutation data into the high TMB and the low TMB group according to the median TMB score. After integrating two clusters and two TMB groups, we found that patients with low TMB from cluster 1 had worse OS and EFS than patients with high TMB from cluster 2, without significance (Supplementary Figure S5I–J).

Fig. 7figure 7

Construction of ISRS model genes-related clusters. A PCA analysis of two clusters. B, C Kaplan–Meier survival analysis of OS and EFS between two clusters. D ComplexHeatmap of the distribution of ISRS model genes and clinical variables in the two clusters. E, F GO and KEGG enrichment analysis indicated significant enrichment of pathways in cluster 1. G Differences in the proportion of various kinds of immune cells calculated by eight immune algorithms in the two clusters

Model comparisons and landscapes of risk groups and clusters

For the sake of verifying the prognostic effectiveness of ISRS, we collected coefficients of model genes in 39 previously published NB prognostic models. Subsequently, we performed a comparative analysis of the C-index of each prognostic model, which was developed based on a variety of biological characteristics, involving necroptosis, ferroptosis, cuproptosis, disulfidptosis, ganglioside, and m6A methylation. Ultimately, we discovered that ISRS demonstrated superior predictive performances relative to the vast majority of models across five NB datasets (Fig. 8A), which qualified ISRS as an invaluable NB prognostic model. Moreover, the overall relationship among different risk groups, different clusters and patients’ clinical information was visualized by “sankey plot” (Fig. 8B). In terms of immune subtypes of NB patients, we observed significant variations in the proportion of immune subtypes between two risk groups and two clusters in the GSE49710 cohort, with much more wound healing (C1) subtype in high-risk group and in cluster 1 (Fig. 8C). Comparing clinical variables between two risk-groups, we suggested that lower risk scores of ISRS were related to better prognoses in NB patients (Fig. 8D).

Fig. 8figure 8

Model comparisons and landscape of two risk groups and two clusters. A C-index comparison analysis between the ISRS and 39 published signatures in GSE49710, E-MTAB 8248, TARGET, GSE85047, E-MTAB 179 and meta-cohort. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. B Sankey diagram of distributions in two clusters and two risk groups with different clinical variables and survival outcomes. C Differences in the proportion of five immune subtypes between two clusters and two risk groups. D Circular pie chart visualized the proportion difference of clinical indices and immune subtypes between two risk groups

Single-cell sequencing analysis and pseudotime analysis

To explore the biological function of ISRS in a single cell level, we utilized 4 NB single-cell datasets (GSE137804, GSE192906, GSE140819 and CellAtlas) to validate our results. In GSE137804, we utilized harmony integration to remove batch effects, which showed a well correction before and after integration (Fig. 9A). Subsequently, we partitioned all cells in GSE137804 into 25 clusters based on the k-nearest neighbor (KNN) clustering method (Fig. 9B). According to cell markers collected from the literature review, we successfully identified 8 distinct cellular subtypes, involving neuroendocrine cells (NE cells), T cells, B cells, endothelial cells, myeloid cells, Schwann cells, fibroblasts, and plasmacytoid DC cells (Fig. 9C). Violin plot visualized the expressions of canonical markers in 8 clusters (Fig. 9D). Subsequently, the ISRS scoring for each cell was calculated based on six signature enrichment scoring algorithms, which demonstrated that cells with higher ISRS scoring predominately located in NE cells (Fig. 9E, F, Supplementary Figure S6A–B). Meanwhile, we explored the single-cell transcriptome localization of ISRS model genes, which were found to expressed mostly in NE cells (Fig. 9G). Accordingly, the expression patterns of ISRS model genes were validated in three NB single-cell external datasets (GSE192906, GSE140819 and CellAtlas) (Supplementary Figure S6C). Meanwhile, six scoring algorithms were utilized in the three NB single-cell external datasets (GSE192906, GSE140819 and CellAtlas) to demonstrate that ISRS model genes were enriched in NE cells (Supplementary Figure S6D–I). Additionally, the pseudotime trajectory analysis revealed the temporal sequence of malignant cellular differentiation and demonstrated that NE cells with higher ISRS scores appeared at an earlier pseudotime than NE cells with lower ISRS scores, indicating that immature NE cells were more predominant in high-risk NB patients (Fig. 9H). The differentiation trajectory of NE cells, endothelial cells, fibroblasts and Schwann cells were plotted via Monocle 2 algorithm, which suggested that endothelial cells owned the potential to differentiate into other types of cells (Fig. 9I). Hence, we set endothelial cells as the beginning of differentiation to investigate pseudotime trajectory of all cells in Monocle 3 analysis, which validated that immature NE cells could own higher ISRS scores and serve as malignant cells (Fig. 9J). Besides, pseudotime analysis of interested genes identified that pseudotime DEGs were increased along the pseudotime curve, while the ISRS model genes stayed still (Fig. 9K).

Fig. 9figure 9

Exploration of ISRS and model genes in GSE137804 scRNA-seq data. A UMAP plot before and after Harmony integration, each color stands for a NB tumor sample. B, C UMAP plot of the distribution of 24 clusters and 7 cell types after manual annotation. D Violin plot of maker genes for annotation of each cell type. E, F Single-cell scoring results of ISRS model genes based on the singscore algorithm in each cell type. G FeaturePlot of ISRS model genes (CAMTA1, CAMTA2, ZEB2, BAZ2B and HOXC9) in UMAP plot. HI Pseudotime trajectory analysis in NB cells via Monocle 2 algorithm (Cells are colored according to pseudotime, ISRS groups and cell types). JK Pseudotime trajectory analysis in NB cells and pseudotime DEGs via Monocle 3 algorithm

Validation of ISRS by inferCNV, cell communication and SCENIC analysis

To comprehensively verify the efficacy of ISRS in single-cell RNA-sequencing data, we utilized inferCNV algorithm to investigate the clonal structure of NE cells, endothelial cells, fibroblasts and Schwann cells. The infer CNV analysis revealed that NEs with chromosome 17q gain could serve as malignant cells, while cells with more CNVs owned higher ISRS scores in 4 NB single-cell datasets, validating the capability of risk stratification of ISRS (Fig. 10A, B, Supplementary Figure S7A–C). In cell communication analysis, we plotted circle diagrams to demonstrate the interaction times and interaction strengths among each cell type, visualizing the integrated cell communication networks in high ISRS cells and low ISRS cells. We compared the cell communication patterns between two ISRS groups, which revealed that B cells, myeloid cells, Schwann cells and fibroblasts might contribute most to differences in cell communication networks (Fig. 10C). Hence, we investigated over-expressed ligand-receptor pairs and their interactions among B cells, myeloid cells, Schwann cells and fibroblasts in the high ISRS group, identifying differential cell interactions between two ISRS groups (Fig. 10D). Different cell types would emit different contributive signals on the overall, incoming and outgoing signals between two ISRS groups, especially for Schwann cells, fibroblasts and endothelial cells (Fig. 10E, F, Supplementary Figure S7D). Besides we further investigated the relation between regulon (TFs and target genes) activity and each cell type in different ISRS group based on SCENIC analysis, which indicated that regulons of SOX4 and SMAD5 scored high activity in high ISRS cells, and regulons of KLF7 and SMARCA4 scored high activity in low ISRS cells (Fig. 10G, Supplementary Figure S7E).

Fig. 10figure 10

The landscape of CNV, cell–cell communication, transcriptional regulons in GSE137804 scRNA-seq cohort. A, B Differences of CNVs of NE cells, fibroblasts, Schwann cells and endothelial cells in high ISRS cells and low ISRS cells. C Circle diagrams showed the interaction strength and number between each cell type in high ISRS cells and low ISRS cells. D Chord chart and bubble chart showed overexpressed ligand–receptor interactions in high ISRS cells. Bubble size represents P value generated by the permutation test, and the color represents the possibility of interactions. E Heatmap showed the efferent or afferent contributions of all signals to different cell types in low ISRS cells (left) and low ISRS cells (right). F Dot plot shows dominant senders and receivers in high ISRS cells and low ISRS cells. The X and Y axes are the total outgoing or incoming communication probabilities associated with each cell group, respectively. The size of the dots is positively related to the number of inferred links (both outgoing and incoming) associated with each cell block. The colors of the dots represent different cell groups. G SCENIC analysis indicated significant regulons in low ISRS cells (left) and high ISRS cells (right)

Pan-cancer analysis and immunohistochemistry of two hub genes

Finally, we observed two ISRS model genes (CAMTA2 and FOXD1) with abundant research value in oncology, which have been proven as oncogenes in head and neck squamous cell carcinoma (Wu et al. 2023) or colon cancer (Luan et al. 2021). Hence, we performed pan-cancer analysis to reveal the heterogeneity of CAMTA2 and FOXD1 expressions in tumor and normal samples across 33 tumor types (Fig. 11A). Meanwhile, correlation between expressions of two hub genes and TMB, MSI, immune cell and immune score highlighted the significance of two hub genes involved in tumor immune microenvironment and immune cell infiltration (Fig. 11B–E). The protective prognosis value of CAMTA2 was observed in PAAD, while that of FOXD1 was found in CESC and LUSC, showing a similar prognostic effect in NB (Fig. 11F). Furthermore, we conducted IHC staining to validate the significantly higher protein level of CAMTA2 and FOXD1 expression in stage 4S tissues than in stage 4 tissues, which supported our bioinformatics results (Fig. 11G, H, Supplementary Figure S7F). Meanwhile, we divided 35 stage 4S and 4 NB patients into high and low CAMTA2/FOXD1 group based on the median IHC score of CAMTA2/FOXD1. Survival analysis revealed that NB patients with high CAMTA2 protein levels had significantly better OS than NB patients with low CAMTA2 protein levels, while this difference was also observed between high and low FOXD1 groups, without significance (Fig. 11H).

留言 (0)

沒有登入
gif