Preoperative liquid biopsy transcriptomic panel for risk assessment of lymph node metastasis in T1 gastric cancer

Discovery of candidate genes predicting LNM in T1 GC patients

In this study, we performed unbiased biomarker discovery by analyzing transcriptomic data from two GC datasets (TCGA and GSE246963), complemented by mRNA sequencing data from T1 GC tissues with and without LNM. Using differential expression analysis (Wilcoxon rank-sum test for GSE246963, P < 0.05; edgeR for TCGA, P < 0.05) followed by correlation filtering (r < 0.5), we identified four genes, SDS, TESMIN, NEB, and GRB14, that were differentially expressed between LNM and non-LNM patients (Fig. 2A). Volcano plots showed that these genes were upregulated in cancer tissues (Fig. 2B). Validation with TCGA data confirmed higher expression in LNM cases (P < 0.05) (Supplementary Fig. 3), and pan-cancer analysis showed elevated expression across various cancer types (Supplementary Fig. 4). Further validation in a pilot cohort at FHHMU showed significantly higher expression of these genes in T1 GC tissues with LNM than in those without LNM (P < 0.05) (Fig. 2C-D). In addition, higher expression of these genes was closely associated with adverse clinicopathological characteristics (Fig. 2E-F); detailed P values are shown in Supplementary Tables 6–7. Correlation analysis using TIMER 2.0 (http://timer.cistrome.org/) indicated positive associations between these genes and VEGFA and VEGFC, which are involved in metastasis and tube formation (Fig. 2G). Pathway enrichment analysis using Enrichr (KEGG and GO, https://maayanlab.cloud/Enrichr/) and PPI network mapping with STRING and Cytoscape 3.9.1 suggested potential roles for these genes in GC (Fig. 2H-L). Kaplan-Meier overall survival analysis of the four candidate mRNAs (https://kmplot.com/analysis/) showed that high expression of each of the four genes was significantly associated with poorer prognosis (Supplementary Fig. 5).
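The selection logic described above (overlap of differentially expressed genes across the three data sources, followed by a correlation filter) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the DE tests themselves (edgeR, Wilcoxon) are not reproduced, "r < 0.5" is read here as an absolute-correlation threshold for discarding redundant genes, and the gene `XGENE1`, the extra genes in each set, the expression values, and the helper names `pearson` and `select_candidates` are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_candidates(deg_sets, expression, r_max=0.5):
    """Intersect DEG sets (the Venn overlap), then greedily drop any gene whose
    |r| with an already-kept gene reaches r_max (assumed redundancy filter)."""
    shared = sorted(set.intersection(*deg_sets))
    kept = []
    for gene in shared:
        if all(abs(pearson(expression[gene], expression[k])) < r_max for k in kept):
            kept.append(gene)
    return kept


# Hypothetical DEG lists from three sources (TCGA, GEO, in-house T1 sequencing)
deg_sets = [
    {"SDS", "TESMIN", "NEB", "GRB14", "XGENE1", "XGENE2"},
    {"SDS", "TESMIN", "NEB", "GRB14", "XGENE1", "XGENE3"},
    {"SDS", "TESMIN", "NEB", "GRB14", "XGENE1", "XGENE4"},
]
# Toy mean-centred expression across six samples; XGENE1 nearly duplicates SDS
expression = {
    "SDS": [-5, -3, -1, 1, 3, 5],
    "GRB14": [5, -1, -4, -4, -1, 5],
    "NEB": [-5, 7, 4, -4, -7, 5],
    "TESMIN": [1, -3, 2, 2, -3, 1],
    "XGENE1": [-4, -3, -1, 1, 3, 4],
}
print(select_candidates(deg_sets, expression))  # ['GRB14', 'NEB', 'SDS', 'TESMIN']
```

The greedy filter keeps the first gene of any highly correlated pair, so the result can depend on the order in which genes are visited; here the order is alphabetical only for reproducibility.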

Fig. 2

Discovery and Preliminary Validation of LNM Candidate Biomarkers in T1 Gastric Cancer Patients Using Public Databases and Transcriptomic Sequencing Data. (A) Through Venn diagram analysis, four candidate mRNAs (SDS, TESMIN, NEB, GRB14) were identified by examining transcriptomic data from the TCGA database (tumor tissues vs. adjacent normal tissues), GEO database (GSE246963), and paired tumor samples from T1 GC patients with and without LNM. (B) A volcano plot displays the expression levels of these four candidate genes across the datasets used in the discovery process. (C) Expression levels of the four candidate mRNAs were compared in fresh-frozen tumor tissues from 28 matched T1 GC patient pairs with and without LNM, matched by propensity score. (D) In peripheral blood samples from 22 matched patient pairs, expression levels of the four candidate mRNAs were also assessed in patients with and without LNM. (E) The relationship between the expression levels of the four candidate mRNAs in fresh-frozen tumor samples and clinicopathological characteristics was analyzed. (F) Similarly, the relationship between mRNA expression levels in peripheral blood samples and clinicopathological characteristics was evaluated. (G) An association heatmap was generated based on the TCGA database to explore the relationship between the four candidate mRNAs and common metastasis-related genes. (H) GO and KEGG pathway analyses of the four genes were performed using the Enrichr database. (I-L) PPI networks for each of the four candidate mRNAs (I: SDS, J: TESMIN, K: NEB, L: GRB14) were constructed using the online STRING database (https://string-db.org)

Validation of the 4-mRNA panel for predicting LNM in surgical resection specimens from T1 GC patients

First, correlation analysis of the four candidate mRNA biomarkers revealed no significant correlations among them, making collinearity unlikely (Fig. 3A). Next, we evaluated these biomarkers by RT-qPCR and logistic regression in a training cohort of T1 GC patients (184 without LNM, 34 with LNM). Each gene was independently associated with LNM risk (P < 0.05, Supplementary Table 8). ROC curve analysis showed that although each mRNA biomarker had discriminatory value on its own, the combined 4-mRNA panel significantly improved diagnostic performance (AUC = 0.838, sensitivity 82.3%, specificity 75.0%) (Supplementary Fig. 6A-B).
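The AUC, sensitivity, and specificity quoted here are related in a simple way: the AUC equals the probability that a randomly chosen LNM-positive patient receives a higher score than a randomly chosen LNM-negative patient (ties counted as one half), i.e. the Mann–Whitney U statistic divided by the number of positive/negative pairs, while sensitivity and specificity describe a single cutoff on the same scores. The sketch below illustrates this with toy scores; the score values, the 0.5 threshold, and the function names are hypothetical and not taken from the study.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called LNM-positive."""
    tp = sum(s >= threshold for s in pos_scores)
    tn = sum(s < threshold for s in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)


# Toy panel scores (hypothetical, not study data)
lnm_pos = [0.9, 0.8, 0.7, 0.4]
lnm_neg = [0.6, 0.3, 0.2, 0.1, 0.4]
print(auc_mann_whitney(lnm_pos, lnm_neg))               # 0.925
print(sens_spec(lnm_pos, lnm_neg, threshold=0.5))       # (0.75, 0.8)
```

The pairwise loop is O(m·n), which is adequate for cohorts of a few hundred patients; rank-based formulas give the same value faster.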

Fig. 3

Training and Validation of the 4-mRNA Signature for Predicting LNM in T1 Gastric Cancer Patients Using Fresh-Frozen Tissue Samples. (A) Correlation analysis among the four candidate genes. (B) Nomogram constructed to predict LNM in T1 GC patients, based on the 4-mRNA signature combined with clinical features. (C) ROC curves of various predictive variables within the training dataset. (D) ROC curves of different predictive variables within the validation dataset. (E) Calibration curve of the RSA model in the training dataset. (F) Calibration curve of the RSA model in the validation dataset. (G) Confusion matrices for different predictive models in the training and validation datasets. (H) Double-layer concentric circle plots displaying clinical benefit for different predictive models in the training and validation datasets. (I) Radar chart comparing evaluation metrics of different predictive models in the training dataset. (J) Radar chart comparing evaluation metrics of various predictive models in the validation dataset. (K) Clinical impact curve of the RSA model for patients in the training dataset. (L) Clinical impact curve of the RSA model for patients in the validation dataset. (M) Comparative analysis of the eCura scoring system versus the RSA model for identifying LNM, using a combined dataset from the training and validation sets

To further improve clinical utility, we developed a Risk Stratification Assessment (RSA) model by combining the 4-mRNA panel (OR = 13.911, 95% CI: 4.585–42.212) with clinical variables, including tumor size (OR = 5.906, 95% CI: 1.673–20.856), depth of infiltration (OR = 5.940, 95% CI: 1.814–19.452), and lymphovascular invasion (OR = 5.935, 95% CI: 1.767–19.935) (Supplementary Table 9). This RSA model was visualized using a nomogram (Fig. 3B). In the training cohort, the RSA model demonstrated excellent predictive accuracy for LNM, achieving an AUC of 0.890, significantly outperforming the clinical model (AUC = 0.820; P = 0.036) (Fig. 3C). Calibration curves further validated the RSA model’s reliability in predicting LNM (Fig. 3E). The confusion matrix and radar chart confirmed that the RSA model provided higher sensitivity and specificity than the clinical model alone (Fig. 3G upper panel; Fig. 3I; Supplementary Table 10).
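In a multivariable logistic model, each adjusted odds ratio corresponds to a coefficient β = ln(OR), and a nomogram is a graphical rendering of the resulting linear predictor. The sketch below converts the odds ratios reported above into coefficients and turns a patient's predictors into a predicted LNM probability. It rests on loud assumptions: the intercept is not reported in the text, so `INTERCEPT = -5.0` is purely hypothetical; each predictor is treated as a 0/1 indicator (the actual cutoffs for tumor size, infiltration depth, and panel positivity are not given); and the predictor names are our own labels, not the authors' variable names.

```python
from math import exp, log

# Adjusted odds ratios reported for the RSA model (Supplementary Table 9)
ODDS_RATIOS = {
    "panel_high": 13.911,        # hypothetical label: 4-mRNA panel positive
    "large_tumor": 5.906,        # hypothetical label: tumor size above cutoff
    "deep_infiltration": 5.940,  # hypothetical label: deeper infiltration
    "lvi": 5.935,                # lymphovascular invasion present
}
INTERCEPT = -5.0  # hypothetical: the fitted intercept is not stated in the text

# Logistic-regression coefficients: beta = ln(OR)
COEFS = {name: log(odds_ratio) for name, odds_ratio in ODDS_RATIOS.items()}


def predicted_lnm_risk(features):
    """Predicted LNM probability for binary (0/1) predictor indicators."""
    logit = INTERCEPT + sum(COEFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-logit))


low = predicted_lnm_risk({"panel_high": 0, "large_tumor": 0, "deep_infiltration": 0, "lvi": 0})
high = predicted_lnm_risk({"panel_high": 1, "large_tumor": 1, "deep_infiltration": 1, "lvi": 1})
print(round(low, 4), round(high, 4))
```

Whatever the intercept, the odds of LNM for a patient positive on all four indicators are the product of the four odds ratios times the odds for a patient negative on all of them; only the absolute probabilities depend on the unknown intercept.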

For validation, the RSA model was applied to an independent cohort of 186 T1 GC patients (31 LNM-positive, 155 LNM-negative). It retained high predictive accuracy (AUC = 0.878, sensitivity 83.9%, specificity 83.9%) and outperformed both the clinical model and the 4-mRNA panel in detecting LNM (Fig. 3D). Calibration curves again confirmed the model’s reliability in predicting LNM risk (Fig. 3F), and the confusion matrix and radar chart showed that the RSA model achieved the highest sensitivity and specificity in the validation set as well (Fig. 3G lower panel; Fig. 3J; Supplementary Table 10). We also examined 26 T2, 19 T3, and 40 T4 GC patients to test whether the RSA model could predict LNM at more advanced T stages. Expression of the four LNM-associated mRNAs was higher in T2–T4 specimens with LNM than in those without (Supplementary Figs. 7–8), but ROC analysis showed only modest discrimination, with every 95% CI including 0.5: T2, AUC = 0.646 (95% CI: 0.400–0.892); T3, AUC = 0.608 (95% CI: 0.343–0.873); and T4, AUC = 0.640 (95% CI: 0.416–0.864). These findings suggest that the 4-mRNA model has limited applicability for predicting LNM in T2–T4 GC patients (Supplementary Fig. 9).
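The conclusion about T2–T4 tumors rests on how wide the confidence intervals are. The reported intervals are symmetric around each AUC, which is consistent with Wald-type intervals (AUC ± 1.96·SE); under that assumption (the CI method is not stated in the text), the standard error and an approximate two-sided z-test against chance (AUC = 0.5) can be back-calculated from the published numbers, as in this sketch. The function name is hypothetical.

```python
from math import erfc, sqrt

Z_95 = 1.959964  # two-sided 95% standard-normal quantile


def z_test_vs_chance(auc, lower, upper):
    """Back-calculate SE from a symmetric Wald 95% CI and test H0: AUC = 0.5."""
    se = (upper - lower) / (2 * Z_95)
    z = (auc - 0.5) / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p_two_sided


# AUCs and 95% CIs as reported for the T2-T4 cohorts
reported = {"T2": (0.646, 0.400, 0.892), "T3": (0.608, 0.343, 0.873), "T4": (0.640, 0.416, 0.864)}
for stage, (auc, lo, hi) in reported.items():
    se, z, p = z_test_vs_chance(auc, lo, hi)
    print(f"{stage}: SE={se:.3f} z={z:.2f} p={p:.2f} CI includes 0.5: {lo <= 0.5 <= hi}")
```

Under this approximation none of the three stage-specific AUCs differs significantly from 0.5, which is consistent with the limited applicability described above.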

Clinically, the RSA model markedly reduced overtreatment rates. As shown in Fig. 3H, under traditional clinicopathological criteria all patients in the training cohort (100%) would have been classified as high-risk, resulting in unnecessary radical surgery for 84.4% of cases (184 of 218). In contrast, the 4-mRNA classifier lowered the high-risk classification rate to 33.9%, with an overtreatment rate of only 21.1%, and the RSA model reduced overtreatment further, to 9.2% in the training cohort. Similar reductions were observed in the validation cohort, where the RSA model led to markedly fewer unnecessary surgeries than the other models. Clinical impact curve analysis showed that the RSA nomogram offered superior net benefit across a broad, clinically practical range of threshold probabilities in both the training and validation sets (Fig. 3K-L). Overall, across both cohorts the RSA model reduced the overtreatment rate from 83.9% to 44.1%, improving treatment accuracy and minimizing unnecessary interventions (Fig. 3M). Moreover, risk stratification into high- and low-risk groups by the RSA model distinguished T1 GC patients more effectively than the eCura system (Supplementary Fig. 10). For recurrence risk, the RSA model also achieved higher predictive accuracy (AUC = 0.786, 95% CI = 0.703–0.868) than the eCura system (AUC = 0.724, 95% CI = 0.640–0.809) (Supplementary Fig. 12A-C).
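The overtreatment figures above are consistent with defining the overtreatment rate as the number of LNM-negative patients classified as high-risk divided by all patients in the cohort. Because traditional criteria label everyone high-risk, that rate then equals the proportion of LNM-negative patients. The sketch below reproduces the reported percentages under this inferred definition; the 4-mRNA counts (28 true positives, 46 false positives) are our own back-calculation from the reported training sensitivity and specificity, not numbers stated by the authors.

```python
def overtreatment_rate(false_positives, total):
    """Share of all patients who are classified high-risk but are LNM-negative."""
    return false_positives / total


def high_risk_rate(true_positives, false_positives, total):
    """Share of all patients classified high-risk."""
    return (true_positives + false_positives) / total


# Traditional criteria: all 218 training patients high-risk, 184 of them LNM-negative
print(f"{overtreatment_rate(184, 218):.1%}")              # 84.4%
# Pooled training + validation: 184 + 155 LNM-negative among 218 + 186 patients
print(f"{overtreatment_rate(184 + 155, 218 + 186):.1%}")  # 83.9%
# 4-mRNA classifier (training), back-calculated: 28 TP and 46 FP
print(f"{high_risk_rate(28, 46, 218):.1%}", f"{overtreatment_rate(46, 218):.1%}")  # 33.9% 21.1%
```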

Validation of the 4-mRNA panel for predicting LNM in gastroscopic biopsy specimens from T1 GC patients

In addition to the surgically resected specimens from our training cohort, we obtained 122 matched gastroscopic biopsy samples, 18 from LNM-positive and 104 from LNM-negative patients (Fig. 4C). Expression of each of the four genes in the biopsy samples correlated significantly with that in the matched surgical specimens (Fig. 4A), and comparative analysis revealed no significant differences in the expression of these genes between biopsy and surgical specimens (Fig. 4B). The AUC for detecting LNM using clinical characteristics was 0.829 (95% CI: 0.738–0.919), whereas the AUC for the RSA model was 0.928 (95% CI: 0.880–0.977), suggesting that the RSA model is also applicable to preoperative biopsy samples (Fig. 4D; Supplementary Fig. 6C). Calibration curve analysis further supported the RSA model’s predictive performance (Fig. 4G).

Fig. 4

Transcriptome validation stage for identifying LNM in gastroscopic biopsy specimens from patients with T1 GC. (A) Correlation analysis of the four mRNAs in gastroscopy biopsy specimens and their paired surgical resection specimens. (B) Comparison of expression levels of the four mRNAs in gastroscopy biopsy specimens and paired surgical resection specimens. (C) Screening process for the validation set of gastroscopy biopsy specimens. (D) ROC curves for various predictor variables in gastroscopy biopsy specimens. (E) Radar chart comparing evaluation metrics of different predictive models. (F) Confusion matrices of different predictive models. (G) Calibration curve of the RSA model in gastroscopy biopsy specimens. (H) Double-layer concentric circle plot showing clinical benefits of different predictive models. (I) Clinical impact curve of the RSA model within the validation set of gastroscopy biopsy specimens

In the biopsy cohort, the RSA model achieved the highest specificity (84.6%) while maintaining a sensitivity of 83.3%. The clinical model had lower sensitivity and specificity (72.2% and 79.8%), whereas the 4-mRNA panel model had higher sensitivity (94.4%) but considerably lower specificity (69.2%) (Fig. 4E–F, Supplementary Table 11). In the clinical benefit analysis, the RSA model reduced the overtreatment rate from 85.2% to 13.9% compared with the clinical feature-only model (Fig. 4H), highlighting its potential to improve clinical decision-making and avoid unnecessary treatment. Furthermore, the clinical impact curve showed that the nomogram provided superior net benefit across a broad, clinically relevant range of threshold probabilities, underscoring the RSA model’s predictive value (Fig. 4I).
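Because the three biopsy-cohort models trade sensitivity against specificity differently, a single summary such as Youden's J (sensitivity + specificity − 1) helps compare them. The sketch applies it to the sensitivity and specificity values reported for the biopsy cohort; J is our choice of summary for illustration, not a metric the text reports.

```python
def youden_j(sensitivity, specificity):
    """Youden's J statistic: 0 for a useless test, 1 for a perfect one."""
    return sensitivity + specificity - 1.0


# (sensitivity, specificity) reported for the biopsy cohort
biopsy_models = {
    "RSA": (0.833, 0.846),
    "clinical": (0.722, 0.798),
    "4-mRNA panel": (0.944, 0.692),
}
for name, (sens, spec) in biopsy_models.items():
    print(f"{name}: J = {youden_j(sens, spec):.3f}")
# RSA: J = 0.679
# clinical: J = 0.520
# 4-mRNA panel: J = 0.636
```

By this summary the RSA model comes out ahead even though the 4-mRNA panel alone is more sensitive.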

Validation of the 4-mRNA panel for predicting LNM in liquid biopsy specimens from T1 GC patients

The primary objective of our study was to develop a liquid biopsy-based assay for predicting LNM in T1 GC patients by adapting the tissue-based 4-mRNA biomarker panel into a serum-based test. In a training cohort of 125 LNM-negative and 22 LNM-positive patients, we used RT-qPCR to assess the diagnostic potential of these mRNAs. Quality control of the peripheral blood samples confirmed A260/280 ratios within the normal range (Fig. 5A). Logistic regression analysis showed that each mRNA independently predicted LNM risk (all P < 0.05; Supplementary Fig. 6D-E; Supplementary Table 8), and multivariable logistic regression was used to construct a nomogram for predicting LNM (Fig. 5B). In the training cohort, the RSA model achieved an AUC of 0.873 (95% CI: 0.801–0.945; Fig. 5C), indicating strong predictive power for LNM. T1 GC cases were dichotomized using a risk probability cutoff derived from the Youden index, and confusion matrix and radar plot analyses further supported the model’s predictive accuracy (Fig. 5E, upper panel; Fig. 5I; Supplementary Table 12). Calibration curve analysis confirmed the model’s high predictive performance (Fig. 5G). In an external validation cohort (141 LNM-negative and 27 LNM-positive T1 GC patients), the RSA model achieved an AUC of 0.852 (95% CI: 0.774–0.930; Fig. 5D), with higher sensitivity (81.5%) and specificity (79.4%) than the other models (Fig. 5E, lower panel; Fig. 5J; Supplementary Table 12). Calibration analysis in this cohort again supported the model’s predictive accuracy (Fig. 5H).
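The Youden-index cutoff mentioned above is the threshold on predicted risk that maximizes sensitivity + specificity − 1. A minimal sketch, assuming the candidate cutoffs are the observed predicted probabilities and that risks at or above the cutoff are called high-risk; the toy probabilities and the function name are hypothetical, and the study's actual cutoff value is not given in the text.

```python
def youden_cutoff(pos_scores, neg_scores):
    """Return (cutoff, J) maximising sensitivity + specificity - 1.

    Scores >= cutoff are called LNM-positive; candidate cutoffs are the observed scores.
    """
    best_cut, best_j = None, -1.0
    for cut in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s >= cut for s in pos_scores) / len(pos_scores)
        spec = sum(s < cut for s in neg_scores) / len(neg_scores)
        j = sens + spec - 1.0
        if j > best_j:  # keep the first (lowest) cutoff among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy predicted LNM probabilities (hypothetical, not study data)
lnm_pos = [0.82, 0.64, 0.55, 0.31]
lnm_neg = [0.47, 0.28, 0.22, 0.15, 0.09, 0.60]
print(youden_cutoff(lnm_pos, lnm_neg))
```

With these toy values the chosen cutoff is 0.31 (sensitivity 1.0, specificity 4/6). In practice the cutoff is fixed on the training cohort and then applied unchanged to the validation cohort.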

Fig. 5

Transcriptome validation phase for identification of LNM in peripheral blood specimens from patients with T1 GC. (A) Quality control analysis of peripheral blood specimens at different time points. (B) Construction of an LNM prediction nomogram for T1 GC patients based on the 4-mRNA signature combined with clinical features. (C) ROC curves of various predictor variables within the training dataset. (D) ROC curves of different predictor variables within the validation dataset. (E) Confusion matrices of different predictive models in the training and validation datasets. (F) Double-layer concentric circle plot illustrating clinical benefits of various predictive models in the training and validation datasets. (G) Calibration curve of the RSA model in the training dataset. (H) Calibration curve of the RSA model in the validation dataset. (I) Radar chart comparing evaluation metrics of different predictive models in the training dataset. (J) Radar chart comparing evaluation metrics of various predictive models in the validation dataset. (K) Clinical impact curve of the RSA model for patients in the training dataset. (L) Clinical impact curve of the RSA model for patients in the validation dataset. (M) Comparative analysis of the eCura scoring system versus the RSA model for identifying LNM, using a combined dataset from the training and validation sets

We next evaluated the clinical utility of the RSA model, which combines the 4-mRNA biomarker panel with clinical features, for non-invasively identifying patients with true LNM and sparing others unnecessary surgery. In the training cohort, only 15.0% of the patients deemed “high-risk” by traditional pathological criteria (22 of 147) actually had LNM, corresponding to an overtreatment rate of 85.0%. The RSA model reclassified 69.4% of patients as low-risk and reduced the potential overtreatment rate to 17.7% (26 of 147) (Fig. 5F, upper panel). Similar findings were observed in the external validation cohort, where the RSA model markedly reduced overtreatment compared with the other models (Fig. 5F, lower panel). Clinical impact curve analysis in both cohorts also supported the RSA model’s superior net benefit across a broad, clinically relevant range of threshold probabilities (Fig. 5K-L). In the combined training and validation cohorts, the RSA model reduced the conventionally assessed overtreatment rate from 84.4% to 56.0% (Fig. 5M), underscoring its value in clinical application. In addition, RSA model-based risk stratification differentiated T1 GC patients more effectively than the eCura system (Supplementary Fig. 11). For recurrence risk, the RSA model also achieved higher predictive accuracy (AUC = 0.807, 95% CI = 0.744–0.870) than the eCura system (AUC = 0.700, 95% CI = 0.630–0.771) (Supplementary Fig. 12D-F).
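Net benefit, the quantity behind statements about "superior net benefit across threshold probabilities", weighs true positives against false positives at a chosen threshold probability p_t: NB = TP/n − (FP/n)·p_t/(1 − p_t). A model is useful at p_t if its net benefit exceeds both "operate on everyone" and "operate on no one" (NB = 0). The sketch implements this standard decision-curve formula. The treat-all example uses the training-cohort counts reported above (22 LNM-positive, 125 LNM-negative); the model scores are toy values, and the function names are hypothetical.

```python
def net_benefit(pos_scores, neg_scores, threshold):
    """Decision-curve net benefit of operating when predicted risk >= threshold."""
    n = len(pos_scores) + len(neg_scores)
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    weight = threshold / (1.0 - threshold)  # odds at the threshold probability
    return tp / n - fp / n * weight


def net_benefit_treat_all(n_pos, n_neg, threshold):
    """Net benefit if every patient undergoes radical surgery."""
    n = n_pos + n_neg
    return n_pos / n - n_neg / n * threshold / (1.0 - threshold)


# Treat-all strategy at a 10% threshold, training-cohort prevalence (22 of 147)
print(round(net_benefit_treat_all(22, 125, 0.10), 4))  # 0.0552
# Toy model scores (hypothetical, not study data)
pos = [0.7, 0.5, 0.35, 0.2]
neg = [0.4, 0.15, 0.1, 0.05, 0.05, 0.02]
print(round(net_benefit(pos, neg, 0.10), 4))  # 0.3667
```

Evaluating these functions over a grid of thresholds (for example 0.05 to 0.5) yields the curves used in decision-curve analysis.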

4-mRNA panel shows significant specificity for LNM prediction of T1 GC compared to other gastrointestinal cancers

To assess the specificity of our 4-mRNA panel in predicting LNM in T1 GC patients, we used a three-pronged validation approach. First, we stratified the 315 T1 GC patients in the combined training and validation cohorts by peripheral blood tumor markers (CEA, CA19-9, CA72-4), yielding 69 (21.9%) marker-positive and 246 (78.1%) marker-negative cases (Fig. 6A). The RSA model (AUC = 0.868, 95% CI: 0.803–0.933) outperformed the clinical model (AUC = 0.807, 95% CI: 0.803–0.933; DeLong test, P < 0.001) for LNM prediction across both marker-positive and marker-negative patients (Fig. 6B–D).
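The DeLong test used here compares two AUCs estimated on the same patients, so it must account for the correlation between them. The sketch below is a compact pure-Python version of the standard DeLong variance for paired AUCs, with toy scores standing in for two models (for example, an RSA-like and a clinical-like model) scored on the same four LNM-positive and five LNM-negative patients. All data and function names are hypothetical, and in practice an established implementation (for example, `roc.test` with `method = "delong"` in the R package pROC) would normally be used.

```python
from math import erfc, sqrt


def _psi(x, y):
    """Kernel: 1 if the positive outscores the negative, 0.5 on ties, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """DeLong placement values: V10 per positive case, V01 per negative case."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_paired(pos_a, neg_a, pos_b, neg_b):
    """Compare two correlated AUCs; patients must appear in the same order for both models."""
    v10a, v01a = _components(pos_a, neg_a)
    v10b, v01b = _components(pos_b, neg_b)
    auc_a, auc_b = sum(v10a) / len(v10a), sum(v10b) / len(v10b)
    var = (
        (_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(v10a)
        + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(v01a)
    )
    z = (auc_a - auc_b) / sqrt(var)
    return auc_a, auc_b, z, erfc(abs(z) / sqrt(2))  # two-sided p-value


# Toy scores for the same patients under two models (hypothetical)
pos_a, neg_a = [0.9, 0.8, 0.6, 0.55], [0.5, 0.4, 0.3, 0.65, 0.1]
pos_b, neg_b = [0.7, 0.4, 0.6, 0.3], [0.5, 0.45, 0.2, 0.65, 0.35]
print(delong_paired(pos_a, neg_a, pos_b, neg_b))
```

The covariance terms shrink the variance of the difference when the two models rank patients similarly. Ignoring them, as an unpaired comparison would, makes the test conservative.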

Fig. 6

Identification and Prediction of LNM in Patients with Different Peripheral Blood Tumor Marker Status Using a 4-mRNA Signature and Prospective Clinical Validation. (A) Distribution of different peripheral blood tumor marker statuses in a new cohort combining the training and validation sets. (B-C) ROC curves of various predictive models for patients with different peripheral blood tumor marker statuses in the validation set (B, positive; C, negative). (D) Comparison of AUC, sensitivity, and specificity for different predictive models in assessing LNM in patients with varying peripheral blood tumor marker statuses. (E) ROC curve for LNM prediction in a prospective observational study (ChiCTR-IIR-17011197) using different predictive models. (F) Calibration curve of the RSA model in the prospective validation set. (G) Radar chart comparing evaluation metrics of different predictive models in the prospective validation set. (H) Clinical impact curve of the RSA model within the prospective validation set. (I) Confusion matrices for different predictive models in the prospective validation set. (J) Double-layer concentric circle plot displaying the clinical benefits of various predictive models in the prospective validation set. (K) Comparison of the expression levels of the four mRNAs (SDS, TESMIN, NEB, GRB14) in peripheral blood samples taken at baseline and three months post-surgery in the prospective validation set. (L) Comparison of LNM risk probabilities based on preoperative and postoperative RSA model formulas constructed from transcriptomic profiles and clinical characteristics in peripheral blood. (M) Comparison of ROC curves for LNM prediction based on the 4-mRNA signature in peripheral blood samples before and after surgery. (N) Recruitment status of patients with other gastrointestinal malignancies receiving endoscopic treatment. (O) ROC curve illustrating the 4-mRNA signature’s performance in predicting LNM in other gastrointestinal malignancies.

The second approach used prospectively collected serum samples from an observational cohort (ChiCTR-IIR-17011197), obtained before surgery (baseline) and three months after surgery (follow-up). ROC curve analysis showed that the RSA model was the most accurate predictor, with an AUC of 0.812 (95% CI: 0.706–0.918), a sensitivity of 80.0%, and a specificity of 75.6% (P < 0.001; Fig. 6E). The calibration curve closely matched the ideal line (Fig. 6F), and confusion matrix and radar plot analyses further confirmed the RSA model’s superior overall performance compared with both the clinical model (AUC = 0.707, sensitivity 66.7%, specificity 65.9%) and the 4-mRNA model alone (AUC = 0.788, sensitivity 100.0%, specificity 42.7%) (Fig. 6G, I; Supplementary Fig. 6F; Supplementary Table 13). In the clinical benefit analysis, the RSA model reduced the overtreatment rate from 84.5% to 14.4%, highlighting its potential to improve clinical decision-making and avoid unnecessary interventions (Fig. 6H, J). In postoperative samples, levels of all four mRNAs were significantly reduced (Fig. 6K), and the predicted LNM risk probability decreased markedly (P < 0.001; Fig. 6L). Accordingly, predictive accuracy for LNM dropped substantially after surgery, with the AUC declining to 0.675 (DeLong test, P < 0.001; Fig. 6M), indicating that the biomarkers are informative specifically in the preoperative setting.

In the third approach, we extended the analysis to assess the 4-mRNA panel’s diagnostic performance in other early gastrointestinal cancers (all T1 stage), including esophageal (n = 38), colon (n = 32), and rectal (n = 29) cancers (Fig. 6N). The 4-mRNA panel achieved significantly higher diagnostic accuracy for LNM in GC (AUC = 0.879) than in the other cancers (esophageal: AUC = 0.621; colon: AUC = 0.696; rectal: AUC = 0.699; Fig. 6O), and DeLong’s test confirmed that each difference was statistically significant (esophageal, P = 0.002; colon, P = 0.001; rectal, P = 0.001). Overall, these findings support the high specificity and clinical applicability of the 4-mRNA panel as a non-invasive blood-based biomarker that is particularly well-suited for predicting LNM in T1 GC patients.
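Because the GC cohort and each comparison cohort consist of different patients, the two AUCs being compared are independent, and the covariance term of the paired DeLong test drops out. A sketch of the resulting statistic, assuming each AUC's variance is estimated with DeLong's placement-value method: for cohort k with m_k LNM-positive and n_k LNM-negative patients, S_10 and S_01 denote the sample variances of the placement values of the positive and negative patients, respectively, and Φ is the standard normal CDF. The text does not state the exact configuration of the test, so this formulation is an assumption.

```latex
\widehat{V}\bigl(\hat{A}_k\bigr) = \frac{S_{10}^{(k)}}{m_k} + \frac{S_{01}^{(k)}}{n_k},
\qquad
z = \frac{\hat{A}_{\mathrm{GC}} - \hat{A}_{\mathrm{other}}}
         {\sqrt{\widehat{V}\bigl(\hat{A}_{\mathrm{GC}}\bigr) + \widehat{V}\bigl(\hat{A}_{\mathrm{other}}\bigr)}},
\qquad
p = 2\bigl(1 - \Phi(|z|)\bigr)
```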

Biological characteristics and immune infiltration

To characterize the biology and immune context associated with LNM, we performed GSEA functional enrichment analysis using RNA sequencing data from gastric cancer samples (LNM vs. non-LNM). Tumor progression- and immune regulation-related pathways, including angiogenesis, PI3K/AKT/mTOR signaling, inflammatory response, IL-2/STAT5 signaling, interferon-α response, interferon-γ response, and TNF-α signaling, were significantly upregulated in the LNM group (Fig. 7A), and GSVA showed the same trend (Fig. 7B). Because cell types vary with local signaling networks and drive cellular activities within tumors, we next investigated whether cell states and multicellular communities differ between LNM and non-LNM tumors. We estimated the relative abundance of each immune cell type in tumor tissue using two algorithms, CIBERSORT and MCPcounter; immune cells tended to be more abundant in patients with LNM (Fig. 7C-D).
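GSEA scores a pathway by walking down the list of genes ranked by their association with LNM status. The running sum increases at each pathway gene, weighted by the gene's ranking metric, and decreases at each non-member; the enrichment score (ES) is the largest deviation from zero. The sketch below implements this weighted running-sum statistic (weight exponent p = 1) on a toy six-gene ranking. The gene names, metric values, and function name are hypothetical; permutation-based significance testing and normalization to NES, which a real analysis requires, are omitted, and the ranking metric used in the study is not specified in the text.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style weighted running-sum enrichment score.

    ranked_genes: genes sorted by decreasing ranking metric (e.g. LNM vs non-LNM);
    scores: the matching metric values; gene_set: members of the pathway.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        # step up by the normalised weight for a member, down by 1/n_miss otherwise
        running += abs(s) ** p / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(ranked, metric, {"G1", "G2"}))  # set at the top: ES near +1
print(enrichment_score(ranked, metric, {"G5", "G6"}))  # set at the bottom: ES near -1
```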

Fig. 7

Biological characteristics and immune infiltration in LNM and Non-LNM groups. (A-B) GSEA (A) and GSVA (B) enrichment analysis results from RNA sequencing data of gastric cancer samples, comparing LNM and Non-LNM groups. (C) Scores of combined cell types derived from the CIBERSORT algorithm, illustrating the proportional diversity among features. (D) Relative abundance of each immune cell calculated using the MCPcounter algorithm, displayed in a heat map. (E) Heat map showing expression levels of immune checkpoint genes in gastric cancer patients with LNM compared to those without LNM. (F) Violin plots illustrating differences in tumor purity, immune score, ESTIMATE score, and stromal score between gastric cancer patients with and without LNM. (G-H) t-SNE plots depicting cell type (G) and metastasis type (H) derived from single-cell data of gastric cancer patients. (I-Q) Violin plots displaying differences in immune checkpoint gene expression between gastric cancer patients with LNM and those without LNM

Next, we characterized immune infiltration in the local tumor microenvironment. ESTIMATE analysis showed that the stromal, immune, and ESTIMATE scores were significantly higher in the LNM group than in the non-LNM group (Fig. 7F). Immune checkpoint analysis further showed that most immune-related targets were more highly expressed in gastric cancer patients with LNM (Fig. 7E, I-Q), suggesting that this population may benefit from immunotherapy. To further explore the biological functions underlying this signature, we used single-cell transcriptomes to examine the potential roles of the four genes in the immune microenvironment. After quality control, dimensionality reduction, clustering, and cell-type annotation of the single-cell data, we identified 10 cell subsets (Fig. 7G-H) and mapped the expression of the four genes across them. SDS, TESMIN, and NEB were all expressed in T cells and B cells (Supplementary Fig. 13).
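Tumor purity in ESTIMATE analyses is usually derived from the ESTIMATE score (the sum of the stromal and immune scores) through the conversion published with the ESTIMATE method, purity = cos(0.6049872018 + 0.0001467884 × ESTIMATE score). Higher stromal and immune scores therefore imply lower purity, which is why these quantities move in opposite directions. The sketch below assumes this published conversion (whose coefficients were calibrated on array data) was the one behind Fig. 7F; the input scores and function name are hypothetical.

```python
from math import cos


def estimate_tumor_purity(stromal_score, immune_score):
    """Tumor purity from ESTIMATE stromal and immune scores (published conversion formula)."""
    estimate_score = stromal_score + immune_score
    return cos(0.6049872018 + 0.0001467884 * estimate_score)


# Hypothetical scores: more stromal/immune signal implies lower estimated purity
low_infiltration = estimate_tumor_purity(-500, -300)
high_infiltration = estimate_tumor_purity(800, 1200)
print(round(low_infiltration, 3), round(high_infiltration, 3))
```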
