Intratumoral and peritumoral PET/CT-based radiomics for non-invasively and dynamically predicting immunotherapy response in NSCLC

Demographic and clinicopathological characteristics

The demographic and clinicopathological characteristics of the training and testing cohorts are presented in Table S2. The training and testing cohorts consisted of 183 and 78 patients, respectively. Across both cohorts, 184 patients (70.5%) achieved durable clinical benefit (DCB), while the remaining 77 patients (29.5%) had no durable clinical benefit (NDB). No statistically significant differences in baseline characteristics were observed between the two cohorts. Among all enrolled patients, those who achieved DCB differed from those with NDB in several respects: they had higher body mass index, PD-L1 tumor proportion score (TPS), and albumin, and lower serum cytokeratin 19 fragment antigen 21-1, neuron-specific enolase, squamous cell carcinoma antigen, platelets, and C-reactive protein (all p < 0.05).

Performance evaluation of the prediction models

We followed a predetermined radiomic workflow and developed the PET-Radscore, CT-Radscore, PET/CT-Radscore, and COMB-Radscore models. The radiomic model with the highest predictive ability was then identified through a comprehensive evaluation of model performance. The receiver operating characteristic (ROC) curves showed that the COMB-Radscore model had the highest area under the curve (AUC) among all evaluated models. The AUC (ROC) values for the PET-Radscore, CT-Radscore, PET/CT-Radscore, and COMB-Radscore models in the training cohort were 0.846 (p-value < 0.0001), 0.856 (p-value < 0.0001), 0.768 (p-value < 0.0001), and 0.894 (p-value < 0.0001), respectively (Fig. S2a). In the testing cohort, the corresponding AUC values were 0.591 (p-value = 0.2295), 0.720 (p-value = 0.0035), 0.702 (p-value = 0.0074), and 0.819 (p-value < 0.0001), respectively (Fig. 2a).
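To make the discrimination metric concrete, the following is a minimal sketch of how an AUC (ROC) can be computed from a continuous score and a binary outcome using the rank-based (Mann-Whitney) formulation. It is an illustration only and not the pipeline used in this study: the function name `roc_auc`, the toy scores, and the coding of NDB as the positive class (so that a higher score indicates a worse response) are all assumptions.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case has a
    higher score than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not study data): 1 = NDB, 0 = DCB.
toy_scores = [-2.4, -1.9, -0.3, 0.5, 0.1, -1.1]
toy_labels = [0, 0, 1, 1, 0, 1]
auc = roc_auc(toy_scores, toy_labels)
print(round(auc, 3))  # 0.778 (7 of 9 positive/negative pairs ranked correctly)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of this size; a sort-based rank sum would give the same value more efficiently.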

Fig. 2: Performance evaluation of prediction models.

ROC curves (a), calibration curves (b), decision curves (c) and PR curves (d) of the four radiomic models in the testing cohort. e Difference in COMB-Radscore between DCB and NDB groups in the testing cohort. f Response (DCB/NDB) and COMB-Radscore for each patient in the testing cohort. ROC curves receiver operating characteristic curves, PR curves precision-recall curves, AUC area under the curve, CI confidence interval, DCB durable clinical benefit, NDB no durable clinical benefit.

The calibration curves showed a good fit of the COMB-Radscore, with no significant deviation from perfect calibration in either the training or testing cohort according to the Hosmer–Lemeshow test (Figs. S2b and 2b). Decision curve analysis revealed that all models achieved a net clinical benefit relative to the treat-all and treat-none strategies, and the COMB-Radscore exhibited the highest net benefit across most threshold probabilities (Figs. S2c and 2c). The precision-recall (PR) curves demonstrated that the COMB-Radscore achieved the highest AUC (PR) in both the training and testing cohorts, with values of 0.819 (p-value < 0.0001) and 0.647 (p-value = 0.0209), respectively (Figs. S2d and 2d).
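The net benefit plotted in a decision curve can be illustrated with a short sketch. It uses the standard definition, in which net benefit at threshold probability p_t equals TP/N minus FP/N weighted by p_t / (1 − p_t), and compares a model against the treat-all reference (treat-none has a net benefit of zero by definition). The function names and the toy predicted probabilities are hypothetical and do not come from this study.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted probability is at
    least `threshold`: TP/N - FP/N * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (not study data): 1 = event of interest.
toy_probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_nb = net_benefit(toy_probs, toy_labels, 0.5)    # 3 TP, 1 FP out of 8
all_nb = net_benefit_treat_all(toy_labels, 0.5)       # prevalence is 0.5
print(model_nb, all_nb)
```

Evaluating both functions over a grid of thresholds and plotting the results gives the decision curve; a model is useful at a given threshold when its net benefit exceeds both reference strategies.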

In the testing cohort, the COMB-Radscore demonstrated superior prediction accuracy compared to the PET-Radscore (NRI = 0.34, pNRI = 0.035, IDI = 0.21, pIDI = 0.007), CT-Radscore (NRI = 0.29, pNRI = 0.038, IDI = 0.14, pIDI = 0.012), and PET/CT-Radscore (NRI = 0.25, pNRI = 0.111, IDI = 0.17, pIDI = 0.006) based on the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) analyses (Table S3). Furthermore, the evaluation metrics, including positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, accuracy (ACC), recall, F1 score, Matthews correlation coefficient (MCC), and Kappa, also indicated that the COMB-Radscore exhibited optimal predictive performance (Table S4).
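As an illustration of the two reclassification measures, the sketch below computes a categorical NRI at a single risk cutoff and the IDI from two sets of predicted probabilities. This section does not state whether a categorical or continuous NRI was used, so the single cutoff of 0.5, the function name `nri_idi`, and the toy probabilities are assumptions made purely for demonstration.

```python
def nri_idi(p_old, p_new, labels, cutoff=0.5):
    """Categorical NRI (one cutoff) and IDI for a new model versus an old one.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    IDI = (mean gain in predicted risk among events)
        - (mean gain in predicted risk among non-events)
    """
    events = [(o, n) for o, n, y in zip(p_old, p_new, labels) if y == 1]
    nonevents = [(o, n) for o, n, y in zip(p_old, p_new, labels) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    def frac_up(pairs):
        # Reclassified from below the cutoff to at or above it.
        return sum(1 for o, n in pairs if o < cutoff <= n) / len(pairs)

    def frac_down(pairs):
        # Reclassified from at or above the cutoff to below it.
        return sum(1 for o, n in pairs if n < cutoff <= o) / len(pairs)

    def mean(values):
        return sum(values) / len(values)

    nri = (frac_up(events) - frac_down(events)) + (
        frac_down(nonevents) - frac_up(nonevents)
    )
    idi = (mean([n for _, n in events]) - mean([o for o, _ in events])) - (
        mean([n for _, n in nonevents]) - mean([o for o, _ in nonevents])
    )
    return nri, idi


# Hypothetical toy data (not study data).
toy_labels = [1, 1, 0, 0]
toy_old = [0.4, 0.6, 0.6, 0.3]
toy_new = [0.7, 0.8, 0.4, 0.2]
toy_nri, toy_idi = nri_idi(toy_old, toy_new, toy_labels)
print(toy_nri, round(toy_idi, 2))
```

Positive values of either measure indicate that the new model moves events toward higher predicted risk and non-events toward lower predicted risk; the associated p-values would require a separate test, which is not shown here.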

Further analysis of the relationship between the COMB-Radscore and treatment response showed statistically significant differences in the COMB-Radscore between response groups. The groups with better treatment responses (DCB, CR/PR, CR/PR/SD) had lower COMB-Radscore values (Figs. S2e, 2e, and S3). The COMB-Radscore and therapeutic response (DCB/NDB) of each patient in the training and testing cohorts are illustrated in Figs. S2f and 2f, respectively.

Furthermore, the predictive performance of the COMB-Radscore was compared with that of 10 serum inflammatory markers, and the COMB-Radscore outperformed all of them (Fig. S4a, b). Correlation analysis revealed no significant association between the COMB-Radscore and any of the 10 serum inflammatory markers (Fig. S4c).

Collectively, these results indicated superior performance of the COMB-Radscore model in comparison to the other radiomic models or serum inflammatory markers.

Clinical utility of the COMB-Radscore

To further explore the clinical applicability of the COMB-Radscore, we conducted an analysis to compare the low and high COMB-Radscore groups in terms of PFS, OS, and response to immunotherapy. The Kaplan-Meier survival curves revealed significant differences in PFS (training cohort log-rank test p-value < 0.0001; testing cohort log-rank test p-value = 0.00064) and OS (training cohort log-rank test p-value = 0.00013; testing cohort log-rank test p-value = 0.00019) between the low and high COMB-Radscore groups (Fig. 3a, b). Patients in the low COMB-Radscore group exhibited prolonged PFS and OS. In addition, a higher proportion of patients with DCB, CR/PR, or CR/PR/SD were observed within the low COMB-Radscore group compared to the high COMB-Radscore group (Fig. 3c).
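The survival comparison described above can be sketched in plain Python. The block below implements a Kaplan-Meier estimator and a two-group log-rank test, with the p-value taken from the chi-square distribution with one degree of freedom. It is a minimal sketch rather than the statistical software used in this study; the function names and the toy follow-up times are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    `events` holds 1 for an observed event and 0 for censoring."""
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        died = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - died / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({u for u, e in zip(all_times, all_events) if e == 1}):
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of the events in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("variance is zero; the test is undefined")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data (months, not study data); every patient has an event.
low_times, low_events = [5, 6, 7], [1, 1, 1]
high_times, high_events = [1, 2, 3], [1, 1, 1]
print(kaplan_meier([2, 3, 5, 7], [1, 0, 1, 1]))
print(logrank_test(low_times, low_events, high_times, high_events))
```

Hazard ratios such as those reported in the figure would additionally require a Cox proportional hazards model, which is beyond this sketch.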

Fig. 3: Clinical utility of the COMB-Radscore.

Kaplan-Meier analysis of PFS (a) and OS (b) for the low and high COMB-Radscore groups in the training (left) and testing (right) cohorts. c Proportional composition of different patient responses between the low and high COMB-Radscore groups in the training (left) and testing (right) cohorts. HR hazard ratio, CI confidence interval, PFS progression-free survival, OS overall survival, CR complete response, PR partial response, SD stable disease, PD progressive disease. The definitions of CR, PR, SD, and PD were based on the RECIST V.1.1 criteria.

Univariate Cox regression analysis was performed on all baseline clinicopathological variables to predict PFS and OS, followed by multivariate Cox regression analysis, which included variables with a p-value less than 0.05 to control for potential confounders. The COMB-Radscore remained a powerful and independent prognostic factor for predicting both PFS and OS (Tables S5–S8).

In the PFS analysis, additional subgroup analyses were conducted based on various clinical and pathological variables. When stratified by factors such as gender, age, smoking status, histological type, T stage, N stage, M stage, overall stage, number of metastases, treatment strategy, and irAE, the COMB-Radscore remained a statistically significant prognostic classifier in most subgroups (Tables S9 and S10).

Overall, our findings further confirm significant differences in patient prognosis when stratified by the COMB-Radscore.

Dynamic predictive ability of the COMB-Radscore

The performance of the COMB-Radscore in dynamically predicting subsequent treatment efficacy was validated using the follow-up 18F-FDG PET/CT scans of patients. In this part of the study, 25 patients from the training and testing cohorts were included, all of whom underwent follow-up 18F-FDG PET/CT scans 6 to 12 months post-treatment. Based on disease progression within 6 months after follow-up 18F-FDG PET/CT, we categorized the patients into two groups: the NDB (Follow-up) group consisting of 11 patients with disease progression and the DCB (Follow-up) group consisting of 14 patients without disease progression. The detailed procedures for validating the model’s dynamic predictive capabilities are provided in the supplementary methods.

The ROC and PR curves demonstrated a favorable predictive ability of the COMB-Radscore (Follow-up), with AUC values of 0.857 (p-value = 0.0026) for the ROC curve and 0.836 (p-value = 0.0046) for the PR curve (Figs. 4a and S5c). The calibration and decision curves also demonstrated good calibration and clinical applicability of the COMB-Radscore (Follow-up) (Fig. S5a, b). Likewise, the COMB-Radscore (Follow-up) performed strongly on the other model evaluation metrics (Table S11). The COMB-Radscore (Follow-up) differed significantly between subsequent treatment outcomes, with lower values observed in the DCB (Follow-up) group (Fig. S5d).

Fig. 4: Dynamic predictive ability of and changes in COMB-Radscore.

a ROC curve of COMB-Radscore (Follow-up) in the follow-up cohort. b Kaplan-Meier analysis of PFS (Follow-up) for low and high COMB-Radscore (Follow-up) groups in the follow-up cohort. c Changes in COMB-Radscore of patients in the DCB (Follow-up) group and NDB (Follow-up) group. d Changes in COMB-Radscore of two representative patients during treatment. ROC curve receiver operating characteristic curve, AUC area under the curve, CI confidence interval, HR hazard ratio, DCB durable clinical benefit, NDB no durable clinical benefit, PFS progression-free survival, BOR best overall response, PR partial response, PD progressive disease.

We then analyzed the clinical utility of the COMB-Radscore (Follow-up). PFS (Follow-up) was defined as the duration from the follow-up 18F-FDG PET/CT scan to disease progression. Survival analysis showed a significant difference in PFS (Follow-up) between the low and high COMB-Radscore (Follow-up) groups, with the low COMB-Radscore (Follow-up) group demonstrating prolonged PFS (Follow-up) compared with the high COMB-Radscore (Follow-up) group (log-rank test p-value = 0.0036) (Fig. 4b). A higher proportion of patients achieved DCB (Follow-up) in the low COMB-Radscore (Follow-up) group than in the high COMB-Radscore (Follow-up) group (Fig. S5e).

Subsequently, we conducted further analysis of the dynamic changes in COMB-Radscore. In the NDB (Follow-up) group, there was a significant increase in COMB-Radscore based on follow-up 18F-FDG PET/CT images compared to the baseline. However, no significant changes were observed in the DCB (Follow-up) group (Fig. 4c).

Figure 4d illustrates the radiological responses and the corresponding changes in COMB-Radscore for two representative patients from the previously mentioned retrospective cohort.

Patient A underwent a baseline 18F-FDG PET/CT scan prior to initiating first-line treatment, which yielded a baseline COMB-Radscore of −2.34 (COMB-low). The best overall response (BOR) to immunotherapy was a PR, classified as DCB, indicating that the baseline COMB-Radscore effectively predicted Patient A’s response to immunotherapy. After 9.73 months of treatment, a follow-up 18F-FDG PET/CT scan revealed a radiological PR, with the COMB-Radscore (Follow-up) remaining stable at −2.38, essentially unchanged from the baseline. This stability suggests that Patient A had a low risk of subsequent tumor progression and could continue to benefit from immunotherapy. As anticipated, Patient A continued to receive immunotherapy for an additional 13.70 months, with regular radiological evaluations confirming PR until the patient was lost to follow-up.

Patient B likewise underwent a baseline 18F-FDG PET/CT scan prior to initiating first-line therapy, resulting in a COMB-Radscore of −2.76 (COMB-low). The BOR to immunotherapy was a PR, classified as DCB, indicating that the baseline COMB-Radscore also successfully predicted Patient B’s response to immunotherapy. After 9.87 months of treatment, Patient B underwent a follow-up 18F-FDG PET/CT scan. At this time, the COMB-Radscore (Follow-up) had risen markedly to 0.46 (COMB-high), indicating a high risk of subsequent tumor progression, even though the radiological evaluation still showed a PR. As anticipated, disease progression was detected on the follow-up CT scan after Patient B received two additional cycles of immunotherapy.

These findings suggest that, compared to relying solely on tumor size for radiological evaluation, the COMB-Radscore has the potential to facilitate the early detection of disease progression in patients.

Complementarity of COMB-Radscore and TPS-Lung

We further explored the spatial heterogeneity of the predictive capabilities of TPS and COMB-Radscore. The results showed that the TPS derived from biopsy specimens of primary lung tumors, designated as TPS-Lung, demonstrated superior predictive performance for immunotherapy efficacy compared to TPS derived from other regions. Likewise, the COMB-Radscore derived from primary lung tumors demonstrated superior predictive performance for the efficacy of immunotherapy compared to the COMB-Radscore derived from metastases at other locations. Detailed results regarding spatial heterogeneity are presented in the supplementary materials (Supplementary Results 1, Figs. S6–S8, Tables S12 and S13).

Both the TPS-Lung and the COMB-Radscore showed strong predictive capabilities for the efficacy of immunotherapy. Consequently, we further explored the correlation and potential complementarity between them. The correlation analysis revealed no significant association between COMB-Radscore and TPS-Lung, and there was no statistically significant difference in the distribution of TPS-Lung between the low and high COMB-Radscore groups (Fig. S9).

Subsequently, two cohorts were established from the training and testing cohorts. The first, referred to as the COMB-Radscore prediction failure cohort, consisted of NDB patients with a low COMB-Radscore and DCB patients with a high COMB-Radscore. In this cohort, the AUC (ROC) for TPS-Lung was 0.867 (p-value = 0.0002) (Fig. 5a). The second, the TPS-Lung prediction failure cohort, included NDB patients with a TPS-Lung ≥ 50% and DCB patients with a TPS-Lung < 1%. In this cohort, the AUC (ROC) for the COMB-Radscore was 0.926 (p-value = 0.0004) (Fig. 5b). Furthermore, combining the COMB-Radscore and TPS-Lung assessments achieved a more refined stratification of patients (Figs. 5c and S10). Specifically, patients classified as COMB-low (low COMB-Radscore) + TPS-Lung ≥ 50% benefited significantly from immunotherapy, with a higher proportion of DCB patients and longer PFS and OS. Conversely, those categorized as COMB-high (high COMB-Radscore) + TPS-Lung < 50% exhibited the opposite outcomes.
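The construction of the two prediction failure cohorts can be sketched as a simple filter over patient records. The record layout below (the keys `response`, `comb_group`, and `tps_lung`, with TPS-Lung stored as a percentage and `None` when unavailable) is a hypothetical structure chosen for illustration; the low/high COMB-Radscore grouping is taken as given because its cutoff is not stated in this section.

```python
def comb_failure_cohort(patients):
    """Patients misclassified by the COMB-Radscore:
    NDB with a low score, or DCB with a high score."""
    return [
        p for p in patients
        if (p["response"] == "NDB" and p["comb_group"] == "low")
        or (p["response"] == "DCB" and p["comb_group"] == "high")
    ]


def tps_failure_cohort(patients):
    """Patients misclassified by TPS-Lung:
    NDB with TPS-Lung >= 50%, or DCB with TPS-Lung < 1%.
    Patients without a TPS-Lung value are excluded."""
    return [
        p for p in patients
        if p["tps_lung"] is not None
        and (
            (p["response"] == "NDB" and p["tps_lung"] >= 50)
            or (p["response"] == "DCB" and p["tps_lung"] < 1)
        )
    ]


# Hypothetical toy records (not study data).
toy_patients = [
    {"id": "P1", "response": "NDB", "comb_group": "low", "tps_lung": 60.0},
    {"id": "P2", "response": "DCB", "comb_group": "low", "tps_lung": 0.0},
    {"id": "P3", "response": "DCB", "comb_group": "high", "tps_lung": 80.0},
    {"id": "P4", "response": "NDB", "comb_group": "high", "tps_lung": None},
    {"id": "P5", "response": "DCB", "comb_group": "low", "tps_lung": 70.0},
]
comb_ids = [p["id"] for p in comb_failure_cohort(toy_patients)]
tps_ids = [p["id"] for p in tps_failure_cohort(toy_patients)]
print(comb_ids, tps_ids)  # ['P1', 'P3'] ['P1', 'P2']
```

Within each failure cohort, the other marker's discrimination can then be assessed with an ordinary ROC analysis, which is how the complementarity between the two markers is probed.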

Fig. 5: Complementarity between COMB-Radscore and TPS-Lung.

a ROC curve of TPS-Lung in the COMB-Radscore prediction failure cohort. b ROC curve of COMB-Radscore in the TPS-Lung prediction failure cohort. c Kaplan-Meier analysis of PFS of four groups of patients stratified by COMB-Radscore and TPS-Lung. d ROC curves of the TPS-Radscore, COMB-Radscore, and TPS-Lung in the sub-training (left) and sub-testing (right) cohorts. ROC curves receiver operating characteristic curves, AUC area under the curve, CI confidence interval, HR hazard ratio, PFS progression-free survival, TPS tumor proportion score.

In the NSCLC population with a TPS < 50%, there is currently no recognized biomarker to distinguish patients who would benefit from immunotherapy monotherapy from those who would benefit from combination therapy. Therefore, we further evaluated the potential application value of COMB-Radscore in this clinical scenario. Survival analysis revealed that in the low COMB-Radscore group, combination therapy did not offer a long-term PFS benefit over monotherapy. In contrast, patients in the high COMB-Radscore group who received combination therapy experienced a significant improvement in PFS compared to those on monotherapy. Detailed results are presented in the supplementary materials (Supplementary Results 2, Fig. S11).

Subsequently, to develop an integrated model, we created new sub-training and sub-testing cohorts by selecting 128 and 54 patients with TPS-Lung information from the training and testing cohorts, respectively. In the sub-training cohort, an integrated model named TPS-Radscore was developed using the XGBoost algorithm by combining COMB-Radscore and TPS-Lung. The predictive ability of TPS-Radscore was significantly improved compared to COMB-Radscore or TPS-Lung alone. In the sub-training and sub-testing cohorts, the AUC (ROC) values of the TPS-Radscore for identifying DCB patients were 0.974 (p-value < 0.0001) and 0.888 (p-value < 0.0001), respectively (Fig. 5d). In the sub-testing cohort, NRI and IDI analyses demonstrated that the TPS-Radscore exhibited higher prediction accuracy than the COMB-Radscore (Table S14). The other model evaluation metrics of the TPS-Radscore were also superior to those of the COMB-Radscore (Table S15).

Biological basis of the COMB-Radscore

Radiogenomic analysis was conducted to explore the underlying biological basis of the COMB-Radscore. First, gene set enrichment analysis [17] (www.gsea-msigdb.org/gsea/index.jsp) was performed to identify potential molecular pathways associated with the COMB-Radscore. Significant enrichment in several immune-related molecular pathways was observed in the COMB-low group (Fig. S12a). Similar findings were observed in the TCIA cohort (Fig. S12b).

Subsequently, we analyzed the differences in the immune microenvironment between low and high COMB-Radscore patients using IOBR [18], an immunology tool previously developed by our research group. We calculated the four immune phenotype scores (MHC molecules, effector cells, suppressor cells, and checkpoints) using IPS [19]. The COMB-low group had higher scores for MHC molecules (p = 0.049) and lower scores for checkpoints (p = 0.026) than the COMB-high group (Fig. 6a). Additionally, we evaluated the infiltration abundance of immune cells using CIBERSORT [20] and further analyzed the T cell functional status of the two groups using gene markers for cytolytic activity (CYT) [21] and the T-cell–inflamed gene-expression profile (GEP) [22]. The COMB-low group exhibited elevated levels of CD8+ T cells (p = 0.019), a non-significant trend toward higher levels of M1 macrophages (p = 0.065), and decreased levels of M2 macrophages (p = 0.011) (Fig. 6b). Furthermore, the COMB-low group demonstrated higher CYT (p = 0.032) and a higher T-cell–inflamed GEP score (p = 0.0051) than the COMB-high group (Fig. 6c). Similar results were also observed in the TCIA cohort (Fig. S13).

Fig. 6: Differences in the tumor immune microenvironment between low and high COMB-Radscore groups.

Difference in four immune phenotype scores (a), abundances of 22 immune cells (b), cytolytic activity and T-cell–inflamed GEP score (c) between the low and high COMB-Radscore groups in the Internal radiogenomics cohort. d Difference in CD3+CD8+ T cells, CD3+CD8+PRF1+ T cells, CD3+CD8+PD1+ T cells, and CD3+CD8+PD1+PRF1+ T cells between the low and high COMB-Radscore groups in the Multiple immunofluorescence cohort. e Micrographs of the multiplex immunofluorescence for two representative patients. MHC major histocompatibility complex, EC effector cells, SC suppressor cells, CP checkpoints, IPS immunophenoscore, GEP gene expression profile, DAPI 4′,6-diamidino-2-phenylindole, CD3 cluster of differentiation 3, CD8 cluster of differentiation 8, PRF1 perforin-1, PD1 programmed cell death protein 1.

The component genes of CYT were then analyzed, revealing significantly higher expression of PRF1 (p = 0.031) in the COMB-low group than in the COMB-high group (Fig. S14a). However, no significant difference was observed in the expression of GZMA (p = 0.26). Similarly, expression of the immune checkpoint gene PDCD1 was significantly upregulated in the COMB-low group (p = 7.5e-05). Furthermore, correlation analysis demonstrated that the expression of both PRF1 and PDCD1 was significantly negatively correlated with the COMB-Radscore (Fig. S14c). Similar results were also observed in the TCIA cohort (Fig. S14b, d).

To further validate the disparities in CD8+ T cell quantity and function within the tumor immune microenvironment between the two patient groups, we conducted multiplex immunofluorescence staining on pathological tissue sections from 31 patients in the training and testing cohorts. The results showed a higher density of CD3+CD8+ T cells, CD3+CD8+PRF1+ T cells, CD3+CD8+PD1+ T cells, and CD3+CD8+PD1+PRF1+ T cells (all p < 0.05) in the COMB-low group compared to the COMB-high group (Fig. 6d). These findings suggest that patients with a low COMB-Radscore exhibit an immune-inflamed tumor microenvironment and are more likely to benefit from immunotherapy.
