Non-invasive multimodal CT deep learning biomarker to predict pathological complete response of non-small cell lung cancer following neoadjuvant immunochemotherapy: a multicenter study

Introduction

Lung cancer remains the most lethal malignancy worldwide, characterized by a substantial incidence and mortality rate.1 Prognosis in advanced-stage non-small cell lung cancer (NSCLC) is relatively poor, making neoadjuvant treatment essential for tumor downstaging prior to surgical resection. Recently, the integration of immunochemotherapy into neoadjuvant regimens has marked a significant advancement, yielding improved progression-free survival and overall survival.2

The advent of neoadjuvant immunochemotherapy has been paralleled by an increase in the proportion of patients achieving a pathological complete response (pCR), with reported rates ranging from 24% to 63%.3–5 The extent of residual tumor after surgery is a critical prognostic factor, with lower residual tumor burden correlating with a more favorable prognosis.6 In patients with NSCLC, pCR is a valuable predictor of survival.7 Furthermore, patients who achieve pCR may benefit from a “wait-and-see” strategy, similar to the observation strategies used in rectal cancer8 and esophageal cancer.9 10 This strategy can help prevent overtreatment and may lead to long-term survival.11 Studies have shown that excessive lymph node dissection in lung cancer can reduce the effectiveness of immunotherapy.12 For patients with pCR, avoiding extensive lymph node dissection may therefore improve outcomes of immune maintenance treatment. However, the definitive assessment of pCR still relies primarily on histopathological examination of the resected tumor, underscoring the need for a non-invasive, accurate method to identify pCR after immunochemotherapy.

CT imaging is a standard, cost-effective tool for monitoring tumor response to treatment in lung cancer.13 However, when evaluating treatment response to neoadjuvant immunochemotherapy, there exists a substantial disparity between radiological and pathological evaluations. In the NADIM trial, 33% of patients with stable disease and 73% with partial response on imaging were found to have pCR.6 This difference is often due to pseudo-lesions resulting from lymphocyte infiltration, where radiological findings might not reflect actual tumor regression. Therefore, it is necessary to refine methodologies to bridge the gap between radiological and pathological evaluations in the context of neoadjuvant immunochemotherapy.

With the development of artificial intelligence (AI), radiomics or deep learning features extracted from CT can provide additional information that supports diagnosis, treatment, and prognosis in lung cancer.14 15 Several studies have investigated the predictive value of quantitative imaging features for neoadjuvant immunochemotherapy response in NSCLC.4 16 17 However, conventional deep learning feature extraction methods are typically based on models trained on natural images or limited datasets, which limits their applicability in medical research. To address this limitation, we previously developed a pretrained foundation model for lung cancer CT images (FM-LCT) using large and diverse datasets. To our knowledge, no prior study has used both non-contrast enhanced and contrast enhanced CT images to extract deep learning features and build prediction models for neoadjuvant immunochemotherapy response in NSCLC. In this study, we used FM-LCT to extract deep learning features from non-contrast enhanced and contrast enhanced CT images obtained before treatment, fused these features, and constructed a machine learning model to predict pCR. We expected this fused approach to predict pCR more accurately than either modality alone.

Here, we introduce an AI-based prediction model that uses deep learning features from pretreatment multimodal CT scans to predict pCR in patients with NSCLC undergoing neoadjuvant immunochemotherapy. Our method enables the multidimensional characterization of tumor features during neoadjuvant immunochemotherapy in NSCLC and has successfully developed a non-invasive biomarker to distinguish pCR. This approach has the potential to aid clinical decision-making in resectable NSCLC patients, prevent overtreatment, and facilitate personalized and precise cancer treatment.

Materials and methods

Study data

This retrospective study is registered at http://clinicaltrials.gov (identifier: NCT06285058). We screened 295 patients with NSCLC who underwent surgery after neoadjuvant immunochemotherapy at Center A (Tongji Medical College Affiliated Union Hospital), Center B (Zhengzhou University First Affiliated Hospital), Center C (Yichang Central Hospital), and Center D (Anyang Cancer Hospital) between August 2019 and February 2023.

Inclusion criteria were: (1) NSCLC diagnosis confirmed by biopsy pathology, clinically staged IB to III; (2) completion of at least two cycles of neoadjuvant immunochemotherapy; (3) postoperative pathological evaluation of tumor and lymph nodes according to International Association for the Study of Lung Cancer (IASLC) guidelines. Exclusion criteria were: (1) absence of pretreatment CT images; (2) CT imaging performed more than 1 month before treatment initiation; (3) incomplete clinicopathological data.

Patient selection is shown in the flowchart in figure 1. The clinicopathological data included age, gender, smoking history, tumor and family history, pretreatment neutrophil-to-lymphocyte ratio, pretreatment serum lactate dehydrogenase, pretreatment albumin, pretreatment clinical stage, tumor location, pathological type, and postoperative pathological response. Tumor, node, metastasis (TNM) staging followed the 8th edition of the IASLC TNM staging system.18 The reporting of this study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines.19

Figure 1

Flowchart shows patient exclusion for each dataset. IASLC, International Association for the Study of Lung Cancer; NSCLC, non-small cell lung cancer.

Pretreatment evaluation and neoadjuvant administration

Before neoadjuvant treatment, a comprehensive evaluation was conducted to diagnose and stage the tumor. This evaluation involved various preoperative examinations, including enhanced MRI of the head, contrast enhanced CT of the chest, abdominal ultrasound, whole-body bone scintigraphy, or positron emission tomography-CT. Pathological diagnosis was obtained through procedures such as bronchoscopic tissue biopsy, endobronchial ultrasound-guided transbronchial needle aspiration, or CT-guided percutaneous needle biopsy. Typically, patients received 2–4 cycles of neoadjuvant immunochemotherapy at 3-week intervals. The treatment regimen comprised first-line immune checkpoint inhibitors, such as pembrolizumab (200 mg), nivolumab (360 mg), durvalumab (1500 mg), nivolumab (200 mg), tislelizumab (200 mg), or camrelizumab (200 mg), combined with platinum-based chemotherapy as recommended by guidelines. Approximately 4 weeks after completion of neoadjuvant treatment, surgical resection and systematic lymph node dissection were performed. The specific treatment plan, number of treatment cycles, and operation schedule for each patient were determined through multidisciplinary consultation at the participating institutions.

Histopathological assessment and definition of pCR

Thoracic surgeons (YQ and CZ) with pathological diagnosis expertise collaborated with pathologists to evaluate treatment responses at each site. This evaluation followed the IASLC guidelines,20 defining pCR as the absence of viable tumor cells in both the tumor bed and lymph nodes (ypT0 and ypN0). Discrepancies were resolved through review by two pathologists, reaching a consensus via discussion.

CT acquisition

CT scans had slice thicknesses ranging from 0.625 mm to 1.25 mm. For contrast enhanced scans, each patient received an intravenous injection of 60–80 mL of non-ionic iodinated contrast medium (350 mg/mL) at an injection rate of 2–3 mL/s. Images were reconstructed using bone reconstruction algorithms; detailed acquisition and reconstruction parameters are available in online supplemental file S1. The chest CT images were analyzed retrospectively using a window width of 1600 HU and a window level of −600 HU. A radiologist (GW) and a thoracic surgeon (GY) performed region of interest (ROI) segmentation using ITK-SNAP (V.3.8.0, available at http://www.itksnap.org/), and disagreements were resolved through discussion to reach consensus. After the primary radiologist completed tumor lesion segmentation, a senior radiologist reviewed the quality of each ROI and made adjustments where necessary. To assess robustness, 50 randomly selected cases were used to estimate intraclass correlation coefficients (ICCs), with an ICC ≥0.75 indicating robustness. Figure 2 illustrates the overall research design.
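As a minimal sketch of the robustness check described above, the function below computes a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1). The text does not state which ICC form was used or exactly which quantity was compared between readers; treating one feature value per case from each reader's segmentation as the "ratings" is an assumption, and the data below are simulated.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_subjects, k_raters), e.g. one feature value
    per case from each reader's segmentation (assumed setup).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    # Sums of squares for subjects (rows), raters (columns), and residual error
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    # Mean squares from the two-way ANOVA decomposition
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical example: 50 randomly selected cases, two readers
rng = np.random.default_rng(0)
true_values = rng.normal(0.0, 1.0, size=50)
reader_1 = true_values + rng.normal(0.0, 0.2, size=50)
reader_2 = true_values + rng.normal(0.0, 0.2, size=50)
icc = icc_2_1(np.column_stack([reader_1, reader_2]))
robust = icc >= 0.75  # robustness threshold stated in the text
```

Absolute agreement (rather than consistency) penalizes a systematic offset between readers, which is usually the stricter choice for segmentation reproducibility.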

Figure 2

An illustration of the overall research design. (a) Pretraining of the foundation model using the masked autoencoder method. (b) Deep learning feature extraction, and prediction model training and testing. (c) Quantitative analysis and evaluation, including patient characteristics analysis, prediction model evaluation, and quantitative imaging feature analysis. LUNAI-eCT, contrast enhanced CT deep learning features model; LUNAI-fCT, non-contrast and contrast enhanced CT fused deep learning features model; LUNAI-uCT, non-contrast enhanced CT deep learning features model.

Image preprocessing

Imaging analysis, especially of CT scans, is sensitive to variations in slice thickness and scanner type. To reduce these effects, the original CT images were preprocessed: voxels were resampled to an isotropic size of 1 mm using B-spline interpolation, and intensities were normalized using the z-score method. This provides a consistent basis for subsequent model development and data analysis.
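The two preprocessing steps can be sketched with NumPy and SciPy, assuming the CT volume is a NumPy array ordered (z, y, x) with known voxel spacing in millimetres. `scipy.ndimage.zoom` with `order=3` performs cubic B-spline interpolation; computing the z-score over the whole volume (rather than, for example, only the tumor ROI) is an assumption, and the input here is synthetic.

```python
import numpy as np
from scipy import ndimage


def resample_isotropic(volume, spacing, new_spacing=1.0):
    """Resample a CT volume to isotropic voxels with cubic B-spline interpolation.

    volume: array indexed (z, y, x); spacing: voxel size in mm for each axis.
    """
    zoom_factors = [s / new_spacing for s in spacing]
    # order=3 selects cubic B-spline interpolation in scipy.ndimage
    return ndimage.zoom(volume.astype(np.float64), zoom_factors, order=3)


def zscore(volume):
    """Intensity normalization to zero mean and unit standard deviation."""
    return (volume - volume.mean()) / volume.std()


# Hypothetical scan: 1.25 mm slices with 0.5 mm in-plane pixels
rng = np.random.default_rng(0)
ct = rng.normal(-600.0, 300.0, size=(8, 20, 20))
iso = resample_isotropic(ct, spacing=(1.25, 0.5, 0.5))
normalized = zscore(iso)
print(iso.shape)  # (10, 10, 10)
```

Resampling before normalization means the intensity statistics are computed on the same 1 mm grid for every scan, regardless of the original slice thickness.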

Feature extraction and model development

Given the complexity and wide variability of lung cancer CT images, traditional feature extraction methods often fail to capture essential features effectively. In contrast, deep learning-based feature extraction methods can autonomously learn abstract and complex features from images without manual feature design. However, conventional deep learning feature extraction methods are typically based on models trained on natural images or limited datasets, which limits their applicability in medical research.

To address this limitation, we used our previously developed foundation model, FM-LCT, as the feature extractor. The FM-LCT model was trained on a diverse dataset encompassing various lung cancer types and stages, using masked autoencoder (MAE) contrastive learning algorithms.21 For further details, refer to the project’s repository on GitHub (https://github.com/zhenweishi/FM-LCT).

We used the FM-LCT model to extract two feature sets, FS-uCT and FS-eCT, from the tumor bounding boxes in non-contrast enhanced and contrast enhanced CT images, respectively. Each feature set consisted of a 768-dimensional feature vector. A fused feature set (FS-fCT) was generated by average pooling of the FS-uCT and FS-eCT feature vectors. To prevent overfitting, we reduced feature dimensionality using principal component analysis, retaining 16 components. Correlations between features were assessed using Pearson and Spearman methods. We then used the random forest method to develop three LUng cancer NeoAdjuvant Immunochemotherapy (LUNAI) models: (1) using only non-contrast enhanced CT features (LUNAI-uCT model), (2) using only contrast enhanced CT features (LUNAI-eCT model), and (3) using the fused CT features (LUNAI-fCT model). The output probability of the LUNAI-fCT model was used as an immunochemotherapy treatment response (Immu_TR) score, representing the likelihood that a patient achieves pCR after neoadjuvant immunochemotherapy.
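A minimal scikit-learn sketch of the fusion and modeling steps, assuming the FM-LCT embeddings are already available as 768-dimensional arrays. Average pooling of the two feature sets is implemented here as an element-wise mean, and PCA and the random forest are wrapped in one pipeline so that PCA is fitted on training data only. The helper names, `n_estimators=200`, and the random stand-in features are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline


def fuse_features(fs_uct, fs_ect):
    """Element-wise average of two (n_patients, 768) feature arrays -> FS-fCT."""
    return (np.asarray(fs_uct, dtype=float) + np.asarray(fs_ect, dtype=float)) / 2.0


def build_model(n_components=16, seed=0):
    """PCA to 16 components followed by a random forest (hyperparameters assumed)."""
    return make_pipeline(
        PCA(n_components=n_components, random_state=seed),
        RandomForestClassifier(n_estimators=200, random_state=seed),
    )


# Random stand-ins for FM-LCT embeddings; NOT real imaging features
rng = np.random.default_rng(0)
n_train, n_test, dim = 100, 30, 768
X_train = fuse_features(rng.normal(size=(n_train, dim)), rng.normal(size=(n_train, dim)))
y_train = rng.integers(0, 2, size=n_train)  # 1 = pCR, 0 = non-pCR
X_test = fuse_features(rng.normal(size=(n_test, dim)), rng.normal(size=(n_test, dim)))

model = build_model()
model.fit(X_train, y_train)  # PCA is learned from the training set only
# Immu_TR-style score: predicted probability of the pCR class (label 1)
immu_tr = model.predict_proba(X_test)[:, 1]
```

Keeping PCA inside the pipeline avoids leaking test-set information into the dimensionality reduction when the model is applied to external cohorts.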

Model evaluation and statistical analysis

Models were evaluated using the area under the receiver operating characteristic curve (AUC) with 95% CIs. Accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were also calculated. To quantify the impact of CT imaging features on model prediction, SHapley Additive exPlanations (SHAP) analysis was conducted.22 To gain insight into how our model makes predictions, we used Gradient-weighted Class Activation Mapping (Grad-CAM) to generate saliency heatmaps.23
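The evaluation metrics can be computed as follows. This is a sketch under stated assumptions: the AUC uses the Mann-Whitney rank formulation, and the percentile bootstrap for the 95% CI is an assumed method, since the text does not say how CIs were derived. The toy labels and scores are constructed only to reproduce the confusion-matrix counts implied by the LUNAI-fCT test metrics reported in the Results (TP=11, FN=1, TN=17, FP=6 among 35 patients) at the 0.434 cut-off; they are not study data.

```python
import numpy as np


def auc_mann_whitney(y_true, scores):
    """ROC AUC as P(score of a random pCR case > score of a random non-pCR case)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count as one half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def threshold_metrics(y_true, scores, cutoff):
    """Confusion-matrix metrics when scores >= cutoff are predicted as pCR."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(scores) >= cutoff).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def bootstrap_auc_ci(y_true, scores, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for the AUC, resampling patients (assumed method)."""
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined unless both classes are present
        aucs.append(auc_mann_whitney(y_true[idx], scores[idx]))
    return tuple(np.percentile(aucs, [2.5, 97.5]))


# Toy data reproducing TP=11, FN=1, TN=17, FP=6 at cut-off 0.434 (not study data)
y_true = np.array([1] * 12 + [0] * 23)
scores = np.array([0.8] * 11 + [0.2] + [0.6] * 6 + [0.1] * 17)
metrics = threshold_metrics(y_true, scores, cutoff=0.434)
```

With these counts the function returns accuracy 0.800, sensitivity 0.917, specificity 0.739, PPV 0.647, and NPV 0.944 after rounding, matching the LUNAI-fCT row described in the Results.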

Statistical analyses were carried out using R software (V.3.6.0, The R Foundation) and Python SciPy (V.1.8.0, Python Software Foundation). A two-tailed p value of less than 0.05 was considered statistically significant. Fisher’s exact test or Pearson’s χ2 test and the Kruskal-Wallis test were used to compare the characteristics of patients in the training and external test datasets.
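Cohort comparisons of this kind can be sketched with `scipy.stats`. Choosing Fisher's exact test over Pearson's χ2 test when a 2×2 table has an expected count below 5 is a common convention assumed here; the text does not state its decision rule. The contingency tables and ages below are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_categorical(table):
    """Pearson chi-square test; Fisher's exact test for sparse 2x2 tables.

    The "expected count < 5" switch is an assumed convention.
    Note: chi2_contingency applies Yates' continuity correction to 2x2 tables by default.
    """
    table = np.asarray(table)
    chi2, p, dof, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < 5).any():
        _, p = stats.fisher_exact(table)
        return "fisher", p
    return "chi2", p


def compare_continuous(*groups):
    """Kruskal-Wallis H test for a continuous variable across cohorts."""
    return stats.kruskal(*groups).pvalue


# Hypothetical data: rows = cohort (training, test), columns = category counts
method_sparse, p_sparse = compare_categorical([[2, 8], [9, 1]])
method_large, p_large = compare_categorical([[60, 53], [70, 42]])
rng = np.random.default_rng(0)
age_train = rng.normal(60, 8, size=113)
age_test = rng.normal(61, 8, size=112)
p_age = compare_continuous(age_train, age_test)
```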

Results

Patient characteristics

A total of 295 patients were initially screened at four centers. We excluded 37 patients with missing or inadequate-quality CT images, 14 patients with an interval of more than 1 month between CT imaging and treatment initiation, and 19 patients with incomplete clinicopathological data. The final training and internal validation datasets comprised 113 patients from Center A, and the test dataset included 112 patients (Center B n=73, Center C n=20, Center D n=19). Table 1 summarizes the clinicopathological information of the patients, and table 2 details patient characteristics across the cohorts. In the training and internal validation datasets, 111 patients had non-contrast enhanced CT images, 108 had contrast enhanced CT images, and 106 had both modalities; the corresponding numbers in the test dataset were 71, 74, and 35.

Table 1

Comparison of clinical and pathological characteristics between pCR group and non-pCR group

Table 2

Patient characteristics across different cohorts

Performance evaluation of prediction models

Figure 3a compares the discrimination performance of the prediction models for pCR. In the test dataset, the LUNAI-uCT, LUNAI-eCT, and LUNAI-fCT models achieved AUCs of 0.762 (95% CI 0.654 to 0.791), 0.797 (95% CI 0.724 to 0.844), and 0.866 (95% CI 0.821 to 0.883), respectively. Figure 3b shows the confusion matrices for these models, with cut-off values of 0.448 for LUNAI-uCT, 0.458 for LUNAI-eCT, and 0.434 for LUNAI-fCT. Figure 3c shows subgroup analyses by age (≥60 vs <60), smoking status (smoking vs non-smoking), and pretreatment overall stage (II vs III–IV).

Figure 3

Performance of the three developed models for predicting pCR. (a) Receiver operating characteristic curves (ROC); (b) the confusion matrix for the three models; (c) the subgroup analysis includes age, smoking status, and pretreatment overall stage in the test dataset. LUNAI-eCT, contrast enhanced CT deep learning features model; LUNAI-fCT, non-contrast and contrast enhanced CT fused deep learning features model; LUNAI-uCT, non-contrast enhanced CT deep learning features model; pCR, pathological complete response.

The evaluation metrics, including accuracy, sensitivity, specificity, PPV, and NPV, are detailed in table 3. In the test dataset, the LUNAI-uCT model achieved an accuracy of 0.676 (95% CI 0.610 to 0.753). It had high sensitivity (0.958) but lower specificity (0.532), indicating a higher rate of false positives. Its PPV was 0.511 and its NPV was 0.962. The LUNAI-eCT model performed better, with an accuracy of 0.716 (95% CI 0.668 to 0.758), a sensitivity of 0.833, and a specificity of 0.660, reflecting a better balance between identifying positive and negative cases. Its PPV was 0.541 and its NPV was 0.892. The LUNAI-fCT model performed best, with an accuracy of 0.800 (95% CI 0.772 to 0.841), a sensitivity of 0.917, and a specificity of 0.739. Its PPV was 0.647 and its NPV was 0.944. These results indicate that the LUNAI-fCT model had the highest discrimination performance for predicting pCR and generalized to the unseen external test dataset.

Table 3

Discrimination performance comparison of the prediction models for pCR

Figure 4a–c shows the Kolmogorov-Smirnov (KS) test of the Immu_TR score. Across the entire patient population, the Immu_TR score derived from fused CT imaging features achieved the highest KS statistic, 0.825 (p<0.001), in distinguishing the pCR and non-pCR groups; the KS statistics for non-contrast enhanced and contrast enhanced CT images were 0.704 (p<0.001) and 0.752 (p<0.001), respectively. The Immu_TR scores were visualized using t-distributed Stochastic Neighbor Embedding (figure 4d–f) and Uniform Manifold Approximation and Projection (figure 4g–i). These visualizations showed that Immu_TR scores based on fused features separated the pCR and non-pCR groups more clearly than the single-modality scores.
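The two-sample KS statistic is the maximum vertical distance between the empirical cumulative distributions of Immu_TR scores in the pCR and non-pCR groups, so a value of 1.0 means the two groups do not overlap at all. A minimal sketch using `scipy.stats.ks_2samp`; the helper name and the scores are hypothetical.

```python
import numpy as np
from scipy import stats


def ks_separation(scores, labels):
    """Two-sample KS test of scores in the pCR (label 1) vs non-pCR (label 0) groups."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    result = stats.ks_2samp(scores[labels == 1], scores[labels == 0])
    return result.statistic, result.pvalue


# Hypothetical scores: fully separated groups vs partially overlapping groups
d_separated, _ = ks_separation([0.7, 0.8, 0.9, 0.1, 0.2, 0.3], [1, 1, 1, 0, 0, 0])
d_overlap, _ = ks_separation([0.6, 0.7, 0.9, 0.2, 0.65, 0.3], [1, 1, 1, 0, 0, 0])
```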

Figure 4

Statistical analysis of the generated Immu_TR scores between pCR and non-pCR groups in all patient populations. Immu_TR, immunochemotherapy treatment response; pCR, pathological complete response.

Important model features and example model interpretation

The deep learning features in the FS-uCT, FS-eCT, and FS-fCT feature sets showed low correlations (see online supplemental file). The SHAP summary plots in figure 5 illustrate feature importance and the effect of each feature on the model’s prediction. In the LUNAI-uCT model (figure 5a), features 1 and 2 had the highest absolute SHAP values (longest bars), indicating a strong influence on predictions. In the LUNAI-eCT model (figure 5b), features 2 and 3 contributed most. In the LUNAI-fCT model (figure 5c), features 2 and 14 had the highest SHAP values, highlighting their substantial contributions to the model’s prediction.

Figure 5

Prediction model interpretation by SHAP analysis for LUNAI-uCT, LUNAI-eCT, and LUNAI-fCT models in pCR prediction. LUNAI-eCT, contrast enhanced CT deep learning features model; LUNAI-fCT, non-contrast and contrast enhanced CT fused deep learning features model; LUNAI-uCT, non-contrast enhanced CT deep learning features model; pCR, pathological complete response; SHAP, SHapley Additive exPlanations.

Figure 6 shows an example implementation of the Immu_TR score generated by the LUNAI-fCT model for two patients with similar clinical characteristics. Before neoadjuvant immunochemotherapy, patient A and patient B had the same clinical T, N, and overall stages, smoking status, pathological type, and lesion location, and similar ages and tumor volumes. Grad-CAM saliency maps indicated that the deep learning features were extracted from intra-tumoral regions. The Immu_TR scores for patients A and B were 0.791 and 0.213, respectively. Pathological evaluation confirmed that patient A achieved pCR, whereas patient B did not.

Figure 6

Schematic shows an example implementation of the Immu_TR score in two patients with similar clinical characteristics, where patient A with pCR has a high Immu_TR score, and patient B with non-pCR has a low Immu_TR score. Grad-CAM saliency maps were generated for both non-contrast enhanced and contrast enhanced CT images. Grad-CAM, Gradient-weighted Class Activation Mapping; Immu_TR, immunochemotherapy treatment response; LUNAI-fCT, non-contrast and contrast enhanced CT fused deep learning features model; pCR, pathological complete response.

Discussion

In our study, we used the FM-LCT foundation model to extract relevant features from non-contrast enhanced and contrast enhanced CT images. The developed models (LUNAI-uCT, LUNAI-eCT, and LUNAI-fCT) showed moderate performance in predicting pCR in patients with NSCLC undergoing neoadjuvant immunochemotherapy. Notably, LUNAI-fCT, which used fused features from both non-contrast enhanced and contrast enhanced CT images, performed best, with an AUC of 0.866 (95% CI 0.821 to 0.883) in the test dataset. A novel Immu_TR score, derived from the predicted probability of the developed models, differed significantly between the pCR and non-pCR groups across all patient populations. This score offers a non-invasive, accurate method for predicting pCR to neoadjuvant immunochemotherapy in patients with NSCLC, potentially helping to identify the individuals most likely to benefit from this treatment strategy.

Immunotherapy is transforming the neoadjuvant treatment landscape of lung cancer. Its role in suppressing tumor-mediated immune evasion and amplifying the host’s immune response to eradicate malignancies is well documented.3 24 Furthermore, the number of lung cancer patients achieving pCR after neoadjuvant immunochemotherapy is gradually increasing.25 The proportion of residual tumor cells after surgery is closely linked to prognosis, confirming that pCR is valuable for predicting survival in patients with NSCLC.26 Studies have demonstrated that an observational management approach, which avoids surgery and preserves the organ, is an effective choice for patients who achieve pCR.11 To prevent overtreatment, lung cancer management can draw on the observational strategies used in rectal cancer8 and esophageal cancer.9 10 Because excessively aggressive lymph node dissection may diminish the efficacy of immunotherapy,12 whether all patients need comprehensive lymph node dissection remains debated. Patients who achieve pCR may benefit from avoiding complete lymph node dissection, leading to more effective immune maintenance therapy. However, pCR can currently be confirmed only by histopathological examination of surgical specimens. Consequently, developing a non-invasive and accurate method to safely identify pCR after immunochemotherapy remains a significant challenge.

The SITC Clinical Immuno-Oncology Network has classified immuno-oncology biomarkers as risk, diagnostic, prognostic, and predictive biomarkers, which can include molecular, histological, radiographic, and physiological indicators.27 Quantitative imaging features captured from CT images may reflect tumor heterogeneity, angiogenesis, and cell density, and hence the biological behavior of tumors.28 29 Related studies have shown that quantitative imaging features extracted from CT images correlate with features of the immune microenvironment and can predict the pathological response of solid tumors.30 Several previous studies have predicted treatment response by extracting radiomics features before and after neoadjuvant immunochemotherapy. Liu et al included 89 lung cancer patients who underwent neoadjuvant immunochemotherapy (64 in the training group and 25 in the validation group); their radiomics model, combined with clinical features, predicted major pathological response (MPR) with an AUC of 0.81 and an accuracy of 0.80 in the validation set.16 Han et al enrolled 206 lung cancer patients who underwent neoadjuvant immunochemotherapy, extracted radiomics features from contrast enhanced CT images acquired before and after treatment, and calculated the percentage change of each feature. These changes were integrated into a machine learning model for MPR prediction, which achieved an AUC of 0.732 and an accuracy of 0.660 in the testing group.31 Yang et al enrolled 110 lung cancer patients who received neoadjuvant immunochemotherapy and developed the NACIP model from radiomics features extracted from pretreatment and post-treatment contrast enhanced CT images; the NACIP model achieved an AUC of 0.85 and an accuracy of 0.81.4 She et al used multicenter data and deep learning features to develop a predictive model for MPR in patients undergoing neoadjuvant immunochemotherapy, achieving an AUC of 0.75 in an external validation cohort.32 However, conventional deep learning feature extraction methods are typically based on models trained on natural images or limited datasets, which limits their applicability in medical research. To address this issue, our study used as the deep learning feature extractor the FM-LCT model, trained with an MAE-based contrastive learning algorithm on datasets containing various types and stages of lung cancer.

Most previous studies used contrast enhanced CT to extract image features and construct models. However, to our knowledge, no study has specifically compared prediction models built from non-contrast enhanced CT, contrast enhanced CT, and their combination. In our study, we separately extracted image features from the non-contrast enhanced and contrast enhanced CT scans acquired before neoadjuvant immunochemotherapy, and then fused these features to examine whether a fused model outperforms single-modality models. We therefore constructed three models: LUNAI-uCT, LUNAI-eCT, and LUNAI-fCT. LUNAI-fCT achieved the highest AUC (0.866) of the three and outperformed the other two models on the additional evaluation metrics, including accuracy (0.800). This performance compares favorably with the AUCs of 0.732 to 0.85 reported in the previous studies above, although most of those studies predicted MPR rather than pCR.

We used Grad-CAM to generate saliency heatmaps that explain how the FM-LCT foundation model extracted deep learning features. The Grad-CAM saliency maps showed that the deep learning features were mainly extracted from intra-tumoral regions but also captured information from the surrounding tumor environment. In the example of two patients, patient A had a high Immu_TR score of 0.791 and patient B a low Immu_TR score of 0.213; pathological evaluation confirmed that patient A achieved pCR and patient B did not. Visualizing the model’s prediction process in this way makes it more intuitive for clinical application.

The study has several limitations. First, as a retrospective study, it may be subject to selection bias. In addition, the training and test groups differed significantly in characteristics such as gender, stage, and pathological type, which may have affected model accuracy; prospective clinical studies are needed to validate the model’s accuracy. Second, although this is a multicenter study, the sample size at each center is small; future research should use larger samples and multicenter validation. Third, the study did not include information from pretreatment biopsy specimens, such as programmed cell death ligand 1 expression, tumor mutational burden, and sequencing data. Future research should integrate multimodal data such as pathology and sequencing to identify effective biomarkers for immunochemotherapy in lung cancer and to enable precise treatment.

In conclusion, LUNAI-fCT, which combines deep learning features extracted from non-contrast enhanced and contrast enhanced CT scans, demonstrated the highest predictive performance for pCR to neoadjuvant immunochemotherapy in patients with NSCLC. As an imaging biomarker, LUNAI-fCT can non-invasively predict pCR to neoadjuvant immunochemotherapy in NSCLC. This capability has the potential to aid clinical decision-making in patients with resectable NSCLC, prevent overtreatment, and facilitate personalized and precise cancer treatment.
