Identifying immunohistochemical biomarkers panel for non-small cell lung cancer in optimizing treatment and forecasting efficacy

Patient demographics

This study analyzed data from 140 NSCLC patients: 69 received immunotherapy and 71 received chemotherapy (Table 1). None of the patients harbored a targetable driver alteration with a therapy approved by the European Medicines Agency (EMA). In the immunotherapy group, the median age was 66 years and 73.9% of patients were male. Non-squamous cell carcinoma was the most common histological type (37 patients, 53.6%), followed by squamous cell carcinoma (32 patients, 46.4%). At diagnosis, 25 patients (36.2%) had stage III disease and 44 (63.8%) had stage IV disease. The chemotherapy group was similar: the median age was 66 years and 78.9% of patients were male. Non-squamous cell carcinoma was observed in 42 patients (59.2%) and squamous cell carcinoma in 29 patients (40.8%). At diagnosis, 32 patients (45.1%) had stage III disease and 39 (54.9%) had stage IV disease.
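The percentages above follow directly from the group counts. As a quick consistency check, the short Python sketch below recomputes them from the counts reported in this paragraph; the dictionary layout and function names are illustrative only and are not part of the study's analysis code.

```python
# Counts reported in the "Patient demographics" paragraph (see Table 1).
GROUP_SIZES = {"immunotherapy": 69, "chemotherapy": 71}
COUNTS = {
    "immunotherapy": {"non-squamous": 37, "squamous": 32, "stage III": 25, "stage IV": 44},
    "chemotherapy": {"non-squamous": 42, "squamous": 29, "stage III": 32, "stage IV": 39},
}


def percent(count: int, total: int) -> float:
    """Share of `total` in percent, rounded to one decimal as in the text."""
    return round(100 * count / total, 1)


def summarize() -> dict:
    """Check that histology and stage each partition a group, then compute percentages."""
    table = {}
    for group, counts in COUNTS.items():
        n = GROUP_SIZES[group]
        if counts["non-squamous"] + counts["squamous"] != n:
            raise ValueError(f"histology counts do not sum to {n} in {group}")
        if counts["stage III"] + counts["stage IV"] != n:
            raise ValueError(f"stage counts do not sum to {n} in {group}")
        table[group] = {key: percent(value, n) for key, value in counts.items()}
    return table


if __name__ == "__main__":
    print(sum(GROUP_SIZES.values()))  # total cohort size
    for group, shares in summarize().items():
        print(group, shares)
```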

Table 1 Demographics and clinical characteristics of patients

Therapeutic decision-making based on IHC and patient characteristics

Tumor identification and treatment selection rely heavily on pathological examination and clinical guidelines, but clinicians' expertise also plays a significant role in this process. We therefore developed a machine-learning model to provide clinicians with automated treatment recommendations. The model uses a supervised binary classification algorithm to predict the effectiveness of immunotherapy and chemotherapy from patient characteristics and IHC biomarker results. We used LightGBM, an ensemble machine-learning algorithm, to establish the relationship between these inputs and the output. LightGBM is an efficient gradient-boosting decision tree algorithm that uses gradient-based one-side sampling and exclusive feature bundling to scale to large numbers of samples and features. Its histogram-based approach reduces the number of candidate split points, which shortens training time and can improve performance [15]. The model achieved an accuracy of 82.1%, a precision of 81.2%, a recall of 82.1%, and an F1 score of 81.6% (confusion matrix in Fig. 1B). In the validation group, the AUCs for both therapeutic regimens (chemotherapy and immunotherapy) were 0.93 (Fig. 1A). The algorithm also identified the most important features, including PD-L1, Ki67, p63, tumor stage, and napsin A (Fig. 1C).
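For the treatment-regimen model, the reported recall equals the reported accuracy (both 82.1%). That is what one would expect if precision, recall, and F1 were averaged over the two classes weighted by class support (for example, the `average="weighted"` option of scikit-learn's `precision_recall_fscore_support`), because support-weighted recall reduces algebraically to accuracy. The text does not state which averaging was used, so this is an assumption; the plain-Python sketch below, run on hypothetical labels, only illustrates the identity.

```python
from collections import Counter


def per_class_counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging)."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, support in Counter(y_true).items():
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn)  # tp + fn == support > 0 for every true label
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / n  # share of samples whose true label is `label`
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical validation labels, not study data.
    y_true = ["chemo", "chemo", "chemo", "chemo", "immuno", "immuno"]
    y_pred = ["chemo", "chemo", "chemo", "immuno", "immuno", "immuno"]
    print(weighted_metrics(y_true, y_pred))
```

Summing (support / n) × (TP / support) over the classes gives total TP / n, which is exactly the accuracy; precision and F1 have no such identity and can differ from it.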

Fig. 1

Prediction of therapeutic regimen from clinical features and IHC results by the LightGBM model. (A) Receiver operating characteristic (ROC) curves of the prediction model in the validation cohort; (B) Confusion matrix of the prediction model in the validation cohort; (C) Top 10 important biomarkers from the LightGBM model

Prognostic prediction by machine-learning models

To distinguish patients with good and poor prognoses, we divided them into two groups by progression-free survival (PFS) time (< 180 days and ≥ 180 days). Using the LightGBM model with patient characteristics, IHC results, and therapeutic regimens as inputs, we predicted the PFS group with an accuracy of 82.1%, a precision of 82.3%, and a recall of 82.1%. The F1 score, which combines precision and recall, was also 82.1%. In the validation group, the AUCs for PFS < 180 days and PFS ≥ 180 days were both 0.89 (Fig. 2A). The algorithm also identified the most important features, including Ki67, PD-L1, TTF-1, CK5/6, and age (Fig. 2C).
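The prognostic task is framed as binary classification on PFS with a 180-day cutoff. With only two classes, the two class-wise (one-vs-rest) AUCs are necessarily identical: scoring the "< 180 days" class by 1 − p simply flips every positive/negative comparison. This is consistent with the paired AUC values reported above. The plain-Python sketch below shows the labeling rule and a rank-based AUC on hypothetical data; the function names are illustrative, and the probabilities stand in for a model's predicted probability of PFS ≥ 180 days.

```python
def pfs_label(pfs_days: float, cutoff_days: float = 180.0) -> int:
    """1 = PFS >= cutoff (better prognosis), 0 = PFS < cutoff."""
    return int(pfs_days >= cutoff_days)


def roc_auc(labels, scores) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical PFS times and predicted P(PFS >= 180 days), not study data.
    pfs_days = [90, 400, 150, 200, 600, 170]
    p_long = [0.2, 0.8, 0.75, 0.7, 0.9, 0.1]
    labels = [pfs_label(d) for d in pfs_days]
    auc_long = roc_auc(labels, p_long)
    # One-vs-rest AUC for the "< 180 days" class: flip both labels and scores.
    auc_short = roc_auc([1 - y for y in labels], [1 - p for p in p_long])
    print(labels, auc_long, auc_short)
```

The pairwise form is O(n²) but mirrors the definition directly, which is adequate for cohorts of this size.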

To test the model's prognostic performance more rigorously, we applied it to external datasets from The Cancer Genome Atlas (TCGA). Surprisingly, when the model trained on our data was used to make predictions for the TCGA datasets, the accuracy, precision, recall, and F1 score were 96.8%, 97.0%, 96.5%, and 96.7%, respectively. In the TCGA validation group, the AUCs for both PFS classes (< 180 days and ≥ 180 days) were 0.98 (Fig. 2D). The confusion matrix for this validation cohort (Fig. 2E) further shows how accurately the LightGBM model predicted each class.

Fig. 2

Prognostic prediction from clinical features, IHC results, and therapeutic regimen by the LightGBM model. (A) ROC curves of the prediction model in the validation cohort; (B) Confusion matrix of the prediction model in the validation cohort; (C) Top 10 important biomarkers from the LightGBM model; (D) ROC curves of the prediction model in the validation cohort from external TCGA datasets; (E) Confusion matrix of the prediction model in the validation cohort from external TCGA datasets

Analysis of vital biomarkers

According to the LightGBM model, IHC results could help guide treatment decisions and serve as prognostic markers. We therefore focused on the critical biomarkers identified in Figs. 1C and 2C.

Survival did not differ significantly between patients treated with chemotherapy and those treated with immunotherapy (Fig. S1). Nonetheless, patients with either squamous or non-squamous cell carcinoma were more likely to receive immunotherapy when PD-L1 was highly expressed. Representative PD-L1 IHC results for the chemotherapy and immunotherapy groups are shown in Fig. 3A and B, respectively, and the difference in PD-L1 expression between the two groups was statistically significant (p < 0.001; Fig. 3C). Among patients with PD-L1 ≥ 50%, 95% received immunotherapy, whereas among those with PD-L1 < 50%, 70% received chemotherapy. Furthermore, among patients who received immunotherapy, those with a PD-L1 tumor proportion score (TPS) ≥ 50% had a longer median PFS (mPFS) than those with TPS < 50% (470 vs. 180 days, p = 0.002; Fig. 3D).
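The mPFS values in this section are medians read from Kaplan–Meier curves. The text does not specify the software used; as a minimal sketch, the plain-Python functions below implement the standard product-limit estimator and take the median as the first event time at which estimated survival falls to 0.5 or below. The toy follow-up times are hypothetical.

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time.

    events[i] is 1 if progression/death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    for t, tied in groupby(data, key=lambda row: row[0]):
        tied = list(tied)
        deaths = sum(e for _, e in tied)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Everyone observed at t (event or censored) leaves the risk set afterwards.
        at_risk -= len(tied)
    return curve


def median_survival(times, events):
    """First event time with S(t) <= 0.5; None if the curve never reaches 0.5."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical PFS times in days; 0 in `events` marks a censored patient.
    times = [100, 150, 200, 250, 300, 350]
    events = [1, 1, 0, 1, 1, 0]
    print(kaplan_meier(times, events))
    print(median_survival(times, events))
```

Censored patients still count in the risk set at their own time point, which is the usual convention when an event and a censoring coincide.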

Fig. 3

Representative images of IHC expression patterns of PD-L1 in the chemotherapy (A) and immunotherapy (B) groups; (C) Expression of PD-L1 in the chemotherapy and immunotherapy groups; (D) Kaplan–Meier curves for PFS by PD-L1 expression in the immunotherapy group. A PD-L1 TPS of 50% is used as the cutoff
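PD-L1 status in Fig. 3D is dichotomized at a TPS of 50%. TPS is conventionally defined as the percentage of viable tumor cells showing partial or complete membrane PD-L1 staining; in routine practice a pathologist estimates it visually rather than from exact cell counts. The helpers below are a hypothetical illustration of that definition and of the 50% cutoff, not the study's scoring procedure.

```python
def tumor_proportion_score(pdl1_positive_cells: int, viable_tumor_cells: int) -> float:
    """TPS (%) = membrane-stained viable tumor cells / all viable tumor cells x 100."""
    if viable_tumor_cells <= 0:
        raise ValueError("viable_tumor_cells must be positive")
    if not 0 <= pdl1_positive_cells <= viable_tumor_cells:
        raise ValueError("pdl1_positive_cells must lie between 0 and viable_tumor_cells")
    return 100.0 * pdl1_positive_cells / viable_tumor_cells


def pdl1_group(tps: float, cutoff: float = 50.0) -> str:
    """Dichotomize at the cutoff used for Fig. 3D (TPS >= 50% vs. < 50%)."""
    return f"TPS >= {cutoff:g}%" if tps >= cutoff else f"TPS < {cutoff:g}%"


if __name__ == "__main__":
    # Hypothetical cell counts, not study data.
    tps = tumor_proportion_score(120, 200)
    print(tps, pdl1_group(tps))
```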

Fig. 4

Representative images of IHC expression patterns of TTF-1 in the non-squamous (A) and squamous (B) groups; (C) Expression of TTF-1 in the non-squamous and squamous groups; (D) Kaplan–Meier curves for PFS by TTF-1 status in non-squamous cell cancer patients. The threshold between TTF-1 negative and positive is used as the cutoff

TTF-1 was highly expressed in non-squamous NSCLC (Fig. 4A and B), but its expression did not affect clinical decisions (Fig. 4C). It was, however, prognostic: among patients with non-squamous cell cancer, TTF-1-positive patients had a significantly longer mPFS than TTF-1-negative patients (550 vs. 110 days, p < 0.001; Fig. 4D).

Squamous NSCLC patients, in turn, had high expression of p63 and CK5/6 (Fig. 5). Among patients with squamous cell carcinoma, those with high p63 expression had a significantly longer mPFS than those with negative expression (410 vs. 100 days, p < 0.001; Fig. 6B). CK5/6 expression was also higher in squamous cell carcinoma (Fig. 6C). Among these patients, those with moderately or strongly positive CK5/6 expression had a significantly longer mPFS than those with weakly positive expression (550 vs. 160 days, p = 0.0007; Fig. 6D).
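The text reports p-values for the mPFS comparisons without naming the test. The log-rank test is the usual companion to Kaplan–Meier curves, so the sketch below assumes it: at each event time it compares the observed number of events in one group with the number expected under equal hazards, and refers the resulting statistic to a chi-square distribution with one degree of freedom, whose upper tail equals erfc(√(χ²/2)). The group data are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, two-sided p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)  # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)  # events at t, both groups
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)  # events at t, group A
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the margins at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical PFS (days) and event flags for a marker-negative and a marker-positive group.
    print(logrank_test([60, 90], [1, 1], [300, 450], [1, 1]))
```

The statistic is symmetric in the two groups, so swapping the arguments returns the same χ² and p-value.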

Next, we found that CK7 and napsin A were highly expressed in patients with non-squamous carcinoma; however, our PFS analysis revealed no significant differences for either marker (Figs. S2 and S3). Similarly, we found no significant differences related to Ki67 or villin expression (Figs. S4 and S5). Patients with clinical stage III disease had a longer mPFS than those with stage IV disease (590 vs. 220 days, p < 0.001; Fig. S6A). Patients with poorly differentiated tumors also had a worse prognosis (mPFS 240 vs. 550 days, p = 0.002; Fig. S6B). Finally, we found no statistically significant association between PFS and either age or tumor type (Fig. S6C and D).

Fig. 5

Representative images of IHC expression patterns of p63 in the non-squamous (A) and squamous (C) groups, and of CK5/6 in the non-squamous (B) and squamous (D) groups

Fig. 6

Expression of p63 (A) and CK5/6 (C) in the non-squamous and squamous groups; Kaplan–Meier curves for PFS by p63 (B) and CK5/6 (D) expression in the squamous group. The threshold between p63 negative and positive is used as the cutoff for p63; the threshold between weakly and moderately positive is used as the cutoff for CK5/6

Combined diagnosis by IHC panel

Using the LightGBM model, we identified six key indicators that together form a detection panel: PD-L1, TTF-1, p63, CK5/6, disease stage, and degree of differentiation. This panel can predict the optimal therapeutic regimen and PFS, and the heatmap in Fig. 7 illustrates how these indicators differ between groups. PD-L1 in particular can guide treatment selection, because patients with high PD-L1 expression are more likely to benefit from immunotherapy. Moreover, higher PD-L1 expression in the immunotherapy group was associated with longer PFS, indicating better treatment outcomes. Higher expression of TTF-1 and CK5/6 also predicted better therapeutic outcomes. The prognostic roles of disease stage and differentiation are well established: stage III patients had a better prognosis than stage IV patients (Fig. S6A), whereas patients with poorly differentiated tumors had a worse prognosis (Fig. S6B).

Fig. 7

Heat map of median arcsinh-transformed biomarker expression, normalized per batch to a mean of zero. Red indicates that a biomarker is over-expressed and blue that it is under-expressed. The bars at the top of the map indicate patient groups: green denotes patients with PFS < 180 days and deep pink denotes patients with PFS ≥ 180 days; for therapeutic regimen, dark blue represents chemotherapy and pink represents immunotherapy. Each column represents a unique patient sample collected at a specific time point
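The normalization described in this caption can be sketched for a single biomarker as follows. The caption gives neither the arcsinh cofactor nor the exact order of operations, so this sketch assumes a cofactor of 1 and shows only the arcsinh transform followed by per-batch mean-centering; the median summary is not modeled, and all names and values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean


def arcsinh_center_by_batch(values, batches, cofactor=1.0):
    """arcsinh(x / cofactor) for one biomarker, then subtract each batch's mean.

    After centering, the transformed values in every batch average to zero.
    The cofactor of 1.0 is an assumption; the caption does not state one.
    """
    transformed = [math.asinh(v / cofactor) for v in values]
    per_batch = defaultdict(list)
    for value, batch in zip(transformed, batches):
        per_batch[batch].append(value)
    batch_means = {batch: fmean(vals) for batch, vals in per_batch.items()}
    return [value - batch_means[batch] for value, batch in zip(transformed, batches)]


if __name__ == "__main__":
    # Hypothetical IHC scores for one biomarker in two staining batches.
    scores = [0.0, 10.0, 20.0, 5.0]
    batches = ["batch1", "batch1", "batch2", "batch2"]
    print(arcsinh_center_by_batch(scores, batches))
```

For a full heatmap, the same function would be applied to each biomarker row separately; arcsinh behaves like a log for large values while remaining defined at zero.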
