HE4-based nomogram for predicting overall survival in patients with idiopathic pulmonary fibrosis: construction and validation

IPF patients show an elevated expression of HE4 protein but not HE4 gene

We compared gene expression between 20 normal controls and 112 IPF patients profiled on the GPL14550 platform in the GSE70866 data set, and identified 379 DEGs, of which 207 were upregulated and 172 were downregulated (Fig. 1A, B). HE4 gene expression did not differ between the two groups (Fig. 1C). However, serum HE4 protein levels were significantly higher in IPF patients than in normal controls (p < 0.001, Fig. 1D).

Fig. 1

Expression of HE4 in IPF patients. A Volcano plot of DEGs between IPF patients and normal controls (NC) in the GSE70866–GPL14550 data set. B Heatmap of the expression of DEGs. C HE4 gene expression in IPF patients compared to the NC group. D Protein levels of HE4 in IPF patients compared to the NC group (p < 0.001). ***p < 0.001, ns: non-significant

Next, we analyzed the association between HE4 expression and the clinical characteristics of IPF patients. In the training cohort, HE4 gene expression did not differ significantly by gender, age, or GAP index (Fig. 2A–C), although it was slightly higher in patients with a GAP index of 6–8 than in those with 0–3 (p = 0.058). Similar results were found in the validation cohort (Fig. 2D, E), and elevated HE4 gene expression was significantly associated with a high GAP index (GAP 4–5 vs. GAP 0–3: p = 0.019, GAP 6–8 vs. GAP 0–3: p = 0.009) (Fig. 2F). HE4 protein levels were also compared among IPF patients. They did not differ by gender, but were higher in elderly patients (age > 65 vs. ≤ 65: p = 0.011) (Fig. 2G, H). HE4 protein level was also positively associated with GAP index (GAP 6–8 vs. GAP 0–3: p = 0.007, GAP 6–8 vs. GAP 4–5: p = 0.043) (Fig. 2I).

Fig. 2

Association between HE4 expression and clinical characteristics of IPF patients. Data are shown for the association between HE4 gene expression and A age, B gender, and C GAP index in the training set, and D–F in the validation set. The protein levels of HE4 in different subgroups of G age, H gender, and I GAP index. *p < 0.05, **p < 0.01, ns: non-significant

High expression of HE4 predicts poor prognosis

Our previous study showed that high levels of HE4 protein in IPF patients correlated with poor OS. Here, we focused on the prognostic value of the HE4 gene. Based on Kaplan–Meier plots (Fig. 3A, C), elevated HE4 gene expression was significantly associated with poor OS in both the training and validation cohorts. The HR and 95% CI were 2.62 (1.61–4.24) for the training set and 4.50 (1.76–11.53) for the validation set, with p values of < 0.001 and 0.002, respectively.

Fig. 3

HE4 gene exhibits prognostic value in IPF in both training and validation sets. A Kaplan–Meier curves in training set. B TimeROC curves in training set. C Kaplan–Meier curves in validation set. D TimeROC curves in validation set

TimeROC curves were drawn to further evaluate the prognostic value of HE4 (Fig. 3B, D). The AUCs for predicting 1-, 2-, and 3-year survival were 0.650, 0.707, and 0.722 in the training set, and 0.773, 0.771, and 0.830 in the validation set. These results indicate that HE4 gene level has prognostic value in IPF.

Construction and validation of a HE4 gene-based prognostic model

To assess the utility of HE4 as a prognostic factor in IPF, we constructed a prognostic model from age, gender, GAP index, and HE4 gene expression. In univariate and multivariate Cox regression analyses, GAP index and HE4 gene level were found to be independent prognostic factors (Table 2). A prognostic model was therefore built on HE4 gene level and GAP index, and visualized as a nomogram (Fig. 4D). The formula of the model is: risk-score = 0.16222182 * HE4 + 0/0.37580659/1.05003609 (for GAP index 0–3/4–5/6–8) + (−1.1183375). The C-index of the model was 0.649. The risk-score was calculated for each patient, and the median value was 0.4027. Patients were assigned to low- or high-risk groups, as displayed in Fig. 4A. Kaplan–Meier analysis showed that IPF patients in the high-risk group had lower overall survival (OS) than those in the low-risk group (HR: 3.49, 95% CI 2.10–5.80, p < 0.001) (Fig. 4B). The sensitivity and specificity of the model were also evaluated using time-dependent ROC analysis: the areas under the ROC curve (AUCs) for 1-, 2-, and 3-year survival were 0.639, 0.712, and 0.766, respectively (Fig. 4C). The calibration curve of the nomogram (Fig. 4E) showed good agreement between predicted and observed survival. DCA was performed to measure the clinical usefulness of the nomogram. The net benefit of the nomogram was slightly better than that of the GAP index in predicting 2-year prognosis (Fig. 4F–H).
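The risk-score formula above can be transcribed almost directly into code. The following Python sketch is illustrative only and is not the authors' implementation: the names `gene_risk_score`, `gap_category`, and `GAP_TERM` are hypothetical, the example patient is invented, and the HE4 expression value is assumed to be on the same scale as the data the model was fitted to.

```python
# Coefficients copied from the reported formula for the HE4 gene-based model.
HE4_COEF = 0.16222182
INTERCEPT = -1.1183375
# Additive term for each GAP index category (GAP 0-3 contributes 0).
GAP_TERM = {"0-3": 0.0, "4-5": 0.37580659, "6-8": 1.05003609}


def gap_category(gap_index: int) -> str:
    """Map a GAP index (0-8) to the three categories used in the formula."""
    if not 0 <= gap_index <= 8:
        raise ValueError("GAP index must be between 0 and 8")
    if gap_index <= 3:
        return "0-3"
    if gap_index <= 5:
        return "4-5"
    return "6-8"


def gene_risk_score(he4_expression: float, gap_index: int) -> float:
    """risk-score = HE4_COEF * HE4 + GAP term + intercept."""
    return HE4_COEF * he4_expression + GAP_TERM[gap_category(gap_index)] + INTERCEPT


# Hypothetical patient (values invented for illustration only).
score = gene_risk_score(he4_expression=8.0, gap_index=5)
print(f"risk-score = {score:.4f}")
```

For this invented patient the score works out to about 0.555, which lies above the reported training-set median of 0.4027.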

Table 2 Results of univariate and multivariate Cox regression analyses in the training set

Fig. 4

Construction of the risk model in the training cohort. A Distribution and survival status of patients based on the risk model. The left side of the dotted line: low-risk population. The right side: high-risk population. B Kaplan–Meier curves for the OS of patients in the low- and high-risk groups. C Time-dependent ROC curves of 1, 2, and 3 years. D Nomogram for prediction of overall survival rates in IPF patients based on the result of multivariate Cox regression analysis. E Calibration curves of the nomogram prediction of 1-, 2-, and 3-year OS rates in IPF patients. F 1-Year DCA curve of the nomogram. G 2-Year DCA curve of the nomogram. H 3-Year DCA curve of the nomogram

To evaluate its prognostic efficacy, the model was applied to the validation set. A risk score was calculated for each patient in the validation cohort, and patients were divided into low- and high-risk groups using the cutoff value of 0.4027. As a result, 29 patients were assigned to the low-risk group and 35 to the high-risk group. IPF patients with a high risk-score had a higher death rate (Fig. 5A). Kaplan–Meier analysis further validated this result (HR: 6.00, 95% CI 2.04–17.67, p = 0.001) (Fig. 5B). The AUCs for 1-, 2-, and 3-year survival were 0.784, 0.814, and 0.874 (Fig. 5C). Calibration (Fig. 5D) and DCA curves for 1, 2, and 3 years (Fig. 5E–G) suggested efficacy similar to that in the training set.
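The key point of this validation step is that the cutoff is fixed from the training set (0.4027) rather than re-estimated on the validation data. A minimal Python sketch of that stratification follows. The function name `stratify`, the patient identifiers, and the scores are hypothetical. The text does not say how a score exactly equal to the cutoff is handled, so assigning it to the low-risk group is an assumption.

```python
from typing import Dict, List

# Median risk-score reported for the training set, reused unchanged for validation.
TRAINING_MEDIAN_CUTOFF = 0.4027


def stratify(scores: Dict[str, float],
             cutoff: float = TRAINING_MEDIAN_CUTOFF) -> Dict[str, List[str]]:
    """Split patients into low- and high-risk groups at a fixed cutoff.

    Assumption: scores strictly above the cutoff are high risk; a score
    equal to the cutoff is treated as low risk.
    """
    groups: Dict[str, List[str]] = {"low": [], "high": []}
    for patient_id, score in scores.items():
        groups["high" if score > cutoff else "low"].append(patient_id)
    return groups


# Hypothetical validation-cohort scores (invented for illustration only).
validation_scores = {"P1": 0.12, "P2": 0.4027, "P3": 0.91, "P4": -0.30}
groups = stratify(validation_scores)
print({name: len(ids) for name, ids in groups.items()})
```

Because the cutoff comes from the training data, the two groups need not be equal in size in a new cohort, which is consistent with the 29/35 split reported above.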

Fig. 5

Assessment of the risk model in the validation cohort. A Risk scores were calculated for each patient using the model above, with a cutoff value of 0.4027 for low- and high-risk groups. The distribution and survival status of these patients were plotted. B Kaplan–Meier curves. C Time-dependent ROC curves of 1, 2, and 3 years. D Calibration curves. E DCA curve of 1 year. F DCA curve of 2 years. G DCA curve of 3 years

Construction of a HE4 protein-based prognostic model

Since HE4 protein levels were correlated with the clinical characteristics of IPF, we considered HE4 protein as a potential prognostic biomarker. We constructed a prognostic model incorporating age, gender, smoking history, GAP index, and the levels of HE4 and KL-6 proteins. KL-6 is a glycoprotein mainly secreted by type II alveolar epithelial cells and glandular cells. As part of the tissue repair process, it plays a key role in IPF pathophysiology. A number of studies have confirmed that an increase in KL-6 levels indicates a poor prognosis for patients with IPF. Among the 59 IPF patients, 16 were unable or refused to undergo pulmonary function testing, so their GAP index could not be calculated. After excluding these patients with incomplete clinical information, 43 patients were included in the analysis.

Table 3 shows the results of univariate and multivariate Cox regression analyses. As in the gene-based model, HE4 protein level and GAP index were independent prognostic factors, and both were subsequently used to draw a nomogram (Fig. 6D). The formula of the model is: risk-score = 0.00427293 * HE4 + 0/1.04647188/1.16579674 (for GAP index 0–3/4–5/6–8) + (−0.9717294). The C-index of the model was 0.7. The distribution of patients in the high- and low-risk groups is displayed in Fig. 6A. A high risk-score was significantly correlated with poor prognosis (HR: 3.51, 95% CI 1.65–7.48, p = 0.001) (Fig. 6B). The AUCs for 1-, 2-, and 3-year survival were 0.823, 0.820, and 0.758 (Fig. 6C). Calibration and decision curve analyses indicated good predictive performance, especially for 2-year prognosis (Fig. 6E–H).
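One way to read the coefficients in this formula is as log hazard ratios, which is the usual interpretation when a risk score is the linear predictor of a Cox model. That interpretation is an assumption here, not something stated in the text. Under it, exponentiating a coefficient gives the hazard ratio it implies. The sketch below is illustrative only: the helper `hazard_ratio` and the 100-unit increment are hypothetical choices, and "unit" refers to whatever unit HE4 protein was measured in.

```python
import math

# Coefficients copied from the reported HE4 protein-based formula.
HE4_PROTEIN_COEF = 0.00427293
# GAP index terms relative to the GAP 0-3 category (whose term is 0).
GAP_COEF = {"4-5": 1.04647188, "6-8": 1.16579674}


def hazard_ratio(coef: float, delta: float = 1.0) -> float:
    """Hazard ratio implied by a log-hazard coefficient for a change of `delta`.

    Assumption: `coef` is a Cox-model log hazard ratio.
    """
    return math.exp(coef * delta)


hr_he4_per_unit = hazard_ratio(HE4_PROTEIN_COEF)
hr_he4_per_100 = hazard_ratio(HE4_PROTEIN_COEF, delta=100.0)
hr_gap = {category: hazard_ratio(coef) for category, coef in GAP_COEF.items()}

print(f"HE4, per 1 unit:      {hr_he4_per_unit:.4f}")
print(f"HE4, per 100 units:   {hr_he4_per_100:.3f}")
for category, hr in hr_gap.items():
    print(f"GAP {category} vs. GAP 0-3: {hr:.3f}")
```

The constant term (−0.9717294) shifts every patient's score by the same amount, so it cancels out of any such ratio and matters only for where the cutoff falls.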

Table 3 Results of univariate and multivariate Cox regression analyses in clinical samples

Fig. 6

Construction of a new risk model in IPF patients. A Distribution and survival status of patients based on the model. B Kaplan–Meier curves for the OS of patients in the low- and high-risk groups. C Time-dependent ROC curves of 1, 2, and 3 years. D Nomogram for prediction of 4-year OS rates in IPF patients based on HE4 protein levels and GAP index. E Calibration curves of the nomogram. F 1-Year DCA curve of the nomogram. G 2-Year DCA curve of the nomogram. H 3-Year DCA curve of the nomogram
