A prognostic framework for predicting lung signet ring cell carcinoma via a machine learning based cox proportional hazard model

Patient characteristics

We identified 731 LSRCC patients after screening. In our cohort, the largest age group was 22–64 years (47.33%), followed by 65–77 years (37.76%) and 78–85+ years (14.91%), so nearly half of the patients were younger than 65. Male patients accounted for 53.49% and female patients for 46.51%, a roughly balanced sex distribution. By race, 82.08% of patients were white, 10.53% were black, and 7.39% belonged to other ethnic groups. More than half of the patients (58.82%) were married. In total, 56.63% of the patients had tumors on the right side, 36.94% on the left, and 6.43% in other positions. The upper lobe was the most common primary site (43.78%), followed by the lower lobe (22.98%), NOS (21.61%), middle lobe (6.98%), and main bronchus (4.65%). Most patients (52.26%) were diagnosed with T4, followed by T2 (19.43%), T1 (12.59%), TX (9.58%), and T0 (1.23%). N2 was the most common N stage (42.82%), followed by N0 (21.61%), N3 (21.07%), N1 (8.07%), and NX (6.43%). More than half of the patients had distant metastasis at diagnosis (56.77%), 39.67% had no distant metastasis, and metastasis status was unknown in 3.56%. Surgery was performed in 19.70% of patients, 55.54% received chemotherapy, and 35.02% received radiation.

In summary, the majority of patients were male, white, and married. The most common tumor characteristics were right-sided laterality and T4, N2, and M1 stage. In addition, 19.70% of patients received surgery, 55.54% received chemotherapy, 35.02% received radiation, and 24.49% received chemotherapy plus radiation (Table 1).

Table 1 Characteristics of LSRCC patients

Risk factors of OS and CSS

A total of 731 patients were included in the study and randomly assigned to a training cohort (n = 511) and a validation cohort (n = 220), a ratio of approximately 7:3. To identify prognostic factors, we performed univariate and multivariate Cox regression analyses in the training cohort.
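A random split of this kind can be sketched as follows. This is only an illustration: the function name `split_cohort`, the use of integer patient IDs, and the fixed seed are assumptions, and the software actually used for the split is not stated here.

```python
import random

def split_cohort(patient_ids, n_train, seed=0):
    """Randomly split patient IDs into a training list and a validation list."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# Hypothetical IDs 0..730 stand in for the 731 patients.
train_ids, valid_ids = split_cohort(range(731), n_train=511)
print(len(train_ids), len(valid_ids))  # 511 220
```

Fixing the seed keeps the cohort membership stable across reruns, which matters when the training and validation C-indices are later compared.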

According to the univariate Cox analysis, age, chemotherapy, marital status, primary site, surgery, T.Stage, M.Stage, and N.Stage were significantly associated with OS. Single patients (HR: 1.22; CI: 1.01–1.46), patients aged 65–77 (HR: 1.36; CI: 1.12–1.67) or 78–85+ (HR: 2.01; CI: 1.54–2.62) years, and patients with advanced T.Stage, regional lymph node invasion, or distant metastasis had a poorer prognosis. Compared with patients whose primary tumor was located in the main bronchus, those with tumors in the upper lobe (HR: 0.58; CI: 0.38–0.89), lower lobe (HR: 0.49; CI: 0.31–0.76), or middle lobe (HR: 0.47; CI: 0.28–0.80) had a better prognosis. Patients who underwent surgery (HR: 0.29; CI: 0.23–0.38) or chemotherapy (HR: 0.64; CI: 0.53–0.76) also had a better prognosis.

Moreover, in the univariate analysis, age, chemotherapy, laterality, primary site, surgery, M.Stage, and N.Stage were significantly associated with CSS. Patients aged 78–85+ years (HR: 1.80; CI: 1.36–2.40) and those with tumors in other positions (HR: 1.62; CI: 1.11–2.38), regional lymph node invasion, or distant metastasis had a poorer prognosis. Compared with tumors in the main bronchus, tumors in the upper lobe (HR: 0.29; CI: 0.19–0.46), lower lobe (HR: 0.30; CI: 0.18–0.48), middle lobe (HR: 0.25; CI: 0.14–0.44), or NOS (HR: 0.43; CI: 0.27–0.69) were all associated with a better prognosis. Patients who underwent surgery (HR: 0.20; CI: 0.15–0.27) or chemotherapy (HR: 0.73; CI: 0.60–0.89) had a better prognosis (Table 2).

Further, we performed multivariate analysis using Cox proportional hazards regression and found that age, chemotherapy, primary site, surgery, T.Stage, and M.Stage were independent prognostic factors for OS. Patients aged 65–77 (HR: 1.56; CI: 1.25–1.94) or 78–85+ (HR: 1.80; CI: 1.35–2.39) years and those with advanced T.Stage or distant metastasis had a poorer prognosis. Compared with patients whose primary tumor was located in the main bronchus, those with tumors in the upper lobe (HR: 0.67; CI: 0.42–1.05), lower lobe (HR: 0.55; CI: 0.35–0.88), middle lobe (HR: 0.59; CI: 0.34–1.04), or NOS (HR: 0.60; CI: 0.37–0.98) were all associated with a better prognosis. Patients who underwent surgery (HR: 0.20; CI: 0.15–0.27) or chemotherapy (HR: 0.73; CI: 0.60–0.89) had a better prognosis.

For CSS, eight variables (age, chemotherapy, radiation, primary site, surgery, T.Stage, M.Stage, and N.Stage) were identified as independent prognostic factors. Patients aged 65–77 (HR: 1.24; CI: 0.99–1.56) or 78–85+ (HR: 1.52; CI: 1.11–2.09) years and those with advanced T.Stage, regional lymph node invasion, or distant metastasis had a poorer prognosis. Compared with patients whose primary tumor was located in the main bronchus, those with tumors in the upper lobe (HR: 0.67; CI: 0.42–1.05), lower lobe (HR: 0.44; CI: 0.27–0.71), middle lobe (HR: 0.41; CI: 0.23–0.75), or NOS (HR: 0.40; CI: 0.24–0.67) were all associated with a better prognosis. Patients who underwent surgery (HR: 0.25; CI: 0.17–0.36), chemotherapy (HR: 0.41; CI: 0.33–0.52), or radiation (HR: 0.80; CI: 0.65–1.00) all had a better prognosis (Table 3).
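The link between a Cox coefficient, its hazard ratio, and the reported interval can be illustrated with a short sketch. It assumes that the intervals above are 95% Wald intervals, which is not stated explicitly, and the helper names (`hazard_ratio`, `coef_from_ci`, `excludes_one`) are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% interval (assumed confidence level)

def hazard_ratio(beta, se, z=Z_95):
    """Return (HR, lower, upper) from a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_ci(hr, lower, upper, z=Z_95):
    """Approximately recover (beta, se) from a reported HR and its interval."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

def excludes_one(lower, upper):
    """True when the interval excludes HR = 1, i.e. no-effect is ruled out."""
    return lower > 1.0 or upper < 1.0

# Reported multivariate OS values for primary site (reference: main bronchus).
print(excludes_one(0.35, 0.88))  # lower lobe: True
print(excludes_one(0.42, 1.05))  # upper lobe: False
```

An HR below 1 with an interval that excludes 1 indicates a lower hazard than the reference group at the chosen confidence level; when the interval includes 1, only the point estimate favors the group.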

Table 2 Univariate analysis of overall survival and cancer-specific survival

Table 3 Multivariate analysis of overall survival and cancer-specific survival

Comparison of the models based on Cox and random forest algorithms

In both the training and validation cohorts, the C-index of the Cox model was higher than that of the random forest model, indicating that the Cox model discriminated better than the random forest algorithm in this study (Table 4).
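For readers unfamiliar with the C-index, the following minimal sketch computes Harrell's concordance index for right-censored data. Which C-index variant was used in this study is not stated, so Harrell's definition is an assumption, and the four-patient data set is invented purely for illustration. Pairs with tied survival times are skipped for simplicity.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times: follow-up times; events: 1 if death was observed, 0 if censored;
    risk_scores: model output where a larger value means a worse prognosis.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the shorter time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Invented toy data: the third patient is censored at month 12.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risk))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing two models on the same cohort reduces to comparing these fractions.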

Table 4 C-index-based evaluation of survival prediction with Cox regression and random forest

Nomogram construction

Age, chemotherapy, primary site, surgery, T.Stage, and M.Stage were identified as independent prognostic factors via multivariate Cox analysis (all p < 0.05) and were included to establish the nomogram. However, it is important to consider both clinical and statistical significance when choosing variables for inclusion (Iasonos et al. 2008). Therefore, we also included radiation and N.Stage in the predictive model (Fig. 2). To use the nomogram, an individual patient’s value is located on each variable axis, and a line is drawn upward to determine the number of points received for that value. The sum of these points is located on the total points axis, and a line is drawn downward to the survival axis to determine the probability of 1-year, 3-year, and 5-year survival (Zheng et al. 2019). The nomogram revealed that surgery, T.Stage, and chemotherapy had the largest impact on the patient’s prognosis.

Fig. 2 The nomogram of the 1-year, 3-year, and 5-year overall survival of patients in the training cohort
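The procedure for reading a nomogram (look up points per variable, sum them, map the total to a survival probability) can be written as a small lookup. Every number in the sketch below is hypothetical and chosen only to make the arithmetic easy to follow; the real point values and survival axes must be read from the published figure, and only three of the eight variables are shown.

```python
# HYPOTHETICAL point tables; these are not the values from the actual nomogram.
POINTS = {
    "surgery": {"yes": 0, "no": 100},
    "chemotherapy": {"yes": 0, "no": 60},
    "m_stage": {"M0": 0, "M1": 45},
}

# HYPOTHETICAL (total points, 1-year survival probability) marks on the bottom axis.
SURVIVAL_AXIS_1Y = [(0, 0.90), (100, 0.60), (200, 0.10)]

def total_points(patient):
    """Sum the points assigned to each of the patient's variable values."""
    return sum(POINTS[var][patient[var]] for var in POINTS)

def survival_from_points(total, axis):
    """Linearly interpolate between axis marks, clamping at both ends."""
    if total <= axis[0][0]:
        return axis[0][1]
    if total >= axis[-1][0]:
        return axis[-1][1]
    for (x0, y0), (x1, y1) in zip(axis, axis[1:]):
        if x0 <= total <= x1:
            return y0 + (y1 - y0) * (total - x0) / (x1 - x0)

patient = {"surgery": "no", "chemotherapy": "yes", "m_stage": "M1"}
points = total_points(patient)
survival_1y = survival_from_points(points, SURVIVAL_AXIS_1Y)
print(points, round(survival_1y, 3))
```

The interpolation step mirrors drawing a vertical line from the total points axis down to the survival axis; a real nomogram axis is usually nonlinear, which is why several marks are stored rather than a single slope.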

Deep verification

Model 1 was constructed using the six significant factors derived from the multivariate Cox analysis plus two other clinically relevant factors, whereas model 2 used only the six significant factors. We compared the predictive power of the two models and validated them with cross-validation based on repeated bootstrap sampling. The C-index of model 1 was slightly higher both before and after cross-validation, indicating somewhat stronger predictive power (Supplementary Fig. 2).
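One common way to validate a C-index with bootstrap resampling is the optimism correction: refit the model on each bootstrap sample, measure how much better it scores on that sample than on the original data, and subtract the average gap from the apparent C-index. The exact validation procedure used here is not described, so the sketch below is an assumption rather than a reproduction. The data are synthetic, and `fit_sign` is a toy stand-in for fitting a Cox model.

```python
import math
import random

def c_index(times, events, scores):
    """Harrell's C-index; returns 0.5 when no pair is comparable (assumed convention)."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.5

def fit_sign(rows):
    """Toy 'model': use x or -x as the risk score, whichever ranks the rows better."""
    t, e, x = zip(*rows)
    if c_index(t, e, x) >= 0.5:
        return lambda v: v
    return lambda v: -v

def bootstrap_corrected_c(rows, fit, n_boot=100, seed=0):
    """Return (apparent C-index, optimism-corrected C-index)."""
    rng = random.Random(seed)

    def evaluate(model, data):
        t, e, x = zip(*data)
        return c_index(t, e, [model(v) for v in x])

    apparent = evaluate(fit(rows), rows)
    optimism = 0.0
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        model = fit(sample)
        optimism += evaluate(model, sample) - evaluate(model, rows)
    return apparent, apparent - optimism / n_boot

# Synthetic cohort: higher x means a higher hazard; about 20% of rows are censored.
gen = random.Random(1)
rows = []
for _ in range(60):
    x = gen.uniform(-1, 1)
    time = gen.expovariate(math.exp(x))
    event = 1 if gen.random() < 0.8 else 0
    rows.append((time, event, x))

apparent, corrected = bootstrap_corrected_c(rows, fit_sign)
print(round(apparent, 3), round(corrected, 3))
```

Running the same routine for model 1 and model 2 would give a corrected C-index for each, which is the kind of before-and-after comparison described above.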
