Machine learning-based prognostic model for patients with anaplastic thyroid carcinoma

3.1 Characteristics of patients

A total of 1222 ATC patients were ultimately included; 74.4% were aged over 60 years and 744 (60.9%) were male. Caucasians comprised the majority (78.6%). At diagnosis, 70.9% of patients had distant metastases. Regarding treatment, 47.4% underwent surgery, including subtotal or near-total thyroidectomy (STNTT, 20.6%) and total thyroidectomy (TT, 25.8%); 57.3% received radiotherapy (RT); and 44.6% received chemotherapy (CT). The patients were randomly divided into a training set (n = 855) and a validation set (n = 367). Detailed characteristics of both groups are summarized in Table 1.
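The reported split (855 training, 367 validation) corresponds to roughly a 7:3 ratio. The section does not state the ratio, the random seed, or whether the split was stratified, so the following is only an illustration, assuming an unstratified shuffle. `split_train_validation`, `train_fraction=0.7` and `seed=2024` are hypothetical choices.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=2024):
    """Randomly partition patient IDs into training and validation sets.

    train_fraction and seed are illustrative assumptions, not values
    reported in the study.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # dedicated RNG so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


train_ids, valid_ids = split_train_validation(range(1222))
print(len(train_ids), len(valid_ids))  # 855 367
```

A stratified variant (shuffling within strata such as survival status) would keep event rates similar across the two sets; whether that was done here is not stated.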

Table 1 Patient demographics, tumor characteristics and treatment options of patients with anaplastic thyroid carcinoma

3.2 Model development

3.2.1 Cox model

In the training set, univariate Cox proportional hazards regression identified age, marital status, tumor stage, tumor size, surgery, radiotherapy and chemotherapy as significant prognostic factors (all P < 0.05, Table 2). These variables were then entered into a multivariate Cox regression analysis, which showed that age 71–80 years (HR = 1.456, 95% CI 1.192–1.777, P < 0.001), age > 80 years (HR = 1.584, 95% CI 1.263–1.987, P < 0.001), tumor size 41–60 mm (HR = 1.642, 95% CI 1.309–2.059, P < 0.001), distant tumor stage (HR = 1.647, 95% CI 1.396–1.944, P < 0.001), STNTT (HR = 0.669, 95% CI 0.556–0.805, P < 0.001), TT (HR = 0.414, 95% CI 0.342–0.500, P < 0.001), radiotherapy (HR = 0.600, 95% CI 0.513–0.701, P < 0.001) and chemotherapy (HR = 0.576, 95% CI 0.489–0.679, P < 0.001) were significantly associated with the survival of ATC patients (Fig. 1). The Cox model was then developed using the prognostic factors identified in the multivariate analysis.
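Univariate screening fits one Cox model per candidate variable and reads off HR = exp(β) with a Wald 95% CI and P value. As a minimal sketch (not the authors' code, which the section does not describe), the function below fits a single covariate by Newton–Raphson on the Cox partial likelihood, using the Breslow convention for tied event times; the toy data are hypothetical. The multivariate model works the same way with a vector of coefficients, which in practice is usually left to a library such as R's `survival::coxph` or Python's `lifelines.CoxPHFitter`.

```python
import math


def _score_and_information(beta, times, events, x):
    """Score U(beta) and observed information I(beta) of the Breslow
    partial likelihood for a single covariate."""
    score, info = 0.0, 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set R(t_i): subjects still under observation at t_i.
        # Tied event times share the full risk set (Breslow approximation).
        risk = [(x_j, math.exp(beta * x_j)) for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for _, w in risk)
        s1 = sum(xj * w for xj, w in risk)
        s2 = sum(xj * xj * w for xj, w in risk)
        mean = s1 / s0
        score += x_i - mean
        info += s2 / s0 - mean * mean
    return score, info


def fit_univariate_cox(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of h(t | x) = h0(t) * exp(beta * x).

    Returns (HR, 95% CI lower, 95% CI upper, two-sided Wald P value).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(beta, times, events, x)  # at the final beta
    se = 1.0 / math.sqrt(info)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return (math.exp(beta), math.exp(beta - 1.96 * se),
            math.exp(beta + 1.96 * se), p_value)


# Hypothetical toy data: x = 1 marks an adverse binary factor.
times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
events = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
x = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
hr, lo, hi, p = fit_univariate_cox(times, events, x)
print(f"HR = {hr:.3f}, 95% CI {lo:.3f}-{hi:.3f}, P = {p:.3f}")
```

Recoding the covariate as 1 - x gives the reciprocal HR, which is a quick sanity check on the fit.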

Table 2 Univariate Cox proportional hazards regression to identify prognostic factors for patients with anaplastic thyroid carcinoma

Fig. 1

Forest plots of results from multivariate Cox regression analysis. STNTT subtotal or near-total thyroidectomy, TT total thyroidectomy, RT radiotherapy, CT chemotherapy

3.2.2 RSF model

The RSF model was likewise constructed using the training set and incorporated 9 variables: age, sex, race, marital status, tumor size, tumor stage, surgery, RT and CT.
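In an RSF, each tree is grown on a bootstrap sample (usually with log-rank splitting), each terminal node stores a Nelson–Aalen cumulative hazard function (CHF) of its in-bag cases, and the forest averages a subject's terminal-node CHFs across trees. One common risk summary, "ensemble mortality", sums that averaged CHF over the distinct event times. The section does not name the RSF software or which output served as the predicted value, so this is an assumption-laden illustration of the aggregation step only: tree growing is omitted, and the terminal-node contents passed to the hypothetical `ensemble_mortality` are made up.

```python
from bisect import bisect_right


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard as a list of (event time, H(t)) steps."""
    cumulative, curve = 0.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        deaths = sum(1 for tt, e in zip(times, events) if e and tt == t)
        at_risk = sum(1 for tt in times if tt >= t)
        cumulative += deaths / at_risk
        curve.append((t, cumulative))
    return curve


def chf_at(curve, t):
    """Evaluate the right-continuous step function H at t (0 before the first event)."""
    idx = bisect_right([s for s, _ in curve], t)
    return curve[idx - 1][1] if idx else 0.0


def ensemble_mortality(terminal_nodes, event_times):
    """Average terminal-node CHFs over trees, then sum the ensemble CHF over
    the distinct training event times.

    terminal_nodes: one (times, events) pair per tree, holding the in-bag
    training cases of the terminal node the subject falls into (hypothetical).
    """
    curves = [nelson_aalen(node_times, node_events)
              for node_times, node_events in terminal_nodes]
    return sum(
        sum(chf_at(curve, t) for curve in curves) / len(curves)
        for t in sorted(set(event_times))
    )


# Hypothetical subject landing in different terminal nodes of a 2-tree forest.
nodes = [([1, 2, 3], [1, 1, 1]), ([2, 4, 5], [1, 0, 1])]
print(ensemble_mortality(nodes, event_times=[1, 2, 3, 5]))
```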

3.3 Models comparison and validation

For OS prediction, the RSF model showed good calibration and overall accuracy, with an integrated Brier score (iBS) of 0.055 (Fig. 2A), lower than that of the Cox model (0.063; Fig. 2B). The RSF model also showed strong discrimination, with time-dependent AUC values (%) of 87.0 (84.1–89.9) at 1 year, 90.4 (87.5–93.4) at 3 years, and 91.0 (87.9–94.1) at 5 years, outperforming the Cox model (Fig. 2C–E). Its C-index (0.768) was likewise higher than that of the Cox model (0.758). To evaluate clinical utility, we performed decision curve analysis (DCA). The RSF model provided greater net clinical benefit than the Cox model at 1, 3, and 5 years, with a larger area under the decision curve (Fig. 3A–C).
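The C-index values compared above (0.768 vs 0.758) can be computed in several ways, and the section does not say which was used. A minimal sketch of Harrell's C-index, assuming higher risk scores predict earlier death and ignoring pairs tied on follow-up time; the toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under follow-up afterwards (times[j] > times[i]).
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # a censored subject cannot be the earlier member of a pair
        for j, t_j in enumerate(times):
            if t_j > t_i:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


times = [2, 4, 5, 7, 9]
events = [1, 0, 1, 1, 0]
risk = [0.9, 0.4, 0.3, 0.7, 0.1]  # hypothetical predicted risks
print(harrell_c_index(times, events, risk))  # 6 of 7 pairs concordant, about 0.857
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the 0.010 gap between the two models is a modest difference in pairwise ordering.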

Fig. 2

Model comparison. The Brier score (A and B) and the receiver operating characteristic (ROC) curve with the area under the curve (AUC) value (C–E) were compared between the Cox and RSF models in the training cohort
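The iBS values reported for the two models summarize prediction error over follow-up. A standard estimator (not necessarily the one used here, which the section does not specify) weights each subject's squared error at time t by the inverse probability of remaining uncensored, estimated by Kaplan–Meier on the censoring times, and then averages BS(t) over a time grid. A minimal sketch, where the grid and the hypothetical `surv_fn` callback returning each subject's predicted S(t) are assumptions:

```python
def censoring_km(times, events):
    """Kaplan-Meier steps (time, G) for the censoring distribution G."""
    g, steps = 1.0, []
    for u in sorted({u for u, e in zip(times, events) if not e}):
        censored = sum(1 for tt, e in zip(times, events) if not e and tt == u)
        at_risk = sum(1 for tt in times if tt >= u)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))
    return steps


def g_at(steps, t, just_before=False):
    """G(t), or G(t-) when just_before=True."""
    value = 1.0
    for u, g in steps:
        if u < t or (u == t and not just_before):
            value = g
        else:
            break
    return value


def brier_score(t, times, events, surv_at_t):
    """IPCW Brier score at time t; surv_at_t[i] is the predicted S(t | x_i)."""
    steps = censoring_km(times, events)
    total = 0.0
    for t_i, e_i, s_i in zip(times, events, surv_at_t):
        if t_i <= t and e_i:  # died by t: ideal prediction is S = 0
            total += s_i ** 2 / g_at(steps, t_i, just_before=True)
        elif t_i > t:  # event-free beyond t: ideal prediction is S = 1
            total += (1.0 - s_i) ** 2 / g_at(steps, t)
        # censored at or before t: weight 0, compensated by the 1/G weights
    return total / len(times)


def integrated_brier_score(grid, times, events, surv_fn):
    """Trapezoidal time-average of BS(t) over grid; surv_fn(t) -> list of S(t | x_i)."""
    scores = [brier_score(t, times, events, surv_fn(t)) for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])


times, events = [1, 2, 3, 4, 6], [1, 0, 1, 1, 0]


def flat_model(t):
    """Hypothetical model that predicts S(t) = 0.6 for every patient."""
    return [0.6] * len(times)


print(integrated_brier_score([1, 2, 3, 4, 5], times, events, flat_model))
```

Lower values are better; a model that always predicts 0.5 scores 0.25, which is a useful reference point for the 0.055 and 0.063 reported above.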

Fig. 3

Decision curves of the training cohort showed that the RSF model outperformed the Cox regression model (A–C). RSF, random survival forests
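Decision curves plot net benefit, NB(p_t) = TP/n - FP/n * p_t / (1 - p_t), against the threshold probability p_t, alongside "treat all" and "treat none" (NB = 0) reference strategies. For a survival endpoint at a fixed horizon, one common approach estimates the event probability among the selected patients with Kaplan–Meier so that censored follow-up is still used. The sketch follows that approach under stated assumptions: predicted risk is taken as 1 - S(horizon | x), and the function names and toy data are hypothetical.

```python
def km_survival(times, events, horizon):
    """Kaplan-Meier estimate of S(horizon)."""
    surv = 1.0
    for u in sorted({u for u, e in zip(times, events) if e and u <= horizon}):
        deaths = sum(1 for tt, e in zip(times, events) if e and tt == u)
        at_risk = sum(1 for tt in times if tt >= u)
        surv *= 1.0 - deaths / at_risk
    return surv


def net_benefit(threshold, horizon, times, events, predicted_risk):
    """Net benefit of 'intervene if predicted risk by horizon >= threshold'.

    The event probability among selected patients comes from their own
    Kaplan-Meier curve so that censored follow-up is handled.
    """
    n = len(times)
    chosen = [i for i, r in enumerate(predicted_risk) if r >= threshold]
    if not chosen:
        return 0.0  # intervening on nobody has zero net benefit
    event_prob = 1.0 - km_survival([times[i] for i in chosen],
                                   [events[i] for i in chosen], horizon)
    true_pos = event_prob * len(chosen) / n
    false_pos = (1.0 - event_prob) * len(chosen) / n
    return true_pos - false_pos * threshold / (1.0 - threshold)


def net_benefit_treat_all(threshold, horizon, times, events):
    """Reference strategy: intervene on every patient."""
    return net_benefit(threshold, horizon, times, events, [1.0] * len(times))


# Hypothetical toy cohort evaluated at a horizon of 2.5 time units.
times, events = [1, 2, 3, 4, 5, 6], [1, 1, 0, 1, 0, 1]
risk = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
for p_t in (0.1, 0.3, 0.5):
    print(p_t, net_benefit(p_t, 2.5, times, events, risk),
          net_benefit_treat_all(p_t, 2.5, times, events))
```

A model is clinically useful over the range of thresholds where its curve lies above both reference strategies.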

The performance of the RSF model was further evaluated in the validation set. It maintained good discrimination, with AUC values of 83.9 (79.2–88.6) at 1 year, 85.8 (79.5–92.1) at 3 years, and 86.7 (79.6–93.8) at 5 years (Fig. 4A), and showed satisfactory net clinical benefit at 1, 3, and 5 years (Fig. 4B–D).
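The 1-, 3- and 5-year AUCs treat patients who died by time t as cases and those still event-free after t as controls (the cumulative/dynamic definition). With censored data, cases can be reweighted by the inverse probability of remaining uncensored, as in Uno-type estimators. The section does not specify the estimator, so the following is a minimal sketch under that assumption; the risk scores can be any model output where higher means worse prognosis, and the toy data are hypothetical.

```python
def censoring_survival_before(times, events, t):
    """Kaplan-Meier censoring survival G(t-), using censorings strictly before t."""
    g = 1.0
    for u in sorted({u for u, e in zip(times, events) if not e and u < t}):
        censored = sum(1 for tt, e in zip(times, events) if not e and tt == u)
        at_risk = sum(1 for tt in times if tt >= u)
        g *= 1.0 - censored / at_risk
    return g


def cumulative_dynamic_auc(t, times, events, risk_scores):
    """AUC(t): IPCW-weighted cases (died by t) versus controls (event-free after t).

    Requires at least one case and one control at time t.
    """
    cases = [i for i, (ti, ei) in enumerate(zip(times, events)) if ei and ti <= t]
    controls = [j for j, tj in enumerate(times) if tj > t]
    num = den = 0.0
    for i in cases:
        w = 1.0 / censoring_survival_before(times, events, times[i])
        for j in controls:
            den += w
            if risk_scores[i] > risk_scores[j]:
                num += w
            elif risk_scores[i] == risk_scores[j]:
                num += 0.5 * w  # ties count as half-correct
    return num / den


times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 1]
risk = [5, 4, 1, 2, 1]  # hypothetical risk scores
print(cumulative_dynamic_auc(3.5, times, events, risk))  # 4/7, about 0.571
```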

Fig. 4

RSF model validation. The receiver operating characteristic curve with the area under the curve value (A) and decision curves of the validation cohort (B–D) showed that the RSF model had satisfactory performance. RSF, random survival forests

To evaluate feature importance within the RSF model, we ranked the features by time-dependent permutation importance, which shows how each variable contributes to the prediction over time and provides a visual explanation of the model's key factors. Surgery, RT and CT were the most influential predictors early in follow-up (Fig. 5). Over time, the importance of all variables decreased and converged, indicating that the model relies less on any individual factor as time progresses.
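Permutation importance measures how much a performance score drops when one feature's values are shuffled across patients, which breaks its link to the outcome while keeping its distribution. Computing it with a time-specific score (for example AUC(t) at one horizon) and repeating for several horizons gives a time-dependent ranking like the one described for Fig. 5. The section does not state the metric, the number of permutations, or the software, so this is a generic illustration; `predict_score` is a hypothetical callback that scores the fitted model on a feature matrix, and the toy model is made up.

```python
import random


def permutation_importance(predict_score, rows, n_features, n_repeats=20, seed=0):
    """Mean drop in predict_score(rows) when each feature column is shuffled.

    predict_score: hypothetical callback returning a higher-is-better score
    (e.g. AUC at one horizon t) for the fitted model on the given rows.
    """
    rng = random.Random(seed)
    baseline = predict_score(rows)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the outcome
            permuted = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - predict_score(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy: feature 0 carries the risk ordering, feature 1 is constant.
true_risk = [0.1 * k for k in range(10)]
rows = [[r, 5.0] for r in true_risk]


def toy_score(rs):
    """Pairwise ranking agreement between feature 0 and the true risk."""
    pairs = [(a, b) for a in range(len(rs)) for b in range(a + 1, len(rs))]
    agree = sum((rs[a][0] < rs[b][0]) == (true_risk[a] < true_risk[b])
                for a, b in pairs)
    return agree / len(pairs)


print(permutation_importance(toy_score, rows, n_features=2))
```

For a lower-is-better score such as the Brier score, the sign of each drop would be reversed.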

Fig. 5

Time-dependent permutation importance of features, illustrating how different variables contribute to the prediction over time

3.4 Risk stratification

We further used the RSF model to stratify patients by risk. The optimal threshold for the RSF-predicted values, chosen to maximize the survival difference between groups, was 83.99 in the training cohort (Fig. 6A); this threshold divided patients into low- and high-risk groups in both the training and validation sets. Kaplan–Meier analysis confirmed that patients in the high-risk group had significantly worse survival than those in the low-risk group in both cohorts (Fig. 6B and C).
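The section does not say how the "best survival difference" was measured. One common choice (maximally selected rank statistics, as in survminer's `surv_cutpoint`, for example) scans candidate cutoffs and keeps the one with the largest two-group log-rank statistic, subject to a minimum group size. A minimal sketch under that assumption; the 10% minimum group fraction and the convention that scores above the cutoff are "high risk" are illustrative choices.

```python
def logrank_chi2(times, events, group):
    """Two-sample log-rank chi-square statistic (group is a list of 0/1)."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for tt, g in zip(times, group) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e in zip(times, events) if e and tt == t)
        d1 = sum(1 for tt, e, g in zip(times, events, group) if e and tt == t and g)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_cutoff(times, events, scores, min_group_fraction=0.1):
    """Cutoff on the risk score that maximizes the log-rank statistic."""
    n = len(scores)
    best = (None, -1.0)
    for c in sorted(set(scores))[:-1]:
        high = [1 if s > c else 0 for s in scores]
        k = sum(high)
        if min(k, n - k) < min_group_fraction * n:
            continue  # skip splits that leave a very small group
        stat = logrank_chi2(times, events, high)
        if stat > best[1]:
            best = (c, stat)
    return best


# Hypothetical training data: higher scores die earlier.
scores = list(range(1, 11))
times = [15 - s if s <= 5 else 11 - s for s in scores]
events = [1] * 10
print(best_cutoff(times, events, scores))
```

As described above, the cutoff is selected once in the training cohort and then applied unchanged to the validation cohort.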

Fig. 6

The RSF model-based risk stratification. Risk score distribution (A); Kaplan–Meier analysis was performed on high- and low-risk groups within both the training (B) and validation (C) cohorts
