Machine learning constructs a diagnostic prediction model for calculous pyonephrosis

General clinical characteristics

A total of 268 patients with renal pelvis effusion and upper urinary calculi who underwent ultrasound-guided percutaneous renal puncture during hospitalization were included in this study, in strict accordance with the inclusion and exclusion criteria. Patients were identified from the urinary calculi database covering January 2018 to December 2022. The general clinical characteristics of every participant are listed in Table 1. The 268 individuals with calculous hydronephrosis were divided into a hydronephrosis group (n = 179) and a pyonephrosis group (n = 89). The training set (n = 189) comprised 63 patients with pyonephrosis and 126 with hydronephrosis; the testing set (n = 79) comprised 26 patients with pyonephrosis and 53 with hydronephrosis.

In the training set, the pyonephrosis group comprised 26 men (41%) and 37 women (59%), with a mean age of 55.46 years; 17% had diabetes, 27% had hypertension, 44% had a history of ipsilateral stone surgery, 13% had renal colic, and 37% had fever. The hydronephrosis group comprised 84 men (67%) and 42 women (33%), with a mean age of 52.91 years; 5% had diabetes, 41% had hypertension, 37% had a history of ipsilateral stone surgery, 77% had renal colic, and 18% had fever.

In the training set, the two groups differed significantly in gender (p = 0.001), diabetes (p = 0.009), fever (p < 0.001) and renal colic (p < 0.001); in the testing set, they differed significantly in gender (p = 0.004), fever (p = 0.007) and renal colic (p < 0.001).

Table 1 Comparison results of general clinical characteristics on two groups

Blood cell and infection characteristics

Table 2 shows statistically significant differences between the two groups in WBC (p = 0.001), neutrophils (p < 0.001), lymphocytes (p < 0.001), RBC (p < 0.001), hemoglobin (p < 0.001), CRP (p < 0.001), PCT (p < 0.001) and IL-6 (p < 0.001) in the blood cell and infection analysis. In the testing set, all of these differences remained significant except lymphocytes (p = 0.136).

Table 2 Comparison results of blood cell and infection characteristics on two groups

Blood biochemical characteristics

Table 3 illustrates the blood biochemistry differences between the two groups in uric acid (p = 0.02), albumin (p < 0.001), globulin (p < 0.001), G/A (p < 0.001), blood glucose (p < 0.001) and cholesterol (p < 0.001). The testing set was consistent with the training set, except that cholesterol showed no significant difference (p = 0.102).

Table 3 Comparison results of blood biochemistry characteristics on two groups

Urine characteristics

Urine analysis results are displayed in Table 4. In the testing set, urine WBC (p < 0.001), urine culture (p = 0.017) and urine nitrite (p = 0.034) differed significantly between the two groups, whereas in the training set only urine WBC (p = 0.01) did.

Table 4 Comparative results of urine characteristics on two groups

Stone-related characteristics

Table 5 shows the relevant stone characteristics. In the training set, the two groups differed in maximal cross-sectional area (p = 0.006), stone position (p < 0.001), stone number (p = 0.036) and HU value of effusion (p < 0.001). In the testing set, there were additional differences in staghorn calculi (p = 0.004) and stone density (p = 0.005).

Table 5 Comparative results of stone-related characteristics on two groups

Evaluation of five ML prediction models

Table 6 shows that RF achieved the highest AUC on the training set (AUC 1.000, 95%CI 0.999-1.000), followed by XGBoost (AUC 0.999, 95%CI 0.982-1.000), GBDT (AUC 0.977, 95%CI 0.952-1.000) and SVM (AUC 0.971, 95%CI 0.946-0.996); the lowest AUC was obtained with LR (AUC 0.938, 95%CI 0.899-0.977). All five prediction models performed satisfactorily on the training set: the AUC of RF was 1.000, and the AUCs of the other models all exceeded 0.900.

GBDT (AUC 0.967, 95%CI 0.935-1.000) had the highest AUC in the testing set, followed by LR (AUC 0.957, 95%CI 0.911-1.000), XGBoost (AUC 0.950, 95%CI 0.901-0.990), SVM (AUC 0.939, 95%CI 0.889-0.989) and RF (AUC 0.924, 95%CI 0.859-0.988).

The LR, GBDT and RF models achieved the highest accuracy, followed by SVM, while XGBoost (0.873) had the lowest. The LR model's sensitivity and specificity, at 0.923 and 0.887 respectively, were the highest among the five models.

Table 6 Five ML prediction models’ outcomes

Pyonephrosis was set as the outcome variable in the training set of 189 patients. Lasso regression was used to eliminate collinear features and carry out characteristic screening (Fig. 1).

Fig. 1

Lasso regression coefficient and log (λ) value

Figure 2 shows λmin, at which the model error is lowest, and λ1se, which lies within one standard error of the minimum. According to the 10-fold cross-validation curve of the Lasso regression analysis, the smaller the degrees of freedom on the ordinate axis, the better the model fit.

In this research, binary logistic regression models were established using the features selected at λmin and at λ1se. Because the AUCs of the two LR models were the same on both the training and testing sets, we chose the nine features corresponding to λ1se. The nine characteristics determined by Lasso regression were: globulin, G/A, diabetes, renal colic, hemoglobin, CRP, IL-6, urine bacterial count and HU value of effusion.

Fig. 2

Log (λ) value and model error
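The λmin/λ1se selection described above can be sketched with 10-fold cross-validation. The following is an illustrative Python sketch on synthetic stand-in data, applying the "1-SE rule"; the study's actual analysis presumably used a binomial Lasso (e.g. glmnet in R), so treat this as a schematic, not the original pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LassoCV

# Synthetic stand-in for the clinical feature matrix (hypothetical data).
X, y = make_classification(n_samples=189, n_features=30, n_informative=9,
                           random_state=0)

# 10-fold cross-validated Lasso path, as in the feature screening step.
lasso = LassoCV(cv=10, random_state=0).fit(X, y)

# lambda_min: the penalty with the lowest mean cross-validation error.
mse_mean = lasso.mse_path_.mean(axis=1)
mse_se = lasso.mse_path_.std(axis=1) / np.sqrt(lasso.mse_path_.shape[1])
idx_min = mse_mean.argmin()
lambda_min = lasso.alphas_[idx_min]

# lambda_1se: the largest penalty whose CV error is within one standard
# error of the minimum, yielding a sparser (more parsimonious) model.
within_1se = mse_mean <= mse_mean[idx_min] + mse_se[idx_min]
lambda_1se = lasso.alphas_[within_1se].max()

print(lambda_min, lambda_1se)
```

Choosing λ1se over λmin trades a marginal loss in fit for fewer retained features, which is why the nine-feature model was preferred here.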

The nine characteristics selected by Lasso regression were then subjected to single-factor and multi-factor LR analysis, as shown in Table 7. With p < 0.05 as the screening criterion, the single-factor LR analysis showed that all nine features differed significantly between the hydronephrosis and pyonephrosis groups, and the multi-factor LR analysis identified five of them as independent risk factors.

Diabetes (OR = 16.32, 95%CI 2.02-131.67, p = 0.009), renal colic (OR = 0.06, 95%CI 0.02-0.05, p < 0.001), HU value of effusion (OR = 1.14, 95%CI 1.06-1.23, p < 0.001), hemoglobin (OR = 0.97, 95%CI 0.95-1.00, p = 0.026) and CRP (OR = 1.02, 95%CI 1.01-1.04, p < 0.001) were the five independent risk factors.

Given the regression coefficients of the characteristics in Table 8, our LR model is computed with the following formula: logit(Y) = 3.525 − 1.532 × diabetes − 3.456 × renal colic + 0.178 × HU value of effusion − 0.046 × hemoglobin + 0.030 × CRP. The binary predictor features in the formula take the value 0 or 1.
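Assuming the coefficients above, a minimal sketch of the resulting predictor follows; the function name is illustrative, and the units in the docstring are assumptions rather than values stated in the paper.

```python
import math

def pyonephrosis_probability(diabetes, renal_colic, hu_effusion,
                             hemoglobin, crp):
    """Predicted probability of pyonephrosis from the published LR formula.

    diabetes and renal_colic are binary (0 or 1); hu_effusion (HU),
    hemoglobin and CRP are continuous. Units are assumptions; consult
    the original tables for the exact scales.
    """
    logit = (3.525
             - 1.532 * diabetes
             - 3.456 * renal_colic
             + 0.178 * hu_effusion
             - 0.046 * hemoglobin
             + 0.030 * crp)
    # Inverse-logit transform maps logit(Y) to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))
```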

Table 7 Single and multiple factor LR results of nine characteristics

Table 8 Five characteristics of the LR model

The XGBoost model is built on gradient-boosted trees. The contribution of a selected feature is the gain at each node; as shown in Fig. 3, summing each feature's contributions across all nodes yields its importance score, and the characteristics were ranked by their relative importance. Figure 4 shows a subset of the trees in the model. The foremost clinical characteristic in the XGBoost model is CRP, followed by hemoglobin, blood glucose, renal colic, globulin and HU value of effusion. On the training set, the XGBoost model's accuracy was 0.968, its sensitivity and specificity were 0.962 and 0.981, respectively, and its AUC was 0.990 (95%CI 0.982-1.000).
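A minimal sketch of gain-style feature ranking on hypothetical stand-in data: scikit-learn's gradient boosting is used here in place of the xgboost library, with per-feature impurity gain summed over all split nodes, which is analogous to XGBoost's "gain" importance. Feature names and data are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature names echoing the clinical variables in the text.
feature_names = ["CRP", "hemoglobin", "blood_glucose", "renal_colic",
                 "globulin", "HU_effusion", "IL6", "PCT"]
X, y = make_classification(n_samples=189, n_features=len(feature_names),
                           n_informative=5, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Gain-style importance: total impurity improvement contributed by each
# feature across all split nodes, normalized to sum to 1.
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```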


Fig. 3

Feature importance ranking of the XGBoost model

Fig. 4

Set of classification trees in the XGBoost model

The SVM algorithm ranks characteristics by adding them hierarchically one at a time. The top 14 characteristics are shown in Fig. 5, led by N/L, HU value of effusion, hemoglobin, renal colic, globulin, WBC, albumin, PCT, IL-6 and blood neutrophils.

Figure 6 shows the prediction accuracy of the four SVM kernel functions: linear, polynomial, radial and sigmoid, with accuracies of 0.894, 0.873, 0.911 and 0.820, respectively. The radial kernel, being the most accurate, was selected to construct the SVM model. On the training set, the AUC of the SVM model was 0.971 (95%CI 0.946-0.996), with an accuracy of 0.947, sensitivity of 0.968 and specificity of 0.908. On the testing set, the AUC was 0.939 (95%CI 0.889-0.989), with an accuracy of 0.860, sensitivity of 0.889 and specificity of 0.800.
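The kernel comparison can be sketched as follows on hypothetical stand-in data; the scaling step and the cross-validation scheme are assumptions, not details taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the clinical feature matrix.
X, y = make_classification(n_samples=189, n_features=14, random_state=0)

# Compare the four kernels by 10-fold cross-validated accuracy.
scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = cross_val_score(clf, X, y, cv=10).mean()

# Keep the kernel with the highest accuracy (radial/"rbf" in the study).
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```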

Fig. 5

Importance ranking of the top 14 characteristics in the SVM model

Fig. 6

Prediction accuracy of the four SVM kernel functions

RF is composed of many decision trees, each an independent entity; when they are combined, the prediction result is derived from the weighted average of all the trees' predictions. Mean decrease Gini (MDG) and mean decrease accuracy (MDA) are used to rank the significance of each characteristic: the greater the value, the more important the feature (Fig. 7).

CRP (23.801), IL-6 (14.099), renal colic (13.762), HU value of effusion (11.855), stone position (9.366) and globulin (8.880) were the attributes with MDA > 5. CRP (17.353), IL-6 (10.500), HU value of effusion (7.168), globulin (5.329) and hemoglobin (5.305) were the top five predictors by MDG. Overall, the MDA and MDG importance rankings are comparable.

To construct the RF model, we used 158 trees, which yielded the lowest overall error rate. On the training set, the AUC was 1.000 (95%CI 0.999-1.000), and accuracy, sensitivity and specificity were all 1.000. On the testing set, the AUC was 0.924 (95%CI 0.859-0.988), with an accuracy of 0.873, sensitivity of 0.891 and specificity of 0.833.
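The two importance measures can be sketched on hypothetical data: scikit-learn's impurity-based importance serves as an MDG analogue and permutation importance as an MDA analogue (the original analysis likely used R's randomForest, so this is a schematic substitute).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Hypothetical stand-in for the clinical feature matrix.
X, y = make_classification(n_samples=189, n_features=10, random_state=0)

# 158 trees, as selected in the study for the lowest overall error rate.
rf = RandomForestClassifier(n_estimators=158, random_state=0).fit(X, y)

# MDG analogue: impurity-based (Gini) importance from the fitted forest.
mdg = rf.feature_importances_

# MDA analogue: permutation importance -- the drop in accuracy observed
# when a feature's values are randomly shuffled.
mda = permutation_importance(rf, X, y, n_repeats=10,
                             random_state=0).importances_mean
print(mdg.round(3), mda.round(3))
```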

Fig. 7

Feature importance ranking of the RF model

The GBDT model is constructed iteratively using the gradient boosting technique: to lower the prediction error of the current model on the training data, a new decision tree is trained in each iteration on the residuals of the previous model. Using 10-fold cross-validation, we determined the optimal number of iterations to be 1169 (Fig. 8).

The characteristics of the GBDT model were ranked by relative importance. CRP remained the most important clinical characteristic, followed by renal colic, HU value of effusion, G/A, IL-6, globulin and PCT (Fig. 9). On the training set, the AUC of the GBDT model was 0.977 (95%CI 0.952-1.000), with an accuracy of 0.952, sensitivity of 0.961 and specificity of 0.935. On the testing set, the accuracy, sensitivity and specificity were 0.873, 0.891 and 0.833, and the AUC was 0.967 (95%CI 0.935-1.000).
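The iteration-count selection can be sketched as follows; a single held-out split stands in for the study's 10-fold cross-validation, and all data and parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data.
X, y = make_classification(n_samples=268, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            random_state=0)

# Fit with a generous tree budget, then pick the iteration count that
# minimizes validation error (the study, using 10-fold CV, found 1169).
gbdt = GradientBoostingClassifier(n_estimators=500, learning_rate=0.01,
                                  random_state=0).fit(X_tr, y_tr)
val_err = [1 - np.mean(pred == y_val)
           for pred in gbdt.staged_predict(X_val)]
best_iter = int(np.argmin(val_err)) + 1
print(best_iter)
```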

Fig. 8

The optimal number of GBDT model iterations

Fig. 9

Feature importance ranking of the GBDT model

On the training set, RF achieved the highest AUC (AUC 1.000, 95%CI 0.999-1.000) (Fig. 10), followed by XGBoost (AUC 0.999, 95%CI 0.982-1.000), GBDT (AUC 0.977, 95%CI 0.952-1.000) and SVM (AUC 0.971, 95%CI 0.946-0.996); the lowest AUC was found with LR (AUC 0.938, 95%CI 0.899-0.977). All five prediction models performed well on the training set: the AUC of RF was 1.000, and those of the other models all exceeded 0.900.

Fig. 10

ROC of five prediction models for ML on the training set

Among the models tested on the testing set (Fig. 11), GBDT (AUC 0.967, 95%CI 0.935-1.000) had the highest AUC, followed by LR (AUC 0.957, 95%CI 0.911-1.000), XGBoost (AUC 0.950, 95%CI 0.901-0.990), SVM (AUC 0.939, 95%CI 0.889-0.989) and RF (AUC 0.924, 95%CI 0.859-0.988).

Fig. 11

ROC of five prediction models for ML on the testing set

Calibration analysis of LR model

In our study, the GBDT model had the highest AUC (AUC 0.967, 95%CI 0.935-1.000), with an accuracy of 0.873, a sensitivity of 0.891 and a specificity of 0.833, followed by the LR model (AUC 0.957, 95%CI 0.911-1.000), whose sensitivity and specificity were 0.923 and 0.887. The LR model is regarded as the most effective model in this study because of its clinical interpretability and practicability. The calibration of the LR model on the training and testing sets was verified using the Bootstrap resampling approach (n = 1000) (Figs. 12 and 13).
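A sketch of bootstrap-based calibration assessment (n = 1000) on hypothetical data, using scikit-learn's calibration_curve; the binning and resampling details are assumptions about the general technique, not the paper's exact procedure.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in data and LR model.
X, y = make_classification(n_samples=189, n_features=5, random_state=0)
lr = LogisticRegression().fit(X, y)
prob = lr.predict_proba(X)[:, 1]

# Bootstrap (n = 1000): resample cases with replacement and recompute
# the calibration curve each time to gauge its stability.
curves = []
for _ in range(1000):
    idx = rng.integers(0, len(y), len(y))
    frac_pos, mean_pred = calibration_curve(y[idx], prob[idx], n_bins=5)
    curves.append(frac_pos)
```

Plotting the resampled curves around the apparent curve (as in Figs. 12 and 13) shows how far predicted probabilities may drift from observed frequencies.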

Fig. 12

Calibration curve of LR model on training set

Fig. 13

Calibration curve of LR model on testing set

Clinical applicability analysis of LR model

The DCA curve was generated for the LR model and its clinical applicability was examined (Fig. 14). In the testing set, the two extreme curves, All and None, lie below the DCA curve of the model, demonstrating that the LR model offers a larger net benefit for therapeutic intervention in patients with pyonephrosis. All is the net benefit line when all patients receive intervention; None is the net benefit line when no patients do.
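The net-benefit calculation behind a DCA curve can be sketched directly from its standard formula; the labels and predictions below are purely illustrative.

```python
import numpy as np

def net_benefit(y_true, prob, threshold):
    """Net benefit of treating patients with prob >= threshold.

    Standard decision-curve formula:
    NB = TP/n - (FP/n) * threshold / (1 - threshold).
    """
    n = len(y_true)
    treat = prob >= threshold
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

# Hypothetical outcomes and model predictions for illustration.
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p = np.array([0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3])

for t in (0.2, 0.5):
    nb_model = net_benefit(y, p, t)
    nb_all = net_benefit(y, np.ones_like(p), t)  # "All" extreme curve
    nb_none = 0.0                                 # "None" extreme curve
    print(t, round(nb_model, 3), round(nb_all, 3), nb_none)
```

Sweeping the threshold over a clinically plausible range and plotting the three net-benefit series produces the DCA curve of Fig. 14.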

Fig. 14

DCA curve of the LR model on the testing set

Nomogram of LR model

The nomogram integrates the five features (diabetes, renal colic, hemoglobin, CRP and HU value of effusion) and depicts the LR model with scaled line segments. Urologists can conveniently obtain the probability of pyonephrosis by summing the scores corresponding to the five features (Fig. 15). The nomogram is drawn on one plane in specific proportions to express the relationships among the characteristics in the prediction model.

Fig. 15
