Machine learning models for predicting PD-1 treatment efficacy in pan-cancer patients based on routine hematologic and biochemical parameters

Patients and study design

This retrospective study included 170 nasopharyngeal carcinoma (NPC) patients, 110 esophageal cancer (EC) patients, and 151 lung cancer (LC) patients from Sun Yat-sen University Cancer Center who underwent PD-1 checkpoint inhibitor combination therapy (Fig. 1). The inclusion criteria were a diagnosis of NPC, ESCC or LC and receipt of PD-1 therapy in combination with radiotherapy, chemotherapy, surgical treatment and/or targeted therapy between September 2018 and July 2022. The exclusion criteria were a follow-up time of < 1 year, a lack of hematological examination before treatment and after the third week of treatment, a duration of PD-1 inhibitor administration of less than three months, and no imaging evaluation within 8–12 weeks. Imaging evaluations were carried out according to the Response Evaluation Criteria in Solid Tumors (RECIST) v1.1 to assess the effect of immunotherapy at 8–12 weeks, and responses were classified as progressive disease (PD) or non-PD (complete response (CR), partial response (PR), or stable disease (SD)).
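The binary outcome described above can be written as a small lookup. This is only an illustrative sketch: the function name `binary_response` and the string labels are hypothetical and are not taken from the study's code.

```python
# RECIST v1.1 categories that count as non-PD in the text above.
NON_PD_CATEGORIES = {"CR", "PR", "SD"}


def binary_response(recist_category):
    """Map a RECIST category (CR, PR, SD or PD) to 'PD' or 'non-PD'."""
    category = recist_category.strip().upper()
    if category == "PD":
        return "PD"
    if category in NON_PD_CATEGORIES:
        return "non-PD"
    raise ValueError(f"unknown RECIST category: {recist_category!r}")


labels = [binary_response(c) for c in ["PD", "SD", "PR", "CR"]]
print(labels)
```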

Basic clinical parameters, including age, sex, histological type, metastasis stage, and TNM classification, were collected.

Laboratory examination

Complete blood counts were obtained using an automated XN-2000 hematology analyzer (Sysmex, Japan). The Sysmex XN-2000 determines hematological parameters by flow cytometry, impedance cytometry and optical cytometry. The impedance method with hydrodynamic focusing was used to count red blood cells (RBCs) and platelets. Fluorescent flow cytometry, performed with forward- and side-scattered laser light, was used to determine the white blood cell (WBC) count in all channels. The Sysmex XN-2000 analyzer can determine 28 basic diagnostic parameters and 16 optional diagnostic parameters, including RBC, WBC (percentage and absolute number of neutrophils, lymphocytes, eosinophils, basophils and monocytes), mean corpuscular volume (MCV), hematocrit (HCT), platelet (PLT), hemoglobin (HGB), mean corpuscular hemoglobin concentration (MCHC), and mean corpuscular hemoglobin (MCH).

Biochemical parameters were measured with standard commercially available assays adapted to a Roche Cobas C702 Chemistry Analyzer (Roche Diagnostics, Japan) or a Hitachi LABOSPECT 008 AS Chemistry Analyzer (Hitachi High-Tech Corporation, Japan) using automated procedures: glucose (GLU), urea, creatinine (CRE), uric acid (UA), total bile acid (TBA), triglycerides (TG), total cholesterol (CHO), aspartate aminotransferase (AST), alanine aminotransferase (ALT), alkaline phosphatase (ALP), gamma-glutamyl transferase (GGT), total protein (TP), globulin (GLOB), albumin (ALB), carbon dioxide (CO2), calcium (Ca2+), lactate dehydrogenase (LDH), total bilirubin (TBIL), direct bilirubin (DBIL), cholinesterase (CHE), creatine kinase (CK), cystatin C (CYSC), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), apolipoprotein A1 (ApoA1), apolipoprotein B (ApoB), C-reactive protein (CRP), and serum amyloid A (SAA). The chemistry analyzers performed photometric assays that measure the absorbance change of each analyte, from which the quantitative results were calculated. The details are listed in Table S1 in the supplementary material.

The neutrophil-to-lymphocyte ratio (NLR), monocyte-to-lymphocyte ratio (MLR), platelet-to-lymphocyte ratio (PLR), and systemic immune-inflammation index (SII) were calculated as neutrophil count/lymphocyte count, monocyte count/lymphocyte count, platelet count/lymphocyte count, and NLR × platelet count, respectively.
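The four derived indices follow directly from these definitions. The sketch below is a minimal illustration; the function name `inflammation_indices`, the argument names, and the example counts are hypothetical, and all counts are assumed to be absolute counts in the same unit (for example 10^9/L).

```python
def inflammation_indices(neutrophils, lymphocytes, monocytes, platelets):
    """Return NLR, MLR, PLR and SII computed from absolute cell counts."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    nlr = neutrophils / lymphocytes   # neutrophil count / lymphocyte count
    mlr = monocytes / lymphocytes     # monocyte count / lymphocyte count
    plr = platelets / lymphocytes     # platelet count / lymphocyte count
    sii = nlr * platelets             # NLR x platelet count
    return {"NLR": nlr, "MLR": mlr, "PLR": plr, "SII": sii}


# Made-up counts chosen only to make the arithmetic easy to follow.
example = inflammation_indices(
    neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=250.0
)
print(example)
```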

Machine learning methods for the prediction of cancer treatment response

To predict the response of cancer patients to PD-1 checkpoint inhibitor combinations, we employed commonly used machine learning methods, including principal component analysis (PCA), support vector machine (SVM) [18], random forest (RF) [19], adaptive boosting decision tree (AdaBoost) [20], gradient-boosting decision tree (GBDT) [21], extreme gradient boosting decision tree (XGBoost) [22] and artificial neural network (ANN) methods [23], to learn blood biomarker features. In the PCA, the dimension of the blood biomarker features was reduced to 2. We employed the ν-SVM method, which uses a parameter ν to control the number of support vectors. After hyperparameter tuning, ν was set to 0.03, and the radial basis function was selected as the kernel function to maximize the prediction accuracy. The decision tree-based methods combine multiple decision trees to improve classification performance. The number of trees in the RF was set to 100, and the maximum depth of the trees was set to 20. To evaluate the importance of blood biomarkers, the base decision tree classifiers in AdaBoost were used to calculate the feature importance. The number of trees in AdaBoost was set to 100, and the learning rate was 1. GBDT also employed 100 decision trees, with a maximum depth of 3. For XGBoost, the number of trees was set to 60, with a maximum depth of 20. The ANN employed 3 layers, with 64, 48 and 16 nodes for the first, second and third hidden layers, respectively. We used the ReLU activation function for the first and second hidden layers. For the third (output) layer, the Softmax activation function was chosen to produce the probabilities of the different class predictions. The ANN was optimized with Adam at a learning rate of 0.0001, and the loss function was the mean squared error (MSELoss).
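As a rough illustration of how the stated hyperparameters could be expressed, the sketch below uses scikit-learn. The source does not say which library was used, so the choice of scikit-learn, the helper name `build_models`, the random seed, and the synthetic data are all assumptions. XGBoost and the ANN are left out because they need additional packages.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.svm import NuSVC


def build_models(seed=0):
    """Classifiers configured with the hyperparameters quoted in the text."""
    return {
        # nu-SVM with nu = 0.03 and a radial basis function kernel
        "SVM": NuSVC(nu=0.03, kernel="rbf"),
        # 100 trees, maximum depth 20
        "RF": RandomForestClassifier(
            n_estimators=100, max_depth=20, random_state=seed
        ),
        # 100 trees, learning rate 1
        "AdaBoost": AdaBoostClassifier(
            n_estimators=100, learning_rate=1.0, random_state=seed
        ),
        # 100 trees, maximum depth 3
        "GBDT": GradientBoostingClassifier(
            n_estimators=100, max_depth=3, random_state=seed
        ),
    }


# Synthetic stand-in for the blood biomarker matrix (not study data).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# PCA projection to 2 dimensions, as described for the PCA step.
X_2d = PCA(n_components=2).fit_transform(X)

models = build_models()
train_accuracy = {}
for name, model in models.items():
    model.fit(X, y)
    train_accuracy[name] = model.score(X, y)
print(train_accuracy)
```

The tree ensembles expose `feature_importances_` after fitting, which is one way the biomarker importance mentioned above could be inspected.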

Response category prediction strategy

The blood biomarker levels of the 431 patients were normalized to the range [-1, 1]. The numbers of patients with different treatment responses were 66 (PD), 256 (SD) and 109 (PR). The SD and PR patients formed the disease control (DC) group. To train and test the machine learning models, the samples were randomly divided into training and testing datasets at a ratio of 8:2. The training dataset contained 345 patients, including 53 PD patients and 292 DC patients. Because of the imbalance between the numbers of PD and DC patients, the number of PD samples was increased from 53 to 292 with the synthetic minority oversampling technique (SMOTE) so that the features of PD patients were not ignored during training [24]. The testing dataset contained 13 PD patients and 73 DC patients. The test results of the machine learning models are shown as receiver operating characteristic (ROC) curves. The normalized values of true positives, false positives, true negatives and false negatives are also presented.
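The normalization and oversampling steps can be sketched in plain Python. This is a simplified illustration, not the study's implementation: per-feature min-max scaling is an assumption about how the [-1, 1] range was obtained, the `smote` function is a minimal version of the technique with an assumed k = 5 neighbours, and the 53 minority samples are random placeholders.

```python
import random


def scale_to_unit_range(column):
    """Min-max scale one feature column to the range [-1, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: map to the midpoint
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


scaled = scale_to_unit_range([1.0, 2.0, 3.0])

# Placeholder PD training samples: 53 patients, 4 already-scaled features.
data_rng = random.Random(1)
pd_train = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(53)]

# Oversample PD from 53 to 292 to match the DC group in the training set.
pd_balanced = pd_train + smote(pd_train, n_new=292 - 53)
print(scaled, len(pd_balanced))
```

Because each synthetic sample lies on a segment between two existing PD samples, the oversampled data stay inside the normalized range. Oversampling is applied to the training set only, so the 13 PD and 73 DC test patients are left untouched.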
