Predicting the Recurrence of Ovarian Cancer Based on Machine Learning

Background

Ovarian cancer is one of the leading causes of gynecological cancer deaths,1,2 and its lethality stems mainly from its high risk of recurrence.3 Approximately 70% of patients relapse within the first three years.4 Predicting recurrence in ovarian cancer patients is therefore important, as it can guide personalized treatment and surveillance plans, such as the selection of chemotherapy.5 Carbohydrate antigen 125 (CA125) is routinely used to monitor the progression of ovarian cancer;6 however, initiating treatment for recurrence on the basis of rising CA125 concentration alone does not improve patient survival.7 With the emergence of new biomarkers, a growing number of studies have combined CA125 with other markers, and this approach appears promising. However, owing to insufficient sensitivity or specificity, none of these biomarkers, including carcinoembryonic antigen (CEA), carbohydrate antigen 19–9 (CA19-9), and human epididymis protein 4 (HE4), is used clinically to detect ovarian cancer progression.8 New methods for predicting the recurrence of ovarian cancer are therefore urgently needed.

To improve the treatment of ovarian cancer, it is important to identify factors that accurately characterize patients before initial intervention. In addition, developing methods for predicting treatment outcomes and prognosis is an important goal in personalized medicine.9–11 Several studies have shown that combining biomarkers with multiple clinical factors can accurately predict prognosis.12,13 Machine learning, a branch of artificial intelligence, helps fill the gap in clinicians’ ability to process complex information.14 It offers effective methods for handling multidimensional datasets and excels at evaluating large numbers of variables to build accurate predictive models.15,16 Machine learning models have already been applied to the diagnosis and prognostic prediction of diseases, and have improved diagnostic efficiency and patient prognosis to a certain extent.17 For instance, Jan et al developed an artificial intelligence model that extracts radiomics and deep learning features from CT images to distinguish between benign and malignant ovarian tumors.18 In addition, Piedimonte et al created a machine learning algorithm to predict the outcome of primary cytoreductive surgery in patients with advanced ovarian cancer.19

However, few studies have applied machine learning to the recurrence of ovarian cancer. In this study, we explored the relationship between ovarian cancer recurrence and serum biomarkers, as well as other clinical variables, using six machine learning methods: K-Nearest Neighbor (K-NN), Decision Tree (DT), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). Our aim was to develop machine learning prediction methods based on multiple blood biomarkers and clinical features to evaluate the risk of recurrence in ovarian cancer patients, thereby helping clinicians choose personalized treatment strategies.

Materials and Methods

Study Population

We randomly and retrospectively screened the data of patients diagnosed with ovarian cancer at the Second Affiliated Hospital of Nantong University from January 2008 to December 2019. The inclusion criteria were as follows: 1. pathological diagnosis of ovarian cancer in an adult woman, confirmed by two experienced pathologists; 2. complete clinical and follow-up data. The exclusion criteria were as follows: 1. substantial missing clinical information; 2. a history of previous cancer or a coexisting cancer of another type. In total, 277 patients were enrolled in this study. The final follow-up date was December 2022. The study followed the ethical principles of the Declaration of Helsinki and was approved by the Ethics Committee of the Second Affiliated Hospital of Nantong University. The workflow of this study is depicted in Figure 1.

Figure 1 The schematic diagram of the overall workflow.

Clinical Data Collection

The investigated dataset consisted of 47 clinical parameters per patient: 9 demographic and clinical characteristics (Age, Menopausal status, FIGO stage, Degree of differentiation, Histological type, Lymph node metastasis, Ascites, Hydrothorax, Neoadjuvant chemotherapy), 14 variables from whole blood (White blood cell, Neutrophil ratio, Lymphocyte ratio, Eosinophil ratio, Monocyte ratio, Neutrophil count, Lymphocyte count, Eosinophil count, Red blood cell, Hematocrit, Mean corpuscular volume, Mean corpuscular hemoglobin, Mean corpuscular hemoglobin concentration, Platelet count), 20 variables from serum (Alkaline phosphatase, Gamma glutamyl transferase, Albumin, Globulin, Total bilirubin, Direct bilirubin, Indirect bilirubin, Creatinine, Uric acid, Total cholesterol, α-L-fucosidase, Glucose, Prealbumin, Aspartate aminotransferase, Lactic dehydrogenase, Creatine kinase, Potassium, Chloride, Cholinesterase, Calcium), 3 variables from plasma (Prothrombin time, Activated partial thromboplastin time, Fibrinogen), and 1 tumor biomarker (Carbohydrate antigen 125) (Table 1). The missing value rate was below 10% for every variable. Missing values were imputed with the simple mean of each variable computed over the complete cohort. Table 2 shows the association between the recurrence of ovarian cancer and the clinical parameters.
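The missing-rate check and mean imputation described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the helper names `missing_rate` and `impute_mean` and the toy patient records are hypothetical, and missing entries are assumed to be encoded as `None`.

```python
from statistics import mean


def missing_rate(rows, column):
    """Fraction of records in which `column` is missing (None)."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute_mean(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(rows[0].keys())
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]


# Hypothetical toy records (not study data); None marks a missing measurement.
patients = [
    {"ALB": 40.0, "CA125": 35.0},
    {"ALB": None, "CA125": 500.0},
    {"ALB": 30.0, "CA125": None},
]

imputed = impute_mean(patients)
print(imputed[1]["ALB"], imputed[2]["CA125"])
```

Computing the means over the whole cohort matches the description in the text. A common alternative, not stated in the paper, is to compute them on the training cohort only so that no information from the testing cohort enters the fitted model.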

Table 1 The Attribute List for Different Variables of the Cohort

Table 2 Comparison Between the Recurrent and Recurrence-Free Ovarian Cancer Groups

Machine Learning Models

In this study, we employed six supervised machine learning algorithms, including ensemble methods: K-NN, DT, RF, AdaBoost, GBM, and XGBoost. All algorithms were implemented and evaluated in Python (Version 3.7.13), with Anaconda used to install the required libraries and packages. Basic data processing was performed with Python libraries such as Pandas and NumPy. All machine learning methods except XGBoost were implemented with the scikit-learn (sklearn) package, while XGBoost was implemented with its own dedicated package. We used the scikit-learn function train_test_split to divide the samples into training and testing cohorts; each model was fitted on the training cohort and then used to predict the testing cohort, and the accuracy of the algorithms was compared. Finally, the importance of each feature in each machine learning algorithm was determined through the “feature_importances_” attribute.
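The workflow in this section can be sketched as below. This is a hedged illustration rather than the authors' script: the data are synthetic (generated with `make_classification` to mimic 277 samples and 47 features), and the split ratio, `random_state` values, and all hyperparameters other than the 10 neighbors reported in the Results are assumptions, since the paper does not report them. XGBoost is added only if the `xgboost` package is installed. Note that scikit-learn's `KNeighborsClassifier` does not expose `feature_importances_`, so importances are collected only from the tree-based models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cohort: 277 samples, 47 features (NOT the study data).
X, y = make_classification(n_samples=277, n_features=47, n_informative=8,
                           random_state=0)

# test_size, stratify and random_state are assumptions; the paper does not state them.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "K-NN": KNeighborsClassifier(n_neighbors=10),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
}
try:
    # XGBoost lives in its own package and is optional in this sketch.
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(random_state=0)
except ImportError:
    pass

accuracy = {}
importances = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # learn from the training cohort
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))
    if hasattr(model, "feature_importances_"):  # absent for K-NN
        importances[name] = np.asarray(model.feature_importances_)

for name, acc in accuracy.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Because the data here are synthetic, the printed accuracies say nothing about the cohort; the sketch only shows how the split, fitting, scoring, and importance extraction fit together.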

Statistical Analysis

IBM SPSS Statistics (Version 25.0) was used to draw receiver operating characteristic (ROC) curves to evaluate the predictive performance of the clinical variables. The two-tailed independent-samples t-test was used to compare continuous variables between groups, and the Chi-squared test was used for categorical variables. GraphPad Prism (Version 8.0.1) was used to analyze the levels of clinical variables in each group. A p value <0.05 was considered statistically significant.
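The study ran these tests in SPSS; for readers working in Python, an equivalent pair of tests might look like the following sketch using SciPy. The albumin values and the 2×2 contingency table are hypothetical, and the equal-variance (Student) form of the t-test is an assumption, since the paper does not say which variant was reported.

```python
from scipy import stats

# Hypothetical albumin levels (g/L) in the two groups; not study data.
alb_recurrent = [31.2, 29.8, 33.5, 30.1, 28.7, 32.0]
alb_recurrence_free = [38.4, 40.1, 36.9, 39.5, 41.2, 37.8]

# Independent-samples t-test; two-sided by default, equal variances assumed.
t_res = stats.ttest_ind(alb_recurrent, alb_recurrence_free)
t_stat, t_p = t_res.statistic, t_res.pvalue

# Hypothetical 2x2 table: rows = neoadjuvant chemotherapy (yes / no),
# columns = recurrence (yes / no).
table = [[30, 10],
         [15, 25]]
# For 2x2 tables SciPy applies Yates' continuity correction by default.
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

print(f"t = {t_stat:.2f}, p = {t_p:.4g}")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {chi_p:.4g}")
```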

Results

Correlation Between Variables

First, we compared the levels of the various indicators between the recurrent and recurrence-free ovarian cancer groups. As shown in Figure 2A, the color gradient from blue to red in the heat map corresponds to correlations ranging from negative to positive. Correlation analysis showed that neoadjuvant chemotherapy, degree of differentiation, FIGO stage, and platelet count (PLT) were positively correlated with the recurrence of ovarian cancer, whereas prealbumin (PAB), albumin (ALB), and cholinesterase (CHE) were negatively correlated with it (Figure 2B). To explore the utility of multiple indicators as predictors of recurrence in ovarian cancer, we compared a multiple logistic regression model based on all 47 biomarkers with single logistic regression models built on each marker individually. Figure 2C shows the ROC curve of the multiple logistic regression model (blue line) for predicting recurrence in the 277 patient samples. Its area under the ROC curve (AUC) exceeded that of every single-marker regression, shown as dashed lines (Figure 2C). When stepwise regression was applied, the model was built from a subset of biomarkers and showed a slight improvement in AUC (Figure 2C, red line).
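To make the AUC comparison concrete: the AUC equals the probability that a randomly chosen recurrent patient receives a higher predicted score than a randomly chosen recurrence-free patient, with ties counted as one half. The sketch below computes it directly from that definition in plain Python. The function name `roc_auc` and the two score vectors are hypothetical and only illustrate how a combined model can outrank a single marker.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = recurrence, 0 = recurrence-free; scores: higher = more likely to recur.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores for six patients (not study data).
labels = [1, 1, 1, 0, 0, 0]
single_marker = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]  # one marker misranks one pair
combined = [0.9, 0.7, 0.8, 0.5, 0.3, 0.2]       # combined model ranks all pairs correctly

print(roc_auc(labels, single_marker))
print(roc_auc(labels, combined))
```

In this toy example the single marker ranks 8 of the 9 patient pairs correctly, while the combined scores rank all 9 correctly.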

Figure 2 Correlation between variables. (A) The levels of various biomarkers in the recurrent and recurrence-free ovarian cancer groups. In the upper legend, blue represents the recurrent group, while red represents the recurrence-free group. In the bottom legend, red and blue represent positive and negative correlations with the recurrence of ovarian cancer, respectively; the greater the absolute value, the stronger the correlation. (B) Heat map of the relationships between variables. The gradient from green to red represents a gradient from positive to negative correlation. (C) ROC curves of logistic regression models for predicting ovarian cancer recurrence. The multiple regression model using all 47 biomarkers is represented by the blue line, while the single regression results are represented by black dashed lines. The red line represents the result of stepwise regression.

Multimodal Prognostication

We evaluated the effectiveness of the biomarkers in predicting the recurrence of ovarian cancer by constructing different machine learning models. For the K-NN algorithm, the accuracies on the training and testing sets were relatively consistent and reached the optimal level only when the number of neighbors was set to 10 (Figure 3A). In the DT model, neoadjuvant chemotherapy, prealbumin (PAB), and hematocrit (HCT) showed high predictive power for the recurrence of ovarian cancer (Figure 3B). In the RF model, by contrast, uric acid (UA), CA125, and HCT showed high predictive performance (Figure 3C). In the AdaBoost model, calcium (Ca) alone had the highest weight (Figure 3D). In the GBM model, the top three biomarkers were neoadjuvant chemotherapy, CA125, and PAB (Figure 3E). Similarly, HCT, neoadjuvant chemotherapy, and CA125 had high weights in the XGBoost model (Figure 3F).
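The neighbor-count selection for K-NN can be sketched as a simple sweep that records training and testing accuracy for each candidate value. The data below are synthetic, and the candidate range (1 to 20), the split settings, and the tie-breaking rule are all assumptions made for illustration; the paper does not describe how the candidates were searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the cohort (NOT the study data).
X, y = make_classification(n_samples=277, n_features=47, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Map each candidate k to (training accuracy, testing accuracy).
results = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    results[k] = (knn.score(X_train, y_train), knn.score(X_test, y_test))

# Assumed selection rule: highest testing accuracy, ties broken by the
# smallest gap between training and testing accuracy.
best_k = max(results,
             key=lambda k: (results[k][1], -abs(results[k][0] - results[k][1])))

for k, (train_acc, test_acc) in results.items():
    print(f"k={k:2d}  train={train_acc:.3f}  test={test_acc:.3f}")
print("selected k:", best_k)
```

With k = 1 every training point is its own nearest neighbor, so training accuracy is perfect while testing accuracy is not; a small train/test gap at larger k is what "relative consistency" refers to.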

Figure 3 Multimodal prognostication. (A–F) Variable importance of features included in six machine learning algorithms for predicting the recurrence of ovarian cancer.

Finding Significantly Associative Biomarkers Using Statistical Methods

Next, each prediction method was used to calculate the importance of each biomarker in predicting the recurrence of ovarian cancer. This step eliminated 12 biomarkers that had no predictive value in any of the machine learning methods, leaving 35 predictive factors that can differentiate between recurrent and recurrence-free ovarian cancer (Figure 4A). We then visualized these 35 biomarkers according to their frequency across the machine learning methods; the word cloud showed that neoadjuvant chemotherapy, PAB, and HCT had superior predictive ability across the methods (Figure 4B).

Figure 4 Finding significantly associative biomarkers using statistical methods. (A) Relative importance of the 35 biomarkers in predicting the recurrence of ovarian cancer, calculated with five machine learning methods. (B) The 35 biomarkers sorted by their frequency of occurrence across the machine learning methods.

The Distribution of Important Markers and the Accuracy of Models

To evaluate the ability of the biomarkers to predict the recurrence of ovarian cancer, we selected, for each machine learning method, the three biomarkers with the highest weights. As shown in Table 3, this yielded a total of six distinct indicators according to their frequency of occurrence: neoadjuvant chemotherapy, monocyte ratio (MONO%), HCT, PAB, aspartate aminotransferase (AST), and CA125. Standard box plots of the distribution of these six biomarkers in the recurrent and recurrence-free ovarian cancer groups are shown in Figure 5A–F. In addition, the accuracy of each model is depicted in a radar chart; XGBoost performed best, with an accuracy of 0.95 (Figure 6).
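The selection step described above (take the three highest-weighted biomarkers per model, then count how often each one appears) can be sketched in plain Python. The importance values below are invented for illustration and do not reproduce Table 3; `NACT` is used here as a hypothetical shorthand for neoadjuvant chemotherapy, and `top_k` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical per-model feature weights (not the values in Table 3).
importances = {
    "DT":  {"NACT": 0.30, "PAB": 0.20, "HCT": 0.15, "CA125": 0.05},
    "RF":  {"NACT": 0.10, "PAB": 0.08, "HCT": 0.12, "CA125": 0.14},
    "GBM": {"NACT": 0.25, "PAB": 0.18, "HCT": 0.05, "CA125": 0.20},
}


def top_k(weights, k=3):
    """Names of the k features with the largest weights, in descending order."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Top three biomarkers for each model.
top3 = {model: top_k(weights) for model, weights in importances.items()}

# How many models place each biomarker in their top three.
frequency = Counter(name for names in top3.values() for name in names)

print(top3)
print(frequency.most_common())
```

The union of the per-model top-three lists gives the set of distinct key indicators, and the counts give the frequency ordering used to rank them.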

Table 3 The Quantified Importance of Six Biomarkers by Machine Learning

Figure 5 The expression of six important biomarkers. (A–F) Box plots showing the distribution of the top six biomarkers for distinguishing recurrent from recurrence-free ovarian cancer.

Figure 6 The accuracy of each model depicted by radar plot.

Discussion

The high recurrence rate of ovarian cancer leads to poor prognosis, so predicting disease progression is urgently needed in order to implement appropriate strategies to prevent recurrence.20 Known risk factors for ovarian cancer recurrence include age, stage of disease, and tumor histology. If these risk factors can be integrated into a predictive model, the progression of ovarian cancer may be predicted more effectively. Machine learning algorithms are now widely applied in the diagnosis and prognostic evaluation of diseases.9,21,22 Because of the spatial and temporal heterogeneity of solid tumors,23 multi-scale clinical information must be integrated to improve predictive ability. Machine learning algorithms can uncover hidden information in clinical data by handling multiple factors simultaneously, which helps in understanding the complex mechanisms behind carcinogenesis and cancer progression.24 Accurately identifying clinically significant patient information before initial treatment remains worth exploring. Therefore, we used machine learning algorithms to predict the recurrence of ovarian cancer from a given set of variables and compared different algorithms to determine the most efficient method.

Many studies have shown that the tumor microenvironment is crucial to the occurrence and development of tumors.25–27 These studies suggest that we should not rely solely on traditional tumor markers (such as CA125, HE4, and CEA) to judge tumor progression, but should also consider biomarkers related to the tumor microenvironment in order to characterize the growth and biological behavior of tumors more fully. For instance, in previous research, we found that CA125 combined with D-dimer can better predict lymph node metastasis in ovarian cancer patients.28 Artificial intelligence technology can identify more clinical parameters related to prognosis. Accordingly, we investigated six machine learning methods to predict recurrence in ovarian cancer patients based on readily available biomarkers, using 47 clinical parameters per patient as inputs to the machine learning classifiers. These 47 indicators cover the basic clinical characteristics of patients and various biomarkers in peripheral blood, with the aim of achieving more accurate disease prediction. In addition to the conventional CA125, the machine learning algorithms in our study identified other important factors for the recurrence of ovarian cancer, including neoadjuvant chemotherapy, MONO%, HCT, PAB, and AST. In summary, by comparing different clinical parameters in depth using artificial intelligence technology, this study can provide clinicians with valuable information about patients.

In this study, we have demonstrated the feasibility of using machine learning to construct a prediction model for ovarian cancer recurrence. In a recent study, researchers used nine machine learning algorithms to predict the diagnosis and evaluate the prognosis of patients with epithelial ovarian cancer based on 34 basic clinical variables. That study found that the XGBoost model had the highest predictive value for the diagnosis of epithelial ovarian cancer.29 Similarly, among the models we established, XGBoost performed best in predicting the recurrence of ovarian cancer, outperforming the other five models. These results indicate that machine learning algorithms can provide valuable prognostic information based on preoperative biomarkers, which is beneficial for developing personalized treatment strategies for patients with ovarian cancer.

Our research has some limitations. First, the data come from a single medical center and involve a relatively small number of patients; more patients from multiple medical centers are therefore needed to verify the applicability of this model. Moreover, retrospective studies often carry the risk of selection bias. In addition, the application of machine learning in clinical and medical decision-making needs to be explored further through prospective cohort studies.

Conclusions

Accurate prognostic prediction tools are helpful for clinical decision-making in ovarian cancer. The machine learning method in this study revealed the correlation between clinical parameters and the recurrence of ovarian cancer, which can be used for patient stratification. In conclusion, our research shows that machine learning can achieve more accurate disease assessment, help clinicians make decisions, develop personalized treatment strategies, and adapt to the current development trend of precision medicine. We believe that future research can utilize artificial intelligence to integrate various clinical parameters of patients to develop new models and provide critical disease information to clinicians.

Ethics Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Second Affiliated Hospital of Nantong University. Individual consent was waived because of the retrospective nature of our study. Patients’ data were anonymized and maintained with confidentiality.

Funding

This study was supported by grants from Nantong Municipal Health and Construction Commission Youth Mandatory Project QN2022023.

Disclosure

All of the authors declare that they have no conflicts of interest for this work.

References

1. Richardson DL, Sill MW, Coleman RL, et al. Paclitaxel with and without pazopanib for persistent or recurrent ovarian cancer: a randomized clinical trial. JAMA Oncol. 2018;4(2):196–202. doi:10.1001/jamaoncol.2017.4218

2. Benedetti Panici P, Giannini A, Fischetti M, Lecce F, Di Donato V. Lymphadenectomy in ovarian cancer: is it still justified? Curr Oncol Rep. 2020;22(3):22. doi:10.1007/s11912-020-0883-2

3. Rizzuto I, Stavraka C, Chatterjee J, et al. Risk of ovarian cancer relapse score: a prognostic algorithm to predict relapse following treatment for advanced ovarian cancer. Int J Gynecol Cancer. 2015;25(3):416–422. doi:10.1097/IGC.0000000000000361

4. Ledermann JA, Raja FA, Fotopoulou C, Gonzalez-Martin A, Colombo N, Sessa C; ESMO Guidelines Working Group. Newly diagnosed and relapsed epithelial ovarian carcinoma: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2013;24(6):vi24–32. doi:10.1093/annonc/mdt333

5. Luvero D, Milani A, Ledermann JA. Treatment options in recurrent ovarian cancer: latest evidence and clinical potential. Ther Adv Med Oncol. 2014;6(5):229–239. doi:10.1177/1758834014544121

6. Zhang M, Cheng S, Jin Y, Zhao Y, Wang Y. Roles of CA125 in diagnosis, prediction, and oncogenesis of ovarian cancer. Biochim Biophys Acta Rev Cancer. 2021;1875(2):188503. doi:10.1016/j.bbcan.2021.188503

7. Rustin GJ, van der Burg ME, Griffin CL, et al. MRC OV05; EORTC 55955 investigators. Early versus delayed treatment of relapsed ovarian cancer (MRC OV05/EORTC 55955): a randomised trial. Lancet. 2010;376(9747):1155–1163. doi:10.1016/S0140-6736(10)61268-8

8. Muinao T, Deka Boruah HP, Pal M. Diagnostic and prognostic biomarkers in ovarian cancer and the potential roles of cancer stem cells - an updated review. Exp Cell Res. 2018;362(1):1–10. doi:10.1016/j.yexcr.2017.10.018

9. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J. 2014;13:8–17. doi:10.1016/j.csbj.2014.11.005

10. Ludwig JA, Weinstein JN. Biomarkers in cancer staging, prognosis and treatment selection. Nat Rev Cancer. 2005;5(11):845–856. doi:10.1038/nrc1739

11. Di Donato V, Giannini A, D’Oria O, et al. Hepatobiliary disease resection in patients with advanced epithelial ovarian cancer: prognostic role and optimal cytoreduction. Ann Surg Oncol. 2021;28(1):222–230. doi:10.1245/s10434-020-08989-3

12. Huang H, Sun J, Jiang Z, et al. Risk factors and prognostic index model for pancreatic cancer. Gland Surg. 2022;11(1):186–195. doi:10.21037/gs-21-848

13. Urakawa N, Kanaji S, Kato T, et al. Neutrophil-lymphocyte ratio and histological response correlate with prognosis of gastric cancer undergoing neoadjuvant chemotherapy. In Vivo. 2023;37(1):378–384. doi:10.21873/invivo.13089

14. Motwani M, Dey D, Berman DS, et al. Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis. Eur Heart J. 2017;38(7):500–507.

15. Ganggayah MD, Taib NA, Har YC, Lio P, Dhillon SK. Predicting factors for survival of breast cancer patients using machine learning techniques. BMC Med Inform Decis Mak. 2019;19(1):48. doi:10.1186/s12911-019-0801-4

16. Miao R, Badger TC, Groesch K, et al. Assessment of peritoneal microbial features and tumor marker levels as potential diagnostic tools for ovarian cancer. PLoS One. 2020;15(1):e0227707. doi:10.1371/journal.pone.0227707

17. Sultan AS, Elgharib MA, Tavares T, Jessri M, Basile JR. The use of artificial intelligence, machine learning and deep learning in oncologic histopathology. J Oral Pathol Med. 2020;49(9):849–856. doi:10.1111/jop.13042

18. Jan YT, Tsai PS, Huang WH, et al. Machine learning combined with radiomics and deep learning features extracted from CT images: a novel AI model to distinguish benign from malignant ovarian tumors. Insights Imaging. 2023;14(1):68. doi:10.1186/s13244-023-01412-x

19. Piedimonte S, Erdman L, So D, et al. Using a machine learning algorithm to predict outcome of primary cytoreductive surgery in advanced ovarian cancer. J Surg Oncol. 2023;127(3):465–472. doi:10.1002/jso.27137

20. Zhang F, Zhang Y, Ke C, et al. Predicting ovarian cancer recurrence by plasma metabolic profiles before and after surgery. Metabolomics. 2018;14(5):65. doi:10.1007/s11306-018-1354-8

21. Huang S, Yang J, Fong S, Zhao Q. Artificial intelligence in cancer diagnosis and prognosis: opportunities and challenges. Cancer Lett. 2020;471:61–71. doi:10.1016/j.canlet.2019.12.007

22. Gaur K, Jagtap MM. Role of artificial intelligence and machine learning in prediction, diagnosis, and prognosis of cancer. Cureus. 2022;14(11):e31008.

23. Aerts HJ. The potential of radiomic-based phenotyping in precision medicine: a review. JAMA Oncol. 2016;2(12):1636–1642. doi:10.1001/jamaoncol.2016.2631

24. Kawakami E, Tabata J, Yanaihara N, et al. Application of artificial intelligence for preoperative diagnostic and prognostic prediction in epithelial ovarian cancer based on blood biomarkers. Clin Cancer Res. 2019;25(10):3006–3015. doi:10.1158/1078-0432.CCR-18-3378

25. Qiu Y, Wang X, Sun Y, et al. TCF12 regulates exosome release from epirubicin-treated CAFs to promote ER+ breast cancer cell chemoresistance. Biochim Biophys Acta Mol Basis Dis. 2023;1869(6):166727. doi:10.1016/j.bbadis.2023.166727

26. Mathew AA, Zakkariya ZT, Ashokan A, et al. 5-FU mediated depletion of myeloid suppressor cells enhances T-cell infiltration and anti-tumor response in immunotherapy-resistant lung tumor. Int Immunopharmacol. 2023;120:110129. doi:10.1016/j.intimp.2023.110129

27. Lu J, Li J, Lin Z, et al. Reprogramming of TAMs via the STAT3/CD47-SIRPα axis promotes acquired resistance to EGFR-TKIs in lung cancer. Cancer Lett. 2023;564:216205. doi:10.1016/j.canlet.2023.216205

28. Zhang L, Guan Z, Yin Y, et al. Predictive value of indicator of CA125 combined with D-dimer (ICD) for lymph node metastasis in patients with ovarian cancer: a two center cohort study. J Cancer. 2022;13(8):2447–2456. doi:10.7150/jca.70737

29. Wu M, Zhao Y, Dong X, et al. Artificial intelligence-based preoperative prediction system for diagnosis and prognosis in epithelial ovarian cancer: a multicenter study. Front Oncol. 2022;12:975703. doi:10.3389/fonc.2022.975703
