Predicting disease recurrence in breast cancer patients using machine learning models with clinical and radiomic characteristics: a retrospective study

Breast cancer has the highest incidence rate among African women, accounting for 46.2% of reported cases. It is also responsible for 39.2% of mortality in women under 50 years of age. In 2020, a total of 186,589 new cases and 85,787 deaths were officially reported [1].

Treatment options for breast cancer include immunotherapy, radiation therapy, hormone therapy, and chemotherapy. Testing for estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status can help determine which treatment is likely to be most effective. Tumors that lack ER, PR, and HER2 expression are classified as triple-negative breast cancer (TNBC), which accounts for 10% to 20% of breast cancer cases. Surgery, chemotherapy, and radiation therapy are often used in combination as part of the conventional treatment plan for TNBC [2].

Taken together, the terms “triple-negative breast cancer (TNBC)” and “non-triple-negative breast cancer (non-TNBC)” cover all cases of breast cancer regardless of receptor status, including hormone receptor positive, hormone receptor negative, triple negative, and triple positive. Even with recent developments in the field of breast cancer prognosis, recurrence remains a serious issue that greatly impacts mortality [3]. Predicting recurrence at diagnosis could help optimize treatment decisions and give medical physicists an additional tool to support decision-making. Predictive machine learning (ML) models have been widely used in numerous studies [4,5,6,7,8,9,10]. Many of these studies have shown that clinical data can predict breast cancer recurrence accurately.

More recent studies have incorporated radiomics as a complementary approach to increase the power of machine learning algorithms to predict breast cancer recurrence [3,11,12,13,14]. The term “radiomic features,” also referred to as imaging features, requires clarification. In the literature, radiomic features are a comprehensive collection of quantitative descriptors extracted from medical images that capture details beyond what is perceptible to the naked eye. Multiple studies have shown that these features contribute to predicting recurrent cancer, including distant metastasis, local recurrence, and locoregional recurrence [15,16,17].

For recurrent breast cancer, the integration of radiomics with machine learning techniques holds promise as a key approach for developing precise and accurate treatment plans in radiation therapy. This is primarily because radiomic features provide imaging phenotypes that offer deeper insight into tumor characteristics and facilitate personalized treatment strategies.

In this study, our objective was to predict the likelihood of breast cancer recurrence in a highly diverse group of patients, representing a range of cancer types and stages, including triple-negative breast cancer (TNBC) and non-triple-negative breast cancer (non-TNBC).

To achieve this, we employed three distinct models. The first model, referred to as the clinical model, focused on utilizing machine learning algorithms with exclusively clinical and pathology data. The second model, known as the radiomic model, centered on machine learning algorithms specifically designed to analyze dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) features. Lastly, the merged model aimed to combine the strengths of the clinical and radiomic data by integrating both datasets. By employing these three models, we aimed to explore the predictive capabilities of machine learning in the context of a diverse patient population, encompassing both TNBC and non-TNBC cases. These models provided us with the opportunity to assess the individual contributions of clinical data, radiomic features, and their combined effects on predicting breast cancer recurrence accurately.
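The three-model design described above amounts to preparing three feature tables from the same patients. The following is a minimal sketch of that preparation step only, not the study's actual pipeline. The patient IDs, feature names, values, and the function `build_feature_sets` are all hypothetical, and the choice to keep only patients present in both tables for the merged set is an assumption made for illustration.

```python
# Hypothetical per-patient feature tables keyed by patient ID.
# All names and values are illustrative and do not come from the study.
clinical = {
    "P001": {"age": 52.0, "tumor_size_mm": 21.0, "er_positive": 1.0},
    "P002": {"age": 44.0, "tumor_size_mm": 35.0, "er_positive": 0.0},
    "P003": {"age": 61.0, "tumor_size_mm": 12.0, "er_positive": 1.0},
}
radiomic = {
    "P001": {"glcm_contrast": 0.83, "sphericity": 0.71},
    "P002": {"glcm_contrast": 1.42, "sphericity": 0.55},
    # P003 has no DCE-MRI features in this toy example.
}


def build_feature_sets(clinical, radiomic):
    """Return the clinical, radiomic, and merged feature tables.

    Assumption: the merged table keeps only patients that appear in both
    inputs, and prefixes column names so their origin stays visible.
    """
    merged = {}
    for pid in sorted(clinical.keys() & radiomic.keys()):
        row = {f"clin_{name}": value for name, value in clinical[pid].items()}
        row.update({f"rad_{name}": value for name, value in radiomic[pid].items()})
        merged[pid] = row
    return {"clinical": clinical, "radiomic": radiomic, "merged": merged}


feature_sets = build_feature_sets(clinical, radiomic)
for name, table in feature_sets.items():
    n_features = len(next(iter(table.values())))
    print(f"{name}: {len(table)} patients, {n_features} features")
```

Each of the three tables would then be passed to the same learning algorithms, so that differences in performance can be attributed to the features rather than to the classifier.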

Related works

A number of studies have used machine learning models with clinical and radiomic characteristics to investigate the prediction of disease recurrence in patients with breast cancer. Alzu’bi et al. [6], in Jordan, used machine learning techniques to predict the likelihood of breast cancer recurrence. The researchers built a natural language processing system to extract important information from the electronic health records of King Abdullah University Hospital, and the extracted elements were used to construct a medical dictionary focused on breast cancer. After the extracted data had been analyzed with a variety of machine learning algorithms, expert medical professionals evaluated them to confirm the correctness of the predicted results. Among the algorithms compared, OneR was the most effective at achieving a desirable trade-off between sensitivity and specificity. The resulting medical dictionary has the potential to help doctors make fast, informed treatment decisions and to support personalized medicine in the management of breast cancer.
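To make the OneR result above concrete, the sketch below implements the textbook OneR rule learner for categorical features and reports sensitivity and specificity on the training data. It is an illustration under stated assumptions, not the implementation used in [6]: the function names, the feature names, and the six toy records are hypothetical, and numeric features would need to be discretized first.

```python
from collections import Counter, defaultdict


def fit_one_r(rows, labels):
    """Fit OneR: for each feature, map every value to its majority label,
    then keep the single feature whose rule makes the fewest training errors."""
    best = None
    for feature in rows[0]:
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[feature]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[feature]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy records; label 1 = recurrence, 0 = no recurrence.
rows = [
    {"node_status": "positive", "grade": "high"},
    {"node_status": "positive", "grade": "low"},
    {"node_status": "negative", "grade": "high"},
    {"node_status": "negative", "grade": "low"},
    {"node_status": "positive", "grade": "high"},
    {"node_status": "negative", "grade": "high"},
]
labels = [1, 1, 0, 0, 1, 1]

feature, rule, errors = fit_one_r(rows, labels)
predictions = [rule[row[feature]] for row in rows]
sens, spec = sensitivity_specificity(labels, predictions)
print(feature, rule, errors, sens, spec)
```

On this toy data the rule is built on `node_status` and misclassifies one record. Because the metrics are computed on the training records themselves, they overstate how the rule would perform on unseen patients.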

In a previous investigation [18], machine learning and ensemble learning methods were used to predict the probability of recurrence in patients with breast cancer. Using an extensive dataset obtained from The Cancer Imaging Archive, we integrated demographic, clinical, pathology, genomic, and treatment data to construct highly accurate prediction models. Our methodology used feature selection to identify pertinent features and the Synthetic Minority Over-sampling Technique (SMOTE) to mitigate data imbalance. Among the methods evaluated, Extreme Gradient Boosting (XGBoost) showed the best predictive capability for recurrence. These findings are important for improving treatment planning and reducing the likelihood of treatment failure in patients with breast cancer. In a separate investigation of breast cancer diagnosis and recurrence prediction, Rana et al. [19] compared the effectiveness of four machine learning algorithms. Support vector machines (SVM) showed notable efficacy in predictive analysis, whereas K-nearest neighbors (KNN) performed best at predicting the recurrence and non-recurrence of malignant cases. That study underscored the potential of machine learning in the automated diagnosis of breast cancer and emphasized the importance of precise and timely detection.
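The SMOTE step mentioned above for [18] can be illustrated with a short, dependency-free sketch. It follows the basic idea of the technique: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbors. The function `smote`, its parameters, and the toy points are hypothetical and are not taken from [18]; a maintained library implementation would normally be preferred in practice.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority-class samples by interpolation.

    For each new sample: pick a minority point, pick one of its k nearest
    minority neighbors, and interpolate between the two at a random position.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class (recurrence) samples with two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote(minority, n_synthetic=6, k=2)
for point in new_points:
    print([round(v, 3) for v in point])
```

Oversampling of this kind is usually applied to the training portion only, so that synthetic points derived from test patients do not inflate the measured performance.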

Another study investigated the prediction of breast cancer using ensemble machine learning algorithms [20]. It analyzed a breast cancer dataset with a specific focus on risk factors including family history, physical inactivity, psychological stress, and breast size. Two widely used ensemble algorithms, random forest and Extreme Gradient Boosting (XGBoost), were applied to the prediction task. The dataset comprised 275 instances, each described by 12 features. Random forest achieved an accuracy of 74.7%, while XGBoost achieved an accuracy of 73.63%. These results highlight the potential of ensemble techniques such as random forest and XGBoost in the prediction of breast cancer.

Another study investigated the role of MRI-based radiomics features in predicting the risk of tumor recurrence in patients with ER+/HER2− invasive breast cancer who underwent Oncotype DX testing [21]. A total of 62 patients were included in the analysis. Radiomics features were extracted from both the tumor and the surrounding peritumoral tissues. The multivariate machine learning algorithm used was partial least squares (PLS) regression, and the top 5% of radiomics features with the largest PLS β-weights were selected for the analysis. The performance of the radiomics model was evaluated using leave-one-out nested cross-validation (nCV) and receiver operating characteristic (ROC) analysis. The radiomics model achieved an area under the curve (AUC) of 0.76, indicating its potential to predict the risk of tumor recurrence in early ER+/HER2− breast cancer patients. An additional analysis that integrated particular dynamic contrast-enhanced (DCE) images also showed a trend toward statistical significance. These results indicate that a radiomics-based machine learning approach shows potential for predicting the likelihood of recurrence in this specific group of patients.
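The evaluation scheme described above, leave-one-out cross-validation followed by ROC analysis, can be sketched as follows. This is a simplified illustration and not the pipeline of [21]: the PLS model is replaced by a stand-in nearest-centroid scorer, the inner loop of the nested cross-validation is omitted, and all function names and data values are hypothetical. In a faithful version, feature selection would be carried out inside `fit` so that the held-out patient never influences it.

```python
import math


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X, y):
    """Stand-in model (not PLS): a higher score means the sample lies
    closer to the recurrence centroid than to the non-recurrence centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    mu1 = mean([x for x, t in zip(X, y) if t == 1])
    mu0 = mean([x for x, t in zip(X, y) if t == 0])
    return lambda x: math.dist(x, mu0) - math.dist(x, mu1)


def leave_one_out_scores(X, y, fit):
    """Score each sample with a model fitted on all the other samples."""
    scores = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        scores.append(model(X[i]))
    return scores


# Hypothetical single-feature data; label 1 = recurrence.
X = [[0.2], [0.4], [0.5], [0.9], [1.1], [0.6]]
y = [0, 0, 0, 1, 1, 1]

scores = leave_one_out_scores(X, y, fit_centroid_scorer)
auc = roc_auc(y, scores)
print(f"leave-one-out AUC: {auc:.2f}")
```

Pooling the held-out scores before computing a single AUC is one common convention for leave-one-out evaluation, since each fold contains only one test sample.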

In a recent study with a similar focus [22], researchers developed machine learning models to predict disease recurrence in breast cancer patients who underwent surgery. They used clinical data and radiomic characteristics obtained from 2-deoxy-2-[18F]-fluoro-d-glucose positron emission tomography ([18F]-FDG-PET) scans. The study included 112 patients with 118 breast cancer lesions, divided into training and testing cohorts. The researchers evaluated several machine learning algorithms, including decision tree, random forest, neural network, k-nearest neighbors, naive Bayes, logistic regression, and support vector machine. The models that integrated clinical and radiomic data exhibited a notable level of precision and attained areas under the receiver operating characteristic curve (AUCs) above 0.80. That study emphasizes the potential of machine learning methods for predicting disease recurrence and supporting the postsurgical management of breast cancer patients.

JOURNAL OF THE EGYPTIAN NATIONAL CANCER INSTITUTE
