Breast cancer is the most diagnosed cancer in women and the second leading cause of cancer-related mortality in women . Heritage is the most critical risk factor, and 15 to 20% of breast cancer is familiar . One of the characteristics of breast cancer is that it can be wholly cured given an early diagnosis . The mortality rate from breast cancer has been reduced by 1.9% annually from 2002 to 2011 and 1.3% from 2011 to 2020 . Diagnostics and treatments have continuously improved through the years. However, the situation is different in each country considering the costs and technological advances in each country. In the United States, 300,590 cases of breast cancer had been estimated for the year 2023, with a total of 43,700 deaths . Latin America has over 210,000 new cases and around 60,000 deaths yearly . For the year 2020, it was estimated that about 2.3 million breast cancer cases were diagnosed in women globally, and about 685,000 died from this disease . A recurrent problem with standard treatments are the side effects. Regarding the use of chemotherapeutic drugs, such issues are nephrotoxicity of cisplatin, cardiotoxicity of doxorubicin, and pulmonary fibrosis from the use of bleomycin . Besides, in the case of radiotherapy, fibrosis, atrophy, and neuronal damage caused by irradiation can occur . Consequently, novel treatments try to reduce the secondary effects while retaining the benefits of standard approaches.
Chemotherapy is one of the most extensively applied treatments for breast cancer, with different drug targets depending on the type of cancer. Progesterone- or estrogen-receptor-positive tumors are related to cancers with low mortality . Another common target in chemotherapy is human epidermal growth factor receptor 2 (HER2). Only 15 to 20% of all tumors are HER2-positive, overexpressing Erb-B2 receptor tyrosine kinase 2 (ERBB2) in the cell membrane. HER2 tumors are usually more aggressive than other ones, but the advantage is that their treatment is very effective . Another chemotherapy target is the chemokine C-X-C motif receptor 7 (CXCR7) . This G-protein is targeted because studies show a possible positive effect on inhibiting the metastasis of cervical cancer cells . However, more clinical and preclinical studies on CXCR7 and its co-player CXCR4 are required since alterations have been detected in diseases such as cancer, central nervous system and cardiac disorders, and autoimmune diseases .
In recent years, nanomaterials have attracted the attention of different scientific communities by providing them with new solutions for drug delivery . These nanotechnological applications have made it possible to obtain treatments that release substances at specific sites of interest, reducing the required drug amount and side effects. Nanostructures to form these drug delivery systems can be divided into organic and inorganic , with the latter one being the less extensively studied. One option currently considered in pharmacy and medicine is carbon-based nanomaterials because of their physicochemical, mechanical, electrical, thermal, and optical properties , as well as their capacity to modify existing drugs. Fullerene derivatives have been proposed recently, particularly those obtained from fullerene C60. The unmodified fullerene C60 is known as a “free radical sponge” because its double bonds tend to accept free radicals . Because of its size, surface area, and capacity to extinguish or generate reactive oxygen species, C60 is very promising in medicine and clinical therapy . It is also possible to modify pristine fullerenes by adding polar functional groups (e.g., –COOH, –OH, or –NH2), to improve water solubility, antioxidant properties, and even biological activity . For instance, polyhydroxy fullerenes (PHFs) exhibit properties suitable for biomedical applications, such as water solubility, biodegradability, biocompatibility, and hypoallergic response. It has been shown that PHFs can inhibit cancer tumor growth and positively regulate the immune system . The same is valid for carboxylated fullerenes ; for instance, C60[C(COOH)2]3 is well known for its high biological activity in plants and within mitochondrial dynamics .
Since the evaluation of novel drugs is a task that requires significant human and material resources, innovative strategies have been formulated as alternatives. Quantitative structure–activity and quantitative structure–property relationships (QSAR/QSPR) are a paradigm that can be useful in choosing promising molecules, considering the information on inactive and active compounds, through in silico approaches. According to the QSAR/QSPR paradigm, a given activity/property, f, can be modeled using a set of quantitative descriptors, x1, x2, x3,..., xn, theoretically determined or measured by experiments . A relationship f(x1, x2, x3,..., xn) can be defined to predict the activity or property of molecules after the evaluation of their quantitative descriptors. However, the QSAR/QSPR paradigm does not explain how to select the descriptors or how to build the mathematical function. Consequently, the following paragraphs discuss basic concepts about selecting descriptors and regression techniques implemented in this manuscript.
Lipinski’s rule of five is a compendium of guidelines commonly used to determine if a molecule can be proposed as an orally delivered drug according to its physicochemical properties. According to this rule, a drug compound should have a molecular weight below 500 g/mol, a octanol–water partition coefficient (LogP) below 5, less than five hydrogen bond donor sites, and less than ten hydrogen bond acceptors sites. It is possible to add two other conditions, namely polar surface area (PSA) ≤ 140 Å2 and less than ten rotatable bonds . Taking advantage of the readiness of these quantities in public datasets, the current study proposes some of these quantities as potentially suitable descriptors for predictive models. Besides, Pearson’s hard–soft acid–base (HSAB) theory suggests other descriptors to describe and predict the interactions between chemical species, such as those between a drug molecule as a ligand and a protein . These quantitative values are based on the vertical ionization energy (I) and electron affinity (A). According to Koopmans’ theorem, both can be approximated by I = −EHOMO and A = −ELUMO, where EHOMO is the energy of the highest occupied molecular orbital (HOMO), and ELUMO is the energy of the lowest unoccupied molecular orbital (LUMO). It is advantageous to combine these properties to find out if an interaction between two species will occur and to obtain new quantitative relationships. Another helpful descriptor is the global electrophilicity, calculated as ω = χ2/2η . Electrophilicity is related to the energetic stabilization that a species gains by obtaining an additional electron.
MethodsFirst, 42 drugs related to chemotherapy treatments for breast cancer were proposed. Although the most notable fullerene derivatives for biological applications are those with several hydrophilic groups, the carboxylic acid derivative C60–COOH has been studied as well. Baglayan and coworkers carried out a conformation analysis within DFT to obtain the ground state structure for C60–COOH . In addition, they discussed its usage as a potential drug carrier for the antimetabolic and anticancer drug 5-fluoruracil . Similarly, Parlak and Alver reported a theoretical study on the interactions and stability of paracetamol complexes with C60–COOH . Consequently, this work proposes the interaction of C60–COOH fullerene with anticancer drugs. As a complement, a water-soluble fullerene predicted as stable at the normal human body temperature was proposed to study the interactions with doxorubicin and gemcitabine . The water-soluble fullerene is introduced to avoid known mutagenic reactions related to breast cancer . It was also studied as a potential carrier for bedaquiline, an agent against tuberculosis . The current study only considered molecules and complexes formed with up to 100 atoms to be affordable with our computational resources.
A set of descriptors was chosen to build the dataset, including molecular weight and pKa. Also, LogP was included, as a descriptor associated with the concentration of a given substance in the aqueous phase of a two-phase octanol–water mixture . Similarly, LogS, related to the water solubility of a substance, was considered. Besides, PSA, as molecular surface associated with charge accumulation due to heteroatoms and polar groups, as well as polarizability (α) associated with the tendency of a given molecule to acquire an electric dipole moment in the presence of an external electric field were taken into account. The mentioned QSAR/QSPR descriptors were obtained from the Drugbank dataset (https://go.drugbank.com). Initial drug structures and connectivity were also obtained from the simplified molecular input line entry specification (SMILES) retrieved from Drugbank.
Molecular mechanics and density functional-based tight binding (DFTB) with dispersion and solvation corrections were used to obtain the optimized structures of the molecules under study and to compute EHOMO, ELUMO, and ω as quantitative descriptors. As an alternative to the most robust but computationally more expensive density functional theory (DFT) method, DFTB was used. A reference electron density ρ0 represents the sum of the neutral atomic densities . Within the third-order approach DFTB3, the ground state density ρ(r) is obtained as the reference density ρ0 perturbed by density fluctuations δρ, that is,
(1)For all calculations within DFTB3, the 3OB parameter set was used . To carry out the global optimization procedure, Balloon 1.8.2 and DFTB+ 17.1 were used for the initial conformational study by genetic algorithms and final optimization at the DFTB3 level, respectively. London dispersion forces were considered in the DFTB3 and global optimization procedures by Lennard-Jones potentials, as implemented in UFF and MMFF94 force fields, respectively. The solvent effect was included by the Born solvation model within DFTB3. The study considered the chemotherapy drugs isolated and interacting with pristine C60 fullerene as well as its carboxylic acid derivative C60–COOH. Eight initial drug–fullerene structures were proposed to obtain their global optimization by means of DFTB3. The drugs were initially set at 1.5 Å of minimal distance from the fullerene. Once the global optimization was done, the same steps as for the isolated drugs were carried out for the molecular docking. The datasets were modified to take into account the effect of the fullerenes. Also, the validation set was reduced because of the large size of the complexes.
The atypical chemokine receptor 3, also known as CXCR7 or G-protein-coupled receptor 159 (GPR159) , was selected as the target protein for molecular docking. The iterative assembly refinement server (I-Tasser) was used to produce an initial structure for the CXCR7 protein by the homology approach. The sequence was extracted from the UniProtKB/Swiss-Prot dataset. From all homology structures produced by the I-Tasser server, the one with the highest confidence coefficients was selected to produce a reliable initial structure . The lowest-energy structure, as in the study of Muthiah and coworkers , was validated using PROCHECK to check the quality of the protein structure. The PDB produced with the previous step was subsequently optimized by an energy minimization through Amber force fields using the USCF Chimera 1.14 toolkit . The secondary structural features were stabilized by TMpred and HMMTOP during energy minimization. Last, the protein was prepared by setting atomic charges and hydrogen atoms and merging the nonpolar groups. Once the structures were optimized, molecular docking was performed with the CXCR7 protein, using Autodock Vina 1.1, to obtain the docking score, established hydrogen bonds, and the binding site (pocket). The above was done for all drugs in the dataset and an external validation set.
IBM Watson AI was used to build the models and to predict the docking score through the Extra Trees regressor algorithm . It was also used to obtain the most significant quantum descriptors used in each model. Extra Trees, an abbreviation of “extremely randomized trees”, is a mathematical method used to estimate a relationship between data and the covariates . The Extra Trees algorithm creates many decision trees , but the sampling of each one is random. Thus, a dataset for each tree contains unique samples. The optimization of the hyperparameters associated with the decision trees obtained was performed by the derivative-free global search algorithm known as RBfOpt, which fits a radial basis function mode to accelerate the discovery of the hyperparameters . All the above was used through the AutoAI tool within IBM Watson, an automatized routine to select the model with the best performance among those available in the platform. Since this method does not produce exportable mathematical models, another approach was used as detailed below .
Multiple linear regression (MLR) could be a tool to solve the problem in a complementary way to Extra Trees regression. MLR is a mathematical model that can be seen as an extension of linear regression. In terms of n input variables, x1, x2,…, xn, the outcome y can be expanded by the following linear expansion :
(2)In Equation 2, βk are the partial regression coefficients, and β0 is the value of y when all variables are set to zero.
To obtain the AI and MLR models, a fivefold approach was implemented, by using 80% of the data available to obtain the predictive model as training set and the remaining 20% as testing set. Supporting Information File 1 gives the results of the cross-validation for all the models reported in the current manuscript. Once the models were built, an additional external validation set was used to obtain evaluation metrics and to determine the most accurate models between methodologies. The metrics proposed to evaluate the performance of the predictive models were mean squared error (MSE), mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean squared error (RMSE). These metrics were computed as follows:
and
Here, yi is the docking score for compound I, ŷi is the estimated value of the docking score for compound I provided by the model. The workflow diagram in Figure 1 summarizes the procedure followed to obtain the models.
Figure 1: Workflow diagram of the stages during modeling.
Results and DiscussionTable 1 presents the quantum descriptors proposed for the current study and the symbols used for them. The physical unit of each descriptor, as well as references to their usage in similar QSAR/QSPR models, were included as well.
Table 1: Quantitative descriptors proposed to model the docking score of the isolated drugs, as well as of those modified with fullerenes C60 and C60–COOH, interacting with the protein CXCR7.
Variable Descriptor Symbol Unit Reference x1 molecular weight MW g/mol x2 water solubility WS mg/mL x3 octanol–water partition coefficient LogP — x4 solubility coefficient LogS — — x5 acid dissociation constant pKa — x6 hydrogen acceptor count Ac — x7 hydrogen donor count Dn — x8 polar surface area PSA Å2 x9 rotatable bond count RBC — x10 polarizability α Å3 x11 number of rings NOR — x12 energy of HOMO EHOMO eV x13 energy of LUMO ELUMO eV x14 electrophilicity ω eV Isolated drugsA dataset containing all the descriptors of Table 1 for 33 drugs was created to obtain the predictive models. Also, another nine compounds were considered to build an external validation set, allowing for the comparison between methodologies (Supporting Information File 1, Table S1). In the case of the training set, the molecular weight was obtained with values between 130.08 and 915.4 g/mol. Water solubility values varied between 0.0004 mg/mL and 22.3 mg/mL. The LogP values varied between −2 and 6.54, whereas LogS ranged from −6 to −1.1. Besides, pKa values varied between −8 and 14.55. The hydrogen acceptor count varied widely between 2 and 13, whereas the hydrogen donor count varied between 0 and 6. In addition, the polar surface area had variations between 12.47 and 221.29 Å2. The cases of thiotepa and aldoxorubicin were not considered because they are part of the validation set. Rotatable bonds were obtained ranging from 0 to 15. The polarizability varied from 9.46 to 87.46 Å3; everolimus was excluded as part of the external validation set. Also, values of number of rings were obtained from 0 to 9. The energy of the HOMO was computed ranging from −7.400 to −4.392 eV, and the LUMO energy from −5.341 to −0.889 eV. Finally, the electrophilicity varied from 2.12 to 180.39 eV. Figure S1 (Supporting Information File 1) shows the correlation matrix between the ten most relevant quantum descriptors used to obtain the mathematical models. There are significant correlations between the molecular weight and the polarizability of about 0.93 and between polarizability and the number of rings of about 0.88. Also, molecular weight and number of rings, as well as WS and LogS exhibited considerable correlations, with values of 0.87 and 0.73, respectively. However, all variables showed a correlation below 0.95. Once the drugs were optimized, blind molecular docking was performed with the CXCR7 protein to obtain the docking score, number of established hydrogen bonds, and the protein residues interacting with the ligands in a coordination sphere of 3 Å. The results obtained with Autodock Vina for training–testing and validation sets are shown in Table 2.
Table 2: Docking score, number of established H-bonds, and protein–ligand interacting residue in three-letter symbol, up to 3 Å distance. Drugs marked with an asterisk were used as external validation set.
Ligand Docking score (kcal/mol) H-bonds Interacting residues doxorubicin −8.4 5 Asp275, Phe294, Leu297, Tyr200, His298, Arg197, Ser198, Cys196, Asp179, Trp100 neratinib maleate −8.5 2 Ile276, Leu297, Phe294, His298, Tyr 268, Gln30, Ser216, Ser198, Phe129, Asp179, Leu183, His121, Phe124 epirubicin −8.1 2 Asp275, Leu297, Tyr200, Ser198, Arg197, Cys196, His121, Trp100, Leu104, His298, Phe294, Gln301, lapatinib ditosylate −8.9 1 Glu213, Ile276, Val272, Leu297, Tyr200, His298, Arg197, Ser198, Leu183, His121, Cys196, fulvestrant −7.9 0 Leu297, Ile276 Val272, Tyr268, Gln301, Trp100, Phe124, Cys196, Asp179, Ser198, Ser216, Arg197 dinaciclib −8.4 1 Leu297, Gln301, Tyr268, Trp100, Cys196, Arg197, Phe124, Leu128, Ser198, Asp179, Phe129, Met212, Ser216 abemaciclib −9.8 1 Asn108, Trp100, Trp110, Leu104, Arg197, Lys40, Asn36, Pro38, Leu43, His298 gemcitabine −6.2 0 Asn36, Phe294, Met37, Pro38, Asn39, Leu43, Lys40, Leu104, Asn108 voruciclib −8.8 0 Asn36, Pro38, Asn39, His298, Leu43, Lys40, Leu104, Asn108, Trp100, Ser103, Trp110, Arg197 fluorouracil −4.6 1 Asn319, Asn321, Lys73, Ile83, His80 letrozole −8.1 0 His298, Gln301, Arg197, Leu104, Trp100, Ser198, His121, Trp110, Leu183 olaparib −10.1 2 Tyr268, Val272, Ile276, Glu213, Leu209, Ile205, Met212, Tyr200, Phe199, Ser198, Leu128, Phe124 paclitaxel −9.6 0 His298, Phe294, Leu297, Tyr268, Gln301, Val272, Leu104, Trp100, Phe124, His121, Trp110, Arg197, Ser198, Leu209, Tyr200, Leu183 seliciclib −7.6 0 Leu183, Asp179, Ser125, Phe124, Phe129, Leu128, Ser216, Tyr268, Leu297 ixabepilone −8.2 0 Leu297, Val272, Tyr268, Trp265, Phe124, Arg197, His298, Leu128, Ser198, Leu209, Tyr200, Met212, Ser216 anastrozole −7.7 0 Met212, Leu209, Ser216, Tyr200, Phe129, Asp179, Ser125, Ser198, Phe124, Leu183, His121 pentostatin −6.3 3 Phe294, Lys40, Leu104, Trp100, Asn108, Trp110 alvocidib −8.6 1 His298, Leu43, Lys40, Leu104, Trp100, Asn108, Ser103, Trp110, Arg197 methotrexate −8.5 3 Leu104, Gln301, Tyr268, Val272, Trp100, Trp110, Cys196, Phe124, His121, Ser198, Leu183, Asp179, Phe129, Glu213, Ser216 milciclib −9.2 0 Lys40, Phe294, Leu297, Leu104, Asn108, Tyr268, Arg197, Val272, His121 ribociclib −9.5 1 Lys40, Leu104, Leu43, Tyr195, Cys196, Arg197, Phe124, His121, Ser198, Leu183, Asp179 exemestane −8.0 0 Phe294, Leu43, Lys40, Leu104, Trp110 tamoxifen −7.9 1 Asn39, Pro38, Leu43, Lys40, Leu104, Trp110, Tyr195, Arg197 idarubicin −8.8 1 Phe294, His298, Leu43, Leu104, Asn108, Ser103, Trp110, Cys196, His121 palbociclib −8.3 1 Phe294, Asn36, Pro38, Leu43, Lys40, Leu104, Asn108, Arg197, Cys196 toremifene −7.4 0 Leu43, Asn39, Leu104, Pro38, Lys40, Asn36, Phe294, Trp110, Tyr195, Arg197 vinblastine −9.5 2 Leu43, Asn39, Leu104, His298, Phe294, Lys40, Asn108, Trp110, Arg197, Tyr195, Ile205, Leu209 pirarubicin −9.4 1 Val272, Leu297, Tyr268, Gln301, Met212, Ile205, Leu209, Glu213, Arg197, Phe199, Trp110, Cys196, Phe199, roniciclib −8.4 1 Asn39, Met37, His298, Phe294, Pro38, Asn36, Trp100, Leu104, Lys40, Ser103, Asn108, Trp110, capecitabine −7.7 2 Gln301, Phe129, Phe124, Ser198, Arg197, His121, Cys196, Trp110, Leu183 sulfanilamide −5.4 2 Tyr268, Val272, Glu213, Ser216, Leu128, Phe129, Met212, Ser125 zoledronic acid hydrate −5.7 1 Tyr268, Glu213, Ser216, Phe129, Phe124, Ser125, Ser198, Met212, Asp179 pamidronic acid −4.8 4 Val272, Tyr268, Trp265, Val217, Glu213, Ser216, Val215, Phe129, Met212, Val175, Phe124, Ser125, His121, Ser76, Tyr200, Asp179 docetaxel* −9.5 2 Leu209, Glu213, Met212, Ser198, Ser216, Arg197, Leu183, Phe124, His121, Trp110, Ile276, Val272, Leu297, Tyr268, His298, Gln301, Leu104 topotecan* −8.2 2 Tyr268, His298, Gln301, Trp100, Leu104, Trp110, Cys196, Phe124, Leu183, Ser198 tucatinib* −10.1 1 Val272, Tyr268, Leu104, Arg197, Ser198, Trp110, His121, Phe124, Asp179, Leu183 aldoxorubicin* −8.4 1 Ile205, Leu209, Tyr200, Ser198, Asp179, Arg197, Leu193, Asp179, His121, Phe124, Ser125, Trp110, Asn108, Leu104, His298, Gln301, Leu297, Val272, Tyr268 vinorelbine* −9.1 0 Leu297, Tyr268, Phe294, His298, Gln301, Leu43, Leu47, Leu104, Arg197, Trp110, Ser198, Tyr200, Leu209 etoposide* −9.3 2 Lys 40, Leu43, Asn108, Leu104, Trp100, Trp110, His121, Phe124, Leu183, Gln301 pemetrexed* −9.1 2 Leu104, Ser103, Trp100, Trp110, Cys196, His121, Phe124, Leu183, Ser198, Tyr200, Val272, Glu213, Ser216 everolimus* −9.2 3 Leu209, Met33, Asn36, Phe294, His298, Asn39, Leu43, Lys40, Leu104, Asn108, Tyr195, Arg197 thiotepa* −3.9 1 Tyr268, Val272, Ser216, Phe124The docking scores ranged from −10.1 to −4.6 kcal/mol for the training set. The molecule with the most significant bond strength, according to its docking score, was olaparib, whereas the one with the lowest bond strength was fluorouracil. The number of hydrogen bonds was computed ranging from 0 to 5. It is important to note that the number of hydrogen bonds is not directly related to the docking score since there are weak and strong hydrogen bonds. This assumption was proved by the analysis performed to obtain the predictive models, as discussed below.
According to the results of the interacting residues annotated in Table 2, two things can be highlighted. First, the analyzed isolated drugs bind inside the protein CXCR7 (Figure 2); second, the pocket is similar for several analyzed drugs. For example, the leucine residue Leu297 is shared by eleven drugs, indicating that their binding zone is close to each other. Comparing the interacting residues with those recently obtained by Muthiah et al. , it is possible to conclude that the pockets are similar. For instance, doxorubicin was obtained in both cases with Asp179, Cys196, and Trp100 as interacting residues. Also, similar pockets could be obtained because the selected drugs are mostly designed to serve as chemotherapy agents for breast cancer. Thus, it is possible to assume that several drugs share a common mechanism of action and, subsequently, a common protein target, such as CXCR7. For instance, gemcitabine, shown in Figure 2, shares the pocket within CXCR7 with several drugs.
留言 (0)