On the additive artificial intelligence-based discovery of nanoparticle neurodegenerative disease drug delivery systems

Introduction

Over time, global dietary habits and lifestyle standards have shifted significantly. Poor dietary choices, irregular eating patterns, extended working hours, and sedentary behavior have contributed to a trend towards an unhealthy lifestyle. This shift has resulted in a rise in chronic degenerative diseases among the elderly population. These diseases encompass a diverse range of conditions characterized by the gradual deterioration of bodily structures and functions. Although the exact causes of these diseases remain unidentified, there is evidence that oxidative damage plays a crucial role in progressive neuronal cell death, particularly through the generation of reactive oxygen and nitrogen species. In this regard, Alzheimer’s and Parkinson’s diseases are the most severe and untreatable conditions. Conventional drug treatments, such as acetylcholinesterase inhibitors, often encounter obstacles due to inadequate solubility, limited bioavailability, and an inability to effectively penetrate the blood–brain barrier (BBB). Therefore, there is an urgent need to advance novel neurodegenerative disease drugs (NDDs). The major obstacle encountered by NDDs is the selectivity of the BBB, which limits the number of therapeutic substances able to reach the brain and exert a positive effect. Recently, many efforts have been made to develop systems that facilitate the passage of NDDs through the BBB.

Nanoparticle (NP) systems are attracting increasing interest among the possible nanomedicine strategies for NDD transport to the central nervous system (CNS). For simplicity, we refer to them as nanoparticle neuronal diseases drug delivery systems (N2D3Ss). N2D3Ss can protect NDDs from chemical and enzymatic degradation, direct the active compound towards the target site with a substantial reduction of toxicity for the adjacent tissues, and help the NDDs to pass physiological barriers, increasing bioavailability without resorting to high dosages. Therefore, researchers are studying and developing new approaches that use N2D3Ss for diagnosis and treatment.

Over the last few years, artificial intelligence/machine learning (AI/ML) models have also been applied successfully to problems in different disciplines, especially at the interface of chemistry and neurodegenerative disease (ND) research. In this regard, we consider AI/ML helpful in the development of N2D3Ss for selecting the most efficient combination of NP and drug, taking into account absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties as well as the biological activity regarding NDs. Nevertheless, far less experimental data has been reported in the scientific literature for NPs than for drugs, which makes it more difficult to design such systems with AI/ML techniques.

An additional essential drawback of developing N2D3Ss with AI/ML techniques is the great complexity of the data to be explored. As a result, N2D3S development by the additive approach requires an AI/ML technique capable of multioutput and multilabel classification. In addition, the AI/ML technique includes a pre-processing step to perform information fusion (IF) of the preclinical NDD assay and NP cytotoxicity datasets. Nevertheless, most of the AI/ML methods reported to date only consider the structural/molecular descriptors of the NDDs or NPs as input. These methods therefore completely exclude non-structural parameters, specifically the experimental conditions of the assays that label each NDD or NP case. Consequently, the resulting models cannot predict multioutput properties and/or labels such as different organisms or cell lines. Sizochenko et al. reported a new methodology for NP safety estimation in different organisms. Predicting NP safety instead of biological activity has been the objective of other studies as well.

As a new strategy to tackle this problem, González-Díaz et al. have developed IFPTML, a multioutput and input-coded multilabel ML method whose name stands for information fusion (IF) + perturbation theory (PT) + machine learning (ML). In recent investigations, the IFPTML model has proven to be a powerful tool in molecular sciences and NDD research for the analysis of big datasets that include both structural and non-structural parameters. Application examples are drug screening, protein targeting, the prediction of coated-NP drug release systems, multitarget networks of neuroprotective compounds for a theoretical study of new asymmetric 1,2-rasagiline carbamates, a TOPS-MODE model of multiplexing neuroprotective effects of drugs, an experimental/theoretical study of new 1,3-rasagiline derivatives potentially useful in neurodegenerative diseases, as well as QSAR and complex networks in pharmaceutical design, microbiology, parasitology, toxicology, cancer, and neurosciences. Furthermore, this model has also been used for systems very similar to those of this work, such as NP systems, taking into account NP structure and coating agents, synthesis conditions of NPs and loaded drugs, cancer co-therapy drugs, or assay conditions. Here, we developed IFPTML models for the proposal of N2D3Ss containing NDD and NP components.

Results and Discussion

To build the IFPTML models, we carried out the steps shown in Figure 1, which depicts the general workflow of all computational procedures in this study. For clarity, the steps are numbered 2.1, 2.2, and so on.

[2190-4286-15-47-1]

Figure 1: Detailed information processing workflow of the IFPTML models. Steps 2.1 and 2.2: data collection (ChEMBL dataset of NDDs and NP cytotoxicity dataset); step 2.3: data pre-processing and information fusion (NP and NDD assays); step 2.4: definition of objective and reference functions; step 2.5: calculation of the perturbation theory operator (PTO).

Figure 2 shows how the methodology and databases used here connect to our previous publications. For each PTML model development, data download/compilation, data curation, and so on were carried out separately by researchers. First, the database of antineurodegenerative drugs (ADs) was downloaded from ChEMBL by Alonso and coworkers. These researchers employed this database to create advanced predictive models known as multitarget or multiplexing QSAR. These models are designed to forecast both the potential neurotoxicity and neuroprotective effects of drugs across various experimental setups, including multiple assays, drug targets, and model organisms. Later, Romero Durán et al. enriched the AD database and constructed multitarget networks of neuroprotective compounds to study new asymmetric 1,2-rasagiline carbamates. These authors developed a TOPS-MODE model to analyze the multiple neuroprotective effects of drugs and to conduct experimental/theoretical studies on new 1,3-rasagiline derivatives potentially useful in neurodegenerative diseases. Additionally, Romero Durán et al. expanded the AD database to develop artificial neural network (ANN) algorithms. These models were designed to forecast how ADs interact with targets within the CNS interactome. Speck-Planche et al. manually compiled a database of NPs from the literature. They constructed a QSAR model to investigate multiple antibacterial profiles of NPs under diverse experimental conditions. Furthermore, Ortega-Tenezaca et al. enriched the NP dataset and developed a PTML model for the discovery of antibacterial NPs. Diéguez et al. expanded the NP database and developed a PTML model in order to design antibacterial drug and NP systems.

[2190-4286-15-47-2]

Figure 2: Connection of the current IFPTML model to other PTML models developed by our research group.

In this study, we utilized the IFPTML model to investigate N2D3Ss, encompassing assays of ADs and preclinical assays for NPs. To achieve this, we conducted the IF of AD and NP databases, curated the data, combined the objective and reference functions, and calculated the PTO.

NDDs ChEMBL dataset

First, we collected the data of preclinical assays for NDDs from the ChEMBL dataset (see step 2.1 in Figure 1). This dataset contained 4403 preclinical assays for 2566 NDDs (unique drugs), that is, approximately 1.72 assays for each drug. The information downloaded from ChEMBL included discrete variables cdj used to specify the conditions/labels of each assay. These variables are cd0, the biological activity parameter; cd1, the target protein involved in NDs; cd2, the cell line for NDD assays; and cd3, the model organism. Each of these assays included one out of n(cd0) = 46 possible biological activity parameters (e.g., EC50 or Ki (nM)). They also involved some of the n(cd1) = 21 target proteins, n(cd2) = 7 cell lines (SH-SY5Y, CHO-K1, HEK293, PC-12, CHO, HEK-293T, and HuT78), and n(cd3) = 7 model organisms (Homo sapiens, Rattus norvegicus, Mus musculus, Cavia porcellus, Canis lupus familiaris, Macaca fascicularis, and Caenorhabditis elegans). The information downloaded from ChEMBL also included another set of discrete variables used to codify the nature/quality of data. These variables are cd4, the type of target; cd5, the type of assay; cd6, the data curation; cd7, the confidence score; and cd8, the target mapping. Specifically, there are n(cd4) = 6 target types (including single protein, organism, tissue, non-molecular target, and ADMET) and n(cd5) = 3 assay types (binding, functional, and ADMET). In addition, data curation has n(cd6) = 3 different values (auto-curation, expert, and intermediate), the confidence score has n(cd7) = 4 values (9: direct single protein target assigned; 8: homologous single protein target assigned; 1: target assigned is non-molecular; and 0: default value, that is, target assignment has yet to be curated), and the target mapping has n(cd8) = 3 values (protein, non-molecular target, and homologous protein).
Furthermore, this database included the vector of molecular descriptors Ddk = [Dd1, Dd2], which defines the chemical structure of the NDD compound. Specifically, we used two types of molecular descriptor for the i-th compound, namely Dd1 = logarithm of the n-octanol/water partition coefficient (LOGPi) and Dd2 = topological polar surface area (PSAi). The detailed information of this dataset is given in Supporting Information File 1 (datasheet “ChEMBL”).

NP cytotoxicity dataset

Simultaneously, we collected the data of preclinical assays for the cytotoxicity of NPs from different sources (see step 2.2 in Figure 1). We selected 62 papers from the scientific literature databases PubMed and SciFinder. This dataset included 260 preclinical assays for 31 unique NPs, that is, about 8.39 assays for each NP. Moreover, the data covered a wide range of NP properties such as morphology, physicochemical properties, coating agents, length, and time of assay. These properties were defined as discrete variables cnj used to identify the conditions/labels of each assay. We then enumerated all particular conditions of each assay as a general vector cnj = [cn1, cn2, cn3,…, cnmax]. These variables are cn0, the biological activity parameter; cn1, the cell line; cn2, the NP shape; cn3, the measurement conditions; and cn4, the coating agent. Each of these assays involved at least one out of n(cn0) = 5 possible biological activity parameters (CC50, EC50, IC50, LC50, and TC50). They also include n(cn1) = 53 cell lines (e.g., A549 (H), RAW 264.7, and Neuro-2A (M)) and n(cn2) = 10 NP shapes (spherical, irregular, slice-shaped, needles, rods, elliptical, pseudo-spherical, polyhedral, pyramidal, and strips). In addition, they contain n(cn3) = 8 NP measurement conditions (dry, H2O, DMEM, RPMI, 1% Triton X-100/H2O, H2O/TMAOH, egg/H2O, and H2O/HMT) and n(cn4) = 16 coating agents (UC, PEG-Si(OMe)3, PVA, sodium citrate, 11-mercaptoundecanoic acid, PVP, propylammonium fragment, undecylazide fragment, CTAB, N,N,N-trimethyl-3(1-propene) ammonium fragment, potato starch, N-acetylcysteine, CMC-90, 2,3-dimercaptopropanesulfonate, 3-mercaptopropanesulfonate, and thioglycolic acid). The full information of this dataset is shown in Supporting Information File 1 (datasheet “NP”).

DNDS pair resampling IF processing of biological parameters

First, we defined and obtained the objective value needed to design the IFPTML model for N2D3Ss. We defined the target function so that the vectors of descriptors for all cases, Dk, could be used as input variables in the ML model. The target function is commonly obtained by a mathematical transformation of the original theoretical or observed property of the system under analysis. In this IFPTML model, it includes two groups of observed values, specifically vij(cd0) and vnj(cn0). In addition, it contains two types of input vectors, Ddk and Dnk, for the preclinical NDD and NP assays, respectively. Moreover, the dataset contains a large number of different biological parameters cd0 and cn0. For example, there are properties such as the half-maximal inhibitory concentration (IC50 (nM)), the half-maximal effective concentration (EC50 (nM)), or the lethal concentration of a substance for an organism (LC50 (nM)). Another difficulty is that the majority of the collected vij(cd0) and vnj(cn0) values are numbers with decimals. Furthermore, in order to obtain the optimum N2D3S, we prioritize some properties and deprioritize others. In this context, we introduced a “desirability” parameter to tackle this problem.

The desirability value was set to d(cd0) = 1 or d(cn0) = 1 when the value of vij(cd0) or vnj(cn0) needs to be maximized; otherwise, d(cd0) = −1 or d(cn0) = −1. The different NDD and NP properties/characteristics carry a large number of designations or labels cd0 and cn0, respectively, which increases the unreliability of the data and makes it more laborious to build a regression model. For example, biological activity parameters cd0 with d(cd0) = 1 are Bmax (fmol/mg), the total number of receptors expressed in the same units, activity (%), and Cp (nM), whereas parameters with d(cd0) = −1 are, for example, EC50 (nM), IC50 (nM), and Imax (%). To address this problem, we used a cutoff value to divide AD and NP assays into favorable and non-favorable assays. It is worth mentioning that using a cutoff is a common practice in drug discovery processes. Consequently, pre-processing all observed vij(cd0) and vnj(cn0) values is crucial for obtaining the final target function and for removing or reducing imprecisions. Eventually, IF processing of the parameters vij(cd0) and vnj(cn0) enabled us to obtain a target function of the N2D3Ss.

We also used a cutoff to rescale the vij(cd0) and vnj(cn0) values into the Boolean (dummy) functions f(vij(cd0))obs and f(vnj(cn0))obs. These values were obtained as f(vij(cd0))obs = 1 if vij(cd0) > cutoff and d(cd0) = 1, or if vij(cd0) < cutoff and d(cd0) = −1; otherwise, f(vij(cd0))obs = 0. Similarly, f(vnj(cn0))obs = 1 if vnj(cn0) > cutoff and d(cn0) = 1, or if vnj(cn0) < cutoff and d(cn0) = −1; otherwise, f(vnj(cn0))obs = 0. The values f(vij(cd0))obs = 1 and f(vnj(cn0))obs = 1 indicate a positive desired effect of the NDD and the NP, respectively. The target function was then defined as f(vij(cd0), vnj(cn0))obs = f(vij(cd0))obs·f(vnj(cn0))obs. Therefore, the outcome of the IF scaling, f(vij(cd0), vnj(cn0))obs, is determined by the i-th NDD compound and the n-th NP measurement conditions. The remaining cases, f(vij(cd0), vnj(cn0))obs = 0, indicate that at least one of the abovementioned conditions fails.
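The cutoff and desirability rules above can be illustrated with a minimal Python sketch. The function names, the cutoff values, and the choice of d = +1 for the NP parameter are assumptions made for illustration only; they are not taken from the original datasets.

```python
def observed_flag(value, cutoff, desirability):
    """Boolean f(v)obs: 1 if the value lies on the desired side of the cutoff."""
    if desirability == 1:       # property to be maximized
        return int(value > cutoff)
    if desirability == -1:      # property to be minimized
        return int(value < cutoff)
    raise ValueError("desirability must be +1 or -1")


def target_function(drug_assay, np_assay):
    """f(vij, vnj)obs = f(vij)obs * f(vnj)obs for one NDD-NP pair."""
    return observed_flag(*drug_assay) * observed_flag(*np_assay)


# Hypothetical (value, cutoff, desirability) triples, for illustration only.
drug_ic50 = (50.0, 100.0, -1)     # IC50 (nM): lower is better, d = -1
np_cc50 = (5000.0, 1000.0, 1)     # CC50: assumed here to be maximized, d = +1
weak_drug = (500.0, 100.0, -1)    # IC50 above the cutoff

print(target_function(drug_ic50, np_cc50))   # 1: both conditions hold
print(target_function(weak_drug, np_cc50))   # 0: the drug condition fails
```

Because the target is a product of two Boolean values, a pair is labeled favorable only when the NDD assay and the NP assay are both favorable.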

Definition of objective and reference functions IF phase for combining the references

After obtaining the target function, the next step is to describe the input variables of the IFPTML model. One input variable of this model is the reference function f(vij(cd0), vnj(cn0))ref. This function plays an important role because it characterizes the expected probability f(vij(cd0), vnj(cn0))ref = p(f(vij(cd0), vnj(cn0))obs = 1) of achieving the required level of activity for a specific property, acquired from well-known systems. IFPTML uses values from well-known systems or subset systems as reference. Afterwards, the model includes the effect of different deviations (perturbations) of the query function from the reference function. Accordingly, f(vij(cd0), vnj(cn0))ref can be considered a function related to observed (not predicted) outcomes. In the section above, we described the IF scaling step that transforms the original vij(cd0) and vnj(cn0) values into the f(vij(cd0))obs and f(vnj(cn0))obs functions. Once f(vij(cd0))obs and f(vnj(cn0))obs have been obtained for all cases in our dataset, the next step is to count the positive outcomes n(f(vij(cd0))obs = 1) and n(f(vnj(cn0))obs = 1). Subsequently, in order to obtain the reference or expected functions (Figure 3), we divide these counts by the total number of cases for the NDD and NP systems separately. We describe these functions as f(vij(cd0))ref = p(f(vij(cd0))obs = 1) = n(f(vij(cd0))obs = 1)/n(cd0)j and f(vnj(cn0))ref = p(f(vnj(cn0))obs = 1) = n(f(vnj(cn0))obs = 1)/n(cn0)j. In this context, we can calculate the reference function directly as the product of the probabilities of both subsystems: f(vij(cd0), vnj(cn0))ref = p(f(vij(cd0), vnj(cn0))obs = 1) = p(f(vij(cd0))obs = 1)·p(f(vnj(cn0))obs = 1). It is worth mentioning that the use of the reference function at this point is another expression of the IF (combination) of the NDD and NP datasets.

[2190-4286-15-47-3]

Figure 3: Reference function calculation workflow.
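As a minimal sketch of the reference-function calculation, the following Python snippet counts the positive outcomes per activity label and multiplies the two resulting probabilities. The helper name `reference_probabilities` and the toy assay lists are hypothetical; only the formulas n(f(v)obs = 1)/n and the product of both probabilities come from the text.

```python
from collections import defaultdict


def reference_probabilities(assays):
    """Return {label: n(f_obs = 1) / n(label)} for (label, f_obs) pairs."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for label, f_obs in assays:
        totals[label] += 1
        positives[label] += f_obs
    return {label: positives[label] / totals[label] for label in totals}


# Hypothetical Boolean outcomes, grouped by activity label cd0 or cn0.
drug_assays = [("IC50 (nM)", 1), ("IC50 (nM)", 0), ("IC50 (nM)", 1),
               ("IC50 (nM)", 0), ("Ki (nM)", 1)]
np_assays = [("CC50", 1), ("CC50", 0), ("CC50", 0), ("CC50", 0)]

p_drug = reference_probabilities(drug_assays)   # f(vij(cd0))ref per label
p_np = reference_probabilities(np_assays)       # f(vnj(cn0))ref per label

# Reference function of one NDD-NP pair: product of both probabilities.
f_ref = p_drug["IC50 (nM)"] * p_np["CC50"]
print(f_ref)  # 0.5 * 0.25 = 0.125
```

Every pair that shares the same two activity labels receives the same reference value, which is what allows it to act as the expected value that the perturbation terms then correct.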

PTO calculation IFPTML N2D3S data analysis

As mentioned in the previous sections, we collected the results of many preclinical cytotoxicity assays of different NPs. Complementarily, we obtained the data of preclinical assays for NDDs from the ChEMBL database. This step included the calculation of the vectors Dnk and Ddk of structural descriptors for all NPs and NDDs. In addition, we constructed the vectors cnj and cdj in order to list each label and assay condition for all preclinical assays of NPs and NDDs. Subsequently, we obtained the values ΔDdk(cdj) and ΔDnk(cnj) of the respective moving average deviation PTOs.

The NDD vector lists the elements Ddk = [Dd1, Dd2]. These elements are the NDD structural descriptors, which have enabled the development of various strategies to characterize and classify the structure of potential bioactive molecules. These structural descriptors are Dd1 = logarithm of the n-octanol/water partition coefficient (LOGPi) and Dd2 = topological polar surface area (PSAi). In contrast, the NP cytotoxicity vector lists the elements Dnk = [Dn1, Dn2, …, Dn20]. Specifically, they are Dn1 = NMUn (number of monomer units), Dn2 = Lnp (NP length), Dn3 = Vnu (NP volume), Dn4 = Enu (NP electronegativity), Dn5 = Pnu (NP polarizability), Dn6 = Uccoat (unsaturation count), Dn7 = Uicoat (unsaturation index), Dn8 = Hycoat (hydrophilic factor), Dn9 = AMRcoat (Ghose–Crippen molar refractivity), Dn10 = TPSA(NO)coat (topological polar surface area using N,O polar contributions), Dn11 = TPSA(Tot)coat (topological polar surface area using N,O,S,P polar contributions), Dn12 = ALOGPcoat (Ghose–Crippen octanol/water partition coefficient), Dn13 = ALOGP2coat (squared Ghose–Crippen octanol/water partition coefficient (logP^2)), Dn14 = SAtotcoat (total surface area from P_VSA-like descriptors), Dn15 = SAacccoat (surface area of acceptor atoms from P_VSA-like descriptors), Dn16 = SAdoncoat (surface area of donor atoms from P_VSA-like descriptors), Dn17 = Vxcoat (McGowan volume), Dn18 = VvdwMGcoat (van der Waals volume from McGowan volume), Dn19 = VvdwZAZcoat (van der Waals volume from the Zhao–Abraham–Zissimos equation), and Dn20 = PDIcoat (packing density index).

PT data preprocessing

Apart from the vectors Ddk and Dnk, the IFPTML study takes into account all vectors cdj and cnj as the non-numerical experimental conditions and labels of the NDD and NP preclinical assays. We calculated the PTOs of the NDD and NP preclinical assays including this additional information. We used Equation 1 and Equation 2 to obtain the moving average (MA) PTOs of NDDs and NPs. The PT model begins with the expected value of a well-known activity and adds the effect of different perturbations/variations to the system. Consequently, the model includes two different kinds of input variables, namely the reference or expected-value function f(vij)ref and the PT operators ΔDk(cj). Specifically, the latter account for structural and assay information on NDDs and NPs. The PTOs ΔD(Ddk) and ΔD(Dnk) encode the structural and/or physicochemical characteristics of NDDs and NPs through the descriptors Ddk and Dnk, respectively. Furthermore, they encode the biological assay data of NDDs and NPs through the variables ⟨D(Ddk)cdj⟩ and ⟨D(Dnk)cnj⟩, respectively. ⟨D(Ddk)⟩ and ⟨D(Dnk)⟩ are the average operators computed over all cases that share the same subset of methodology conditions cdj and cnj, respectively. Accordingly, they ought to provide distinct values for any assay that differs in at least one element of the condition vectors cdj or cnj. In this regard, they can specify which assay we are referring to. Another kind of PTO involved in this model is the NDD–NP coating agent moving average balance (MAB) PTO ΔΔD(Dca1, Dca2, Ddk) (Equation 3). The MAB PTO takes into consideration the similarities between the information on NDDs and the NP coating agent. Furthermore, PTOs based directly on MAs and/or on linear and non-linear transformations of MAs have been applied for NDD and NP development in previous research work. The MAs are another way of expressing the combination of the IF and PT procedures for the NDD and NP datasets.
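The moving average deviation described above can be sketched in a few lines of Python. The sketch assumes the usual moving-average form ΔDk(cj) = Dk − ⟨Dk(cj)⟩, where the average runs over all cases sharing the same condition label; the exact expressions are those of Equation 1 and Equation 2, and the function name and toy values below are hypothetical.

```python
from collections import defaultdict


def moving_average_ptos(values, conditions):
    """Return D_k - <D_k(c_j)> for each case.

    <D_k(c_j)> is the mean of descriptor D_k over all cases that share
    the same condition label c_j (assumed moving-average form).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, condition in zip(values, conditions):
        sums[condition] += value
        counts[condition] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [value - means[c] for value, c in zip(values, conditions)]


# Hypothetical LOGP values of four NDDs assayed in two cell lines (cd2).
logp = [1.0, 3.0, 2.0, 4.0]
cell_line = ["SH-SY5Y", "SH-SY5Y", "PC-12", "PC-12"]

print(moving_average_ptos(logp, cell_line))  # [-1.0, 1.0, -1.0, 1.0]
```

The same compound therefore yields a different PTO value when it is assayed under a different condition, which is how the non-structural labels enter a model that otherwise only sees numerical descriptors.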

[2190-4286-15-47-i1](1) [2190-4286-15-47-i2](2) [2190-4286-15-47-i3](3)

IF phase and proposal of training and validation series subsets

To develop the ML models, each sample case is assigned to either the training (subset t) or the validation (subset v) series. The assignment ought to be random, representative, and stratified. Because of the combinatorial nature of this system, our sampling also has to take into account the IF scaling procedure. Initially, we obtained the NDD activity dataset from the open database ChEMBL, which has been compiled from primary published literature. The preclinical NP cytotoxicity assays were acquired from journal articles. Afterwards, we annotated each case with the labels cd0, cd1, cd2, cd3, cd4, cd5, cd6, cd7, cd8, cn0, cn1, cn2, cn3, and cn4. The cases were then sorted by ranking the labels alphabetically from A to Z (as mentioned before, they are non-numeric variables). The order of precedence of the labels in the ranking was cd0 → cn0 → cd1 → cn1 → cd2 → cn2 → cd3 → cn3. In other words, we sorted the cases first by cd0, then by cn0, and so forth. This order accounts for the IF step by alternating labels from the AD and NP datasets. Afterwards, we assigned three quarters of the cases to subset t and the remaining quarter to subset v. This assignment improves the likelihood that nearly all categories of the individual labels are represented in subsets t and v (stratified or proportional random sampling). In addition, it increases the likelihood that practically all cases for each label are distributed as 3/4 in subset t and 1/4 in subset v, known as representative sampling. It is worth mentioning that the 75%/25% proportion between training and validation is the most used one in big data analysis.
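The sorting and 3:1 assignment can be sketched as follows. This is a simplified illustration under one loud assumption: after sorting, every fourth case is sent to the validation subset, which is only one possible way to realize the 3/4 and 1/4 proportions described above. The function name and the toy cases are hypothetical.

```python
def split_training_validation(cases, label_order):
    """Sort cases by their labels, then send every fourth case to validation.

    Assumption: a deterministic every-fourth rule stands in for the
    3/4 vs 1/4 assignment; the original procedure may differ in detail.
    """
    ordered = sorted(cases, key=lambda case: tuple(case[label] for label in label_order))
    training, validation = [], []
    for position, case in enumerate(ordered):
        if position % 4 == 3:
            validation.append(case)
        else:
            training.append(case)
    return training, validation


# Hypothetical cases carrying only the first two labels of the ranking.
cases = [
    {"cd0": activity, "cn0": toxicity}
    for activity in ("EC50 (nM)", "IC50 (nM)")
    for toxicity in ("CC50", "EC50", "IC50", "LC50")
]
training, validation = split_training_validation(cases, ["cd0", "cn0"])
print(len(training), len(validation))  # 6 2
```

Sorting before the assignment keeps cases with the same labels adjacent, so each label category contributes roughly three cases to training for every case in validation.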

IFPTML-LDA model

The IFPTML N2D3S model uses as input variables the PTOs specified in the previous section to codify information on the putative N2D3Ss and their corresponding NDD and NP subsystems. Combining the objective function f(vij, vnj)obs and the reference function f(vij, vnj)ref and adding the IF PTOs ΔΔD(Dca1, Dca2, Ddk), we obtained the output function f(vij, vnj)calc. This function carries out the crosscut classification of NDD and NP information across the datasets. The generic equation for the IFPTML linear model is the following (Equation 4):

[2190-4286-15-47-i4](4)

Generalities for IFPTML model training and validation series

In many big data systems, linear discriminant analysis (LDA) is the most commonly used tool to seek a preliminary model because of its simplicity. Within this model, we applied a forward stepwise (FSW) process that automatically selects the most essential input variables for N2D3Ss. We obtained all results using the software STATISTICA 6.0. Afterwards, we applied the expert-guided selection (EGS) heuristic to retrain the LDA model using the most crucial parameters selected by the FSW process along with other missing aspects. All IFPTML models were evaluated by calculating different statistical parameters, specifically sensitivity (Sn), specificity (Sp), accuracy (Ac), chi-square (χ2), and the p-level.
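For reference, Sn, Sp, and Ac follow directly from the four cells of a confusion matrix. The short sketch below uses the standard definitions and, as a check, the counts reported in Table 1 for sample 1, training subset, model without cross (observed 0: 255190 predicted 0 and 94292 predicted 1; observed 1: 7398 predicted 0 and 18120 predicted 1). The function name is hypothetical.

```python
def classification_metrics(tn, fp, fn, tp):
    """Return (Sn, Sp, Ac) in percent from confusion-matrix counts."""
    sn = 100.0 * tp / (tp + fn)                    # sensitivity: positives recovered
    sp = 100.0 * tn / (tn + fp)                    # specificity: negatives recovered
    ac = 100.0 * (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    return sn, sp, ac


# Counts from Table 1: sample 1, subset t, model without cross.
sn, sp, ac = classification_metrics(tn=255190, fp=94292, fn=7398, tp=18120)
print(f"Sn = {sn:.1f}%, Sp = {sp:.1f}%, Ac = {ac:.1f}%")
# Sn = 71.0%, Sp = 73.0%, Ac = 72.9%
```

The computed Sn and Sp agree with the values listed in Table 1 (71 and 73), which confirms how the rows and columns of the table are to be read.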

IFPTML-LDA vs cross linear model

In the Introduction section, we pointed to ML approaches as a promising strategy to tackle practical problems of nanotechnology, such as reducing the number of experiments. In this paper, the IFPTML method was used to combine preclinical assays of NDDs and NPs. Speck-Planche et al. described multiple IFPTML approaches regarding toxicity and drug delivery of NPs with a large number of species under a wide variety of experimental conditions. However, this study did not take into account the NDDs. In contrast, Nocedo-Mena et al. reviewed an IFPTML method to explore the activity of NDDs against numerous species and under different assay conditions, but they did not consider NPs as part of the system. Accordingly, these models could not take into consideration both components (NDDs and NPs) of the N2D3Ss. In our group, Diéguez-Santana et al. successfully applied the IFPTML technique for the first time to study the combination of multiple antibacterial drugs and preclinical assays on the cytotoxicity of NPs. In this paper, we used this new approach to develop complex N2D3Ss containing NDDs and NPs, taking into account, among other things, NDD assays, NP types including coating agents, and NP morphologies. To complete the IF scaling process, we calculated the objective function f(vij, vnj)obs = f(vij)obs·f(vnj)obs. The main purpose of this function is to increase the effect of certainty and maintain the homogeneity of scales. Once the PTOs were obtained, we applied ML methods to fit f(vij, vnj)obs and obtain the IFPTML models. As indicated in the previous section, we classified the variables cdj of the preclinical NDD assays into two different partitions (subsets), cI and cII. The partition cI defines the biological characteristics; it contains, among other things, cd0 = biological activity parameters of NDDs (e.g., IC50, Ki, potency, and time) and cd1 = type of proteins involved in the NDs. The partition cII defines the data quality; it contains, among other things, cd4 = type of target and cd5 = type of assay. For the preclinical NP cytotoxicity assays, cnj forms only one partition, cIII, which describes their nature and involves cn0 = biological activity parameters of the NPs (e.g., CC50, IC50, LC50, and EC50), cn1 = cell lines, cn2 = NP morphology, and cn3 = NP measurement conditions. In addition, we obtained two types of IFPTML-LDA model for designing the N2D3Ss. On the one hand, we obtained the standard IFPTML-LDA model by calculating the PTOs ΔDk(cj) from the average values ⟨Dk(cj)⟩ over the partitions within their own dataset, that is, NDD descriptors with cI and cII and NP descriptors with cIII. The best IFPTML-LDA model found is as follows (Equation 5):

[2190-4286-15-47-i5](5)

On the other hand, we tested whether the statistical parameters of the IFPTML-LDA algorithm could be improved. To this end, we calculated the PTOs ΔDk(cj) by performing all possible combinations among the average values ⟨Dk(cj)⟩ of both vectors Dnk and Ddk with each partition. As a result, we obtained three different combinations of crossing PTOs for each sample, one for NDDs (ΔDdk(cIII)) and two for NPs (ΔDnk(cI) and ΔDnk(cII)). For simplicity, the resulting model is named “IFPTML-LDA with cross” (see more details in Figure 1). The best IFPTML-LDA model found with crossing PTOs is the following (Equation 6):

[2190-4286-15-47-i6](6)

The output function f(vij, vnj)calc provides a real-valued score that can be used to rank putative N2D3Ss. This function was obtained by fitting the objective function f(vij(cd0), vnj(cn0))obs with the ML method using the PTOs. The quality of the IFPTML models was assessed with the statistical parameters sensitivity (Sn), specificity (Sp), accuracy (Ac), chi-square test (χ2), and p-level. Table 1 summarizes the statistical parameters of the best models found (Equation 5 and Equation 6) for each sample (standard IFPTML-LDA and IFPTML-LDA with cross). The statistical parameters obtained for both methods were in the accuracy range described for ML classification models. The standard IFPTML-LDA model contains all indispensable variables for defining the NDD structures and the most significant parameters for NPs, such as morphology, size, and assay conditions, among other things. In the IFPTML-LDA with cross model, we included not only all essential variables but also two crossing PTOs. These new PTOs were chosen by the FSW method, which selects the most influential variables in the system under study.

Table 1: IFPTML-LDA N2D3S model results summary.

Data                    Stat.    Without cross                 With cross
                        Param.   Param.  Subset predicted      Param.  Subset predicted
Sample  Set  Subset              (%)     0         1           (%)     0         1
1       t    0          Sp       73      255190    94292       72.2    252534    97042
1       t    1          Sn       71      7398      18120       74.4    6517      18907
1       v    0          Sp       73.3    85369     31125       72.3    84183     32315
1       v    1          Sn       70.3    2522      5984        73.9    2218      6284
2       t    0          Sp       70      244548    105076      79.5    277907    71717
2       t    1          Sn       62.1    9528      15848       70.1    7584      17792
2       v    0          Sp       70      81640     35009       79.7    92929     23720
2       v    1          Sn       63.1    3081      5270        70.7    2451      5900
3       t    0          Sp       70.6    246551    102809      79.6    277921    71439
3       t    1          Sn       62.3    11616     15974       70.1    7668      17972
3       v    0          Sp       70.7    82370     34174       79.6    92726
