A key step in building regulatory acceptance of alternative or non-animal test methods has long been the use of interlaboratory comparisons or round-robins (RRs), in which a common test material and a standard operating procedure are provided to all participants, who measure the specific endpoint and return their data for statistical comparison to demonstrate the reproducibility of the method. While there is currently no standard approach for the comparison of modelling approaches, consensus modelling is emerging as a “modelling equivalent” of a RR. We demonstrate here a novel approach to evaluate the performance of different models for the same endpoint (nanomaterials’ zeta potential) trained using a common dataset, through generation of a consensus model, leading to increased confidence in the model predictions and the underlying models. Using a publicly available dataset, four research groups (NovaMechanics Ltd. (NovaM)-Cyprus, National Technical University of Athens (NTUA)-Greece, QSAR Lab Ltd.-Poland, and DTC Lab-India) built five distinct machine learning (ML) models for the in silico prediction of the zeta potential of metal and metal oxide nanomaterials (NMs) in aqueous media. The individual models were integrated into a consensus modelling scheme, enhancing their predictive accuracy and reducing their biases. The consensus models outperform the individual models, resulting in more reliable predictions. We propose this approach as a valuable method for increasing the validity of nanoinformatics models and driving regulatory acceptance of in silico new approach methodologies for use within an “Integrated Approach to Testing and Assessment” (IATA) for risk assessment of NMs.
Introduction

Nanotechnology, defined as the ability to manipulate matter at the nanoscale, has opened an array of possibilities for multiple applications that take advantage of the unique properties of nanomaterials (NMs). From targeted drug delivery to environmental sensing, the versatility of NMs makes them ideal candidates for a broad range of innovative applications. However, the complexity and unique properties of these materials also present significant challenges, especially when it comes to the assessment of their potential adverse effects. The integration of in silico new approach methodologies (NAMs) within the area of nanotechnology has created a plethora of possibilities for the assessment of NM properties and toxicity to support and/or substitute traditional experimental methodologies.
The field of nanoinformatics covers a broad range of computational and data-driven methodologies for the exposure, hazard, and risk assessment of NMs, such as quantitative structure–activity relationship models adapted to the specificities of NMs (nanoQSAR) and grouping/read-across models, specifically developed to accurately predict NMs’ properties when small datasets are available. These in silico methodologies can be used in the early steps of the “safe-and-sustainable by design” framework and in the development of novel NMs to filter out unpromising candidates and prioritize NMs with desired properties. The rational use of in silico methods allows for the identification of potentially hazardous effects caused by NMs’ interactions with biological systems with a simultaneous decrease in workload, cost, research duration, and use of laboratory animals. Several computational approaches and predictive models have been presented recently for predicting various NM properties and toxicity effects.
The combination of multiple NAMs, both experimental and computational, within an “Integrated Approaches to Testing and Assessment” (IATA) framework will further improve the entire risk evaluation of NMs and accelerate regulatory decision-making procedures. An IATA scheme for the prediction of the short-term regional lung-deposited dose of inhaled inorganic NMs in humans following acute exposure and the longer-term NM biodistribution after inhalation has already been presented. Another example of an IATA is the combination of predictions from two or more individual models under a consensus framework. Consensus models combine outputs from several individual models built upon different sets of descriptors and/or machine learning (ML) algorithms, leading to more trustworthy results and enhancing stakeholders’ confidence. Since each individual model covers a specific area of the descriptor/property space, combining them makes it possible to capture a wider range of factors that influence the relationship between the NMs’ independent variables and the endpoint and, thus, to approach the problem from different perspectives. Furthermore, by combining different models, it is possible to address the limitations of each model and to achieve more precise predictions (e.g., by avoiding overfitting when small training datasets are involved). Prediction combination can be performed in a regression problem through an arithmetic average or via a weighted average scheme. It has been demonstrated that consensus QSAR models exhibit lower variability than individual models, resulting in more reliable and accurate predictions. In the area of nanoinformatics, various consensus approaches have been proposed over the past years for the prediction of different NM endpoints, such as NMs’ cellular uptake, zeta potential (ZP), and electrophoretic mobility.
The complexity of predictive models requires the development of standardized protocols to ensure their accuracy and robustness. Just as laboratory experiments rely on repeatability and reproducibility to validate results, computational methods require similar validation processes. Special emphasis is given to the predictive accuracy of models. For this purpose, nanoinformatics models are expected to comply with a set of predefined criteria, often supplemented by statistical methods recommended by the Organisation for Economic Co-operation and Development (OECD) and the European Chemicals Agency (ECHA). In addition, there is a growing effort from various groups to enhance the transparency and, consequently, the reproducibility of their results by delivering standardized reports along with their models (e.g., QSAR model reporting format (QMRF) and modelling data (MODA) reports). By documenting computational steps through these standardized reports, it is possible to deliver models that are reproducible within and between computational groups and over time, and to conduct interlaboratory comparisons (ILC) or round-robin (RR) tests on the models and their outputs, like those performed in laboratory settings to validate a new test method or protocol.
The computational prediction of the ZP of NMs (Figure 1) has been of high interest in the area of nanoinformatics during the last decade, given the role of surface charge in determining NMs’ interactions with membranes and in driving toxicity, whereby positively charged particles are generally more toxic than negatively charged particles of similar composition. In fact, several in silico models for the ZP have been developed based on different theoretical and experimental descriptors employing a range of approaches, that is, quantitative structure–property/feature relationship (QSPR/QSFR) modelling, read-across, and deep learning models. Mikolajczyk et al. implemented a consensus nano-QSPR scheme for the prediction of the ZP of metal oxide nanoparticles (NPs) based on the size and a quantum mechanical descriptor encoding the energy of the highest occupied molecular orbital per metal atom of 15 metal oxide NPs. Toropov et al. developed, for a set of 15 metal and metal oxide NPs, a QSPR model considering both the NPs’ molecular structure and the experimental conditions, encoded in quasi-SMILES. Furthermore, research has explored the computational assessment of the ZP in media besides water. Wyrzykowska et al. proposed a nano-QSPR model for the prediction of the ZP of 15 NPs in a low-concentration KCl solution considering the NPs’ ZP in water and the periodic number of the NPs’ metal.
Figure 1: Schematic representation of a negatively charged uncoated spherical NM. The ZP corresponds to the electric charge at the slipping plane.
Read-across approaches presented to date include a k-nearest neighbours (kNN) model developed by Varsou et al. to predict the ZP of 37 metal and metal oxide NPs based on their core type and the NPs’ main elongation (an image descriptor derived from microscopy images). Papadiamantis et al. developed a kNN/read-across model for the estimation of the ZP of 69 pristine and aged NPs, considering the size, coating, absolute electronegativity, and periodic table descriptors. Finally, advances in artificial intelligence (AI) have also been considered in the computational assessment of the ZP. Yan et al. employed deep learning techniques and developed a convolutional neural network to predict the ZP of 119 NPs based on their nanostructure images. The abovementioned studies are indicative examples of models that have been used for the computational assessment of NPs’ ZP. As research progresses, such models are expected to become increasingly sophisticated and accurate, contributing to a deeper understanding of NP behaviour in diverse environments.
The diversity of datasets and endpoints measured is challenging when comparing or combining results between different studies, making it crucial to ensure that data are compatible in terms of metadata (e.g., the experimental protocol used). Similarly, models developed using different sets of descriptors need a basis for comparison in order to drive regulatory acceptance of models. To address this challenge, the first RR approach in nanoinformatics was implemented under the NanoSolveIT EU project (https://nanosolveit.eu/) to computationally assess the ZP of NPs. The RR exercise involved four groups (NovaM, NTUA, QSARLab, and DTC Lab), from both academia and industry, from four countries (Cyprus, Greece, Poland, and India), who were asked to develop individual models for the prediction of the ZP based on a common dataset of metal and metal oxide-cored NPs. In this way, different descriptors were employed, and various modelling approaches were applied, including QSAR-type and read-across models. The developed models were later integrated into a consensus modelling scheme by combining the predictions of the individual models through average and weighted average, to acquire more robust and stable results. While the dataset’s extent and, consequently, the generated models’ applicability domain are rather limited, this initiative underscores the potential of synergistic approaches in the nanoinformatics field. By leveraging the collective knowledge of diverse teams and perspectives, such approaches can effectively assess the properties and toxicity of NPs and democratize decision-making processes in the assessment of NMs’ exposure, hazard, and risk.
Materials and Methods

Data overview

A dataset of 71 pristine engineered NMs was explored in silico in order to predict their ZP based on physicochemical and molecular descriptors. The physicochemical characterization of the NMs was performed under the EU-FP7 NanoMILE project (https://cordis.europa.eu/project/id/310451). From the available descriptors/properties, the following four were included in this study because of the completeness of the data (absence of data gaps): the NMs’ core chemistry, coating, morphology, and hydrodynamic diameter measured using dynamic light scattering (DLS). The ZP of the NMs was measured in water (pH 6.5–8.5). To enrich the library of the NMs’ physicochemical properties and increase the amount of available information, the equivalent sphere diameter (the diameter of the sphere with a surface area equal to the area of the NM) was calculated, as well as three molecular descriptors commonly used in nanoinformatics studies. These were chemical formula-related descriptors, specifically the numbers of metal and oxygen atoms present in the core’s chemical formula and the molecular weight of the core compound.
Finally, the Hamaker constants of the NMs were calculated in vacuum and in water using the NanoSolveIT Hamaker tool (https://hamaker.cloud.nanosolveit.eu/). The Hamaker constant is a material-specific value that quantifies the strength of van der Waals interactions between NPs, depending on the materials and the surrounding medium. A higher (positive) Hamaker constant indicates stronger attractive forces, while a negative value suggests repulsive interactions between the NPs, preventing aggregation or agglomeration. These calculations were performed considering spherical and uncoated NMs. The balance between the Hamaker constants (expressing van der Waals attraction between particles) and the ZP values of particles (expressing their electrostatic repulsion) controls the stability of colloidal dispersions according to the Derjaguin–Landau–Verwey–Overbeek (DLVO) theory. For the computational analysis, the TIP3P force field was employed for water, while the DREIDING force field was used for the NMs. In the case of Zr-doped CeO2 NMs (CexZryO2), the same density as for pure CeO2 NMs was considered to maintain consistency. It should be noted that the different working groups were free to enrich or transform the above-described dataset, as explained in the next sections, to cover a wider feature space with each individual model. All the information about the available descriptors is summarised in Table 1. The entire dataset used in the models can be found in the Supporting Information File 1 of this publication.
Table 1: Available descriptors in the dataset used to build the individual ZP models (five models from four labs).
Descriptor | Symbol | Unit
chemical formula | CF | —
equivalent sphere diameter | Dsph | nm
shape group | Shape | —
coating | CT | —
hydrodynamic diameter measured by DLS | DLS | nm
molecular weight | MW | g/mol
Hamaker constant of NMs in vacuum | A11 | × 10−20 J
Hamaker constant of NMs in water | A132 | × 10−20 J
number of metal atoms | Nmetal | —
number of oxygen atoms | Noxygen | —
sum of ionization potential energies of metals | Metals_SumIP | kJ/mol
read-across-derived composite function that encodes chemical information from all the selected structural and physicochemical features | RA function | —
coefficient of variation of the similarity values of the close source compounds for a particular query compound | CVsim | —
total number of atoms in a molecule | Tot num atoms | —
weighted standard error of the observed response values of the close source compounds for a particular query compound | SE | —
weighted standard deviation of the observed response values of the close source compounds for a particular query compound | SD Activity | —
standard deviation of the similarity values of the close source compounds for a particular query compound | SD Similarity | —
average similarity values of the positive close source compounds for a particular query compound | Pos.Avg.Sim | —
average similarity values of the negative close source compounds for a particular query compound | Neg.Avg.Sim | —
log-transformed hydrodynamic diameter measured by DLS | LOG_DLS | —
similarity value of the closest positive source compound | MaxPos | —
Banerjee–Roy similarity coefficient 1 | — | —
Banerjee–Roy similarity coefficient 2 | — | —

Modelling techniques

kNN/read-across model

The kNN/read-across model employs the k-nearest neighbours approach, an instance-based method that predicts the endpoint of a sample based on its k nearest neighbours in the data space. The proximity between samples is measured using the Euclidean distance, adjusted for categorical descriptors through a binary contribution (0 for data points of the same class, 1 otherwise). The endpoint prediction, in this case the ZP value, is the weighted average of the endpoint values of the k closest neighbours, with each neighbour’s weighting factor inversely proportional to its distance from the evaluated sample.
The kNN algorithm can be incorporated into the general NMs read-across framework because it relies on the similarity of neighbouring NMs to estimate the endpoint of interest. Specifically, by identifying and analysing the resulting groupings, it is possible to map the prediction space into distinct clusters of k neighbours that can subsequently be explored to identify patterns and similarities within the neighbourhood space, in accordance with the ECHA’s read-across framework. The EnaloskNN functionality offers the advantage of not only delivering predictive results but also identifying the specific neighbours and their Euclidean distances, as well as enabling visualization of the overall prediction space .
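To make the prediction rule concrete, the following minimal sketch implements an inverse-distance-weighted kNN prediction with the binary distance contribution for categorical descriptors described above. It is an illustration only, not the EnaloskNN implementation; the function name, the placeholder data, and the categorical encoding are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, cat_mask, k=7, eps=1e-12):
    """Inverse-distance-weighted kNN prediction for a single query sample."""
    diff = X_train - x_query
    # Categorical columns contribute 0 (same class) or 1 (different class)
    diff[:, cat_mask] = (diff[:, cat_mask] != 0).astype(float)
    dists = np.sqrt((diff ** 2).sum(axis=1))   # Euclidean distance to each training NM
    idx = np.argsort(dists)[:k]                # the k nearest neighbours
    weights = 1.0 / (dists[idx] + eps)         # weight inversely proportional to distance
    return np.average(y_train[idx], weights=weights)

# Illustrative call with placeholder data (5 descriptors, last one categorical)
rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(53, 5)), rng.normal(size=53)
X_tr[:, 4] = rng.integers(0, 3, size=53)       # categorical column as class codes
cat_mask = np.array([False, False, False, False, True])
print(knn_predict(X_tr, y_tr, X_tr[0], cat_mask))
```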
Random forest regression model

Random forest regression is an ensemble, tree-based learning method. It combines multiple decision tree predictors to create a more robust and accurate prediction than individual trees can typically provide. The algorithm constructs a forest of independent trees, each trained on a random subset of the data and features. The regressor’s output is the average of the predictions from all individual trees. Besides its robustness, benefits of this algorithm include resistance to overfitting and the ability to process datasets with numerous variables without the need for feature scaling. This algorithm was implemented in Python, using the scikit-learn package, a widely used library for ML models.
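As an illustration of this workflow, a minimal scikit-learn sketch is given below; the hyperparameter values and the placeholder arrays are assumptions for demonstration, not the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder arrays standing in for the descriptor matrix and measured ZP values
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(53, 9)), rng.normal(size=53)
X_test = rng.normal(size=(18, 9))

rf = RandomForestRegressor(
    n_estimators=500,     # number of independent trees, each fit on a bootstrap sample
    max_features="sqrt",  # random feature subset considered at each split
    random_state=42,
)
rf.fit(X_train, y_train)
zp_pred = rf.predict(X_test)  # output = average of the individual tree predictions
```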
Adaboost regression model

The development of the ZP QSPR model involved the Adaptive Boosting (AdaBoost) ML methodology, implemented in Python 3.8.8 with the scikit-learn library (version 0.24.1). AdaBoost represents an early instance of leveraging boosting algorithms to address complex problem types within the domain of ML. Like the random forest algorithm, AdaBoost employs a multitude of elementary estimators to enhance the model’s predictive ability. In brief, the AdaBoost model comprises an ensemble of multiple “weak” estimators, such as decision trees, each possessing modest individual predictive power. However, when integrated into an ensemble, they collectively augment the predictive efficiency of the model. A notable distinction between the random forest algorithm and AdaBoost lies in their operational frameworks. In a random forest, individual estimators function independently of each other, operating in parallel. In contrast, in AdaBoost, the prediction process within the ensemble unfolds sequentially, with each subsequent estimator’s outcome influenced by its predecessor.
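The sequential boosting idea can be sketched as follows with shallow decision trees as weak learners; the hyperparameters are illustrative assumptions, not the study’s settings. Note that in the scikit-learn version cited above (0.24.1) the base-learner keyword is `base_estimator` rather than `estimator`.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(53, 9)), rng.normal(size=53)  # placeholder data

ada = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),  # "weak" base learner
    n_estimators=200,      # boosting rounds, fitted sequentially
    learning_rate=0.05,    # shrinks each round's contribution
    random_state=42,
)
# Each round up-weights the samples the previous round predicted poorly
ada.fit(X_train, y_train)
```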
Stacked PLS and MLP q-RASPR models

The q-RASPR approach, combining read-across and QSPR, has recently been introduced and applied to the prediction of NM cytotoxicity, the power conversion efficiency of organic dyes in dye-sensitized solar cells, the detonation heat of nitrogen-containing compounds, and the surface area of perovskite materials. Both the QSPR and read-across approaches are extensively used for data gap filling (predicting activity/property/toxicity values of compounds devoid of experimentally derived endpoint values). Recently, Luechtefeld et al. introduced the concept of classification-based read-across structure–activity relationship (RASAR) by combining the concepts of read-across and QSAR using ML algorithms. Banerjee and Roy merged chemical read-across and regression-based QSAR into quantitative RASAR (q-RASAR). Several ML algorithms can be applied, including partial least squares (PLS), linear support vector regression (LSVR), random forest regression, AdaBoost, multilayer perceptron (MLP) regression, and kNN regression. This study reports the first application of q-RASPR in a stacked modelling framework.
Apart from the supplied structural and physicochemical information of the engineered NMs, we have computed descriptors based on the periodic table using the tool Elemental Descriptor Calculator (https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/other-dtc-lab-tools). The complete descriptor pool underwent feature selection using stepwise selection and a genetic algorithm to obtain a reduced descriptor pool consisting of 72 descriptors. A grid search/best subset selection was applied to this reduced descriptor pool to obtain a combination of ten different QSPR descriptors. Additionally, the log-transformed hydrodynamic diameter (LOG_DLS) was taken as an additional descriptor. These eleven QSPR descriptors were used to define similarity among the source and query compounds, which is an integral part of the computation of the RASPR descriptors using the tool RASAR-Desc-Calc-v3.0.2 available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home. This tool uses three different algorithms for computing similarity, that is, Euclidean distance-based, Gaussian kernel similarity-based, and Laplacian kernel similarity-based. The selection of the best similarity measure and the optimization of the associated hyperparameters were performed by dividing the training set into calibration and validation sets, which were supplied as inputs for the tool Auto_RA_Optimizer-v1.0 available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home. The combination of hyperparameters that generated the best predictions for the validation set was selected as the optimized hyperparameter setting and used to compute the RASPR descriptors for the training and test sets. The initially selected eleven QSPR descriptors were then combined with the RASPR descriptors, a process known as data fusion. This complete data pool underwent feature selection to generate four different multiple linear regression (MLR) q-RASPR models. The predictions from these models were generated for both the training and test sets, since these predicted values served as descriptors for the final stacking regressors. Finally, PLS and MLP modelling algorithms were employed as the final stacking regressors, where the optimized settings of the hyperparameters were obtained by grid search on the cross-validation statistics.
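The stacking idea can be sketched as follows: predictions of several base regressors become the input features of a final meta-regressor. The four linear base models, their feature subsets, and the MLP settings below are placeholders, not the actual q-RASPR models or descriptors; in the study the base predictions themselves came from the MLR q-RASPR models described above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(53, 16)), rng.normal(size=53)  # placeholder data
X_test = rng.normal(size=(18, 16))

# Base level: four MLR models, each built on its own (illustrative) feature subset
subsets = [slice(0, 6), slice(4, 10), slice(8, 14), slice(10, 16)]
base = [LinearRegression().fit(X_train[:, s], y_train) for s in subsets]

# Meta level: the base-model predictions become the descriptors of the stacking regressor
meta_train = np.column_stack([m.predict(X_train[:, s]) for m, s in zip(base, subsets)])
meta_test = np.column_stack([m.predict(X_test[:, s]) for m, s in zip(base, subsets)])

stack = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=42)
stack.fit(meta_train, y_train)
zp_pred = stack.predict(meta_test)
```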
Consensus modelling

The meta-modelling approach allows one to use the output of one modelling approach as an input to another, or to use several models/algorithms in parallel or in sequence, allowing the strengths of individual models to be combined and their limitations to be circumvented. Consensus modelling is based on the parallel approach, where multiple ML algorithms are used to investigate the available dataset and to find relationships between the considered NMs’ features and the physicochemical descriptors or biological activity of interest. Each ML algorithm has its strengths and weaknesses; thus, there is no universal solution for modelling regression or classification cases. The choice of an adequate ML method depends on the problem to be solved and the available data, and in some cases multiple methods are employed to decide which one works best for each case. Depending on the amount of available data, different methods may be applied. In general, support vector machines, decision trees, random forests, and neural networks are methods that generalise trends or behaviours well and can lead to accurate predictions. However, for small datasets, the same ML methods may lead to overfitting and low predictivity of the model for untested samples. Consensus modelling, that is, combining a set of diverse algorithms for the prediction of the endpoint of interest, is an effective way to achieve reliable results from data-driven analysis. However, this approach is also open to the criticism that it is even more of a “black box” than the individual models; thus, even more care needs to be taken to fully document the predictive models with their QMRF reports and to fully describe the underpinning datasets.
Here, a consensus strategy was employed in addition to the individually developed models, based on the combination of the predictions of the initial models generated by the four groups: NovaM, NTUA, QSARLab, and DTC Lab. Two techniques were used to derive consensus predictions, namely, the simple average of the predictions of the individual models and the weighted average of the original predictions. Simple averaging combines the predictions of all individual models equally, while weighted averaging assigns more weight to models with higher individual performance. This combination aims to leverage the strengths of each model, reducing individual biases and enhancing overall prediction accuracy.
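A minimal sketch of the two consensus schemes is given below; the per-model performance scores used as weights are hypothetical, and the exact weighting rule used by the groups may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
preds = rng.normal(size=(5, 18))               # placeholder: 5 models x 18 test NMs
r2 = np.array([0.88, 0.85, 0.90, 0.83, 0.86])  # hypothetical per-model performance scores

consensus_simple = preds.mean(axis=0)          # every model counts equally

w = r2 / r2.sum()                              # better-performing models get larger weights
consensus_weighted = w @ preds                 # weighted average across models
```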
Validation

In line with the OECD QSAR model validation principles, all models presented in this work were validated externally using the exact same training and test sets, produced by randomly dividing the original dataset in a ratio of 0.75:0.25. The training subset was used each time to calculate and adjust the model parameters, whereas the test subset was not involved in model development and was used as an external validation set to assess the model’s generalization to new (previously unseen) data, which is crucial for its practical application in regulatory settings.
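For illustration, such a split can be produced as follows, assuming X and y hold the descriptor matrix and the measured ZP values; the random seed is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.normal(size=(71, 9)), rng.normal(size=71)  # placeholder for the 71-NM dataset

# 0.75:0.25 random split; fixing random_state lets every group reuse the same subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```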
According to the OECD’s fourth principle, statistical model validation is indispensable for assessing a model’s performance. To quantify each model’s accuracy, appropriate “fitness” metrics were employed, ensuring that the models’ predictions closely align with the actual values. This validation process helped to prevent underfitting and overfitting. Upon training, the models generated endpoint predictions for both the training and test subsets. The training subset predictions served to evaluate each model’s goodness of fit, while predictions on the test subset assessed the model’s predictability, that is, its ability to generalize well to new data. The statistical criteria used to evaluate model performance are outlined below. These metrics collectively provide a comprehensive assessment of model accuracy and reliability.
The mean absolute error (MAE, Equation 1) and the root mean squared error (RMSE, Equation 2) were used to evaluate the accuracy of the models on both the training and test sets. MAE measures the average magnitude of errors in predictions, while RMSE provides a quadratic scoring rule that gives higher weight to larger errors. Used together, these metrics permit a complete and thorough validation of prediction accuracy, regardless of the distribution of the endpoint values in the training and test sets. MAE and RMSE values closer to 0 correspond to more reliable models.
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \qquad (1)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \qquad (2)$$

where $N$ is the number of samples, and $y_i$ and $\hat{y}_i$ are the actual and predicted endpoint values of the i-th sample, respectively.
The quality of fit between the predicted and experimental values of the training and test sets was expressed by the coefficient of determination ($R^2$, Equation 3), which indicates the proportion of variance in the dependent variable that is predictable from the independent variables. $R^2$ values closer to 1 correspond to models that fit the dataset better.
$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2} \qquad (3)$$

where $N$ is the number of samples, $y_i$ and $\hat{y}_i$ are the actual and predicted endpoint values of the i-th sample, respectively, and $\bar{y}$ is the average value of the experimental endpoint values.
To quantify the credibility of predictions on new data (including the test set), the external explained variance ($Q^2_{\mathrm{ext}}$ or $Q^2_{F1}$, Equation 4) is used, which compares the predictions for the test set samples with their actual endpoint values. $Q^2_{F1}$ values closer to 1 correspond to models with higher predictive power.
$$Q^2_{F1} = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y}_{\mathrm{tr}} \right)^2} \qquad (4)$$

where $N$ is the number of test samples, $y_i$ and $\hat{y}_i$ are the actual and predicted endpoint values of the i-th test sample, respectively, and $\bar{y}_{\mathrm{tr}}$ is the average value of the experimental endpoints of the training set.
Another variant of the external explained variance is $Q^2_{F2}$ (Equation 5), which uses the average value of the experimental endpoints of the test set ($\bar{y}_{\mathrm{test}}$).
$$Q^2_{F2} = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y}_{\mathrm{test}} \right)^2} \qquad (5)$$

The produced models were validated internally by employing leave-one-out (LOO) cross-validation on the training set, to ensure that the model is robust and that no single data point is responsible for the enhanced quality of fit. The performance in LOO cross-validation was assessed by calculating $Q^2_{\mathrm{LOO}}$, a form of cross-validated $R^2$ of the predictions (Equation 6).
$$Q^2_{\mathrm{LOO}} = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_{i,\mathrm{LOO}} \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y}_{\mathrm{tr}} \right)^2} \qquad (6)$$

where $N$ is the number of training samples, $y_i$ and $\hat{y}_{i,\mathrm{LOO}}$ are the actual and the LOO cross-validation predicted endpoint values of the i-th sample, respectively, and $\bar{y}_{\mathrm{tr}}$ is the average value of the experimental training endpoint values.
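A hedged sketch of this procedure, using scikit-learn’s leave-one-out utilities with a placeholder linear model standing in for the actual estimators:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(53, 6)), rng.normal(size=53)  # placeholder data

# Each training sample is predicted by a model fitted on the remaining N-1 samples
y_loo = cross_val_predict(LinearRegression(), X_train, y_train, cv=LeaveOneOut())
q2_loo = 1 - ((y_train - y_loo) ** 2).sum() / ((y_train - y_train.mean()) ** 2).sum()
```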
Finally, the quality of fit and the predictive ability of the models are assessed using the statistical metrics proposed by Golbraikh and Tropsha (Equations 7–11, together with $Q^2_{\mathrm{LOO}}$, Equation 6) on the test set. According to Golbraikh and Tropsha, a regression model is considered predictive if all of the conditions presented in Table 2 are satisfied.
$$r^2 = \frac{\left[ \sum_{i=1}^{N} \left( y_i - \bar{y} \right)\left( \hat{y}_i - \bar{\hat{y}} \right) \right]^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2 \sum_{i=1}^{N} \left( \hat{y}_i - \bar{\hat{y}} \right)^2} \qquad (7)$$

$$r_0^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - k\hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2} \qquad (8)$$

$$r_0'^2 = 1 - \frac{\sum_{i=1}^{N} \left( \hat{y}_i - k'y_i \right)^2}{\sum_{i=1}^{N} \left( \hat{y}_i - \bar{\hat{y}} \right)^2} \qquad (9)$$

$$k = \frac{\sum_{i=1}^{N} y_i \hat{y}_i}{\sum_{i=1}^{N} \hat{y}_i^2} \qquad (10)$$

$$k' = \frac{\sum_{i=1}^{N} y_i \hat{y}_i}{\sum_{i=1}^{N} y_i^2} \qquad (11)$$

where $N$ is the number of samples, $y_i$ and $\hat{y}_i$ are the actual and predicted endpoint values of the i-th sample, respectively, and $\bar{y}$ and $\bar{\hat{y}}$ are the average endpoint values of the experimental and predicted values, respectively.
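For reference, the following sketch computes the metrics of Equations 1–5 and the Golbraikh–Tropsha statistics from observed and predicted values. It follows the standard published definitions; the exact correspondence to the equation numbering above is an assumption.

```python
import numpy as np

def external_validation(y, y_hat, y_train_mean):
    """Accuracy metrics (Eqs. 1-5) and Golbraikh-Tropsha statistics (Eqs. 7-11)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    mae = np.abs(y - y_hat).mean()                                          # Eq. 1
    rmse = np.sqrt(((y - y_hat) ** 2).mean())                               # Eq. 2
    r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()         # Eq. 3
    q2_f1 = 1 - ((y - y_hat) ** 2).sum() / ((y - y_train_mean) ** 2).sum()  # Eq. 4
    q2_f2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()      # Eq. 5

    r_sq = np.corrcoef(y, y_hat)[0, 1] ** 2          # squared correlation coefficient
    k = (y * y_hat).sum() / (y_hat ** 2).sum()       # regression-through-origin slopes
    k_p = (y * y_hat).sum() / (y ** 2).sum()
    r0 = 1 - ((y - k * y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    r0_p = 1 - ((y_hat - k_p * y) ** 2).sum() / ((y_hat - y_hat.mean()) ** 2).sum()

    # Acceptability conditions of Table 2
    gt_pass = (r_sq > 0.6 and (r_sq - r0) / r_sq < 0.1
               and (0.85 <= k <= 1.15 or 0.85 <= k_p <= 1.15)
               and abs(r0 - r0_p) < 0.3)
    return dict(MAE=mae, RMSE=rmse, R2=r2, Q2_F1=q2_f1, Q2_F2=q2_f2, GT_pass=gt_pass)
```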
Table 2: Model acceptability criteria as defined by Golbraikh and Tropsha.
Statistic | Rule
$r^2$ | >0.6
$Q^2_{\mathrm{LOO}}$ | >0.5
$(r^2 - r_0^2)/r^2$ | <0.1
$k$ or $k'$ | ∈ [0.85, 1.15]
$|r_0^2 - r_0'^2|$ | <0.3

Applicability domain

To ensure the robustness and reliability of predictive models, particularly adhering to the OECD guidelines, defining the applicability domain (AD) is crucial. The AD refers to the specific subset of the overall data space where a model can make reliable predictions through interpolation. When the model encounters data points beyond this designated domain, those predictions should be flagged as unreliable because of their extrapolation-based nature, which inherently carries more uncertainty than interpolation.
In the present study, the leverage method was employed to assess the prediction reliability. This was done to empower users to apply the models with greater confidence to external datasets and real-world scenarios while having, at the same time, a clear understanding of their optimal operating parameters. The leverage method measures the similarity between the query samples and the training set using the leverage values, h, which are essentially the diagonal elements of the Hat matrix (Equation 12). These values quantify the distance of each query sample from the centroid of the training set, taking into account the descriptor values employed in model development. The AD boundaries are determined by a predetermined threshold leverage value h* (Equation 13). A test prediction is deemed reliable if its corresponding leverage value falls below this threshold (h < h*).
$$h_i = x_i^{\mathrm{T}} \left( \mathbf{X}^{\mathrm{T}} \mathbf{X} \right)^{-1} x_i \qquad (12)$$

$$h^* = \frac{3(p+1)}{N} \qquad (13)$$

where $\mathbf{X}$ is the descriptor matrix of the training set, $x_i$ is the descriptor vector of the i-th query sample, $p$ is the number of descriptors used in the model, and $N$ is the number of samples in the training dataset.
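A minimal sketch of the leverage computation, assuming a full-rank training descriptor matrix; the data and the 3(p+1)/N threshold form are placeholders consistent with the common convention:

```python
import numpy as np

def leverage_ad(X_train, X_query):
    """Leverage h of each query sample and the h* = 3(p+1)/N warning threshold."""
    core = np.linalg.inv(X_train.T @ X_train)               # (X^T X)^-1 from Eq. 12
    h = np.einsum("ij,jk,ik->i", X_query, core, X_query)    # diagonal of the hat matrix
    h_star = 3 * (X_train.shape[1] + 1) / X_train.shape[0]  # Eq. 13
    return h, h_star

rng = np.random.default_rng(0)
X_tr, X_te = rng.normal(size=(53, 6)), rng.normal(size=(18, 6))  # placeholder data
h, h_star = leverage_ad(X_tr, X_te)
reliable = h < h_star  # True -> the prediction is an interpolation inside the AD
```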
Results and Discussion

In the next paragraphs, the five developed individual models are briefly described. To ensure fair comparison, all models were trained and tested on identical subsets of the data. More information can be found in the respective QMRF reports, provided as Supporting Information Files 2–5 to this publication.
kNN/read-across model

Data preprocessing

Initially, the z-score normalisation method was employed to standardise the descriptors in the training set (53 NMs), ensuring their equal contribution to the model. Each descriptor was adjusted to have a mean of zero and a standard deviation of one. Next, the identical normalisation parameters were applied to the descriptors in the test set (18 NMs). To identify the most relevant parameters, eliminate noise, and avoid overfitting, the BestFirst method with the CfsSubset evaluator was employed. Four descriptors were selected for use in the model (see Table 15 below), that is, the NMs’ coating, their equivalent sphere diameter, their hydrodynamic diameter, and the number of oxygen atoms present in the core’s chemical formula. To enhance the model’s performance and interpretability, the Hamaker constant of the NMs calculated in water and the shape group were added to the subset of the selected descriptors. All analysis steps were performed in the Isalos Analytics Platform.
Model development and validation

The kNN algorithm with a value of k = 7 was selected to perform a read-across assessment of the dataset. As with the preprocessing steps, modelling was implemented in the Isalos Analytics Platform using the Enalos+ tools and, in particular, the EnaloskNN function. This function identifies the neighbouring training samples for each test NM alongside the predicted values, facilitating a deeper understanding of the results in terms of NM grouping and providing insights into the overall sample space. The model was validated following the OECD principles to ensure robust and reliable predictive modelling. The key statistical metrics of internal (training set) and external (test set) validation are presented in Table 3. The Y-randomization test was also performed ten times, giving RMSE values on the test set in the range of 23.1–43.4, confirming that the predictions were not a coincidental outcome. In Table 4 the results of the Golbraikh and Tropsha test for the kNN/read-across model are presented.
Table 3: Internal (training set) and external (test set) validation statistics of the kNN/read-across model.
Metric | Training set | Test set
MAE | 0.29 | 7.81
RMSE | 0.54 | 9.71
$R^2$ | 0.99 | 0.88
$Q^2_{\mathrm{LOO}}$ | 0.62 | —
$Q^2_{F1}$ | — | 0.88