Systematic evaluation of high-throughput PBK modelling strategies for the prediction of intravenous and oral pharmacokinetics in humans

PK data extraction and simulation strategy

After the retrieval of PK data from the different databases and the literature (SI-1), we initially obtained a total of 2235 healthy human adult in vivo concentration–time profiles of 210 unique compounds (Fig. 1A). For all compounds in this dataset, we then collected the various input data we intended to evaluate for PBK model parameterisation, for instance, lipophilicity values or predicted and measured plasma clearances. This resulted in a complete dataset of 718 IV profiles (143 compounds) and 1402 PO profiles (169 compounds) for which all compounds also had the required input data. Notable exceptions to this were in vitro measured intrinsic hepatic clearance values (CLint), which were taken from httk and were available for only 97 compounds, and measured solubility values, which were available for only 167 compounds. The 182 compounds in the final dataset were a diverse set of small molecules with a molecular weight of less than 900 Dalton and different ionisation states at physiological pH. Most belonged to ECCS class 2, suggesting that metabolism is driving their clearance (Fig. 1B).

Fig. 1

Summary statistics of the collected PK dataset and of various compound properties. A Overview of the number of concentration–time profiles and compounds during and at the end of the PK data retrieval process. B Physico-chemical and pharmacokinetic properties of compounds in the retrieved PK dataset. LogD, LogP, aqueous solubility and CACO2 permeability are given as the mean of values predicted in silico by the different prediction tools used in this study. Plasma clearance values are the median of in vivo values collected from the literature. Ionisation at physiological pH (7.4) was predicted using ChemAxon

Fig. 2

Conceptual overview of the simulation and analysis strategy. A For each compound various PBK models were generated using all combinations of available input parameter sources. Their performance was then compared against gathered concentration–time profiles. B The simulation and evaluation of PBK models built from the various parameterisation sources was split into three steps, each focusing on the evaluation of a separate level of ADME processes. C Model performance was first evaluated at the level of individual simulations as the difference between model simulation and observed data. Then, for every parameterisation strategy, the median error of all compounds was used as a measure of the performance of that parameterisation strategy. Finally, comparing all performances of all strategies using the same input property prediction tools allowed benchmarking of the various tools against each other

Evaluating the performance of all PBK models parameterised from all the different input data sources against a large number of in vivo PK data requires many model simulations and leads to long computation times. For this reason, we split the simulation and analysis of the HT-PBK model parameterisation strategies into three steps. This allowed us to systematically evaluate all relevant parameterisation decisions one at a time while keeping the number of required model simulations manageable (here more than 15 million).

Briefly summarised, the rationale for our simulation and analysis strategy was as follows (Fig. 2). In the first step, we investigated which of the physico-chemical parameter sources performed best for predicting the passive distribution of compounds within the body. We generated, simulated, and evaluated every PBK model parameterisation strategy for all compounds of which IV data were available. To initially limit the variability, we only used in vivo observed plasma clearance and in vitro measured Fu values as high-quality benchmark reference values in the first step. In the second step, we then used the same IV data, but now only using the best physico-chemical parameter predictions as determined in the first round, and then tested various Fu and clearance prediction tools to understand which of these would result in the best PK predictions. In the third step, we finally used the oral PK data for evaluation, along with the best physico-chemical, Fu and clearance prediction sources as determined in the previous steps. Then, we systematically varied the various solubility and intestinal permeability values to evaluate how to best predict oral absorption and to assess how well the best full high-throughput strategies would perform overall.

Step 1: evaluation of physico-chemical property predictions

In the first step, we systematically evaluated how to best set the physico-chemical PBK model parameters that determine the passive distribution of compounds within the body. In PK-Sim, these are primarily the lipophilicity of a compound, its pKa values, and the method used to predict the compound’s partitioning coefficients. The lipophilicity values of compounds were predicted with six different in silico tools: three LogD prediction tools (SimPlus, ADMETLab, Bayer), two LogP tools (OCHEM and VEGA), and one LogMA tool (Bayer). pKa values were predicted using ChemAxon and SimPlus, and for comparison we also tested not providing pKa values, effectively assuming that all compounds were neutral. The five tested partitioning methods available in PK-Sim were PK-Sim (Willmann et al. 2005), Schmitt (2008), Rodgers and Rowland (2006), Poulin and Theil (2002), and Berezhkovskiy (2004). To limit the variability in this first analysis, we used in vivo observed plasma clearance and experimentally measured Fu values as high-quality benchmark values for simulation, so that any remaining PBK model simulation inaccuracies would only be due to mispredictions of passive compound distribution alone. Then, we evaluated which physico-chemical prediction tools resulted in the best PBK model simulations by systematically testing all combinations of input parameter sources against our 718 collected concentration–time profiles after IV administration. For each simulation, we calculated Median Relative and Absolute Log2 Errors as measures of prediction bias (systematic error) and precision (random error), respectively.
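As an illustration only, the step 1 grid and the two error measures can be sketched in Python. The tool names are those listed above. The error definition log2(predicted/observed) at matched time points, the function names, and the sign convention (negative values meaning underprediction) are assumptions made for this sketch and are not taken from the study's code.

```python
import math
from itertools import product
from statistics import median

# Step 1 input sources as named in the text above.
LIPOPHILICITY = [
    "LogD SimPlus", "LogD ADMETLab", "LogD Bayer",
    "LogP OCHEM", "LogP VEGA", "LogMA Bayer",
]
PKA = ["ChemAxon", "SimPlus", "none"]
PARTITIONING = [
    "PK-Sim", "Schmitt", "Rodgers and Rowland",
    "Poulin and Theil", "Berezhkovskiy",
]


def step1_strategies():
    """Return every (lipophilicity, pKa, partitioning) combination."""
    return list(product(LIPOPHILICITY, PKA, PARTITIONING))


def log2_errors(predicted, observed):
    """Pointwise log2(predicted / observed); assumed error definition."""
    return [math.log2(p / o) for p, o in zip(predicted, observed)]


def median_relative_log2_error(predicted, observed):
    """Signed median error; negative values indicate underprediction (bias)."""
    return median(log2_errors(predicted, observed))


def median_absolute_log2_error(predicted, observed):
    """Unsigned median error; 1.0 corresponds to a twofold error (precision)."""
    return median(abs(e) for e in log2_errors(predicted, observed))
```

Under these assumptions the grid contains 6 × 3 × 5 = 90 parameterisation strategies per compound. A Median Absolute Log2 Error of 1 corresponds to a twofold error and a value of about 3.32 to a tenfold error.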

Out of the tested parameters, we found the strongest factor determining PBK model accuracy was which lipophilicity values were used for PBK model building (Fig. 3A). Tools that predicted LogP performed overall worse than those predicting LogD or LogMA lipophilicity values. Likewise, the performances of tools predicting the same type of lipophilicity also differed. For example, LogD values predicted by the Bayer tool worked better for PBK model parameterisation than the ones coming from ADMETLab or SimPlus (ADMETPredictor). The higher errors of the LogP tool based predictions were correlated with a general bias for underprediction of the PK data. When investigating this effect on the individual compound level (Fig. 3E–J), it became apparent that for some compounds the LogP tools predicted very high lipophilicity values (> 5), which then led to a severe underprediction of those compounds’ plasma concentrations, while the same compounds’ PK was predicted reasonably well when using LogD or LogMA values for PBK model parameterisation.

Fig. 3

Comparison of predictive performances of different physico-chemical parameter sources (step 1). Combinations of all available PBK model parameterisation sources were evaluated against the collected IV dataset. Clearance and fraction unbound were parameterised using in vivo and in vitro benchmark reference values, respectively. The top row shows Median Absolute Log2 Errors (A) and Median Relative Log2 Errors (B) for different lipophilicity prediction methods. The middle row shows Median Absolute Log2 Errors (C) and Median Relative Log2 Errors (D) for different partitioning methods. The bottom rows (E–J) show the Relative Log2 Errors of predictions for every individual compound for the different lipophilicity prediction tools used. Dashed lines indicate tenfold errors

The results of the partitioning methods were less straightforward. We observed a stable hierarchy in the predictive performances of the different methods, with the Berezhkovskiy method performing best and the Schmitt method performing worst under most circumstances (Fig. 3C). However, this difference in performance was only observed clearly when using lipophilicity values from the less well-performing LogP prediction tools, whereas when using the better lipophilicity values (LogD and LogMA Bayer) the difference in prediction precision between the different partitioning methods was only marginal (Fig. 3C). We further investigated this at the individual compound level (SI-Fig. 1), and we observed that the methods of Poulin & Theil and of Berezhkovskiy were not generally more predictive for the majority of compounds. Rather, we found that Poulin & Theil and Berezhkovskiy were less negatively impacted by the very high lipophilicity values predicted by the LogP tools for some compounds (SI-Fig. 2). It was only their robustness to these high lipophilicity outliers, which decreased the observed performance of the other partitioning methods but not theirs, that made them appear to be superior overall (SI-Fig. 3).

For the provision of predicted pKa values, there was no strong trend observable, even though one may have expected that partitioning methods that use pKa values as input should perform better when those are provided. However, this was only consistently the case for Rodgers & Rowland partitioning, and only when it was used with LogP lipophilicity values (SI-Fig. 4). The other methods using pKa values, namely the methods of Schmitt, Poulin & Theil and Berezhkovskiy, performed sometimes better, sometimes worse, depending on which lipophilicity prediction tool was being used for simulation.

Finally, we evaluated two approaches to further improve lipophilicity predictions for PBK modelling. The first approach was that of using consensus values of the different prediction tools, e.g., for LogP, simply by taking the mean of the values predicted by the different tools. We did this with both the tools for LogP, as well as for LogD, respectively, and, interestingly, observed opposite effects (SI-Fig. 5). Averaging the LogP predictions indeed resulted in better predictivity of the PBK models than any of the individual LogP tools. For the LogD, however, averaging produced better results than the worst tool (SimPlus) but worse results than the better tools (Bayer, ADMETLab).
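A minimal Python sketch of such a consensus value is given here. The function name is hypothetical, and the numbers in the usage note are placeholders rather than predictions for any real compound.

```python
from statistics import mean


def consensus_lipophilicity(predictions):
    """Mean of the values that several tools predict for one compound.

    `predictions` maps a tool name to its predicted value. All values must
    be of the same lipophilicity type (all LogP or all LogD).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return mean(predictions.values())
```

For example, `consensus_lipophilicity({"OCHEM": 2.0, "VEGA": 4.0})` returns 3.0.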

The second strategy we tested to improve lipophilicity predictions was to use regression equations that empirically relate LogP or LogD values to membrane affinity (LogMA). We obtained two equations for converting LogP values (Yun et al. 2014; Endo et al. 2011) and generated a comparable equation for LogD values based on the data presented in Loidl-Stahlhofen et al. (2001). However, we found that only one of the three strategies, namely using the Yun et al. (2014) equation, consistently improved PK predictions, regardless of which LogP tool or partitioning method it was used with (SI-Fig. 6). The two other methods showed at best mixed results, or even worsened predictions in the case of our self-derived equation based on the Loidl-Stahlhofen et al. (2001) data. The improvement in PK predictions, achieved by converting LogP to LogMA values using the equation from Yun et al. (2014), suggests that the tested LogP prediction tools may not be inherently less accurate than the LogD tools. Rather, they may just provide a type of lipophilicity value that is less suitable for the PBK modelling of certain compound classes.

Given these results, we concluded that there was no obviously superior partitioning method and that providing pKa values did not consistently improve predictions. However, the lipophilicity values provided by the Bayer tools (LogD and LogMA) did appear to give superior predictions compared to the other lipophilicity prediction sources. For this reason, we used the mean of the values from those two tools as the best lipophilicity prediction and carried it, together with all partitioning and pKa prediction methods, into the next round for the evaluation of clearance and Fu prediction tools (step 2).

Step 2: evaluation of clearance and fraction unbound predictions

After evaluating the physico-chemical properties determining passive distribution, we continued with the tools predicting key parameters depending on organism biology, specifically the Fu and clearance of compounds. We used the same IV PK data for validation as in the first step, but only using the best physico-chemical predictions as determined previously, while this time varying the Fu and clearance predictions.

For the prediction of the Fu, we had obtained values from seven in silico tools, as well as in vitro measured benchmark values. As expected, we found that the importance of Fu predictions depended on the clearance prediction approach used. When using in vivo plasma clearance benchmark values for model parameterisation, only marginal differences between the performances of the different Fu prediction tools were observed (SI-Fig. 7). But when predicting in vivo clearance using in vitro measured hepatic CLint values, we observed larger differences between the different Fu prediction tools (Fig. 4A, B). Our experimentally determined Fu values yielded better PK predictions than any in silico tool, which confirmed the validity of our benchmark reference values. However, the differences between the prediction qualities were overall relatively small. All Fu prediction tools led to Median Absolute Errors within the two- to threefold range when using in vitro CLint values and there was no obvious systematic bias for under- or overprediction for any of the Fu prediction tools.

Fig. 4

Comparison of predictive performances of different fraction unbound and clearance prediction sources (step 2). Combinations of all available parameterisation sources were evaluated against the collected IV dataset. Results shown were generated using benchmark values for parameterisation of the other parameters, i.e., in vitro CLint values from httk when comparing Fu predicting sources, and in vitro measured Fu values when comparing clearance predicting sources, as well as the mean of the two previously determined best lipophilicity prediction tools (LogD and LogMA Bayer) as lipophilicity values. The top row shows the Median Absolute Log2 Errors (A) and the Median Relative Log2 Errors (B) for different fraction unbound prediction methods. The bottom row shows the Median Absolute Log2 Errors (C) and the Median Relative Log2 Errors (D) for different clearance prediction methods. CL refers to plasma clearance values (measured or predicted), and CLint signifies hepatocyte intrinsic clearance (measured or predicted). In vitro measured CLint values from httk were only available for a subset of compounds (97 out of 143)

For the prediction of compound clearance, three in silico tools directly predicting plasma clearance, as well as our own previously used in vivo plasma clearance benchmark values, were available. Further, we had retrieved in vitro measured hepatic intrinsic clearance (CLint) values from httk, as well as in silico predictions of CLint values from OPERA, SimPlus and ADMETAI.

Before comparing the performances of the different clearance prediction strategies, we first evaluated whether activating passive renal excretion would improve or worsen PBK model simulations. Plasma clearance values already represent the total effect of all systemic clearance processes, so that adding passive renal clearance on top of them should theoretically lead to less accurate results. In contrast, PBK models whose in vivo clearance is scaled up from hepatocyte-derived CLint values are expected to yield better PK predictions when passive renal excretion is incorporated.

Overall, our results were consistent with these expectations. When adding passive renal clearance on top of the in vivo observed plasma clearance, prediction quality became worse and shifted from no bias to underprediction, whereas in vitro hepatocyte-based clearances improved from stronger to weaker overprediction of the PK data (SI-Fig. 8). For in silico predicted clearances, the situation was less straightforward. For instance, in silico CLint values predicted by SimPlus and ADMETAI already led to underpredictions of PK, which was then further exacerbated by additionally adding passive renal clearance. However, given the theoretical considerations, we continued our simulations by adding passive renal clearance on top of hepatocyte-scaled CLint, but not plasma clearance values.

Similar to the Fu, the in vivo observed plasma clearance benchmark values were the best input source for PBK model parameterisation (Fig. 4C). All clearance prediction strategies yielded profoundly worse results than the benchmark in vivo clearance-based strategy, and almost all of them gave Median Absolute Errors worse than the twofold range. However, the differences between the clearance prediction tools were much larger than those between the Fu prediction tools. Out of the in silico predicted plasma clearance tools we found that ADMETLab gave the best predictions, followed by pkCSM and ScitoVation. While ADMETLab and pkCSM plasma clearance predictions led to a slight overprediction of the PK data, ScitoVation plasma clearance values led to a severe underprediction. We further confirmed these findings by directly comparing our in vivo plasma clearance values to the in silico predicted values of the different tools (SI-Fig. 9), which showed that ADMETLab’s plasma clearance values correlated best with our in vivo measured benchmark values.

In vitro hepatocyte CLint values from httk were the second-best clearance prediction source, after our in vivo plasma clearance benchmark values. Similar to what was observed for the in silico tools predicting plasma clearance values, the values from in silico CLint prediction tools also resulted in substantially worse PK predictions than the in vitro benchmark values. The best-performing CLint prediction tool was OPERA, followed by SimPlus and ADMETAI. However, we noted that for some compounds OPERA provided CLint values identical to the in vitro CLint values retrieved from httk (SI-Fig. 10). This suggested that those values were not true in silico predictions, which implies that the OPERA predictions may not be directly comparable to the other tools. Overall, when comparing in silico tools predicting plasma clearance values to the tools predicting hepatocyte CLint, most plasma clearance tools resulted in better PK predictions than most hepatocyte CLint prediction tools.

Step 3: evaluation of solubility and intestinal permeability

In the third evaluation step, we investigated how to best predict parameters required for simulating oral administrations. In PK-Sim, these are primarily the solubility and intestinal permeability of a compound. We had obtained experimentally measured benchmark values of aqueous solubility, as well as predictions of aqueous solubility from four in silico tools (OPERA, ADMETLab, ProtoQSAR, SimPlus), and of Fasted State Simulated Intestinal Fluid (FaSSIF) and Fed State Simulated Intestinal Fluid (FeSSIF) solubility from SimPlus. For the intestinal permeability, no benchmark reference values were obtained. Instead, CACO2 permeability predictions from three in silico tools were used (OPERA, ADMETLab, ProtoQSAR), as well as MDCK permeability predictions from ADMETLab and SimPlus. Finally, we also obtained intestinal permeability predictions using the PK-Sim internal prediction equation, which is based on compounds’ molecular weight and lipophilicity (LogMA Bayer).

For the evaluation, we initially only used data from PK studies in which the administered formulation implied that the compound was already dissolved at administration (e.g., labelled as “solution” or “suspension”) and not in a solid state (e.g., “tablet” or “capsule”), since simulating solid formulations additionally requires knowledge about their dissolution times. We tested the mentioned prediction tools against the 286 concentration–time profiles (94 compounds) from those liquid formulation studies (Fig. 5). However, no substantial difference was observed between the different parameterisation sources for either property. In the case of solubility, all in silico tools gave results similar to each other, and also comparable to the results of our experimentally measured benchmark values. Likewise, all intestinal permeability prediction tools gave comparable results.

Fig. 5

Comparison of predictive performances of different solubility and intestinal permeability prediction sources (step 3). Combinations of all available parameterisation sources were evaluated against the collected PO dataset (dissolved formulations). Results shown were generated using benchmark values for parameterisation of other parameters, i.e., in vivo plasma clearances, in vitro fraction unbound values and the mean of the two previously determined best lipophilicity prediction tools (LogD and LogMA Bayer) as lipophilicity values. The top row shows the Median Absolute Log2 Errors (A) and the Median Relative Log2 Errors (B) for different solubility prediction methods. The bottom row shows the Median Absolute Log2 Errors (C) and the Median Relative Log2 Errors (D) for different intestinal permeability prediction methods. For comparison of solubility predictions, the intestinal permeability values used were the PK-Sim internal equation values (PK-Sim eq.). For comparison of intestinal permeability sources, the solubility values used were SimPlus FaSSIF values. Results for the PK-Sim internal equation were generated with the Bayer LogMA predictions. Intestinal permeability predictions are either CACO2 or MDCK permeability predictions

Further, we observed a general trend for overprediction of the velocity of oral absorption which resulted in a consistently strong underprediction of Tmax values, and a slight tendency for overprediction of Cmax values (SI-Fig. 11). We hypothesised that this might be because the in silico tools do not predict the PK-Sim specific intestinal permeability directly but instead were trained to predict in vitro measured CACO2 or MDCK permeabilities. However, when using such in vitro measured permeabilities, the standard procedure would be to scale these values, for example, using reference compounds, to the PK-Sim intestinal permeability parameter. Only when no measurements for reference compounds exist would one use the in vitro measured permeability values directly without scaling.

To take this into account, we extracted fitted PK-Sim intestinal permeability values of 56 compounds from Willmann et al. (2004) and then, for every in silico tool, determined a scaling factor based on the relationship between the values predicted by that tool and these fitted values, which we assumed to be optimal. While we found that there was a clear trend for the CACO2 values to be larger than the optimal PK-Sim intestinal permeability values (SI-Fig. 12), incorporating this scaling did not substantially improve PK predictions overall (SI-Fig. 13), even though it did reduce the bias towards underprediction of Tmax values.
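The text does not state how the scaling factor was derived from that relationship. The Python sketch below shows one possible approach under a loud assumption: a single multiplicative factor per tool, taken as the median ratio (in log10 space) between reference and predicted permeabilities. Function names are hypothetical.

```python
import math
from statistics import median


def permeability_scaling_factor(predicted, reference):
    """Multiplicative factor mapping one tool's predictions onto reference values.

    Assumption for this sketch: the factor is 10 ** median(log10(ref / pred))
    over all compounds with both a prediction and a reference value.
    """
    log_ratios = [math.log10(r / p) for p, r in zip(predicted, reference)]
    return 10 ** median(log_ratios)


def scale_permeability(value, factor):
    """Apply the tool-specific factor to a single predicted permeability."""
    return value * factor
```

A factor below 1 would reflect predicted values that are, on the whole, larger than the reference values.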

Finally, we evaluated whether our conclusions based on the 94 compounds from liquid formulation studies would also hold true for the compounds of which we only had data from solid formulation studies. The simulation of these formulations required at least one additional parameter to describe the dissolution velocity of the solid formulations, which in reality will vary between different formulations. To at least determine which values might be appropriate average values, we tested different Lint80 dissolution times (10–30 min for capsules, 15–60 min for tablets) and then compared which average dissolution time would yield errors similar to what we had observed for the liquid formulations. Based on this we decided to use 25 min for formulations labelled “capsules” and 40 min for “tablets”, which extended the oral dataset for evaluation to 1200 PO concentration–time profiles (161 compounds). Using this larger dataset, all previously outlined conclusions were confirmed.
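The selection of an average dissolution time can be sketched as a simple search over candidate values. This is a hedged illustration: the function name is hypothetical, the choice of error measure is left open, and the numbers in the usage note are placeholders rather than study results.

```python
def pick_dissolution_time(error_by_time, target_error):
    """Return the candidate time whose error is closest to the target.

    `error_by_time` maps a candidate dissolution time (min) to the error
    obtained for solid formulations simulated with that time.
    `target_error` is the error observed for the liquid formulations.
    """
    return min(error_by_time, key=lambda t: abs(error_by_time[t] - target_error))
```

For example, `pick_dissolution_time({10: 1.6, 20: 1.3, 30: 1.1}, target_error=1.25)` returns 20.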

Predictive performances of full HT-PBK strategies

After evaluating step-by-step how to best predict every compound property required for HT-PBK modelling, we eventually assessed how well different types of HT-PBK strategies would predict the collected PK data overall. We identified the best strategies out of three classes. (1) As a benchmark comparison, we determined the performance of the best strategy overall, using in vivo and in vitro determined benchmark values of plasma clearance and Fu. (2) Additionally, the best fully in silico-based strategy was identified, for which we also considered property predictions coming from proprietary tools. (3) And finally, the best in silico strategy based exclusively on freely available tools was determined. The respective parameterisation strategies and their performances are presented in Table 1. Unsurprisingly, we found the strategy using benchmark reference values to be the most predictive. However, even fully in silico-based strategies yielded acceptable predictivity with 87%, or 89%, of Cmax values being predicted within tenfold when using proprietary, or freely available prediction tools, respectively. Even more importantly, due to overestimation of the velocity of oral absorption in all strategies, the Cmax mispredictions outside the tenfold range were mostly over- not underpredictions and therefore would lead to conservative, health-protective risk assessment conclusions. The performance of the best in silico-based HT-PBK approach is presented in Fig. 6 and that of the other strategies is shown in SI-Fig. 14.
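The share of values predicted within tenfold, and the direction of the remaining mispredictions, can be computed as in the following Python sketch. The function names are hypothetical and the error is assumed to be the simple ratio predicted/observed.

```python
def fraction_within_fold(predicted, observed, fold=10.0):
    """Fraction of predictions whose predicted/observed ratio lies within `fold`."""
    ratios = [p / o for p, o in zip(predicted, observed)]
    within = sum(1 for r in ratios if 1.0 / fold <= r <= fold)
    return within / len(ratios)


def count_outside_fold(predicted, observed, fold=10.0):
    """Return (n_over, n_under): predictions above or below the fold range."""
    ratios = [p / o for p, o in zip(predicted, observed)]
    n_over = sum(1 for r in ratios if r > fold)
    n_under = sum(1 for r in ratios if r < 1.0 / fold)
    return n_over, n_under
```

Separating over- from underpredictions in this way makes it possible to check whether the mispredictions outside the tenfold range lie mostly on the conservative side.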

Table 1 Overview of different HT-PBK modelling strategies and their predictive performances

Fig. 6

Predictive performance of the best fully in silico-based HT-PBK modelling strategy (proprietary). HT-PBK models were generated following the approach outlined in Table 1, then model simulations were compared against the collected PO dataset of 161 compounds (dissolved and solid formulations). A Predicted against observed Cmax values of all concentration–time profiles in the PO dataset. B Predicted against observed AUC values. C Predicted against observed Tmax values. Note that A–C show individual concentration–time profiles, with some compounds represented by only a single PK profile and others by multiple PK profiles. D The median Log2 Cmax and AUC predicted/observed values of each compound in the PO dataset. Dashed lines indicate tenfold errors. E–G Representative concentration–time profile predictions, and corresponding in vivo PK data, of three compounds embodying different levels of prediction quality. Blue colour marks the same representative compounds in D (colour figure online)
