Group-by-Treatment Interaction Effects in Comparative Bioavailability Studies

The simulations and evaluation of datasets in the meta-study were performed in R 4.3.1 (11).

Models

The following linear models of \(\log_e\)-transformed pharmacokinetic (PK) responses, with all effects fixed, were used:

(1) group, sequence, treatment, subject(group × sequence), period(group), group × sequence, group × treatment

(2) group, sequence, treatment, subject(group × sequence), period(group), group × sequence

(3) sequence, subject(sequence), period, treatment

The US Food and Drug Administration (FDA) first made public in 1999 its use of Model 1 to test for a group-by-treatment interaction in the 2-treatment 2-sequence 2-period crossover design (2 × 2 × 2) (12); there, subject(group × sequence) was considered a random effect. Because of the group × treatment term, the main effect of treatment in Model 1 cannot be interpreted and, hence, must not be used to assess bioequivalence. The FDA suggested testing the group-by-treatment interaction at the 0.1 level (12, 13). If the interaction is significant, data of groups must not be pooled, and bioequivalence can be demonstrated in one of the groups by Model 3, provided that the group meets the minimum requirements for a complete BE study. This might lead to the paradoxical situation that BE is demonstrated in a small group but fails in larger ones. If the interaction is not significant, pooled data can be analyzed by Model 2. More details were given by the FDA later (14, 15), but without specifying a level of the test.

Model 2 takes the multi-group nature of the study into account and provides an unbiased estimate of the treatment effect. In the Eurasian Economic Union, Model 2 is mandatory, unless a justification to use Model 3 is stated in the protocol and discussed with the competent authority (16). Health Canada and the FDA recommend mixed-effects models, where subject-related effects are random and all others are fixed (2, 12, 14). Model 3 is the standard model for bioequivalence (e.g., 4, 5) with all effects fixed (analysis of variance, ANOVA).

In Model 2, the residual degrees of freedom (df) is \(\sum n_i-2-(n_G-1)\), where \(n_i\) is the number of subjects in sequence i, and \(n_G\) is the number of groups, and in Model 3 \(df=\sum n_i-2\). In both models, the back-transformed (1 − 2α) confidence interval (CI) is calculated as

$$\text{CI}=100\exp\left(\overline{\log_e x_\text{T}}-\overline{\log_e x_\text{R}}\mp t_{df,\alpha}\sqrt{m\,\text{MSE}\sum_{i=1}^{s}\frac{1}{n_i}}\right),$$

where \(\overline{\log_e x_\text{T}}\) and \(\overline{\log_e x_\text{R}}\) are the means of the \(\log_e\)-transformed responses of the test and reference treatments, \(t_{df,\alpha}\) is the t-value for df degrees of freedom at level α (commonly 0.05), m is the design constant (e.g., 1/2 in a 2 × 2 × 2 crossover design, 3/8 in a two-sequence three-period full replicate design, 1/4 in a two-sequence four-period full replicate design, 1/6 in a three-sequence three-period partial replicate design), MSE is the residual mean square error, s is the number of sequences, and \(n_i\) is the number of subjects in sequence i.
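As a minimal sketch of the two df expressions and the CI formula above, the following Python functions evaluate them with the standard library only. The function names, the example values, and the t-values (approximate table values passed in by the caller rather than computed) are assumptions made for illustration; they are not taken from the authors' R script.

```python
import math

def residual_df(n_per_sequence, n_groups=1, with_group_terms=False):
    """Residual df as stated in the text: sum(n_i) - 2 (Model 3),
    reduced by a further (n_G - 1) when group terms are fitted (Model 2)."""
    df = sum(n_per_sequence) - 2
    if with_group_terms:
        df -= n_groups - 1
    return df

def back_transformed_ci(mean_log_t, mean_log_r, mse, n_per_sequence, t_crit, m=0.5):
    """Back-transformed CI in percent.
    t_crit: t-value for df degrees of freedom at level alpha (supplied by the caller).
    m: design constant (1/2 for a 2x2x2 crossover)."""
    se = math.sqrt(m * mse * sum(1.0 / n for n in n_per_sequence))
    diff = mean_log_t - mean_log_r
    return 100.0 * math.exp(diff - t_crit * se), 100.0 * math.exp(diff + t_crit * se)

# Hypothetical example: 48 subjects, 24 per sequence, two groups, CVw = 33.5 %.
n_seq = (24, 24)
mse = math.log(0.335 ** 2 + 1.0)  # lognormal relation between CV and log-scale variance
df3 = residual_df(n_seq)                                      # Model 3
df2 = residual_df(n_seq, n_groups=2, with_group_terms=True)   # Model 2
# Approximate table t-values for alpha = 0.05 (assumed inputs, not computed here).
ci3 = back_transformed_ci(0.0, 0.0, mse, n_seq, t_crit=1.6787)  # df = 46
ci2 = back_transformed_ci(0.0, 0.0, mse, n_seq, t_crit=1.6794)  # df = 45
print(df3, df2, ci3, ci2)
```

In this example the same MSE is used for both models on purpose, so that any difference in CI width comes from the t-value alone.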

It must be mentioned that the MSE is generally slightly different in Models 2 and 3, whereas the point estimate (PE) is identical if sequences are balanced and group sizes are identical, but different in the case of imbalanced sequences and unequal group sizes. Due to the fewer degrees of freedom, the CI of Model 2 is consistently wider than that of Model 3.

It should be mentioned that in comparative BA studies, subjects are uniquely coded (17, 18). Thus, sequence and related nested effects — as recommended in all guidelines — lead to over-specified models and can be removed entirely, without affecting the estimated treatment effect and its associated MSE.

Simulation Scenarios

Monte-Carlo simulations were performed based on the fact that the mean μ follows a lognormal distribution and the variance \(s^2\) follows a \(\chi^2\)-distribution with n–2 degrees of freedom (19). We simulated 100,000 studies in each scenario using the pseudo-random number generator Mersenne-Twister (20) with a fixed seed of 123456 to support reproducibility and assessed them for the group-by-treatment interaction. In scenarios 1–12, we simulated 2 × 2 × 2 designs with two groups. In scenarios 1–10, we simulated a sample size of 48 subjects to achieve ≥ 90% power for a geometric mean ratio (GMR) = 1 and CVw = 33.5%. This sample size was selected to align closely with the median sample size of 47 in the meta-study (see below). In scenarios 11 and 12, we simulated a sample size of 80 subjects to achieve ≥ 80% power for GMR = 0.90. To simulate unequal variances of groups, variance ratios of 0.667 and 1.5 were explored.
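The sampling idea described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the design is taken to be a balanced 2 × 2 × 2 crossover (so the variance of the treatment difference on the log scale is 2σ²/n), and the CV is converted to a log-scale variance by σ² = log(CV² + 1). Python's `random` module also uses the Mersenne Twister, but its stream differs from R's, so the same seed will not reproduce the published numbers.

```python
import math
import random

def cv_to_variance(cv):
    """Log-scale within-subject variance for a CV given as a ratio (0.335 = 33.5 %)."""
    return math.log(cv * cv + 1.0)

def simulate_study(n, gmr, cv, rng):
    """Return one simulated (point estimate, sample variance) pair for a
    balanced 2x2x2 crossover with n subjects (assumption of this sketch)."""
    sigma2 = cv_to_variance(cv)
    df = n - 2
    # Treatment difference on the log scale: normal with variance 2*sigma2/n.
    log_pe = rng.gauss(math.log(gmr), math.sqrt(2.0 * sigma2 / n))
    # Sample variance: sigma2 * chi-square(df) / df,
    # where chi-square(df) is drawn as Gamma(shape=df/2, scale=2).
    s2 = sigma2 * rng.gammavariate(df / 2.0, 2.0) / df
    return math.exp(log_pe), s2

rng = random.Random(123456)  # fixed seed; the stream differs from R's generator
studies = [simulate_study(48, 1.0, 0.335, rng) for _ in range(10_000)]
mean_s2 = sum(s2 for _, s2 in studies) / len(studies)
print(f"mean simulated variance: {mean_s2:.4f}, target: {cv_to_variance(0.335):.4f}")
```

Drawing the point estimate and the variance directly avoids simulating individual subject data, which keeps a run of many studies fast.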

The level of the test of the group-by-treatment interaction was set to 0.05 (21). If no true group-by-treatment interaction was simulated, the fraction of studies with p(G × T) ≤ 0.05 represents empirical α, whereas if a true group-by-treatment interaction was simulated, it represents empirical power. The p-values of the group-by-treatment interaction tests are expected to follow a standard uniform distribution, i.e., p ∈ [0, 1], and were assessed by the Kolmogorov–Smirnov test. Supplementary graphs illustrating the distribution of these p-values for each scenario are included to complement the Kolmogorov–Smirnov test findings, and the R-script to reproduce the simulations is provided in the Online Resource.
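To make the two summary measures concrete, the sketch below computes the fraction of p-values at or below the test level and a one-sample Kolmogorov–Smirnov statistic against the standard uniform distribution. The function names are hypothetical, and the p-value uses the asymptotic Kolmogorov series, which is an assumption of this sketch and may differ from the exact method applied by the authors' software.

```python
import math

def empirical_rate(p_values, level=0.05):
    """Fraction of studies with p <= level (empirical alpha or empirical power)."""
    return sum(p <= level for p in p_values) / len(p_values)

def ks_uniform(p_values):
    """One-sample KS test against U(0, 1).
    Returns (D, asymptotic p-value)."""
    x = sorted(p_values)
    n = len(x)
    # Largest distance of the empirical CDF above and below the uniform CDF F(x) = x.
    d_plus = max((i + 1) / n - v for i, v in enumerate(x))
    d_minus = max(v - i / n for i, v in enumerate(x))
    d = max(d_plus, d_minus)
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The tail probability is numerically 1 here; the series converges slowly.
        return d, 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, q))

# Hypothetical p-values lying on a regular grid, i.e., as uniform as possible.
grid = [(i - 0.5) / 100 for i in range(1, 101)]
print(empirical_rate(grid), ks_uniform(grid))
```

A small D with a large p-value is compatible with uniformly distributed p-values, whereas p-values piled up near zero give a D close to 1.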

Table I summarizes the simulation scenarios by group sizes (n), equal or unequal variances of groups with the corresponding CV, the GMR in each group, and the presence or absence of a true group-by-treatment interaction. The scenarios in detail:

(1) Two groups of 24 subjects each, equal variances of groups, GMR = 1 in both groups, no group-by-treatment interaction

(2) Two groups of 24 subjects each, unequal variances of groups (variance-ratio 0.667), GMR = 1 in both groups, no group-by-treatment interaction

(3) Two groups of 24 subjects each, unequal variances of groups (variance-ratio 1.5), GMR = 1 in both groups, no group-by-treatment interaction

(4) n1 = 38, n2 = 10, equal variances of groups, GMR = 1 in both groups, no group-by-treatment interaction

(5) Two groups of 24 subjects each, equal variances of groups, GMR = 0.95 in the first group, GMR = 1.0526 in the second group, true group-by-treatment interaction; pooled GMR = 1

(6) Two groups of 24 subjects each, unequal variances of groups (variance-ratio 0.667), GMR = 0.95 in the first group, GMR = 1.0526 in the second group, true group-by-treatment interaction; pooled GMR = 1

(7) Two groups of 24 subjects each, unequal variances of groups (variance-ratio 1.5), GMR = 0.95 in the first group, GMR = 1.0526 in the second group, true group-by-treatment interaction; pooled GMR = 1

(8) n1 = 38, n2 = 10, equal variances of groups, GMR = 0.95 in the first group, GMR = 1.0526 in the second group, true group-by-treatment interaction; weighted GMR = 1

(9) n1 = 38, n2 = 10, unequal variances of groups (variance-ratio 0.667), GMR = 0.95 in the first group, GMR = 1.0526 in the second group, true group-by-treatment interaction; weighted GMR = 1

(10) n1 = 38, n2 = 10, unequal variances of groups (variance-ratio 1.5), GMR = 0.95 in the first group, GMR = 1.0526 in the second group, true group-by-treatment interaction; weighted GMR = 1

(11) n1 = n2 = 40, equal variances of groups (CVw = 30%), GMR = 0.90 in both groups, no group-by-treatment interaction

(12) n1 = 64, n2 = 16, unequal variances of groups (variance-ratio 1.5), GMR = 0.8290 in the first group, GMR = 1.2500 in the second group, true group-by-treatment interaction; weighted GMR = 0.9000
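The pooled and weighted GMRs quoted in the scenarios can be illustrated with a short calculation. The sketch assumes that "weighted GMR" means the geometric mean of the group GMRs weighted by group size; this reading is an assumption, the function name is hypothetical, and it is checked here only against scenarios 5 and 12.

```python
import math

def weighted_gmr(gmrs, sizes):
    """Geometric mean of group GMRs, weighted by group size (assumed weighting)."""
    total = sum(sizes)
    return math.exp(sum(n * math.log(g) for g, n in zip(gmrs, sizes)) / total)

# Scenario 5: equal group sizes, so the weighted and the pooled GMR coincide.
print(round(weighted_gmr((0.95, 1.0526), (24, 24)), 4))
# Scenario 12: group sizes 64 and 16.
print(round(weighted_gmr((0.8290, 1.2500), (64, 16)), 4))
```

With equal group sizes the weights cancel and the result is the plain geometric mean of the two group GMRs.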

Table I Simulation Scenarios

Meta-study

The meta-study included a total of 328 datasets of AUC and 331 of Cmax from 249 comparative BA studies (BE, food effect, drug-drug interaction, dose-proportionality) covering 157 analytes. There were 242 2 × 2 × 2 designs, 33 two-sequence four-period full replicate designs, three partial replicate designs, and 46 incomplete block designs extracted from six-sequence three-period and four-sequence four-period Williams’ designs. The studies consisted of two to seven groups, with a median sample size of 47 subjects (15–176) and a median interval separating groups of six days (1 to 62 days). It should be noted that the extreme interval of 2 months in one study was due to COVID-19 restrictions. The next largest interval was 18 days. In 76.3% of the studies, the interval was 1 week or less; in 30.8%, it was only 1 or 2 days. There are more datasets than studies because some studies contain more than one analyte (fixed-dose combinations or parent and metabolite). The datasets were assessed by all models. Since in some of the datasets bioequivalence of Cmax was assessed by reference-scaling or with wider fixed limits, only AUC targeting BE with conventional limits of 80–125% was assessed by a recently proposed method (22), where

a “concordant quantitative interaction” was defined as when the treatment effect is overall equivalent as well as in all groups but differs in magnitude,

a “concordant qualitative interaction” was defined as when the treatment effect is overall and in at least one group equivalent, in at least one group not equivalent, and the treatment effects in all groups are in the same direction, and

a “discordant qualitative interaction” was defined as when the overall treatment effect is equivalent, the treatment effect in some groups is not equivalent, and the treatment effect in some groups can be in opposite directions.
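One possible reading of the three definitions above can be written as a small decision function. It is a sketch for illustration only and not the procedure of reference (22): the function names, the inputs (90% CIs and point estimates in percent), and the handling of cases the definitions do not cover are assumptions. Whether group effects "differ in magnitude" is not tested.

```python
def is_equivalent(ci, lower=80.0, upper=125.0):
    """True if the CI (in percent) lies entirely within the conventional limits."""
    lo, hi = ci
    return lower <= lo and hi <= upper

def classify_interaction(overall_ci, group_cis, group_pes):
    """Classify by overall CI, per-group CIs, and per-group point estimates (all in %).
    Direction is taken relative to 100 % (assumption of this sketch)."""
    if not is_equivalent(overall_ci):
        return "overall not equivalent"
    equivalent = [is_equivalent(ci) for ci in group_cis]
    same_direction = (all(pe >= 100.0 for pe in group_pes)
                      or all(pe <= 100.0 for pe in group_pes))
    if all(equivalent):
        return "concordant quantitative"
    if not same_direction:
        return "discordant qualitative"
    if any(equivalent):
        return "concordant qualitative"
    return "unclassified"  # not covered by the three definitions as read here

# Hypothetical two-group example: one group fails, both effects below 100 %.
print(classify_interaction((90.0, 110.0), [(85.0, 105.0), (78.0, 110.0)], [95.0, 93.0]))
```

The order of the checks matters: equivalence in all groups is tested first, so opposite directions only lead to "discordant qualitative" when at least one group is not equivalent.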

We restricted the method to two groups, because more would result in a multidimensional problem. Of note, a manipulation (i.e., an undocumented interim analysis after the first group and switching test (T) with reference (R) in the second) would only be possible if groups are separated by a long interval. Such suspected manipulation could be easily detected by plotting T/R-ratios against subject ID. Details of the datasets are given in the Online Resource.
