A puzzle of proportions: Two popular Bayesian tests can yield dramatically different conclusions

1 INTRODUCTION

Researchers frequently wish to test whether two populations differ. In medicine and public health, for example, the resulting statistical analysis frequently concerns testing whether or not two proportions differ. Examples include testing whether a vaccine decreases the number of infections compared to a control,1 whether sexual minorities are more prone to suicide compared to their heterosexual counterparts,2 or whether tightly or less-tightly controlling hypertension leads to fewer miscarriages in pregnant women.3

In these applications, it is crucial to be able to discriminate between evidence of absence and absence of evidence. For example, Magee et al conducted a trial to investigate the effect of a tight (target diastolic blood pressure, 85 mm Hg) or a less-tight (target diastolic blood pressure, 100 mm Hg) control of hypertension in pregnant women on, among other outcomes, pregnancy loss.3 They found no significant difference between the two conditions, with 15 out of 493 women in the less-tight control condition and 13 out of 488 in the tight control condition having lost their child, yielding an estimated odds ratio of 1.14 (95% CI: [0.53, 2.45]). How confident are we that there is indeed no difference between the two conditions rather than the data being inconclusive? Bayesian statistics provides a principled way of quantifying evidence via the Bayes factor,4-6 thus providing one avenue to discriminate between evidence of absence and absence of evidence.7

The Bayes factor quantifies how well one hypothesis predicts the data compared to another. Using the test of equality between two proportions as an example, let 𝒟=(y1,y2,n1,n2) denote the combined data from the two groups. We have:

$$Y_1 \sim \text{Binomial}(n_1, \theta_1), \qquad Y_2 \sim \text{Binomial}(n_2, \theta_2),$$

where the sample sizes (n1,n2) are assumed fixed and under ℋ0 we have that θ≡θ1=θ2 while under ℋ1 we have that θ1≠θ2. By quantifying relative predictive performance, the Bayes factor tells us how we should update our prior beliefs about ℋ0 relative to ℋ1 after observing the data:8

$$\underbrace{\frac{p(\mathcal{H}_0 \mid \mathcal{D})}{p(\mathcal{H}_1 \mid \mathcal{D})}}_{\text{Posterior odds}} = \underbrace{\frac{p(\mathcal{D} \mid \mathcal{H}_0)}{p(\mathcal{D} \mid \mathcal{H}_1)}}_{\text{Bayes factor}} \times \underbrace{\frac{p(\mathcal{H}_0)}{p(\mathcal{H}_1)}}_{\text{Prior odds}}.$$

A Bayes factor of, say, 15 means that the data are 15 times more likely under one hypothesis compared to the other. While there exist verbal guidelines that may aid in the interpretation of the Bayes factor (eg, Bayes factors in the range from 1 to 3 constitute weak evidence, those in the range from 3 to 10 constitute moderate evidence, and values larger than 10 constitute strong evidence)9-11 the Bayes factor should be understood as a continuous measure of evidence.6 While the Bayes factor does not depend on the prior probability of hypotheses, it does depend crucially on the prior over parameters in the models instantiating ℋ0 and ℋ1, which becomes apparent when expanding:

$$\frac{p(\mathcal{D} \mid \mathcal{H}_0)}{p(\mathcal{D} \mid \mathcal{H}_1)} = \frac{\int_{\theta} p(\mathcal{D} \mid \theta, \mathcal{H}_0)\, \pi_0(\theta \mid \mathcal{H}_0)\, d\theta}{\int_{\theta_1} \int_{\theta_2} p(\mathcal{D} \mid \theta_1, \theta_2, \mathcal{H}_1)\, \pi_1(\theta_1, \theta_2 \mid \mathcal{H}_1)\, d\theta_1\, d\theta_2},$$

where π0 and π1 indicate the respective prior distributions.

There exist two main Bayes factor approaches for testing the equality of two proportions. The more popular one comes from the analysis of contingency tables, and assigns independent beta distributions directly to (θ1,θ2).12, 13 We call this approach the “Independent Beta” (IB) approach. The second approach is less widely used, and assigns a prior to the average log odds β and the log odds ratio ψ.14, 15 We call this approach the “logit transformation” (LT) approach. In this paper, we show that these two approaches can yield markedly different results. This is especially the case when the observed proportions are at the extremes (ie, very low or very high), as is the case for a large number of applications including the three examples mentioned above. Consider the study by Magee et al again.3 The IB approach yields a Bayes factor of 12.30 in favor of ℋ0, while the LT approach yields a mere 1.17. In other words, under the IB approach the data are about 12 times more likely under the hypothesis that tightly or less-tightly controlling hypertension have the same effect on miscarriages compared to a hypothesis assuming a difference. Under the LT approach, however, the data are about equally likely under both hypotheses, which constitutes equivocal evidence. The answer to the question “is observing two equally small proportions strong or weak evidence for the null hypothesis?” depends, therefore, crucially—and nontrivially—on the prior setup.
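To make the practical meaning of this discrepancy concrete, the following minimal sketch converts the two Bayes factors reported above into posterior probabilities for ℋ0 via the posterior-odds identity. The function name and the choice of equal prior odds are illustrative assumptions and are not part of the original analysis.

```python
def posterior_prob_h0(bf01, prior_odds=1.0):
    """Posterior probability of H0 given BF01 and the prior odds p(H0)/p(H1)."""
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Bayes factors in favor of H0 as reported in the text.
# Equal prior odds (prior_odds = 1) are an assumption made for illustration.
for label, bf01 in [("IB", 12.30), ("LT", 1.17)]:
    prob = posterior_prob_h0(bf01)
    print(f"{label}: BF01 = {bf01:.2f}, p(H0 | data) = {prob:.3f}")
```

Under equal prior odds, the IB result corresponds to a posterior probability for ℋ0 of roughly 0.92, whereas the LT result corresponds to roughly 0.54.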

This article is structured as follows. In Section 2, we outline these two ways of testing the equality of two proportions in more detail. In Section 3, we highlight the occasionally stark differences of the two approaches by reanalyzing 39 statistical tests reported in the New England Journal of Medicine and explain why these differences occur. In Section 4, we end by reviewing the implications of the prior setup and what users of Bayes factors should be mindful of when testing the equality of two population parameters. We argue that the LT approach should become the default when testing the equality of two proportions because it (a) induces prior dependence between proportions which almost always are, in fact, dependent, and (b) yields a sensibly milder assessment of the evidence compared to the IB approach when the observations are at the extremes.

2 TWO WAYS OF TESTING THE EQUALITY OF TWO PROPORTIONS

In this section, we outline two ways of testing the equality of two proportions. In Section 2.1 we describe the Independent Beta approach, and in Section 2.2 we describe the logit transformation approach.

2.1 The independent beta (IB) approach

In order to nest the null hypothesis under the alternative hypothesis, we introduce the difference parameter η=θ2−θ1 and the grand mean ζ=(θ1+θ2)/2. Using this parameterization, we have that:

$$\theta_1 = \zeta - \frac{\eta}{2}, \qquad \theta_2 = \zeta + \frac{\eta}{2}.$$

The hypotheses are then specified as:

ℋ0:η=0,ℋ1:η≠0.

Under this approach, we assign independent Beta(a,a) priors to θ1 and θ2. Figure 1 visualizes the joint prior distribution (top) under ℋ1 for a=1 (left) and a=2 (right). Increasing values of a imply that the joint prior mass is more concentrated around (θ1,θ2)=(1/2,1/2). The bottom panels visualize the conditional prior distribution of θ2 given that we know that θ1=0.10. Knowing the value of θ1 does not change our prior about θ2, which follows from the assumption of prior independence.

Figure 1. Top: Joint prior distribution assigned to (θ1,θ2) under a=1 (left) and a=2 (right). Bottom: Conditional prior distribution of θ2 given that θ1=0.10.

The IB Bayes factor is available in analytic form.9, 12, 16 In the literature on contingency tables, our setup corresponds to an independent multinomial sampling scheme where the row (or column) sums of the contingency table (here n1 and n2) are fixed; for an extension to other sampling schemes and more than two groups, see References 12 and 13. Gunel and Dickey suggest a=1 as a default value.12 Note that values a<1 lead to an undefined prior density for θ=0 and θ=1, which implies model selection inconsistency in case θ is indeed 1 or 0. Hence these values should be avoided on principled grounds. Because the beta prior is conjugate for the binomial likelihood, the posterior distributions of θ1 and θ2 are again (independent) beta distributions.
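Conjugacy makes a closed-form computation straightforward. The sketch below is one way to write it down, under the explicit assumption that the common proportion θ under ℋ0 also receives a Beta(a,a) prior; the exact expressions in the cited references may be parameterized differently. The function names and the counts are hypothetical and are not data from the paper.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def ib_bf01(y1, n1, y2, n2, a=1.0):
    """Bayes factor in favor of H0 with independent Beta(a, a) priors under H1.

    Assumption: under H0 the common proportion also has a Beta(a, a) prior.
    The binomial coefficients appear in both marginal likelihoods and cancel.
    """
    log_m0 = log_beta(y1 + y2 + a, (n1 - y1) + (n2 - y2) + a) - log_beta(a, a)
    log_m1 = (
        log_beta(y1 + a, n1 - y1 + a)
        + log_beta(y2 + a, n2 - y2 + a)
        - 2.0 * log_beta(a, a)
    )
    return exp(log_m0 - log_m1)


# Hypothetical counts with n = 100 per group (not taken from the paper).
for y1, y2 in [(50, 50), (5, 5), (10, 90)]:
    print(f"y1={y1}, y2={y2}: BF01 = {ib_bf01(y1, 100, y2, 100):.3g}")
```

Under these assumptions, two equal sample proportions near the boundary produce a larger Bayes factor in favor of ℋ0 than two equal sample proportions near one half.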

2.2 The logit transformation (LT) approach

The test proposed by Kass and Vaidyanathan14 and implemented by Gronau et al15 does not assign a prior directly to (θ1,θ2), but applies a logit transformation and assigns priors to the transformed parameters (β, ψ). Specifically, we write:

$$\log\frac{\theta_1}{1-\theta_1} = \beta - \frac{\psi}{2}, \qquad \log\frac{\theta_2}{1-\theta_2} = \beta + \frac{\psi}{2},$$

where β is a grand mean and ψ is the difference in log odds (ie, the log odds ratio):

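$$\beta = \frac{1}{2}\left(\log\frac{\theta_1}{1-\theta_1} + \log\frac{\theta_2}{1-\theta_2}\right), \qquad \psi = \log\frac{\theta_2}{1-\theta_2} - \log\frac{\theta_1}{1-\theta_1}.$$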

While this is a more involved reparameterization than in the IB approach above, another way to formulate this setup is by writing:

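$$\theta_1 = \frac{e^{\beta - \psi/2}}{1 + e^{\beta - \psi/2}}, \qquad \theta_2 = \frac{e^{\beta + \psi/2}}{1 + e^{\beta + \psi/2}},$$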

which readers familiar with logistic regression may recognize. Using this setup, we test the hypotheses:

ℋ0:ψ=0,ℋ1:ψ≠0.
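The reparameterization above can be checked numerically. The following sketch maps between (θ1,θ2) and (β,ψ) in both directions; the function names and the example proportions are illustrative choices rather than anything prescribed by the text.

```python
import math


def expit(x):
    """Inverse logit: maps a log odds value to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def to_proportions(beta, psi):
    """Map the grand mean beta and log odds ratio psi to (theta1, theta2)."""
    return expit(beta - psi / 2.0), expit(beta + psi / 2.0)


def to_logit_scale(theta1, theta2):
    """Map (theta1, theta2) to the grand mean beta and log odds ratio psi."""
    beta = 0.5 * (logit(theta1) + logit(theta2))
    psi = logit(theta2) - logit(theta1)
    return beta, psi


# Illustrative proportions: the round trip should recover them.
beta, psi = to_logit_scale(0.10, 0.20)
print(f"beta = {beta:.4f}, psi = {psi:.4f}, odds ratio = {math.exp(psi):.4f}")
print(to_proportions(beta, psi))
```

Setting ψ=0 in `to_proportions` returns two identical proportions, which is the null hypothesis expressed on the probability scale.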

In contrast to above, we now assign priors to β and ψ rather than to θ1 and θ2 directly. In particular, under both hypotheses we assume β∼𝒩(0,σβ) with σβ=1. Under ℋ1 we assume ψ∼𝒩(0,σψ).1 The top left panel in Figure 2 visualizes the implied joint prior distribution on (θ1,θ2) under ℋ1 for σψ=1. The prior mass is concentrated along the diagonal, which indicates that θ1 and θ2 are dependent. The bottom left panel illustrates this fact: if we know that θ1=0.10, then the prior on θ2 shifts toward this value. Setting σψ=2 removes the prior dependency, as the right column in Figure 2 shows. For values of σψ>2, θ1 and θ2 become anti-correlated and hence observing a small value of θ1 results in a prior that puts more mass on large values for θ2. Such an inverse relation is undesirable in almost all empirical applications, and so values σψ>2 should be avoided. Gronau et al15 developed software to compute the Bayes factor using this prior specification, first proposed by Kass and Vaidyanathan,14 suggesting σψ=1 as a default value.
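The dependence pattern described above can be illustrated by simulation. On the logit scale, the covariance between β−ψ/2 and β+ψ/2 equals σβ²−σψ²/4, which is positive, zero, or negative for σψ below, at, or above 2 when σβ=1. The sketch below assumes that σβ and σψ denote standard deviations and estimates the prior correlation between θ1 and θ2 by Monte Carlo; the function names, number of draws, and seed are arbitrary illustrative choices.

```python
import math
import random


def expit(x):
    """Inverse logit: maps a log odds value to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lt_prior_correlation(sigma_psi, sigma_beta=1.0, draws=20000, seed=1):
    """Monte Carlo estimate of the prior correlation of (theta1, theta2) under H1.

    Assumption: sigma_beta and sigma_psi are standard deviations.
    """
    rng = random.Random(seed)
    theta1, theta2 = [], []
    for _ in range(draws):
        beta = rng.gauss(0.0, sigma_beta)
        psi = rng.gauss(0.0, sigma_psi)
        theta1.append(expit(beta - psi / 2.0))
        theta2.append(expit(beta + psi / 2.0))
    return pearson(theta1, theta2)


for sigma_psi in (1.0, 2.0, 3.0):
    print(f"sigma_psi = {sigma_psi}: correlation = {lt_prior_correlation(sigma_psi):.3f}")
```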

Figure 2. Top: Joint prior assigned to (θ1,θ2) for σψ=1 (left) and σψ=2 (right). Bottom: Conditional prior distribution of θ2 given that θ1=0.10. In both cases we assume σβ=1.

2.3 Comparison of priors

A direct comparison of the two prior specifications may be helpful to get further intuition for their differences. While the IB approach does not assign a prior distribution to (β,ψ) or (η,ζ) explicitly, assigning a prior to θ1 and θ2 induces a prior distribution on these quantities. Conversely, the LT approach assigns a prior to ψ and β and this induces a prior on (θ1,θ2) and (η,ζ). The induced prior distributions under both approaches are nonstandard (see Appendix A), but their densities can be calculated numerically.

The top left panel in Figure 3 shows the prior distribution assigned to η by the LT approach for σψ=1 (shaded blue) and σψ=2 (striped blue) and the IB approach for a=1 (shaded red) and a=2 (striped red) under ℋ1. Similarly, the top right panel shows the prior distribution assigned to ψ for the two approaches and prior parameter values. The (default) IB approach assigns comparatively more mass to large values of η and ψ, which in practice means that it expects larger differences between the sample proportions. The bottom panel shows the marginal priors for θ1 and θ2, where we find that the LT approach assigns comparatively less mass to extreme values. The LT approach cannot result in a uniform distribution on the proportions under ℋ0 because of the Gaussian prior on β. If it instead assigned a (standard) logistic prior to β (which has fatter tails), the prior on the proportions would be uniform; see a related discussion in Appendix B. In the next section, we discuss a somewhat surprising difference between these two tests.
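The difference in prior mass on large values of η can also be approximated by simulation rather than by numerical evaluation of the induced densities. The sketch below estimates the prior probability that |η|=|θ2−θ1| exceeds 0.5 under the default settings of both approaches (a=1 for IB; σψ=1 and σβ=1 for LT, treated as standard deviations). The threshold of 0.5, the function names, and the simulation settings are assumptions made purely for illustration.

```python
import math
import random


def expit(x):
    """Inverse logit: maps a log odds value to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def ib_sampler(a=1.0):
    """Return a function drawing (theta1, theta2) from independent Beta(a, a) priors."""
    return lambda rng: (rng.betavariate(a, a), rng.betavariate(a, a))


def lt_sampler(sigma_psi=1.0, sigma_beta=1.0):
    """Return a function drawing (theta1, theta2) from the LT prior under H1."""
    def draw(rng):
        beta = rng.gauss(0.0, sigma_beta)
        psi = rng.gauss(0.0, sigma_psi)
        return expit(beta - psi / 2.0), expit(beta + psi / 2.0)
    return draw


def prob_large_difference(sampler, threshold=0.5, draws=50000, seed=1):
    """Monte Carlo estimate of the prior probability that |theta2 - theta1| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        theta1, theta2 = sampler(rng)
        if abs(theta2 - theta1) > threshold:
            hits += 1
    return hits / draws


print("IB (a = 1):        ", prob_large_difference(ib_sampler(1.0)))
print("LT (sigma_psi = 1):", prob_large_difference(lt_sampler(1.0)))
```

For a=1 the difference of two independent uniform proportions has a triangular distribution, so the IB estimate should be close to 0.25; the LT estimate is considerably smaller.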

Figure 3. Top: Prior distributions assigned to η (left) and ψ (right) under the LT (blue, vertical lines) and IB approach (red, horizontal lines), respectively. Bottom: Marginal prior distribution of θ1 and θ2 under the two approaches. The density that is filled and has the highest peak corresponds to σψ=1.

3 PRACTICAL IMPLICATIONS OF THE PRIOR SETUP

To see the implications of the two different prior specifications in practice, in Section 3.1 we reanalyze 39 null results published in the New England Journal of Medicine, previously analyzed by Hoekstra et al17 using the IB approach. We then explain why these differences occur in Section 3.2. The data and code to reproduce all analyses and figures are available from https://github.com/fdabl/Proportion-Puzzle.

3.1 Reanalysis of New England Journal of Medicine studies

Hoekstra et al considered all 207 articles published in the New England Journal of Medicine in 2015.17 The abstract of 45 of these articles contained a claim about the absence or nonsignificance of an effect for a primary outcome measure, and 37 of those allowed for a comparison of proportions, reporting 43 null results in total. We focus on those results that can be reanalyzed using a test between two proportions, which results in a total of 39 tests from 32 articles. The top left panel in Figure 4 contrasts Bayes factors in favor of ℋ0 computed using the IB approach (rectangles) across a∈[1,5] with Bayes factors computed using the LT approach (circles) across σψ∈[1,2]. In virtually all cases and across specifications, the Bayes factor in favor of ℋ0 is higher under the IB approach, and this difference is frequently substantial.2 As the parameter a is increased under the IB approach, the expected difference between the two groups becomes smaller (see top left panel in Figure 3). Therefore, the predictions under ℋ1 become more similar to the predictions under ℋ0, and the Bayes factor decreases. Conversely, as σψ is increased under the LT approach, the expected difference between the two groups increases, and the Bayes factor in favor of ℋ0 increases.
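The sensitivity of the LT Bayes factor to σψ can be explored with a brute-force computation. The sketch below approximates both marginal likelihoods by a Riemann sum on a regular grid over β and ψ, with Gaussian priors whose scale parameters are treated as standard deviations. It is not the implementation referenced in the text; the grid settings, function names, and counts are hypothetical, and the grid step must stay small relative to the width of the likelihood, which holds for group sizes around 100 but not necessarily for much larger samples.

```python
import math


def log_norm_pdf(x, sd):
    """Log density of a zero-mean normal distribution with standard deviation sd."""
    return -0.5 * (x / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def log_binom_kernel(y, n, log_odds):
    """Log of theta^y (1 - theta)^(n - y) with theta = expit(log_odds)."""
    return -y * math.log1p(math.exp(-log_odds)) - (n - y) * math.log1p(math.exp(log_odds))


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def lt_bf01(y1, n1, y2, n2, sigma_psi=1.0, sigma_beta=1.0, step=0.05, half_width=8.0):
    """Grid approximation of the Bayes factor in favor of H0: psi = 0."""
    n_points = int(round(2.0 * half_width / step)) + 1
    grid = [-half_width + i * step for i in range(n_points)]
    log_prior_beta = [log_norm_pdf(b, sigma_beta) for b in grid]
    log_prior_psi = [log_norm_pdf(p, sigma_psi) for p in grid]

    # Marginal likelihood under H0: integrate over beta with psi fixed at 0.
    terms0 = [
        log_binom_kernel(y1, n1, b) + log_binom_kernel(y2, n2, b) + lp_b
        for b, lp_b in zip(grid, log_prior_beta)
    ]
    log_m0 = logsumexp(terms0) + math.log(step)

    # Marginal likelihood under H1: integrate over beta and psi.
    terms1 = []
    for b, lp_b in zip(grid, log_prior_beta):
        for p, lp_p in zip(grid, log_prior_psi):
            terms1.append(
                log_binom_kernel(y1, n1, b - p / 2.0)
                + log_binom_kernel(y2, n2, b + p / 2.0)
                + lp_b
                + lp_p
            )
    log_m1 = logsumexp(terms1) + 2.0 * math.log(step)
    return math.exp(log_m0 - log_m1)


# Hypothetical data: 50 out of 100 events in both groups (not from the paper).
results = {s: lt_bf01(50, 100, 50, 100, sigma_psi=s) for s in (1.0, 2.0)}
for sigma_psi, bf01 in results.items():
    print(f"sigma_psi = {sigma_psi}: BF01 = {bf01:.2f}")
```

For these hypothetical counts, the approximation reproduces the qualitative pattern described above: a wider prior on ψ yields a larger Bayes factor in favor of ℋ0.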

Figure 4. Top: Bayes factors using the IB (rectangles) and LT approach (circles) in favor of ℋ0 across studies reported in Hoekstra et al17 (left) or for simulated equal proportions with n=100 (right) for values a∈[1,5] and σψ∈[1,2] with σβ=1. Bottom: Joint prior distribution of (θ1,θ2) under the IB approach with a=1 (left) and under the LT approach with σψ=1 and σβ=1 (right). Black dots and blue rectangles indicate the maximum likelihood estimates of the proportions in the studies analyzed by Hoekstra et al.


STATISTICS IN MEDICINE
