An illustrated guide to context effects

Across the many disciplines that study decision making and choice, the term context effect broadly refers to how preferences between alternatives can systematically change when (typically non-preferred) alternatives are added to the choice set under consideration. Such behaviour can call into question standard models of choice and suggest new directions for theoretic development. This article focuses on three well studied context effects relevant to the study of stochastic discrete choice: the similarity, compromise, and asymmetric dominance effects.

There is a vast empirical literature documenting various combinations of context effects across many different choice settings, including: consumer choice (e.g., Berkowitsch et al., 2014, Doyle et al., 1999), policy decisions (Herne, 1997), intertemporal choice (Gluth, Hotaling, & Rieskamp, 2017), and perceptual stimuli (Trueblood, Brown, Heathcote, & Busemeyer, 2013). Context effects have also been observed in the animal kingdom. For example, researchers have observed monkeys (Parrish, Evans, & Beran, 2015), felines (Scarpi, 2011), and insects (Latty & Trueblood, 2020) exhibiting phantom decoy effects (where the decoy alternative is not available for selection).

Despite this interest, the empirical literature on context effects is largely disconnected and somewhat contradictory, with some context effects failing to be observed, or, in some cases, reversing their direction altogether (e.g., Spektor, Kellen, & Hotaling, 2018). Spektor, Bhatia, and Gluth (2021) provide a comprehensive overview and synthesis of this contradictory empirical literature. To briefly summarize, they argue that whether or not a particular context effect manifests may depend upon three broad categories referring to the nature of the choice environment.

1. The spatial arrangement of the choice stimuli. As one example, Cataldo and Cohen (2018) demonstrated that the robustness and direction of some context effects depend on whether the choice stimuli are presented in tabular formats that encourage either by-alternative or by-attribute comparisons.

2. Whether the stimuli are abstract or concrete in nature. There is good evidence that context effects can manifest when the choice alternatives are defined via concrete, clearly understood and easily comparable attributes (e.g., numerical monetary values). However, the evidence for context effects becomes murkier when the choice alternatives become more abstract in nature (e.g., rectangles of varying sizes; Spektor et al., 2018).

3. How much time is available for individuals to make choices and how that time is controlled. Generally speaking, context effects are more robust when decision makers have more time to deliberate (Spektor et al., 2021). Cataldo and Cohen (2021) provide a comprehensive analysis of time constraints on context effects (also see Molloy, Galdo, Bahg, Liu, & Turner, 2019).

Given the many variables that impact the presence (or absence) of context effects, Spektor et al. (2021) conclude that a deeper understanding of how choice alternatives are cognitively represented is an important direction for future theory development.

Trueblood (2022) presents a theoretic synthesis of major competing theories of multi-alternative, multi-attribute choice that can account for various combinations of context effects. By focusing on dynamic theories of choice, Trueblood concludes that some of the disparate empirical observations in the study of context effects can be explained by changes in attention and attentional processes. Relatedly, there is growing work examining context effects from a neural perspective, focusing on value-based cognitive systems (Busemeyer et al., 2019, Gluth et al., 2020, Gluth et al., 2018).

In addition to the issues raised above, we argue that the literature on context effects is further complicated by inconsistencies in how context effects are defined, tested, and ultimately related to common decision models, such as random utility. To this end, we develop, under a universal notation, all relevant context effect definitions and known relations to a set of fundamental choice properties, which includes regularity, the common-ratio rule, and simple scalability. This section of the article can be considered as a precise tutorial review for readers interested in studying context effects. We also provide new theoretic results on how these context effects relate to one another and fundamental choice properties. We also explore how context effects at the individual level can sometimes aggregate to context effects at the population level. While we are certainly not the first to examine individual-population relationships among context effects, see Liew, Howe, and Little (2016) and Katsimpokis, Fontanesi, and Rieskamp (2022), our results bring additional precision about which definitions of various context effects will aggregate. Altogether, we provide a comprehensive guide for investigating context effects, one that may help bring further clarity to the study of context effects.

Similarity Effect. The similarity effect is commonly attributed to Tversky (1972a), who describes a “similarity hypothesis” relating to a choice environment in which choice objects x and y are similar and object z is dissimilar to both x and y. The hypothesis is that the addition of y to the choice set {x, z} will reduce the share of x of the total probability of choosing x or z and, symmetrically, that the addition of x to {y, z} will reduce the share of y of the total probability of choosing y or z. The same paper introduces the Elimination by Aspects (EBA) class of models to accommodate the similarity effect.

Tversky (1972a) and Tversky (1972b) motivate the similarity hypothesis by providing several theoretical counterexamples to simple scalability, and by extension, to the constant ratio rule (we provide definitions of these properties in the next section).  Tversky (1972a) (page 292) claims that the similarity hypothesis is “incorporated into the [EBA] model”. The two papers make it clear that the similarity hypothesis is a prediction about individual and not population choice. The evidence of this is three-fold: first, the theoretical counterexamples in these papers pertain to individual choice; second, EBA models are described as a random process giving rise to individual choice probabilities; third, the empirical results in Tversky (1972a) are based on observations of the repeated choices of individuals.

Compromise Effect. Simonson (1989) and Tversky and Simonson (1993) each introduce a version of the compromise effect; both effects pertain to a situation where an object y is “between” two other objects, x and z. Simonson (1989) describes a strong version of the effect, where adding x to the choice set {y, z} increases the probability of choosing y, and adding z to the choice set {x, y} also increases the probability of choosing y. Tversky and Simonson (1993) describe a weak version, where adding x to {y, z} increases the relative probability of y over z and where adding z to {x, y} increases the relative probability of y over x. Tversky and Simonson (1993) interpret probabilities exclusively as population probabilities. However, the theoretical result they provide to argue that the compromise effect gives indirect evidence against random preference applies equally well to individual and population random preferences: they show that a condition on random preferences (the ranking condition, defined below) rules out the compromise effect. The plausibility of the ranking condition is subjective, and one may regard its plausibility differently for individual and population random preferences, but the theoretical result applies regardless.

Asymmetric Dominance Effect. Huber, Payne, and Puto (1982) introduce the asymmetric dominance effect. They consider the addition of a third object, called a decoy, to a binary choice set in which neither object “dominates” the other. The decoy is dominated by one object in the pair but not the other. They claim that adding the decoy “can [emphasis added] increase the probability of choosing the item that dominates it”. When it does, the asymmetric dominance effect is said to occur. Their discussion of choice probabilities does not distinguish between individual and population choice probabilities, and the empirical evidence provided in the paper comes from both within- and between-subject experimental designs. There is a tighter connection between individual and population effects for the asymmetric dominance effect than for the other two effects: aggregation properties of the former are much more solid than those of the latter, as we will see in Section 3.2.

Huber et al. (1982) describe an “attraction” effect, similar to the asymmetric dominance effect, but where the decoy is not necessarily dominated by its “target”. Many authors have since treated the terms “attraction effect” and “asymmetric dominance effect” as synonymous and requiring dominance, including the original authors of both papers, in Huber, Payne, and Puto (2014). We adopt the latter term, to avoid any ambiguity.

Already, three interpretive issues arise in these descriptions: the distinction between population and individual choice probabilities, the distinction between a weak and a strong version of an effect, and the question of whether a claim about a “context effect” is a prediction that a phenomenon will occur or a claim that a phenomenon can occur. In this paper, we break down the descriptions of context effects into three components and relate them to various choice properties. We now define our notation and provide definitions of relevant choice properties.

We use the following notation throughout the paper. Let $U$ be a finite universe or master set of choice objects. When faced with a non-empty choice set $A\subseteq U$, a decision maker (DM) chooses a single object from $A$. The probability that the DM chooses $x\in A$ is denoted $P_A(x)$. A random choice structure (RCS) on a master set $U$ is the complete specification of the $P_A(x)$, $x\in A\subseteq U$, and is denoted $P$. For distinct $x,y\in U$, we use the shorthand notation $p_{xy}$ to mean $P_{\{x,y\}}(x)$. We denote by $\Delta(U)$ the set of all RCSs on $U$; it is a Cartesian product of unit simplices of various dimensions.

Table 1 gives an example of an RCS on U={x,y,z} by specifying a complete list of choice probabilities on the non-empty subsets of U. The singleton choice probabilities in the first three rows are, of course, degenerate, but it is useful to include them in the definition of an RCS so that conditions like those in Eq. (1) below have concise expressions.

Fig. 1 gives a graphical representation of the same RCS using a Barycentric coordinate system. Each point in the triangle $xyz$ is a unique convex combination $\lambda_x x+\lambda_y y+\lambda_z z$ of the vertices $x$, $y$ and $z$, where $\lambda_x,\lambda_y,\lambda_z\ge 0$ and $\lambda_x+\lambda_y+\lambda_z=1$. The vector $(\lambda_x,\lambda_y,\lambda_z)$ gives the Barycentric coordinates of the point. Thus, the vertices $x$, $y$ and $z$ have Barycentric coordinates $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$, respectively. Since $xyz$ is equilateral, the distances of the point $(\lambda_x,\lambda_y,\lambda_z)$ to the sides $yz$, $xz$ and $xy$ of the triangle are fractions $\lambda_x$, $\lambda_y$ and $\lambda_z$, respectively, of the height of the triangle.

We will be representing probability vectors on doubleton and tripleton choice sets as points in Barycentric coordinates. Here, for example, the ternary probability vector PU(⋅) in the final row of Table 1 is represented by a solid dot in the interior of the triangle in Fig. 1, with Barycentric coordinates (PU(x),PU(y),PU(z))=(0.6,0.1,0.3). The lengths of the solid light grey line segments joining the point (PU(x),PU(y),PU(z)) to the sides yz, xz and xy of triangle xyz are fractions 0.6, 0.1 and 0.3, respectively, of the height of xyz, as can easily be seen using the light grey dashed grid lines.
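The geometry described above can be checked numerically. The following is a minimal sketch, not taken from the article: the unit side length, the vertex placement (x at the bottom left, y at the apex, z at the bottom right) and the function names are all assumptions made for illustration. It maps Barycentric weights to a point in the plane and confirms that the distances to the sides yz, xz and xy are the stated fractions of the triangle's height.

```python
import math

# Assumed layout: equilateral triangle with unit sides, x bottom-left,
# y at the apex, z bottom-right. Any equilateral triangle would do.
VERTICES = {
    "x": (0.0, 0.0),
    "y": (0.5, math.sqrt(3) / 2),
    "z": (1.0, 0.0),
}
HEIGHT = math.sqrt(3) / 2


def barycentric_to_cartesian(lam):
    """Map Barycentric weights {'x': .., 'y': .., 'z': ..} to a point in the plane."""
    assert all(w >= 0 for w in lam.values())
    assert math.isclose(sum(lam.values()), 1.0)
    px = sum(lam[v] * VERTICES[v][0] for v in VERTICES)
    py = sum(lam[v] * VERTICES[v][1] for v in VERTICES)
    return px, py


def distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (ay - py) - (ax - px) * (by - ay)
    return abs(cross) / math.hypot(bx - ax, by - ay)


# The ternary probability vector quoted in the text: (0.6, 0.1, 0.3).
point = barycentric_to_cartesian({"x": 0.6, "y": 0.1, "z": 0.3})
fractions = (
    distance_to_line(point, VERTICES["y"], VERTICES["z"]) / HEIGHT,  # side yz
    distance_to_line(point, VERTICES["x"], VERTICES["z"]) / HEIGHT,  # side xz
    distance_to_line(point, VERTICES["x"], VERTICES["y"]) / HEIGHT,  # side xy
)
print(fractions)  # approximately (0.6, 0.1, 0.3)
```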

We represent the binary choice probabilities in rows four to six of Table 1 as points on the boundary of triangle $xyz$ in Fig. 1. The hollow dot on the left side of the triangle gives the choice probabilities $p_{xy}=0.7$ and $p_{yx}=1-p_{xy}=0.3$. It is the convex combination $p_{xy}x+p_{yx}y$ of vertices $x$ and $y$, and has Barycentric coordinates $(p_{xy},p_{yx},0)$. Similarly, the hollow dot on the right side gives the choice probabilities $p_{yz}$ and $p_{zy}$; and the hollow dot on the base gives $p_{xz}$ and $p_{zx}$. We adopt the convention that binary probabilities are indicated by hollow dots and ternary probability vectors by solid dots, so we can tell the difference between a binary probability and a ternary probability vector that happens to be on the boundary of the triangle.

We distinguish between two different interpretations of an RCS. An individual RCS governs the choices of a single individual; a population RCS, those of a random sample of individuals from a population. Note that if each individual in a population is governed by an individual RCS, which may be degenerate, then the population RCS will be a convex combination of individual RCSs: each population choice distribution PA(⋅), A⊆U, is a mixture – with random sampling from the population being the common mixing distribution across choice sets A – of the various individual PA(⋅).
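The mixture relationship between individual and population RCSs can be sketched in a few lines. The example below is illustrative only: the two individual RCSs, the mixing weights, and the helper names `make_rcs` and `mix` are hypothetical and do not come from Table 1 or from any data set. The point it shows is that the same mixing weights are applied on every choice set.

```python
def make_rcs(pxy, pxz, pyz, ternary):
    """Build an RCS on {x, y, z} from three binary probabilities and a ternary vector."""
    px, py, pz = ternary
    return {
        frozenset("x"): {"x": 1.0},
        frozenset("y"): {"y": 1.0},
        frozenset("z"): {"z": 1.0},
        frozenset("xy"): {"x": pxy, "y": 1 - pxy},
        frozenset("xz"): {"x": pxz, "z": 1 - pxz},
        frozenset("yz"): {"y": pyz, "z": 1 - pyz},
        frozenset("xyz"): {"x": px, "y": py, "z": pz},
    }


def mix(rcss, weights):
    """Population RCS: a convex combination with common weights on every choice set."""
    assert abs(sum(weights) - 1.0) < 1e-12
    return {
        A: {obj: sum(w * r[A][obj] for r, w in zip(rcss, weights)) for obj in rcss[0][A]}
        for A in rcss[0]
    }


# Hypothetical individuals: the first chooses stochastically, the second is
# degenerate and always chooses according to the ranking y > z > x.
individual_1 = make_rcs(0.9, 0.8, 0.4, (0.7, 0.1, 0.2))
individual_2 = make_rcs(0.0, 0.0, 1.0, (0.0, 1.0, 0.0))

# Hypothetical population: 75% behave like individual 1, 25% like individual 2.
population = mix([individual_1, individual_2], [0.75, 0.25])
print(population[frozenset("xyz")])
```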

We will use the term discrete choice model to refer to any model specifying an RCS, directly or indirectly. Examples of indirect specifications include random utility models such as multinomial probit models as well as sequential elimination models such as Elimination by Aspects (EBA) models. In these examples, choice probabilities are not specified directly but are derived from the specification of a parametric random choice process and cannot be expressed as functions of the model parameters in closed form. Often in the literature, the term “discrete choice model” refers to the specification (direct or indirect) of an RCS up to a vector of parameters; that is, the specification of a set of models, with each model specifying an RCS. To make a clear distinction, we will call such a specification a class of discrete choice models. So for example, while many might refer to “the logit model” as “a discrete choice model”, we will refer to logit as a class of models.

Over the last several decades, many conditions on choice probabilities, including various kinds of stochastic transitivity, regularity and simple scalability have been studied (see Fishburn, 1999 for a thorough survey). For a given discrete choice model and a given condition, the model either satisfies the condition or it does not. When we say that a class of discrete choice models satisfies some condition, we mean that every model in the class satisfies that condition.

We will emphasize conditions that are targets of criticism (for being unrealistic) in the context effects literature. Thus, our discussion is organized around two properties: (i) regularity, a property of all random utility models that is inconsistent with the asymmetric dominance effect, and (ii) the constant ratio rule, a property of multinomial logit models that is inconsistent with the similarity and compromise effects. We discuss regularity and related properties in Section 1.3.1 immediately below, then the constant ratio rule and related properties in Section 1.3.2.

Most classes of discrete choice models in wide use in Economics and Marketing (and many, but fewer, in Psychology) satisfy a condition known as random utility. We will define this condition, but we first need to define a random utility model.

Definition 1

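A random utility model (RUM) for a master set $U$ is a probability space $(\Omega,\mathcal{F},\mu)$, where $\Omega$ is a sample space, $\mathcal{F}$ is an event space and $\mu$ is a probability measure on $\mathcal{F}$; and a measurable function $u: U\times\Omega\to\mathbb{R}$, where $\mu$ is non-coincident, meaning that for all distinct $x,y\in U$, $\mu(\{\omega\in\Omega : u(x,\omega)=u(y,\omega)\})=0$.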

We call $u$ a utility function; its maximization over the available options governs choice in state $\omega\in\Omega$. Non-coincidence rules out ties. Henceforth, we will denote a RUM by the utility function $u$, suppressing notation for $(\Omega,\mathcal{F},\mu)$. A RUM $u$ for $U$ induces the RCS $P^{(u)}$ through the construction

$$P^{(u)}_A(x)=\mu(\{\omega\in\Omega : u(x,\omega)>u(y,\omega)\ \text{for all}\ y\in A\setminus\{x\}\}),\quad x\in A\subseteq U.$$

We say that an RCS $P$ satisfies random utility if it can be induced by a random utility model. This is a restrictive condition; Block and Marschak (1960) show that the following set of linear inequalities is necessary for random utility: for all non-empty $A\subseteq U$ and all $x\in A$,

$$\sum_{B:\,A\subseteq B\subseteq U}(-1)^{|B\setminus A|}P_B(x)\ge 0,\tag{1}$$

where $|\cdot|$ indicates the cardinality of a set. Provided $P$ satisfies random utility, the left hand side of the inequality is, for given $x$ and $A$, the probability of $x$ having greater utility than each of the other elements of $A$ and less utility than each of the elements of $U\setminus A$. Falmagne (1978) shows that this set of inequalities is also sufficient. McCausland and Marley (2013) summarize what is known about how random utility relates to other conditions on choice probabilities, and illustrate how strong a condition it is, in terms of its low prior probability as an event in $\Delta(U)$, for a large class of prior distributions on $\Delta(U)$.
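The Block and Marschak inequalities can be evaluated directly from a table of choice probabilities. The sketch below is one way to do so for a three-object master set; the function names are hypothetical, and only the values 0.7 for the x-versus-y probability and (0.6, 0.1, 0.3) for the ternary vector are quoted from the text. The remaining binary probabilities are assumed values chosen for illustration, not the entries of Table 1.

```python
from itertools import combinations


def subsets(items):
    """All subsets (including the empty set) of `items`, as frozensets."""
    items = tuple(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def block_marschak_terms(rcs, universe):
    """Map (A, x) to the alternating sum over all B with A <= B <= U of P_B(x)."""
    U = frozenset(universe)
    terms = {}
    for A in subsets(U):
        if not A:
            continue
        for x in A:
            total = 0.0
            for extra in subsets(U - A):
                B = A | extra
                total += (-1) ** len(extra) * rcs[B][x]
            terms[(A, x)] = total
    return terms


def satisfies_random_utility(rcs, universe, tol=1e-12):
    """True when every Block-Marschak term is non-negative (up to tolerance)."""
    return all(v >= -tol for v in block_marschak_terms(rcs, universe).values())


def make_rcs(pxy, pxz, pyz, ternary):
    """Build an RCS on {x, y, z} from three binary probabilities and a ternary vector."""
    px, py, pz = ternary
    return {
        frozenset("x"): {"x": 1.0},
        frozenset("y"): {"y": 1.0},
        frozenset("z"): {"z": 1.0},
        frozenset("xy"): {"x": pxy, "y": 1 - pxy},
        frozenset("xz"): {"x": pxz, "z": 1 - pxz},
        frozenset("yz"): {"y": pyz, "z": 1 - pyz},
        frozenset("xyz"): {"x": px, "y": py, "z": pz},
    }


# pxz = 0.65 and pyz = 0.3 are assumed values, not taken from Table 1.
consistent = make_rcs(0.7, 0.65, 0.3, (0.6, 0.1, 0.3))
# Lowering pxy below the ternary probability of x breaks the inequality for A = {x, y}.
violating = make_rcs(0.5, 0.65, 0.3, (0.6, 0.1, 0.3))

print(satisfies_random_utility(consistent, "xyz"))  # True
print(satisfies_random_utility(violating, "xyz"))   # False
```

For a master set of three objects the non-trivial terms reduce to differences such as the x-versus-y binary probability minus the ternary probability of x, which is why the second example fails.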

Examples of random utility models include multinomial logit, multinomial probit, mixed multinomial logit, and generalized extreme value models. In these examples and others, the specification of a discrete choice model proceeds by describing a random utility model and then computing – often numerically – the induced choice probabilities. However, there are other discrete choice models that are more naturally specified in some other way, but which satisfy random utility nonetheless; one can check the conditions in (1) without actually constructing a random utility model. For example, Tversky (1972b) shows that the class of Elimination By Aspects (EBA) models, introduced by Tversky (1972a), satisfies random utility, although EBA choice probabilities are usually derived as the result of a random sequential choice process.

Because the term “context effect” is open to interpretation, it will be useful to carefully examine the way in which random utility models – and by extension, their induced choice probabilities – are context invariant. In the definition of a random utility model, the utility distribution does not depend on the choice set offered. This is one kind of context invariance, and it is what gives content to the random utility condition. It is important to understand, however, that this does not imply that a so-called context effect is inconsistent with random utility. The similarity and compromise effects, as defined by Tversky (1972a) and Tversky and Simonson (1993), respectively, are both consistent with random utility, as we show below. In fact, we have just seen that EBA models, introduced by Tversky (1972a) specifically to account for the similarity effect, satisfy random utility.

Other classes of models generate choice probabilities in an explicitly context dependent manner. An example where individual models may or may not satisfy random utility, depending on the universe of choice objects, is the random regret minimization model of Chorus (2010). In simulations, not reported here but using R code available at 10.6084/m9.figshare.21791186, we found regions of a variable object’s attribute space for which choice probabilities satisfy random utility and other regions for which they do not. Davis-Stober, Brown, Park, and Regenwetter (2017) provide a set of models which do or do not satisfy random utility depending upon their parametric specifications.

A weaker condition than random utility is regularity, the condition that adding objects to a choice set cannot increase choice probabilities of the objects already in the set.

Definition 2

An RCS $P$ satisfies regularity if for all $x\in A\subset B\subseteq U$,

$$P_A(x)\ge P_B(x).\tag{2}$$

The regularity condition is relevant because it is both a consequence of random utility and (transparently) inconsistent with the original (i.e. the binary–ternary version, as defined below) asymmetric dominance effect.
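A brute-force regularity check makes the conflict with the asymmetric dominance effect concrete. This is a sketch under stated assumptions: the objects x, y and a decoy d, all the probabilities, and the function name are hypothetical and chosen only so that adding d raises the probability of choosing x.

```python
def violations_of_regularity(rcs, tol=1e-12):
    """List every (x, A, B) with x in A, A a proper subset of B, and P_A(x) < P_B(x)."""
    found = []
    for A, probs_A in rcs.items():
        for B, probs_B in rcs.items():
            if A < B:  # proper-subset test on frozensets
                for x in A:
                    if probs_A[x] < probs_B[x] - tol:
                        found.append((x, A, B))
    return found


# Hypothetical RCS on {x, y, d}, where d is a decoy dominated by x only.
rcs = {
    frozenset("x"): {"x": 1.0},
    frozenset("y"): {"y": 1.0},
    frozenset("d"): {"d": 1.0},
    frozenset("xy"): {"x": 0.5, "y": 0.5},
    frozenset("xd"): {"x": 0.95, "d": 0.05},
    frozenset("yd"): {"y": 0.9, "d": 0.1},
    frozenset("xyd"): {"x": 0.6, "y": 0.38, "d": 0.02},
}

# Adding d to {x, y} raises the probability of x from 0.5 to 0.6.
print(violations_of_regularity(rcs))
```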

Much of the evidence against random utility comes from the literature on the asymmetric dominance effect, where inequalities of the form (2), for particular choice sets A and B, are tested and often rejected. Rieskamp, Busemeyer, and Mellers (2006) survey empirical violations of five “consistency principles” in economics; regularity is the only one of these principles that is necessary for random utility, and the asymmetric dominance effect is the only empirical evidence they document against regularity.

There is also some literature on other kinds of tests, either direct tests of random utility or tests of other necessary conditions. Regenwetter, Dana, and Davis-Stober (2011) test a condition on binary choice probabilities called the triangle inequality, a condition that is itself implied by regularity. Using experimental binary choice data for 18 participants, they find strong evidence against the triangle inequality for one participant (also see Cavagnaro & Davis-Stober, 2014). McCausland and Marley (2014) and McCausland et al. (2020) jointly test the full set of necessary and sufficient conditions for random utility in (1). Using a subset of the data in Regenwetter et al. (2011), McCausland and Marley (2014) find moderate evidence against random utility for the same participant and mild evidence against it for another participant. Using experimental choice data featuring choice from all binary and larger subsets of a master set, McCausland et al. (2020) find strong evidence against random utility for four participants out of 141.

In some cases, it will be convenient to speak of random preference models, which are very similar to random utility models for finite master sets. In a random preference model, a random strict preference (or ranking) replaces the random vector of utility values in a random utility model. A random preference model induces choice probabilities when a decision maker chooses the highest ranked object (rather than the object with the highest utility) from a given choice set. For a finite master set U, the discrete choice models that can be induced by a random preference are the same ones that can be induced by a random utility model (see Marley & Regenwetter, 2018). Analogous to the definition of a random utility model and the construction of an RCS from one, we have the following:

Definition 3

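A random preference model (RPM) on a master set $U$ is a probability space $(\mathcal{R},2^{\mathcal{R}},\pi)$ where $\mathcal{R}$ is the set of strict linear orders on $U$.

We use the notation $\succ\,\in\mathcal{R}$ to denote an outcome, a strict linear order on $U$. Thus, for example, $\pi(\{\succ\,\in\mathcal{R} : x\succ y\})$ is the probability that the fixed object $x\in U$ is ranked above (i.e., is preferred to) the fixed object $y\in U$, $y\neq x$.

A random preference model with random binary relation $\succ$ induces the RCS $P^{(\succ)}$ through the construction $P^{(\succ)}_A(x)=\pi(\{\succ\,\in\mathcal{R} : x\succ y\ \text{for all}\ y\in A\setminus\{x\}\})$, $x\in A\subseteq U$.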
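The construction of an RCS from a random preference can be sketched by enumerating rankings. The code below is a minimal illustration, assuming rankings are stored as tuples with the most preferred object first; the function name and the uniform distribution over rankings are hypothetical choices, not part of the article.

```python
from itertools import combinations, permutations


def rcs_from_random_preference(pi, universe):
    """Induce an RCS from `pi`, a dict mapping rankings (best first) to probabilities."""
    assert abs(sum(pi.values()) - 1.0) < 1e-12
    rcs = {}
    for r in range(1, len(universe) + 1):
        for combo in combinations(universe, r):
            A = frozenset(combo)
            probs = dict.fromkeys(A, 0.0)
            for ranking, weight in pi.items():
                # The DM chooses the object in A ranked highest by this ranking.
                best = min(A, key=ranking.index)
                probs[best] += weight
            rcs[A] = probs
    return rcs


# Hypothetical random preference: all six strict linear orders on {x, y, z}
# are equally likely, so every object in a choice set A has probability 1/|A|.
uniform = {ranking: 1 / 6 for ranking in permutations("xyz")}
rcs = rcs_from_random_preference(uniform, "xyz")
print(rcs[frozenset("xy")])
```

Any RCS built this way satisfies random utility, in line with the equivalence for finite master sets noted above.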

A second pair of conditions targeted by the context effects literature is the constant ratio rule and simple scalability. The constant ratio rule underlies the multinomial logit model, a model that has been widely used since the onset of the context effects literature and is inconsistent with all three context effects, as they are defined in the seminal context effect articles outlined in Section 1.1 above.

Definition 4 (Tversky, 1972a)

$P$ satisfies the constant ratio rule if for all $x,y\in A$, $\frac{p_{xy}}{p_{yx}}=\frac{P_A(x)}{P_A(y)}$, whenever the denominators do not vanish.

The term independence of irrelevant alternatives (or IIA) is commonly used, especially in economics, to mean the same thing, although the term has other meanings: for example, Ray (1973) documents three different conditions with the IIA name and clarifies the relationships among them. To avoid ambiguity or confusion, we will use the term constant ratio rule.

If, in addition to the constant ratio rule, we also require all choice probabilities to be positive, we get the class of multinomial logit models. The constant ratio rule is a consequence of Luce’s choice axiom, introduced in Luce (1959), but because this axiom does not rule out choice probabilities equal to zero, there are models satisfying Luce’s choice axiom that are not multinomial logit models.
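As a concrete illustration, the sketch below builds multinomial logit choice probabilities and checks the constant ratio rule on every choice set. The utility values and function names are hypothetical. The logit form used, in which the probability of choosing an object from A is proportional to the exponential of its utility, is the standard one.

```python
import math
from itertools import combinations, permutations


def logit_rcs(utilities):
    """Multinomial logit RCS: P_A(x) is proportional to exp(u(x)) within each set A."""
    rcs = {}
    objects = tuple(utilities)
    for r in range(1, len(objects) + 1):
        for combo in combinations(objects, r):
            denom = sum(math.exp(utilities[o]) for o in combo)
            rcs[frozenset(combo)] = {o: math.exp(utilities[o]) / denom for o in combo}
    return rcs


def satisfies_constant_ratio_rule(rcs, tol=1e-9):
    """Check p_xy / p_yx == P_A(x) / P_A(y) wherever both denominators are non-zero."""
    for A, probs in rcs.items():
        for x, y in permutations(A, 2):
            pair = rcs[frozenset((x, y))]
            if pair[y] == 0 or probs[y] == 0:
                continue  # the rule is silent when a denominator vanishes
            if abs(pair[x] / pair[y] - probs[x] / probs[y]) > tol:
                return False
    return True


# Hypothetical utilities for three objects.
logit = logit_rcs({"x": 1.0, "y": 0.0, "z": -0.5})
print(satisfies_constant_ratio_rule(logit))  # True

# Keeping the binary probabilities but changing the ternary vector breaks the rule.
perturbed = dict(logit)
perturbed[frozenset("xyz")] = {"x": 0.6, "y": 0.1, "z": 0.3}
print(satisfies_constant_ratio_rule(perturbed))  # False
```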

Many theoretical and empirical problems with the constant ratio rule were raised before the seminal papers in the context effect literature. Some of the theoretical arguments and empirical evidence against the constant ratio rule also apply to a weaker condition, called simple scalability.

Definition 5 (Krantz, 1964)

$P$ satisfies simple scalability if there exists a function $u:U\to\mathbb{R}$ and functions $F_2:\mathbb{R}^2\to\mathbb{R},F_3:\mathbb{R}^3\to\mathbb{R},\ldots,F_{|U|}:\mathbb{R}^{|U|}\to\mathbb{R}$ such that for all $A=\{x,y,\ldots,z\}\subseteq U$, $P_A(x)=F_{|A|}(u(x),u(y),\ldots,u(z))$, where each $F_i$ is increasing in its first argument, and strictly so if $P_A(x)<1$; and decreasing in its other arguments, and strictly so if $P_A(x)>0$.

Tversky (1972a) shows that simple scalability is equivalent to the following condition:

Definition 6 (Tversky, 1972b)

$P$ satisfies order independence if for all $A,B\subseteq U$, $x,y\in A-B$ and $z\in B$, $P_A(x)\ge P_A(y)\iff P_{B\cup\{x\}}(z)\le P_{B\cup\{y\}}(z)$, provided the choice probabilities on the two sides of either inequality are not both 0 or 1.

The order independence condition can be seen as an ordinal version of the constant ratio rule. Since it is a condition on choice probabilities, it allows for statistical testing of simple scalability and the exploration of the logical relationship between simple scalability and various context effects.

One of the first theoretical counterexamples to simple scalability was provided by Debreu (1960) in a review of Luce (1959). Another well-known counterexample is the Bicycle/Pony example provided in Luce and Suppes (1965), attributed there to L.J. Savage. The Red bus/Blue bus example of McFadden (1974) appears shortly after Tversky (1972a), but it is a particularly well known one.

Fig. 2 illustrates how regularity, the constant ratio rule, and simple scalability constrain random choice structures. Again, we consider RCSs on the set U={x,y,z}. We use three Barycentric coordinate systems to plot choice probability vectors and sets of them. In all three, we fix the binary probabilities pxz=0.45 (point A) and pyz=0.35 (point B). To summarize, the top left plot shows the values of PU(⋅) that are consistent with the fixed pxz and pyz, under regularity; the top right plot shows the values of pxy and PU(⋅) that are consistent with the fixed pxz and pyz under each of the constant ratio rule and simple scalability. The bottom plot illustrates all the relevant regions in a single plot, illustrating the logical independence of regularity and simple scalability.

We now explain the figure in more detail. First, suppose that regularity holds. Then pxz is an upper bound for PU(x), and its complementary probability pzx=1−pxz is an upper bound for PU(z). The set of ternary probability vectors PU(⋅) satisfying PU(x)≤pxz and PU(z)≤pzx is the parallelogram A′AA″y in the top left and bottom plots. Likewise, pyz and pzy give upper bounds on PU
