How do people build up visual memory representations from sensory evidence? Revisiting two classic models of choice

We make choices in virtually every real-world and laboratory task. For example, we decide which cereal we prefer in a supermarket, which color a word is in a Stroop task, or which item is ‘old’ in a forced-choice memory study. Because decision processes are ubiquitous, there is great value in determining the type of quantitative model that best captures them. To this end, we examine the generalizability of two prominent probabilistic models of choice. The first is a Gaussian signal detection model, which is based on classic Signal Detection Theory (e.g., Wixted, 2020) and Thurstone’s law of comparative judgment (Thurstone, 1927). The second is the normalized exponential model, commonly known as the softmax function (e.g., Bridle, 1990), which is based on Luce’s Choice Axiom (LCA) (Luce, 1959) and the ratio of strengths formula (Bradley & Terry, 1952) (for an extensive taxonomy of these models see: Townsend & Landon, 1983).

In the current work, we focus on how these two models generalize across different decision-based visual memory tasks in order to better understand the types of computations people use to convert sensory evidence into memory representations and make memory-based decisions. Focusing on the generalizability of these models is key because it allows us to better isolate latent variables of interest (e.g., Navarro, 2021). For illustration, consider a standard forced-choice task in which you are shown an object that you have to remember. Subsequently, when you are tested on your memory, you are shown that object along with one or seven foil objects, where foils refer to objects you were never actually shown. In this simple forced-choice task, as more foil items are added you will tend to become less accurate at choosing the object you saw, because the probability of incorrectly choosing a foil will tend to increase when more foils are present (Wickens, 2001). Importantly, if memory conditions are held constant across these decision tasks, the fidelity of your memory for the object you saw should also remain unchanged, regardless of how many foils you are shown (Swets, 1959). Thus, a key question is which decision model best allows us to assess people’s memory strength independently of the decision task we use to test it. In other words, which decision model’s parameters are invariant and generalize best across variations in task structure that affect the decision process but not memory fidelity?

We focus on Signal Detection Theory and Luce’s Choice Axiom because they are prominent in different domains, such as decision-making and memory research, but within some domains, there are relatively few comparisons between them. Furthermore, early work that examines the connections between these models (for a recent review see: Pleskac, 2015) has yet to be linked to contemporary research questions. We illustrate these points in the context of recent computational modeling research on visual memory.

Our article has the following structure. First, we overview each of the theories and their corresponding models. Second, we outline how early work on the relationship between models based on SDT and LCA applies to contemporary research on visual memory and describe a critical test that we used to discriminate between them. Finally, we discuss our findings and their relevance for theorizing about decision processes within and outside of visual memory tasks.

The application of SDT to the study of sensory and cognitive processes comes from the tradition of perceptual psychophysics (e.g., Green, Swets, et al., 1966), which highlights the relationship between the sensory signals that must be used to make a decision and the physical and neural noise that perturbs them before a decision is made (Wickens, 2001). Over the years, SDT has been used in other domains, such as memory research (e.g., Wixted, 2007), to provide a detailed description of decision processes in detection and discrimination tasks by postulating latent memory-strength signals that are perturbed by noise. The two core assumptions of signal detection models are that, when faced with making a decision, the conceivably rich and multi-dimensional representation of each alternative is collapsed down into a scalar value – the decision variable – and that the decision variable invoked by a particular alternative is probabilistic. Jointly, these assumptions capture the mainstream view that there are internal and external sources of noise that corrupt sensory and memory signals (e.g., Dosher & Lu, 1998). For instance, in the memory domain, a familiar object, such as a backpack, will produce a decision variable of some magnitude with respect to some task, such as a familiarity signal for a recognition task. The decision variable produced by observing a backpack will vary from one instance to another due to variation in external circumstances, such as its lighting and vantage point, as well as fluctuations of internal states, such as memory, attention and motivation.

Because decision variables in this view are seen as random variables, it is common to postulate a specific probability distribution over them (although see: Kellen, Winiger, Dunn, & Singmann, 2021). While in some low-level perceptual domains, great care has been taken to characterize the functional form of this distribution, and thus the form of the psychometric function (e.g., Green et al., 1966), in most applications such fidelity is unattainable and researchers simply assume that decision variables are normally distributed. Thus, historically, the normality assumption common in SDT is made primarily for convenience (Wickens, 2001). Furthermore, in contemporary modeling work it is often treated as an auxiliary assumption that does not have a theoretical justification (Kellen et al., 2021; Rouder et al., 2010). To preview our analysis and results, we show that the Gaussian parameterization of signal detection models is not merely ancillary. Instead, its use can have a principled theoretical basis that formalizes how sensory signals are converted to decision variables. We discuss this point in depth when reviewing the mathematical link between the Gaussian signal detection and softmax model.

Finally, most mainstream signal detection models postulate that, while decision variables are probabilistic, the decision making process is deterministic (for exceptions see, e.g., Benjamin, Diaz, & Wee, 2009). That is, once decision variables are sampled from their probability distributions, choices are made deterministically by comparing the decision variables to one another, or to a fixed decision criterion. Next we describe how these principles are used to explain performance in mainstream detection and discrimination tasks.

In detection tasks the observer responds by indicating the presence or absence of a target stimulus. The classic Gaussian signal detection model posits that this decision is made by collapsing the rich stimulus representation down into a single decision variable and then comparing this decision variable X against a fixed decision threshold C. Accordingly, the probability of responding that a target is “Present” on target present and absent trials is given by Eqs. (1), (2), respectively:

P('Present' ∣ Present) = P(X_T > C),  (1)

P('Present' ∣ Absent) = P(X_F > C).  (2)

In Eq. (1), X_T denotes the decision variable elicited by the target stimulus, which is a random variable sampled from a normal distribution with free parameters, mean μ > 0 and variance σ²: X_T ∼ N(μ, σ²). A common assumption is that, on average, decision variables on target present trials will be of greater magnitude than on target absent trials, and it follows that their mean will also be greater. Therefore, with no loss of generality, the mean and variance of the decision variable elicited by foil items on target absent trials, X_F in Eq. (2), are set to 0 and 1, respectively: X_F ∼ N(0, 1).
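To make these two equations concrete, the hit and false-alarm probabilities can be computed directly from the normal cumulative distribution function. The following Python sketch is purely illustrative (the function names are ours, not part of any standard model-fitting package):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def detection_probs(mu, sigma, c):
    """Hit and false-alarm rates under the Gaussian SDT model:
    P('Present' | Present) = P(X_T > C) with X_T ~ N(mu, sigma^2),
    P('Present' | Absent)  = P(X_F > C) with X_F ~ N(0, 1)."""
    hit = 1.0 - normal_cdf(c, mu, sigma)
    false_alarm = 1.0 - normal_cdf(c, 0.0, 1.0)
    return hit, false_alarm
```

For example, with μ = 1, σ = 1 and C = 0.5, the model predicts a hit rate of about .69 and a false-alarm rate of about .31.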

Unlike in detection tasks, in forced-choice discrimination tasks the target is always shown and an observer must select it out of a set of n alternatives. Classic signal detection models postulate that this selection process involves computing the maximum of a set of n independent random variables corresponding to the decision variables invoked by each of the stimuli, X_i. More precisely, the probability of identifying a given item i as the target is the probability that the magnitude of the decision variable it generates, X_i, exceeds the decision variables generated by each of the n − 1 remaining items, X_j for j ≠ i:

P(ID(i)) = P(∀ j ≠ i : X_i > X_j).

This general expression can be written out for the special cases of correct choices, when X_i corresponds to the target (i = 1), and incorrect choices, when i ≠ 1. For correct choices, or Target Identifications, X_i is the target (X_i = X_1 = X_T) and all X_j are foils, thus X_i ∼ N(μ, σ²) and X_j ∼ N(0, 1). For incorrect choices, or Foil Identifications, the target is one of the X_j while X_i and the remaining X_j are foils. For both of these special cases, we can rewrite the general expression:

X_1 = X_T ∼ N(μ, σ²),  X_2…n ∼ iid N(0, 1),

P(ID(Target)) = P(X_1 > max(X_2…n)),

P(ID(Foil)) = ∑_{i=2}^{n} P(X_i > max(X_{1…n ∖ i})).
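The max rule above is easy to verify with a small Monte Carlo simulation. The sketch below is our own illustration (not fitting code), assuming a target decision variable drawn from N(μ, σ²) competing against n − 1 standard-normal foil draws:

```python
import random

def mafc_sdt_sim(mu, sigma, n, trials=200_000, seed=0):
    """Monte Carlo estimate of P(ID(Target)) under the Gaussian SDT
    max rule: the target DV ~ N(mu, sigma^2) must exceed the maximum
    of n - 1 foil DVs, each ~ N(0, 1)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        target = rng.gauss(mu, sigma)
        best_foil = max(rng.gauss(0.0, 1.0) for _ in range(n - 1))
        if target > best_foil:
            correct += 1
    return correct / trials
```

With μ = 0 the target is indistinguishable from the foils and accuracy falls to chance (1/n); for a fixed μ > 0, accuracy declines as more foils are added, as described earlier.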

Luce’s Choice Axiom (LCA) comes from the decision theory tradition, rather than psychophysics, and is predated by the ratio of strengths formula for pairwise choices (Bradley & Terry, 1952) (for empirical tests and extended discussion of these models see: Townsend and Ashby, 1982, Townsend and Landon, 1983). Unlike SDT, the LCA framework is silent about the mechanisms of detection and discrimination processes. Instead, it consists of a set of axioms that impose “plausible constraints” on choice probabilities.

The central axiom is called Independence from Irrelevant Alternatives and states that the probability of choosing one alternative over another should not change if irrelevant alternatives are added or taken away. Under this view, response probabilities for each alternative are computed by dividing each response strength by the sum of all response strengths in the set. For instance, if a is one alternative in a set S, the probability of choosing a out of S is

P(a, S) = ϕ(a) / ∑_{z∈S} ϕ(z),

where ϕ is a response strength function. Note that independence from irrelevant alternatives follows directly from this formula because the odds of choosing a over a different alternative b ∈ S remain the same, even if we consider a larger set of alternatives T where S ⊆ T. That is, for 0 < P(x) < 1,

P(a, S) / P(b, S) = P(a, T) / P(b, T) = ϕ(a) / ϕ(b).

Eq. (8) also implies that the function ϕ lies on a ratio scale. That is, assume there exists another function ϕ′ that satisfies the equality

ϕ(a) / ϕ(b) = ϕ′(a) / ϕ′(b).

Substituting 1 for ϕ(b) and τ > 0 for ϕ′(b) yields τϕ(a) = ϕ′(a), showing that the scale ϕ is unique up to multiplication by a positive constant (proof adapted from: Krantz, Luce, Suppes, & Tversky, 1971). This entails that the response function ϕ lies on a ratio scale, an important and rare property of psychological metrics (Falmagne & Doble, 2015).
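Both properties – independence from irrelevant alternatives and invariance up to a positive multiplicative constant – are easy to check numerically. In the illustrative Python sketch below (our own, with arbitrary response strengths), choice probabilities are unchanged when every strength is multiplied by a positive constant, and the odds between two alternatives do not depend on the rest of the set:

```python
def choice_prob(strengths, i):
    """Luce choice rule: P(i, S) = phi(i) / sum over z in S of phi(z)."""
    return strengths[i] / sum(strengths)

phi = [2.0, 1.0, 0.5]            # arbitrary response strengths
scaled = [3.7 * s for s in phi]  # same strengths, rescaled by tau = 3.7
```

Here choice_prob(phi, i) equals choice_prob(scaled, i) for every i, and the odds of the first over the second alternative are ϕ(a)/ϕ(b) = 2 whether computed in the two-item subset or the full set.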

Finally, note that in order for choice probabilities in Eq. (8) to be restricted between zero and one, response strengths should be constrained to be non-negative. One way to impose this constraint is to parameterize the Luce choice model with an exponential function, such that

P(a, S) = e^ϕ(a) / ∑_{z∈S} e^ϕ(z).

This formulation of LCA is equivalent to the exponential form of the multinomial distribution and the softmax function (Bridle, 1990), which is routinely used in econometrics (McFadden, 1980), machine learning (Murphy, 2012) and reinforcement learning (Sutton & Barto, 2018).
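The softmax form can be written in a few lines of Python; this sketch is illustrative, and the max-subtraction step is a standard numerical-stability trick that leaves the resulting probabilities unchanged:

```python
from math import exp

def softmax(strengths):
    """Exponential Luce rule: P(a, S) = e^phi(a) / sum_z e^phi(z).
    Subtracting the maximum strength before exponentiating avoids
    overflow without changing the resulting probabilities."""
    m = max(strengths)
    exps = [exp(s - m) for s in strengths]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the exponential form inherits the Luce ratio property, the odds between any two alternatives (here, e^1 for strengths 1 and 0) are unchanged when further alternatives are added to the set.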

Through the lens of LCA, performance in detection and discrimination tasks is not determined by random decision variables but by fixed response strengths. In detection tasks, assume that β denotes the response strength generated by the target stimulus and V denotes a bias parameter for reporting that the stimulus is absent. Then, on target present trials, the probability of correctly responding target present is

P('Present' ∣ Present) = e^β / (e^β + e^V).  (12)

On target absent trials, the probability of incorrectly responding target present is determined by the response strength generated by the foil, which is zero. Thus, the probability of incorrectly responding target present on target absent trials is

P('Present' ∣ Absent) = e^0 / (e^0 + e^V) = 1 / (1 + e^V).  (13)

Note that the formulas for choice probabilities in Eqs. (12), (13) are formally equivalent to a logistic cumulative distribution (Suppes & Krantz, 2007), a special case of the softmax function for binary choices.

Extending this logic to discrimination tasks with n alternatives uses the standard assumption that the response strengths generated by the target and the n − 1 foils are equal to β and zero, respectively. Accordingly, the probability of correctly selecting the target is

P(ID(Target)) = e^β / (e^β + n − 1),

and the probability of incorrectly selecting a foil item is

P(ID(Foil)) = (n − 1) / (e^β + n − 1).
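These closed-form expressions are straightforward to compute. The following sketch (ours, illustrative) returns both probabilities for a given target strength β and set size n:

```python
from math import exp

def luce_mafc(beta, n):
    """Luce/softmax choice probabilities for an n-alternative task with
    target strength beta and n - 1 foils of strength zero:
    P(ID(Target)) = e^beta / (e^beta + n - 1),
    P(ID(Foil))   = (n - 1) / (e^beta + n - 1), summed over all foils."""
    denom = exp(beta) + (n - 1)
    return exp(beta) / denom, (n - 1) / denom
```

Note that at β = 0 the target is indistinguishable from the foils and P(ID(Target)) reduces to chance, 1/n, while increasing β raises accuracy for any fixed n.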

Due to their distinct origins and mathematical instantiations, models based on SDT and LCA may seem extremely different from one another. However, the Gaussian signal detection and softmax models turn out to be close approximations of one another in some tasks. More precisely, in detection tasks the connection between these models follows from the fact that the logistic distribution approximates the normal distribution (Treisman & Faulkner, 1985). This entails that the LCA for binary choices is essentially equivalent to a signal detection model with a logistic parameterization, which closely approximates the Gaussian signal detection model. Thus, in detection tasks LCA and Gaussian signal detection models are closely related.
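The closeness of this approximation is easy to check numerically. The sketch below (illustrative) uses the classic 1.702 slope constant from the psychometrics literature, under which the logistic CDF deviates from the standard normal CDF by less than .01 everywhere:

```python
from math import erf, exp, sqrt

def normal_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def logistic_cdf(x, slope=1.702):
    """Logistic CDF with the slope constant that best matches
    the standard normal CDF."""
    return 1.0 / (1.0 + exp(-slope * x))

# Maximum absolute deviation over a fine grid from -5 to 5
max_dev = max(abs(normal_cdf(x / 100.0) - logistic_cdf(x / 100.0))
              for x in range(-500, 501))
```

The resulting maximum deviation is below .01, which is why logistic and Gaussian parameterizations are nearly indistinguishable in binary detection tasks.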

In discrimination tasks with more than two alternatives the Gaussian signal detection and softmax model no longer approximate each other. The relationship between these two models breaks down in m-afc tasks (where m > 2) because the maximum of normally distributed variables is not itself normally distributed. However, it is possible to establish an equivalence between the two models by dropping the normality assumption in the signal detection model. Holman and Marley (1974) as well as Yellott (1977) showed that, if decision variables in the signal detection model have a Type 1 extreme value Gumbel distribution for the maximum (Gumbel, 1954), then the signal detection model is mathematically equivalent to the Luce model for any number of alternatives (m) in an m-afc task. We provide our own proof of this result in the Appendix.
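This equivalence can be verified by simulation. In the illustrative sketch below (ours), Gumbel-distributed decision variables are generated via the inverse-CDF transform −log(−log(U)), and the resulting identification rate matches the softmax prediction e^β / (e^β + n − 1):

```python
import math
import random

def gumbel_sdt_sim(beta, n, trials=200_000, seed=1):
    """Monte Carlo P(ID(Target)) for a signal detection model whose
    decision variables carry Gumbel (Type 1 extreme value) noise:
    target DV = beta + G, each foil DV = G, with G ~ Gumbel(0, 1)."""
    rng = random.Random(seed)

    def gumbel():
        # Inverse-CDF sample from a standard Gumbel distribution
        return -math.log(-math.log(rng.random()))

    correct = sum(
        beta + gumbel() > max(gumbel() for _ in range(n - 1))
        for _ in range(trials)
    )
    return correct / trials
```

For example, with β = 1 and n = 4 the simulated accuracy converges on e / (e + 3), exactly as the Holman–Marley/Yellott result requires, and unlike the Gaussian case this holds for every n.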

In the current context, the major implication of this result is that comparing the softmax model to the Gaussian signal detection model can be recast as a comparison of two different parameterizations of the signal detection model, that is, a signal detection model with a Gumbel versus a Gaussian parameterization. As we discuss next, these two parameterizations have an important conceptual basis because they describe different ways of translating sensory evidence into decision variables.

A common assumption is that the Gaussian parameterization of signal detection models is made for mathematical convenience and does not have a theoretical basis (e.g., Kellen et al., 2021). However, early work by Thompson and Singh (1967) provides one principled justification for using a normal distribution to model decision variables. These researchers noted that each time we observe a stimulus, it produces a sensory response of some variable magnitude. For instance, through the lens of contemporary population coding neural models, these sensory responses can be conceived of as distributed patterns of activation in populations of neurons (e.g., Averbeck, Latham, & Pouget, 2006).

If these many sensory signals (e.g., patterns of activation across a population) are pooled together by summing or averaging to compute decision variables, then, in accordance with the Central Limit Theorem, the decision variables will be approximately normally distributed. In contrast to the Gaussian, the Gumbel distribution is an extreme value distribution used to model the maximum of a set of random variables (Gumbel, 1954). Thus, a signal detection model with a Gumbel parameterization is most consistent with the view that, rather than pooling, the observer takes the maximum of the sensory signals to compute decision variables. Fig. 1 depicts these predictions by showing how a stimulus produces a neural response profile that consists of a set of tuning functions (colored distributions), and how these neural responses can be converted to a single decision variable through the lens of each model.
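The two conversion rules leave different statistical signatures on the decision variable. In the illustrative simulation below (our own; the exponential “firing rates” are an arbitrary stand-in for non-negative sensory signals, not a claim about neural data), pooling by averaging yields a roughly symmetric, bell-shaped decision variable, whereas taking the maximum yields a markedly right-skewed, Gumbel-like one:

```python
import random

def sample_skewness(xs):
    """Standardized third central moment of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    third = sum((x - mean) ** 3 for x in xs) / n
    return third / var ** 1.5

rng = random.Random(7)
pooled, maxed = [], []
for _ in range(20_000):
    # 50 non-negative sensory signals per stimulus presentation
    signals = [rng.expovariate(1.0) for _ in range(50)]
    pooled.append(sum(signals) / len(signals))  # CLT: approximately normal
    maxed.append(max(signals))                  # EVT: approximately Gumbel
```

Comparing the sample skewness of the two decision-variable distributions makes the contrast concrete: pooling washes out the skew of the underlying signals, while the max rule preserves and amplifies it.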

Together, a test between these models can be recast as a test of two different signal detection models. To further motivate the comparison of these two models, we underscore that there is extensive support for signal detection theory as a general theory in the memory domain (for a recent overview see: Wixted, 2020) using diverse methods, including Receiver Operating Characteristic analysis (e.g., Robinson et al., 2020, Williams et al., 2022, Wixted, 2007) and a novel critical test that rests on minimal assumptions (Winiger, Singmann, & Kellen, 2021). While some authors reported evidence for alternative models under some conditions (e.g., Balakrishnan, 1999, Rouder et al., 2008), follow-up work suggests that these results were spuriously driven by either restricted model assumptions, or non-diagnostic data and inadequate metrics of model fit (Mueller and Weidemann, 2008, Robinson et al., 2022). Moreover, recent modeling work in the visual memory domain indicates that a signal detection model constrained by psychophysical scaling methods outperforms all extant alternative models of visual memory both in fit and generalization (Schurgin, Wixted, & Brady, 2020). Thus, classic and contemporary modeling work demonstrates robust evidence for signal detection models of memory. Our work builds on this literature by highlighting that the parametric assumptions of signal detection models are not merely ancillary, but can have different implications for how we think observers convert rich sensory or memory evidence to decision variables when making memory-based decisions.

We compared the Gaussian signal detection and softmax model by examining which model’s parameters (d′ in SDT; β in LCA/softmax) are invariant across variations in the number of alternatives presented at test in an m-afc task. Our test rests on the assumption that, everything else being equal, the way in which observers compute decision variables should be invariant across changes in m-afc. This assumption aligns with the broader view that model parameters that generalize across task structures may also provide better approximations of latent cognitive processes (Busemeyer & Wang, 2000).

We note that a similar test was used in an auditory memory task in an early study by Treisman and Faulkner (1985). These authors reported evidence for the Gaussian signal detection model; however, their results were somewhat ambiguous. Mainly, they found that increasing m-afc produced decreases in d′ and increases in β, the key parameters of the Gaussian signal detection and softmax model, respectively. The researchers interpreted this as evidence for the Gaussian signal detection model because they reasoned that increasing the number of alternatives in the auditory task may increase memory load and hurt performance, but not improve it. However, while the finding that d′ decreases with m may be more psychologically plausible, it does not demonstrate that the parameters of this model are invariant with m, because m is confounded with memory load. Furthermore, this study only used data from 6 participants and may have been underpowered. As we discuss next, one of our goals was to address both of these methodological limitations and perform a more direct test of the two theories.

We ran a new set of experiments that extends the critical test of Treisman and Faulkner (1985) to the visual memory domain. The first reason we used a visual memory task is that it allows us to present all m-afc alternatives visually, instead of having participants maintain them in working memory. Accordingly, this study design minimizes differences in memory load across m-afc tasks, addressing the core limitation of the Treisman and Faulkner experiment and providing a strong testbed of parameter invariance. We also increased the number of participants in our experiments to ensure that our studies are sufficiently powered.

Another motivation for extending this test to the visual memory domain is that a comparison between these models has direct relevance for contemporary models of visual memory. That is, both the Gaussian signal detection and softmax models have been used in recent modeling work as response functions that capture how people make decisions in m-afc visual memory tasks (Oberauer and Lin, 2017, Schurgin et al., 2020). However, these models have not been empirically compared with critical tests, and the processing implications for understanding how people compute decision variables in visual memory tasks have not been discussed. Finally, there is much recent interest in instantiating human visual memory models using neural network architectures (e.g., Bates et al., 2023, Brady and Störmer, 2020, Hedayati et al., 2022) that routinely use the softmax as a response function (Murphy, 2012); it remains unclear whether this provides the best approximation of how humans map latent states to memory judgments. Our goal is to fill these gaps by comparing these models in a set of visual memory experiments. To this end, we ran two experiments in which we varied the structure of the stimulus space, the dimensionality of the stimuli and the presentation format to ensure that our results were robust across different processing domains and theoretical assumptions.
