A statistical foundation for derived attention

Selective attention is a central construct in learning theory. Many experimental results can be explained by supposing that organisms attend to some cues and ignore others, and that attention changes as a result of experience. For decades, researchers have attempted to model this interplay between selective attention and memory formation mathematically. One prominent class of models assumes that organisms form direct associations between cues and predicted outcomes (e.g. food, shock, category labels); selective attention acts to re-scale cues and/or control learning rates (e.g. Esber and Haselgrove, 2011, Frey and Sears, 1978, Kruschke, 2001). In this paper, we shall focus on a sub-class of such models based on the principle of derived attention.

Derived attention theories assume that the attention paid to a cue is proportional to the magnitude of its association weights. A cue with large association weights (whether positive or negative) will be strongly attended, while one with small association weights will be ignored: attention is thus derived from existing associations. Derived attention models have been proposed in several forms (Esber and Haselgrove, 2011, Frey and Sears, 1978, Le Pelley et al., 2016), and a review article by Le Pelley et al. (2016) has brought the theory into particular prominence.
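To make the principle concrete, here is a minimal formalization in our own notation (the exact functional form varies across the models cited above). Let $w_{ik}$ denote the associative weight from cue $i$ to outcome $k$. Derived attention makes the salience $\alpha_i$ of cue $i$ an increasing function of the cue's summed absolute associative strength:

$$\alpha_i = f\!\left(\sum_k \lvert w_{ik} \rvert\right), \qquad f \text{ monotonically increasing},$$

so cues with strong associations of either sign command attention, while cues whose weights remain near zero are effectively ignored.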

Le Pelley et al. (2016) show how derived attention can explain many important learning results, despite its simplicity relative to other attentional learning rules (e.g. Kruschke, 2001). For example, consider the learned predictiveness effect (Le Pelley and McLaren, 2003, Lochmann and Wills, 2003). During initial training, certain cues (A, B, C, and D) are correlated with category labels, while others (V, W, X, and Y) are not. In a later transfer stage, people pay more attention to the previously relevant cues than to the previously irrelevant ones, even though all cues perfectly predict the new categories (see Fig. 2(a)). Derived attention explains this result by noting that the predictive cues develop larger associations in the first stage and hence attract greater attention during the second. Similar reasoning explains why people pay a great deal of attention to cues associated with large monetary rewards (Anderson et al., 2011, Le Pelley et al., 2013, see Fig. 4(a)) and little attention to redundant (blocked) cues (Beesley and Le Pelley, 2011, Kruschke and Blair, 2000, see Fig. 3(a)). However, not all attentional phenomena are explained by derived attention (Medin and Edelson, 1988, Swan and Pearce, 1988).
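The mechanism is easy to simulate. The sketch below is our illustrative code, not any published implementation: it trains a Rescorla-Wagner-style learner on a simplified two-pair version of the learned predictiveness design, with cue input re-scaled by a salience term derived from summed absolute weight. The learning rate and the salience function are assumed values chosen for illustration.

```python
import numpy as np

# Illustrative derived-attention simulation (assumed parameters, not the
# authors' code). Stage-1 design, simplified from Le Pelley & McLaren
# (2003): A and B predict the category; V and W are irrelevant.
cues = ["A", "B", "V", "W"]
W = np.zeros((4, 2))            # associative weights: cue x category
lr = 0.1                        # assumed learning rate

# Trial types: (cue vector over [A, B, V, W], one-hot category label)
trials = [
    (np.array([1., 0., 1., 0.]), np.array([1., 0.])),  # A.V -> category 1
    (np.array([1., 0., 0., 1.]), np.array([1., 0.])),  # A.W -> category 1
    (np.array([0., 1., 1., 0.]), np.array([0., 1.])),  # B.V -> category 2
    (np.array([0., 1., 0., 1.]), np.array([0., 1.])),  # B.W -> category 2
]

rng = np.random.default_rng(0)
for _ in range(100):                                   # passes through the design
    for i in rng.permutation(len(trials)):
        x, y = trials[i]
        salience = 0.5 + np.abs(W).sum(axis=1)         # derived attention
        x_eff = x * salience                           # attention re-scales cues
        W += lr * np.outer(x_eff, y - x_eff @ W)       # delta-rule update

# Predictive cues end with much larger summed |weight|, and therefore
# greater derived attention, than the irrelevant cues.
print({c: round(s, 2) for c, s in zip(cues, np.abs(W).sum(axis=1))})
```

In this sketch the predictive cues (A, B) end up with substantially larger summed absolute weight than the irrelevant cues (V, W), and the attentional feedback loop amplifies the gap: larger weights mean more attention, which in turn means faster subsequent learning about those cues.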

In this paper we offer a normative foundation for derived attention by reformulating it in terms of Bayesian inference, and show how this significantly expands the scope of the theory. The new model is based on an insight of Le Pelley et al. (2016): “The idea that attention toward a cue increases to the extent that it predicts a high-value outcome – attention is determined by associative strength – is very intuitive, and is consistent with the idea that attention goes to cues that are known to be significant” (p. 1129). We develop this insight into a probabilistic generative model of the organism’s environment, and then derive an online variational Bayesian regression algorithm. The resulting algorithm resembles previous derived attention models and explains the same experimental results, including learned predictiveness (Lochmann and Wills, 2003), inattention after blocking (Beesley and Le Pelley, 2011, Kruschke and Blair, 2000), and value-based attention (Anderson et al., 2011, Le Pelley et al., 2013).
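Although the model is developed in full later in the paper, a hedged sketch of the kind of generative model involved may help orient the reader (our reconstruction; the authors' specification may differ in detail). Outcomes are generated as a noisy linear function of the cues, with a Gaussian prior over the association weights:

$$y_t \mid \mathbf{x}_t, \mathbf{w} \sim \mathcal{N}\!\left(\mathbf{w}^{\top}\mathbf{x}_t,\; \sigma^2\right), \qquad \mathbf{w} \sim \mathcal{N}\!\left(\mathbf{0},\; \sigma_0^2 I\right).$$

The learner maintains an approximate (variational) posterior over $\mathbf{w}$ and updates it online after each trial; attention to a cue can then be read off from posterior statistics such as the magnitude of its mean weight.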

The new Bayesian derived attention model can also explain retrospective revaluation effects (which are characterized by learning about absent cues), a class of phenomena that Le Pelley et al.’s (2016) derived attention model cannot handle. Backward blocking is one example of retrospective revaluation (Kruschke and Blair, 2000, Shanks, 1985, see Fig. 5(a)). In a backward blocking task, participants receive paired cue training (A.X → I, B.Y → II) followed by further training with only one cue from each pair (A → I, B → II). This continued single-cue training weakens the associative strength of the dropped cues (X and Y) even though they are absent from it. Le Pelley et al.’s (2016) model cannot explain this or other retrospective revaluation effects because its learning rule (based on Rescorla and Wagner, 1972) only updates the value of cues that are present during a trial. The new Bayesian derived attention model, in contrast, produces these effects through an explaining away mechanism: if further training shows that A and B are sufficient to explain the outcomes, then X and Y’s weights decrease toward zero (Dayan and Kakade, 2001). Moreover, casting derived attention into a Bayesian framework produces a novel prediction that goes beyond both Le Pelley et al.’s (2016) version of derived attention and Dayan and Kakade’s (2001) Bayesian regression model: cues suffer a loss of attention after being subject to backward blocking.
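The explaining away mechanism can be illustrated with an ordinary Kalman-filter (exact Bayesian linear regression) learner in the style of Dayan and Kakade (2001). This is a hedged stand-in for the variational model developed below; the prior and noise variances are assumed values chosen for illustration.

```python
import numpy as np

# Backward blocking via explaining away, using a Kalman-filter learner
# in the style of Dayan & Kakade (2001). Illustrative stand-in only.
def kalman_update(mu, S, x, y, obs_var=0.5):
    """One Bayesian regression update on trial (x, y)."""
    k = S @ x / (x @ S @ x + obs_var)   # Kalman gain
    mu = mu + k * (y - x @ mu)          # mean moves with prediction error
    S = S - np.outer(k, x @ S)          # posterior covariance shrinks
    return mu, S

mu = np.zeros(2)     # posterior mean over weights [w_A, w_X]
S = np.eye(2)        # prior: independent, unit-variance weights

# Stage 1: compound trials A.X -> outcome. Credit is split, and the
# posterior acquires NEGATIVE covariance between w_A and w_X.
for _ in range(30):
    mu, S = kalman_update(mu, S, np.array([1., 1.]), 1.0)
print("after A.X trials:", mu.round(2))   # approximately [0.5, 0.5]

# Stage 2: A alone -> outcome. The positive prediction error raises w_A,
# and the negative covariance drags w_X down, though X is never shown.
for _ in range(30):
    mu, S = kalman_update(mu, S, np.array([1., 0.]), 1.0)
print("after A trials:  ", mu.round(2))   # w_A near 1, w_X near 0
```

Under derived attention, the fall in the magnitude of X's weight immediately entails a fall in attention to X, which is the novel prediction noted above.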
