Modelling constituent order despite symmetric associations in memory

Memory for associations forms the cognitive basis for a large portion of behaviour (Lashley, 1951; Murdock, 1974). In many cases, such as remembering face-name relationships at a dinner party, or that colourful snakes are poisonous, it is sufficient to remember that stimuli are associated with each other. But sometimes it is important to remember an association along with its constituent-order (AB versus BA). Indeed, many examples of order-sensitive associations exist in language, such as modifier-head relationships in compound words: PAN CAKE versus CAKE PAN, or HOUSE GUEST versus GUEST HOUSE (Caplan, Boulton, & Gagné, 2014; Dressler, 2006). However, memory for order has typically not been a focus in the experimental study of verbal association memory. Standard tests of association memory ask participants to study pairs of words (AB), followed by cued recall (given A, respond with B). Participants can respond with B when given A, and vice versa, without knowing the constituent-order of the pairing. Moreover, memory for order is typically studied with separate tasks such as serial recall (study A, B, C, D; recall the list in order).

Consequently, mathematical models of association memory are quite poor at accounting for constituent-order, assuming either that associations are stored with perfect order or with no order at all. Models based on convolution (Kelly et al., 2013; Metcalfe Eich, 1982; Murdock, 1982; Plate, 1995), and recent models within the REM framework (Cox & Criss, 2017, 2020; Criss & Shiffrin, 2005), assume associations are stored with no order. Thus, AB is mathematically equivalent to BA. The face-value prediction is that memory for constituent-order will be at chance. However, given evidence that participants can remember constituent-order above chance (Greene & Tussing, 2001; Kato & Caplan, 2017; Kounios et al., 2001, 2003; Thomas et al., 2023; Yang et al., 2013), one might rescue convolution, and other symmetric models, by allowing for some additional source of information to support order judgments, such as an additional term in the model. The consequence of storing order separately from associations is that such models would predict that memory for constituent-order should be unrelated to memory for the pairing itself. The second type of prediction, that associations are stored with perfect order, comes from matrix models (Anderson, 1970; Humphreys et al., 1989; Osth & Dennis, 2015b; Pike, 1984) and models that concatenate the two item vectors (Hintzman, 1984; Shiffrin & Steyvers, 1997). These models can infer order with no ambiguity, predicting that memory for constituent-order (AB versus BA) should be perfect whenever the association itself can be recalled.

Kato and Caplan (2017) tested these predictions with a task we refer to as order recognition (Greene & Tussing, 2001; Kounios et al., 2001, 2003; Thomas et al., 2023; Yang et al., 2013). Order recognition tests memory for constituent-order directly by presenting pairs in their original (AB) or reversed (BA) order. Participants then make a forced-choice judgment as to whether the probe is intact or reversed. One group of participants was tested with cued recall followed by order recognition for each studied pair, and compared with another group tested with associative recognition after cued recall instead. Matrix models predict that order recognition performance should be perfect for correctly recalled pairs. Convolution models predict that order recognition performance should be equivalent for correctly and incorrectly recalled pairs. Contradicting both predictions, order recognition was significantly better when cued recall was correct, but well below maximum, and well below associative recognition for correctly recalled pairs. These results indicate that verbal associations are neither encoded with perfect order nor completely order-absent, inconsistent with the assumptions of all models.

Another clue about the representation of associations and their constituent order comes from double-function lists in Rehani and Caplan (2011), where cued recall was direction-specific. Double-function lists (Howard et al., 2009; Primoff, 1938; Rehani & Caplan, 2011; Slamecka, 1976) contain pairs in which each constituent item appears in two pairs, once in the left position and once in the right position (AB, …, BC, …, CA, …). Consider a trial where B is presented as a cue on the left-hand side. Correctly responding with C requires knowledge of relative position/order, for example, that B appeared on the left in pair BC, but not in AB. Performance is compared to single-function pairs that do not share items (EF, …, GH, …, IJ, …). Because of their extreme assumptions about order, matrix and convolution models generate direct predictions about this task. A convolution model has no information to select between A and C. Thus, assuming the model guesses between the two possible responses, convolution predicts that cued recall accuracy for double-function pairs will be one-half that of single-function pairs. In contrast, matrix-based models suffer no interference between AB and BC (see below), and therefore predict equal accuracy for double- and single-function pairs. Contradicting both predictions, Rehani and Caplan (2011) found double-function cued recall accuracy was somewhat lower than single-function accuracy, but well above one-half of it, converging with evidence from the order recognition task that associations are neither stored order-absent nor with perfect directionality.

In sum, participants can discriminate AB versus BA in a word-pair task (Greene & Tussing, 2001; Kato & Caplan, 2017; Kounios et al., 2001, 2003; Thomas et al., 2023; Yang et al., 2013), and can even use order/item-position information to aid cued recall (B ?) and solve AB versus BC interference (Rehani & Caplan, 2011). Taken together, these findings suggest that the constituent-order of verbal associations is explicitly stored, in a way that is moderately dependent on memory for the pairing itself.

Despite evidence that associations are stored with moderate levels of order, there is also a sense in which verbal associations are rather symmetric. Initial support for this idea, known as associative symmetry, arose from the stable tendency for forward cued recall accuracy (APPLE ?) and backward cued recall accuracy (? OVEN) to be equal on average (Asch & Ebenholtz, 1962; Horowitz et al., 1964; Kahana, 2002; Kato & Caplan, 2017; Murdock, 1962). However, Kahana (2002) showed that an asymmetric model could produce symmetry in mean cued recall accuracy, suggesting this result is not diagnostic of symmetric associations. Instead, Kahana (2002) proposed that associative symmetry should be tested at the pair level, with two cued recall trials for each word pair, where each of test 1 and test 2 is either forward or backward cued recall. Indeed, multiple studies have returned a near-perfect correlation between test outcomes in incongruent conditions (forward–backward, backward–forward), remarkably close to what are essentially test–retest correlations in congruent conditions (forward–forward, backward–backward) (Kahana, 2002; Kato & Caplan, 2017; Rehani & Caplan, 2011; Rizzuto & Kahana, 2000, 2001; Sommer et al., 2008). These findings suggest either that forward and backward cued recall test the same bi-directional association in memory, or that there are distinct forward and backward associations for a given pair whose strengths are highly correlated (Kahana, 2002).

We were particularly interested in associative symmetry here because of the potential paradox between association memory that is highly symmetric yet supports memory for its constituent-order. As we elaborate below, previous attempts to modify matrix models found it especially challenging to simultaneously produce moderate order memory and associative symmetry (Kato & Caplan, 2017). A strong account of association memory should satisfy both constraints, and thus we include this as an additional benchmark for all models.

Given that associations are symmetric, yet support a moderate ability to judge constituent-order, how do existing models account for the potential tension between these constraints?

Associations are encoded as follows: M = ab⊺, where M denotes the memory matrix, a and b represent item vectors, and ⊺ denotes transpose. Bold-face indicates column vectors. Cued recall is modelled with matrix multiplication; for example, Mb ≈ a + noise. Matrix multiplication is direction-sensitive, meaning that b⊺M ≈ 0 + noise. By comparing the outputs of Mb and b⊺M, the model can unambiguously infer that item b appeared in the left position. For similar reasons, matrix models also have a perfect ability to solve double-function interference. If two pairs that share an item are stored in memory, M = ab⊺ + bc⊺, the direction-specificity of forward and backward cued recall means that a given item vector b can cue completely different pairs in memory depending on direction: Mb ≈ a and b⊺M ≈ c⊺. One can eliminate this directionality by simultaneously storing the forward and reverse associations, α_f ab⊺ + α_b ba⊺, where α_f and α_b are scalar random values that represent variable encoding strengths. Assuming that α_f and α_b are perfectly correlated, and that E[α_f] = E[α_b], this model can produce perfect associative symmetry (Kahana, 2002), but as a direct consequence, cannot discriminate AB from BA (Kato & Caplan, 2017) or solve double-function lists (Rehani & Caplan, 2011). To regain some ability to disambiguate AB from BA, E[α_f] could be increased relative to E[α_b], so that the forward association is stronger in memory; however, the model now produces a forward recall advantage, violating associative symmetry, and predicts that order recognition performance would positively correlate with the difference between forward and backward cued recall performance. Kato and Caplan (2017) found no evidence for the latter prediction; these correlations were not significant.

Kato and Caplan (2017) also tested a matrix model that always stored a definite order, but sometimes encoded pairs in the incorrect order, with probability p_rev. Increasing p_rev reduced the model's order recognition performance, even to the moderate levels seen in behaviour. However, the model assumes that even wrong order judgments are made with perfect certainty, because they come from perfectly directional associations in memory. The resulting prediction is that participants should be unlikely to switch their response if they are tested twice with order recognition; correct–correct and incorrect–incorrect judgments should be most frequent. This prediction was also unsupported in Kato and Caplan's (2017) data: participants did not stick with their order judgments more frequently than they switched them. Along with evidence from other analyses, this suggests that order judgments are not made with perfect certainty, but are more like uncertain, noisy decisions that are prone to change on retest.
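To make the directionality of matrix storage concrete, the following is a minimal numerical sketch (not code from any of the cited models; the dimensionality, seed, and function names are illustrative assumptions) showing that a single stored matrix resolves double-function interference exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256  # vector dimensionality (illustrative)

def item():
    # Random item vector, strictly normalized.
    v = rng.normal(0.0, 1.0, n)
    return v / np.linalg.norm(v)

a, b, c = item(), item(), item()

# Store the double-function pairs AB and BC: M = a b^T + b c^T
M = np.outer(a, b) + np.outer(b, c)

# Forward cue (M b) retrieves a; backward cue (b^T M) retrieves c,
# because matrix multiplication is direction-sensitive.
forward = M @ b    # ~ a + noise
backward = b @ M   # ~ c + noise

print(np.dot(forward, a), np.dot(forward, c))    # ~1 vs ~0
print(np.dot(backward, c), np.dot(backward, a))  # ~1 vs ~0
```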

Convolution models do not store order at all. Associations are stored as follows: m = a ∗ b, where a and b denote item vectors, m denotes the memory vector, and ∗ denotes circular convolution. If a and b are n-dimensional vectors, with elements subscripted from 0 to n−1, circular convolution is defined as
m_i = ∑_{k=0}^{n−1} a_k b_{(i−k) mod n},
where mod denotes the modulo operation, and m is an n-dimensional vector. Importantly, convolution is strictly commutative, a ∗ b ≡ b ∗ a. This property causes convolution to naturally produce associative symmetry (Kahana, 2002), but also means that there is no way to recover the constituent-order of the pair after encoding. To retain order information in a convolution model, one could permute the elements of item vectors before encoding (Jones & Mewhort, 2007; Kelly et al., 2013; Plate, 1995; Recchia et al., 2010), expressed as m = p_l(a) ∗ p_r(b), where p denotes a permutation operator, and subscripts l and r indicate the position-specific permutation pattern applied to each vector. Permutation allows convolution to encode order-sensitive relationships (Jones & Mewhort, 2007), along with other useful side-effects (Kelly et al., 2013). In published implementations, the whole vector is permuted, which effectively implements a non-commutative operation, more like a matrix outer product: p_l(a) ∗ p_r(b) ≠ p_r(a) ∗ p_l(b). For this reason, fully permuting item vectors may be incompatible with the empirical data in much the same way as an unmodified matrix model. However, we do test this idea, with a small twist, below.
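The following sketch illustrates circular convolution's commutativity, and how fully permuting the inputs breaks it. The FFT-based implementation and all parameter values are convenient assumptions, not taken from any published implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256  # vector dimensionality (illustrative)

def cconv(a, b):
    # Circular convolution m_i = sum_k a_k * b_{(i-k) mod n},
    # computed via the FFT for convenience.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

a = rng.normal(0.0, 1.0 / np.sqrt(n), n)
b = rng.normal(0.0, 1.0 / np.sqrt(n), n)

# Commutativity: a * b == b * a, so constituent order is lost.
print(np.allclose(cconv(a, b), cconv(b, a)))  # True

# Fully permuting the inputs (one pattern per position) makes the
# operation non-commutative, so order becomes recoverable.
p_l, p_r = rng.permutation(n), rng.permutation(n)
print(np.allclose(cconv(a[p_l], b[p_r]), cconv(b[p_l], a[p_r])))  # False
```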

In sum, the concurrent empirical constraints of associative symmetry and moderate order memory prove difficult for all existing models. Convolution models and modified matrix models can produce perfect associative symmetry but disregard order, while non-commutative versions of both matrix and convolution models over-predict the degree to which order is remembered. These challenges could be addressed in two ways: either modify non-commutative models to reduce their order memory, or extend symmetric models to store order. In the present article we take the latter approach.

Our objective here is not to fundamentally alter basic model mechanisms, but to design modifications that store order while preserving the useful characteristics, like associative symmetry, that make convolution a rich account of verbal association memory. To this end, all four of our models (illustrated in Fig. 1) are intentionally very simple, each having only three free parameters. Furthermore, each model parameterizes order discrimination ability with one free parameter, as we describe below.

Model A (Fig. 1a): Order is encoded as explicit associations between item vectors and “position” vectors, bearing some resemblance to positional-coding models of serial recall (Brown et al., 2007; Burgess & Hitch, 1999; Conrad, 1960; Farrell, 2012; Henson, 1998), or to item-context associations in the Temporal Context Model (Howard & Kahana, 1999), but with just two unique position vectors. These two associations, for the left and right positions, are stored along with the item–item association,
m_A = ∑_{i=1}^{L} α_i [(f_i ∗ l) + (g_i ∗ r) + (f_i ∗ g_i)], (2)
where f_i and g_i are n-dimensional item vectors, l and r are n-dimensional position vectors, and L denotes list length, the number of pairs stored in the memory vector m_A. Feature values for all vectors are sampled from N(0, σ²), and vectors are then strictly normalized. Item-position and item–item associations share an associative encoding strength α_i, a scalar value sampled from N(μ, σ_α), where σ_α and μ are free parameters. Model A infers order by comparing the dot product between a correct item-position pairing and the memory vector, ((f_i ∗ l) + (g_i ∗ r)) · m_A, with the dot product between an incorrect item-position pairing and the memory vector, ((f_i ∗ r) + (g_i ∗ l)) · m_A. In our implementation of model A, we parameterize order discrimination ability by modifying the strength of item-position associations with a single parameter, the mean associative encoding strength μ. By modifying μ, we can increase or decrease the match of a correct item-position pairing to memory. Finally, because item-position and item–item associations share an associative encoding strength α_i, memory for the association co-varies with memory for its order.
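As a minimal sketch of model A's encoding and order-inference equations, assuming placeholder values for n, L, μ, and σ_α (these are illustrative choices, not fitted values):

```python
import numpy as np

rng = np.random.default_rng(2)
n, L = 512, 8                  # dimensionality and number of pairs (illustrative)
mu, sigma_alpha = 1.0, 0.2     # placeholder encoding-strength parameters

def vec():
    v = rng.normal(0.0, 1.0, n)
    return v / np.linalg.norm(v)  # strictly normalized

def cconv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

l, r = vec(), vec()                     # the two shared position vectors
f = [vec() for _ in range(L)]           # left items
g = [vec() for _ in range(L)]           # right items
alpha = rng.normal(mu, sigma_alpha, L)  # shared encoding strengths

# m_A = sum_i alpha_i [(f_i * l) + (g_i * r) + (f_i * g_i)]
m_A = sum(alpha[i] * (cconv(f[i], l) + cconv(g[i], r) + cconv(f[i], g[i]))
          for i in range(L))

# Order judgment for pair 0: correct vs. reversed item-position probes.
ok = np.dot(cconv(f[0], l) + cconv(g[0], r), m_A)
rev = np.dot(cconv(f[0], r) + cconv(g[0], l), m_A)
print(ok > rev)  # usually True: a noisy, above-chance order decision
```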

Model Σ (Fig. 1b): Similar to model A, position vectors are used to represent order, but are instead added element-wise to each item before convolving, mathematically similar to extensions of TODAM (Murdock, 1995) that summed item vectors before convolving,
m_Σ = ∑_{i=1}^{L} α_i ((f_i + l) ∗ (g_i + r)),
where L, α_i, f_i, g_i, l, and r are identical to their definitions in Eq. (2), and m_Σ denotes the memory vector. Interestingly, by expanding the encoding equation, m_Σ = ∑_{i=1}^{L} α_i ((f_i ∗ g_i) + (g_i ∗ l) + (f_i ∗ r) + (l ∗ r)), we can see that this model is equivalent to model A (Eq. (2)) with an additional noise term, l ∗ r. This equivalence means that we can parameterize order discrimination ability in the same way as in model A, by modifying the strength of item-position associations with the single parameter μ. Thus, if the model infers order by comparing the dot product between a correctly ordered probe and the memory vector, ((f_i + l) ∗ (g_i + r)) · m_Σ, with that of an incorrectly ordered probe, ((f_i + r) ∗ (g_i + l)) · m_Σ, modifying the mean associative encoding strength μ modifies the match of correct and incorrect probes to memory.
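A corresponding sketch for model Σ, again under illustrative parameter values; it also verifies numerically that each encoded trace expands into the model A terms plus the extra l ∗ r term:

```python
import numpy as np

rng = np.random.default_rng(3)
n, L = 512, 8               # illustrative values
mu, sigma_alpha = 1.0, 0.2  # placeholder parameters

def vec():
    v = rng.normal(0.0, 1.0, n)
    return v / np.linalg.norm(v)

def cconv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

l, r = vec(), vec()
f = [vec() for _ in range(L)]
g = [vec() for _ in range(L)]
alpha = rng.normal(mu, sigma_alpha, L)

# m_Sigma = sum_i alpha_i (f_i + l) * (g_i + r)
m_S = sum(alpha[i] * cconv(f[i] + l, g[i] + r) for i in range(L))

# Convolution distributes over addition, so each trace equals the
# model A terms plus the additional l * r term:
lhs = cconv(f[0] + l, g[0] + r)
rhs = cconv(f[0], g[0]) + cconv(g[0], l) + cconv(f[0], r) + cconv(l, r)
print(np.allclose(lhs, rhs))  # True

# Order judgment: correctly ordered probe vs. reversed probe.
print(np.dot(cconv(f[0] + l, g[0] + r), m_S) >
      np.dot(cconv(f[0] + r, g[0] + l), m_S))  # usually True
```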

Model ϕ (Fig. 1c): Order is encoded by incorporating dedicated positional feature values into the item vector alongside item-unique features. This bears some resemblance to the ways in which numerous models have incorporated attributes such as list context as specialized features. All items in the left position receive the same set of positional feature values, and likewise for right-position items,
m_ϕ = ∑_{i=1}^{L} α_i ((f_i ⊕ l) ∗ (g_i ⊕ r)),
where L is defined as before, l and r consist of n_p positional features that are concatenated (denoted by ⊕) onto item vectors f_i and g_i, respectively, and m_ϕ denotes the memory vector. Encoding strength α_i is drawn from N(1, σ_α), but note the following difference from models A and Σ: σ_α is a free parameter and μ is fixed at 1, because order discrimination ability is instead parameterized by a single parameter, the number of positional features n_p. Vectors f_i and g_i each consist of unique item features and have n − n_p dimensions, ensuring that the dimensionality of the full vector, with position features, is always n. All feature values, including position features, are independently sampled from N(0, σ²), and item vectors, with position features, are strictly normalized. The model can infer order by comparing the dot product between a pair of items with correct position features and the memory vector, ((f_i ⊕ l) ∗ (g_i ⊕ r)) · m_ϕ, with the dot product between a pair of items with incorrect position features and the memory vector, ((f_i ⊕ r) ∗ (g_i ⊕ l)) · m_ϕ. Increasing n_p increases the difference between these two matches, and thus overall order discrimination ability.
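A sketch of model ϕ under assumed values of n and n_p (placeholders, not fitted values); note that the order information now lives in the concatenated position features:

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_p, L = 512, 64, 8  # total features, position features, pairs (illustrative)

def norm(v):
    return v / np.linalg.norm(v)

def cconv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

l = rng.normal(0.0, 1.0, n_p)  # shared left-position features
r = rng.normal(0.0, 1.0, n_p)  # shared right-position features
# Item-unique features occupy the remaining n - n_p dimensions.
f = [rng.normal(0.0, 1.0, n - n_p) for _ in range(L)]
g = [rng.normal(0.0, 1.0, n - n_p) for _ in range(L)]
alpha = rng.normal(1.0, 0.2, L)  # mu fixed at 1; sigma_alpha a placeholder

def enc(item, pos):
    # Concatenate position features onto the item, then normalize the whole.
    return norm(np.concatenate([item, pos]))

# m_phi = sum_i alpha_i (f_i ⊕ l) * (g_i ⊕ r)
m_phi = sum(alpha[i] * cconv(enc(f[i], l), enc(g[i], r)) for i in range(L))

# Order judgment: probes with correct vs. swapped position features.
ok = np.dot(cconv(enc(f[0], l), enc(g[0], r)), m_phi)
rev = np.dot(cconv(enc(f[0], r), enc(g[0], l)), m_phi)
print(ok > rev)  # usually True; the margin grows with n_p
```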

Model Π (Fig. 1d): To encode order, item-unique feature values are shuffled, or permuted, in a pattern that is specific to the position of that item vector. This mechanism is inspired by the use of permutation in other models (Jones & Mewhort, 2007; Kelly et al., 2013; Plate, 1995), but the key difference in our implementation is that here, only a subset of features is permuted, rather than the entire vector,
m_Π = ∑_{i=1}^{L} α_i (p_l(f_i) ∗ p_r(g_i)),
where L is defined as before, f_i and g_i are n-dimensional item vectors whose elements are independently sampled from N(0, σ²) and then strictly normalized, and m_Π denotes the memory vector. One pattern of permutation is applied to every left-position item, denoted by p_l, and another pattern is applied to every right-position item, denoted by p_r. Just like model ϕ, encoding strength α_i is drawn from N(1, σ_α), where σ_α is a free parameter and the mean associative encoding strength is fixed at 1. The order discrimination ability of model Π is parameterized by a single parameter, the number of permuted features n_perm; thus, similar to model ϕ, and unlike models A and Σ, μ is not a free parameter. The model can infer order by comparing the dot product between a pair of items with the correct position permutations and the memory vector, (p_l(f_i) ∗ p_r(g_i)) · m_Π, with the dot product between a pair of items with incorrect position permutations and the memory vector, (p_r(f_i) ∗ p_l(g_i)) · m_Π. Increasing n_perm increases the difference between these two matches, and thus overall order discrimination ability.
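Finally, a sketch of model Π; which subset of features is permuted (here, the first n_perm indices) and all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_perm, L = 512, 64, 8  # dimensionality, permuted features, pairs (illustrative)

def vec():
    v = rng.normal(0.0, 1.0, n)
    return v / np.linalg.norm(v)

def cconv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def partial_perm():
    # Permute only the first n_perm feature indices (an illustrative choice);
    # the remaining features stay in place.
    idx = np.arange(n)
    idx[:n_perm] = rng.permutation(n_perm)
    return idx

p_l, p_r = partial_perm(), partial_perm()  # position-specific patterns

f = [vec() for _ in range(L)]
g = [vec() for _ in range(L)]
alpha = rng.normal(1.0, 0.2, L)  # mu fixed at 1; sigma_alpha a placeholder

# m_Pi = sum_i alpha_i p_l(f_i) * p_r(g_i)
m_Pi = sum(alpha[i] * cconv(f[i][p_l], g[i][p_r]) for i in range(L))

# Order judgment: correct vs. swapped position permutations.
ok = np.dot(cconv(f[0][p_l], g[0][p_r]), m_Pi)
rev = np.dot(cconv(f[0][p_r], g[0][p_l]), m_Pi)
print(ok > rev)  # usually True; the margin grows with n_perm
```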

A major focus in this article is the challenge presented by order recognition data (Kato & Caplan, 2017; Thomas et al., 2023), which, to our knowledge, has not previously been fit by models. To investigate whether each of our extensions of convolution can address this challenge, we fit each model to both aggregate data and single participants, as two separate benchmarks. Single-participant data are better in the sense that they are less likely to arise from a mixture of mechanisms and more likely to be model-pure, reducing the chance that the wrong model is favoured. The disadvantage is that each participant contributes less data, so single-participant fits are, in principle, noisier than aggregate fits. By including both single-participant and aggregate model selection, we can look for broad agreement between the two benchmarks, increasing the robustness of any conclusions we draw.

We also evaluate whether each model can account for double-function list performance (Rehani & Caplan, 2011). While double-function lists challenge models for similar reasons as order recognition data, our intention is not to provide a comprehensive account of this task. Instead, we use double-function lists to help characterize conditions under which certain order-encoding mechanisms may be preferable. Accordingly, we keep this section brief, opting for algebraic arguments and simulations rather than quantitative fits to data.

Our evaluation of these models proceeds as follows. First, we simulate order recognition, cued recall, and associative recognition with each model, to show the relationship between performance and key model parameters. Next, we fit models to order recognition data at the aggregate level, to determine whether each can produce a moderate relationship between order recognition and cued recall while preserving the near-perfect correlation between forward and backward cued recall (benchmark 1a). Next, we fit models to order recognition data for individual participants (benchmark 1b). Finally, we evaluate each model against double-function lists (benchmark 2).
