Episodic memory is defined as a memory for an event experienced in a particular context (Tulving, 1983). This definition suggests that episodic memory calls for the storage of at least two types of information: (1) of the event itself, and (2) of the context in which said event took place. In the case of a typical memory experiment where participants study lists of words, each word item would correspond to an event, whereas context would refer to the characteristics surrounding its occurrence, such as the list in which it was studied (e.g., List 1, List 2), or its perceptual characteristics (e.g., color, font). The ability to remember contextual information is commonly referred to as source memory, in distinction from the ability to remember the item itself – item memory (Batchelder and Riefer, 1990, Johnson et al., 1993, Lindsay, 2008).1
Source memory plays an important role in everyday life: Suppose that you see somebody in the university cafeteria, and you are sure that you have met this person before but are unsure as to where exactly. There are several possibilities, such as the cafeteria itself, the library, or the psychology department where you spend most of your days. Given that you are seeing this person at the cafeteria, it is reasonable to entertain the possibility of having met them there before. But let us suppose that you end up figuring out (correctly) that the cafeteria is not where this previous encounter took place. In this example, the ability to recognize this person as someone you have met before was based on item information, whereas the cafeteria served as a source cue in the sense that it offered a possible context for this previous encounter.
The investigation of episodic memory has led to the identification of a large body of empirical phenomena and the development of numerous formal models attempting to provide a theoretical account for them (e.g., Anderson and Milson, 1989, Davelaar et al., 2005, Dennis and Humphreys, 2001, Gillund and Shiffrin, 1984, Hintzman, 1988, Howard and Kahana, 2002, Humphreys et al., 1989, McClelland and Chappell, 1998, Murdock, 1997, Osth and Dennis, 2015, Raaijmakers and Shiffrin, 1980, Raaijmakers and Shiffrin, 1981, Shiffrin and Steyvers, 1997). But although contextual or source information plays a central role in many theoretical accounts (e.g., Anderson and Bower, 1972, Davelaar et al., 2005, Dennis and Humphreys, 2001, Howard and Kahana, 2002, Mensink and Raaijmakers, 1988, Sederberg et al., 2008), there has been a focus on item-memory judgments at the expense of source-memory judgments (for notable exceptions, see Glanzer et al., 2004, Osth et al., 2018, Starns and Ksander, 2016). Indeed, one of the most prominent candidate models in the literature, REM (Retrieving Effectively from Memory; Shiffrin & Steyvers, 1997), which will be the focus of the present work, is currently unable to provide an account of source judgments that is commensurate with its achievements in item memory (e.g., Criss, 2006, Criss et al., 2011, Criss and Shiffrin, 2005, Diller et al., 2001, Kılıç et al., 2017, Malmberg et al., 2004, Malmberg and Shiffrin, 2005; see also Osth et al., 2018).
The goal of the present work is to contribute towards bridging this gap. After reviewing a number of relevant concepts and empirical findings, we will identify the shortcomings of REM in its current form and propose a revision that addresses them. We will then conduct a first evaluation of this proposal using data from three novel experiments that manipulate source strength in the same vein as earlier manipulations of item strength, whose results motivated the initial development of REM (Ratcliff et al., 1990, Shiffrin et al., 1990, Shiffrin and Steyvers, 1997). Lastly, we will scrutinize the general explanatory power of the theoretical account provided by the revised REM – local matching – by testing its predictions regarding the negligible appearance of output interference in source judgments. These predictions will be tested using one novel experiment and two previously published datasets.
Manipulations of study events (i.e., strengthening), either by manipulating exposure time or via repetition, have long been at the center of theoretical debates due to their ability to distinguish between different candidate explanations (e.g., Glanzer et al., 2009, Kellen and Klauer, 2015, Shiffrin et al., 1990). These experimental manipulations are designed to yield two classes of studied items – weak and strong. Lure items can also be classified as weak or strong depending on the characteristics that they share with their studied counterparts (e.g., perceptual, such as being presented in the same colors as weak/strong items) or the context in which they are tested (e.g., lures included at test after a study list of exclusively weak or strong items).
One of the key findings coming out of these strengthening manipulations is known as the strength-based mirror effect: Relative to weak items, the testing of lists of strong items results in a greater proportion of targets being recognized alongside a reduced proportion of recognized lures (Benjamin, 2001, Cary and Reder, 2003, Criss, 2006, Criss, 2009, Criss, 2010, Criss et al., 2014, Glanzer and Adams, 1985, Kılıç and Öztekin, 2014, Kılıç et al., 2017, Starns et al., 2010, Starns et al., 2012, Starns et al., 2012, Stretch and Wixted, 1998). There is general agreement that an increase in learning opportunities should improve the recognition of targets. But in the case of lures, there are disagreements on how to best explain the decrease in their recognition rates: The criterion-shift account argues that the increase in study opportunities affects metacognitive processes. Specifically, criteria are more stringent when judging test lists comprised of strong items than weak items (Cary and Reder, 2003, Stretch and Wixted, 1998, Verde and Rotello, 2007, Starns et al., 2010, Starns et al., 2012). In contrast, the differentiation account proposes that the additional study opportunities allow strong targets to be better differentiated from other items in general. One of the outcomes of this differentiation process is that lures become more distinct and less likely to be recognized (see Shiffrin and Steyvers, 1997, Criss, 2006, Criss, 2009, Criss, 2010, Criss et al., 2013, Kılıç et al., 2017, Koop et al., 2019).
Another key finding concerns the interactive effects (or lack thereof) between strong and weak items – the null list-strength effect (Hirshman, 1995, Murnane and Shiffrin, 1991, Ratcliff et al., 1990, Shiffrin et al., 1990, Yonelinas et al., 1992). This effect establishes that the propensity to recognize weak items is unaffected by their intermixing with strong items. This null effect contrasts with the case of free recall, where it is found that strong items benefit from their intermixing with weak items, whereas weak items are negatively affected (e.g., Malmberg and Shiffrin, 2005, Ratcliff et al., 1990, Wilson and Criss, 2017; see also Tulving & Hastie, 1972). Since its establishment, the null list-strength effect has been extensively studied, with numerous candidate accounts being proposed (e.g., Cary and Reder, 2003, Dennis and Humphreys, 2001, Murdock and Kahana, 1993, Shiffrin and Steyvers, 1997; see also, Osth and Dennis, 2014).
Although most investigations have focused on the effect of strengthening in terms of item judgments (was this item studied or not?), a small number of studies have turned their focus to its impact on source judgments. For instance, both Dobbins and McCarthy (2008) and Glanzer et al. (2004) reported higher source accuracy for deeply processed words (strong items) than for shallowly processed ones (weak items). More recently, Starns and Ksander (2016) showed that the strengthening of items through repetition also leads to increases in accuracy in both item and source judgments (for similar findings, see also Dobbins & McCarthy, 2008, Experiment 1; Glanzer et al., 2004, Experiment 1; Osth et al., 2018, Starns et al., 2013).
Starns and Ksander (2016) also investigated the effect of items occurring under more than one context or source. They found that the strengthening of one of these sources (through repetition) had a negative impact on the recognition of the other, non-strengthened sources. These results contrast with Dobbins and McCarthy’s (2008) earlier report, where their findings did not suggest any negative impact for weak sources encountered earlier in the study phase (see their Table 5). Importantly, both findings were also observed by Kim et al. (2012), which suggests that the negative impact observed by Starns and Ksander (2016) might be due to a poorer encoding of sources encountered later during study. More recently, Osth et al. (2018) reported null list strength effects in source judgments, the only exception being a study in which source judgments were not preceded by item judgments. Osth et al. attributed this discrepancy to the expectation of mnemonic evidence being unavailable for items that would not have been recognized in the first place (see Batchelder and Riefer, 1990, Hautus et al., 2008, Klauer and Kellen, 2010).
Osth et al. (2018) relied on their source memory results to motivate the revision of a global matching model proposed earlier (Osth & Dennis, 2015). The present work follows along similar lines, although it turns its focus to a different theoretical contender – the REM model (Shiffrin & Steyvers, 1997). REM describes episodic memory in terms of memory traces representing our experiences in the world. Each trace is represented as a vector, with each element therein referring to a unique feature. For example, suppose one had breakfast yesterday with oatmeal, milk, apple, and honey. In this case, each food ingredient is represented as an individual vector that stores its properties (e.g., the apple is a fruit, red, and sweet). In typical implementations of REM, it is assumed that each vector is comprised of twenty feature values and that each feature value v is a positive integer randomly sampled from a geometric distribution with parameter g:

$$P(v) = (1-g)^{\,v-1}\,g, \qquad v = 1, 2, \ldots, \infty \tag{1}$$
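As a minimal illustration of this sampling step (not code from the original REM papers; the value g = 0.4, the vector length, and the use of NumPy are our own illustrative choices), an item vector can be generated as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_item_vector(n_features=20, g=0.4):
    """Sample a REM item vector: each feature value v >= 1 is drawn from
    the geometric distribution P(v) = (1 - g)**(v - 1) * g (Eq. (1))."""
    return rng.geometric(g, size=n_features)

print(sample_item_vector())  # an array of small positive integers
```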
When a memory trace is created, each feature is assumed to be stored with probability u. The vector elements associated with non-stored features all have a value of zero. With probability c, this storage is accurate. When it is not, which is expected to occur with the complementary probability 1 − c, a random value is assumed to be stored instead (this random value is sampled from the same geometric distribution). Going back to our earlier example, the feature of being “red” for the apple may not be stored in memory at all, or it may be stored incorrectly (e.g., as “green”). Altogether, this storage process is expected to yield memory traces comprised of correct, incorrect, and absent feature information.
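To make this storage scheme concrete, the following sketch encodes a trace from a studied item vector. The parameter values u = 0.3 and c = 0.7 are illustrative assumptions on our part, not values taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def encode_trace(item, u=0.3, c=0.7, g=0.4):
    """Store an episodic trace of `item`: each feature is stored with
    probability u; a stored feature is copied correctly with probability c
    and otherwise replaced by a random geometric draw; unstored features
    stay at zero."""
    trace = np.zeros_like(item)
    stored = rng.random(item.shape) < u          # which features get stored at all
    correct = rng.random(item.shape) < c         # which stored features are accurate
    noise = rng.geometric(g, size=item.shape)    # random values for inaccurate storage
    trace[stored & correct] = item[stored & correct]
    trace[stored & ~correct] = noise[stored & ~correct]
    return trace

item = rng.geometric(0.4, size=20)   # a studied item vector
print(encode_trace(item))            # mostly zeros plus a few (mostly correct) values
```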
According to REM, retrieval is based on a global matching process in which a probe item is compared to the existing episodic traces. This matching process quantifies the degree to which the features in the probe are the same as or different from those found in each trace out of a total of N traces. For a probe item j:

$$\lambda(i,j) = (1-c)^{\,n_q(i,j)} \prod_{v=1}^{\infty} \left[\frac{c + (1-c)\,g(1-g)^{v-1}}{g(1-g)^{v-1}}\right]^{\,n_m(v,i,j)} \tag{2}$$
with i indexing episodic traces and v indexing feature values. The number of non-zero features that mismatch is $n_q$, whereas the number of non-zero features that match with value v is $n_m$. Features that do not contain information (i.e., features with a value of zero) are not considered. The resulting likelihood ratios are averaged across traces, which yields an odds ratio Φ:

$$\Phi_j = \frac{1}{N}\sum_{i=1}^{N} \lambda(i,j) \tag{3}$$
This ratio captures the relative support for the possibility that the probe item was previously encountered. When Φ is higher than the decision criterion (typically set to 1), the item is endorsed as “old”. Conversely, if Φ is lower than the criterion, the probe item is not endorsed (i.e., it is judged to be “new”).
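The matching and decision rules in Eqs. (2)–(3) can be sketched as follows. The function names and default parameter values are our own illustrative choices, not part of the original specification:

```python
import numpy as np

def likelihood_ratio(probe, trace, c=0.7, g=0.4):
    """REM likelihood ratio lambda(i, j) (Eq. (2)): each non-zero trace
    feature that mismatches the probe contributes a factor (1 - c); each
    feature that matches with value v contributes
    (c + (1 - c) * g * (1 - g)**(v - 1)) / (g * (1 - g)**(v - 1)).
    Zero (unstored) features are ignored."""
    nonzero = trace != 0
    matches = nonzero & (trace == probe)
    n_mismatch = int((nonzero & (trace != probe)).sum())
    lam = (1 - c) ** n_mismatch
    for v in trace[matches]:
        base = g * (1 - g) ** (v - 1)
        lam *= (c + (1 - c) * base) / base
    return lam

def item_recognition(probe, traces, criterion=1.0):
    """Average lambda over all N traces (the odds Phi, Eq. (3)) and respond
    'old' if Phi exceeds the decision criterion."""
    phi = np.mean([likelihood_ratio(probe, t) for t in traces])
    return ("old" if phi > criterion else "new"), phi
```

Averaging the ratios over all stored traces is what makes the match “global”: every trace, relevant or not, contributes to the odds.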
The original implementation of REM assumes that the mnemonic benefits from additional study opportunities (e.g., repetition of items during study) come from the storage of currently absent features. Although this assumption implies that incorrectly stored features will remain so (i.e., there is no updating), the end result is nevertheless a more complete and accurate representation of items, with memory traces becoming less similar to each other. This process, commonly referred to as differentiation, is illustrated in Fig. 1: As the number of study opportunities increases, so does the probability that items presented at test (targets and lures) are correctly judged. This is because lures become less similar to the existing traces, whereas targets become increasingly similar to the trace representing their previous occurrence and dissimilar from all other traces.
To make this process clear, let us walk through the example in Fig. 1:
i. Assume we have the exact same list of words, such as apple, honey, milk, and oatmeal, presented once (Study List 1: Pure Weak) or three times (Study List 2: Pure Strong).
ii. Each word is represented as a vector with five features for simplicity (sampled using Eq. (1)). After studying, these vectors contain positive integers or zeros, indicating available or missing information, respectively. The features are stored – correctly or incorrectly – probabilistically based on the u and c parameters.
iii. For items presented in the study phase, the model assumes the same kind of item-recognition judgment that takes place during the test phase. Now, let us suppose that a presented item is identified as previously studied. In that case, the best-matching trace (the one with the highest likelihood ratio) is updated. When items are repeated at study (e.g., Study List 2), each time they are presented constitutes an opportunity for additional features to be stored, as highlighted in bold in the figure.
iv. During the test phase, the test probes, including targets and foils like apple, point, and milk, are compared to each item trace, generating a likelihood ratio for each comparison (e.g., λ1 is the likelihood ratio from comparing the probe with the first item, apple, and so on). These likelihood ratios are averaged (Φ) and compared to the criterion of 1 for “old”–“new” decisions.
v. Additional study opportunities for target items result in stronger matches between their probes and their respective memory traces (e.g., the probe “apple” produces λ1 = 7.77 vs. λ1 = 58.9 after its single vs. repeated presentations) and weaker matches with other traces (e.g., the probe “apple” matched against the trace for “honey” yields likelihood ratios of 0.3 vs. 0.03 after the single vs. repeated presentations of “honey”).
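The study-phase updating that drives differentiation can be sketched along the following lines. This is an illustrative reading of the mechanism rather than the published implementation; it reuses the likelihood_ratio helper from the matching sketch above, and it only fills in features that are currently absent (zero), leaving incorrect features untouched, as described above:

```python
import numpy as np

rng = np.random.default_rng(2)

def update_best_matching_trace(item, traces, u=0.3, c=0.7, g=0.4):
    """On a repeated presentation judged 'old', strengthen the best-matching
    trace: only currently absent (zero) features can be filled in, each with
    probability u, correctly with probability c; features that are already
    stored (correctly or not) are never changed."""
    lams = [likelihood_ratio(item, t, c=c, g=g) for t in traces]
    best = traces[int(np.argmax(lams))]            # trace with the highest ratio
    absent = best == 0
    store = absent & (rng.random(best.shape) < u)  # which absent features get filled
    correct = rng.random(best.shape) < c
    noise = rng.geometric(g, size=best.shape)
    best[store & correct] = item[store & correct]
    best[store & ~correct] = noise[store & ~correct]
    return best
```

Repeated calls to this routine make a target’s trace progressively more complete, which raises its match to its own probe while lowering its match to other probes, i.e., differentiation.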
The explanatory power of REM’s trace-updating processes and the differentiation that follows is not limited to strengthening effects. Given that these processes are also set to take place during testing, where target and lure items are encountered or re-encountered, they yield new testable predictions. Among these is the effect known as output interference, which is observed in item recognition as a decrease in the recognition of targets throughout the test, while the false endorsement of lures either slightly increases or remains unchanged (Annis et al., 2013, Criss et al., 2011, Criss et al., 2017, Kılıç et al., 2017, Koop et al., 2015, Malmberg et al., 2012).
Criss et al. (2011) suggested that incorporating the differentiation process into the REM model during testing can explain the output interference observed in item recognition. As depicted in Fig. 2, when a probe item is judged to be old, the best-matching trace gets updated using the information provided by the probe. However, if the probe is a lure, this leads to an incorrect update of an existing trace representing a target’s previous encounter, causing impaired recognition of that target later in testing. On the other hand, when a probe item is judged to be new, a new memory trace forms, increasing the number of items to compare and the overall noise.
Again, let us walk through the example in Fig. 2:
i. Assume the exact same word list used in the previous example. The target word “apple”, when presented at the test, has a Φ of 2.11, surpassing the criterion of 1. Consequently, the best-matching trace is updated – in this instance, the target trace “apple” happens to be the best-matching trace and is updated with the storage of an additional feature, the integer “4”, highlighted in bold.
ii. Continuing with the presentation of a lure item, “point”, its comparison yields a Φ of 1.53, once again surpassing the criterion. But since this is a lure item, the best-matching trace is the one associated with the word “oatmeal”. This, in turn, leads to an incorrect updating of that memory trace, which can eventually lead to the word “oatmeal” not being recognized when it is later presented at the test.
iii. Lastly, let us examine the target item, “milk”, and the lure item, “turtle”, each judged to be new at the test. Given these judgments, both items are stored as new traces, shown as bold vectors under “Study List Updated”. The inclusion of these new traces increases the length of the list of memory traces, introducing additional noise. One of its consequences is a failure to recognize target items tested later on, such as “honey”.
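The test-phase dynamics just walked through can be expressed as a single per-trial step. Again, this is a hedged sketch of the mechanism described by Criss et al. (2011), not their exact implementation; it reuses the likelihood_ratio, update_best_matching_trace, and encode_trace helpers from the earlier sketches:

```python
def test_phase_step(probe, traces, criterion=1.0, u=0.3, c=0.7, g=0.4):
    """One test trial with trace updating, the mechanism behind output
    interference: an 'old' response updates the best-matching trace with the
    probe's features (erroneously, if the probe is a lure), whereas a 'new'
    response stores the probe as an additional trace, adding noise to all
    later comparisons."""
    phi = sum(likelihood_ratio(probe, t, c=c, g=g) for t in traces) / len(traces)
    if phi > criterion:                      # judged "old": update the best-matching trace
        update_best_matching_trace(probe, traces, u=u, c=c, g=g)
    else:                                    # judged "new": add the probe as a new trace
        traces.append(encode_trace(probe, u=u, c=c, g=g))
    return phi
```

Applying this step over a sequence of test probes is what gradually corrupts and crowds the trace list, lowering hit rates for targets tested later in the sequence.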
An alternative version of REM included in its original proposal (REM.4; see Shiffrin & Steyvers, 1997) introduced the concept of context features that vary as time passes, alongside a threshold used to discriminate a list of items encoded in one context from lists learned in other contexts (the context threshold). These context features are represented in context vectors appended to the vectors already postulated by REM to represent item features. Each pair of appended vectors stands as the mnemonic representation of a specific item encountered in a specific temporal context.
According to this REM version, retrieval follows a two-step process that begins with the activation of memory traces as a function of the similarity between their context features and the context probe. This is followed by a matching process that is circumscribed to the items whose context-feature activation was above the context threshold. Fig. 3 illustrates how context (e.g., breakfast in the morning) is appended to each item.2 Returning to the example at the beginning of the section, let us ask whether one had an apple at breakfast. According to REM.4, the context features (in this case, “breakfast”) would first be used to activate the images of food eaten at breakfast, which would then be compared with the item probe “apple”. However, note that this first activation is imperfect: On the one hand, memory traces representing the food eaten at breakfast might not pass the context threshold. On the other, traces representing food encountered in different contexts (e.g., yesterday’s breakfast) may be erroneously activated if their context is similar enough to the probe context. Altogether, this version of REM expects attempts to recognize events that took place in a given context to be informed by memory traces representing events from that context, but also by traces of events that took place elsewhere (for further details, see Shiffrin & Steyvers, 1997).
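A rough sketch of this two-step retrieval is given below. The specific activation measure used in the first step is an assumption on our part (here, the same likelihood-ratio match applied to the context features), and the function name and threshold value are likewise illustrative; the likelihood_ratio helper from the earlier sketch is assumed:

```python
def context_gated_recognition(item_probe, context_probe, memory,
                              context_threshold=1.0, criterion=1.0):
    """REM.4-style two-step retrieval sketch. `memory` is a list of
    (item_vector, context_vector) pairs. Step 1: activate only the traces
    whose context activation exceeds the context threshold. Step 2: run the
    usual global match of the item probe, restricted to the activated traces."""
    activated = [item_trace for item_trace, context_trace in memory
                 if likelihood_ratio(context_probe, context_trace) > context_threshold]
    if not activated:                 # nothing passed the threshold: respond "new"
        return "new", 0.0
    phi = sum(likelihood_ratio(item_probe, t) for t in activated) / len(activated)
    return ("old" if phi > criterion else "new"), phi
```

Note how both failure modes described above arise naturally here: relevant traces can fall below the threshold and drop out of the match, while traces from similar contexts can pass it and contaminate the comparison set.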
The two-step process postulated by this REM variant was originally motivated by a desire to demonstrate how the model could be more efficient in retrieving items that occurred in a specific context (e.g., the study list; see Shiffrin & Steyvers, 1997, p. 155). However, this proposal is insufficient in the sense that it does not define how the context(s) of an individual item would be retrieved from memory. In short, REM, in its current form, is unable to account for source-memory judgments.
For REM to describe source-memory judgments, we first need to establish how source information is stored. We begin by assuming that – similar to their ‘item’ counterparts – there is an imperfect storage of source features, which can take on incorrect values or be absent altogether (i.e., take on the value zero). For simplicity, it is assumed that this storage process is governed by the same probabilities u and c. The resulting source vector is appended to the item vector. Moreover, we will also assume that encountering the same item multiple times across different sources results in multiple source vectors being appended to the same item vector (see Fig. 4).
When an item probe is presented for item recognition, the model ignores the source features, using only the item features to determine if the item was studied in the most current list. As discussed earlier, this is achieved by comparing the features in the probe with those contained in the item traces (see Eq. (2)). In turn, when an item is endorsed as “old”, the model uses the source features of the best-matching trace to determine its source (see Fig. 4). This is achieved by comparing the features in the source probe with those in the source traces of the item as follows:

$$\lambda(i,s,r) = (1-c)^{\,n_q(i,s,r)} \prod_{v=1}^{\infty} \left[\frac{c + (1-c)\,g(1-g)^{v-1}}{g(1-g)^{v-1}}\right]^{\,n_m(v,i,s,r)} \tag{4}$$
where i indexes the best-matching item trace, s the source trace(s), r the source probe, and v the feature values in the source memory trace. Features that contain no information do not contribute to the decision process, as in the case of “old”–“new” judgments. If an item was studied in more than one source, the resulting likelihood ratios λ are averaged and converted into an odds ratio:

$$\Phi_r = \frac{1}{N_s}\sum_{s=1}^{N_s} \lambda(s,r) \tag{5}$$
where Ns is the number of source memory traces appended to the item trace. Like in item recognition, if the odds ratio Φ is higher than a criterion, the source is endorsed; otherwise, it is rejected.
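Under this proposal, a source judgment thus reduces to matching the source probe against the source vector(s) appended to the best-matching item trace (Eqs. (4)–(5)). A minimal sketch, again reusing the likelihood_ratio helper defined earlier, with illustrative parameter values:

```python
def source_decision(source_probe, appended_sources, criterion=1.0, c=0.7, g=0.4):
    """Source judgment sketch: compare the source probe with every source
    vector appended to the best-matching item trace, average the resulting
    likelihood ratios into Phi_r (Eq. (5)), and endorse the source if Phi_r
    exceeds the criterion."""
    lams = [likelihood_ratio(source_probe, s, c=c, g=g) for s in appended_sources]
    phi_r = sum(lams) / len(lams)
    return ("endorse" if phi_r > criterion else "reject"), phi_r
```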
When items are encountered repeatedly, it is assumed that there is an evaluation of whether they were previously encountered and, if so, whether this encounter took place in the same source (see Eqs. (2)–(5)). If the just-encountered source is not deemed to be novel, then the best-matching source trace is updated. Otherwise, a new source trace is appended to the item trace (see Fig. 4). But if an item is deemed to be novel (i.e., not a repetition), a new item trace and an associated source trace are introduced.
Once again, let us walk through the example in Fig. 4:
i. Consider two items studied under different conditions: one involving multiple sources, like apple studied in multiple contexts, such as breakfast, lunch, and dinner, and the other involving a single context, like oatmeal studied in the context of breakfast.
ii. Whenever an item is introduced during the study phase, the model initially evaluates the item probe against the item traces previously stored in memory. It then proceeds to compare the source probe with the source(s) linked to the best-matching item trace. In this example, each repetition of the item, whether within the same context or a different one, results in an update of the correct item trace. This update involves the probabilistic incorporation of new features into the vector – indicated by bold integers. Similarly, if an item is repeated within the same context, the corresponding source trace is probabilistically updated. By contrast, when an item is repeated in a new context, that context is appended to the existing item trace as an additional source vector.
iii. Note that if the reiterated context is not sufficiently similar to what has been stored, an update might not occur. In that case, the reiterated context could be appended to the item vector as if it were a novel context. Similarly, if a novel context closely resembles the stored source traces, it could erroneously update one of the existing source traces. For simplicity, the example assumes that every comparison and update proceeds correctly.
iv. Turning to the test phase, the figure depicts two tasks: (1) item recognition and (2) source recognition, in which an item is presented in one of the potential sources. In the first task, an item probe is presented without an associated context. The model assesses this item probe by comparing it to the existing item memory traces, just as in the original REM model, without considering source traces. In the second task, the model evaluates the source probe by comparing it to the source vector(s) of the best-matching item trace. In either task, the resulting odds ratio is compared with a set criterion for the corresponding item or source judgment.
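The study-phase bookkeeping illustrated in Fig. 4 can be summarized in a single routine. The sketch below is our own schematic reading of that process (the memory data structure, criteria, and parameter values are assumptions), and it reuses the encode_trace, likelihood_ratio, and update_best_matching_trace helpers from the earlier sketches:

```python
import numpy as np

def study_event(item, source, memory, item_criterion=1.0, source_criterion=1.0,
                u=0.3, c=0.7, g=0.4):
    """One study-phase event for an item-source pair (sketch of Fig. 4).
    `memory` is a list of dicts: {'item': item_vector, 'sources': [source_vectors]}.
    A recognized item triggers a source evaluation: a familiar source updates the
    best-matching source trace, a novel source is appended; an unrecognized item
    is stored as a new item trace with one appended source trace."""
    if memory:
        lams = [likelihood_ratio(item, m['item'], c=c, g=g) for m in memory]
        if sum(lams) / len(lams) > item_criterion:            # item judged to be a repetition
            best = memory[int(np.argmax(lams))]
            # strengthen the best-matching item trace (fill in absent features)
            update_best_matching_trace(item, [best['item']], u=u, c=c, g=g)
            src_lams = [likelihood_ratio(source, s, c=c, g=g) for s in best['sources']]
            if sum(src_lams) / len(src_lams) > source_criterion:
                # familiar source: update the best-matching source trace
                update_best_matching_trace(source, best['sources'], u=u, c=c, g=g)
            else:
                # novel source: append a new source vector to the item trace
                best['sources'].append(encode_trace(source, u=u, c=c, g=g))
            return memory
    # novel item: store a new item trace with one appended source trace
    memory.append({'item': encode_trace(item, u=u, c=c, g=g),
                   'sources': [encode_trace(source, u=u, c=c, g=g)]})
    return memory
```

Running this routine over a study list with repeated item-source pairings is what produces the source differentiation discussed next.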
The aforementioned processes of updating and introducing new source traces imply the occurrence of source differentiation (analogous to the differentiation found in item judgments), a prediction that can be empirically tested. In the first series of experiments, we tested this prediction by means of study-repetition manipulations that vary ‘source strength’ while keeping ‘item memory strength’ constant, analogous to some of the previous studies that motivated the development of REM (Ratcliff et al., 1990, Shiffrin et al., 1990). These experiments will enable us to evaluate the empirical adequacy of source differentiation as well as the alternative—although not mutually exclusive—explanation provided by a criterion-shift account (see Cary and Reder, 2003, Stretch and Wixted, 1998, Verde and Rotello, 2007, Starns et al., 2010, Starns et al., 2012).