Word length and frequency effects on text reading are highly similar in 12 alphabetic languages

Cross-linguistic studies of reading that use eye-tracking have so far accounted for a relatively small share of the thriving field of reading research (see the bibliometric analysis by Siegelman et al., 2022). Yet the need for cross-linguistic examination is recognized as essential for identifying universal and specific features of reading behavior, as well as describing the links between that behavior and properties of the reader, the text, the language, and the writing system (Frost, 2012, Li et al., 2022, Liversedge et al., 2016, Perfetti and Harris, 2013, Rayner, 1998, Seidenberg, 2011, see also Chomsky, 1993). Perhaps the most elaborate recent articulation of both the methodological goals and challenges of cross-linguistic research of reading is found in Laurinavichyute et al. (2019), which calls for the development of a common protocol for eye-movement corpora stimuli with a tightly matched coordination and administration of experimental materials, data collection procedures, and statistical analyses. One of the stated goals for this program of research is cross-linguistic quantification of the role played by the three major “benchmark” predictors of oculomotor control during reading – word frequency, word length, and word predictability in context (Laurinavichyute et al., 2019).

These benchmark predictors stand out from a much broader range of the word- and context-level linguistic properties that affect reading behavior (see reviews by Kliegl et al., 2004, Radach and Kennedy, 2013, Rayner, 1998, Rayner et al., 2012, among others). One reason is that they are robustly observed across studies and populations of different age and ability level; see the reviews cited above and early reports (Balota et al., 1985, Drieghe et al., 2005, Inhoff and Rayner, 1986, Rayner and Well, 1996, Schilling et al., 1998, Schroeder et al., 2015, Vitu, 1991, Vonk et al., 2000, Zola, 1984). More frequent, shorter and more predictable words are processed with less cognitive effort (e.g., elicit more skips and fewer regressions, and are fixated fewer times and for shorter durations) than less frequent, longer or less predictable words. The effects of these benchmarks are also robust across languages and writing systems; see, e.g., Laurinavichyute et al. (2019) for a review of evidence from Chinese, Dutch, French, Hindi, Japanese, Spanish, and Uighur. A second, related reason for the focus on word frequency, length, and predictability is that all or some of these three linguistic parameters are part and parcel of all known computational models of reading (e.g., Coltheart et al., 2001, Perry et al., 2007, Seidenberg and McClelland, 1989), including models of eye-movement control in reading (e.g., Engbert et al., 2005, Rayner et al., 2004, Reichle, 2021, Reichle et al., 1998, Reilly and Radach, 2006). A typical model relies on empirical data from one language and seeks to simulate linguistic effects as found in that language. Understanding how the benchmark effects manifest cross-linguistically sheds light on whether or not the model’s architecture can readily generalize over multiple languages without language-specific adjustments.

This paper reports a comparative cross-linguistic study on the presence and variability of two lexical benchmark effects – word frequency and length – on reading in 12 typologically diverse alphabetic languages. In the remainder of the Introduction, we provide a brief overview of research done on lexical benchmark predictors of oculomotor control in reading, highlight the role that they play in several theoretical accounts of word recognition and reading, and define the aims and the analytical approach of this study.

Word frequency of occurrence is an estimate of experience that the reader has with the given word. Perhaps the most common view of the word frequency effect is that it is a learning effect (Brysbaert et al., 2018). The general direction of the word frequency effect is well-attested across languages: The more exposure the reader has to the word, the less effort is required for its recognition (see reviews by Brysbaert et al., 2011, Brysbaert et al., 2018, Kennedy et al., 2013, Rayner, 1998, Rayner and Raney, 1996). Our cross-linguistic inquiry into the magnitude of the word frequency effect on reading behavior is equivalent to asking whether a unit of exposure to the word grants the same processing benefits to readers of different languages. We ask whether the link between reading experience and reading effort is universally strong or is modulated by specifics of the language and the writing system.

Both positions – universality and language-specificity of reading behavior – are supported by prior literature. On the one hand, readers of all writing systems have equal cognitive capacity for learning, perception, inference and equal (neuro)physiological capacity for visual uptake and information processing (Seidenberg, 2011). Also, the process of reading is universal in the sense of inferring meanings from print that encodes oral language (Verhoeven & Perfetti, 2022). This equality is aligned with the null statistical hypothesis: An increase in one unit of word frequency would lead to the same average decrease in the word’s processing effort in adult proficient readers of all languages under comparison.

Alternatively, it is possible that differences in organizational principles of writing systems (e.g., with a unit of writing mapping to a phoneme, a morpheme, a syllable, or a mixture of these units) can affect learning differently (Verhoeven & Perfetti, 2022). Even within alphabetic systems, wide cross-linguistic variability is observed in orthographic depth, i.e., how regular and consistent the mappings are between letters and phonemes of the given written language (Ziegler & Goswami, 2005). A classic example of this variability and its connection to learning is the extreme differences in the accuracy and fluency of word and nonword reading among elementary school students in European countries. In their seminal study of 13 alphabetic languages, Seymour et al. (2003) observed that learners of orthographically deep languages where the mapping between sounds and letters is complex and inconsistent (e.g., English) develop their reading at about half the speed compared to learners of orthographically shallow languages where such mapping is highly regular and simple (e.g., Finnish). Children in Seymour et al.’s study were matched on their grade level and thus were likely to have a similar exposure to printed materials in their respective languages, e.g., similar written word frequency counts. Yet the negative correlation between orthographic depth and basic reading performance in children was observed even when controlling for the years of education and socio-economic status (see also Ziegler et al., 2010). Below we review theoretical accounts that link cross-linguistic variability in orthographic depth with both word frequency and word length effects.

The analytical strategy for a comparative examination of the word frequency effect is as follows. First, we estimate this effect in regression models fit to the joint dataset including all languages: Each model has one of the select eye-movement measures as a dependent variable and several independent variables (defined below). The critical interaction between language and word frequency in the regression model enables us to both quantify the effect size for each language and formally estimate whether this effect size varies across languages. In these comparisons, we place more weight on the practical importance of variability in eye-movement measures rather than its statistical significance, in line with Gelman and Stern (2006). Given the very large sample sizes in our data, the confidence intervals in the models are very narrow and statistically significant differences between compared quantities are at times indicated even when these differences are close to the threshold of the eye-tracker’s accuracy and are of no practical consequence.
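As a minimal sketch of this logic, the Python snippet below fits a separate least-squares line per language and then compares the spread of the resulting slopes against a practical threshold. Separate fits give the same slope estimates as a single model with a full language-by-frequency interaction and no other predictors. The data, the language labels, and the 5 ms threshold are invented for illustration and are not taken from MECO; the models in this study include further predictors that are omitted here.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def slope_spread(data_by_language):
    """Fit one line per language; return the slopes and their max-min spread."""
    slopes = {lang: ols_fit(x, y)[1] for lang, (x, y) in data_by_language.items()}
    return slopes, max(slopes.values()) - min(slopes.values())


# Toy data (hypothetical): log frequency vs. fixation duration in ms.
toy = {
    "lang_a": ([1, 2, 3, 4], [240, 230, 220, 210]),
    "lang_b": ([1, 2, 3, 4], [259, 248, 237, 226]),
}

PRACTICAL_THRESHOLD_MS = 5.0  # assumed cut-off for a practically relevant difference

slopes, spread = slope_spread(toy)
practically_similar = spread < PRACTICAL_THRESHOLD_MS
print(slopes, spread, practically_similar)
```

In this toy case the slopes differ between the two languages, yet the difference stays below the assumed threshold, which is the kind of outcome the text treats as practically unimportant.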

If we observe cross-linguistic differences in effect sizes that are practically unimportant, we conclude that a similar amount of exposure and the respective learning lead to the same advantage in the reading effort for readers of all languages under comparison. In this scenario, limited to proficient readers of alphabetic languages, the impact of word learning on reading behavior would emerge as universal not only at the conceptual level, but also numerically. It would indicate that identical availability of cognitive and physiological resources for readers of all languages affords a similar advantage to linguistic experience, which transcends theoretical and empirical differences between structures of oral and written languages. If cross-linguistic differences in word frequency effects are substantial, we reject the null hypothesis and conclude that the amount of learning through experience is contingent on the individual written language. Exploring this scenario further would involve correlating the effect sizes per language with the language-specific estimates of orthographic depth and other relevant characteristics. A final possibility is that variability in the word frequency effect size across languages is both substantial and non-systematic (random). The nature of this variability would determine what recommendations we can make as to how to adjust the computational model’s parameter for written languages with specific characteristics.

Across all languages studied by researchers of reading, longer words encode more linguistic units (visual features, characters, sounds, or morphemes) and thus take more effort to process, resulting in longer reading times, an increased number of fixations, refixations and regressions, and a lower skipping rate (see reviews by e.g., Barton et al., 2014, Hautala and Loberg, 2015, New et al., 2006). Yet how much additional processing complexity an average character represents in a given written language is strongly determined by the relationship between the language’s orthographic, phonological and morphological units. A typical character encodes a morpheme or a word in a logographic system (e.g., Chinese, Japanese Kanji), a syllable in an alpha-syllabary or abugida system (e.g., Devanagari scripts including Hindi), a consonant in an abjad (e.g., Arabic, Hebrew), or either a consonant or a vowel in an alphabet (e.g., Finnish, English; Daniels & Bright, 1996). The size of the phonological unit that a character encodes in the given system determines both the number of different characters to be learned (the largest in logographic scripts and the smallest in alphabetic) and the orthographic length of the word (the smallest in logographic scripts and the longest in alphabetic), see statistics in Perfetti and Harris (2013).

Furthermore, a long-standing and fruitful research line that focuses on the relationship between oral and written language strongly suggests that cross-linguistic differences reflected in word lengths are not accidental (e.g., Frost, 2012, Perfetti and Harris, 2013, Piantadosi et al., 2011, Seidenberg, 2011, Verhoeven and Perfetti, 2022, Ziegler and Goswami, 2005). Much of this research expresses the idea that “languages get the writing system they deserve” (coinage of this phrase is attributed by Perfetti and Harris, 2013, to Halliday, 1977). Oral and written languages are intertwined in a way that approaches the grapholinguistic equilibrium (Seidenberg, 2011), wherein relative complexity of the oral language (mainly expressed in the complexity of its morphological system) is compensated by relative complexity of decoding the written language (i.e., orthographic depth). Languages with highly developed inflectional morphology (e.g., Turkish, Finnish) have shallow orthographies: The complexity of learning and processing thousands of morphological variants derivable from the same stem is compensated by the ease of deriving sounds of such languages from their print. Conversely, languages with deep orthographies (e.g., English, Chinese) tend to have impoverished morphological systems: The effort of decoding characters into sounds using inconsistent and complex mappings is compensated by the ease of learning a very small pool of morphological variants (e.g., Seidenberg, 2011; see also Frost, 2012, and related comments). Since richness of inflectional morphology often reflects in longer words (due to affixation), the trade-offs described above tie orthographic depth of a written language with orthographic word lengths in that language.

The resulting cross-script and cross-linguistic differences in distributions of word lengths have well-documented implications for attentional and oculomotor aspects of reading, see for instance Liversedge et al. (2024) and Liversedge et al. (2016) for a comparison of Chinese, English, and Finnish. Thus, readers of languages written in, say, the alphabetic Roman script tend to show a larger perceptual span, land further into the word and have longer saccades than the readers of the Hebrew abjad and especially than the readers of logographic scripts like Chinese (for review see e.g., Rayner, 1998). Cross-linguistic differences in word length distributions have behavioral expressions even when considering alphabetic languages only. In this context, we propose a cross-linguistic comparison of the word length effect size as a study of whether letter as a unit of orthographic complexity (and the corresponding phonological and morphological complexity) affects the processing effort in readers of different languages to the same degree.

The present study only examines two types of writing systems – abjad (unpointed Hebrew) where most vowels are not expressed orthographically, and alphabetic (remaining 11 languages in Latin, Cyrillic, or Greek scripts). Yet even within written languages sharing the alphabetic principle, differences in word length distributions are behaviorally consequential. Languages with longer words (e.g., Finnish, Turkish) have lower rates of word skipping (i.e., the proportion of words that are not fixated by the eye gaze during reading; Siegelman et al., 2022) and elicit saccades that are both longer and land further into the word (Kuperman, 2022), as compared to languages with shorter words (e.g., English or Spanish). But behavioral patterns that are driven by the language-wide word distributions are logically and mathematically separate from the behavioral effects related to differences in word lengths within a language. For instance, Finnish has lower word skipping rates than English but one additional letter in a Finnish word may change the skipping rate by the same amount as an additional letter in an English word. Mathematically, this example corresponds to the difference between language-specific intercepts and slopes of the word length effect. Regression analyses in the present study estimate both intercepts and slopes of this effect within and across languages.
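The distinction between intercepts and slopes can be illustrated with a toy linear model of skipping rate. In the sketch below, the two languages, their intercepts, and the shared slope are made-up values chosen only to show that a difference in baseline skipping does not imply a difference in the per-letter effect; none of these numbers is an estimate from this study.

```python
def skipping_rate(length, intercept, slope):
    """Toy linear model: predicted skipping rate for a word of a given length."""
    return intercept + slope * length


def per_letter_change(params, length):
    """Change in predicted skipping rate when one letter is added to a word."""
    return skipping_rate(length + 1, **params) - skipping_rate(length, **params)


# Hypothetical languages: different intercepts, identical slope.
shorter_word_language = {"intercept": 0.60, "slope": -0.05}
longer_word_language = {"intercept": 0.45, "slope": -0.05}

# Baseline skipping differs at every word length (difference in intercepts) ...
baseline_gap = (skipping_rate(5, **shorter_word_language)
                - skipping_rate(5, **longer_word_language))

# ... while an extra letter has the same effect in both (no difference in slopes).
change_a = per_letter_change(shorter_word_language, 5)
change_b = per_letter_change(longer_word_language, 5)
print(baseline_gap, change_a, change_b)
```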

Our present examination follows the same rationale as in the analysis of word frequency. A regression model that contains the critical interaction between word length and language (along with other predictors) enables us to quantify both the estimated word length effect for each language and the degree of cross-linguistic variability in this effect size. As argued above, we place more weight on the practical importance of an effect rather than its statistical significance. If an increase in one letter corresponds to an increase in the processing effort that is similar in practical terms across languages (e.g., a similar increase in reading times, a similar decrease in skipping rate), the conclusion is that orthographic complexity has the same impact on readers of all languages under examination.

Yet, as discussed above, prior literature lends some support for the alternative hypothesis, i.e., the word length effect size varies across languages. First, there may be a correlation between the intercepts and the slopes of the word length effect. It is expected and known that readers of languages that feature longer words spend more effort reading an average word of their language than their counterparts from a shorter-word language (the difference in intercepts). It may also be the case that they need to invest less effort into each additional letter because of their experience with longer words than the readers of shorter-word languages (the difference in slopes). Also, some models of word recognition (discussed in detail below) suggest that the word length effect is stronger in orthographically deep languages compared to languages with shallower orthographies.

As stated above, both word frequency and length effects are part and parcel of leading theories of isolated word recognition and reading. Below, we consider predictions of these models for cross-linguistic variability in benchmark effects and the utility of our analytical approach for validating these models.

Ample empirical evidence suggests that differences in orthographic depth cause the magnitude of the word frequency and length effects to systematically differ between languages (for evidence from children and adults see, e.g., Ellis et al., 2004, Katz and Frost, 1992, Liversedge et al., 2016, Liversedge et al., 2024, Schmalz et al., 2015, Schroeder et al., 2022). Several theories provide a theoretical background for this link. Perhaps the most clear-cut exposition comes from the influential Dual Route Cascaded (DRC) model of visual word recognition and reading aloud (Coltheart et al., 2001). The DRC model posits that readers can access sounds and meanings of known words by “sight”, that is, by looking those up in the long-term lexical memory: This is the lexical route to word recognition. While unknown words are not represented in the reader’s memory and cannot be looked up, they can be decoded from print and read aloud. The sublexical route that underlies this ability relies on the grapheme-to-phoneme conversion rules or statistical correspondences that the reader learns while acquiring the written language. Applying these rules makes it possible to assign a phonological value to unseen orthographic items.

The hallmark of the lexical route is the word frequency effect. Words that are encountered or used more frequently have a higher activation level in one’s memory and are easier to retrieve by sight. In the architecture of the DRC model (Coltheart et al., 2001) and its successor, the connectionist dual process model CDP+ (Perry et al., 2007), frequency of occurrence of each word is stored with the respective lexical node, and thus is only accessible if the reader selects the lexical route to accessing sounds and meanings of the word. Conversely, what characterizes the sublexical route is the word length effect. Longer words have more graphemes to map onto phonemes, typically in complex non-linear ways, and thus elicit more effort. Effects of word frequency and length are therefore critical and separable indices for the proposed two routes of word processing, so much so that their presence in experimental data is commonly taken as a signature of respective component processes of word identification (Hasenäcker and Schroeder, 2022, Hawelka et al., 2010).

Importantly for our purposes, reliance of the reader on one route vs the other is not categorical. The relative use of the two routes has been argued to vary depending on overall language characteristics, and primarily its orthographic depth (Katz and Feldman, 1983, Ziegler et al., 2001). Specifically, the dual-route perspective predicts that orthographically shallower languages typically show stronger word length effects because sublexical information is more reliable and phonological information is easier to retrieve from print. Conversely, word frequency effects and the use of the visual lexical route are stronger in orthographically deep languages, since the sublexical route leads to increased processing effort due to inconsistent and complex letter-phoneme correspondences (see detailed theoretical motivation in Perry & Ziegler, 2002). On this hypothesis, one expects reading behavior to demonstrate cross-linguistically different sizes of the word frequency effect, systematically related to orthographic depths of the written languages under study. Finding no practically important variability in the word frequency effect size in naturalistic reading of connected texts would go against influential models of word recognition based on single-word reading tasks. Similarly, the DRC model of word recognition (Coltheart et al., 2001) proposes word length as a marker of its sublexical route, e.g., serial decoding of unknown or less known words from letters into sounds. Orthographic depth of the written language is argued to shift the reliance on the phonologically driven sublexical route (preferred for more shallow orthographies) and visually driven lexical route (preferred for deeper orthographies, see Perry & Ziegler, 2002). Thus, the DRC model predicts cross-linguistic differences in the word length effect size that are systematically related to orthographic depths of languages under consideration. Whether or not this prediction holds when words are recognized as part of connected text rather than in isolation is a relevant test for the model’s ecological validity.

Moving beyond recognition of isolated words, we expect this cross-linguistic study of the word frequency and length effects to advance computational models of eye-movement control in reading. Such models are typically trained on one language, so it is important to test whether the architectures underlying them can be generalized over other languages and how such generalization would change their parameter space. For example, the E-Z Reader serial-processing model of eye-movement control (Reichle et al., 1998) incorporates word frequency in the estimation of reading times for both the first (L1) and the second (L2) stages of lexical processing for word n that is not predicted from context before it is foveated:

$$t(L_1) = \alpha_1 + \alpha_2 \ln(\mathrm{frequency}_n) + \alpha_3\,\mathrm{predictability}_n$$

$$t(L_2) = \Delta\,[\alpha_1 + \alpha_2 \ln(\mathrm{frequency}_n) + \alpha_3\,\mathrm{predictability}_n]$$

where α1, α2, and α3 are free parameters of the model.
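To make the role of α2 concrete, the two E-Z Reader equations above can be written as a short Python function. This is only a sketch of the formulas as stated in the text, not the model's implementation; the parameter values are illustrative placeholders rather than published estimates. Under the sign convention used here, α2 has to be negative for more frequent words to receive shorter processing times.

```python
import math


def ez_reader_stage_times(frequency, predictability, alpha1, alpha2, alpha3, delta):
    """Return (t_L1, t_L2) for word n, following the two equations in the text."""
    base = alpha1 + alpha2 * math.log(frequency) + alpha3 * predictability
    return base, delta * base


# Placeholder parameter values (assumed for illustration only).
params = {"alpha1": 100.0, "alpha2": -4.0, "alpha3": -40.0, "delta": 0.5}

rare = ez_reader_stage_times(frequency=1.0, predictability=0.0, **params)
common = ez_reader_stage_times(frequency=1000.0, predictability=0.0, **params)
print(rare, common)
```

A language-specific adjustment of the frequency effect would amount to passing a different `alpha2` for each language while keeping the rest of the function unchanged.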

The parallel-processing model of eye-movement control SWIFT (Engbert et al., 2005) employs a different architecture but assigns a similar role to word frequency as a predictor of word difficulty (Ln):

$$L_n = \alpha\left(1 + \beta\,\frac{\log(\mathrm{frequency}_n)}{F}\right)$$

where α is the intercept, β the coefficient of the frequency effect slope, and F a scaling constant.
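The SWIFT word difficulty formula can be sketched in the same way. The snippet assumes that the scaling constant F divides the log-frequency term and that the logarithm is natural; the text specifies neither, and the parameter values are placeholders chosen for illustration.

```python
import math


def swift_word_difficulty(frequency, alpha, beta, scale_f):
    """Word difficulty L_n = alpha * (1 + beta * log(frequency_n) / F)."""
    return alpha * (1.0 + beta * math.log(frequency) / scale_f)


# Placeholder values (assumed): a negative beta lowers difficulty as frequency grows.
low_freq = swift_word_difficulty(1.0, alpha=60.0, beta=-0.5, scale_f=10.0)
high_freq = swift_word_difficulty(math.exp(10.0), alpha=60.0, beta=-0.5, scale_f=10.0)
print(low_freq, high_freq)
```

Here β plays the role that α2 plays in the E-Z Reader sketch: it is the single slope parameter whose cross-linguistic stability is in question.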

Within these two (and mathematically similar) architectures, the present investigation can help determine whether a single value for the E-Z Reader model’s parameter α2 or the SWIFT model’s parameter β will be satisfactory for every alphabetic language that the respective model is applied to, or whether each written language will require an individual parameter adjustment. It can also suggest whether such an adjustment is systematic, possibly representing a function of the language’s orthographic depth or other structural characteristics, see argument above.

Furthermore, both the serial-processing architecture (E-Z Reader) and the parallel-processing one (SWIFT) incorporate word length in their calculations of eccentricity (distance) of a word to the current fixation position (equations not shown in the interest of space). In both architectures, eccentricity determines availability of the word’s letters for a foveal inspection or, for more remote letters, a parafoveal preview and contributes to the latency of word recognition and saccadic planning and execution. Also, in both architectures the effect of word length-related eccentricity is modulated by one (E-Z Reader) or two (SWIFT, which has separate parameters for left and right eccentricity) free parameters. Thus, if the respective models were to generalize over multiple languages, this study can determine whether these parameters can have constant values or need to be adjusted (possibly, in a systematic way) for each individual language.

This paper reports a cross-linguistic study of two lexical benchmark predictors – word frequency and length – on reading in 12 typologically diverse alphabetic languages, based on the Multilingual Eye Movement Corpus (MECO, see Siegelman et al., 2022). Specifically, we make use of the MECO eye-movement record on text reading in the participant’s first language (L1). At the core of this paper are comparative analyses of the effects that word frequency and length demonstrate in eye-movements during reading within and across languages. We tackle these two benchmark effects because they go to the core of the coordinated interplay between physiological, psychological and cognitive abilities and processes implicated in reading. Word frequency effect is a metric of the reader’s familiarity with the word and the amount of learning that the reader may have had through exposures to the word. Word length effect taps into the impact that linguistic complexity (approximated through the number of orthographic units) has on the processing effort. In this regard, the project is a descendant of the foundational empirical work initiated in the field of eye-movement control in reading some 40–50 years ago and further replicated and developed since. The early studies (including those cited above) aimed to establish whether lexical properties such as word length and frequency influence oculomotor patterns when reading written language, which required demonstrating that oculomotor control has a cognitive component and is partly driven by the properties of the read text. This influence has been firmly established. The current paper contributes to a more recent research program (Laurinavichyute et al., 2019, Li et al., 2022, Liversedge et al., 2016; Liversedge et al., 2024), which gives this inquiry a cross-linguistic spin. 
We ask questions about the interplay between characteristics of the reader, the read text and the language: Do structural properties of the written language modulate the impact of experience and of linguistic complexity on the readers of that language? If they do, what are these properties?

To our knowledge, this question has not yet been asked in the eye-tracking literature. Yet it has consequences both for attaining cross-linguistically generalizable computational models of eye-movement control in reading and telling apart the universal and language-specific facets of the link between language structure and language behavior (Frost, 2012, Li et al., 2022, Liversedge et al., 2016, Perfetti and Harris, 2013, Seidenberg, 2011, Siegelman et al., 2022). In particular, we aim to supply the modeling field with numeric estimates of effect sizes which can determine whether the parameter space defined in the current models requires a language-specific adjustment or may remain the same. This study is also relevant for testing cross-linguistic generalizability of influential models of word recognition (Coltheart et al., 2001, Perry and Ziegler, 2002) using both more ecologically valid stimuli than isolated words and a larger range of written languages.

In addition, we argue that this multi-lab project satisfies the methodological requirements summarized in Laurinavichyute et al. (2019) by using highly comparable populations (undergraduate students in respective countries), texts (either translated or controlled for genre, readability and complexity), as well as the shared procedure (with minor variations) of data collection, automated data cleaning, and data processing and analysis, see the Methods for details. As a by-product of this study, we supply the open-access MECO database with estimates of length and frequency for all words in the 12 language samples.
