What could have been said? Alternatives and variability in pragmatic inferences

Interpreting natural language meanings often involves interpreting not only what was said, but also what was left unsaid. A classic example of this phenomenon is scalar inference (SI), exemplified by an utterance such as Mary ate some of the cookies.

The literal meaning of such an utterance is that Mary ate some, and possibly all, of the cookies. But the utterance also brings to mind an alternative that could have been said: Mary ate all of the cookies. This serves as an alternative because some and all form a scale: all is informationally stronger than some, since a sentence like Mary ate all of the cookies entails Mary ate some of the cookies, but not vice versa (Horn, 1972). Hearers assume that speakers are trying to be maximally informative (following the Maxim of Quantity); therefore, if the alternative Mary ate all of the cookies were true, the speaker would have said that. Because she did not say it, hearers can infer its negation (following the Maxim of Quality). This reasoning process, combined with the utterance’s literal meaning, leads to the SI-enriched meaning Mary ate some, but not all, of the cookies (Grice, 1967).
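The reasoning just described can be illustrated with a toy possible-worlds sketch. This is not the authors' model: it assumes a hypothetical domain of four cookies, treats each world as the number of cookies Mary ate, and simply removes the worlds of every strictly stronger alternative from the literal meaning. It omits the epistemic steps (e.g., the assumption that the speaker is knowledgeable) that a full Gricean derivation requires.

```python
from typing import Callable, FrozenSet, List

# Hypothetical toy domain: world k means "Mary ate k of the cookies".
N_COOKIES = 4
WORLDS = range(N_COOKIES + 1)

Meaning = Callable[[int], bool]


def some(k: int) -> bool:
    """Literal 'some': at least one cookie (compatible with 'all')."""
    return k >= 1


def all_(k: int) -> bool:
    """'all': every cookie in the domain."""
    return k == N_COOKIES


def denotation(meaning: Meaning) -> FrozenSet[int]:
    """The set of worlds in which a sentence meaning is true."""
    return frozenset(w for w in WORLDS if meaning(w))


def entails(p: Meaning, q: Meaning) -> bool:
    """p entails q iff every p-world is also a q-world."""
    return denotation(p) <= denotation(q)


def strengthen(utterance: Meaning, alternatives: List[Meaning]) -> FrozenSet[int]:
    """Literal meaning minus the worlds of every strictly stronger alternative."""
    result = denotation(utterance)
    for alt in alternatives:
        # An alternative is strictly stronger if it entails the utterance, not vice versa.
        if entails(alt, utterance) and not entails(utterance, alt):
            result = result - denotation(alt)
    return result


if __name__ == "__main__":
    print(sorted(denotation(some)))          # literal meaning: [1, 2, 3, 4]
    print(sorted(strengthen(some, [all_])))  # SI-enriched meaning: [1, 2, 3]
```

Because all entails some but not vice versa, only all counts as a stronger alternative here; strengthening all with some as its alternative leaves the meaning of all unchanged.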

Going beyond the classic <some, all> example, a growing body of work examines a wider range of lexical items that form a scale and can potentially give rise to SI. One such example, based on the <good, excellent> scale, is The movie is good.

Like Mary ate some of the cookies, The movie is good can lead hearers beyond its literal meaning (that the movie is at least good) via the same process of reasoning about an informationally stronger alternative. Specifically, hearers may reason about the stronger alternative The movie is excellent; because the speaker chose not to say this alternative, hearers can conclude that she must believe it to be false, which yields the SI The movie is good but not excellent. However, an influential experimental finding in recent literature has revealed that lexical scales such as <some, all> and <good, excellent> differ substantially in how likely they are to lead to SI calculation: hearers calculate the some but not all SI much more robustly than the good but not excellent SI. In the first large-scale investigation of this inter-scale variation, van Tiel et al. (2016) tested 43 different lexical scales and found that the rate of SI calculation ranged from 4% to 100% (see also Baker et al., 2009, Beltrama and Xiang, 2013, Doran et al., 2012, Simons and Warren, 2018).

The finding of such robust variation is significant because it challenges the so-called uniformity assumption (van Tiel et al., 2016 p. 139): a (tacit) assumption in prior literature that since all instances of SI are derived via the same mechanism (e.g., hearers’ reasoning that an informationally stronger unsaid alternative is not true), there should be no differences across scales. As van Tiel et al. (2016) have shown, however, instead of uniformity we find scalar diversity. The finding of scalar diversity has since given rise to a research program that seeks to identify parameters along which lexical scales differ, which can then predict how likely a scale is to lead to SI calculation and ultimately explain inter-scale variation. Factors that have been put forward to explain scalar diversity make reference to properties of scales such as the distinctness of the weaker and stronger scale-mates (van Tiel et al., 2016), their semantic relatedness (Westera & Boleda, 2020), the polarity of adjectival scales (Gotzner et al., 2018), or whether the stronger alternative is an extreme adjective (Beltrama and Xiang, 2013, Gotzner et al., 2018). Scalar diversity has also been related to other semantic-pragmatic processes such as negative strengthening (Gotzner et al., 2018) or propensity for local enrichment (Sun et al., 2018), or to properties of the context, broadly construed (Pankratz and van Tiel, 2021, Ronai and Xiang, 2021a). However, most of the predictors identified in previous studies still explain only a small proportion of the “diversity”. For example, van Tiel et al. found that the two components of distinctness, semantic distance and boundedness, accounted for 3% and 10% of the observed variance, respectively.
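To make the variance figures concrete: for a single predictor x and outcome y, the proportion of variance explained by a least-squares line is the squared Pearson correlation, R² = S_xy² / (S_xx · S_yy), where S_xy is the sum of cross-products of deviations from the means and S_xx, S_yy are the sums of squared deviations. The sketch below computes this in plain Python. It is a generic illustration assuming one predictor and ordinary least squares; van Tiel et al.'s actual statistical models may differ, and the toy values are invented.

```python
from statistics import mean
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Share of the variance in y explained by a least-squares line on x."""
    mx, my = mean(x), mean(y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    s_xx = sum((xi - mx) ** 2 for xi in x)
    s_yy = sum((yi - my) ** 2 for yi in y)
    return (s_xy ** 2) / (s_xx * s_yy)


if __name__ == "__main__":
    # Hypothetical toy values, not data from any study.
    predictor = [1.0, 2.0, 3.0]
    si_rates = [1.0, 3.0, 2.0]
    print(r_squared(predictor, si_rates))  # 0.25, i.e. 25% of the variance
```

On this reading, a predictor that "accounts for 3% of the variance" has R² = 0.03 and leaves the remaining 97% unexplained.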

One interesting hypothesis about scalar diversity builds on the observation that there is substantial uncertainty regarding alternatives. To derive an SI from an utterance, hearers need to identify what counts as a relevant alternative to what was uttered, and they also need good reasons to believe that the stronger alternative should be ruled out. Several authors have observed that it is not always obvious which unsaid alternatives are relevant, that is, whose negation can be inferred. Van Tiel et al. (2016) originally framed this hypothesis in terms of alternative availability: for SI to arise, it has to be the case that the stronger alternative was available to the speaker, so that she could have actually considered using it. While none of the empirical measures of alternative availability considered by van Tiel et al. turned out to be significant predictors of SI rates across scales, later work has provided supporting evidence for the role of alternative availability or uncertainty. In a recent study, Hu et al. (2023) (see also Hu et al., 2022) used large language models to test the hypothesis that hearers maintain cue-based expectations over alternatives (Degen and Tanenhaus, 2015, Degen and Tanenhaus, 2016). They found variation in how expected an alternative is as a scale-mate, and that these differences can predict both intra-scale variation (Degen, 2015) and, most importantly for the present paper, inter-scale variation in SI rates. Concretely, the authors show that the more expected the stronger scale-mate is, given a weaker scalar term and the sentential context, the higher the SI rate. Similarly, Ronai and Xiang (2022) have provided experimental evidence for van Tiel et al.’s notion of alternative availability. The authors conducted a cloze task in a discourse context, where participants saw dialogues such as “A: The movie is good.”; “B: So you mean it’s not BLANK.” and were asked to fill in the blank.
The frequency with which the stronger alternative (here, excellent) was provided was taken to index the accessibility of alternatives, that is, how strongly the weaker scalar term evokes the stronger alternative. Inter-scale variation in alternative accessibility was shown to predict inter-scale variation in SI rates. Moreover, Hu et al. (2023) found that these experimental data were significantly correlated with language model-based measures of alternative uncertainty, suggesting that “models and humans are aligned at the level of predictive distributions over alternatives” (p. 8).
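The cloze-based measure can be sketched as a simple proportion per scale, which can then be correlated with SI rates across scales. The responses and SI rates below are hypothetical placeholders rather than data from Ronai and Xiang (2022), and a plain Pearson correlation stands in for whatever analysis the original studies used.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def accessibility(responses: List[str], stronger: str) -> float:
    """Share of cloze completions that name the stronger scale-mate."""
    counts = Counter(r.strip().lower() for r in responses)
    return counts[stronger.lower()] / len(responses)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical completions of "So you mean it's not BLANK." per <weak, strong> scale.
CLOZE: Dict[Tuple[str, str], List[str]] = {
    ("some", "all"): ["all", "All", "all", "every"],
    ("good", "excellent"): ["excellent", "great", "Excellent", "amazing"],
    ("warm", "hot"): ["hot", "boiling", "scalding", "cold"],
}
# Hypothetical SI rates for the same scales.
SI_RATES = {("some", "all"): 0.9, ("good", "excellent"): 0.5, ("warm", "hot"): 0.3}

if __name__ == "__main__":
    scales = list(CLOZE)
    acc = [accessibility(CLOZE[s], s[1]) for s in scales]
    rates = [SI_RATES[s] for s in scales]
    print(dict(zip(scales, acc)))
    print(round(pearson(acc, rates), 3))
```

Because the proportion is taken over all responses, completions that name some other word (e.g., great or boiling) lower a scale's accessibility score.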

Informative as they are, previous studies on how alternative uncertainty impacts inference calculation still leave open questions about the sources of uncertainty. One possibility we will examine in the current paper is that contextual relevance constrains the space of possible alternatives, and thus directly modulates the extent of scalar diversity. As we will discuss below, the effect of context is well-documented for SI calculation in general (Degen, 2013, Degen, 2015, Matsumoto, 1995, Van Kuppevelt, 1996). With respect to scalar diversity, there have also been suggestions that alternative uncertainty is contextually driven. McNally (2017), for instance, notes that due to polysemy, there might be context-based variation in what counts as a stronger alternative. While <warm, hot> form a scale based on asymmetric entailment, and hot is indeed a relevant stronger alternative to warm in the context of The weather is warm, this might not always be the case for The soup is warm (McNally, 2017, pp. 23–24). This is because, as McNally argues, in the latter case the adjective warm can be interpreted as referring to a kind of soup, in contrast to cold soups as a kind. Consequently, The soup is warm here means that it is a soup consumed warm (or possibly hot), not cold. Since hot is not necessarily a relevant alternative to warm in this case, the not hot SI may not arise. An informative study here is Pankratz and van Tiel (2021), which examined the contextual relevance of different SIs and found it to be a predictor of scalar diversity. The authors developed a corpus-based measure of relevance, whereby the more relevant an SI is, the more likely it is to occur in so-called scalar constructions, e.g., good but not excellent, good rather than excellent, or good, if not excellent.
Since these all contain explicit mention of the stronger alternative, Pankratz and van Tiel’s measure is likely also tapping into the contextual relevance of the alternatives themselves; in fact, Hu et al. (2023) also relied on scalar constructions in their operationalization of alternative uncertainty. Further, even though Pankratz and van Tiel’s work aims to model “general relevance”, it follows the usage-based assumption that this can be approximated by averaging over different individual contexts found in a corpus. Their findings can therefore be interpreted as evidence that the contextual relevance of alternatives matters for (the variation in) SI calculation.
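As a rough illustration of a construction-based measure, one could count how often a <weak, strong> pair appears in the three scalar constructions named above. Normalizing by the frequency of the weaker term is an assumption of this sketch, and Pankratz and van Tiel's (2021) actual measure may be computed differently; the toy corpus is invented.

```python
import re

# Templates for the scalar constructions mentioned in the text.
# {w} is the weaker scalar term, {s} the stronger one.
TEMPLATES = [
    r"\b{w}\s+but\s+not\s+{s}\b",
    r"\b{w}\s+rather\s+than\s+{s}\b",
    r"\b{w},\s*if\s+not\s+{s}\b",
]


def construction_rate(corpus: str, weak: str, strong: str) -> float:
    """Scalar-construction hits per occurrence of the weaker term (assumed normalization)."""
    text = corpus.lower()
    w, s = re.escape(weak.lower()), re.escape(strong.lower())
    hits = sum(len(re.findall(t.format(w=w, s=s), text)) for t in TEMPLATES)
    weak_count = len(re.findall(rf"\b{w}\b", text))
    return hits / weak_count if weak_count else 0.0


if __name__ == "__main__":
    # Invented toy corpus, for illustration only.
    corpus = (
        "The film was good but not excellent. The food was good. "
        "Service was good, if not excellent. The wine was good rather than excellent."
    )
    print(construction_rate(corpus, "good", "excellent"))  # 3 hits / 4 uses of "good" = 0.75
```

Since every template names the stronger term explicitly, a high score reflects the salience of the alternative as much as the relevance of the SI, which is the point made in the paragraph above.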

In this paper, we approach the issue of alternative uncertainty by manipulating the Question Under Discussion (QUD, Roberts, 1996/2012) prior to a target utterance. An explicit QUD makes the contextually relevant alternatives highly salient to comprehenders and therefore reduces the uncertainty associated with the identity of alternatives. As we will show in our experiments, this indeed substantially reduces the inter-scale variation in SI calculation. Interestingly, however, reducing uncertainty about which alternatives are relevant does not completely eliminate scalar diversity, which raises questions about other sources of uncertainty. We argue that even when relevant stronger alternatives are made salient, the step of excluding those stronger alternatives does not automatically follow. As mentioned, under standard (neo-)Gricean accounts of SI, a hearer reasons about the relevant informationally stronger alternatives and then, in the appropriate contexts, makes the pragmatic move to exclude those alternatives. But depending on the context, there are other legitimate pragmatic moves that would not require the exclusion of the stronger alternative. That is to say, for the pragmatic calculation of SI, there could be uncertainty associated with both the identity of the relevant alternatives and the necessity of excluding stronger alternatives. To examine whether reinforcing the step of alternative exclusion would also lead to reduced scalar diversity, we make use of the focus particle only, which grammatically requires the exclusion of alternatives (Krifka, 1999, Rooth, 1985, Rooth, 1992). We show that when both types of uncertainty are removed, scalar diversity can indeed be reduced to a minimum. In the next section, we elaborate on our two experimental manipulations that aim to reduce uncertainty.
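The two-step logic can be made concrete with a deliberately simplified toy model. Purely for illustration, it assumes that the probability of an SI factors into the probability of identifying the stronger alternative and the probability of then excluding it. An explicit QUD is modeled as fixing the first factor at 1, and only as fixing the second at 1. The scale parameters are hypothetical, and this is not the model tested in the experiments.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List


@dataclass
class Scale:
    name: str
    p_identify: float  # hypothetical: chance the stronger alternative is considered
    p_exclude: float   # hypothetical: chance it is then excluded


def si_rate(scale: Scale, explicit_qud: bool = False, with_only: bool = False) -> float:
    """Toy SI rate = P(identify alternative) * P(exclude it)."""
    p_id = 1.0 if explicit_qud else scale.p_identify
    p_ex = 1.0 if with_only else scale.p_exclude
    return p_id * p_ex


SCALES: List[Scale] = [
    Scale("<some, all>", 0.9, 0.95),
    Scale("<good, excellent>", 0.4, 0.6),
    Scale("<warm, hot>", 0.3, 0.5),
]

if __name__ == "__main__":
    for label, qud, only in [("baseline", False, False),
                             ("QUD", True, False),
                             ("QUD + only", True, True)]:
        rates = [si_rate(s, qud, only) for s in SCALES]
        # Variance across scales serves as a crude index of scalar diversity.
        print(label, [round(r, 3) for r in rates], round(pvariance(rates), 4))
```

In this idealized model, diversity vanishes entirely once both factors are fixed at 1, whereas the claim above is only that it is reduced to a minimum; any residual variation in real data would have to come from sources the toy model leaves out.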
