Repeated rock, paper, scissors play reveals limits in adaptive sequential behavior

People’s ability to adapt to others in adversarial interactions lies at the heart of sports and games and is a hallmark of popular stories; at a larger scale, it is crucial to negotiations and international relations. In these settings, examples of people’s creativity, flexibility, and strategic sophistication abound. For instance, tennis star Andre Agassi famously beat world-class opponent Boris Becker by recognizing that every time he served the ball, Becker unknowingly stuck his tongue out in the direction he was about to serve.1 On the other hand, adversarial reasoning often poses a number of challenges such as remembering previous decisions (Rapoport & Budescu, 1997), recursive reasoning about others (Stahl & Wilson, 1995), or planning many steps ahead (van Opheusden, Kuperwajs, Galbiati, Bnaya, Li, & Ma, 2023). These challenges have allowed artificial intelligence systems to beat human competitors in a range of adversarial settings, even those once thought to be far beyond the reach of strategic algorithms (Silver et al., 2016). How do people construct predictive models of an opponent in adversarial settings and what are the limitations of this ability? In the current work, we investigate a fundamental component of this problem: the ability to recognize exploitable patterns in an opponent’s actions over time. We ask what sort of structured behavior patterns people can exploit in others and which of these same patterns people can change in their own behavior to avoid exploitation.

The problem of predicting those around us extends beyond adversarial interactions; it is a core part of everyday social behavior. To make sense of others’ actions, people rely on an intuitive theory of other minds in which behavior is understood as arising from an actor’s underlying goals, desires, and beliefs about the situation (Dennett, 1989, Goodman et al., 2006, Gopnik and Wellman, 1992, Premack and Woodruff, 1978). Recent work has argued that a core mechanism for theory of mind reasoning is a principle of efficient action (Baker et al., 2017, Baker et al., 2009); we predict that others will act in ways that maximize rewards while minimizing costs and we infer their corresponding goals and beliefs on the assumption that they choose utility-maximizing actions (Jara-Ettinger et al., 2016, Jara-Ettinger et al., 2020). While this work has primarily involved predicting and reasoning about behavior in cooperative settings, the assumption that others are approximately rational planners can also facilitate determining who is a friend and who is a foe in settings where helping and hindering are both possible (Kleiman-Weiner et al., 2016, Serrino et al., 2019, Ullman et al., 2009).

However, adversarial contexts in which one person seeks to gain at the other’s expense pose a challenge for traditional theory of mind reasoning. In its simplest form, this challenge is often explored using zero-sum economic games (Schelling, 1960). In these settings, an actor’s utility-maximizing choice is tied to what they believe their opponent will do; but the opponent’s strategic choice depends in turn on the predictions they make about their own opponent. In cooperative tasks this reasoning may allow people to quickly align on their best option, but in adversarial interactions, this process of guessing the opponent’s beliefs can be infinitely recursive (Binmore, 1987, Stahl and Wilson, 1995). A standard solution to this problem has been to propose that people rely on limited levels of recursive reasoning about an opponent’s beliefs when trying to predict them (for example, what does my opponent think that I think is the best move?). Models in which individual decision-makers are presumed to vary in their depth of recursion capture behavior in a range of adversarial games and suggest that people rarely reason past one or two levels of recursion (Camerer et al., 2003, Camerer et al., 2004, Costa-Gomes et al., 2001, Stahl and Wilson, 1995).
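Level-k reasoning can be sketched directly in rock, paper, scissors. The sketch below assumes, purely for illustration, that a level-0 player plays "rock". Each higher level best responds to the level below it. The helper name `level_k_move` and the `COUNTER` table are hypothetical and are not taken from the cited models.

```python
# COUNTER[m] is the move that beats m.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def level_k_move(k: int, level0_move: str = "rock") -> str:
    """Move chosen by a level-k reasoner in rock, paper, scissors.

    A level-0 player is assumed (for illustration only) to play level0_move;
    a level-k player (k >= 1) best responds to a level-(k-1) opponent.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    move = level0_move
    for _ in range(k):
        move = COUNTER[move]  # best response to the level below
    return move


for k in range(4):
    print(k, level_k_move(k))  # level 3 cycles back to the level-0 move
```

Rock, paper, scissors forms a cycle of three moves, so the level-3 move is the same as the level-0 move. Deeper recursion alone therefore never settles on a single stable answer.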

These results further suggest that when trying to anticipate an opponent’s behavior, there may be persistent features of their adversarial reasoning such as their recursion depth that support future prediction (Guennouni and Speekenbrink, 2022, Ho et al., 1998). Broadly, repeated games, in which players face off against a stable opponent over multiple rounds, present an opportunity for each player to learn a predictive model of their opponent based on past behavior (Akata et al., 2023, Aumann, 1985, Camerer, 2011, Mertens, 1990). While repeated games have been critical to theories of human cooperation (Axelrod, 1984, Mertens, 1990, Rand and Nowak, 2013), they may also inform our understanding of adversarial opponent modeling. For example, Guennouni and Speekenbrink (2022) find evidence that people not only infer an opponent’s recursion depth over repeated interactions but use this information to predict their later actions in a similar game. Further, paired with a stable opponent in games with varying reward structures (for example, Prisoner’s Dilemma and Stag Hunt), people infer abstract motives such as greed and risk-aversion that allow them to predict their opponent’s moves across games (van Baar, Nassar, Deng, & FeldmanHall, 2022). Thus, when trying to predict an opponent’s behavior in repeated adversarial interactions, people may draw on generalizable features of their prior actions (such as their recursion depth or greediness) that support prediction in the current context.

However, abstractions such as recursion depth are not the only source of predictive information available to reasoners trying to outwit an opponent. In repeated games, a key source of predictive structure may come from sequential patterns in an opponent’s actions themselves. Consider the example at the outset in which Andre Agassi learned to predict the direction of Boris Becker’s serve using a reliable cue in the moments before serving. In fact, efforts to predict an opponent using sequential patterns in their behavior—and corresponding counter-measures to evade such prediction—are widespread at the highest levels of competition: baseball players look for patterns in how a pitcher will throw the ball based on their previous pitches, poker players make inferences about other players’ hands from their nonverbal behaviors, and hockey players watch for telltale signals that players on the other team may be nursing injuries based on the way they skate. In any setting where behavior can be predicted using patterns in prior actions, efforts to detect and exploit this signal in an opponent’s behavior while avoiding it in one’s own actions may be central to adversarial reasoning.2

This ability to learn from structured, sequential patterns has been studied in a range of domains outside adversarial reasoning about others. For instance, early language acquisition is thought to be supported by statistical learning of transition patterns between speech sounds, beginning in infancy (Saffran et al., 1996, Saffran et al., 1999); adults exhibit similar learning in more abstract auditory and visual domains (Frost et al., 2019, Turk-Browne et al., 2005). Given repeated practice, people can learn motor patterns consisting of as many as 10 items without advance knowledge that the stimulus contains a repeating pattern (Clegg et al., 1998, Nissen and Bullemer, 1987). Most importantly, our commonsense understanding of everyday behavior seems to draw in part on observable sequential patterns. First, people have an intuitive understanding of when an action is best explained by habit, and this understanding relies on the frequency of the behavior and on whether it is adaptive in the context at hand (Gershman, Gerstenberg, Baker, & Cushman, 2016). Beyond tracking mere repetition of behavior, people are able to predict others’ actions and emotional states on the basis of previous actions and emotions (Thornton and Tamir, 2017, Thornton and Tamir, 2021). Thus, the ability to predict actions and events in the world around us using learned sequential patterns has been richly investigated in a range of cognitive domains. However, the role of such inferences in adversarial reasoning remains poorly understood. What kinds of sequential patterns can people use to exploit an opponent, and how well can people avoid such patterns in their own actions?

Mixed strategy equilibrium (MSE) games offer an ideal paradigm for investigating the ability to adapt to an opponent based on their sequential actions (Brockbank and Vul, 2021a, Guennouni and Speekenbrink, 2022, Shachat and Todd Swarthout, 2004, Spiliopoulos, 2013). In MSE games, every action can be beaten by some other action, so no single action is best regardless of what the opponent does. A player who persists predictably with any one action or sequence of actions can therefore be exploited based on that predictability. For this reason, Nash equilibrium (Nash, 1950) play in repeated MSE games entails choosing among available moves randomly; any non-random dependency in one’s behavior is exploitable by a rational opponent. The most well-known MSE game, and the one we focus on in the current work, is the children’s game of rock, paper, scissors (RPS), in which “rock” loses to “paper”, “paper” loses to “scissors”, and “scissors” loses to “rock”. In repeated games of RPS, a player’s task is to develop a predictive model of their opponent; if the opponent chooses uniformly at random, this is (by definition) impossible in the long run. But against any non-random opponent, the ability to win systematically depends on how well the player can identify exploitable patterns in their opponent’s moves while avoiding similar exploitation based on their own choices (Brockbank and Vul, 2021a, Guennouni and Speekenbrink, 2022).
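A short calculation shows why uniform play cannot be exploited and biased play can. The sketch below assumes +1/0/-1 scoring for a win, tie, or loss. It also uses a hypothetical opponent who plays "rock" in 70% of rounds and splits the remaining rounds evenly between "paper" and "scissors". Against uniform play every response earns zero on average. Against the biased opponent, "paper" earns 11/20 points per round.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
# COUNTER[m] is the move that beats m.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def payoff(mine: str, theirs: str) -> int:
    """Assumed scoring: +1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if COUNTER[theirs] == mine else -1


def expected_payoffs(opponent_mix: dict) -> dict:
    """Expected payoff of each pure response against an opponent's mixed strategy."""
    return {
        m: sum(p * payoff(m, o) for o, p in opponent_mix.items())
        for m in MOVES
    }


uniform = {m: Fraction(1, 3) for m in MOVES}
# Hypothetical biased opponent: "rock" in 70% of rounds, the rest split evenly.
biased = {"rock": Fraction(7, 10), "paper": Fraction(3, 20), "scissors": Fraction(3, 20)}

print(expected_payoffs(uniform))  # every response earns 0 on average
print(expected_payoffs(biased))   # "paper" earns 11/20 per round
```

Exact fractions are used so that the zero expected payoff under uniform play is exact rather than a floating-point approximation.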

A large body of work has sought to characterize the patterned regularities people exhibit when attempting to behave randomly. When asked to generate example sequences of random events like coin tosses or to evaluate sequences for their randomness, people appear to rely on simple biases like an over-representation of alternations relative to repeats (Bar-Hillel and Wagenaar, 1991, Lopes and Oden, 1987, Tversky and Kahneman, 1972). These biases of subjective randomness are so ingrained that they arise even in decisions by professional athletes, who are highly incentivized to avoid such predictability (Palacios-Huerta, 2003, Walker and Wooders, 2001). Given these findings, it is perhaps unsurprising that in repeated MSE games, people show the same underlying biases in move selection (Budescu and Rapoport, 1994, Rapoport and Budescu, 1992). In fact, recent work has shown that the patterns exhibited by human players paired with other humans over many rounds of rock, paper, scissors extend far beyond those associated with subjective randomness; people’s moves show predictable regularity based not only on their own previous moves, but also on their opponent’s moves and on previous outcomes (Batzilis et al., 2019, Brockbank and Vul, 2020). Because people exhibit robust sequential patterns in rock, paper, scissors and other MSE games, these games offer an ideal venue in which to explore the corresponding ability to adapt to and exploit such patterns in adversarial interactions.
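The simplest of these biases, over-alternation, can be measured directly. With three possible moves, a player choosing uniformly at random repeats their previous move in about one third of rounds. A repeat rate well below 1/3 therefore signals over-alternation. The function name `repeat_rate` and the example sequence are hypothetical.

```python
import random

MOVES = ("rock", "paper", "scissors")
CHANCE_REPEAT = 1 / 3  # expected repeat rate for uniform random play over three moves


def repeat_rate(moves: list) -> float:
    """Fraction of consecutive move pairs in which the previous move is repeated."""
    if len(moves) < 2:
        raise ValueError("need at least two moves")
    repeats = sum(a == b for a, b in zip(moves, moves[1:]))
    return repeats / (len(moves) - 1)


# Hypothetical human-like sequence that alternates more than chance would predict.
human_like = ["rock", "paper", "scissors", "rock", "scissors", "paper", "paper", "rock"]

# Simulated uniform random player for comparison (seeded for reproducibility).
rng = random.Random(0)
random_seq = [rng.choice(MOVES) for _ in range(30000)]

print(f"human-like: {repeat_rate(human_like):.3f}")  # 1 repeat in 7 pairs
print(f"random:     {repeat_rate(random_seq):.3f}")  # close to CHANCE_REPEAT
```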

However, the patterns people detect in others’ actions or their own can be hard to isolate in play between humans, because both players continually adapt to each other (Spiliopoulos, 2013). Instead, repeated matches between humans and algorithmic bot opponents that exhibit stable patterns in their moves provide a controlled environment for testing people’s ability to exploit such patterns (Brockbank and Vul, 2021a, Zhang et al., 2021). For example, when paired with rock, paper, scissors opponents that favor a particular move (e.g., playing “rock” in 70% of rounds), people typically learn to exploit them so long as the bias is sufficiently strong (Kangas et al., 2009, Lie et al., 2013). Recent work has shown that people can also exploit opponents whose choices depend on their own previous move or on their human opponent’s previous move (Guennouni and Speekenbrink, 2022, West and Lebiere, 2001). Beyond previous moves alone, people are sensitive to the role that the prior outcome plays in determining an opponent’s moves (Dyson et al., 2020, Zhang et al., 2021), though the adaptation to these patterns observed in prior work has been somewhat limited. Finally, this behavior appears to reflect ongoing adaptive reasoning, even as opponent strategies or the games themselves change (Guennouni and Speekenbrink, 2022, Stöttinger et al., 2014). Taken together, these results indicate that in repeated MSE games like rock, paper, scissors, people exhibit flexible strategic reasoning aimed at exploiting patterns in opponent behavior.
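A small simulation illustrates the logic of these bot paradigms. Both bots below are hypothetical stand-ins, not the opponents used in the cited studies. One plays "rock" in 70% of rounds. The other plays the move that beats its own previous move. The simulated player is also an assumption. It counts which move each bot made after each of its previous moves and plays the move that beats the most frequent one.

```python
import random
from collections import Counter, defaultdict

MOVES = ("rock", "paper", "scissors")
# COUNTER[m] is the move that beats m.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def biased_bot(prev_bot_move, rng):
    """Hypothetical bot: plays "rock" 70% of the time, otherwise paper or scissors."""
    return rng.choices(MOVES, weights=(0.70, 0.15, 0.15))[0]


def self_transition_bot(prev_bot_move, rng):
    """Hypothetical bot: plays the move that beats its own previous move."""
    return "rock" if prev_bot_move is None else COUNTER[prev_bot_move]


def learner_win_rate(bot, n_rounds=2000, seed=0):
    """Win rate of a simulated player that counts the bot's moves conditional on
    the bot's previous move and plays the counter to the most frequent one."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)  # counts[prev_bot_move][next_bot_move]
    prev = None
    wins = 0
    for _ in range(n_rounds):
        seen = counts[prev]
        # With no data for this context yet, the player guesses at random.
        predicted = seen.most_common(1)[0][0] if seen else rng.choice(MOVES)
        player_move = COUNTER[predicted]
        bot_move = bot(prev, rng)
        wins += player_move == COUNTER[bot_move]
        seen[bot_move] += 1
        prev = bot_move
    return wins / n_rounds


print(f"vs biased bot:          {learner_win_rate(biased_bot):.2f}")
print(f"vs self-transition bot: {learner_win_rate(self_transition_bot):.2f}")
```

This first-order counter wins nearly every round against the deterministic transition bot once it has seen a few moves. Against the biased bot it wins roughly 70% of rounds. Dependencies on the previous outcome would require a richer context key in the counting table.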

However, these findings paint an incomplete picture of the scope of people’s adaptive reasoning about sequential opponent behavior. For one, prior research using bot opponents in MSE games has addressed different questions from the one we focus on here, such as whether responses to gains and losses show different neural and behavioral signatures (Dyson et al., 2020, Dyson et al., 2018, Forder and Dyson, 2016). For such questions, it has not been necessary to compare behavior against a broad swath of bot strategies, so existing work has employed only one or two distinct opponent strategies. Furthermore, findings across these studies are based on varied experimental conditions, making comparison difficult. Thus, prior work has not systematically explored the range of sequential opponent behaviors that a player might adaptively respond to. Finally, exploiting patterns in an opponent’s behavior is only part of the puzzle. A central challenge for players in repeated MSE games is avoiding any detectable patterns in their own moves. Once again, this question can be fruitfully investigated by pairing participants with bot opponents that exploit patterns in the participants’ moves (Brockbank and Vul, 2021b, Eyler et al., 2009, Moisan and Gonzalez, 2017, Spiliopoulos, 2013, West and Lebiere, 2001), yet prior work using such a paradigm has been limited and has not explained how such behavior informs broader questions about adversarial reasoning.

The current work uses rock, paper, scissors to develop a systematic and comprehensive account of how people adapt to sequential patterns in an opponent’s behavior and their own. Which patterns in their opponent’s actions can they successfully learn and which ones are out of reach? And how well can people avoid being similarly exploited? Rock, paper, scissors represents an idealized environment for addressing these questions. The patterns that a player exhibits in their moves can be precisely spelled out, the complexity of these patterns can be formally described, and the extent to which a player exhibits any given pattern in their moves can be quantified (Brockbank and Vul, 2021a, Dyson, 2019). Further, unlike other adversarial games in which people may exhibit exploitable patterns in their sequential behavior (e.g., chess), RPS involves little expertise. Instead, a player’s success at exploiting patterns in an opponent’s moves is a result of adaptive reasoning about the causes of their behavior within the immediate interaction context. Despite these advantages, no prior work has provided a systematic account of the patterns people can and cannot recognize and adapt to in this setting.
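One simple way to quantify how strongly a move sequence exhibits a given pattern is to ask how much an opponent could gain by best responding to the sequence's empirical distribution of moves within each context. This measure is an illustrative assumption, not the one used in the cited work. Contexts of increasing size, such as no context, the previous move, or the previous two moves, correspond loosely to patterns of increasing complexity. The function `exploitability` and its `context_fn` argument are hypothetical.

```python
from collections import Counter, defaultdict

MOVES = ("rock", "paper", "scissors")
# COUNTER[m] is the move that beats m.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def exploitability(moves, context_fn):
    """In-sample payoff per round (wins minus losses) for an opponent who knows the
    empirical distribution of each move given context_fn(history) and best
    responds to it. 0 means no exploitable dependency on that context;
    1 means every move is fully predictable from it."""
    table = defaultdict(Counter)  # table[context][next_move]
    for t in range(1, len(moves)):
        table[context_fn(moves[:t])][moves[t]] += 1
    total = sum(sum(c.values()) for c in table.values())
    if total == 0:
        raise ValueError("need at least two moves")
    gain = 0
    for counts in table.values():
        # A response r wins against the move it beats and loses against COUNTER[r].
        gain += max(
            sum(counts[m] for m in MOVES if COUNTER[m] == r) - counts[COUNTER[r]]
            for r in MOVES
        )
    return gain / total


def no_context(history):
    return None  # overall move frequencies only


def prev_move(history):
    return history[-1]  # first-order dependency on the player's own last move


cycle = ["rock", "paper", "scissors"] * 20  # hypothetical perfectly cyclic player
print(exploitability(cycle, no_context))  # near 0: move frequencies are balanced
print(exploitability(cycle, prev_move))   # 1.0: each move follows from the last
```

The score is computed on the same data it describes, so it overstates exploitability for short sequences and rich contexts. A held-out or sequential estimate would be more conservative.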

Here, we aim to address this shortcoming by investigating adaptive behavior over repeated rounds of rock, paper, scissors against an algorithmic “bot” opponent. In experiment 1, we pair participants with one of seven stable bots, each of which exhibits a different sequential dependency in its move choices. These dependencies vary in their underlying complexity, allowing us to precisely assess the degree to which people exploit different behavioral patterns dictating an opponent’s moves. We find that people are highly adaptive against opponents that exhibit simple transition patterns but show minimal adaptation to more complex opponents. In experiment 2, we ask whether these same limits hold for avoiding exploitable patterns in one’s own behavior. Participants are once again paired with a bot opponent, but this time each bot chooses its moves by trying to exploit a unique pattern in the participant’s moves, allowing us to examine people’s ability to adapt to an adaptive opponent. We find that people are successful against bots that track simple transition patterns in participant moves, but show little flexibility otherwise. Together, our results suggest that the hypothesis space of behavioral patterns people draw on in this setting to understand their opponent’s moves or their own is limited, but that adaptive reasoning is flexible within these limits.
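The experiment 2 setup can be sketched in a similar way. The bot below is a hypothetical example of an opponent that exploits one pattern in a participant's moves. It classifies each participant move as a "stay" (same as the previous move), an "up" transition (the move that beats the previous move), or a "down" transition (the move that the previous move beats). It then predicts the most frequent transition so far and plays the move that beats the prediction. The simulated participants are assumptions, each specified only by its transition probabilities.

```python
import random
from collections import Counter

MOVES = ("rock", "paper", "scissors")
# COUNTER[m] is the move that beats m.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
KINDS = ("stay", "up", "down")


def apply_transition(prev, kind):
    """Move reached from prev by a 'stay', 'up' (beats prev), or 'down' (loses to prev) transition."""
    if kind == "stay":
        return prev
    if kind == "up":
        return COUNTER[prev]
    return COUNTER[COUNTER[prev]]  # 'down': the move that prev beats


def transition_bot_net_payoff(participant_probs, n_rounds=3000, seed=0):
    """Bot's (wins - losses) / n_rounds against a simulated participant whose
    transitions are drawn from participant_probs. The bot predicts the
    participant's most frequent transition so far and plays its counter."""
    rng = random.Random(seed)
    weights = [participant_probs[k] for k in KINDS]
    seen = Counter()  # participant transition counts observed by the bot
    prev = rng.choice(MOVES)
    net = 0
    for _ in range(n_rounds):
        if seen:
            predicted = apply_transition(prev, seen.most_common(1)[0][0])
        else:
            predicted = rng.choice(MOVES)
        bot_move = COUNTER[predicted]
        kind = rng.choices(KINDS, weights=weights)[0]
        move = apply_transition(prev, kind)
        if bot_move == COUNTER[move]:
            net += 1  # bot wins
        elif move == COUNTER[bot_move]:
            net -= 1  # participant wins
        seen[kind] += 1
        prev = move
    return net / n_rounds


# Hypothetical participants: one favors "up" transitions, one is uniform.
up_biased = {"stay": 0.2, "up": 0.6, "down": 0.2}
uniform = {"stay": 1 / 3, "up": 1 / 3, "down": 1 / 3}
print(f"vs up-biased participant: {transition_bot_net_payoff(up_biased):+.2f}")
print(f"vs uniform participant:   {transition_bot_net_payoff(uniform):+.2f}")
```

Against the up-biased participant, the bot nets about 0.4 points per round: it wins the 60% of "up" rounds, loses the 20% of "stay" rounds, and ties the rest. Against uniform transitions its expected net payoff is zero. Keeping one's own transitions unpredictable therefore neutralizes this particular bot.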
