Optogenetic Disruption of the Prelimbic Cortex Alters Long-Term Decision Strategy but Not Valuation on a Spatial Delay Discounting Task

An animal’s success in foraging depends on its ability to process information from the environment and to use that information to best direct its actions toward reward. Current theories suggest that behavior arises from the interaction of multiple decision-making systems, each with its own complex computational processes and neural components (O’Keefe & Nadel 1978; Daw, Niv, & Dayan 2005; Redish 2013; van der Meer et al. 2012). Two of these systems are the deliberative system (also known as model-based or goal-directed decision-making; Gilbert and Wilson, 2007, Johnson and Redish, 2007, Niv et al., 2006, Redish, 2016) and the procedural system (also known as model-free or habitual decision-making; Hull, 1943, Daw et al., 2005, Niv et al., 2006, Graybiel, 2008). Deliberative processes entail planning strategies that consider future outcomes (Redish 2016), while procedural processes use learned associations between situations and actions (Graybiel, 1998, Dickinson, 1994, Barnes et al., 2005). As such, deliberative/planning systems are sensitive to contingency and are better suited to drive behavior under conditions that are flexible and changing, whereas procedural/habitual systems are sensitive to action-contiguity and drive behavior in familiar, stable conditions (Balleine et al., 2007, McLaughlin et al., 2021). However, the ways in which these distinct decision systems interact to produce dynamic behavior remain an ongoing area of research.

The combination of sophisticated task paradigms with neuromodulation techniques has spurred new directions for connecting complex behavior to neural mechanisms. An important strategy marker, the behavioral event of vicarious trial and error (VTE), has been found to correlate with neurophysiological deliberation during decision-making (Johnson and Redish, 2007, Amemiya and Redish, 2016, Papale et al., 2016, Kay et al., 2020) and to vanish with behavioral automation evidenced by habit learning (van der Meer et al., 2012, Gardner et al., 2013; Smith & Graybiel 2013). The medial prefrontal cortex (mPFC) has long been associated with both the deliberative (Fuster, 1997, Killcross and Coutureau, 2003, Rich and Shapiro, 2007, Kesner and Churchwell, 2011) and procedural (Jog et al., 1999, Coutureau and Killcross, 2003, Barnes et al., 2005, Smith and Graybiel, 2013b, Barker et al., 2017) decision systems, as well as with their interaction (Ragozzino et al., 1999, Miller and Cohen, 2001, van Aerde et al., 2008, Heidbreder and Groenewegen, 2003). Previous work suggests a broad role for mPFC in modulating complex, dynamic decision-making (Dalley et al., 2004, Kesner and Churchwell, 2011, Laubach et al., 2018, McLaughlin et al., 2021).

The intricate anatomical connectivity of mPFC with areas such as the hippocampus (Jones and Wilson, 2005, Peyrache et al., 2009, Adhikari et al., 2010, Bett et al., 2012, Hok et al., 2013, Ito et al., 2015, Guise and Shapiro, 2017, Schmidt et al., 2019), ventral striatum (Floresco et al., 1997, Euston et al., 2012), and orbitofrontal cortex (Chudasama and Robbins, 2003, Sul et al., 2010) makes it a key region of interest for studying reward valuation, action selection, and task representation. Behavioral experiments have investigated its functional and anatomical complexity in rodents through economic tasks that use indicators such as VTE (Bett et al., 2012, Gardner et al., 2013, Schmidt et al., 2013, Redish, 2016, Kidder et al., 2021) and habit formation (Coutureau and Killcross, 2003, Smith and Graybiel, 2013a) to distinguish deliberative from procedural strategies (Papale et al., 2012, Powell and Redish, 2016, Sweis et al., 2018) while targeting specific subregions within the mPFC. Among its three known subregions (anterior cingulate, ACC; prelimbic, PL; and infralimbic, IL), disruption of PL has been associated with deficits in goal-directed behavior, most notably a reduction in VTE (Schmidt et al., 2019, Kidder et al., 2021), and recordings from PL suggest a role in processing information about environmental changes that indicates a need for a shift in behavioral strategy (Peyrache et al., 2009, Durstewitz et al., 2010, Powell and Redish, 2016, Barker et al., 2017). By selectively disrupting PL at the choice point of a spatial decision-making task that can measure valuation and delays to reward, we aimed to investigate how deliberative deficits contribute to strategy changes across different time scales.

Delay discounting describes a reduction in the perceived value of a reward as the temporal distance (the delay) to that reward increases. Increases in delay discounting have been linked to impulsivity and found to be a risk factor for addiction and other disorders (Mischel et al., 1972, Madden et al., 1997, Giordano et al., 2002, Odum et al., 2002, Mitchell, 2004, Madden and Bickel, 2010, Lempert et al., 2019), making it a useful metric for investigating maladaptive decision-making behaviors. Moreover, behavioral training designed to reduce discounting rates has been found to reduce addictive relapse (Stein, Daniel, Epstein, & Bickel 2015).

The Spatial Adjusting Delay Discounting task adapts the Mazur adjusting delay procedure (Mazur 1997) to a T-maze that builds on the naturalistic behavior of rats to alternate between foraging options (Papale et al. 2012). It is a neuroeconomic task that requires subjects to repeatedly make left vs. right choices between a small, immediate reward on one side and a larger reward on the other side that is delivered only after a variable delay, which starts once the subject arrives at the reward location (Fig. 1A). In this task, the delays change based on the rat’s choices: choosing the smaller-sooner reward decreases the delay to the larger-later reward by 1 s, while choosing the larger-later reward increases its delay by 1 s (Fig. 1B). Thus, the subject can control the delay ‘cost’ of the larger reward through its choices, titrating the delay up or down depending on its left/right choice proportion.
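
The adjusting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the task-control code used in the experiments: the function names, the choice labels, and in particular the lower bound on the delay (`min_delay_s`) are assumptions introduced here, since the text specifies only the 1 s step in each direction.

```python
def update_delay(delay_s, choice, step_s=1.0, min_delay_s=1.0):
    """Return the delay on the larger-later side after one lap.

    choice is "delayed" (larger-later side) or "immediate" (smaller-sooner side).
    step_s follows the 1 s adjustment described in the text.
    min_delay_s is a hypothetical floor; the text does not state one.
    """
    if choice == "delayed":
        return delay_s + step_s
    if choice == "immediate":
        return max(min_delay_s, delay_s - step_s)
    raise ValueError(f"unknown choice: {choice!r}")


def run_session(choices, start_delay_s):
    """Replay a sequence of choices; return per-lap delays and the final delay."""
    delays = []
    delay = start_delay_s
    for choice in choices:
        delays.append(delay)  # delay in effect when this lap's choice is made
        delay = update_delay(delay, choice)
    return delays, delay


# Three delayed-side laps push the delay up; alternation then holds it steady.
laps = ["delayed"] * 3 + ["immediate", "delayed"] * 2
per_lap, final = run_session(laps, start_delay_s=5.0)
print(per_lap, final)
```

Under this sketch, repeated laps to one side move the delay monotonically, whereas strict alternation keeps it within one step of its current value, which is why alternation can hold the delay constant.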

Well-trained rats exhibit three distinct phases over the course of a session in the Spatial Adjusting Delay Discounting task (Papale et al., 2012, Bett et al., 2015, Kreher et al., 2019). Rats first show an investigation phase marked by alternation between the two sides, presumably to assess the parameters of the task for a given session (which side is delayed, how much reward is delivered on the delayed side, and what the starting delay is). Rats then typically show a titration phase, in which they repeatedly run more laps to one side or the other, which drives the delay up (consecutive delay side laps) or down (consecutive non-delay side laps). Once the delay has reached the rat’s individual willingness to wait for the larger reward, rats typically alternate between sides, holding the delay constant, which we identify as a maintenance phase (Fig. 1C-D). Importantly, the task does not enforce these phases on subjects; rather, these phases describe the patterns of behavior that well-trained rats typically exhibit.

Both deliberative and procedural decision-making systems can solve this task, but rats typically deliberate when titrating and then automate (proceduralize into a habit) when maintaining, as evidenced by hippocampal involvement (Bett et al. 2012), by changes in behavioral deliberation markers and the regularity of the path taken (Papale et al., 2012, Bett et al., 2015, Papale et al., 2016, Kreher et al., 2019), and by hippocampal (Papale et al. 2016), orbitofrontal, and ventral striatal firing patterns on this task (Stott and Redish 2014). Analyzing the trajectory through the choice point on each lap provides a way to measure VTE and infer deliberation (Fig. 1E-F). The reward economy and strategy dynamics on this task are covert and internally driven by the subject (Powell & Redish 2016). This provides a useful way to investigate an animal’s valuation algorithms (Fig. 1G) and self-guided shifts in strategy (Fig. 1H-I).
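
As an illustration of how a choice-point trajectory can be turned into a single number per lap, the sketch below sums the absolute change in heading direction along one pass. Measures of this general kind appear in the VTE literature, but the text here does not specify the measure used, so the function name, the input format (lists of tracked x and y positions), and the handling of stationary samples are all assumptions made for this example.

```python
import math


def integrated_heading_change(xs, ys):
    """Sum of absolute heading changes (radians) along one choice-point pass.

    xs, ys: tracked positions in sample order (hypothetical input format).
    A smooth, direct pass gives a small value; pausing and reorienting
    toward each option in turn gives a large one.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    # Heading of each movement step; skip steps with no displacement,
    # where the heading is undefined.
    headings = []
    for i in range(1, len(xs)):
        dx, dy = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        if dx == 0 and dy == 0:
            continue
        headings.append(math.atan2(dy, dx))

    total = 0.0
    for prev, curr in zip(headings, headings[1:]):
        # Wrap the difference into [-pi, pi) so a turn is never counted
        # as its long-way-round complement.
        diff = (curr - prev + math.pi) % (2 * math.pi) - math.pi
        total += abs(diff)
    return total


straight = integrated_heading_change([0, 1, 2, 3], [0, 0, 0, 0])
back_and_forth = integrated_heading_change([0, 1, 0, 1], [0, 0, 0, 0])
print(straight, back_and_forth)
```

A score like this is most informative when compared across laps within a session rather than read in absolute terms, since tracking noise and sampling rate both affect its scale.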

The three-phase structure of the Spatial Adjusting Delay Discounting task makes it a powerful tool for identifying changes in strategy between deliberative and procedural decision-making. Prelimbic (PL) firing patterns show strategy-related representations that align with the three phases (investigation, titration, maintenance), with ensemble alignments changing a few laps before a rat changes strategy (Powell & Redish 2016). On other tasks, previous research has shown that mPFC disruptions yield deficits in strategy changes that appear to follow subregional specificity. Most notably, PL disruption has been linked to deficits in goal-directed (deliberative) behaviors (Ragozzino et al., 1999, Rich and Shapiro, 2007, Tran-Tu-Yen et al., 2009, Dalton et al., 2016, Riaz et al., 2019, Schmidt et al., 2019, Kidder et al., 2021), while disruption of IL activity appears to inhibit habit formation (Coutureau and Killcross, 2003, Killcross and Coutureau, 2003, Smith and Graybiel, 2013a). Given the known relationships between PL ensembles and within-task behavioral phases, we set out to examine the consequences of prelimbic mPFC disruption on the Spatial Adjusting Delay Discounting task. Given the known effects on choice behavior of PL disruption at a choice point (Kidder et al. 2021), we targeted this disruption through optogenetic manipulation on a subset of laps specifically at the choice point.
