Complementary task representations in hippocampus and prefrontal cortex for generalizing the structure of problems

Mice generalize knowledge between problems

Subjects serially performed a set of reversal learning problems that shared the same structure but had different physical layouts. In each problem, every trial started with an ‘initiation’ nose-poke port lighting up. Poking this port illuminated two ‘choice’ ports, which the subject chose between for a probabilistic reward (Fig. 1a). Once the subject consistently (75% of trials) chose the high reward probability port, reward contingencies reversed (Fig. 1b). Once subjects completed ten reversals on a given port layout (termed a ‘problem’), they were moved onto a new problem where the initiation and choice ports were in different physical locations (Fig. 1c). All problems, therefore, shared the same trial structure (initiate then choose) and a common abstract rule (one port has high and one has low reward probability, with reversals) but required different motor actions due to the different port locations. In this phase of the experiment, problem switches occurred between sessions, and subjects completed ten different problems.
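
For concreteness, the sketch below simulates the block-reversal logic described above. The 75% criterion comes from the text; the 20-trial assessment window, the 0.8/0.2 reward probabilities and the agent function standing in for the mouse's policy are illustrative assumptions, not values taken from the paper.

```python
import random

# Assumed parameters for illustration only: the criterion (75%) is from the
# text, but the window length and reward probabilities are placeholders.
HIGH_P, LOW_P, WINDOW, CRITERION = 0.8, 0.2, 20, 0.75

def run_problem(n_reversals=10, agent=lambda high_port: high_port):
    """Simulate one problem: reward contingencies reverse each time the agent
    chooses the high-probability port on >= CRITERION of the last WINDOW
    trials. Returns the number of trials taken to trigger each reversal."""
    high_port = 'A'                       # port currently carrying HIGH_P
    recent, trials_per_reversal, t = [], [], 0
    while len(trials_per_reversal) < n_reversals:
        t += 1
        choice = agent(high_port)         # stand-in for the mouse's policy
        rewarded = random.random() < (HIGH_P if choice == high_port else LOW_P)
        recent = (recent + [choice == high_port])[-WINDOW:]
        if len(recent) == WINDOW and sum(recent) / WINDOW >= CRITERION:
            trials_per_reversal.append(t)         # reversal triggered
            high_port = 'B' if high_port == 'A' else 'A'
            recent, t = [], 0
    return trials_per_reversal
```

With the perfect default agent this returns the minimum possible trials per reversal (the window length); a learning agent would show the gradual decline across reversals and problems plotted in Fig. 1e.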

Fig. 1: Transfer learning in mice.

a, Trial structure of the probabilistic reversal learning problem. Mice poked in an initiation port (gray) and then chose between two choice ports (green and pink) for a probabilistic reward. b, Block structure of the probabilistic reversal learning problem. Reward contingencies reversed after the animal consistently chose the high reward probability port. c, Example sequence of problems used for training, showing different locations of the initiation (I) and two choice ports (A and B) in each problem. d, Example behavioral session late in training in which the animal completed 12 reversals. Top sub-panels show animals’ choices, outcomes they received and which side had high reward probability; bottom panel shows exponential moving average of subjects’ choices (tau = 8 trials). e, Mean number of trials after a reversal taken to reach the threshold to trigger the next reversal, as a function of problem number. f, Probability of choosing the new best option (the choice that becomes good after the reversal) on the last ten trials before the reversal and the first ten trials after the reversal split by the first problem and the last problem. The P value refers to the difference between the slopes after the reversal point in early and late training (paired-sample t-test, two-sided). g, Mean number of pokes per trial to a choice port that was no longer available because the subject had already chosen the other port, as a function of problem number. h, Mean number of pokes per trial to a choice port that was no longer available as a function of reversal number on the first five problems and the last five problems during training. The P value refers to the difference in the log of the time constants from fitted exponential curves in early and late training (paired-sample t-test, two-sided). i,j, Coefficients from a logistic regression predicting current choices using the history of previous choices (i), outcomes (not shown) and choice × outcome interactions (j). For each problem and predictor, the coefficients at lag 1–11 trials are plotted. k,l, Coefficients for the previous trial (lag 1, left) and average coefficients across lags 2–11 (right), as a function of problem number (P values derived from repeated-measures one-way ANOVAs with problem number as the within-subjects factor). Error bars on all plots show mean ± s.e.m. across mice (n = 9 mice). P values in e and g are from the two-way repeated-measures ANOVAs with problem number and reversal number as within-subjects factors.

We first asked whether subjects’ performance improved across problems, consistent with their generalizing the problem structure (one port is good at a time, with reversals) (Fig. 1b). Mice took fewer trials to reach the 75% correct threshold for triggering a reversal within each problem (F9,72 = 3.52, P = 0.001; Extended Data Fig. 1a) and, crucially, also across problems (F9,72 = 3.91, P < 0.001; Fig. 1e), consistent with generalization. Improvement across problems in tracking the good port might reflect an increased ability to integrate the history of outcomes and choices across trials. To assess this, we fit a logistic regression model predicting choices, using the recent history of choices, outcomes and choice × outcome interactions. Across problems, the influence of both the most recent (F9,71 = 5.08, P < 0.001; Fig. 1j,l) and earlier (F9,71 = 5.46, P < 0.001; Fig. 1j,l) choice × outcome interactions increased. Subjects’ choices were also increasingly strongly influenced by their previous choices (F9,71 = 11.77, P < 0.001; Fig. 1i,k), suggesting a decrease in spontaneous exploration.
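
As an illustration of this behavioral analysis, the sketch below fits a lagged logistic regression of the current choice on the history of choices, outcomes and their interactions. The 11-trial history follows Fig. 1i–l; the ±1 coding of predictors and the use of scikit-learn are implementation assumptions rather than details taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def history_regression(choices, outcomes, n_lags=11):
    """Predict the current choice from the previous n_lags choices, outcomes
    and choice x outcome interactions. choices/outcomes are 0/1 arrays
    (1 = A choice / rewarded); the +/-1 predictor coding is an assumption."""
    c = 2 * np.asarray(choices) - 1           # +1 = A, -1 = B
    o = 2 * np.asarray(outcomes) - 1          # +1 = rewarded, -1 = omission
    X, y = [], []
    for t in range(n_lags, len(c)):
        # Lag 1 (most recent trial) is the last element of each slice.
        X.append(np.concatenate([c[t - n_lags:t], o[t - n_lags:t],
                                 (c * o)[t - n_lags:t]]))
        y.append(choices[t])
    model = LogisticRegression().fit(np.array(X), np.array(y))
    # Rows: choice, outcome and interaction coefficients, lags n_lags..1.
    return model.coef_.reshape(3, n_lags)
```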

We also asked whether subjects generalized the trial structure (initiate then choose; Fig. 1a) across problems, by assessing how often they made nose-pokes inconsistent with this sequence (that is, pokes to the alternative choice port after having made a choice, instead of returning to initiation). Mice made fewer out-of-sequence pokes across reversals within each problem (F9,72 = 17.82, P < 0.001; Extended Data Fig. 1b) but, notably, also across problems (F9,72 = 18.29, P < 0.001; Fig. 1g). This improvement was not driven solely by animals’ poor performance on the first problem but continued throughout training (F9,64 = 9.36, P < 0.001). To assess whether it was driven simply by learning to follow port illumination, we examined behavior on ‘forced choice’ trials where only one choice port was illuminated and the other was inactive. Animals did not simply follow the light: they were equally likely to poke the high reward probability choice port as the choice port that was illuminated, demonstrating that their behavior was influenced by their belief about reward availability and not just the port illumination (Extended Data Fig. 2i,j), although it remains possible that they used port illumination while acquiring a new problem.

This observed improvement across problems is consistent with meta-learning (or ‘learning to learn’). In line with this, on early problems mice learned the new poke sequences necessary to execute trials gradually, over many reversals, suggesting instrumental learning. By the end of training, however, they acquired the new poke sequence within a single reversal, suggesting that they ‘learned how to learn’ the sequence (t17 = 2.81, P = 0.023; Fig. 1h). Similarly, animals adapted to reversals faster at the end of training than at the beginning (t17 = 5.04, P = 0.001; Fig. 1f). Therefore, they had also ‘learned how to learn’ from reward.

These data demonstrate generalization but do not provide a mechanism. A possible mechanism is task abstraction, whereby the brain uses the same neuronal representation for different physical situations that play the same task role. To investigate whether such representations existed, we next examined cellular responses in mPFC and hippocampus.

Abstract and problem-specific representations in PFC and CA1

We recorded single units from dorsal CA1 (345 neurons, n = 3 mice, 91–162 neurons per mouse) and mPFC (556 neurons, n = 4 mice, 117–175 neurons per mouse) (Supplementary Fig. 1 and Fig. 2) in separate animals using electrophysiology. For recordings, we modified the behavioral task such that changes from one problem to the next occurred within session, with the problem transition triggered once subjects completed four reversals on the current problem, up to a maximum of three problems in one session. Subjects adapted well to this and, in most recording sessions, performed at least four reversals in three problems, allowing us to track the activity of individual units across problems (Fig. 2c). Cross-problem learning had reached asymptote before recordings started—that is, during recording sessions, mice no longer showed improvement across problems (Extended Data Fig. 2), and there were no differences in behavioral performance between CA1 and PFC animals (Extended Data Fig. 2c,f).

Fig. 2: Recording units across multiple problems in a single session.

a, Silicon probes targeting hippocampal dorsal CA1 and mPFC were implanted in separate groups of mice. b, Diagram of problem layout types used during recording sessions. c, Example recording session in which a subject completed four reversals in each of three problems. Top panel shows the ports participating in each problem color-coded by layout type. Bottom panel shows the exponential moving average of choices, with the choices, outcomes and reversal blocks shown above. d, Example PFC neurons. Cell 1 in PFC fired selectively to both choice ports (but not initiation) in each problem, even though the physical location of the choice ports was different both within and across problems. Cell 2 fired at the initiation port in every problem, even when its physical location changed. Cell 3 fired at B choice ports in all problems but also gained a firing field when initiation port moved to the previous B choice port (showing that PFC does have some port-specific activity). Cell 4 responded to reward at every choice port in every problem. Cell 5 responded to reward omission and had high firing during the ITI. Cell 6 responded to reward at B choice port (that switched location) in each problem. e, Example CA1 neurons. Some CA1 cells also had problem general firing properties (cells 1 and 2). Cell 1 fired at B choice that switched physical location between problems. Cell 2 responded to the same port in all problems and modulated its firing rate depending on whether it was rewarded or not. Cell 3 fired at the same port in all layouts. Cell 4 switched its firing preference from initiation to B choice that shared physical locations, analogous to ‘place cells’ firing at a particular physical location. This port selectivity was more pronounced in CA1 than PFC (Extended Data Fig. 4). Cells 5 and 6 ‘remapped’, showing interactions between problem and physical port. Cell 5 fired at a given port in one layout but not when the same port was visited in a different layout. Cell 6 fired at choice time at a given port in one layout and changed its preferred firing time to pre-initiation in a different layout. In all plots, average firing rates are arranged by layout types 1, 2 and 3, but the order in which they were experienced is plotted in the ‘Experienced layouts’ sub-panel. Error bars show firing rates ± s.e.m. across trials.

During recording sessions (7–16 sessions per mouse, 341–650 trials per session), we used ten different port layouts, but, to simplify the analysis, they were all reflections of three basic layout types (Fig. 2b), each of which occurred once in every session in a randomized order. In the first layout type, the initiation port (I1) was the top or bottom port, and the choice ports were the far left and far right ports. One of these choice ports remained in the same location in all three layouts used in a session and will be referred to as the A choice. This acted as a control for physical location, allowing us to assess how the changing context of the different problems affected the representation of choosing the same physical port. Both the other choice port (B choice) and the initiation port moved physical locations between problems. In the second layout type, both the initiation port (I2) and the B choice port (B2) were in locations not used in layout type 1. In the third layout type, the initiation port was the same as the initiation port in layout type 1 (I3 = I1), and the B choice port was the same as the initiation port from layout type 2 (B3 = I2). Hence, in every recording session, we had examples of (1) the same port playing the same role across problems, (2) different ports playing the same role across problems and (3) the same port playing different roles across problems.

As animals transferred knowledge of the trial structure across problems, we reasoned that neurons might exhibit ‘problem-general’ representations of the abstract stages of the trial (initiate, choose and outcome) divorced from the sensorimotor specifics of each problem. On inspection, such cells were common in PFC (Figs. 2d and 3a and Extended Data Fig. 3a). Although some problem-general tuning was observed in CA1, activity for a given trial event (for example, initiation) typically varied more across problems in CA1 than in PFC (Figs. 2e and 3b and Extended Data Figs. 3b and 4). Some CA1 neurons fired at the same physical port across problems even though its role in the task had changed. Other CA1 neurons ‘remapped’ between problems, changing their tuning with respect to both physical location and trial events.

Fig. 3: Example neurons in physical space and behavioral task.

a, Example PFC neurons. For each cell, left panels show trajectories of animals’ nose (gray) and locations where spikes occurred (red) in a 2D space corresponding to the view of a camera positioned above the box looking at the ports, affine transformed to correct for the oblique view of the ports. Middle panels show firing rate heat maps in this same 2D space. Right panels show average firing rates across the trial for each trial type. Layout types are indicated by the color of boxes (blue, yellow and purple). Cell 1 fired at the initiation port in every problem even when its physical location changed. Cell 2 fired at all choice ports in all problems. For choice port selective cells (PFC and CA1 cell 2), we split the firing rate maps by whether the within-choice port spikes (and occupancies) occurred at times before outcome (left) or during reward consumption (right) to further show that these cells are selective to trial events. b, Example CA1 neurons. Cell 1 fired at the bottom initiation port in layout type 2 but not when this same port acted as a B choice in layout type 3 or when the port was not a part of the current problem but was visited in layout 1. Cell 2 fired at one of the B ports in layout type 3 and had no selectivity to the same port in layout type 2 when this port was an initiation port and, instead, fired at a different B choice in layout type 2. Error bars show firing rates ± s.e.m. across trials.

These single-unit examples suggest that problem-general representations may be more prominent in PFC, while both tuning to physical location and complete remapping between problems may be more common in CA1.

Representations generalize more strongly in PFC than CA1

To assess whether our single-unit observations hold up at the population level, we sought to characterize how neural activity in each region represented trial events and how these representations generalized across problems.

We first assessed the influence of different trial variables in each region using linear regression to predict spiking activity of each neuron, at each timepoint across the trial, as a function of the choice, outcome and outcome × choice interaction on that trial (Fig. 4a). As the task was self-paced, we aligned activity across trials by warping the time period between initiation and choice to match the median interval (for more details, see ‘Time warping methods’ and Supplementary Fig. 2). We then quantified how strongly each variable affected population activity as the population coefficient of partial determination (CPD) (that is, the fraction of variance uniquely explained by each regressor) at every timepoint across the trial (Fig. 4b). This analysis was run separately for each problem in the session, and the results were averaged across problems and sessions. Both regions represented current choice, outcome and choice × outcome interaction, but there was regional specificity in how strongly each variable was represented. Choice (A vs B) representation was more pronounced in CA1 than PFC (peak variance explained—CA1: 8.4%, PFC: 4.8%, P < 0.001), whereas outcome (reward vs no reward) coding was stronger in PFC (peak variance explained—CA1: 7.1%, PFC: 12.9%, P < 0.001). Furthermore, choice × outcome interaction explained more variance in CA1 than PFC (peak variance explained—CA1: 3.7%, PFC: 2.4%, P < 0.001).
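
A minimal sketch of the coefficient of partial determination under the definition given above (fraction of variance uniquely explained by each regressor). The design-matrix layout, variable names and the use of scikit-learn are assumptions; the time warping and permutation-based significance testing are omitted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def cpd(X, Y):
    """Population CPD for each regressor at one timepoint.
    X: (n_trials, n_regressors) design matrix, e.g. choice, outcome and
       choice x outcome (an intercept is added by LinearRegression).
    Y: (n_trials, n_neurons) firing rates at that timepoint.
    Returns the fraction of total population variance uniquely explained
    by each regressor."""
    full_ssr = np.sum((Y - LinearRegression().fit(X, Y).predict(X)) ** 2)
    total_ss = np.sum((Y - Y.mean(axis=0)) ** 2)
    out = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        X_red = np.delete(X, i, axis=1)           # refit without regressor i
        red_ssr = np.sum((Y - LinearRegression().fit(X_red, Y)
                          .predict(X_red)) ** 2)
        out[i] = (red_ssr - full_ssr) / total_ss  # unique variance of i
    return out
```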

Fig. 4: Problem-general and problem-specific representations in PFC and CA1 population activity.

a, Linear regression predicting activity of each neuron at each timepoint across the trial, as a function of the choice, outcome and outcome × choice interaction. b, CPDs from the linear model shown in a for choice, outcome and outcome × choice regressors in PFC and CA1. Significance levels for within-region effects were based on a two-sided permutation test where firing rates were rolled with respect to trials. Significance levels for differences between regions were based on a two-sided permutation test across sessions. All significance levels were corrected for multiple comparison over timepoints. c, Representation similarity at ‘choice time’ (left) and ‘outcome time’ (right), quantified as the Pearson correlation between the demeaned neural activity vectors for each pair of conditions. d, RDMs used to model the patterns of representation similarity observed in the data. Each RDM codes the expected pattern of similarities among categories in c under the assumption that the population represents a given variable. The Port RDM models a representation of the physical port poked (for example, far left) irrespective of its meaning in the trial. A vs B choice models a representation of A/B choices irrespective of physical port. The Outcome RDM models representation of reward versus reward omission. The Outcome at A vs B RDM models separate representations of reward versus omission after A and B choices. Choice vs Initiation models representation of the stage in the trial. Problem-specific A choice models separate representation of the A choice in different problems. e, CPDs in a regression analysis modeling the pattern of representation similarities using the RDMs shown in d. The time course is given by sliding the windows associated with choices from being centred on choice port entry to 0.76 seconds after choice port entry while holding time windows centred on trial initiations fixed. Stars indicate timepoints where regression weight for each RDM was significantly different between the two regions (P < 0.05 (small stars) and P < 0.001 (big stars)), from one-sided permutation tests across sessions corrected for multiple comparison over timepoints. f, Confusion matrices from linear decoding of position in trial, using a decoder that was trained on one problem and tested on another, averaged across animals and across all problem pairs. Colored squares indicate three possible patterns of decoding that indicate different neuronal content. Blue indicates correct cross-task decoding to the same abstract state (for example, B choice decodes to B choice). Red indicates decoding to a different state that could have occurred at the same sequential position in the trial (for example, B choice decodes to A choice). Dashed green corresponds to decoding to the same physical port for those training and test layouts where the Initiation and B choice ports interchanged (for example, B choice decodes to Initiation when the decoder was trained on layout 2 and tested on layout 3). g, Bar plots showing the probability of the cross-task decoder outputting the correct abstract state (blue), the other state that can have the same position in the trial sequence (red) and the state that has the same physical port as the training data (dashed green, computed only from confusion matrices where B choice and initiation ports interchange) computed using the corresponding cells highlighted in f. Error bars report the mean ± s.e.m. across different mice (CA1: n = 3 mice; PFC: n = 4). 
Significance levels were compared against the null distribution obtained by shuffling animal identities between regions (one-sided permutation tests). NS, not significant.

Although highlighting some differences in population coding between regions, this approach cannot assess the relative contribution of abstract representations that generalize across problems versus features specific to each problem, such as the physical port location. This requires comparing activity both across timepoints in the trial and across problems, which we did using representational similarity analysis (RSA)44. We extracted firing rates around initiation and choice port entries (±20 ms around each port entry type) and categorized these windows by which problem they came from, whether they were initiation or choice, and, for choice port entries, whether the choice was A or B and whether it was rewarded, yielding a total of 15 categories (Fig. 4c). For each session, we computed the average activity vector for each category and then quantified the similarity between categories as the correlation between the corresponding activity vectors. We show RSA matrices for this ‘choice time’ analysis (Fig. 4c, left panels) and an ‘outcome time’ analysis (Fig. 4c, right panels) where the windows for choice events were moved 250 ms after port entry, holding the time window around trial initiations constant.
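
The core of this representational similarity computation can be sketched as follows: average the population vector within each of the 15 categories and correlate the demeaned vectors between categories. The array shapes and the per-neuron demeaning across conditions are assumptions consistent with the description above, not code from the paper.

```python
import numpy as np

def rsa_matrix(condition_means):
    """condition_means: (n_conditions, n_neurons) average activity for each
    of the 15 categories (problem x initiation/A/B choice x rewarded or not).
    Returns the (n_conditions, n_conditions) Pearson correlation matrix of
    the demeaned activity vectors, analogous to Fig. 4c."""
    # Subtract each neuron's mean across conditions (assumed demeaning step).
    M = condition_means - condition_means.mean(axis=0, keepdims=True)
    return np.corrcoef(M)   # correlations between condition vectors
```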

To quantify the factors influencing representation similarity, we created representational similarity design matrices (RDMs) that each encapsulated the predicted pattern of similarities under the assumption that activity was influenced by a single feature of the behavior (Fig. 4d). For example, if the population activity represented only which physical port the animal was at, its correlation matrix would look like Fig. 4d, Port. We included RDMs for a set of problem-general features: the trial stage (‘Initiation vs Choice’), choice (A vs B) and trial outcome (both on its own as ‘Outcome’ and in conjunction with choice ‘Outcome at A vs B’). To assess whether the changing context provided by different problems modified the representation of choosing the same physical port at the same trial stage, we included a ‘Problem-specific A choice’ RDM that represents similarity between A choices (which are always in the same location) within each problem.

To assess the influence of these features on neural activity, we modeled the experimentally observed patterns of representational similarity (Fig. 4c) as a linear combination of the RDMs (Fig. 4d), quantifying the influence of each by its corresponding weight in the linear fit. As the RSA matrices changed between choice time and outcome time (Fig. 4c), we characterized this time evolution using a series of such linear fits, moving the time window around choice port entries in steps from before port entry until after reward delivery while holding the time window around the initiation port constant, generating the time series for the influence of each RDM on activity shown in Fig. 4e.
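
A sketch of the linear fit of model RDMs to an observed similarity matrix. Restricting the fit to the lower-triangle elements and using ordinary least squares are assumed implementation choices; the sliding choice-time window that generates the time series in Fig. 4e is not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_rdms(data_rsm, model_rdms):
    """Regress an observed similarity matrix onto model RDMs.
    data_rsm: (n_cond, n_cond) correlation matrix (e.g. from rsa_matrix).
    model_rdms: dict name -> (n_cond, n_cond) predicted-similarity matrix.
    Uses lower-triangle (off-diagonal) elements only, an assumed but common
    choice, and returns one weight per model RDM."""
    idx = np.tril_indices(data_rsm.shape[0], k=-1)
    X = np.column_stack([rdm[idx] for rdm in model_rdms.values()])
    y = data_rsm[idx]
    betas = LinearRegression().fit(X, y).coef_
    return dict(zip(model_rdms.keys(), betas))
```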

Consistent with our single-unit observations, both PFC and CA1 represented both problem-specific and problem-general features to some extent. However, there was a marked regional specificity in how strongly different features were encoded (Fig. 4e). PFC had stronger, abstract, sensorimotor-invariant representation of trial stage (Initiation vs Choice) and trial outcome (P < 0.001). In contrast, CA1 had stronger representation of the physical port that the subjects were poking and whether it was an A vs B choice (P < 0.001). Additionally, CA1, but not PFC, showed a problem-specific representation of A choices (P < 0.001). This is striking because, during A choices, both the physical port and its meaning are identical across problems, indicating that the changing problem context alone induced some ‘remapping’ in CA1 but not PFC. Finally, there was a regional difference in the representation of trial outcome. PFC outcome representations were more general (the same neurons responded to reward or reward omission across ports and problems, P < 0.001). CA1 also maintained an outcome representation, but this was more likely to be conjunctive than in PFC—different neurons would respond to reward on A and B choices (P < 0.001).

These representational differences between regions survived the animal random effects test (see the ‘Statistical significance’ section, Extended Data Fig. 5 and individual animal plots in Extended Data Fig. 6a–c). To ensure that they were not driven by fine-grained selectivity to physical movements, we re-ran the analysis on residual firing rates after regressing out the influence of two-dimensional (2D) nose position, velocity and acceleration (for more details, see the ‘Additional controls for physical movement’ section). All inter-region differences except the stronger representation of A vs B choice in CA1 survive this control (Extended Data Fig. 7c–e), consistent with the single-cell examples described above (Fig. 3a,b and Extended Data Fig. 3). We also assessed whether problem specificity in CA1 might be driven by slow drift over time but found that representations changed abruptly at transitions between problems (Extended Data Fig. 8).

We used a cross-problem decoding analysis to further characterize differences in representation between regions. We trained a linear model to decode position in the trial (Initiation and A/B choice/reward/no-reward) using data from one problem and tested the decoding performance on a different problem (Fig. 4f,g). Because the B and initiation ports moved and sometimes interchanged between problems, the pattern of decoding errors is informative about whether activity primarily represented physical port or abstract trial stage (Initiation vs Choice). Where PFC made errors, they were predominantly to the other state that could occur at the same sequential position in the trial (A rather than B choice or outcome). By contrast, CA1 predominantly decoded to the same physical port as the training data. Together, these population results confirm that PFC had a predominantly generalizing representation that embedded the sequential properties of the trial, whereas CA1 encoded problem specifics (such as port identity) more strongly.
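
A sketch of the cross-problem decoding step: train a linear classifier of trial state on one problem and evaluate its confusion matrix on another. Multinomial logistic regression is an assumed choice of linear decoder, and the label handling is a placeholder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

def cross_problem_decoding(X_train, y_train, X_test, y_test):
    """Train a linear decoder of trial state (initiation, A/B choice,
    rewarded/unrewarded outcome) on one problem and test on another.
    X_*: (n_events, n_neurons) firing-rate vectors; y_*: state labels.
    Returns a row-normalized confusion matrix as in Fig. 4f."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    labels = sorted(set(y_train) | set(y_test))
    return confusion_matrix(y_test, y_pred, labels=labels, normalize='true')
```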

Generalization of low-dimensional population activity

To further explore how the structure of population activity generalized between problems, we assessed how accurately low-dimensional activity patterns in one problem could explain activity in another. Using singular value decomposition (SVD), we decomposed activity in each problem into a set of cellular and temporal modes. Cellular modes correspond to sets of neurons whose activity covaries over time and, hence, can be thought of as cell assemblies. Each cellular mode is specified by a vector with a weight for each cell, indicating how strongly the cell participates in the mode. Cellular and temporal modes come in pairs, such that each cellular mode has a corresponding temporal mode, which is a vector of weights across timepoints indicating how the activity of the cellular mode varies over time.

To evaluate the cellular and temporal modes for a given problem, we first regressed general movement-related features out of the firing rates (for more details, see Extended Data Fig. 7 and the ‘Additional controls for physical movement’ section). After removing the effect of velocity, acceleration and 2D nose position, we computed the average residual firing rate at each timepoint across the trial for four types of trials: rewarded A choices, non-rewarded A, rewarded B and non-rewarded B (non-rewarded trials included both correct trials and incorrect trials). For each cell, we concatenated these four time series to create a single time series containing the average activity of the cell across each timepoint of the four trial types. The temporal modes span this same set of timepoints and, hence, capture variation across both time-in-trial and trial-type. We then stacked these single-cell activity time series for all neurons to create an activity matrix D where each row contained the activity of one neuron (Fig. 5a). Using SVD, we decomposed this activity matrix into cellular and temporal modes U and V, linked by a diagonal weight matrix Σ:

$$D = U\Sigma V^T$$
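
As a sketch, and assuming the trial-averaged residual firing rates have already been computed, the decomposition above amounts to:

```python
import numpy as np

def svd_modes(trial_avg):
    """trial_avg: (n_neurons, 4, n_timepoints) average residual firing rates
    for the four trial types (A rewarded, A unrewarded, B rewarded,
    B unrewarded). Each row of D concatenates the four time series of one
    neuron, as described in the text."""
    n_neurons = trial_avg.shape[0]
    D = trial_avg.reshape(n_neurons, -1)          # concatenate trial types
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U, s, Vt                               # D = U @ np.diag(s) @ Vt
```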

Fig. 5: Generalization of low-dimensional representations of trial events.

a, Diagram of SVD analysis. A data matrix comprising the average activity of each neuron across timepoints and trial types was decomposed into the product of three matrices, where diagonal matrix Σ linked a set of temporal patterns across trial type and time (rows of VT) to a set of cellular patterns across cells (columns of U). b, First temporal mode in VT from SVD decomposition of data matrix from PFC plotted in each problem separately for clarity and separated by A (green) and B (pink) rewarded (solid) and non-rewarded (dashed) choices. c, First cellular mode from SVD decomposition of data matrix from PFC in each problem showing that similar patterns of cells participate in all problems. d, Variance explained when using temporal activity patterns V1T from one problem to predict either held-out activity from the same problem (solid lines) or activity from a different problem (dashed lines). Light purple and lilac lines indicate variance explained when shuffling timepoints in the firing rates matrices. e, Variance explained when using temporal activity patterns V1T to predict either activity D2 from the same problem and brain region (solid lines) or from a different brain region (and, therefore, different animal) and the same problem (dashed lines). f, Variance explained when using cellular activity patterns U1 from one problem to predict either held-out activity from the same problem (solid lines) or activity from a different problem (dashed lines). Dashed light purple and lilac lines indicate variance explained when shuffling cells in the firing rates matrices. g, Cumulative weights along the diagonal of Σ using pairs of temporal V1T and cellular U1 activity patterns from one problem to predict either held-out activity from the same problem (solid lines) or activity from a different problem (dashed lines). Weights were normalized by peak cross-validated cumulative weight computed on the activity from the same problem. h, To assess whether the temporal singular vectors generalized significantly better between problems in PFC than CA1, we evaluated the area between the dashed and solid lines in d for CA1 and for PFC separately, giving a measure for each region of how well the singular vectors generalized. We computed the difference in this measure between CA1 and PFC (pink line in h) and compared this difference to the null distribution obtained by permuting sessions between brain regions (gray histogram; black line shows the 95th percentile of distribution). Temporal singular vectors generalized equally well between problems in the two regions. i, Cellular singular vectors generalized significantly better between problems in PFC than CA1. Computed as in h but using the solid and dashed lines from f. j, Pairs of cellular and temporal singular vectors generalized significantly better between problems in PFC than CA1. Computed as in h but using the solid and dashed lines from g. a.u., arbitrary units.

The cellular modes are the columns of U, and the temporal modes are the rows of VT. Both modes are unit vectors, so the contribution of each pair to the total data variance is determined by the corresponding element of the diagonal matrix Σ. The modes are sorted in order of explained variance, such that the first cellular and temporal mode pair explains the most variance. The first cellular and temporal modes of PFC activity in three different problems are shown in Fig. 5b,c. The temporal mode is high throughout the inter-trial interval (ITI) and trial, with a peak at choice time, but is strongly suppressed after reward (similar to cell 5 in Fig. 2d).

We reasoned that (1) if the same events were represented across problems (for example, initiation, A/B choice and outcome), then the temporal modes would be exchangeable between problems, no matter whether these representations were found in the same cells; (2) if the same cell assemblies were used across problems, then the cellular modes would be exchangeable across problems, no matter whether the cell assemblies played the same role in each problem; and (3) if the same cell assemblies performed the same roles in each problem, then pairs of cellular and temporal modes would be exchangeable across problems.

To see whether the same representations existed in each problem, we first asked how well the temporal modes from one problem could be used to explain activity from other problems. Because the set of temporal modes V is an orthonormal basis, any data of the same rank or less can be perfectly explained when using all the temporal modes. However, population activity in each problem is low dimensional, so a small number of modes explain a great majority of the variance. Modes that explain a lot of variance in one problem will explain a lot of variance in the other problem only if the structure captured by the mode is prominent in both problems. The question is, therefore, how quickly variance is explained in problem 2’s data, when using the modes from problem 1 ordered according to their variance explained in problem 1. To assess this, we projected the data matrix D2 from problem 2 onto the temporal modes V1 from problem 1, giving a matrix MV whose elements indicate how strongly each temporal mode contributes to the problem 2 activity of each neuron:

$$M_V = D_2V_1$$

The variance explained by each temporal mode is given by squaring the elements of MV and summing over neurons. We express this as a percentage of the total variance in D2 and plot the cumulative variance explained as a function of the number of temporal modes used, ordering modes according to the variance explained in D1 (Fig. 5d). To control for drift in neuronal representations across time, we computed the data matrices separately for the first and second halves of each problem. We compared the amount of variance explained using modes from the first half of one problem to model activity in the second half of the same problem, with the variance explained using modes from the second half of one problem to model activity from the first half of the next problem.
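
A sketch of this projection and the cumulative variance-explained curve, assuming D2 is the neurons × timepoints activity matrix and V1t contains problem 1's temporal modes as rows, already ordered by the variance they explain in problem 1:

```python
import numpy as np

def cumulative_variance_from_temporal_modes(D2, V1t):
    """Project problem-2 activity D2 (n_neurons x n_timepoints) onto
    problem-1 temporal modes V1t (n_modes x n_timepoints). Returns the
    cumulative percentage of D2's variance explained as modes are added
    in problem-1 order, as in Fig. 5d."""
    M_V = D2 @ V1t.T                               # neurons x modes
    var_per_mode = np.sum(M_V ** 2, axis=0)        # sum squares over neurons
    total_var = np.sum(D2 ** 2)
    return 100 * np.cumsum(var_per_mode) / total_var
```

The cellular-mode analysis described next is analogous, projecting onto U1 instead and summing over timepoints.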

In both PFC and CA1, the cumulative variance explained as a function of the number of temporal modes used did not depend on whether the two datasets were from the same problem (solid) or different problems (dashed) (Fig. 5d,h; P > 0.05). This indicates that the temporal patterns of activity and, therefore, the trial events represented did not differ across problems in either brain area. However, as this analysis used only the temporal modes, it says nothing about whether the same or different neurons represented a given event across problems. In fact, we can even explain activity in one brain region using temporal modes from another region and mouse (Fig. 5e).

The pattern was very different when we used cellular modes (that is, assemblies of co-activating neurons) from one problem to explain activity in another. We quantified variance explained in problem 2 using cellular modes from problem 1, by projecting the problem 2 data matrix D2 onto the problem 1 cellular modes U1, giving a matrix MU whose elements indicate how strongly each cellular mode contributes to the problem 2 activity at each timepoint:

$$M_U = U_1^TD_2$$

The variance explained by each cellular mode is given by squaring the elements of MU and summing over timepoints. In both PFC and CA1, cellular modes in U that explained a lot of variance in one problem explained more variance in the other half of the same problem than they did in an adjacent problem (Fig. 5f; differences between solid and dashed lines). However, the within-problem versus cross-problem difference was larger in CA1 than PFC (Fig. 5i; P < 0.05). This indicates that PFC neurons whose activity covaried in one problem were more likely to also covary in another problem, when compared to CA1 neurons. As this analysis considered only the cellular modes, it does not indicate whether a given cell assembly carried the same information across problems.

To assess how well the cellular and temporal activity patterns from problem 1 explained activity in problem 2, we projected dataset D2 onto the cellular and temporal mode pairs of problem 1 (U1, V1):

$$\Sigma _2 = U_1^TD_2V_1$$

If the same cell assemblies perform the same roles in two different problems, the temporal and cellular modes will align, and Σ2 will have high weights on the diagonal. We, therefore, plotted the cumulative squared weights of the diagonal elements of Σ2 within and between problems (Fig. 5g). In both PFC and CA1, cellular and temporal modes aligned better for different datasets from the same problem (solid lines) than for different problems (dashed lines). However, this difference was substantially larger for CA1 than PFC (Fig. 5j; P < 0.05). All results also held true when using a time window between only initiation and choice (Extended Data Fig. 9).
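
A sketch of this alignment measure (the normalization by the peak within-problem value described in the Fig. 5g legend is omitted here):

```python
import numpy as np

def paired_mode_alignment(D2, U1, V1t):
    """Project problem-2 activity onto problem-1 cellular/temporal mode
    pairs: Sigma2 = U1^T D2 V1. If the same assemblies play the same roles
    in both problems, weight concentrates on the diagonal. Returns the
    cumulative squared diagonal weights (unnormalized)."""
    Sigma2 = U1.T @ D2 @ V1t.T                   # modes x modes
    return np.cumsum(np.diag(Sigma2) ** 2)
```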

These data show that, although the temporal structure of activity in both regions generalized perfectly across problems, brain regions and subjects—a consequence of the same set of trial events being represented in each—the cell assemblies used to represent them generalized more strongly in PFC than CA1.

Generalization of policy representations

So far, we have focused on how neuronal representations of individual trial events generalize across problems. But, to maximize reward, the subject must also track which option is best by integrating the history of choices and outcomes across trials. To be useful for generalization, this policy representation should also be divorced from the current sensorimotor experience of any specific problem.

To estimate subjects’ beliefs about which option was best, we used a logistic regression predicting the current choice as a function of the choice and outcome history (Fig. 6a). This gave a trial-by-trial estimate of the probability that the animal would choose A versus B—that is, the animal’s policy. We used this policy as a predictor in a linear regression predicting neural activity, run separately for each problem with results averaged across problems and sessions (Fig. 6b). Policy explained variance that was not captured by within-trial regressors such as choice, reward and choice × reward interaction. Specifically, the interaction of subjects’ policy with the current choice explained variance (P < 0.001) starting around the time of trial initiation, when it would be particularly useful for guiding the decision.
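
A sketch of how such a policy regressor could be constructed and combined with the current choice; the probability-of-A parameterization and the ±1 choice coding are assumptions, and the lagged history design matrix is assumed to be built as in the behavioral regression above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def policy_regressors(X_history, choices):
    """Fit the behavioral history regression (as in Fig. 6a) and return
    trial-by-trial neural regressors derived from it. X_history: lagged
    design matrix of past choices, outcomes and interactions; choices:
    current choices (1 = A, 0 = B)."""
    beh = LogisticRegression().fit(X_history, choices)
    policy = beh.predict_proba(X_history)[:, 1]    # P(choose A) per trial
    # Policy and its interaction with the current choice (coding assumed);
    # these columns would be added to the within-trial design matrix.
    choice_pm = 2 * np.asarray(choices) - 1
    return np.column_stack([policy, policy * choice_pm])
```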

Fig. 6: Policy generalization in PFC and CA1.

a, Weights from logistic regression predicting choices in recording sessions using choices, rewards and choice × reward interactions over the previous 12 trials as predictors. The effect of choice × outcome interaction history was significantly above zero on up to 11 trials back (one-sided t-test, P < 0.05) except for the 7th trial (t6 = 1.99, P = 0.094). Error bars report the mean ± s.e.m. across mice. b, CPDs from regression models predicting neural activity using current trial events, subjects’ policy (estimated using the behavioral regression in a) and policy interacted with current choice. Stars denote the timepoints at which each regressor explained significantly more variance than expected by chance (permutation test based on rolling firing rates with respect to trials, P < 0.001, corrected for multiple comparisons; for more details on permutation tests, see the ‘Statistical significance’ section). c, Correlations across problems between policy weights in regressions predicting neural activity. Regressions were run separately for A (left panels) and B (right panels) choices in each problem and at each timepoint across the trial. Correlations of policy representations between all problem pairs were evaluated for each pair of timepoints; values on the diagonal show how correlated policy representation was at the same timepoint in both problems. Positive correlation indicates that the same neurons coded policy with the same sign in both problems. d, To quantify whether policy generalized more strongly between problems in PFC than CA1, we computed the between-region difference in the sum along the diagonal of the correlation matrices in c, separately for A and B choices, and compared it against the null distribution obtained by permuting sessions between brain regions. Policy representation on both A and B choices generalized more strongly in PFC than CA1. e, Slices through the correlation matrices at initiation (left), choice (center) and outcome (right) times for A (solid) and B (dashed line) choices. Significant differences between conditions are indicated by stars as shown in the legend.

We next asked whether this policy representation generalized across problems. Policy may generalize differentially for A and B choices because only the B port varied between problems. We, therefore, analyzed A and B choice trials separately. We ran a set of linear regressions, each predicting neural activity in one problem at a single timepoint in the trial, using policy and trial outcome as regressors. The policy beta weights from each regression correspond to the pattern of neural activity that represented policy in one problem at one timepoint. We can, therefore, quantify the extent to which policy representations generalized between problems as the correlation coefficient between the policy beta weights. We computed the average across-problem correlation of these weights between every pair of timepoints (Fig. 6c). The diagonal elements of these matrices show the average correlation across problems at the same timepoint in each problem. These correlations were larger in PFC than CA1 on both A and B choices (P < 0.05, permutation test; Fig. 6d), showing that, on average, policy representations generalized across problems better in PFC than CA1.
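
A sketch of this cross-problem correlation of policy weights, assuming the beta weights have already been estimated separately for each problem and timepoint:

```python
import numpy as np

def policy_generalization(betas_p1, betas_p2):
    """betas_pX: (n_timepoints, n_neurons) policy beta weights from
    regressions fit separately in each problem. Returns the
    (n_timepoints, n_timepoints) matrix of Pearson correlations between
    problems for every pair of timepoints (as in Fig. 6c); the diagonal
    gives generalization at matched timepoints."""
    n_t = betas_p1.shape[0]
    corr = np.empty((n_t, n_t))
    for i in range(n_t):
        for j in range(n_t):
            corr[i, j] = np.corrcoef(betas_p1[i], betas_p2[j])[0, 1]
    return corr
```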

One possible explanation is that PFC simply represented action values in a problem-general way. A more interesting possibility is that current policy shapes the representation of each trial stage differently but that, in CA1, these representations are more tied to the sensorimotor specifics of the current problem. To test this, we examined time slices through the correlation matrices at initiation, choice and outcome times (Fig. 6e). In PFC, all three correlation profiles on both A and B trials peaked at the correct timepoint (the equivalent of the diagonal elements of the matrix)—that is, the policy representations generalized across problems but were specific to each part of the trial (initiate, choose and outcome). A similar pattern was present in CA1 but only on A choices (which are the same physical port across problems). No CA1 correlation was significantly above zero on B choices. Indeed, PFC policy correlations were greater than CA1 correlations for all representations on both A and B choices (all P < 0.05), and CA1 correlations showed a greater difference between A and B trials at outcome time (Fig. 6e; all P < 0.05).

Overall, therefore, both PFC and CA1 maintained representations of the subject’s current policy that were not simple value representations, as they differed depending on the trial stage. These representations were abstracted across problems in PFC but tied to the sensorimotor specifics in CA1. A portion, but not all, of this problem specificity in CA1 was accounted for by the port identity.
