Feature-selective responses in macaque visual cortex follow eye movements during natural vision

In each session, a monkey viewed a sequence of natural images that repeated in a pseudorandom block fashion (Fig. 1a). Each image presentation typically lasted up to 1.5 s (410 of 679 sessions; range 0.3–60 s in other sessions) and was interrupted if the monkey looked away from the image. The images were typically shown at a size of 16 × 16 degrees of visual angle (dva; 487 of 679 sessions; range 8 × 8 to 26 × 26 dva in other sessions). Monkeys naturally looked around the images without training, examining each image with varied looking patterns across image repeats (Fig. 1b, inset). An average fixation lasted 276 ± 49 ms; an average saccade took 50 ± 5 ms and subtended 5.4 ± 0.9 dva (all mean ± s.d. across subjects; Fig. 1c–e).

Fig. 1: Overview of the free-viewing experiment.

a, Monkeys freely viewed images presented in a random sequence. A fixation dot was displayed before some image presentations. b, The gaze trajectory in an example presentation. The inset shows gaze trajectories for the same image across repeat presentations in one experimental session. Colors indicate different presentations; dots, fixations. c–e, Distributions of fixation durations (c), saccade durations (d) and saccade sizes (e). Thin lines indicate individual monkeys; thick lines, across-monkey averages. f, Image-onset-aligned spike rasters and average FRs for an example AIT neuron. Red and pink ticks indicate image onset and offset times; pink-shaded regions, the typical image presentation cadence in this session. g, Mean normalized FRs per visual area, using presentations lasting 1.5 s for illustration. Values of n correspond to the number of neurons/monkeys per visual area. The shading indicates the bootstrap 95% CI of the mean.

We recorded extracellular single- and multi-unit activity (hereafter, neurons) using chronically implanted multielectrode arrays. The recordings spanned six visual areas: V1, V2, V4 and the posterior, central and anterior divisions of IT (PIT, CIT and AIT). Most data were collected in CIT and AIT (eight and seven monkeys; neuron and monkey numbers included are noted per plot; see the Methods for detailed inclusion criteria), followed by V4 (three monkeys), V1 (two), V2 and PIT (one each). Ventral visual neurons were generally more active during image presentations (Fig. 1f,g): mean firing rates (FRs) increased following image onset, remained elevated as the monkey explored the image and returned to baseline after image offset.

Face-neuron responses were gaze-specific

To study how neuronal responses interact with eye movements and stimulus content, we first focused on face-selective neurons (face neurons, for brevity). During passive fixation, face neurons respond more to faces than nonface objects32. During free viewing, eye movements can bring a face into and out of a neuron’s spatial receptive field (RF). Thus, we categorized fixations as face or nonface by whether the face region of interest (ROI) overlapped the RF (Fig. 2a). We recorded neurons from three face patches in three monkeys (CIT in M1 and AIT in M2 and M3). To functionally identify face neurons recorded in multielectrode arrays, we calculated a face selectivity index (FSI) using responses during the ‘zeroth fixation’, the period between the image onset and the first eye movement. In this period, the onset of a random image placed either a face or a nonface in a neuron’s RF depending on where the monkey happened to be looking. Thus, the zeroth fixation was analogous to passive viewing. We defined face neurons as those with a zeroth-fixation FSI of at least 0.2 (vertical dashed line in Fig. 2b), that is, at least 50% higher responses to faces than to nonfaces. Across sessions, we recorded 6,312 neurons from face-patch arrays. Of these neurons, 2,683 (42.5%) passed the FSI threshold.
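The FSI threshold can be made concrete with a small sketch, assuming the standard contrast-index definition (consistent with the statement above that an FSI of 0.2 corresponds to 50% higher responses; the study's exact formula is not reproduced here):

```python
def face_selectivity_index(r_face, r_nonface):
    """Contrast index between mean responses to face and nonface fixations.

    (r_face - r_nonface) / (r_face + r_nonface); an FSI of 0.2 implies
    r_face = 1.5 * r_nonface, that is, 50% higher responses to faces.
    """
    return (r_face - r_nonface) / (r_face + r_nonface)

# A neuron firing at 15 Hz for faces and 10 Hz for nonfaces sits exactly
# at the 0.2 threshold (15 / 10 = 1.5, i.e., 50% higher).
```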

Fig. 2: Face-selective neurons responded according to whether fixations placed RFs on a face or not.

a, Fixations were categorized as face or nonface per neuron based on RF overlap with face ROIs, here illustrated for a foveal RF 5 dva in diameter for the same image as in Fig. 1b. The two dark-shaded areas indicate face ROIs; dots, fixations; colors, categories (orange, face; blue, nonface). b, Neuronal face selectivity was quantified by an index (FSI) and compared between zeroth and nonzeroth fixations (respectively, x and y axes). Each dot corresponds to a neuron. The top and right subplots show marginal distributions. Neurons colored dark red had significantly different FSI between zeroth and nonzeroth fixations (P < 0.01, two-tailed permutation test, FDR-corrected). The diagonal dashed line corresponds to identity; vertical dashed line, zeroth-fixation FSI = 0.2. c, Responses per category for face-selective neurons, aligned to image onsets (top row) or nonzeroth fixation onsets (bottom row). Each column corresponds to a monkey. The n indicates the number of face neurons. d, An example saccade is shown for each of the four categories defined by the start and end fixation categories. e, Responses per saccade category for the same neurons as in c. Horizontal bars indicate time bins where responses were significantly greater for nonface-to-face versus nonface-to-nonface saccades (lower solid bars) or face-to-face versus face-to-nonface saccades (upper open bars). In panels c and e, lines and shading indicate the median ± median absolute deviation (m.a.d.) across neurons.

Parafoveally previewing a stimulus before fixating it leads to better perception, both during reading and specifically for faces33,34,35. Therefore, we asked whether face neurons were more selective during active viewing, or ‘nonzeroth fixations’, compared with passive-viewing-like zeroth fixations. Neurons had correlated FSI in the two conditions (Fig. 2b; r = 0.63, n = 6,312, one-tailed P < 10⁻⁴; all P values here and below were based on permutation tests with 10,000 permutations unless noted otherwise). Few neurons had significantly different FSI between zeroth and nonzeroth fixations (Fig. 2b; 39 of 2,683 (1.5%) face neurons and 108 of 6,312 (1.7%) neurons in face-patch arrays; all statistical significance values here and below were at false discovery rate (FDR)-corrected P < 0.01).

We next examined the dynamics of face-neuron activity. Face-selective responses followed image onsets (Fig. 2c, top row) and appeared to precede fixation onsets (Fig. 2c, bottom row). To account for the possibility that the apparent predictive responses arose from consecutive face fixations, we divided saccades into four categories by the start and end fixation category (Fig. 2d). Face-neuron responses followed the fixation category across saccades (Fig. 2e). For example, in nonface-to-face saccades, face-neuron activity increased around fixation onset, whereas in face-to-face saccades, responses were lower than responses following nonface-to-face saccades, consistent with response adaptation. Responses following nonface-to-face saccades were higher than nonface-to-nonface responses, and the differences became statistically significant before fixation onsets (Fig. 2e). Significant prefixation differences persisted for large (≥4 dva) saccades (Extended Data Fig. 1), indicating that presaccadic RF overlap with postsaccadic faces did not fully explain the prefixation differences. To further assess these putative predictive responses, we next sought a metric that did not require binary delineations of neuronal RFs and preferred image features.
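The four saccade categories follow directly from per-fixation labels. A minimal sketch, assuming circular RFs and rectangular face ROIs (both simplifications; the study's exact RF and ROI definitions may differ):

```python
import numpy as np

def fixation_category(fix_xy, face_rois, rf_radius=2.5):
    """Label a fixation 'face' if a circular RF centered on it overlaps any
    face ROI, given as axis-aligned boxes (x0, y0, x1, y1) in dva."""
    fx, fy = fix_xy
    for x0, y0, x1, y1 in face_rois:
        # Distance from the fixation to the nearest point of the box.
        dx = max(x0 - fx, 0.0, fx - x1)
        dy = max(y0 - fy, 0.0, fy - y1)
        if np.hypot(dx, dy) <= rf_radius:
            return "face"
    return "nonface"

def saccade_category(start_fix, end_fix, face_rois):
    """One of the four categories, e.g. 'nonface-to-face'."""
    return (f"{fixation_category(start_fix, face_rois)}"
            f"-to-{fixation_category(end_fix, face_rois)}")
```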

General feature-selective responses were gaze-specific

We devised a general readout for selective responses using the prevalent return fixations. Monkeys and humans repeatedly foveate parts of visual scenes above chance frequency in diverse task contexts including free viewing36. Figure 3a shows example return-fixation pairs (distance ≤ 1 dva) in a session, within an image presentation and between repeats. If neurons selectively respond to retinotopic features, responses should be similar between return fixations. To quantify this, we calculated response correlations between each pair of return fixations, across pairs. This measure is analogous to the self-consistency calculated between trial split halves during passive viewing, but because freely viewing monkeys can revisit each image location a different number of times, we calculated self-consistency for per-fixation (single-trial) responses.
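The pairing and correlation steps can be sketched as follows. This is a simplified illustration: `max_dist` of 1 dva matches the return-fixation definition above, and Pearson r stands in for whatever correlation measure the study used:

```python
import numpy as np

def return_fixation_pairs(xy, max_dist=1.0):
    """All index pairs of fixations within max_dist (in dva) of each other."""
    pairs = []
    for i in range(len(xy)):
        for j in range(i + 1, len(xy)):
            if np.hypot(*(xy[i] - xy[j])) <= max_dist:
                pairs.append((i, j))
    return pairs

def self_consistency(responses, pairs):
    """Pearson r between single-fixation responses across return-fixation pairs."""
    a = np.array([responses[i] for i, _ in pairs], float)
    b = np.array([responses[j] for _, j in pairs], float)
    return np.corrcoef(a, b)[0, 1]
```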

Fig. 3: Response self-consistency during return fixations indicated gaze specificity.

a, Example return-fixation pairs, each comprising two nearby fixations (within 1 dva) on an image within or across presentations. Dots indicate fixations; color, different presentations; black lines, return-fixation pairs; arrows, two example fixation sequences meeting in a return-fixation pair. b, Distribution across neurons of return-fixation self-consistency. Red indicates neurons deemed visually selective. c, Self-consistency per neuron between return fixations (x axis) or any two fixations on the same image regardless of distance (y axis). Each dot indicates a neuron, showing 5,000 examples; dark red, neurons with statistically significant differences between return-fixation and same-image self-consistency (P < 0.01, one-tailed permutation test, FDR-corrected); dashed line, identity. d, Schematics illustrating two rules to pair responses and calculate self-consistency. Orange and blue indicate the example fixation sequences in panel a; purple and green bars, responses paired based on the respective rule, ‘the current (previous) fixations are (were) return fixations’. e–g, Illustration of how we quantified self-consistency. e, Each dot indicates a neuron’s FRs in a return-fixation pair. The x and y axes correspond to each of the two fixations. The four subplots show responses 200 ms preceding or following fixation onsets, paired by the current or previous fixations. Because FRs were discrete and often overlapped, dots were slightly jittered for visualization purposes only. f, Self-consistency for all neurons in the example session. The x and y axes correspond to the two response time bins; colors, the pairing rules; each dot within a color, a neuron; square markers, the example neuron in e; dashed line, identity. g, Self-consistency for responses in 50-ms sliding bins, averaged over neurons in the example session. Dashed lines correspond to all return-fixation pairs; solid lines, decorrelated pairs.
h, Mean decorrelated self-consistency time courses over monkeys and neurons, separately per visual area. The n values indicate the number of neurons/monkeys per visual area; shading, the bootstrap 95% CI of the mean; horizontal bars, time bins with significantly higher self-consistency for current- than previous-return fixations (purple) or vice versa (green; P < 0.01, one-tailed permutation test, FDR-corrected).

We used self-consistency to identify neurons with robust feature selectivity. Of all 66,260 neurons across sessions, 26,975 (40.7%) had return-fixation self-consistency r ≥ 0.1 and significantly above zero (Fig. 3b). We focused on these neurons throughout the study because all analyses relied on feature selectivity. Although the threshold r = 0.1 is lower than typical values in passive-viewing studies, the single-trial activities considered here are necessarily more stochastic than standard trial-averaged responses.

To distinguish gaze specificity from overall feature selectivity, we compared response self-consistency between return fixations or any two fixations (nearby or not) on the same image (Fig. 3c). Of the 26,975 feature-selective neurons, 95.7% showed higher self-consistency during return fixations, and 60.7% reached statistical significance. Thus, almost all feature-selective neurons were specific to the gaze location.

To study the dynamics of gaze-specific responses, we calculated self-consistency for response time courses aligned to fixation onsets (Fig. 3d–h). The first two subplots in Fig. 3e illustrate the responses of an example neuron 200 ms before and after return-fixation onsets; the responses correspond to purple bars in Fig. 3d. Responses were more self-consistent after fixation onsets than before (r = 0.57 versus 0.29). Although the self-consistency was positive even before fixation onsets, consecutive fixations (that is, separated by one saccade) were often nearby (Fig. 1e), introducing correlations. To discern the contribution from the previous fixation, we examined responses following previous-return fixations (green in Fig. 3d,e). Comparing responses paired by previous-return fixations to those paired by current-return fixations, self-consistency was higher prefixation (Fig. 3e, third versus first subplot; r = 0.59 versus 0.29) and lower postfixation (second versus fourth subplots; r = 0.57 versus 0.35). These relations held for most neurons in the same session (Fig. 3f).

We evaluated the dynamics of gaze-specific responses at a higher resolution by calculating self-consistency in 50-ms sliding time bins (Fig. 3g, dashed lines). Responses to current-return fixations (purple) became more self-consistent following fixation onsets. Conversely, for previous-return fixations (green), self-consistency decreased after the (current) fixation onset. To further control for the nonpaired fixation, we excluded return-fixation pairs (current or previous) where the nonpaired fixations (preceding or following) were within 4 dva. This decorrelation procedure specifically reduced self-consistency in the nonpaired period (compare solid and dashed lines in Fig. 3g). Thus, we used the decorrelated self-consistency in subsequent analyses (Figs. 3h, 4a and 5a–c).
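The decorrelation step for current-return pairs might look like the sketch below. It assumes fixations are indexed in temporal order, so that fixation `i - 1` immediately precedes fixation `i` (a hypothetical indexing convention; the analogous filter for previous-return pairs would check the following fixations instead):

```python
import numpy as np

def decorrelated_pairs(pairs, fix_xy, min_sep=4.0):
    """Keep current-return pairs whose *preceding* fixations are >= min_sep
    dva apart, so that prefixation self-consistency cannot arise from
    similar presaccadic input."""
    kept = []
    for i, j in pairs:
        if i == 0 or j == 0:
            continue  # no preceding fixation to check
        if np.hypot(*(fix_xy[i - 1] - fix_xy[j - 1])) >= min_sep:
            kept.append((i, j))
    return kept
```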

Fig. 4: Response self-consistency showed spatial precision and no fixation integration.

a, Self-consistency per visual area as a function of the distance threshold for defining return fixations. Lines and shading indicate the median and its bootstrap m.a.d. b, Schematics of predictions by the null hypothesis of per-fixation responses (H0, left) and the alternative hypothesis of an integrating stable representation (H1, right). Colors indicate response self-consistency between fixations separated by various distances: purple, return fixations ≤1 dva apart; black, fixations on the same image irrespective of distance; yellow, distant fixations >8 dva apart. c, Mean self-consistency time courses throughout an image presentation to compare against the predictions in b. Lines and shading indicate the mean and its bootstrap 95% CI.

Fig. 5: Limited evidence for predictive remapping.

a, Mean cumulative distribution per area of response latencies following fixation onset. Shading indicates the bootstrap 95% CI of the mean; gray horizontal bracket, neurons with latency < 0 further characterized in the right two plots in panel b. b, Mean estimates per area for latency (left), the fraction of neurons with negative latencies (middle) and latency for those neurons (right; numbers indicated). Larger dots and error bars indicate overall mean ± bootstrap 95% CI; smaller dots, means per monkey. c, Comparison of response latency following image and fixation onsets. Each dot indicates a neuron; error bars, bootstrap s.d.; colors, visual areas as in a and b; gray shading, identity ± 25 ms; the P value, one-tailed permutation test. d, Schematics of how saccades were matched for the face-specific analysis. Each nonface-to-face saccade was matched with a nonface-to-nonface saccade that started nearby (≤1 dva) and ended far away (≥4 dva). e, Face-neuron responses per monkey and saccade category. Lines and shading indicate the median ± m.a.d. across neurons. f, The fraction of neurons that responded significantly more to nonface-to-face versus nonface-to-nonface saccades for unmatched (left) and matched (right) saccades. To visualize small P values, the y axis is linear for P = 0–0.01 and log-scaled for P = 0.01–1. Statistical tests were per-neuron Mann–Whitney U tests (unpaired samples) when saccades were unmatched and Wilcoxon signed-rank tests (paired samples) when saccades were matched (both one-tailed P < 0.01, FDR-corrected). Colors indicate monkeys; the shading, the bootstrap 95% CI. g, Schematics showing how saccades were matched for the self-consistency analysis. Individual saccades were matched as above, and we further required the match-saccade pair not to constitute a return-fixation pair.
h, Plots showing the fraction of neurons with significantly higher self-consistency in current-return pairs than previous-return pairs (left), or current-return pairs than match pairs (right). Colors indicate visual areas. i, Self-consistency time courses for current-return-fixation pairs and match pairs. In panels h and i, lines and shading indicate the mean and its bootstrap 95% CI.

Figure 3h shows the average decorrelated self-consistency time courses for visually selective neurons, separately per visual area. The responses showed gaze specificity across areas. This conclusion did not change for within- and between-presentation return fixations analyzed separately (Extended Data Fig. 2a).

Precise spatial selectivity and no fixation integration

The self-consistency measure furnished a readout for the spatial precision of free-viewing responses. We assessed whether nearer fixations had higher self-consistency by varying the distance threshold that defined return fixations. Self-consistency increased as the threshold decreased (Fig. 4a), down to 0.25 dva across all areas, approaching our eye-tracking resolution. Thus, ventral visual neurons had surprisingly precise spatial selectivity during free viewing.

Neuronal responses that reflect each gaze change are in principle compatible with integration over fixations to provide a useful stable representation37. If responses integrated over fixations, then as the monkey continued to view an image, responses at different fixation locations should become increasingly similar, narrowing the gaps among the self-consistency values for return fixations ≤1 dva apart, all fixations on the same image and distant fixations >8 dva apart (Fig. 4b, right, alternative hypothesis H1). In contrast, under the null hypothesis of retinotopic responses (Fig. 4b, left, H0), the three self-consistency measures should remain different. We tested both hypotheses for presentations lasting 1.5 s, our most common design (Fig. 4c; Extended Data Fig. 3 shows the results for other presentation times with sufficient data). The self-consistency measures remained different throughout an image presentation, consistent with the null, retinotopic, hypothesis and contradicting the hypothesis of an integrating stable representation. However, the null hypothesis does not predict the drop in self-consistency throughout a presentation, a drop that may relate to the overall FR decrease during a presentation (Fig. 1f,g).

Limited evidence for predictive remapping

We asked whether our data provided any evidence in the ventral stream for predictively remapping neurons, which respond to stimuli in the future RF before saccade onset and may contribute to visual stability. Predictive remapping responses should have negative latencies relative to fixation onsets. The self-consistency time courses (Fig. 3h) supplied a measure for feature-selective response latency. We determined the time responses became better explained by the current fixation than the previous one, that is, the crossing point of the previous- and current-return self-consistency curves (Fig. 3g,h). The population latency distribution was mostly positive, was typical of ventral visual areas and increased along the processing hierarchy (Fig. 5a,b, left). A minority of neurons showed negative latencies (gray brackets in Fig. 5a,b). The fraction of negative-latency neurons ranged from none in PIT to 11% (6% to 15%) in AIT and 15% (0% to 45%) in V1 (mean and bootstrap 95% confidence interval (95% CI)). The negative latencies ranged from −4 ms (−9 to −1 ms) in V1 to −32 ms (−41 to −23 ms) in V2 (mean and bootstrap 95% CI; Fig. 5b, right). Because saccades took 50 ms on average (Fig. 1d) and we estimated latency relative to fixation onsets (that is, saccade offsets), these latency values do not anticipate saccade onsets, although a small number of neurons had latencies around −50 ms (for example, Fig. 5c). To cross-examine other evidence for the negative-latency neurons, we pooled all time-resolved analyses for only these neurons (Extended Data Fig. 4). Their self-consistency time courses crossed over before fixation onset, by construction (Extended Data Fig. 4e). The subset of face neurons responded early to nonface-to-face saccades (Extended Data Fig. 4a,b) as did face neurons overall (Fig. 2). RF modeling analyses, described below (Figs. 6 and 7), also provided suggestive evidence for predictive responses in these negative-latency neurons (Extended Data Fig. 4c,d,f–j).
Thus, a minority of neurons might respond before fixation onsets, although not before saccade onsets as in classical predictive remapping14,20,22.

Fig. 6: Computational models predicted per-fixation responses from stimulus features and revealed gaze-locked RF.

a, Illustration of image-computable models for per-fixation free-viewing responses. The models comprised a pretrained, fixed neural network (NN) feature extractor and a different linear mapping fit to each neuron’s responses. Model inputs were fixation-centered image patches, shown here for an example fixation sequence (blue line and crosshairs). b, Normalized model fit per area for zeroth fixations and nonzeroth fixations (left and right in each pair). Larger dots and error bars indicate overall mean ± bootstrap 95% CI; smaller dots indicate means per monkey. c, Illustration of model-based RF mapping. The eye-centered image per fixation was partitioned into 2-dva patches on a grid of offsets centered on the fixation indicated by a cross. The NN feature extractor converted each patch into a feature vector. Model fit to neuronal responses was separately assessed at each offset from the fixation. d, Model-inferred RF for an example CIT neuron using fixation-onset-aligned responses. e, Model-inferred RFs for the same neuron using 50-ms sliding time bins aligned to saccade onsets. The arrow indicates the time bin centered on saccade onsets (−25 to 25 ms). The two rows correspond to RFs anchored to the pre- or postsaccadic fixation (RFs 1 and 2). f, Quantification of RF presence over time. Colors indicate RF 1 (green), RF 2 (purple) and the midpoint control (magenta); lines and shading, the mean ± bootstrap s.e.m; horizontal bars, statistically significant differences from the midpoint control (P < 0.01, one-tailed permutation test, FDR-corrected).

Fig. 7: Saccade-normalized RF models showed no perisaccadic RF expansion or history integration and limited prediction.

a, Saccade vectors (left) were aligned, rotated and scaled to a normalized vector (right) to place RFs 1 and 2 in a joint map. b, Schematics for how the joint map represents retinotopic RFs and putative perisaccadic properties—predictive forward remapping (‘prediction’), perisaccadic expansion (‘exp.’) and history integration (‘history’). c, Model-based RF maps per area, averaged over neurons. The bottom row shows maps from models using original stimulus features but match-saccade responses to control for RF 1 contents. Both maps per area use the same value range, indicated to the lower left of the bottom plots; the color map is otherwise the same as in Fig. 6d,e. d, RF model fits at locations indicated by the colored lines to the right of panel c. Solid and dashed lines, respectively, correspond to original and match-saccade maps. The plot and associated statistical tests adjust the match-saccade model fits to correct for imperfect saccade matching (see the Methods for rationale and details). Horizontal bars indicate statistically significant differences between original and match-saccade RF 2 values (P < 0.01, one-tailed permutation test, FDR-corrected). Lines and shading indicate the mean ± bootstrap s.e.m.

Short of negative latency, fixation-specific responses could be faster than image-onset responses. We directly compared the fixation- and image-onset latencies in 787 neurons for which we could estimate both with bootstrap s.d. < 25 ms. The two latencies covaried across neurons (Fig. 5c; r = 0.27, P < 10⁻⁴). Fixation-onset latencies were significantly smaller than image-onset latencies by 19 ± 29 ms (mean ± s.d. across neurons; P < 10⁻⁶², one-tailed Wilcoxon signed-rank test), a modest population-level difference below the variance of individual estimates.

We derived latency as an indirect measure from self-consistency. To more directly test for predictive remapping, we identified ‘matched saccades’, pairs of saccades whereby a monkey started from nearby (≤1 dva) locations to acquire divergent (≥4 dva) targets (Fig. 5d). Matched saccades provided natural experiments to control, per saccade, for the presaccadic retinotopic stimulus. In the face-specific analysis, we looked for a nonface-to-nonface saccade (match) for each nonface-to-face saccade (template). Figure 5e shows category-average responses as in Fig. 2e but for matched saccades. Figure 5f shows the fraction of neurons with significantly higher responses to nonface-to-face than nonface-to-nonface saccades, separately per monkey. Without matching saccades, this fraction exceeded the chance level before fixation onsets in all monkeys (Fig. 5f, left), consistent with the population-level statistics (Fig. 2e). With matched saccades, more neurons than chance showed statistical differences only after fixation onsets (Fig. 5f, right). Thus, individual face neurons did not show significant predictive responses after accounting for the presaccadic stimulus.
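Matching saccades amounts to a nearest-start, distant-end search; a sketch assuming Euclidean distances in dva, with tolerances following the thresholds in the text:

```python
import numpy as np

def match_saccade(template, candidates, start_tol=1.0, end_sep=4.0):
    """Find a candidate saccade starting near the template saccade
    (<= start_tol dva) but landing far from it (>= end_sep dva).
    Saccades are (start_xy, end_xy) tuples; returns the first match or None."""
    t_start, t_end = template
    for c_start, c_end in candidates:
        start_close = np.hypot(*(np.asarray(t_start) - c_start)) <= start_tol
        end_far = np.hypot(*(np.asarray(t_end) - c_end)) >= end_sep
        if start_close and end_far:
            return (c_start, c_end)
    return None
```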

Leveraging matched saccades, we devised an analogous control for the self-consistency analysis (Fig. 5g–i). For each (current) return-fixation pair, we tried to match each constituent saccade as above and further required the two match saccades not to comprise a (current) return-fixation pair. If prefixation responses contained predictive components, possibly mixed with retinotopic components, prefixation self-consistency should be higher for actual return-fixation pairs than nonreturn match pairs. Figure 5h, left, compares previous- and current-return self-consistency, showing neuron-level statistical test results to complement the population-level tests in Fig. 3h. Figure 5h, right, and 5i compare matched saccades and show that no individual neuron had significantly higher self-consistency before fixation onsets than explained by presaccadic inputs. Thus, we did not find feature-selective predictive remapping responses in individual ventral visual neurons.

Computational models predicted per-fixation responses

The results so far showed that, during free viewing, ventral visual neurons were selective to stimulus features in space and time just as during passive viewing, encouraging us to test whether deep neural network (DNN)-based, image-computable models for passive-viewing responses38 could also predict free-viewing responses. We adapted these models to predict per-fixation responses from an image patch (for example, 4 × 4 dva) anchored to the fixation (Fig. 6a). A pretrained DNN (a vision transformer (ViT)39) converted each image patch into a feature vector. We fit a linear mapping from feature vectors to neuronal responses and evaluated predictions using cross-validation (CV) across images.
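The model-fitting pipeline, fixed features plus a per-neuron linear map evaluated by cross-validation, can be sketched as below. The ridge penalty and fold scheme are illustrative assumptions; the study's exact regularization and CV splits are not specified here:

```python
import numpy as np

def fit_linear_readout(features, responses, ridge=1.0):
    """Ridge-regularized linear map from (fixed) DNN feature vectors to
    per-fixation firing rates; one readout per neuron."""
    X = np.asarray(features, float)
    y = np.asarray(responses, float)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

def cross_validated_r(features, responses, n_folds=2):
    """Correlation between held-out responses and model predictions,
    with folds standing in for the study's across-image CV splits."""
    X = np.asarray(features, float)
    y = np.asarray(responses, float)
    folds = np.arange(len(y)) % n_folds
    preds = np.empty_like(y)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        preds[test] = X[test] @ fit_linear_readout(X[train], y[train])
    return np.corrcoef(preds, y)[0, 1]
```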

While previous work validated similar models on trial-averaged passive-viewing responses, we found the models also predicted single-trial free-viewing responses (Fig. 6b and Extended Data Fig. 5). Models captured a similar fraction of the explainable (that is, self-consistent) responses during passive-viewing-like zeroth fixations and free viewing (nonzeroth fixations). Model fits varied across visual areas, although areas were not directly comparable due to variations in images and data size (number of fixations) and modeling choices such as the DNN layer and image-patch size.

Models revealed retinotopic RFs

The models provided a means to infer neuronal RFs during free viewing: models should predict a neuron’s responses using stimulus features within the RF, but not outside. To test this, we partitioned the scene centered on each fixation into a grid of 2 × 2-dva image patches at 1-dva intervals (Fig. 6c). A model used image patches at each offset from fixation to predict neuronal responses across fixations; separate models were fit on patches at different offsets. We empirically found it helpful to regularize the models by sharing linear mapping coefficients (representing a neuron’s feature selectivity) across offsets, resulting in a metric reminiscent of reverse correlation. This procedure generated a model-fit map that should correspond to a neuron’s spatial RF. Using simulated responses, we validated that this mapping procedure recovered the location and approximate size of ground-truth RFs (Extended Data Fig. 6).
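With a readout `w` shared across offsets, the mapping procedure reduces to scoring model fit offset by offset; a sketch (the correlation metric and array shapes are illustrative assumptions):

```python
import numpy as np

def rf_map_from_offsets(patch_features, responses, w):
    """Model fit at each grid offset, sharing the same readout w across
    offsets as in the reverse-correlation-like procedure described above.

    patch_features: (n_offsets, n_fixations, n_features) array of features
    for the image patch at each offset from fixation."""
    fits = []
    for feats in patch_features:
        pred = feats @ w
        fits.append(np.corrcoef(pred, responses)[0, 1])
    return np.array(fits)  # high values should mark offsets inside the RF
```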

Figure 6d shows the model-mapped RF for an example CIT neuron using fixation-onset-aligned responses. The RF contains a focal region of high model fit about 3 dva across. RFs inferred from free-viewing data were consistent with conventionally mapped RFs in this and other arrays (Extended Data Fig. 7; example neuron from Pa array 1). All well-fit RFs are summarized per array in Extended Data Fig. 8.

The model-based mapping method allowed us to directly examine remapping during natural image free viewing. We modeled responses in sliding time bins aligned to saccade onsets and used image patches anchored to the pre- or postsaccadic fixation point (FP 1 or FP 2) to map RFs in the pre- or postfixation retinotopic space (RF 1 or RF 2). Figure 6e shows the two sets of spatiotemporal RFs for the example neuron in Fig. 6d. The RFs were focal and shifted from RF 1 to RF 2 around 75–125 ms after the saccade onset, consistent with typical latencies in CIT plus an average saccade duration around 50 ms.

To summarize the RF dynamics across neurons, we quantified RF presence using its consistency over CV splits, regularized via Gaussian fits (Fig. 6f). Each time bin and retinotopic map (for example, RF 1 or RF 2) was quantified independently to allow for potential RF shifts nonparallel to the saccade11,12,40. Across visual areas, RF 1 was more evident before the saccade, and RF 2 was more evident after the saccade (Fig. 6f).

Similar to the self-consistency in Fig. 3h, the RF evidence was nonzero even outside the corresponding fixation period. This could indicate predictive RF remapping, memory responses or shared features between successive fixations. To distinguish these possibilities, we evaluated control RFs anchored to the midpoint of FPs 1 and 2, reasoning that the midpoint should contain common features between FPs 1 and 2. RF 2 evidence exceeded the midpoint control only after saccade onsets at the population level (Fig. 6f, top horizontal purple bars), even when we restricted the analysis to well-fit neurons (normalized model fit ≥ 0.5; Extended Data Fig. 9a). Thus, RF modeling did not indicate predictive remapping beyond retinotopic features shared across each saccade.

Modeling showed no perisaccadic RF prediction or expansion

Figure 6 represented RFs 1 and 2 in different maps because saccade vectors varied during free viewing. To more intuitively visualize several hypotheses about perisaccadic responses, we aligned saccades by shifting, rotating and scaling them into normalized vectors such that RFs 1 and 2 were located at relative positions 0 and 1 in a joint map (Fig. 7a). Regions in this joint map readily represent three hypotheses about perisaccadic responses (Fig. 7b): predictive forward remapping14,20,22, perisaccadic RF expansion41 and viewing-history integration (Fig. 4c). RFs in the joint map were quantified using model fit. To control for the RF 1 stimulus, we again compared original (template) and match saccades analogous to Fig. 5d–i. RF maps for the original saccades revealed both RF 1 and RF 2 (Fig. 7c, top row). For match saccades, models used stimulus features along the original saccades to predict match-saccade responses, so the maps should show a weaker RF 1 (because matching was imperfect) and no RF 2. The results confirmed this expectation (Fig. 7c, bottom row).
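The saccade normalization is a similarity transform sending FP 1 to (0, 0) and FP 2 to (1, 0); a sketch (assuming 2D coordinates and nonzero saccade length):

```python
import numpy as np

def normalize_to_saccade(points, fp1, fp2):
    """Shift, rotate and scale points so fp1 -> (0, 0) and fp2 -> (1, 0),
    placing RFs 1 and 2 at relative positions 0 and 1 in a joint map."""
    fp1, fp2 = np.asarray(fp1, float), np.asarray(fp2, float)
    v = fp2 - fp1
    scale = np.hypot(*v)  # saccade length; assumed nonzero
    c, s = v / scale
    rot = np.array([[c, s], [-s, c]])  # rotates v onto the +x axis
    return (np.asarray(points, float) - fp1) @ rot.T / scale
```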

We tested for perisaccadic expansion via the RF evidence (that is, model fit) at the midpoint between RFs 1 and 2 (relative position 0.5; Fig. 7b–d). Wang et al.41 showed that LIP neurons responded to midpoint stimuli at times between peak RF 1 and RF 2 responses. For neurons across ventral visual areas, the midpoint RF evidence peaked with RFs 1 and 2 (Fig. 7d), unlike LIP RF expansion and consistent with the spatial spread of classical RFs or feature similarity between the midpoint and RF contents.

The maps suggested some evidence consistent with RF 2 prediction (Fig. 7b,c, top). This evidence was not fully due to stimulus autocorrelation, which should cause symmetrical artifacts corresponding to prediction and history integration; instead, there was stronger evidence for a predictive RF 2 than an RF 1 memory (Fig.
