
Nature News · Feb 25, 2026 · Collected from RSS
MainLearning is the product of changes in the strength of synaptic connections between neurons6,7,8,9,10,11,12,13. Synaptic modifications can have difficult-to-predict effects on network output, particularly in complex hierarchical networks such as the brain. The challenge of determining how individual synapses should be altered to improve task performance is known as the credit assignment problem14,15,16,17,18. Whereas this problem is effectively solved in artificial neural networks (ANNs) by the backpropagation-of-error algorithm19, how credit assignment is solved in the brain remains unknown14,15.Recent theoretical work has proposed several models by which biological circuits could solve credit assignment, including target learning and backpropagation-like algorithms1,2,3,4,5,20,21. Central to both artificial and biologically inspired solutions to credit assignment is the vectorization of instructive signals, as opposed to the broadcasting of a single scalar teaching signal14. Effective learning requires, in addition to vectorization, instructive signals to be separable from feedforward inputs to prevent interference15. In ANNs, this is achieved via temporal separation, which has long been thought to be biologically implausible. One hypothesis is that in cortex, credit-related information is spatially, rather than temporally, segregated in the apical dendrites of pyramidal neurons15. This aligns with anatomical and circuit evidence that feedforward inputs are received perisomatically and feedback inputs are received in the distal dendrites22,23,24,25,26,27,28,29,30,31. However, direct evidence regarding the subcellular mechanisms of credit assignment is lacking.Vectorized teaching signals at the dendritic level should meet four experimentally testable conditions. First, dendritic activity should contain information that is not present in somatic activity alone (although somas could theoretically transmit gradients using qualitatively different spiking patterns2,4,32, the cable properties of dendrites predict some level of independence between somatic and dendritic activity). Second, dendritic activity should encode information about task performance that could serve as instructive signals, such as reward and error representations. Third, dendritic activity should reflect the contribution of that neuron to task performance (that is, the reward function). Fourth, disrupting vectorized instructive dendritic signals should impair learning.Specifying a reward function using a BCI taskEvaluating credit assignment in biological neural networks has thus far proved impossible14,15. Teaching signals can only be defined relative to a reward function that maps neural activity to task performance. It is unclear whether such functions are explicitly represented in the brain. Even if they are, experimenters are blind to their specific formulation in terms of neural activity15. Neurofeedback brain–computer interface (BCI) tasks present a potential solution to this problem by directly coupling neural activity to task performance, thereby allowing the experimenter to specify the reward function to be optimized14,20,21. Previous studies have shown that mice are able to learn BCI tasks using a variety of feedback stimuli and brain areas and that learning induces changes in the activity of the neurons controlling the BCI, including in the hippocampus and various sensory and motor cortices33,34,35,36,37,38,39. Here we leveraged a visually guided neurofeedback BCI task in cortical pyramidal neurons to test subcellular mechanisms for error and reward-related signalling (Fig. 1a–c and Supplementary Figs. 1 and 2). We trained head-fixed mice under a 2-photon microscope to control the activity of two spatially intermingled sets of GCaMP7f-labelled layer 5 pyramidal neurons, in the retrosplenial cortex (RSC), designated P+ and P− (selection criteria in Extended Data Figs. 1 and 4b and Methods). The difference in mean somatic GCaMP activity of P+ versus P− neurons was coupled to rotation of a visual grating relative to a rewarded target angle33,34,35,36,38,39 (Fig. 1d–f and Supplementary Data Fig. 1). We selected RSC owing to the optical accessibility of layer 5 and previous demonstration of independent dendritic events in this area40. We recorded GCaMP activity at 15 Hz in the proximal trunk dendrite as a proxy for somatic activity; this allowed imaging of many neurons while reducing signal contamination owing to the more precise spatial footprint and faster signal kinetics of the apical trunk41,42,43. We measured task performance with two metrics: accuracy, which represented the fraction of rewarded trials; and speed, which represented the number of rewards obtained per minute. Mice (n = 6) learned the task by both metrics (Fig. 1g and Extended Data Fig. 2 and 3).Fig. 1: Mice learn a neurofeedback BCI task through the differential regulation of P+ and P− neurons.a, Schematic of the BCI setup. Mice were head-fixed and imaged under a 2-photon (2P) microscope and free to run on a cylindrical treadmill. Two user-defined populations of GCaMP7f-labelled layer 5 (L5) pyramidal neurons in RSC were imaged at the proximal apical trunk: P+ (red) and P− (blue) were selected to control the rotation of a Gabor patch. P0 neurons were designated as all other neurons in the field of view. Single frames were online-registered (motion-corrected). Activity in P+ neurons rotated the patch clockwise, towards the target angle of 90°. Activity in P− neurons rotated the Gabor patch stimulus counter-clockwise, towards a 0° angle. b, Schematic of the mapping between P+ and P− activity, stimulus angle, target activity and error. Error was the distance between current and target activation. The angle represents a binned (7 bins, 15° apart, from 0° to 90°) linear mapping between the mean activity in P+ neurons minus the activity in P− neurons. c, Trial structure: mice had 28 s to reach target activity and receive a reward, delivered 1 s later. In successful trials, the 90° Gabor patch was shown for 2 s, followed by 1 s of black screen presentation. In unsuccessful trials, a 3 s black screen was presented before the onset of the next trial. d, ΔF/F0 traces as recorded live for P+ (red) and P− (blue) neurons. Vertical dashed lines and triangles represent timepoints where the mouse reached target activity. e, Mean activity for the red (P+) and blue (P−) traces shown in d. The black trace shows the arithmetic subtraction of P+ and P− neurons (z-scored). The orange trace shows the corresponding visual stimulus angle as presented to the mouse. f, Mean ΔF/F0 for P+ and P− activity aligned to the time in which the mouse reached target activity (dashed, vertical line and black triangle) for the session highlighted in d,e. Reward was delivered 1 s later (solid vertical line with water reward). Shaded areas represent s.e.m. g, Mean performance over days quantified as the fraction of successful trials over the total number of trials and as the number of rewards per minute (one-way repeated measures ANOVA, P = 5 × 10−4 (accuracy) and P = 0.002 (rewards per minute); n = 6 mice). The dashed horizontal red line represents chance level for accuracy performance (Methods). Shaded areas represent s.e.m. h, ΔF/F0 traces for the same P+ and P− neurons on training day 1 and training day 14. i, Calcium transient frequency for P+, P− and P0 neurons across the 14 days of training normalized to the activity on day 1. All neurons were tracked over the full 14 days of imaging. Two-way repeated measures ANOVA, P = 0.012, P = 0.004 and P = 9.3 × 10−4 for the effect of population identity, days and an interaction between population identity and days. After Tukey’s multiple comparison, P = 0.027 (P+ versus P− neurons), P = 0.95 (P+ versus P0 neurons) and P = 0.01 (P− versus P0 neurons). n = 6 mice. Shaded areas represent s.e.m.Source dataFull size imageWe compared activity levels of P+ and P− populations, as well as the population of surrounding neurons that were not directly involved in the rotation of the stimulus (termed P0), across days of task performance. We imaged the same neurons longitudinally throughout all experiments. We found that learning was accompanied by the differential regulation in the activity of P+ and P− neurons over days (Fig. 1h,i), with P+ neurons maintaining their activity levels while P− neurons were downregulated. Whereas, on average, changes in activity in P0 neurons resembled changes in P+ neurons (Fig. 1i), selecting the subpopulation of P0 neuron with matching activity levels of P+ and P− neurons on day 1 revealed that changes in activity in P0 neurons fell in between those of P+ and P− neurons (Extended Data Fig. 4). As the most active neurons on day 1 were also those that were most strongly downregulated (Extended Data Fig. 4c), our results are consistent with a model of learning by sparsification, an energy-efficient solution to the task44. Increases in task performance were not correlated with changes in locomotion across days (Extended Data Fig. 3). Moreover, the P+ and P− populations were spatially intermingled, and had the same GCaMP transient frequency on day 1 (Extended Data Figs. 1 and 4a), ruling out the possibility of learning the task by simply engaging a non-specific gain modulation mechanism.Dendrites contain information not found in their somasTo determine whether apical dendritic activity contained information that was not encoded in parent somatic activity alone, we used an electrically tunable lens to semi-simultaneously (15 Hz per plane) record activity in proximal and distal trunk dendrites across learning (Fig. 2a). We paired proximal and distal dendrites on the basis of the Pearson correlation of their GCaMP signals, thresholded at r = 0.6 as in previous studies41,42,43. Previous work in brain slices demonstrated that dendritic GCaMP signals are larger when current is injected in the distal trunk and smaller when current is injected at the