Biological Mechanism of Reward-Based Learning

By Suk Joon Lee

How do we use our experience to guide our future behavior, and how does our brain allow us to do so? Among the attempts to answer this question, reinforcement learning has been one of the most successful learning theories in both neuroscience and computer science. In this model, an agent uses the discrepancy between the expected and the actual outcome of its action to inform its future decisions. This discrepancy is called the reward prediction error (RPE) because it equals the difference between the reward (outcome) and the prediction (expectation). If the RPE is positive (the outcome is better than expected), the agent becomes more likely to repeat the action that produced it.
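To make the RPE idea concrete, here is a minimal sketch of a value update driven by prediction error, in the style of the classic Rescorla-Wagner rule. The function names, the learning rate, and the reward sequence are illustrative assumptions and are not taken from the research article.

```python
def update_value(value, reward, learning_rate=0.1):
    """Return (new_value, rpe) after one trial.

    value:  the agent's current prediction of the reward
    reward: the reward actually received on this trial
    """
    rpe = reward - value  # reward prediction error: outcome minus expectation
    new_value = value + learning_rate * rpe  # nudge the prediction toward the outcome
    return new_value, rpe


def simulate(rewards, learning_rate=0.1, initial_value=0.0):
    """Run the update over a sequence of rewards and record (value, rpe) per trial."""
    value = initial_value
    history = []
    for reward in rewards:
        value, rpe = update_value(value, reward, learning_rate)
        history.append((value, rpe))
    return history


# Hypothetical session: the same action is rewarded on 50 consecutive trials.
history = simulate([1.0] * 50, learning_rate=0.2)
```

In this toy session the RPE is large on the first rewarded trial and shrinks toward zero as the prediction catches up with the reward. If a fully expected reward is then omitted, the RPE becomes negative.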

Does this actually happen in the brain? Many years of neuroscience research have shown that midbrain dopamine neurons do show patterns of activity consistent with RPE (although there have been recent controversies over this topic). Building on this, canonical reinforcement learning models postulate that dopamine neurons provide a teaching signal, in the form of dopamine release, to their downstream neurons in the basal ganglia, another brain region that has been implicated in learning. Given that dopamine receptors are G-protein-coupled receptors, dopamine is thought to guide learning by dynamically modulating neuronal protein kinase A (PKA), an enzyme shown to be involved in synaptic plasticity in many ex vivo experiments with brain slices. However, this fundamental relationship between RPE-encoding dopamine and PKA activity in basal ganglia neurons had remained untested in behaving animals.

Schematic diagram describing fluorescence lifetime photometry to measure PKA activity in the neurons of a behaving mouse. An optical fiber relays the fluorescence signal from the brain of the mouse. By measuring how fast the fluorescence of the PKA sensor decays, we can estimate the net PKA activity (the balance between PKA and phosphatase activity) in the neurons expressing the sensor.
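As a rough illustration of what "measuring how fast the fluorescence decays" can mean, the sketch below estimates a lifetime from a photon-arrival histogram by taking the intensity-weighted mean arrival time. This is one common, simple estimator and is not necessarily the analysis used in the research article. The single-exponential decay, the bin width, and the 2 ns lifetime are all assumptions chosen for the example.

```python
import math


def simulate_decay(tau_ns, n_bins=500, bin_width_ns=0.05, amplitude=1000.0):
    """Build an idealized, noise-free single-exponential decay histogram.

    Returns (times, counts), where times are bin centers in nanoseconds.
    """
    times = [(i + 0.5) * bin_width_ns for i in range(n_bins)]
    counts = [amplitude * math.exp(-t / tau_ns) for t in times]
    return times, counts


def mean_lifetime(times, counts):
    """Estimate lifetime as the count-weighted mean photon arrival time."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("decay histogram contains no photons")
    return sum(t * c for t, c in zip(times, counts)) / total


# Hypothetical sensor state with a 2 ns lifetime, recorded over a 25 ns window.
times, counts = simulate_decay(tau_ns=2.0)
estimate = mean_lifetime(times, counts)
```

For a single-exponential decay recorded over a window much longer than the lifetime, the mean arrival time closely approximates the lifetime itself. Real recordings would also need to account for noise, background, and the instrument response, which this sketch ignores.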

To test this idea, we developed fluorescence lifetime photometry to monitor PKA activity in the neurons of a mouse performing a reward-based learning task. Briefly, fluorescence lifetime photometry uses an optical fiber to collect the fluorescence from a biological sensor expressed in neurons, and fast electronics to record the decay of this fluorescence, which reflects the level of PKA activity. With this technique, we measured PKA activity in the spiny projection neurons (SPNs) of the nucleus accumbens, a sub-region of the basal ganglia that is heavily innervated by midbrain dopamine neurons. Combining this technique with fiber photometry and optogenetics, we found that dynamic positive and negative modulation of the dopamine signal, resembling RPE during learning, was necessary and sufficient to explain PKA activity in the downstream neurons. The modulation of PKA in SPNs that express type-1 and type-2 dopamine receptors was dichotomous: PKA in the two cell classes was selectively sensitive to increases and decreases in dopamine, respectively, which occur at different phases of learning. Thus, PKA-dependent pathways in type-1 and type-2 dopamine receptor-expressing SPNs are asynchronously engaged by dopamine signals to promote different aspects of reinforcement learning: the former is responsible for the initial phase of learning and the latter for the later phase.
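The dichotomy described above can be pictured with a deliberately simplified toy model in which a signed dopamine change is split into two rectified channels. This is only an illustration of "selectively sensitive to increases and decreases"; the function name, the linear rectification, and the units are assumptions and do not represent the quantitative model or data in the research article.

```python
def spn_responses(dopamine_change):
    """Split a signed dopamine change into two rectified channels.

    dopamine_change: change relative to baseline (positive = increase,
                     negative = decrease), in arbitrary units.
    Returns (d1_response, d2_response), the magnitude of the response in
    type-1 and type-2 receptor-expressing SPNs in this toy model.
    """
    d1_response = max(0.0, dopamine_change)   # responds only to increases
    d2_response = max(0.0, -dopamine_change)  # responds only to decreases
    return d1_response, d2_response


# Hypothetical events: a dopamine increase, then a dopamine decrease.
increase_event = spn_responses(0.5)
decrease_event = spn_responses(-0.3)
```

In this picture a dopamine increase engages only the type-1 channel and a dopamine decrease engages only the type-2 channel, so the two cell classes are recruited at whichever phase of learning each kind of dopamine change occurs.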

Our findings provide in vivo evidence for the presumed connection between dopamine and PKA in the basal ganglia. Furthermore, they offer insight into a biological mechanism of learning in which a single neurotransmitter signal in the same brain region is transformed into different downstream biochemical signals at different time points because of differences in neurotransmitter receptor properties.

Suk Joon Lee is an MD/PhD Student in the lab of Bernardo Sabatini at Harvard Medical School.


Learn more in the original research article:
Lee, S. J., Lodder, B., Chen, Y., Patriarchi, T., Tian, L., & Sabatini, B. L. (2020). Cell-type-specific asynchronous modulation of PKA by dopamine in learning. Nature. Advance online publication. https://doi.org/10.1038/s41586-020-03050-5
