Most of what we know about the role of dopamine in learning comes from simple, highly structured behavioral tasks — an animal makes the right choice and receives a reward, triggering a burst of dopamine. In this way, dopamine is an important teaching signal, suggesting which behaviors to keep pursuing and which to avoid.
But in the real world, behavior is rarely, if ever, so highly structured, and most of what we do isn’t immediately followed by tangible rewards. “The behaviors I am most fascinated by, like learning to speak or play an instrument, are not learned through the pursuit of food or juice rewards,” says Vikram Gadagkar, a neuroscientist at Columbia University and an investigator with the Simons Collaboration on the Global Brain. Much less is known about how the brain guides these kinds of spontaneous, self-motivated behaviors, and whether everything we’ve learned about dopamine function still applies.
“The brain did not evolve in the context of such structured and discrete tasks,” says Sandeep Robert Datta, an SCGB investigator and neuroscientist at Harvard University. “Most of what we do is generate spontaneous behaviors in the absence of explicit rewards, so there is great value in studying behaviors that more closely resemble these natural conditions.”
Gadagkar, Datta and other SCGB investigators are now beginning to do just that: explore the role of dopamine in more ethologically relevant tasks. Their recent studies take advantage of modern tools for measuring neural signals, monitoring animal activity and analyzing data, giving researchers unprecedented access to dopamine dynamics in more natural contexts.
Their early findings are starting to suggest that dopamine is just as important in guiding natural, spontaneous behaviors as it is in more structured, reward-based laboratory tasks. “The neural circuits that have been so extensively studied in the context of external rewards might be the same circuits that are helping you learn all kinds of natural behaviors,” Gadagkar says. “If that is true, that is a big insight.” Suddenly decades’ worth of research on the role of dopamine in structured, reward-based tasks may also apply to behaviors driven by more abstract or internal forms of reward. Through such insights, these studies are now paving the way for more holistic models of dopamine function and behavioral control.
Part of the reason dopamine researchers have relied on structured tasks with clear, extrinsic rewards is that they generate large, easily detectable dopamine responses. These neural signals can then be aligned to each task event and averaged across trials to achieve a sufficient signal-to-noise ratio, a requirement for inferring any statistical relationship between the brain and behavior.
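To make the logic of trial averaging concrete, here is a minimal sketch in Python. It is a toy simulation, not an analysis from any of the studies described here: the response shape, noise level and trial count are all assumed for illustration. It shows that averaging many event-aligned trials shrinks the noise while leaving the underlying response intact.

```python
import math
import random

def simulate_trial(template, noise_sd, rng):
    """One event-aligned trial: the true response plus independent Gaussian noise."""
    return [sample + rng.gauss(0.0, noise_sd) for sample in template]

def trial_average(trials):
    """Average aligned trials sample by sample."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]

def rms_error(trace, template):
    """Root-mean-square deviation of a trace from the true response."""
    squared = [(a - b) ** 2 for a, b in zip(trace, template)]
    return math.sqrt(sum(squared) / len(squared))

rng = random.Random(0)
# Hypothetical response: a brief bump centered 20 samples after the task event.
template = [math.exp(-((t - 20) / 5.0) ** 2) for t in range(60)]
trials = [simulate_trial(template, noise_sd=1.0, rng=rng) for _ in range(100)]

single_error = rms_error(trials[0], template)
averaged_error = rms_error(trial_average(trials), template)
print(f"single trial: {single_error:.2f}, average of 100: {averaged_error:.2f}")
```

With independent noise, the residual error of the average falls roughly with the square root of the number of trials. That is why tasks yielding many repeatable, alignable events are so convenient.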
To observe a similarly rich signal from an intrinsically motivated behavior, Gadagkar has taken advantage of the unique properties of zebra finch song. “Birdsong is a completely natural and spontaneous behavior, but also extremely stereotyped and well defined, which is a rare combination,” he says. Without needing to be prompted through repeated trials, the bird spontaneously performs multiple renditions of its song, each with slight variations that can be correlated with neural activity.
How do birds learn and maintain their songs in the absence of ‘primary rewards’ like food or liquid? “We say that songbirds learn to sing by trial and error,” Gadagkar says, “but we didn’t actually know where this error signal was.” The process of learning through trial and error has been formalized under the framework of reinforcement learning, where rewarding behaviors are reinforced and unrewarding behaviors are devalued. Dopamine plays a major role in this process by representing “reward prediction errors” — the difference between the reward an animal thought it would receive and what it actually got. The greater and more surprising the reward, the more dopamine is released in the brain, providing an important teaching signal as the animal searches for the best strategy.
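The idea of a reward prediction error can be written down in a few lines. The sketch below uses a textbook-style update rule in Python, in which the prediction moves a fraction of the way toward each outcome. The learning rate and reward values are arbitrary assumptions, and the code is not taken from any study mentioned here.

```python
def rpe_learning(rewards, learning_rate=0.2, initial_value=0.0):
    """Track a reward prediction and the prediction error on each trial."""
    value = initial_value
    errors = []
    for reward in rewards:
        error = reward - value          # reward prediction error
        errors.append(error)
        value += learning_rate * error  # nudge the prediction toward the outcome
    return value, errors

# The same reward delivered on 30 consecutive trials.
value, errors = rpe_learning([1.0] * 30)

# Once the reward is expected, omitting it gives a negative error.
omission_error = 0.0 - value

print(f"first error: {errors[0]:.2f}, last error: {errors[-1]:.4f}")
print(f"error when the expected reward is omitted: {omission_error:.2f}")
```

The error is large when the reward is a surprise and shrinks toward zero as the reward becomes predictable. An outcome that falls short of the prediction yields a negative error.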
But recent years have seen a slew of studies revealing new and more nuanced roles for dopamine in shaping behavior. “The traditional framework of reward prediction errors has been a really helpful starting point, but we’ve since realized that what dopamine encodes is a bit more complex than that,” says Adrienne Fairhall, a neuroscientist and SCGB investigator at the University of Washington.
This gets particularly complicated in behaviors where there are no explicit rewards or punishments to learn from. When practicing the piano, for example, you don’t receive a treat every time you play the right note. Instead, you develop an internal goal for what you want the piece to sound like, and you react based on how close you’ve come to meeting that goal. But how does the brain perform this type of self-evaluation? Gadagkar’s first guess was dopamine.
In a study published in Science in December 2016, Gadagkar recorded the activity of dopamine neurons in the ventral tegmental area of adult zebra finches as they sang their song. In these initial experiments, he distorted occasional song syllables with auditory feedback, giving the birds the impression they had made a mistake in their performance. The activity of basal ganglia-projecting dopamine neurons was significantly suppressed after these distorted syllables, consistent with a worse-than-predicted outcome, or performance error.
“It seemed very exciting to next explore what dopamine was doing during natural errors or variations in the bird’s song, to see whether we could find evidence that the animal monitored its own performance,” Fairhall says. The two struck up a collaboration and soon followed up on these experiments, now equipped with the Fairhall lab’s arsenal of data analysis tools. In an update published in Cell Reports in March 2022, Gadagkar and Alison Duffy, a postdoc in the Fairhall lab, revisited the activity of these basal ganglia-projecting dopamine neurons, this time using more sophisticated fitting methods to examine how the activity correlated with natural variations in the bird’s song.
The researchers confirmed that dopamine neuron activity was correlated with recent, and not future, song variations, suggesting that the neuromodulator is used to evaluate whether the bird has just sung the right note. Dopamine neuron spiking was also highest whenever the bird sang a specific ‘target’ version of the song. The researchers say these dynamics could help the bird correct itself after an off-target rendition by shifting the song toward the target version, and help it maintain its performance after a successful one.
“This suggests that the dopaminergic circuits extensively studied in the context of reward prediction error may be the same ones that help you learn and refine all sorts of complex, self-motivated behaviors through ongoing calculations of performance error,” Gadagkar says. In other words, the same dopaminergic systems that shape how you respond to a sudden juice reward may also be active when you hit the right note on a piano. And if that’s true, then there are likely many other promising leaps neuroscientists can make in how they interpret findings from traditional behavioral paradigms.
“Perhaps more philosophically, it hints that when you find something rewarding, it doesn’t really matter if it comes from the outside or the inside,” Gadagkar says. “We can be so focused on external rewards sometimes, but it’s nice to consider that internal rewards may really function in the same way.”
Datta is also interested in how dopamine shapes spontaneous behavior independent of extrinsic rewards. But he’s taken on the extra challenge of studying these concepts in freely moving mice. “Open field behavior has generally been viewed as trivial locomotor activity, but it’s actually a rich process driven by natural internal goals,” he says.
If it’s not trivial, it at least seems inscrutable to most. Datta’s group developed an automated technique, known as motion sequencing (MoSeq), which uses modern computer vision and machine learning tools to track and analyze spontaneous mouse behavior. Using this approach, they were able to break down the animals’ activity into sub-second motifs, or ‘syllables’: every turn, rear, dive, scrunch, acceleration, pause or bout of grooming is identified and associated with a neural signal. “It’s an elegant approach to studying animal behavior without having to impose a task structure,” Gadagkar says.
Using fiber photometry and the genetically encoded dopamine indicator dLight, the researchers measured dopamine levels in the dorsolateral striatum (DLS), a part of the basal ganglia involved in shaping behavioral sequences. As the mice moved around their cages, the researchers noticed a flicker of dopamine whenever an animal transitioned between behavioral syllables, according to research published in Nature in January. “It’s like there’s a rhythm in dopamine that roughly matches switching from one action to the next,” Datta says. “But what was really surprising was that the transients themselves were super variable for each syllable.” That is, a syllable where the animal was sitting still and a syllable where it was running around could both show similar bursts of dopamine. This suggests that DLS dopamine signals convey something other than the identity or kinematics of a movement.
To determine what these signals encode, Datta’s group devised a system for mimicking the dopamine fluctuations through calibrated closed-loop optogenetic stimulation. They used this setup to generate a burst of dopamine during individual syllables and then evaluated how it affected the mouse’s behavior. Triggering dopamine didn’t make the mouse initiate specific behaviors or change the kinematics of its movement, but it did seem to affect the sequencing of the mouse’s behavioral syllables over time. Behavior was much more variable in the few seconds after a big wave of dopamine, with the mouse seemingly exploring new syllable sequences for that brief period of time. But when the researchers looked at the behavior that occurred over the next few minutes, it was clear that syllables coinciding with the highest dopamine transients occurred more frequently over time, while syllables that coincided with low-amplitude dopamine transients occurred less. Even without a task structure, sensory cues or exogenous rewards, the mice appeared to build behavioral sequences that maximized dopamine. “This suggests that self-generated, spontaneous behaviors aren’t trivially reflexive, but seem to take advantage of the same online learning mechanisms that support goal-oriented, reward-driven tasks,” Datta says.
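One way to picture this slower effect is a toy model in which each syllable has a preference that grows whenever it coincides with an above-baseline dopamine transient. The Python sketch below is an illustrative assumption, not the analysis or model used by Datta’s group. The syllable names, dopamine amplitudes, learning rate and softmax choice rule are all hypothetical.

```python
import math
import random

def softmax(preferences):
    """Turn syllable preferences into usage probabilities."""
    peak = max(preferences.values())
    weights = {name: math.exp(p - peak) for name, p in preferences.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def run_session(syllables, stimulated, steps=500, learning_rate=0.05,
                baseline=1.0, boost=2.0, seed=0):
    """Simulate a session in which one syllable is paired with extra dopamine."""
    rng = random.Random(seed)
    preferences = {name: 0.0 for name in syllables}
    for _ in range(steps):
        probs = softmax(preferences)
        chosen = rng.choices(list(probs), weights=list(probs.values()))[0]
        # Assumed rule: the stimulated syllable coincides with a larger transient.
        dopamine = boost if chosen == stimulated else baseline
        # Preference changes in proportion to dopamine above baseline.
        preferences[chosen] += learning_rate * (dopamine - baseline)
    return softmax(preferences)

usage = run_session(["turn", "rear", "pause", "groom"], stimulated="rear")
print({name: round(p, 2) for name, p in usage.items()})
```

In this sketch, all four syllables start out equally likely. By the end of the session the syllable paired with larger transients is used more often, and the others are used less simply because usage probabilities must sum to one.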
It’s as though our ongoing behavior is shaped by a constant self-tutoring process, which Datta says differs from how scientists have typically thought spontaneous behavior is generated at the neural level. Even freely moving animals are guided by complex motivational circuits, so perhaps we can do a better job of inferring what an animal’s goals actually are.
“I think that’s the real mystery,” Datta says. “You look at your cat or your dog or a mouse running around, and you want to know why it’s doing what it’s doing, what competing motivations it might have, and how those internal motivations actually shape its behavioral output. I think our work is a baby step toward being able to draw those kinds of inferences.”
As these latest findings continue to inform our view of dopamine, they also shine a light on how much is left to learn. “No one has a fully fleshed-out normative model to explain what dopamine is doing during spontaneous behavior,” Datta says. “It’s clear that these dopamine fluctuations can structure behavior — we’ve made that phenomenological observation — but now the question is: What do they mean? What’s generating them? What errors do they reflect, how are they computed, and how are they used for learning?”
Neuroscientists are increasingly looking for this more holistic view, in terms of what dopamine is doing and how the whole brain comes together to do it. “Ultimately, any behavior involves some combination of motivation, goals and movement, and that inherently means the entire system is engaged,” Datta says. “We tend not to think about that and how it might all work together, but that’s actually the future of all of this.”
To help explore this broader question, Datta turns to computational neuroscientists like Ashok Litwin-Kumar, an SCGB investigator at Columbia. In a preprint shared on arXiv in July 2022, Litwin-Kumar presents an updated reinforcement learning model that takes a brainwide view of behavioral control. To better capture all the possible contributors to dopamine activity, Litwin-Kumar’s group adds a term to the model, which they call “action surprise.” The term measures how unexpected an action is relative to the basal ganglia’s current behavioral policy. The premise is that within the brain, there are many competing ideas about what an ideal action might be, and what an animal does is not always what the basal ganglia thinks it should do. The study shows it’s actually beneficial for the basal ganglia to keep track of this as it monitors and shapes behavior.
Say an animal is about to take an extremely unexpected action, perhaps driven by a sudden sensory input to the motor cortex that the basal ganglia is not yet privy to. If this action turns out to be rewarding, the brain should reinforce that behavior in this new context. The action-surprise term helps the basal ganglia learn this relationship more quickly. On the other hand, if the unexpected action turns out to be a bad choice, then this term helps the basal ganglia preserve its current model, so it can continue to make helpful suggestions about what to do next. In either scenario, the action-surprise term provides valuable information that makes the learning process more efficient.
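The two scenarios can be illustrated with a small Python sketch. This is not the formulation in the preprint. It assumes, purely for illustration, that action surprise is the negative log probability of the action under the current policy, and that it scales learning up after good outcomes and down after bad ones. The policy, action names and scaling rule are all hypothetical.

```python
import math

def action_surprise(policy, action):
    """Assumed measure: negative log probability of the action under the policy."""
    return -math.log(policy[action])

def modulated_update(prediction_error, surprise):
    """Assumed rule: surprise amplifies learning from good outcomes
    and dampens learning from bad ones."""
    gain = 1.0 + surprise
    if prediction_error >= 0:
        return prediction_error * gain
    return prediction_error / gain

# Hypothetical policy: the basal ganglia strongly favors "approach".
policy = {"approach": 0.9, "freeze": 0.1}
expected = action_surprise(policy, "approach")
unexpected = action_surprise(policy, "freeze")

for label, surprise in (("expected", expected), ("unexpected", unexpected)):
    good = modulated_update(1.0, surprise)
    bad = modulated_update(-1.0, surprise)
    print(f"{label} action: update after reward {good:.2f}, after bad outcome {bad:.2f}")
```

Under these assumptions, a rewarded but unexpected action produces a larger update than a rewarded expected one. An unexpected action that turns out badly produces a smaller update, leaving the current policy largely intact.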
Litwin-Kumar says this might help explain why Datta observed such large dopamine transients in the basal ganglia just before mice engaged in more variable behavior. Perhaps the decision to produce these unexpected behavioral syllables was generated somewhere else in the brain and then relayed to dopamine circuits to produce an action-surprise signal in the DLS.
“All of this points to the fact that studying more rich and ongoing behaviors opens up a lot of interesting questions about how different brain systems interact to generate behavior, which are much harder to get at with simpler behavioral paradigms,” Litwin-Kumar says.
Despite raising new issues, the findings indicate a promising common ground. “The fact that a reinforcing role for dopamine has been seen in so many different contexts suggests that similar circuits, mechanisms and molecules are relevant in low-dimensional structured tasks and in high-dimensional, internally driven, ethologically relevant behaviors,” Datta says. “It says that we can actually have a useful conversation across all of these methods, and I think that is great news.”