Rumor has it that the great pianist Arthur Rubinstein was taking a walk in New York City one day when a pedestrian approached him and asked, “How do I get to Carnegie Hall?” Rubinstein pondered this question for a moment and replied, “Practice!” Indeed, many of our motor skills, whether playing tennis or playing the piano, are not innately programmed but acquired through a process of trial and error.
Much of our understanding of the neural mechanisms underlying trial-and-error learning comes from the study of animals learning simple tasks through primary rewards such as food or juice. A rat in a box with a lever will learn, by trial and error, to press the lever for a food pellet. Classic experiments in the 1950s by James Olds and Peter Milner of McGill University, in Canada, showed that we can remove food from the equation if we hook up the lever to an electrode that stimulates dopamine-releasing neurons in the rat’s brain. The rat will then readily learn to press the lever — not for food, but for a brief pulse of dopamine. Rats can get addicted to such tasks, compulsively pressing the lever to the point of exhaustion and even death. Drugs of addiction often act by hijacking the dopamine system, but what information do dopamine neurons encode in the brain?
To answer this question, Wolfram Schultz of the University of Fribourg, in Switzerland, and his colleagues studied monkeys. They recorded the activity of dopamine neurons that project to the basal ganglia while a monkey was being trained to associate a flash of light with a juice reward delivered one second after the light turned on. Initially, before the monkey had learned the association, the neurons increased their firing whenever the monkey got a drop of juice. After learning, however, the neurons fired more when the light appeared than when the juice was delivered. And if the scientists withheld the expected reward after the light, the neurons stopped firing at the moment the juice should have arrived. Thus, these dopamine neurons fired in response to unexpected rewards or reward-predicting cues, and fell silent when an expected reward failed to arrive. In other words, these neurons encode ‘reward prediction error,’ the difference between actual and predicted rewards. (For more on reward prediction error, see ‘Dopamine Cells Influence Our Perception of Time.’)
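The arithmetic behind reward prediction error can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used in the monkey experiments: the function names, the learning rate of 0.2 and the simple trial-by-trial update rule are all hypothetical choices. The sketch also captures only the response at the time of the juice; it does not model the shift of firing to the light.

```python
def reward_prediction_error(actual, predicted):
    """Difference between the reward received and the reward expected."""
    return actual - predicted


def train(rewards, learning_rate=0.2):
    """Update the predicted reward after each trial (assumed simple rule)."""
    predicted = 0.0  # before learning, no juice is expected
    errors = []
    for actual in rewards:
        error = reward_prediction_error(actual, predicted)
        errors.append(error)
        # Nudge the prediction toward what actually happened.
        predicted += learning_rate * error
    return predicted, errors


# Thirty trials in which the light is always followed by one drop of juice.
predicted, errors = train([1.0] * 30)

# The same well-trained animal, but the juice is withheld on the next trial.
omission_error = reward_prediction_error(0.0, predicted)

print(f"error on first rewarded trial: {errors[0]:.3f}")
print(f"error on last rewarded trial:  {errors[-1]:.3f}")
print(f"error when juice is withheld:  {omission_error:.3f}")
```

In this toy model the error is large and positive when the juice is a surprise, shrinks toward zero as the juice becomes expected, and turns negative when an expected drop of juice fails to arrive, which parallels the burst, the fading response and the pause described above.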
These increases and pauses in dopamine neuron firing are thought to positively and negatively reinforce preceding motor actions, leading to learning. When a rat happens to press a lever and get an unexpected food pellet, the resulting pulse of dopamine encourages the rat to press the lever again.
Many of our motor skills, however, are not learned for immediate food or juice rewards but by matching ongoing performance to internal goals. Imagine that you are practicing a song on the piano and you want to play C sharp as the third note. If you happen to play F sharp instead, your brain will immediately recognize this as a mistake. There is nothing intrinsically good or bad about the sound of C sharp; all that matters is whether or not you wanted to hear it as the third note. How does the brain evaluate such internally guided motor sequences?
To try to understand how the brain evaluates performance, we study the zebra finch, a songbird that learns to sing a highly stereotyped motor sequence — a song with two to seven syllables — by trial and error. Juvenile zebra finches listen to their tutor’s song (often the father’s), store this song as a memory trace, or template, and spend the next two or three months learning to imitate it. Juvenile finches need to hear themselves while learning the song, indicating the importance of auditory feedback.
The songbird brain contains a discrete set of interconnected brain regions, called the ‘song system,’ that is dedicated to song learning. Like mammals, songbirds have dopamine neurons in a midbrain nucleus, the ventral tegmental area (VTA). Some of these neurons (VTAx neurons) project to a song system nucleus called Area X, which resembles the mammalian basal ganglia. Do these VTAx neurons convey an error signal for song learning, even though song is internally evaluated? To test this idea, we recorded the activity of VTAx neurons in a singing bird. We used real-time song analysis software and auditory feedback to distort a specific song syllable 50 percent of the time, tricking the bird into thinking it sometimes sang the wrong syllable.
We found that VTAx neurons suppressed firing after distorted syllables, signaling that the bird had performed worse than it expected to. Remarkably, the same neurons increased firing at the precise moment in the song when a distortion might have occurred but did not, signaling a better-than-predicted performance. These precisely timed activations suggest that birds predict how well they will sing based on recent practice. If this interpretation is correct, then the strength of dopamine-neuron activity should depend on the probability of distortion. In other words, if a specific syllable is likely to be distorted, an undistorted rendition of that syllable is more surprising, and the dopamine neurons should be more active. On the other hand, if the probability of distortion is low, an undistorted syllable is less surprising, and the dopamine neurons should be less active.
To test this hypothesis, we performed a two-target experiment in which we distorted two different syllables of the song, the first syllable 50 percent of the time and the second syllable only 20 percent of the time. As expected, we found that dopamine neurons were more active in response to the more surprising undistorted first syllable than to the less surprising undistorted second syllable. We published the results in Science in December 2016.
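The logic of the two-target experiment can be made concrete with a small Python sketch. It is a toy model, not the analysis from the published study: it assumes that an undistorted syllable counts as a performance quality of 1, that a distorted one counts as 0, and that the bird's predicted quality is simply the chance that the syllable will not be distorted. The function name and this scoring scheme are hypothetical.

```python
def performance_prediction_error(distorted, p_distortion):
    """Actual minus predicted performance quality for one syllable."""
    actual = 0.0 if distorted else 1.0   # assumed quality scores
    predicted = 1.0 - p_distortion       # assumed expectation from recent practice
    return actual - predicted


# First target syllable: distorted 50 percent of the time.
first_undistorted = performance_prediction_error(False, 0.5)

# Second target syllable: distorted only 20 percent of the time.
second_undistorted = performance_prediction_error(False, 0.2)

print(f"undistorted first syllable:  {first_undistorted:+.2f}")
print(f"undistorted second syllable: {second_undistorted:+.2f}")
```

Under these assumptions, escaping distortion on the first syllable yields a larger positive error than escaping it on the second, which matches the direction of the difference in dopamine activity reported above.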
These findings suggest that dopamine neurons convey an error signal for song learning and maintenance. In addition, they show that birds compare their songs not only to the template, their ultimate goal, but also to a predicted performance quality, which is updated based on recent practice. More broadly, these results suggest that the principle of dopaminergic reward prediction error, which has been so extensively investigated in neuroscience, might generalize to the learning of complex motor sequences that are acquired by trial-and-error matching of ongoing performance to internal goals rather than in the pursuit of food or juice rewards. When you are practicing your song on the piano and happen to hit just the right note, perhaps a brief burst of dopamine in the basal ganglia encourages you to hit that same note again the next time. In this context, these signals might be thought of as encoding ‘performance prediction error.’ Maybe the way to Carnegie Hall is indeed practice, or, in other words, an addiction to the little bursts of dopamine that tell you, “You got it right!”