The Journey of RL, Part 6: The Brain in the Loop
Reinforcement learning and neuroscience have been reading each other for forty years. The conversation reveals that reward in biology was never the simple scalar RL had imported.
In the early 1990s, in a laboratory at the University of Fribourg in Switzerland, Wolfram Schultz was recording from single neurons in the brain of a monkey, and the neurons were doing the wrong thing. Schultz was a physiologist with a medical background, and he had come to dopamine for a practical reason. Dopamine-producing cells die in Parkinson’s disease, and Schultz wanted to understand the movement disorders that follow. So he trained monkeys to reach for food, lowered fine electrodes into the dopamine neurons that he expected would fire as the animal moved, and watched.
The neurons did not fire when the monkey moved. They fired when the food appeared. This was a nuisance at first, a reward response cluttering the motor signal he was hunting for, but it was too consistent to ignore, so Schultz kept watching it. As a monkey learned that a particular cue, a light or the click of a box opening, predicted food, something stranger happened. The dopamine burst that had come at the moment of the food began to creep earlier, until it arrived not at the food but at the cue that predicted it. And when a monkey had learned to expect food and the food then failed to arrive, the dopamine neurons did not simply stay at their usual level. At the precise moment the food was due, their firing dropped below its resting rate, as if registering the absence of something owed. Schultz had a phenomenon he could describe in exact detail but could not explain. As he later put it, he did not yet have the right concept for what the neurons were doing.
The concept existed. It had been worked out a decade earlier, in a different field, by people who had never recorded from a neuron in their lives. Part 6 is about the forty-year conversation between reinforcement learning and the brain, what each told the other, and what the conversation finally revealed about the thing both had taken as their starting point: reward itself.



