The RL Spiral

The RL Spiral, Part 8: The Open Questions

Transfer, adaptation, curiosity, and the nature of reward itself. Four open problems define RL’s frontier. Evolution has been working on them for 500…

Apr 7 • Hugo

The RL Spiral, Part 7: RL Meets the Physical World

RL can master any game in days. Teaching a robot to fold a towel is harder than teaching it to play Go. The physical world is where RL’s deepest…

Apr 1 • Hugo

The RL Spiral, Part 6: The World Inside

DQN gave RL eyes. AlphaGo gave it ambition. Neither gave it imagination. The most important research direction in RL today is fixing that.

Mar 30 • Hugo

The RL Spiral, Part 5: The Self-Play Paradox

Self-play produced the strongest Go player in history. Human feedback produced ChatGPT. RL’s two greatest successes pull in opposite directions.

Mar 25 • Hugo

The RL Spiral, Part 4: When RL Learned to See

For decades, every RL system needed a human to tell it what to look at. Then one opened its own eyes. The headlines went to the neural network. The less…

Mar 23 • Hugo

The RL Spiral, Part 3: The Curse Bellman Couldn’t Break

RL can master any game in days. A toddler still learns faster. The reason is seventy years old, and nobody has fixed it.

Mar 19 • Hugo

The RL Spiral, Part 2: The Equation That Explains Your Brain

Every advanced AI runs on an equation. Your brain has been running it for 500 million years. A neuroscientist proved it by accident. That accident…

Mar 17 • Hugo

The RL Spiral, Part 1: The Reward Trap

You trained ChatGPT to lie to you. You did not mean to. Neither did the engineers. Here is how it happened, and why your brain did it first.

Mar 9 • Hugo
