Robonaissance

The RL Spiral, Part 5: The Self-Play Paradox

Self-play produced the strongest Go player in history. Human feedback produced ChatGPT. RL’s two greatest successes pull in opposite directions.

Hugo
Mar 25, 2026

This is the fifth article in The RL Spiral, an eight-part series on reinforcement learning. The previous article, When RL Learned to See, traced how deep learning gave RL the ability to build its own representations. This one is about what happened when RL stopped needing human data entirely, and why it then needed humans more than ever.
