<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Robonaissance]]></title><description><![CDATA[A new renaissance in AI and robotics. Navigating the intelligence revolution.]]></description><link>https://www.robonaissance.com</link><image><url>https://substackcdn.com/image/fetch/$s_!2xRC!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a1249c0-fc2c-402e-9188-80f69d468eb6_1024x1024.png</url><title>Robonaissance</title><link>https://www.robonaissance.com</link></image><generator>Substack</generator><lastBuildDate>Sun, 14 Jun 2026 11:58:40 GMT</lastBuildDate><atom:link href="https://www.robonaissance.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Robonaissance]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[robonaissance@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[robonaissance@substack.com]]></itunes:email><itunes:name><![CDATA[Hugo]]></itunes:name></itunes:owner><itunes:author><![CDATA[Hugo]]></itunes:author><googleplay:owner><![CDATA[robonaissance@substack.com]]></googleplay:owner><googleplay:email><![CDATA[robonaissance@substack.com]]></googleplay:email><googleplay:author><![CDATA[Hugo]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Journey of RL, Part 2: The Value Hypothesis]]></title><description><![CDATA[Q-Learning and Its Discontents. 
The compression that made an entire field possible, and the trap it concealed.]]></description><link>https://www.robonaissance.com/p/the-journey-of-rl-part-2-the-value</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-journey-of-rl-part-2-the-value</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Tue, 09 Jun 2026 16:06:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!o43r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o43r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o43r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!o43r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!o43r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!o43r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!o43r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2456541,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/201318415?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o43r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!o43r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!o43r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!o43r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bf21ab9-ec68-49fb-bf90-99120610c732_1672x941.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In May 1989, a graduate student at King&#8217;s College, Cambridge submitted a PhD thesis with the title <em>Learning from Delayed Rewards</em>. The thesis was 241 pages. Its central algorithm had a one-letter name. The letter was Q.</p><p>The author was Christopher Watkins, who had been employed as an industrial researcher in southern England for the past four years while completing his thesis. The external examiner on his committee, Andrew Barto, was visiting King&#8217;s that spring on sabbatical from the University of Massachusetts. 
The submission date was forced by the date Barto would fly back to the United States. Watkins&#8217;s own account of the timing uses the phrase &#8220;by sheer good luck.&#8221;</p><p>Q-learning did not appear because the field demanded it. It appeared because one mid-career researcher in southern England, who had failed at his first PhD project, who had spent four years writing expert systems at an industrial lab, who had picked up a 1962 textbook in a company library on a hunch, finally saw what the framework he was reading was actually for.</p><p>Part 2 begins where Q-learning was born, in a King&#8217;s College office in the spring of 1989.</p><div><hr></div><h2>The Letter Was Q</h2><p>Watkins arrived at King&#8217;s College in 1982 as a graduate student. His project was to formalize Jean Piaget&#8217;s theory of how intelligence develops in children, with the eventual aim of writing a computer program that would learn the way children do. By his own later assessment, he was &#8220;a thoroughly unsuccessful graduate student.&#8221; He left without finishing in 1985 and took a job in the AI group at Philips Research Labs in Surrey, England, where he worked on expert systems and decision trees. This was the dominant AI paradigm of the period. The expert systems approach tried to capture human knowledge as explicit rules; the decision tree approach tried to organize those rules into branching decisions. Neither approach asked how the knowledge or the rules were acquired. They started from a body of expertise and tried to encode it.</p><p>Watkins kept his interest in the original question. He still hoped to finish a PhD on the genesis of intelligence. The expert systems work was paying the bills.</p><p>In 1987 Philips sent him to the Fourth International Workshop on Machine Learning at the University of California, Irvine. He had no paper to present and no specific agenda. 
During one of the talks he was bored, and he interrupted the speaker to ask a question he later remembered as &#8220;totally irrelevant.&#8221; The question was whether anyone in the room had done any work on animal learning. The speaker said, without pausing, that all that had been done, and continued his presentation.</p><p>Watkins later realized this was wrong in both directions. Animal learning had not all been done. And he had not seen any work in machine learning that built on it. He had given up on children because the problem was too hard. Animals might be tractable.</p><p>In the coffee break that followed the session, Richard Sutton came over and introduced himself. Sutton had liked the question. He gave Watkins reprints of several of his recent papers, including a 1983 paper, co-authored with Andrew Barto and Charles Anderson, on a neuronlike system that learned to balance a pole on a cart. Watkins read the pole-balancing paper on the flight back to London, and then again, and then again. He found it highly original, unlike any other work he had seen. It described a small system that learned a difficult control task through direct interaction, getting almost no information about whether any particular action was right, and yet improving. The algorithm seemed to work without obvious theoretical foundation. Watkins felt that there had to be a framework that explained why.</p><p>He went to the Philips company library to look for it. The library was small and out of date, which turned out to be useful. The book he found was <em>Applied Dynamic Programming</em>, by Richard Bellman and Stuart Dreyfus, published in 1962. Inside the book was the framework Watkins had been looking for.</p><p>Bellman had joined the RAND Corporation in the summer of 1949 and begun working on multistage decision processes. 
The name &#8220;dynamic programming,&#8221; which he settled on in the fall of 1950, was, according to the account he later gave in his 1984 autobiography, partly a political maneuver. RAND was funded through the Air Force, and Bellman would write that the Secretary of Defense at the time had a documented hostility to the word &#8220;research&#8221; and an even sharper hostility to the word &#8220;mathematical.&#8221; A name was needed that conveyed something serious but did not signal what was actually being done. &#8220;Dynamic&#8221; suggested time-varying processes; &#8220;programming&#8221; referred to the planning of multistage operations in the sense of military logistics. The chronology of Bellman&#8217;s later account does not quite fit the public record, since his first paper using the term predates the appointment of the Secretary he later blamed. But the broader point survives: the field&#8217;s founder did not believe the work could be safely called mathematical research, and the name he chose reflected that belief. By 1957, the year Bellman published the book <em>Dynamic Programming</em> with Princeton University Press, the field had been formalized as the theory of Markov decision processes. The recursive equation at its center, which would later carry his name, said that the value of being in a state was the immediate reward plus the discounted value of the next state, optimized over the available actions.</p><p>The 1962 book that Watkins picked up was the applied sequel. It assumed that the structure of the decision process, every transition probability and every reward, was fully known in advance and that the value function could be computed exactly. What Watkins saw, when he closed the book, was the gap. Bellman had given a framework for optimal sequential decisions, but only in the case where the decision-maker already had a complete map. An animal exploring a world does not have a complete map. 
An animal observes states, chooses actions, receives rewards, finds itself in new states, and starts again.</p><p>The question Watkins asked was whether such an animal could learn an optimal policy from experience alone. He worked out, over the next two years, a family of algorithms that could. The simplest of them updated a function he called Q, which assigned to each state-action pair an estimate of the long-run reward of taking that action from that state. The update rule was a single line:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q(s,a) \\leftarrow Q(s,a) + \\alpha [r + \\gamma \\max_{a'} Q(s', a') - Q(s, a)]&quot;,&quot;id&quot;:&quot;XLXEPCOHPJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>The bracketed term was a temporal-difference error in the sense Sutton had introduced in his 1988 paper. The maximum over next-state actions was new. It made the algorithm capable of learning the optimal policy without ever following it during exploration. This property, which would later be called off-policy learning, would become central to the field and, eventually, to its difficulties.</p><p>In the spring of 1989, by sheer good luck, Andrew Barto arrived at King&#8217;s College on sabbatical. Watkins went to see him with an early draft of his thesis. Barto read it and made suggestions; a copy was passed to Sutton, who made more suggestions. Barto agreed to serve as the external examiner on Watkins&#8217;s committee, which meant the thesis had to be submitted before he left for the United States. Watkins finished in May. The thesis sketched a proof of convergence for Q-learning but did not prove convergence with probability one. That gap would be closed three years later. In 1992, Peter Dayan, then a graduate student at the Centre for Cognitive Science in Edinburgh, wrote to Watkins to point out that the proof was incomplete. He offered to help fix it. 
The result was a short joint paper in <em>Machine Learning</em>, titled &#8220;Technical Note: Q-Learning,&#8221; that proved what the thesis had only sketched.</p><p>Q-learning entered the field through a side door. Watkins&#8217;s thesis was not published as a book. For several years, American researchers who wanted to read it requested photocopies from Cambridge, which Barto and Sutton mailed out from Amherst. The algorithm that would, two decades later, anchor an entire wave of deep reinforcement learning research entered circulation as a stack of photocopied pages crossing the Atlantic. The algorithm had a one-letter name. The framework it implied would shape what reinforcement learning meant for the next thirty years.</p><div><hr></div><h2>Value as Compression</h2><p>The Bellman equation in its simplest form said that the value of being in a state is the immediate reward plus the discounted value of the next state, optimized over the available actions:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V(s) = \\max_a \\left[ r(s,a) + \\gamma V(s') \\right]&quot;,&quot;id&quot;:&quot;TILRRSRTMB&quot;}" data-component-name="LatexBlockToDOM"></div><p>The equation is recursive. The value of any state is defined in terms of the value of states it leads to. The recursion bottoms out at terminal states or is kept finite by the discount factor $\gamma$, which is strictly less than one and ensures that rewards far in the future contribute less than rewards close at hand. In the more general stochastic case, where transitions are not deterministic, the equation is the same with an expectation over next states. The equation does not solve any specific problem. It only specifies what a solution would have to look like.</p><p>What the equation does, considered as an operation on data, is compression. 
The whole space of all possible futures from any given state, every trajectory the agent might take, every reward it might collect along the way, is folded into a single number $V(s)$. That number is sufficient, under the Markov assumption, to choose the optimal next action. The agent does not need to remember how it got to the current state, nor does it need to enumerate the futures available from it. It needs the scalar.</p><p>This is the second compression in the lineage Part 1 traced. The first was reward itself, the collapse of &#8220;goals and purposes&#8221; into a single time-indexed scalar quantity. The second was value, the collapse of all reachable futures from a given state into another scalar. The value hypothesis, in the sense Watkins&#8217;s thesis introduced it, was the claim that this second compression was tractable. An agent could learn the value function from experience alone, without being given a model of the world, by repeatedly updating its estimate using the temporal-difference error.</p><p>The Markov assumption is what makes the compression possible. If the current state contains all the information relevant to predicting the future, then the value function written as $V(s)$ is a sufficient statistic for everything the agent needs to know. Any feature of the agent&#8217;s history not summarized in the current state is, by assumption, irrelevant. This is what reinforcement learning shares with thermodynamics and with information theory: the move from a high-dimensional trajectory to a low-dimensional summary is licensed by an assumption about what can be safely forgotten.</p><p>When the Markov assumption holds, Bellman&#8217;s equation has another useful property. It is a contraction mapping in the space of value functions, which guarantees that iterated application converges to a unique fixed point: the optimal value function $V^*$. 
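</p><p>The contraction claim can be checked numerically. What follows is a minimal sketch, not anything taken from Bellman or Watkins: the three-state, two-action deterministic problem, its rewards, and the names <code>bellman_backup</code>, <code>sup_distance</code>, and <code>value_iteration</code> are all invented for illustration. It applies the deterministic form of the equation above once to two different guesses and measures how far apart they remain, then iterates from both guesses.</p>

```python
# Hypothetical 3-state, 2-action deterministic problem (not from the article).
GAMMA = 0.9  # discount factor, strictly less than one

# NEXT_STATE[s][a] and REWARD[s][a]; state 2 is absorbing and pays nothing.
NEXT_STATE = [[1, 2], [0, 2], [2, 2]]
REWARD = [[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]


def bellman_backup(values):
    """Apply V(s) = max_a [ r(s,a) + gamma * V(s') ] once to every state."""
    return [
        max(REWARD[s][a] + GAMMA * values[NEXT_STATE[s][a]] for a in range(2))
        for s in range(3)
    ]


def sup_distance(u, v):
    """Largest absolute difference between two value estimates."""
    return max(abs(x - y) for x, y in zip(u, v))


def value_iteration(values, sweeps):
    """Apply the backup repeatedly, starting from the given guess."""
    for _ in range(sweeps):
        values = bellman_backup(values)
    return values


# Two arbitrary starting guesses for the value function.
u = [0.0, 0.0, 0.0]
v = [10.0, -5.0, 3.0]

gap_before = sup_distance(u, v)
gap_after = sup_distance(bellman_backup(u), bellman_backup(v))

fixed_from_u = value_iteration(u, 500)
fixed_from_v = value_iteration(v, 500)

print("gap before one backup:", gap_before)
print("gap after one backup: ", gap_after)
print("values reached from u:", fixed_from_u)
print("values reached from v:", fixed_from_v)
```

<p>Under these assumptions, one backup should shrink the largest gap between the two guesses by at least the factor $\gamma$, and repeated backups from either guess should settle on the same values.</p><p>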
Q-learning&#8217;s convergence proof, sketched by Watkins in 1989 and completed by Watkins and Dayan in 1992, rests on this property. Each update moves the estimated $Q$ closer to its target. The math works because the assumption holds.</p><p>Compression of this kind is a recurring engine across machine learning. The point here is narrower. Watkins&#8217;s algorithm, in its 1989 form, depended on two assumptions doing all the load-bearing work: that reward was a well-defined scalar signal from the environment, and that the current state contained all the information needed to predict the future. When both assumptions held, the algorithm worked. The first two decades of reinforcement learning, from the late 1980s to roughly 2010, were spent demonstrating that the algorithm worked in cases where both assumptions held. The next two decades, which Parts 7 through 12 of this series will trace, are about what happens when one of them does not.</p><div><hr></div><h2>The Discontents</h2><p>In the cases the early Q-learning literature considered, the agent&#8217;s state space was small enough to enumerate. A gridworld of a hundred squares, a small Markov decision problem with a few dozen states, a benchmark control task with a discretized angle and velocity: in each of these, the value function could be stored as a table with one entry per state-action pair. Update one entry at a time, sample enough trajectories, and the table converges to the optimal $Q^*$. Watkins and Dayan&#8217;s 1992 proof guaranteed it.</p><p>The trouble started when the state space grew. Bellman himself had named the problem in 1957, in the same book that formalized dynamic programming: the curse of dimensionality. As the number of state variables increases, the number of distinct states grows exponentially. The number of possible games of chess exceeds the number of atoms in the observable universe. A robot arm with even a few continuous joints has, in any practical sense, infinitely many states. 
Storing one $Q$ value per state-action pair was not just inefficient. It was not possible.</p><p>The standard response was function approximation. Instead of a table, the agent would learn a parametric function, with a manageable number of parameters, that took a state as input and returned an estimate of $Q$. Linear function approximation, where the value was a weighted sum of hand-engineered features of the state, was the obvious starting point and was studied throughout the late 1980s and into the 1990s. Sutton&#8217;s 1988 paper on temporal-difference learning had already shown that TD methods could be combined with linear approximators in the on-policy case. The convergence guarantees from the tabular case mostly survived.</p><p>The most famous early success of nonlinear function approximation was TD-Gammon, a backgammon program developed by Gerald Tesauro at the IBM Watson Research Center between 1990 and 1998. TD-Gammon trained a multilayer perceptron with eighty hidden units using TD($\lambda$) and self-play, starting from random initial play and playing millions of games against itself. By 1992 it was strong enough to be invited to the World Cup of Backgammon, where in thirty-eight exhibition games against top human players it had a net loss of seven points. By the mid-1990s it was playing at world championship level. The program had been built without expert hand-engineered features and without supervised training on recorded games. The neural network had learned what to value from reward alone.</p><p>TD-Gammon was treated, for a long time, as a proof of concept that would surely generalize. It mostly did not. Backgammon has properties that flatter the algorithm. The dice introduce stochasticity that smooths the value landscape and naturally encourages exploration. The game tree is shallow enough that bootstrapping does not accumulate too much error. The state space is large but factored in ways that the network&#8217;s hidden units can capture. 
When researchers tried to repeat the trick in domains without these properties, the algorithms tended to diverge, oscillate, or get stuck. Through the 1990s and 2000s, a recurring observation in the field was that the combination of neural networks, temporal-difference learning, and the off-policy updates Q-learning required was unstable. The exception that TD-Gammon represented did not become the rule researchers had hoped to see.</p><p>In 1995, Leemon Baird, then an Air Force officer doing research at the Avionics Directorate of Wright Laboratory at Wright-Patterson Air Force Base in Ohio, presented a paper at the International Conference on Machine Learning that gave the unstable combination its first formal indictment. Baird&#8217;s research group at Wright Lab was led by A. Harry Klopf, the same Harry Klopf whose 1972 report had pulled Sutton into reinforcement learning in the first place. Twenty years on, Klopf&#8217;s group produced the first formal evidence of where the field&#8217;s central algorithm broke. Baird constructed a small example, a Markov decision process with seven states and two actions, in which Q-learning combined with linear function approximation under off-policy sampling provably diverged. The value estimates did not just fail to find the optimum. They grew without bound. The example Baird chose has since been known in the literature as Baird&#8217;s counterexample, and it is reproduced in essentially every modern textbook treatment of the topic. What it established was not new in spirit. Researchers had known since the late 1980s that the combination was unstable. What Baird established was that the instability was not an artifact of bad implementation or unfortunate hyperparameters. It was a property of the setting.</p><p>The three ingredients Baird identified as jointly responsible were function approximation, bootstrapping, and off-policy learning. Function approximation is what generalizes the value estimate across states. 
Bootstrapping is what updates the value of one state using the estimated value of another, rather than waiting for actual returns. Off-policy learning is what lets the agent learn about an optimal policy while following a different one for exploration. Each of the three is useful on its own. Any two of them have been combined successfully in many algorithms. All three together can cause the parameters of the value estimate to escape to infinity.</p><p>The name &#8220;deadly triad,&#8221; which the field would later use for this combination, does not appear in Baird&#8217;s 1995 paper. It would not become a fixed term in the literature until Sutton and Barto introduced it in the second edition of their textbook in 2018, after deep reinforcement learning had succeeded despite combining all three. From 1995 to about 2010, however, the underlying observation, that the combination was dangerous in ways that mattered, shaped the field&#8217;s choice of methods. The dominant approach was caution: stay tabular where the problem allows, use linear function approximation when generalization is needed, avoid neural networks for the value function, and approach off-policy learning with care, treating on-policy alternatives as the safer default.</p><p>Through this long period, Q-learning continued to work in the cases it had been designed for. It just did not scale. The algorithm that Watkins had introduced as a way for an agent to learn what to optimize from experience alone, in the most general setting reinforcement learning could imagine, was confined for two decades to settings where the state space could be enumerated or approximated by a handful of linear features. The value hypothesis held in those settings. 
Outside them, it was not so much that the value hypothesis failed as that the algorithms for testing it broke before the question could be properly posed.</p><div><hr></div><h2>DQN and Its Triumph</h2><p>The apparatus was repaired by an unexpected route. The 2010s opened with a series of demonstrations that deep convolutional neural networks, trained on large labeled datasets, could solve image classification problems that the field had considered for decades to be far away. The most cited of these was the 2012 ImageNet paper by Krizhevsky, Sutskever, and Hinton, in which a network with sixty million parameters trained on 1.2 million labeled images dropped the top-five error rate on the standard benchmark from twenty-six percent to fifteen. The success was not an incremental improvement on a research curve. It was a step change.</p><p>The deep learning revolution was happening in supervised learning, not in reinforcement learning, and there was no obvious reason it should transfer. Q-learning was unstable with neural networks even before the deep learning era. Larger neural networks, naively combined with Q-learning, would on most expectations have made the instability worse. What changed the calculation was a small group of researchers at DeepMind in London who decided to try.</p><p>In December 2013, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller posted a paper to arXiv with the title &#8220;Playing Atari with Deep Reinforcement Learning.&#8221; It described a system that took as input the raw pixels of an Atari 2600 game emulator and produced as output a policy for playing the game. The system used a convolutional neural network to estimate the Q function. It was trained by Q-learning. It worked on seven different Atari games. On six of them it outperformed prior reinforcement learning methods. 
On three of them it surpassed an expert human player.</p><p>The 2013 paper was presented at the NeurIPS deep learning workshop and read by a small number of researchers. The result that crystallized the field&#8217;s attention was the longer follow-up, published in <em>Nature</em> in February 2015 under the title &#8220;Human-level control through deep reinforcement learning,&#8221; with a larger team and a much-expanded result. The 2015 system was tested on forty-nine Atari games, outperformed the best prior reinforcement learning methods on forty-three of them, and reached a score comparable to a professional human game tester across the full set. The <em>Breakout</em> result became the iconic example: the agent had learned to dig a tunnel along the side of the brick wall, sending the ball behind it to destroy blocks more efficiently, a tactic the developers had not anticipated and had not explicitly rewarded.</p><p>The 2015 paper added one technical innovation beyond the 2013 system that turned out to matter more than it appeared. Alongside the main Q-network being trained, the algorithm maintained a separate target network whose weights were copied from the main network only periodically, every ten thousand steps in the canonical version. The target used in the temporal-difference update was computed against this slower-moving network, not against the network being currently updated. The change weakened the tight feedback loop between the Q-network&#8217;s current estimates and the targets it was trying to match, a loop created by bootstrapping, one of the three ingredients Baird had identified as jointly dangerous. The combination of function approximation, bootstrapping, and off-policy learning was still present. The target network did not eliminate any of the three. It just kept them from amplifying one another&#8217;s errors as quickly.</p><p>What DQN had shown was not that the deadly triad had been solved. It was that the deadly triad could be engineered around. 
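</p><p>The target-network mechanism described above is small enough to sketch. This is an illustration under loud assumptions, not the DQN implementation: a four-entry table stands in for the convolutional network, the two-state environment and the names <code>step</code>, <code>train</code>, and <code>SYNC_EVERY</code> are hypothetical, and the sync period is fifty steps rather than ten thousand. The only point it preserves is that the bootstrap target is read from a frozen copy that is refreshed periodically.</p>

```python
import copy
import random

GAMMA = 0.9      # discount factor
ALPHA = 0.1      # step size
SYNC_EVERY = 50  # hypothetical sync period; the article cites 10,000 steps for DQN


def step(state, action):
    """Hypothetical 2-state, 2-action environment (an assumption for this sketch).

    Action 1 in state 0 pays 1.0 and moves to state 1; every other move
    pays 0.0 and leads to state 0.
    """
    if state == 0 and action == 1:
        return 1.0, 1
    return 0.0, 0


def train(num_steps, seed=0):
    rng = random.Random(seed)
    online = [[0.0, 0.0], [0.0, 0.0]]  # stands in for the Q-network being trained
    target = copy.deepcopy(online)     # slower-moving copy used to build targets
    syncs = 0
    state = 0
    for t in range(1, num_steps + 1):
        # Uniform random behaviour policy, so the learning is off-policy.
        action = rng.randrange(2)
        reward, next_state = step(state, action)
        # The bootstrap target reads the frozen copy, not the table being updated.
        td_target = reward + GAMMA * max(target[next_state])
        online[state][action] += ALPHA * (td_target - online[state][action])
        # Refresh the frozen copy only every SYNC_EVERY steps.
        if t % SYNC_EVERY == 0:
            target = copy.deepcopy(online)
            syncs += 1
        state = next_state
    return online, target, syncs


online_q, target_q, sync_count = train(20000)
print("online estimates:", online_q)
print("number of syncs: ", sync_count)
```

<p>Because the copy changes only at sync points, each stretch of updates chases a fixed target, which is the sense in which the feedback loop is slowed rather than removed.</p><p>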
Experience replay, the practice of storing past transitions in a buffer and training on randomly sampled batches rather than on the live stream, broke the temporal correlations that gradient methods on neural networks tolerated poorly. The target network broke the self-referential update. Reward clipping bounded the magnitude of the TD error. Each of these was a heuristic. Together they constituted a recipe by which the same combination of ingredients that diverged on Baird&#8217;s seven-state counterexample could converge on a multi-million-parameter neural network trained against pixel input from forty-nine Atari games.</p><p>The substrate that made all of this possible was the deep network. Q-learning provided the algorithm; deep convolutional networks provided the function class. The convolutional architecture handled the spatial structure of pixel input, where similar shapes recurred at different positions and could share filters. The agent&#8217;s value estimate was no longer a table or a linear sum of features. It was a learned hierarchy of representations, computed from raw input. The compression that Bellman had identified, the folding of all future returns into a scalar attached to each state, was now being performed by a network whose internal structure was opaque even to its trainers. The architecture that Rosenblatt had defended at the cost of his career, and that Minsky and Papert had argued in 1969 could not learn certain functions, turned out to be the substrate that made Q-learning scale.</p><p>By 2015, Q-learning had moved from a one-line update rule in an unpublished Cambridge thesis to the engine of a system that could play forty-nine arcade games at human level from pixel input. The value hypothesis had been vindicated, in the only domain where it could be tested at scale. 
What happened next would depend on whether the domain extended further than Atari.</p><div><hr></div><h2>What Compression Cost</h2><p>The value hypothesis as Q-learning had implemented it rested, in the end, on two compressions doing all the work. The first was reward as scalar. The second was value as the recursive compression of all future scalar reward into a single number per state. Both compressions had been borrowed, the first from behaviorist psychology and the second from operations research. Both had been imported into the architecture of an agent and treated as the agent&#8217;s foundation. Both had worked.</p><p>What DQN had demonstrated was that the value hypothesis could be deployed at scale. What it had not demonstrated, despite the temptation to read the <em>Nature</em> paper that way, was that the assumptions licensing the compression were now safe to ignore. Atari is a particular kind of environment. The reward signal is the score, which is a number printed on the screen by the game itself. The objective is to maximize that number. The reward function is not learned. It is not contested. There is no ambiguity about what the agent is being asked to do.</p><p>The forty-nine Atari games shared this property. They were also Markov, or close enough to it that the assumption was harmless. The agent&#8217;s pixel input contained the relevant game state. The next state was a function of the current state and the agent&#8217;s action, not of the agent&#8217;s history. The Markov property held by construction, because the emulator constructed it. They were finite-horizon, or close enough. They had reward signals that were well-defined functions of game state, returned by the emulator without negotiation. They were the cleanest possible setting for the value hypothesis to be tested.</p><p>When reinforcement learning moved off Atari and into settings where the cleanness disappeared, the compression that DQN had made deployable started to show its cost. 
Two things changed. The Markov assumption that had held by construction in Atari no longer held in domains where agent decisions had long-range dependencies, where the world&#8217;s state was only partially observable, or where the relevant context lived in the agent&#8217;s history rather than in any single state vector. And the reward signal that had been printed on the screen by the emulator was no longer printed on the screen by anything. Language models had to be trained against signals that were proxies for human preferences. Robotic systems had to be trained against rewards that designers shaped by hand, often badly. Recommender systems had to learn what users wanted from clicks that did not reliably reveal it. In each of these, both halves of the assumption that had licensed the compression broke at once.</p><p>Q-learning has no way to tell whether the reward signal is real or constructed. It treats whatever signal arrives as the true objective. The value function the network learns compresses futures with respect to that signal. If the signal is right, the compression is right. If the signal is approximately right at training time but diverges from what was actually wanted at test time, the compression has nothing to fall back on. The value function does not know that its scalar input is wrong. It cannot. The information that would distinguish the right scalar from the approximate one was, by the definition of the architecture, the input to the system, not its concern.</p><p>Watkins, looking back on this lineage in a later note on the history of the field, wrote that the assumption that learning agents inhabit a Markov decision process and that learning consists of finding an optimal policy &#8220;has been dominant in reinforcement learning research since, and perhaps these basic assumptions have not been sufficiently examined.&#8221; Watkins&#8217;s own concern was with the first half of that pair: the Markov decision process as a model of the world. 
The crises this series will trace in Parts 7 through 12 extend the concern to the second half: optimization itself, when the thing being optimized is not given by the world but has to be constructed. The remark is more striking for who is making it than for what it says. The author of Q-learning, thirty years after submitting his thesis, was flagging that the framework he had helped establish might not have been the right one for the field&#8217;s later questions. The compression had been a tool. It had also been a commitment, in both halves at once. The field had built thirty years of progress on the commitment, and the parts that were about to crack were the parts where the commitment did the most load-bearing work.</p><div><hr></div><p>Q-learning&#8217;s monopoly over the value side of reinforcement learning was, even in the 1990s, never quite complete. A different lineage, traceable to Williams&#8217;s 1992 REINFORCE algorithm and to a current that ran parallel to Watkins&#8217;s work without intersecting it, had been arguing that an agent could learn what to do without learning to value states at all. The policy could be parameterized directly. The gradient of expected return could be estimated and followed. Through the 1990s and 2000s, this lineage was the minority view. Most of the canonical results came from value methods. 
Then, in 2015 and 2016, almost the moment DQN had succeeded, the policy methods began to win in domains where DQN could not.</p><p>Part 3 begins where Q-learning&#8217;s monopoly began to crack, and where the field discovered that the two routes to the same destination were not, as it had assumed, two routes to the same destination.</p><div><hr></div><p><em>This is <strong><a href="https://www.robonaissance.com/t/the-journey-of-rl">The Journey of RL</a></strong>, a twelve-part journey across reinforcement learning told through one core question: how did machines learn what to optimize?</em></p><p><em>Part 3 forthcoming.</em></p>]]></content:encoded></item><item><title><![CDATA[The Journey of RL, Part 1: Before the Equation]]></title><description><![CDATA[How machines learned what to optimize, beginning in 1898 with a stopwatch and the borrowed assumption that has shaped reinforcement learning ever since.]]></description><link>https://www.robonaissance.com/p/the-journey-of-rl-part-1-before-the</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-journey-of-rl-part-1-before-the</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Tue, 26 May 2026 17:20:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!i706!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i706!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!i706!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!i706!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!i706!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!i706!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i706!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2734397,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/199338959?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!i706!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!i706!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!i706!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!i706!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16e894d-03e3-4d05-84a1-bbf0b14753cd_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Sutton and Barto&#8217;s <em>Reinforcement Learning: An Introduction</em> states a single claim about what intelligent behavior is. Everything we mean by goals and purposes, the book proposes, can be understood as the pursuit of one scalar quantity over time. The book names this the reward hypothesis. Richard Sutton and Michael Littman had been working out the formulation in conversation since around 1990, almost a decade before the first edition put it in print.</p><p>It is a remarkable sentence. It compresses a hundred years of argument about what minds are and what they want into a single signal. And it asks the reader to treat that compression as a starting point rather than a conclusion.</p><p>Reinforcement learning is where AI keeps its hardest unanswered question. Not how to build intelligence, but what intelligence should be for. The question is older than the field of computer science, and the field that inherited it is now discovering that its three-decade answer, the reward hypothesis, is no longer holding.</p><p>Part 1 begins where the answers begin, in the work of three psychologists who would not have called themselves the founders of anything computational.</p><div><hr></div><h2>The Behaviorist Inheritance</h2><p>By the 1890s, psychology had a methodological problem. The dominant approach, inherited from Wilhelm Wundt&#8217;s Leipzig laboratory and consolidated in the United States by Edward Titchener, treated psychology as the study of consciousness through introspection. Trained observers would report on the elemental components of their own mental states. 
The data of psychology was, in this view, what minds told other minds about themselves. The trouble was that different laboratories produced different elemental components and the procedure could not adjudicate between them. Whatever introspection was, it was not converging.</p><p>A graduate student at Columbia University named Edward Lee Thorndike spent 1897 and 1898 doing something different. He built wooden boxes from boards and slats, twenty inches long and fifteen across, with a door that could be opened from inside by pulling a loop of string or pressing a lever. He placed cats in the boxes, put food just outside, and timed how long each escape took. The first attempt by any given cat was a chaos of scratching and biting. The second was slightly shorter. By the twenty-fourth trial, a cat that had once taken nearly three minutes was out in six seconds. Thorndike kept the timing data and made graphs of it. His PhD dissertation, submitted to Columbia in 1898 under James McKeen Cattell, was the first study in psychology to use nonhuman subjects systematically. He titled it <em>Animal Intelligence: An Experimental Study of the Associative Processes in Animals</em>.</p><p>The interpretation he proposed became known as the Law of Effect. Responses followed by satisfaction would become more firmly connected to the situations that produced them. Responses followed by discomfort would become less firmly connected. He did not yet have a formal model. He had a stopwatch, a wooden box, and a graph that bent downward over trials. The mechanism was named in the language his contemporaries already used about animals and people. Satisfaction. Discomfort. The unanalyzed terms that would, decades later, be compressed into a scalar.</p><p>Fifteen years later, in 1913, John B. Watson delivered a lecture at Columbia that became the document later historians would call the behaviorist manifesto. 
Published in <em>Psychological Review</em> under the title &#8220;Psychology as the Behaviorist Views It,&#8221; it argued that psychology should become a purely objective branch of natural science whose theoretical goal was &#8220;the prediction and control of behavior.&#8221; Introspection was to be discarded as a method; the human and the animal were to be studied on the same plane. The radical move was not the rejection of the mind, though that was the line that drew the most fire. The radical move was the substitution of an external criterion for an internal one. Psychology would now be evaluated by whether its predictions came true, not by whether its descriptions felt right from the inside.</p><p>Watson did not use the word reinforcement. The mechanism Thorndike had named in 1898 still went by satisfaction and discomfort, terms that retained their everyday meanings. The technical apparatus came later, from B. F. Skinner. Across his 1938 <em>The Behavior of Organisms</em> and the 1957 <em>Schedules of Reinforcement</em> written with Charles Ferster, Skinner converted the Law of Effect into an engineering discipline. He replaced the puzzle box with the operant chamber, a free-roaming environment in which an animal could press a lever many times and the experimenter could deliver or withhold reinforcement on precisely controlled schedules. He separated four cases that the everyday language of reward and punishment had collapsed. Positive reinforcement added a stimulus to increase a behavior. Negative reinforcement removed an aversive stimulus to increase a behavior. Positive punishment added a stimulus to decrease a behavior. Negative punishment removed a desired stimulus to decrease it. Negative reinforcement, in particular, was not punishment. The shock that stops when the lever is pressed reinforces the pressing. This distinction is still misstated almost everywhere outside professional behavior analysis. 
It mattered then, and it will matter later in this series when reward hacking begins to look like a confusion about what kind of stimulus we are tracking.</p><p>Three generations of work, then, that the field of computer science would inherit without quite noticing the inheritance. Thorndike supplied the engine, an associative process driven by the consequences of behavior. Watson supplied the methodological warrant, the claim that prediction and control were what psychology was for. Skinner supplied the engineering, an apparatus and a vocabulary precise enough to describe reinforcement contingencies without ambiguity. The lineage was complete by the early 1960s, three decades before any of it would be loaded into a computer.</p><p>Behaviorism gave reinforcement learning its inheritance by accident. It was solving a different problem, how to do psychology without minds, and the answer it arrived at was a kind of optimization.</p><div><hr></div><h2>The Mathematicians and the Holdouts</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1qsP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1qsP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!1qsP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!1qsP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!1qsP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1qsP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2639906,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/199338959?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!1qsP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png 424w, 
https://substackcdn.com/image/fetch/$s_!1qsP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!1qsP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!1qsP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49947052-0f25-4bfb-ad25-383bcfcc2e74_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>By the time Skinner&#8217;s <em>Schedules of Reinforcement</em> appeared in 1957, the behaviorist program was forty years old and had subdivided into camps that no longer agreed on what their own discipline was about. Two of those internal disputes mattered later for reinforcement learning. The first was whether learning could be written down as equations. The second was whether the mind, evicted by Watson in 1913, would be allowed back in through the back door under a different name.</p><p>Clark Hull at Yale led the first camp. His 1943 <em>Principles of Behavior</em> attempted what no behaviorist before him had seriously tried: to derive learning from postulates, the way Euclid had derived geometry. Hull proposed that behavior was driven by physiological drives such as hunger and thirst, and that any stimulus paired with the reduction of a drive would form an association whose strength he called habit strength, written sHr. Reinforcement was identical to drive reduction. The equations multiplied across the 1943 book and Hull&#8217;s 1952 revision, attempting to predict what he called reaction potential from any combination of drives and stimulus inputs. For a brief period in the 1940s and 1950s, Hull&#8217;s framework was the most influential body of theory in American psychology. Graduate programs taught it. Critics attacked the postulates and their derivations rather than the project of mathematization itself.</p><p>Edward Tolman at Berkeley pushed the other way. His 1948 paper &#8220;Cognitive maps in rats and men,&#8221; published in <em>Psychological Review</em>, argued that rats running mazes did not merely chain stimulus-response associations. They built internal representations of the maze and used those representations to navigate. The experimental evidence was latent learning. 
When rats were allowed to explore a maze without reward, they showed no obvious progress, but once a reward was introduced they immediately performed as if they had known the maze all along. The information was not in synaptic connections strengthened by reinforcement. It was in something cognitive, an internal model.</p><p>The two programs were treated as opposed at the time, and they were. Hull was determined to write learning as equations from observable variables; Tolman was determined to put the mind back. The standard narrative of mid-century learning theory says Hull won the 1950s and Tolman lost the argument in his lifetime. The standard narrative is partial. After the cognitive revolution of the 1960s, Tolman&#8217;s claim about internal representations was vindicated; Hull&#8217;s specific equations were largely abandoned but his project, of writing learning as mathematics, became the template for computational learning theory in the 1980s. Reinforcement learning would later need both lineages: Hull&#8217;s mathematized value function, and Tolman&#8217;s internal model of the world. They are the algorithmic ancestors of what later became model-free and model-based RL.</p><p>Between Hull&#8217;s mathematization and Tolman&#8217;s cognitive revolt, on a different problem entirely, the most consequential mathematical model of the behaviorist tradition was being assembled. Three independent lines of research in the 1960s had questioned a fundamental assumption shared by Pavlov and his successors: that temporal contiguity between a conditioned stimulus and an unconditioned stimulus, the bell and the food, was sufficient to produce learning. Robert Rescorla&#8217;s contingency studies showed that contiguity without statistical contingency produced little learning. Leon Kamin&#8217;s 1969 blocking experiments showed that a stimulus already predicted by another cue stopped acquiring associative strength even when paired with the unconditioned stimulus. 
Allan Wagner&#8217;s studies on relative cue validity showed that the informativeness of a cue, not just its presence, drove learning.</p><p>Rescorla and Wagner, both at Yale, published a synthesis in 1972 in a chapter of Black and Prokasy&#8217;s edited volume <em>Classical Conditioning II</em>. Their model gave the field its first formal prediction-error equation:</p><pre><code><code>&#916;V = &#945;&#946;(&#955; &#8722; &#931;V)
</code></code></pre><p>The change in associative strength on a trial depends on three factors: the salience of the conditioned stimulus (&#945;), the salience of the unconditioned stimulus (&#946;), and a prediction-error term (&#955; &#8722; &#931;V). The parenthesized quantity is the gap between the maximum associative strength the unconditioned stimulus can support (&#955;) and the total strength the animal currently assigns to all cues present (&#931;V). When that gap is zero, the animal has nothing left to learn on the trial and learning stops. When the gap is large, learning is fast.</p><p>This is the equation reinforcement learning would later discover for itself in a different notation. The temporal-difference error Sutton would name in his 1988 <em>Machine Learning</em> paper has the same shape: change equals learning rate times the difference between target and current estimate. Rescorla-Wagner was the psychology side of that equation. The two communities would not realize they had been working on the same form until the late 1980s.</p><p>By 1972, three pieces were in place that reinforcement learning would inherit. A working engineering of reward and punishment in operant chambers. A live disagreement about whether learning was best described as equations over observables or as internal models. And a formal prediction-error equation that captured the surprise-driven structure of learning in a way that could be ported to any computational substrate. None of this had yet been put inside a computer. The next two decades would do that, and almost everything would change in the porting.</p><div><hr></div><h2>The Computational Bridge</h2><p>The first reinforcement learning machine was built in January 1952 at the Harvard Psychological Laboratories by a graduate student named Marvin Minsky. The Stochastic Neural Analog Reinforcement Calculator (SNARC) used vacuum tubes to implement a network of forty Hebb-style synapses. 
Each synapse strengthened the recently-used pathway when the network was &#8220;rewarded,&#8221; so that a simulated maze runner could learn to find its way to the goal through trial and error. The apparatus filled a room. The math behind it became Minsky&#8217;s 1954 Princeton PhD dissertation, <em>Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain-Model Problem</em>, supervised by Albert Tucker.</p><p>Seven years later, an IBM researcher named Arthur Samuel published &#8220;Some Studies in Machine Learning Using the Game of Checkers&#8221; in the July 1959 issue of <em>IBM Journal of Research and Development</em>. The paper coined the term <em>machine learning</em>. Samuel&#8217;s program played checkers, was given the legal moves and a heuristic goal of winning, and had to discover for itself the correct weights of the board-evaluation parameters that determined its choices. He called the parameter list &#8220;redundant and incomplete.&#8221; The program learned by playing itself, adjusting the weights based on which positions tended to lead to wins. The mechanism would later be recognized as temporal-difference learning, three decades before that name existed.</p><p>Then, for almost twenty years, the line went dark. The reasons were institutional as much as intellectual. By the early 1960s, AI funding had concentrated around symbolic methods and search-based problem solvers; the neural-network and reinforcement traditions, which had no comparable demonstrations of competence, lost the resources they would have needed to grow.</p><p>Some time in 1976 or 1977, an undergraduate at Stanford named Richard Sutton was searching the university library for everything he could find on animal learning. He was a psychology major. He had concluded that animals did something different from what computer scientists were modeling, and he wanted to find someone who had said so in writing. The one author he kept returning to was A. 
Harry Klopf, a scientist at the U.S. Air Force Cambridge Research Laboratories whose 1972 technical report had argued that individual neurons might be hedonistic, seeking reinforcement the way an animal seeks food. Most psychologists considered the framing overreaching, the report a kind of crank document. Sutton thought he had found a research program. His undergraduate thesis at Stanford, completed in 1978 and titled &#8220;A Unified Theory of Expectation in Classical and Instrumental Conditioning,&#8221; already laid out the direction his graduate work would take.</p><p>Klopf&#8217;s writing style did not help his reception. He published in Air Force technical reports rather than peer-reviewed journals, and his book <em>The Hedonistic Neuron</em> used language that mainstream neuroscience considered speculative. The broader environment did not help either. Through most of the 1960s and 1970s, mainstream AI had moved on from connectionism and reinforcement-driven adaptation. Expert systems, symbolic reasoning, and search heuristics dominated the field, and funding followed. Reinforcement learning, where it survived, survived in psychology departments and obscure technical reports.</p><p>The substance of the retreat had a personal dimension. The same Marvin Minsky who had built the first reinforcement learning machine in 1952 co-authored, with Seymour Papert in 1969, <em>Perceptrons: An Introduction to Computational Geometry</em>. The book demonstrated rigorously that single-layer perceptrons could not learn certain functions, the XOR problem chief among them. Minsky and Papert had been arguing this case in conference talks and circulating preprints since around 1965; the published book consolidated a critique they had been making for years. Connectionism by then was already declining; the book is widely credited with accelerating its retreat. 
Minsky, who had built the first reinforcement learning machine as a graduate student, was now helping to close the door on the architecture his own apparatus had launched. Frank Rosenblatt, the principal figure that critique had targeted, had completed the Mark I Perceptron at Cornell Aeronautical Laboratory in 1960 and had known Minsky since their adolescent years at the Bronx High School of Science. He died in a boating accident on Chesapeake Bay in July 1971, on his forty-third birthday, two years after <em>Perceptrons</em> appeared and still defending the architecture. When the book&#8217;s expanded edition was published in 1988, it carried a dedication: in memory of Frank Rosenblatt.</p><p>Klopf kept publishing through that period. His 1982 book <em>The Hedonistic Neuron</em> was the revised and expanded version of the 1972 AFCRL report; it argued that neurons were not Hebbian associators but goal-seeking heterostatic units, driven to maximize reinforcement signals rather than maintain homeostatic balance. The mainstream did not adopt the framing. Sutton did. When the second edition of <em>Reinforcement Learning: An Introduction</em> appeared in 2018, its dedication read: &#8220;In memory of A. Harry Klopf.&#8221;</p><p>Sutton arrived at the University of Massachusetts Amherst in 1978 for graduate study. His advisor was Andrew Barto, a theorist who had taken his PhD in computer science at the University of Michigan in 1975 and had joined UMass in 1977 as a postdoctoral researcher in Michael Arbib&#8217;s Brain Theory Group. Sutton was Barto&#8217;s first PhD student. The two of them began a collaboration that would not, in any meaningful sense, end.</p><p>Sutton&#8217;s master&#8217;s thesis, completed in 1980, was titled &#8220;An Adaptive Network That Constructs and Uses an Internal Model of Its World.&#8221; The title is a quiet thesis statement: an adaptive network, in this view, did not merely respond to stimuli; it modeled the environment it operated in. 
The lineage from Klopf was direct. The Tolman lineage was visible too, in the words &#8220;internal model.&#8221; His doctoral dissertation, completed in 1984, was titled &#8220;Temporal Credit Assignment in Reinforcement Learning.&#8221; The phrase that became central to the field appeared on the title page of his dissertation.</p><p>The early flagship paper of the Sutton-Barto program was their 1983 article with Charles Anderson in <em>IEEE Transactions on Systems, Man, and Cybernetics</em>, &#8220;Neuronlike adaptive elements that can solve difficult learning control problems.&#8221; The paper presented an architecture in which two cooperating components, an actor and a critic, learned to balance a pole hinged to a movable cart. The critic estimated whether the current state was good or bad; the actor adjusted its policy in the direction the critic indicated. The system worked. The cart-pole problem, simple enough to verify and difficult enough to require non-trivial learning, became a benchmark task in the field for decades after.</p><p>By the early 1980s, then, the apparatus existed. An agent capable of perceiving a state, choosing an action, receiving a scalar reward, updating its internal estimates, and trying again. Klopf&#8217;s heterostatic principle was the philosophical backbone. Barto&#8217;s network-theoretic training gave the framework its mathematical handles. Sutton supplied the algorithmic discipline. None of the deep-learning machinery that would come later existed yet; the networks were small, the problems toy, the results published in journals few outside the immediate field read.</p><p>What did not yet exist was the larger claim. The claim that the apparatus, this reward-driven loop with its agent and its environment and its scalar signal, was a sufficient account of intelligent behavior in general. 
That claim would arrive over the following decade, first in conversations between Sutton and Michael Littman around 1990, then in the textbook Sutton and Barto would write together in the late 1990s. And the claim, when it arrived, would still carry the inheritance of behaviorist psychology. Reward as the engine. Satisfaction and discomfort, now compressed into a scalar signal, doing the work Thorndike&#8217;s stopwatch had first measured eighty years earlier.</p><div><hr></div><h2>The Reward Hypothesis Crystallized</h2><p>The textbook came later. The conversation came first. Some time around 1990, Richard Sutton and Michael Littman, then both working on reinforcement learning at separate institutions, worked out a sentence that compressed what they thought their field was actually about. Everything an agent might be said to optimize for, they proposed, could be understood as maximizing a single scalar signal accumulated over time. The sentence was not written down for publication. It moved around the small RL community as a working summary, a thing people said when they needed to explain to outsiders what the field assumed.</p><p>In 1998, <em>Reinforcement Learning: An Introduction</em> appeared from MIT Press. The first edition was the book most working reinforcement learning researchers would teach from for the next two decades. Sutton and Barto stated the central proposition in Chapter 3, in the section setting up the formalism for the agent-environment interface. The proposition was clear, was load-bearing for everything that followed, and was offered without much fanfare. The book did not yet, in its first edition, explicitly anoint the proposition with a name.</p><p>The name came in 2004, on a web page Sutton set up at the University of Alberta. The page existed to state the principle as a scientific hypothesis open for discussion, refinement, and falsification. Sutton called it the reward hypothesis. 
The framing mattered: it was offered not as a definition but as a proposal one could in principle reject. The page is still online. Most readers find it via Google rather than via Sutton&#8217;s textbook, which says something about how academic ideas now travel.</p><p>The second edition of the textbook, in 2018, gave the formulation its now-canonical form. Goals and purposes, the book stated, are best understood as maximizing the expected sum of a scalar reward signal. The wording was identical to the 2004 web page. Twenty-eight years after the conversation that produced it, the hypothesis had its textbook-canonical sentence.</p><p>What the formulation includes is precise. The reward is scalar, not a vector or a structure; one real number per time step. It is cumulative, summed across time. The agent does not maximize the next reward but the expected sum of all rewards from now until some horizon, possibly discounted. What is maximized is an expectation, not a guarantee. And the operation applied is maximization itself, not satisficing, minimization, or any other relation to the reward stream.</p><p>What the formulation leaves out is also precise. Where the reward signal comes from is not part of the hypothesis. In the framework as stated, the reward is delivered by the environment; the agent does not generate it, does not negotiate it, does not interpret it. Whether the reward signal is itself the product of some construction, by an engineer writing a loss function or by a model trained to predict human preferences, is outside the scope. The hypothesis is about what the agent does with reward, not about what reward is.</p><p>For most of the two decades after the textbook appeared, the hypothesis was uncontested within the field. Reinforcement learning research expanded; the architectures multiplied; the applications stretched. The reward hypothesis sat underneath all of it as a working axiom. By the late 2010s, that began to change. 
Reward hacking became a recognized failure mode. Goodhart&#8217;s law, in its sharpest forms, was rediscovered for learned systems. The Reinforcement Learning from Human Feedback paradigm that emerged with InstructGPT and Constitutional AI made the construction of reward an explicit research problem. Papers titled &#8220;Reward is enough&#8221; and &#8220;Reward is not enough&#8221; and &#8220;The Reward Hypothesis is False&#8221; and &#8220;Settling the Reward Hypothesis&#8221; appeared between 2021 and 2024.</p><p>These debates are the subject of later parts of this series. What matters here, in Part 1, is that the reward hypothesis arrived not as a discovery but as a consolidation. By 1998 the apparatus that made reinforcement learning a field had been assembled. The hypothesis was the field&#8217;s working summary of what it had taken itself to be doing all along. The statement itself introduced no new claim. Its work was to make the inheritance from behaviorist psychology, visible already in 1898 in Thorndike&#8217;s stopwatch graphs, feel like an axiom of the new field rather than a borrowed assumption from the old one.</p><div><hr></div><p>By the time Sutton and Barto&#8217;s textbook canonized the reward hypothesis as the working axiom of their field, the field had already started asking the next question. If reward is the engine, how does an agent learn to maximize it when the world is too large to enumerate, when actions have consequences delayed by many steps, when the connections between behavior and outcome have to be inferred from sparse and noisy feedback?</p><p>The question was a century old in its psychological form. Thorndike&#8217;s cat in the puzzle box had faced it. Skinner&#8217;s rat in the operant chamber had faced it. 
Now an agent represented as a function over states and actions, running on a digital computer, had to face it with mathematical tools that did not yet exist.</p><p>In May 1989, at King&#8217;s College, Cambridge, a graduate student named Christopher Watkins submitted a doctoral dissertation titled <em>Learning from Delayed Rewards</em>. The thesis sketched a convergence proof for a learning algorithm that handled the credit-assignment problem in a way previous methods could not. The algorithm had a one-letter name. The proof would be completed three years later, with Peter Dayan, in the journal <em>Machine Learning</em>.</p><p>Reward was the axiom. Part 2 begins where every axiom eventually goes: under the load.</p><div><hr></div><p><em>This is <strong><a href="https://www.robonaissance.com/t/the-journey-of-rl">The Journey of RL</a></strong>, a twelve-part journey across reinforcement learning told through one core question: how did machines learn what to optimize?</em></p><p><em>Part 2: The Value Hypothesis (Q-Learning and Its Discontents). Forthcoming.</em></p>]]></content:encoded></item><item><title><![CDATA[Tokenomics, Part 3: The Agent Tax]]></title><description><![CDATA[Agent tasks consume orders of magnitude more tokens than chat. The inversion lands on application P&L. Gross margin compresses. 
Where the cost lives, and who absorbs it.]]></description><link>https://www.robonaissance.com/p/tokenomics-part-3-the-agent-tax</link><guid isPermaLink="false">https://www.robonaissance.com/p/tokenomics-part-3-the-agent-tax</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Wed, 20 May 2026 12:30:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JYZP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JYZP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JYZP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!JYZP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!JYZP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!JYZP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!JYZP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2486643,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/198549823?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JYZP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!JYZP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!JYZP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!JYZP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a0e2c4-9650-4a02-bb8f-488644fa42e1_1672x941.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The cleanest place to read the inversion is not a hyperscaler&#8217;s earnings call, where training and inference compute are bundled into capital expenditure footnotes. It is the income statement of the application company that has just bolted an agent feature onto a seat-based product. The compression shows up immediately. Cost of revenue rises. Gross margin falls. 
By the next earnings call, the CFO is disclosing inference cost as a separate line item that did not exist twelve months earlier.</p><p>For traditional SaaS companies layering AI on top of existing products, the effect is easy to quantify. A typical eighty-dollar-per-month seat, when an AI assistant is added, picks up roughly fifteen dollars of direct variable cost from inference, routing, and supporting infrastructure. Gross margin on that seat drops from eighty percent to roughly sixty-one percent overnight: sixteen dollars of existing cost plus fifteen dollars of new cost, against the same eighty dollars of revenue. Across Q4 2025 and Q1 2026 earnings, public SaaS companies disclosing AI-driven margin pressure now name sixty-to-seventy-percent gross margin as the new operating range.</p><p>The pattern is sharper for AI-native companies that do not have a non-AI baseline to fall back on. ICONIQ Capital&#8217;s January 2026 State of AI snapshot, surveying roughly three hundred AI builders, reported average AI product gross margin at fifty-two percent. The number has improved from forty-one percent in 2024 and forty-five percent in 2025 as companies build inference optimization and routing discipline, but it remains roughly twenty to thirty percentage points below the eighty-percent SaaS standard. The improving trajectory and the structural ceiling are both real. AI-native companies optimize their way up toward the SaaS norm. They never reach it.</p><p>The compression is structural, not cyclical.</p><p>This article traces what changes at the application layer when workflows shift from chat to agent. It uses the Margin Geography framework to read where the cost moves, where the customer revenue does not move with it, and which scarcities determine which application companies survive the transition.</p><h2>What Changes When Workflows Become Agentic</h2><p>The shift from seat-based interface to agent is not a feature upgrade. It is a token-consumption regime change.</p><p>A seat-based chat interaction consumes a few hundred to a few thousand output tokens. 
The user types a question, the model produces an answer, the transaction completes. Token consumption is bounded by what one human can read and respond to in a session.</p><p>An agent task removes that ceiling. The agent reads files autonomously, executes searches, runs terminal commands, iterates on intermediate outputs, verifies results, and retries on failure. Each step generates tokens that the user never directly sees. A typical agent-mode coding task in Cursor reads eight to fifteen files before making a single edit, consuming eight thousand to forty-five thousand tokens on exploration alone, before any code generation begins. Cognition AI&#8217;s Devin, operating closer to fully autonomous engineering than to developer assistance, routinely consumes hundreds of thousands of tokens per task across multi-hour workflows. The user observes one task. The model performs a multi-step workflow underneath.</p><p>Token consumption per task has risen by an order of magnitude in the chat-to-agent transition, and by two orders of magnitude in the chat-to-fully-autonomous-agent transition. A workflow that consumed two thousand tokens as a chat interaction in 2023 consumes twenty thousand as a single-shot agent task in 2025, and two hundred thousand as a multi-step autonomous agent task in 2026. The per-task token consumption climb is not linear. 
It is compounding.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DnSD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DnSD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png 424w, https://substackcdn.com/image/fetch/$s_!DnSD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png 848w, https://substackcdn.com/image/fetch/$s_!DnSD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png 1272w, https://substackcdn.com/image/fetch/$s_!DnSD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DnSD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png" width="1456" height="980" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:980,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80751,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/198549823?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DnSD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png 424w, https://substackcdn.com/image/fetch/$s_!DnSD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png 848w, https://substackcdn.com/image/fetch/$s_!DnSD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png 1272w, https://substackcdn.com/image/fetch/$s_!DnSD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9101d802-99fa-4e77-bab9-4433928109e5_1456x980.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><em>The Token Climb. Output tokens per task across workflow generations from pre-LLM scripts through multi-step autonomous agents. The vertical axis is logarithmic. Token consumption per task has climbed by approximately three and a half orders of magnitude in the chat-to-autonomous-agent transition, a structural shift that no per-seat pricing convention can absorb without gross margin compression.</em></p><p>The economic mechanism is straightforward. Output tokens cost three to five times what input tokens cost on the same model. Agents are output-heavy by structure: they generate reasoning traces, intermediate outputs, tool-call arguments, verification steps. Caching helps with the input side. It does not help with the output side. 
Agent workflows produce the worst possible token mix for application-company unit economics: high-volume, output-heavy, non-cacheable.</p><p>These are the cost-side mechanics. The revenue side has not moved correspondingly.</p><h2>The Cost Structure of an AI Application Company</h2><p>The seat-based pricing convention came from the SaaS era, when marginal cost per seat approached zero and a flat monthly subscription captured value efficiently. The convention persisted into the first wave of AI applications. Eighty dollars per month for the seat, with AI features bundled.</p><p>The convention fails on agent workloads.</p><p>A heavy agent user can consume thousands of dollars of inference compute per month against a hundred-dollar seat fee. The variance across users is extreme. Light users may consume single-digit dollars of compute. Heavy users routinely consume two to three orders of magnitude more. Average pricing fails when the distribution is power-law shaped. The company either prices for the heavy user, pricing out the light user, or prices for the light user, losing money on the heavy user, or prices in the middle, losing money on the heavy users and capturing too much from the light ones.</p><p>The cost structure of an AI application company in 2026 decomposes into layers that no SaaS finance template captures. Inference compute averages roughly twenty-three percent of revenue at scaling-stage AI B2B companies across ICONIQ&#8217;s surveyed cohort. Supporting infrastructure, including vector databases, retrieval systems, and routing layers, adds materially on top. The combined AI cost of revenue is well above the historical SaaS hosting cost, and it scales with usage rather than with seats.</p><p>The token-mix asymmetry compounds the problem. Application companies running agent workloads do not get to choose their token composition. The workload determines it. 
A coding agent reads orders of magnitude more context tokens than the code tokens it generates. A research agent processes thousands of tokens of source content for every paragraph of analysis produced. A scheduling agent evaluates context many times larger than the calendar actions it outputs. None of these workflows have favorable caching profiles. None of them have favorable input-to-output ratios.</p><p>The inversion at the compute layer becomes the agent tax at the application layer. Every dollar that the inversion moved from training capex to inference variable cost arrives in the application company&#8217;s cost of revenue.</p><h2>Three Strategic Responses</h2><p>Application companies that have run into the agent tax in 2025 and 2026 have converged on three response strategies, each with its own trade-offs.</p><p>The first is to shift the customer to consumption pricing. The seat fee shrinks or disappears. The customer pays for what they use, denominated in credits, queries, or tasks. Cursor&#8217;s June 2025 pricing overhaul moved exactly this way: a flat Pro tier was replaced by a credit pool that depletes against actual API rates per request. The advantage is that gross margin stops swinging with workload composition. The disadvantage is that the customer experience changes. Customers lose predictability. Light users feel they are subsidizing nothing and start to leave. Heavy users feel surprise bills and start to economize. Both behavioral responses compress revenue growth.</p><p>The second is to absorb the cost. The seat fee stays flat. The customer experience stays familiar. The application company carries the variable cost on its income statement and accepts gross margin compression. This is the default response for companies whose primary metric is annual recurring revenue rather than gross margin. ARR keeps growing. 
The income statement narrative gets harder to sustain over multiple quarters as inference cost as a percentage of revenue rises and gross margin descends from eighty to sixty-five to fifty percent. Eventually the public market or the next funding round prices the compression in.</p><p>The third is to vertically integrate the inference stack. Build proprietary inference infrastructure, train or fine-tune in-house models, route the easy queries to cheap open-weight models and the hard queries to frontier APIs. This is the strategy that mature AI companies are pursuing in parallel with the first two. Anthropic and OpenAI pursue it at the model-provider level, building dedicated inference infrastructure optimized for their own model families. At the application level, the same logic appears as routing intelligence: companies like Cursor, Harvey, and Glean increasingly maintain proprietary evaluation harnesses, custom routing layers, and selective in-house fine-tuning to reduce the cost of serving each customer query. The capital investment required to pull it off is substantial. The talent pool of people who can build a production inference stack is small. The companies that succeed will hold a structural cost advantage over their pure-API-integrator competitors. The companies that fail will have spent twelve months on infrastructure that did not work and lost the formation-phase window during which positions in the application layer were available.</p><p>Most application companies in 2026 are running combinations of all three strategies. The pure plays at each end of the spectrum are rare. The combinations are messy on the income statement but reflect the reality that no single strategy currently solves the agent tax cleanly.</p><h2>Margin Geography of the Agent Tax</h2><p>The application layer reads as the most contested layer in the Margin Geography of the AI economy in 2026, because three things are happening simultaneously. 
The cost of serving the customer is rising. The willingness of the customer to pay more is uncertain. The structural defensibility of any particular application company&#8217;s position is being tested in real time.</p><p>Three scarcities are forming at the application layer through the agent transition. They protect gross margin through two distinct financial mechanisms.</p><p>Two operate on the cost side. Workflow design talent reduces the compute spent per unit of customer value delivered. Engineers and designers who understand both the user-facing workflow and the underlying token-consumption economics can produce applications that capture meaningful value per token of inference. This talent pool is small, recently formed, and currently distributed across a handful of high-growth AI-native companies. Evaluation infrastructure reduces the compute bill systematically. Application companies that can measure agent task quality empirically, attribute cost per task, and route workloads to the cheapest model that meets the quality bar capture compounding margin advantage. The infrastructure to do this at scale exists at perhaps a dozen companies in 2026.</p><p>The third scarcity operates on the revenue side. Integration depth into customer workflows does not reduce the inference cost, but it sustains the customer&#8217;s willingness to pay for it. Applications embedded so deeply into customer business processes that the customer cannot disaggregate the AI feature from the rest of the delivered value can pass higher prices through without losing the seat.</p><p>Two scarcities are eroding. Raw API access to frontier models is no longer a competitive moat. By 2026, every application company can integrate the same set of models on roughly the same terms. 
Generic prompt engineering is similarly commoditized: the techniques are documented, the prompts are reproducible, and the differentiation has migrated upstream into workflow design and downstream into evaluation infrastructure.</p><p>The durable margin within the application layer sits at the intersection of integration depth and evaluation discipline. Revenue-side defense without cost-side discipline becomes a premium-priced product that still loses money on heavy users. Cost-side discipline without revenue-side defense becomes a cheaply served product that competitors can match. Both sides are required to hold gross margin through the transition. The pattern is clearest in vertical AI applications. Harvey&#8217;s embedding in legal workflows, Glean&#8217;s enterprise search integration, and Sierra&#8217;s customer-service workflow positioning all illustrate the integration-depth scarcity. The contrast is visible too. Horizontal AI feature overlays bolted onto general-purpose software illustrate the strategy that does not protect margin through the agent tax. The AI capability is undifferentiated from competitors. The customer relationship is mediated by the underlying platform. Companies that win deep workflow integration with a specific customer category, and that build the evaluation infrastructure to operate efficiently within that category&#8217;s economic constraints, will capture the application-layer margin pool. Companies that compete on generic AI features layered onto generic seats will see their gross margins compress to the point where the unit economics no longer support continued operation.</p><p>This is a formation-phase scarcity reading. The application layer is in its margin-formation window in 2026 and 2027. By 2028, the positions will largely be settled.</p><h2>What Consensus is Mispricing</h2><p>The agent tax is widely acknowledged in industry conversation. 
Three aspects of it remain consistently mispriced.</p><p>The first is the speed of the transition. Public market analysts are still applying SaaS gross-margin multiples to AI-native ARR. The new gross-margin operating range is sixty to seventy percent, not eighty to ninety percent, and the implied valuation compression has not yet flowed through analyst models. Companies that grew their ARR fast in 2024 and 2025 by absorbing the agent tax are about to discover that their next funding round or earnings call prices in the gross margin reality.</p><p>The second is the divergence between AI Supernovas and AI Shooting Stars. Bessemer&#8217;s framing of the AI-native company cohort separates the explosive-growth, thin-wrapper companies running around twenty-five percent gross margins (the Supernovas) from the disciplined, infrastructure-mature companies running closer to sixty percent (the Shooting Stars). The two groups are often valued similarly because their top-line growth rates look comparable. The unit economics are not comparable. The Shooting Stars will compound. The Supernovas will compress when growth slows and the inference bill cannot be outrun with new ARR.</p><p>The third is the layer location of the durable margin. The application layer is widely assumed to be the safest place to invest in AI, because it captures the customer relationship. The structural reading is more nuanced. The application layer captures the customer relationship, but the customer relationship alone does not protect against the agent tax. The application companies that survive the transition will be the ones that built integration depth and evaluation infrastructure before the formation window closed, not the ones that won the early customer acquisition race.</p><h2>What the Series Will Treat Next</h2><p>The agent tax is the application-layer manifestation of the inversion. 
Part 4 of this series turns to the open-weight question more deeply: whether open-weight competition continues to compress training margin, what its structural endpoint is, and which layer of the stack captures the value released by training-layer commoditization. The agent tax connects to the open-weight question directly, because open-weight inference is one of the few mechanisms that materially relieves the cost pressure on application companies. Part 5 examines China&#8217;s parallel infrastructure stack, where the agent tax is playing out under different policy and capital constraints, and where vertical integration of the inference stack is being attempted at the national rather than the company level. Parts 6, 7, and 8 deploy the framework across pricing intelligence, the efficiency frontier, and the layer-by-layer Margin Geography that the series has been building toward.</p><p>The agent tax is the inversion&#8217;s application-layer reckoning. Where the inversion lands next, layer by layer, is the question the series is built to answer.</p><div><hr></div><p><em>This is <a href="https://www.robonaissance.com/t/tokenomics">Tokenomics</a>, a series that explores the economic physics of the AI era, measured in the unit that runs it all.</em></p><div><hr></div><p><strong>Disclaimer:</strong> This article is for informational purposes only and does not constitute investment, financial, or legal advice.</p>]]></content:encoded></item><item><title><![CDATA[Tokenomics, Part 2: The Great Inversion. From Training to Inference.]]></title><description><![CDATA[Training capex was the story for three years. Inference is the story now. The largest pool of AI profit is forming downstream. 
Where the margin is moving, layer by layer.]]></description><link>https://www.robonaissance.com/p/tokenomics-part-2-the-great-inversion</link><guid isPermaLink="false">https://www.robonaissance.com/p/tokenomics-part-2-the-great-inversion</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Mon, 18 May 2026 18:42:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KNcJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KNcJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KNcJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KNcJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KNcJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KNcJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KNcJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg" width="1248" height="832" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1248,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:269892,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/198247447?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KNcJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KNcJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KNcJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KNcJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ac224ca-448f-4f97-bd41-ba30a56637e2_1248x832.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In 2023, the AI economy was a story about how much it cost to train a model. By 2026, it is a story about how much it costs to run one. The two are not the same story, and the difference matters more than almost any other structural shift this technology cycle has produced.</p><p>A frontier-class large language model in 2023 required hundreds of millions of dollars to train and a few cents of compute to answer each user query. 
By 2026, training a frontier model costs in the low billions, and the cumulative inference compute spent on serving that model exceeds the training cost within the first year of deployment. For GPT-4 alone, industry analysts estimated that the model accumulated roughly $2.3 billion in inference compute spending between its launch in March 2023 and the end of 2024, approximately fifteen times what the training run cost. The model&#8217;s lifetime inference bill will be larger again by the time it is retired.</p><p>The single-model picture is dramatic. The industry-aggregate picture lags, because labs continue to invest heavily in training the next generation of models even as inference workloads grow underneath. By compute-hours, inference workloads are projected to approach half of total AI compute usage by 2026, and two-thirds by 2028, with training accounting for the remainder. Capital spending lags compute-hour utilization by roughly two years. Provisioning grid power and building out inference-capable data centers takes that long, but the direction is identical. Industry analysts now describe inference as accounting for eighty to ninety percent of the lifetime cost of a deployed AI system. The &#8220;training era&#8221; of AI economics, which dominated public attention from 2020 to 2024, was actually a transitional period. 
The current era is the one that will matter for capital allocation, market structure, and where the durable margin sits.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9uTQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9uTQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png 424w, https://substackcdn.com/image/fetch/$s_!9uTQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png 848w, https://substackcdn.com/image/fetch/$s_!9uTQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png 1272w, https://substackcdn.com/image/fetch/$s_!9uTQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9uTQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png" width="1456" height="980" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/deb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:980,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95055,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/198247447?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9uTQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png 424w, https://substackcdn.com/image/fetch/$s_!9uTQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png 848w, https://substackcdn.com/image/fetch/$s_!9uTQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png 1272w, https://substackcdn.com/image/fetch/$s_!9uTQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb42898-ce46-47d4-8d07-4ad1de81e21a_1456x980.png 1456w" sizes="100vw"></picture></div></a></figure></div><p><em>The Inversion. AI compute spending shifts from training to inference between 2023 and 2030. The compute-hour crossover lands in mid-2026; capital spending lags by roughly two years. By 2030, inference is projected to account for sixty-five percent of AI compute and eighty to ninety percent of the lifetime cost of every deployed AI system.</em></p><p>This article reads the inversion through the two frameworks the series introduced in Part 1. The Token Stack tells us which layers are producing the cost. Margin Geography tells us which scarcities are absorbing the value. 
Together they answer the question the inversion forces every AI company to confront: if the bulk of the spending is moving from training to inference, where is the corresponding pool of profit going to form?</p><h2>The Training Era and What Made It Scarce</h2><p>Between roughly 2020 and 2024, the company that could afford the largest cluster of accelerators, recruit the rarest talent in pretraining engineering, and curate the largest corpus of high-quality data was the company whose model led the benchmarks. The binding constraint on capability was the size and quality of the training run that produced a model. Capital, talent, and data were the scarcities that held the margin at the training layer.</p><p>The economic logic of that period was straightforward. A training run was a capital expenditure on the order of hundreds of millions of dollars. The economic argument was that the resulting model would then serve users cheaply enough per query that the training investment would amortize over billions of queries, and the company that made the bet would capture the margin on every subsequent one. The closest analog in industrial economics was a fixed-cost asset like a hydroelectric dam: enormous upfront commitment, decades of low marginal-cost output, durable competitive position once built. The argument held at the time. What it underestimated was how large &#8220;billions of queries&#8221; would actually become, and how much cumulative inference compute would cost at that scale.</p><p>This was the logic that justified hundred-billion-dollar valuations for companies whose primary asset was a single trained model. It was also the logic that drove the capital arms race between OpenAI, Anthropic, Google, Meta, and a handful of others, each of which raised funding rounds calibrated to the cost of the next training generation rather than to current revenue.</p><p>For three years the logic held. 
Each generation of frontier model required two to four times the compute of the previous generation, and each generation pushed capability forward in ways that smaller models could not match. The capital scarcity at the training layer was real because the capability gap was real, and the capability gap was real because the training compute ceiling kept moving up.</p><h2>What Eroded Training Margin</h2><p>The training era ended, structurally, when the capability gap between frontier closed models and the best open-weight alternatives narrowed faster than the training cost ratio could justify.</p><p>The signals accumulated through 2024 and accelerated through 2025. Meta, Mistral, Alibaba, and DeepSeek consistently shipped open-weight models that, within months, approximated the capability of frontier closed models released a generation earlier. The cost gap was extreme: DeepSeek&#8217;s V3 release in December 2024 reported training costs of approximately five and a half million dollars, less than five percent of what the closest US competitor had spent. Whether or not the exact figure was reproducible, the directional claim was correct. Open-weight competition had compressed the capability premium of a hundred-million-dollar training run to a value that was increasingly hard to defend.</p><p>Three structural forces drove the erosion. The first was the commoditization of the pretraining recipe. By 2025, the methodology for producing a frontier-class base model was well-documented across published papers, leaked technical reports, and reproduced open-weight implementations. The recipe was no longer the moat. The second was the diffusion of talent. Researchers who had built the early frontier models moved to second-wave labs, open-weight initiatives, and well-funded research groups in China and Europe. The third was the saturation of useful pretraining data. 
By 2025, every major lab was training on roughly the same corpus of high-quality text and code, with diminishing marginal returns from each additional curated trillion tokens.</p><p>The combination meant that capital alone no longer purchased a durable capability lead. A lab could still spend more on training than its competitors, but the resulting model&#8217;s advantage period was measured in months rather than years. The scarcity that had held the margin at the training layer was eroding from underneath the largest capital expenditures in private-sector technology history.</p><h2>Why It Inverted</h2><p>The inversion was not caused only by eroding training margin. It was caused, in parallel, by inference becoming structurally more valuable than it had been.</p><p>The shift was technical before it was economic. Reasoning models, beginning with OpenAI&#8217;s o-series in 2024 and accelerating through DeepSeek&#8217;s R1 in January 2025, established that an additional dollar spent on inference-time compute frequently produced better outcomes than the same dollar spent on training. Each reasoning query consumes many times the compute of a traditional inference call, sometimes by orders of magnitude, because it combines a smaller base model with structured chain-of-thought reasoning, search over candidate solutions, and verification of intermediate steps. The economic structure is fundamentally different from training-time scaling: instead of capitalizing an enormous upfront investment, the cost is paid query by query.</p><p>This created two effects. First, it broke the assumption that capability improvements had to come from larger training runs. A small, capable, efficient base model combined with adequate inference-time compute could match or exceed a much larger model run in a single forward pass. Second, it shifted the unit of analysis from model capability to inference economics. 
The question stopped being &#8220;whose model is best&#8221; and became &#8220;whose serving infrastructure produces the highest quality output per dollar of inference compute.&#8221;</p><p>The economic implication was the inversion itself. Training, which had been the primary capital expenditure, became one of several capital expenditures, none of which alone secured a competitive position. Inference, which had been the variable cost of serving users, became the layer at which the largest pool of cumulative spending now accumulated.</p><p>By the end of 2026, the global AI economy was running on a structure that almost no one had predicted as recently as 2023: the spending lived on the variable-cost side of the ledger, not the capital-cost side.</p><h2>Margin Geography of the Inversion</h2><p>The migration is the most consequential value-capture event of this technology cycle. The Margin Geography framework lets us read it precisely.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tzmd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tzmd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png 424w, https://substackcdn.com/image/fetch/$s_!Tzmd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png 848w, 
https://substackcdn.com/image/fetch/$s_!Tzmd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png 1272w, https://substackcdn.com/image/fetch/$s_!Tzmd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tzmd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png" width="1456" height="980" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:980,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:122967,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/198247447?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tzmd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png 424w, 
https://substackcdn.com/image/fetch/$s_!Tzmd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png 848w, https://substackcdn.com/image/fetch/$s_!Tzmd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png 1272w, https://substackcdn.com/image/fetch/$s_!Tzmd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc442ccd6-751b-4795-a594-d9d6a9f3842d_1456x980.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>At the training layer, capital and talent are now eroding scarcities. The training run remains expensive, but each additional dollar of training spend now buys a smaller and shorter-lived capability lead. The labs that continue to push the training frontier do so to remain at parity with peers, not to establish a defensible lead. Capital that flows to training in 2026 and beyond is increasingly capital that keeps a lab from falling behind, not capital that wins.</p><p>At the inference layer, three scarcities are forming. The first is talent in inference optimization: engineers who can squeeze the maximum quality output per dollar of inference compute through architectural choices, serving infrastructure design, routing logic, caching strategy, and the cumulative thousand small decisions that determine whether an inference dollar produces ten cents of output or a dollar of output. This talent pool is small, recently formed, and concentrated at a handful of companies and labs. The second is capital in specialized inference infrastructure during the buildout phase: data centers designed for inference workloads rather than training workloads, with different memory hierarchies, network topologies, and chip choices. The capital investment to build out this infrastructure is real and currently bottlenecked by both supply and the difficulty of provisioning grid power at the necessary scale. This is a buildout-window scarcity, not a permanent one. Once raw inference capacity catches up to demand, the scarcity will erode the way raw compute capacity in cloud IaaS eroded over the 2010s. The durable margin within the inference layer sits one level above the capacity itself, at the workflow and routing tier that decides what to do with the capacity. 
The third is a forming network-effect scarcity in developer adoption: the inference platforms that win developer mindshare in 2026 will benefit from accumulating workflow integrations, tool ecosystems, and switching costs that compound over years.</p><p>The forming scarcities are exactly what Part 1&#8217;s worked example on the inference layer previewed. The eighteen-to-twenty-four-month formation window remains the operative one. Companies establishing positions in inference optimization, infrastructure, and developer adoption before late 2027 will hold those positions for years. Companies attempting to establish them later will not.</p><h2>What Consensus is Mispricing</h2><p>The inversion is widely acknowledged in industry discussion. Three aspects of it remain consistently mispriced.</p><p>The first is the speed. Capital allocation models still treat inference as a future load and training as the current cost center, even though compute-hour utilization has already crossed the inflection point. The lag between actual usage and reported financials creates a window in which inference-capable infrastructure is being underbuilt relative to where the workload will be in two years. The mispricing is straightforward: the market is funding training capacity as if 2024 conditions will continue, while the underlying workload composition has already moved on.</p><p>The second is the workload composition asymmetry. Part 1&#8217;s multi-dimensional caveat applies directly: not all inference is the same inference. Agent workloads, reasoning workloads, and long-context workloads carry materially different price structures than chat-style inference. Output tokens cost three to ten times what input tokens cost. Cached input costs a fraction of what uncached input costs. Agent loops generate high-output, low-cache token compositions that no average price-per-token figure captures. 
The companies that win at inference are not the companies with the lowest published price-per-token; they are the companies whose serving economics work across the specific workload mixes their customers actually run.</p><p>The third is where the durable margin will sit within the inference layer. Public commentary tends to assume that whoever wins inference will be whoever has the largest GPU fleet or the cheapest electricity. The structural reading is different. The lasting margin sits at the workflow tier, where developer integrations, evaluation frameworks, and the routing intelligence that decides which model handles which query become the actual moat. Raw inference capacity will commoditize the way training capacity is commoditizing now. The value will be captured by the tier that decides what to do with the capacity, not by the tier that provides it.</p><h2>What the Series Will Treat Next</h2><p>The inversion is the structural event. The Agent Tax, which Part 3 of this series treats next, is what the inversion looks like at the application company&#8217;s profit-and-loss statement. Agents are the workload force that bends the inference-layer cost curve upward fastest, because agent tasks consume an order of magnitude more output tokens than the seat-based interfaces they replace. The Agent Tax article unpacks what that means for the unit economics of every application company building on top of inference.</p><p>Part 4 treats the open-weight question more deeply: whether open-weight competition continues to compress training margin, what its structural endpoint is, and which layer of the stack captures the released value. Part 5 examines China&#8217;s parallel infrastructure stack, where the inversion is playing out under different policy and capital constraints. 
Parts 6, 7, and 8 deploy the framework across pricing intelligence, the efficiency frontier, and the layer-by-layer Margin Geography that the series has been building toward.</p><p>The inversion is happening. Where it goes next, layer by layer, is the question the series is built to answer.</p><div><hr></div><p><em>This is <a href="https://www.robonaissance.com/t/tokenomics">Tokenomics</a>, a series that explores the economic physics of the AI era, measured in the unit that runs it all.</em></p><div><hr></div><p><strong>Disclaimer:</strong> This article is for informational purposes only and does not constitute investment, financial, or legal advice.</p>]]></content:encoded></item><item><title><![CDATA[Tokenomics, Part 1: The Token Is the New Kilowatt-Hour]]></title><description><![CDATA[Token prices have fallen by two orders of magnitude in three years, more on the cheapest routes. The Token Stack. Margin Geography. Where the value went, and where it is going next.]]></description><link>https://www.robonaissance.com/p/tokenomics-part-1-the-token-is-the</link><guid isPermaLink="false">https://www.robonaissance.com/p/tokenomics-part-1-the-token-is-the</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Sat, 16 May 2026 17:42:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Litb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Litb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Litb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Litb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Litb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Litb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Litb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg" width="1248" height="832" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1248,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:336254,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197847869?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Litb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Litb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Litb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Litb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb41a763e-0ac5-4d0d-a85b-3e99b9f5edeb_1248x832.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>A million tokens cost roughly thirty dollars on a leading API in early 2023. By 2026 the same million tokens, at production quality on competitive APIs, cost in the range of ten to thirty cents. On cached inputs and small-model routes, single-digit cents or less. Two orders of magnitude at the headline, three or more at the extremes, in three years. There is no precedent in the recent history of compute for a unit to compress that fast.</p><p>The question this series asks is not how it happened. The mechanics are widely covered. The question is where the margin went.</p><p>When a commodity drops a hundredfold in three years, and ten times further along the cheapest paths, it produces a slaughter. The slaughter has winners. The pricing power did not disappear when token costs fell. It migrated. Understanding where it migrated to, why, and where it is going next, is the only question in AI economics that consistently rewards reading.</p><p>This is the first article in <em>Tokenomics</em>, a series about the economic physics of artificial intelligence. The unit of analysis is the token. The framework is geographic: margin sits somewhere in the AI stack, and that somewhere is moving. Over nine articles, the series tracks the movement, layer by layer, with the goal of producing a structured way to think about value capture in an industry where conventional moat analysis has stopped working.</p><p>This article does two things. It defines the unit. And it introduces the two frameworks that the rest of the series deploys.</p><h2>The Unit</h2><p>To analyze a token economy you have to know what a token is. 
The technical answer is short. A token is the elemental piece of text a language model processes. Most short words are one token. Long or unusual words split into multiple tokens. A page of English prose is roughly five hundred tokens. The model receives a sequence of tokens as input, produces a sequence of tokens as output, and the operating cost of any AI service reduces, finally, to the count of tokens passing through.</p><p>The economic answer matters more. The token is the atomic, billable unit of a capability that was previously unmetered. Before token pricing, intelligence was something a company employed, in the form of staff, with overhead costs and contracts. After token pricing, intelligence is something a company purchases, by the unit, the way it purchases electricity. This is the structural shift. The token is what makes AI an economy rather than a research field.</p><p>Electricity is the canonical case for what happens next. The phenomenon was demonstrated in laboratories for decades before it transformed industry. What changed was not the invention. What changed was the unit cost. When electricity became cheap enough to meter, price, and budget against, every industry that touched it restructured itself around the kilowatt-hour. Steel reorganized. Aluminum became economically possible. Whole categories of business that could not exist at the old cost emerged because the new cost made them viable.</p><p>The token is at the same point. The phenomenon of language models has been demonstrated for years. What is changing now is the unit cost. The hundredfold-and-beyond price compression is the unlock, not the underlying technology. As tokens become cheap enough to budget against, AI is restructuring itself around the unit. The companies that survive the next five years will be the ones whose business models reorganize around tokens the way prior generations reorganized around the kilowatt-hour. 
The companies that do not survive will be the ones whose models were built on the assumption that the unit cost would stay where it was.</p><h2>The Scissor of Token Economics</h2><p>Token economics moves along two curves, governed by independent forces. Confusing them is the most common error in commentary about AI costs.</p><p>The first curve is <strong>the price of a token</strong>. What it costs to produce one. Two forces compress this price. <strong>Hardware</strong> raises throughput per dollar of infrastructure: faster chips, denser memory, better interconnects. Every improvement in hardware throughput lowers the cost of producing a token. <strong>Model</strong> improvements let the same hardware produce the same quality output with less computation. Mixture-of-experts designs activate only a fraction of parameters per token. Distillation produces small models with capabilities close to large ones. Quantization runs models at lower numerical precision without significant quality loss. Speculative decoding, sparse attention, and a half-dozen other techniques each cut the price of a token by some multiplier. The compounding effect across these innovations is the hundredfold-and-beyond price collapse the headlines celebrate. Model innovation is responsible for more of the compression than hardware is.</p><p>The second curve is <strong>the token consumption of a task</strong>. How many tokens a task requires to be completed. <strong>Workload</strong> is the consumption pattern that decides this number, and it is moving in the opposite direction from price. A chat query consumes a few hundred tokens. A code completion consumes a few thousand. A document analysis consumes tens of thousands. An agent task that orchestrates sub-agents, each calling tools, each processing context, consumes millions. As workloads shift toward agents and longer context, token consumption per task rises.</p><p>The two curves move against each other. 
Per-token price has fallen by two orders of magnitude at the headline and more on the cheapest routes. Per-task consumption is rising by orders of magnitude as agents become standard, with the exact multiple depending on which workload class you measure. Multiply them, and the product is what an enterprise actually pays per task. If the price per token falls a hundredfold while the tokens per task rise a hundredfold, the cost of the task does not move.</p><p>This is the first lens. For any specific question the series will treat, the answer factors along both curves: what hardware and model innovations did to the token price, what workload evolution did to the tokens per task, and which blade of the scissor the question is really asking about.</p><p>The two-curve scissor is a first-order model. Real token economics fragments further. Token prices split into input and output, cached and non-cached, with output typically three to ten times the cost of input and cached routes a fraction of either. Token consumption fragments by task type, with agent workloads weighted toward high-output and low-cache shares that no aggregate average captures. These dimensions multiply within the scissor; they do not change its direction. The series unpacks them, layer by layer, in later articles.</p><h2>The Token Stack</h2><p>This brings the series to the first of its two frameworks.</p><p>The Token Stack maps the cost structure of the AI economy. Four layers, stacked vertically.</p><p>At the top, applications. What users actually pay for. The products, the subscriptions, the per-call APIs that sit closest to the customer&#8217;s wallet.</p><p>Below applications, inference. The cost of running models against user inputs. Measured per token. Governed by latency requirements, serving infrastructure, and pricing strategy.</p><p>Below inference, the model. Pre-training and post-training, the cost of producing a model that is capable of being served at all.</p><p>Below the model, infrastructure. The chips, the energy, the data centers, the cooling. The physical substrate that everything above depends on.</p><p>Cost flows up. 
A user paying for an application is paying, indirectly, for the inference that produced their result, which is paying for the model that was trained to make that inference possible, which is paying for the infrastructure that supports the whole arrangement. The price of a subscription contains, embedded within it, the price of every layer below it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!foDL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!foDL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png 424w, https://substackcdn.com/image/fetch/$s_!foDL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png 848w, https://substackcdn.com/image/fetch/$s_!foDL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png 1272w, https://substackcdn.com/image/fetch/$s_!foDL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!foDL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png" width="1456" height="920" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:920,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:122223,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197847869?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!foDL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png 424w, https://substackcdn.com/image/fetch/$s_!foDL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png 848w, https://substackcdn.com/image/fetch/$s_!foDL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png 1272w, https://substackcdn.com/image/fetch/$s_!foDL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff900253e-fcc6-464b-889d-3c3d333e8f38_1456x920.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><em>The Token Stack: the cost structure of the AI economy in four layers. Each layer pays for everything below it; user payment at the top supports the whole arrangement. The price compression from $30 per million tokens in 2023 to ten to thirty cents on competitive APIs in 2026, with single-digit cents and lower on cached and small-model routes, is what makes the rest of the framework matter.</em></p><p>This is the descriptive map of the AI economy. It tells you what costs what, what units mediate value, and how the layers depend on each other. 
It is the cleanest mental model for translating &#8220;AI is getting cheaper&#8221; into a precise statement about which component is getting cheaper, which is getting more expensive, and how the net effect produces a hundredfold-and-beyond collapse at the user-facing layer.</p><p>But the Token Stack does not answer the question this series is built around. The Token Stack tells you what something costs. It does not tell you who keeps the money.</p><h2>The Other Half</h2><p>There is a second framework, and it does most of the analytical work in the series.</p><p>Margin Geography maps where value gets captured in the AI stack, and where that capture is migrating to next.</p><p>If the Token Stack reads the supply side, what it costs to produce a token, Margin Geography reads the demand side and the capital side, where revenue accumulates from token consumption and how those positions change as the economy scales. The two frameworks are not redundant. They are the two faces of the same economy. One asks what it costs. The other asks who keeps it.</p><p>The framework operates along three dimensions.</p><p>The first is the stack layer. Same four layers as the Token Stack: applications, inference, model, infrastructure. Margin can sit in any layer, in several layers at once, and shift between them.</p><p>The second is the scarcity type. A layer holds margin not because of where it sits but because of what makes it scarce. Seven scarcity types recur: capital, talent, energy, distribution, data, regulatory, and network-effect. A margin position resting on one scarcity is fragile. A position resting on three reinforcing scarcities is durable. The seven types are the analytical alphabet of the framework.</p><p>The third dimension is the migration pressure. Margin in AI does not settle. 
It moves faster than in any prior technology cycle, for reasons specific to this industry: the price collapse on the underlying unit, the open-weight competition that sets a floor on every layer, the architectural substitution that displaces premium capability with cheap capability. For every layer that holds margin today, identify the active pressure that is eroding the scarcity. Substitution, commoditization, disintermediation, geopolitical shift, demand saturation. The intersection of layer, scarcity type, and migration pressure produces the framework&#8217;s primary output: a map of where margin sits today, and a thesis about where it is moving next.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Psu4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Psu4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png 424w, https://substackcdn.com/image/fetch/$s_!Psu4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png 848w, https://substackcdn.com/image/fetch/$s_!Psu4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png 1272w, https://substackcdn.com/image/fetch/$s_!Psu4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Psu4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png" width="1456" height="980" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/adc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:980,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:122967,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197847869?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Psu4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png 424w, https://substackcdn.com/image/fetch/$s_!Psu4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png 848w, https://substackcdn.com/image/fetch/$s_!Psu4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Psu4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadc97398-480b-4eee-ab13-a9d8e4b7b1a7_1456x980.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>Margin Geography of the AI stack, as of mid-2026. Each cell shows whether a scarcity type holds the corresponding layer, and at what strength. Amber dashed borders mark scarcities that are forming. Diagonal hatching marks scarcities that are eroding. The three groups of amber arrows track migrations currently underway. 
Each migration is the subject of subsequent articles in the series.</em></p><p>The framework is not a stock-pick tool. It does not produce specific buy or sell recommendations on specific companies. It operates at the layer level. Specific companies appear in the series as evidence of where margin sits at a moment in time, not as the unit of analysis. This is a deliberate constraint. The companies that hold margin today are not always the companies that will hold it three years from now, and the framework is built to surface the migration, not to recommend the position.</p><h2>One Worked Example</h2><p>To show the framework in action, consider one layer: inference.</p><p>The inference layer in 2026 holds margin where three scarcity types are forming at once.</p><p>Talent is the most acute. Building serving infrastructure that delivers competitive token throughput at acceptable latency and acceptable cost requires optimization engineering that has not yet diffused. The teams that have solved the inference problem at production scale are concentrated in a small number of organizations. The expertise is portable in principle but slow to spread in practice.</p><p>Capital is the second. Specialized inference infrastructure requires investment that smaller players cannot match. The serving optimizations that produce competitive economics depend on hardware configurations, software stacks, and operational practices that compound through expenditure over time.</p><p>A network effect is forming as a third scarcity. Once an application has been built on an inference provider&#8217;s API patterns, latency characteristics, and tooling, the switching costs are real. The lock-in is not yet permanent, but it is forming.</p><p>What is the active migration pressure on these scarcities? On talent, the diffusion is gradual but ongoing. 
On capital, the picture is mixed: hyperscale spending continues to flow into inference infrastructure, but custom silicon programs at the largest cloud providers are starting to compress the margin available to general-purpose inference providers. On the network effect, the scarcity is still forming; developer lock-in has not yet hardened.</p><p>What does this read of the layer imply? The inference layer is in a margin formation phase, not a margin erosion phase. The companies that establish defensible positions in the next eighteen to twenty-four months will hold those positions for years. The companies that fail to establish positions in that window will not establish them later. The layer locks in once. This is the kind of structural reading Margin Geography is designed to produce.</p><h2>What the Series Will Do</h2><p>This article is the entry point. Over the next eight articles, the framework gets deployed across the AI stack.</p><p>Article 2 takes the most consequential migration first: training to inference, the move that is reshaping the largest pool of AI profit in the industry. Article 3 examines the agent tax, the structural transfer of margin that agent adoption produces, with attention to how seat-based software pricing is giving way to token-based consumption. Article 4 maps the applications layer in detail, where a three-way split between thin wrappers, token-consuming SaaS, and workflow-rich applications is producing a divergence that will define enterprise software for the rest of the decade.</p><p>Article 5 analyzes the infrastructure layer at the margin-analysis level, deliberately complementing the dedicated industry research that handles the supply-side detail better than this publication should attempt. Article 6 treats pricing as the active mechanism through which margin gets distributed between layers, with attention to which pricing transitions are currently reshaping the geography. 
Article 7 examines efficiency innovation as a margin redistribution event whose direction the market consistently misreads.</p><p>Article 8 reads China&#8217;s parallel margin map, structurally different from the US stack in ways that English-language coverage rarely captures. Article 9 projects the geography forward, with the framework deployed against a five-year horizon.</p><p>The thesis runs through every article. Tokens are the atomic unit. The cost is going one way. The margin is going somewhere else. Knowing where it is going is the question that rewards careful work in AI economics. This series is an attempt to do that work in public, on a unit small enough to count and a stack large enough to remap an industry.</p><div><hr></div><p><em>This is <a href="https://www.robonaissance.com/t/tokenomics">Tokenomics</a>, a series that explores the economic physics of the AI era, measured in the unit that runs it all.</em></p><div><hr></div><p><strong>Disclaimer:</strong> This article is for informational purposes only and does not constitute investment, financial, or legal advice.</p>]]></content:encoded></item><item><title><![CDATA[The Humanoid Value Chain: Where Does the Money Actually Go?]]></title><description><![CDATA[The trillion-dollar stack. Six layers. Three profit pools. 
Where the next decade of capital actually accrues.]]></description><link>https://www.robonaissance.com/p/the-humanoid-value-chain-where-does</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-humanoid-value-chain-where-does</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Thu, 14 May 2026 16:15:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YcmQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YcmQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YcmQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!YcmQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!YcmQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!YcmQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YcmQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2536666,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197712072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YcmQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!YcmQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!YcmQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!YcmQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F410ace35-0cc4-4c58-9fc6-9f285afb8f1f_1672x941.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Two years ago, Unitree had one humanoid robot. The H1 cost about as much as a Tesla Model S. $90,000, give or take.</p><p>Today Unitree has four. The R1 starts at $4,900. The G1 at $13,500. The H2 at $29,900. The H1 at $90,000 still anchors the top.</p><p>Unitree&#8217;s IPO prospectus, filed in March on Shanghai&#8217;s STAR Market, shows the company&#8217;s average selling price across this expanded humanoid line: &#165;593,400 in 2023, &#165;260,700 in 2024, &#165;167,600 in the first three quarters of 2025. A 72% decline. Five units sold in 2023. 
Over five thousand five hundred in 2025, making Unitree the world&#8217;s top-selling humanoid manufacturer.</p><p>Unitree is one company among hundreds. The same 24 months saw the global humanoid funding pool grow faster than at any point in history. $3.1 billion across 60+ deals in the first half of 2025 alone, exceeding the entire 2010-to-2024 cumulative total. Morgan Stanley projects $5 trillion by 2050. Barclays calls humanoids the next major industrial theme. Every major investment bank now has a coverage report.</p><p>What none of those reports tells you, with any useful precision, is where the money will actually go.</p><p>The standard Wall Street framing groups everything into &#8220;humanoid robotics&#8221; as if it were one industry. It isn&#8217;t. The stack contains at least six layers, and the profit pools beneath them behave in structurally different ways. Some layers will commoditize within five years. Some will produce a handful of trillion-dollar winners. Some will generate textbook mass-extinction events that wipe out most current entrants. Conflating them is how fortunes get destroyed.</p><p>This essay is a first attempt to map the stack at the resolution the moment requires. Six layers, decomposed. Three profit pools, named. Representative companies positioned in each. And a thesis at every level.</p><p>This is the first article of Capital &amp; Atoms, a new research thread within Robonaissance focused on the investment economics of humanoid robotics. If the framework here resonates, the articles that follow will go deeper.</p><p>The funding flow is undeniable. The framework is missing. That&#8217;s the gap this essay tries to fill.</p><div><hr></div><h2>The Analysis Gap</h2><p>Walk through the public reports a careful investor would actually read.</p><p>Morgan Stanley&#8217;s Humanoid 100 partitions companies into Brain, Body, and Integrators, which is the closest existing approximation to a stack view. 
But the framework is built for stock screening, not for the active allocation thesis a humanoid stack at this stage of development requires. Other major banks have published coverage of varying depth, but none, in my reading, treats the stack as an active capital allocation framework that takes a position at each layer.</p><p>The same gap exists at higher resolution. The mainstream tech press covers humanoid demos as spectacle. The robotics-specific newsletters cover the news flow. The AI newsletters cover frontier model releases. But the joint distribution of &#8220;I read the latest VLA paper&#8221; and &#8220;I can model the cap table for the company that trains it&#8221; is small, and the cross-domain analysis that joins the two sides does not yet exist as a public research product.</p><p>It needs to. In dollar terms, the humanoid buildout is already comparable to early-stage AI compute. Figure&#8217;s $1 billion Series C at $39 billion, Physical Intelligence&#8217;s $600 million round at $5.6 billion, Skild&#8217;s $1.4 billion at $14 billion, Apptronik&#8217;s $935 million Series A at $5 billion, Agility reportedly raising another $400 million. These are not seed-stage bets. These are growth-stage commitments being made by sovereign wealth funds, OpenAI, Microsoft, Nvidia, Bezos, and increasingly the largest sovereign-linked capital pools in Asia. The decisions being made right now will shape the cap table of the 2030s.</p><p>Three forces compound the urgency. First, the Chinese supply chain is now demonstrably ahead in component manufacturing for the most expensive parts of the stack, and that fact is absent from most English-language analysis. Second, the foundation-model layer has bifurcated into at least four distinct technical bets in the last eighteen months, each with starkly different commercial implications. 
Third, Unitree&#8217;s IPO and the public-market path that follows will force valuation discipline onto a sector that has been pricing itself privately, with all the opacity that implies.</p><p>This is the moment the framework must arrive. The next eighteen months will see tens of billions of dollars allocated against whatever framework readers have available. If that framework is a one-line entry in a TAM projection or coverage list, the misallocation will be enormous.</p><p>Capital &amp; Atoms exists to close that gap. The format is straightforward. Each piece maps one slice of the humanoid robotics landscape at the resolution actual capital allocation requires. Some pieces will be technical reads of foundation model architectures. Some will be supply-chain decompositions down to the specific Chinese precision-component supplier. Some will be uncomfortable assessments of which companies are going to fail. All of them will end with a structured thesis the reader can take to a meeting and use.</p><p>The first piece is this one. The stack itself.</p><div><hr></div><h2>The Six-Layer Stack</h2><p>A humanoid robot, viewed as an investment object, is not one product but a stack of six. 
Each layer has its own competitive dynamics, its own profit margin profile, and its own answer to the only question that matters in the long run: what makes the cash flow defensible.</p><p>The layers, from intelligence down to industrial physics:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8hgF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8hgF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png 424w, https://substackcdn.com/image/fetch/$s_!8hgF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png 848w, https://substackcdn.com/image/fetch/$s_!8hgF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png 1272w, https://substackcdn.com/image/fetch/$s_!8hgF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8hgF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png" width="1180" height="915" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:915,&quot;width&quot;:1180,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4326778,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197712072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8hgF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png 424w, https://substackcdn.com/image/fetch/$s_!8hgF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png 848w, https://substackcdn.com/image/fetch/$s_!8hgF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png 1272w, https://substackcdn.com/image/fetch/$s_!8hgF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f764068-ea5b-4997-bded-816d461d87fa_1180x915.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><strong>L1</strong> Foundation Models <strong>L2</strong> Integrators <strong>L3</strong> Sensing <strong>L4</strong> Actuation <strong>L5</strong> Edge Compute <strong>L6</strong> Data Infrastructure</p><p>Each layer below gets the same treatment. What it does in one sentence. The current competitive shape. Where the margin lives. Representative names with valuation reference points where available. And a thesis I expect to be held to.</p><div><hr></div><h3>L1: Foundation Models</h3><p>This is the layer where the robot&#8217;s brain is trained. Vision-Language-Action models, or VLAs, learn to take in pixels and instructions and emit motor commands. The training happens in data centers on clusters of thousands of GPUs. What gets shipped to the robot is the resulting policy: a set of weights and an inference recipe. 
The technical lineage runs from Google&#8217;s RT-2 in 2023 through Physical Intelligence&#8217;s &#960;-0 in 2024 and &#960;-0.5 in 2025, alongside Figure&#8217;s Helix and a growing list of open-weight contenders.</p><p>The competitive shape is recognizable to anyone who lived through the LLM moment of 2022 to 2024. A small number of well-capitalized labs are racing to train ever-larger generalist policies on a combination of teleoperation data, simulation, and internet-scale video. Physical Intelligence has raised roughly $1.1 billion in total, most recently $600 million led by Google&#8217;s CapitalG at a $5.6 billion valuation, and has recruited a meaningful share of the senior robotics learning talent that left Google and Meta. Skild has raised roughly $1.8 billion in total, with its January 2026 round led by SoftBank tripling its valuation to over $14 billion on a hardware-agnostic generalist-policy thesis. Google DeepMind continues to publish, with the Gemini Robotics line representing the in-house bet. 1X is building its own foundation model in tight integration with its NEO platform. Several Chinese efforts, most visibly from Galbot and Agibot, are training on domestic hardware fleets that the Western labs cannot easily replicate.</p><p>Margins at this layer, if a winner emerges, will be among the highest in the entire stack. The economic structure mirrors the LLM API business. A trained policy is software, replicable at near-zero marginal cost, distributed across whatever hardware customers operate. The cost of training is enormous and rising. The cost of serving is meaningful but improvable. The moat is data flywheel and talent density, both of which compound. The data flywheel in this layer is built on teleoperation data, simulation, and internet-scale video, not on real-world deployment data, which accrues to whichever L2 integrator owns the robot fleet. 
This distinction matters: if real-world deployment data becomes the dominant training signal over the next five years, L1 economics deteriorate and the layer&#8217;s value migrates to L2.</p><p>The risk is also recognizable. Most of these labs will not survive as independent entities. The training-cost curve and the talent-cost curve both run in directions that favor concentration. Two or three winners is the historical pattern in AI infrastructure, and there is no obvious reason robot foundation models break the pattern.</p><p><strong>Thesis</strong>: L1 is the highest expected-value layer in the stack and the highest variance. Within five years, two players will hold more than 60% of meaningful enterprise robot foundation model deployments outside Tesla&#8217;s vertical stack. I expect Physical Intelligence to be one. The second seat is genuinely contested between Skild, Google&#8217;s in-house effort, and a Chinese entrant currently underweight in Western analysis.</p><div><hr></div><h3>L2: Integrators</h3><p>This is the layer everyone photographs. The full humanoid: legs, torso, arms, hands, head, the thing that walks across the stage at conferences and ends up on magazine covers. Figure, Apptronik, Agility, 1X, Unitree, UBTech, Tesla, plus a long tail of Chinese entrants whose names will be familiar in two years and forgotten in five.</p><p>The competitive shape here is the most crowded in the stack. Counterpoint Research counts more than 100 active humanoid companies in China alone, with the analyst consensus expecting consolidation to a few dozen survivors over the next IPO cycle. The American field is smaller in count but no less crowded in capital. Figure raised $675 million at a $2.6 billion valuation in early 2024, then closed a Series C exceeding $1 billion at a $39 billion post-money valuation in September 2025. 
Apptronik raised $350 million in early 2025, then extended its Series A with another $520 million in February 2026 at a roughly $5 billion valuation. Agility was reported to be raising $400 million. 1X has shipped early units of NEO into homes. Tesla is a category unto itself, with Optimus serving simultaneously as the most-watched humanoid program and the least independently verifiable.</p><p>Margins at this layer follow the historical pattern of integrated hardware businesses, which is to say, thin and brutal. The bill of materials is dominated by precision components purchased from L4 suppliers. The software is increasingly purchased or licensed from L1 model providers, or built in-house at significant ongoing R&amp;D cost. What remains for the integrator is design, system integration, manufacturing scale, and customer relationships. The closest analogue is the automotive industry, where companies that sell complete vehicles run gross margins in the high teens to mid-twenties and net margins in the single digits, while their tier-1 component suppliers can run higher.</p><p>Unitree is the early disconfirming case. Profitable, scaling, and aggressively cutting price. The IPO prospectus shows operating margins that look more like those of a precision-component manufacturer than those of a traditional integrator, which is what happens when you vertically integrate the highest-cost components and serve a market segment that values capability per dollar over polish. Whether that economic structure scales beyond hobbyist and research customers into the industrial workhorse segment is the open question of the next 24 months.</p><p><strong>Thesis</strong>: L2 is the layer most likely to produce textbook capital destruction at the median, alongside two or three trillion-dollar winners at the right tail. The expected value is real, but the variance dominates. The winners will look like vertically integrated automakers, not like SaaS companies. 
Apple&#8217;s iPhone is a bad analogy. Toyota&#8217;s Camry is a better one.</p><div><hr></div><h3>L3: Sensing</h3><p>Before a humanoid can do anything useful, it has to know what is in front of it. Cameras, depth sensors, force sensors at the joints, tactile sensors at the fingertips, IMUs in the torso, microphones, and the perception software that turns raw signals into a usable scene representation. The sensor stack on a current-generation humanoid runs to several thousand dollars per unit. The stack on a humanoid that can actually fold laundry without breaking the fabric runs higher, and the gap between those two numbers is where the layer&#8217;s investment thesis lives.</p><p>The competitive shape is fragmented and largely consumer-electronics-adjacent. Vision sensors come from Sony, OmniVision, and a long tail of CMOS image sensor specialists. Depth comes from Intel RealSense&#8217;s spiritual successors, Orbbec out of China, and a slow drift toward time-of-flight modules sourced from the smartphone supply chain. LiDAR for robotics is increasingly converging on solid-state designs from Hesai, Innoviz, Luminar, and several Chinese auto-LiDAR pivots. Tactile sensing remains genuinely unsolved at the price points humanoids will tolerate, with research-stage entrants like GelSight competing against industrial incumbents from semiconductor and medical adjacencies. Force-torque sensors are dominated by ATI Industrial Automation and a handful of Japanese specialists. None of these companies were founded for humanoid robotics. All of them are now positioning themselves as humanoid suppliers.</p><p>Margins at this layer split sharply along two lines. Commodity vision and IMU components carry consumer-electronics gross margins in the 20% to 30% range, with humanoid volumes too small to command pricing premiums. 
Specialized sensors with no consumer analogue, particularly tactile and high-end force-torque, carry industrial-component margins of 50% or more, but the addressable market is small and the customer base is technically demanding. The bifurcation matters because investors looking at L3 see &#8220;robot sensors&#8221; as a single theme when the actual economics live at opposite ends of the margin spectrum.</p><p>The structural fact that defines L3 is that humanoids are not yet a large enough market to drive the sensor roadmap. The roadmap is being driven by smartphones, automotive, and AR/VR. Humanoids will inherit components designed for those applications, with one or two specialized exceptions. This is good for cost curves and bad for differentiation.</p><p><strong>Thesis</strong>: L3 produces a small number of category winners in the genuinely robot-specific sensor categories, particularly tactile, while the rest of the layer commoditizes faster than the integrators expect. Investors looking for L3 returns should focus on tactile and force-torque, not on cameras and LiDAR.</p><div><hr></div><h3>L4: Actuation</h3><p>If L1 trains the brain, L4 builds the body. Harmonic reducers, planetary roller screws, frameless torque motors, encoders, controllers, and the entire mechanical system that converts electrical signals into precise physical force. By Barclays&#8217; decomposition, this single layer accounts for roughly 50% of humanoid production costs. By any honest reckoning, it is also the layer where the United States has lost the supply chain.</p><p>The competitive shape is the inverse of L1. Instead of a small number of well-capitalized labs racing on talent and training compute, L4 contains decades-old precision manufacturers. Mature production processes. Deep customer relationships. The kind of capital efficiency that only shows up in industries where the cost of a bad part is measured in field failures rather than user complaints. 
Harmonic Drive Systems out of Japan has been the global benchmark in strain wave reducers since the 1970s. Nidec dominates a wide range of motor categories. SKF and NSK in bearings. ABB and Siemens in motion control. The Western and Japanese incumbents are profitable, technically excellent, and structurally vulnerable.</p><p>The vulnerability is volume. Chinese precision-component manufacturers, particularly &#32511;&#30340;&#35856;&#27874; (Leaderdrive), &#21452;&#29615;&#20256;&#21160; (Shuanghuan Driveline), and &#20013;&#22823;&#21147;&#24503; (Zhongda Leader), have spent the last decade closing the technical gap on harmonic reducers and planetary roller screws while operating at cost structures the incumbents cannot match. The Chinese players were not optimizing for humanoid robots. They were optimizing for industrial automation, electric vehicles, and machine tools. The humanoid wave arrived as a windfall, layered on top of supply chains already at scale.</p><p>Margins at this layer follow precision-manufacturing economics. Gross margins in the 30% to 40% range for established players, operating margins in the high teens, with significant moat from process know-how and customer qualification cycles that take years. Once a humanoid integrator qualifies a harmonic reducer supplier, switching is expensive enough that the relationship tends to persist for the product lifetime. The Chinese suppliers are now in active qualification with most of the major Western humanoid programs, a fact that will surface in financials over the next two to four quarters.</p><p>Battery, structural, and thermal cost categories, 15-25% of BOM combined, ride existing EV, industrial, and consumer-electronics supply chains. 
None produces investment opportunities distinguishable from their adjacent end markets, which is why this framework does not treat them as separate layers.</p><p><strong>Thesis</strong>: L4 is where the largest share of humanoid value will accrue over the next five to ten years, and the largest share of that share will accrue to a handful of Chinese precision-component suppliers currently trading at industrial-automation multiples in Shanghai and Shenzhen. This is the most asymmetric trade in the humanoid stack and the one Western analysis is least equipped to see.</p><div><hr></div><h3>L5: Edge Compute</h3><p>If L1 is where the brain is trained, L5 is where it runs. A foundation model trained on a 10,000-GPU cluster is useless if it cannot execute a control loop at 30 hertz on a battery-powered chassis with a compute power envelope measured in tens of watts. L5 is the layer that turns trained policies into onboard inference, in real time, within thermal and power constraints that have no parallel in the data center.</p><p>The competitive shape is NVIDIA-dominant in a way that mirrors its training-side position. The Jetson Orin platform, paired with the Isaac robotics SDK, is the default development target for nearly every Western humanoid program. Qualcomm has positioned its Robotics RB-series chips as the mobile-derived alternative, leveraging a decade of smartphone power-efficiency engineering. Ambarella occupies a vision-focused niche. Tesla and a small set of vertically-integrated programs are designing custom silicon. In China, Horizon Robotics has built genuine share in autonomous driving and is pushing into humanoid edge compute, with Black Sesame and several smaller domestic players in adjacent positions.</p><p>Margins at this layer follow specialty-silicon economics. 
NVIDIA carries premium silicon margins on Jetson products, sustained by the CUDA ecosystem lock-in that took nearly two decades to build and that humanoid programs are not equipped to circumvent in the near term. Qualcomm runs at smartphone-derived margins, lower than NVIDIA&#8217;s but with significantly stronger power efficiency at a given performance level. The vertical integrators face the classic make-or-buy tradeoff.</p><p>The structural fact that defines L5 is that humanoid robot compute sits at a different point on the price-performance curve than mobile or automotive. Mobile chips optimize for power. Automotive chips optimize for safety certification. Robot chips need both, plus deterministic real-time response that neither mobile nor automotive silicon natively provides.</p><p><strong>Thesis</strong>: L5 is the second-most-defensible toll position in the stack after L4, and NVIDIA&#8217;s Jetson franchise will hold the majority of Western humanoid edge inference through at least 2028. The risk is not Qualcomm or Ambarella but vertical integration. If Tesla, Figure, or 1X successfully ships custom silicon at humanoid volumes within five years, the layer fragments and Jetson&#8217;s share collapses faster than the CUDA moat would otherwise allow.</p><div><hr></div><h3>L6: Data Infrastructure</h3><p>Foundation models at L1 need data. Lots of it. The infrastructure that produces, processes, and pipelines that data into trainable form is L6, and it is the most structurally confusing layer in the entire humanoid stack.</p><p>The competitive shape splits into three sub-layers along distinct economic structures. Synthetic data and simulation, dominated by NVIDIA&#8217;s Cosmos World Foundation Models and Isaac Sim platform, with competition from DeepMind&#8217;s MuJoCo and a growing list of academic and open-source simulators. 
Real-world data, captured through fleet operations: Tesla&#8217;s deployed Optimus units, 1X&#8217;s NEO home deployments, Figure&#8217;s factory placements, and Unitree&#8217;s research customer base. And open data ecosystems, anchored by Hugging Face&#8217;s LeRobot, Google DeepMind&#8217;s Open X-Embodiment dataset, K-Scale&#8217;s open hardware program, and the steady output of academic robotics groups.</p><p>Margins look completely different across the three sub-layers. Synthetic data and simulation carry software economics, with NVIDIA pricing Cosmos and Isaac access at the kind of margins its data-center silicon products produce. Real-world data has no margin at all, because it is not sold. It accrues to whichever L2 integrator ships robots in meaningful volume, becoming a captive input to that integrator&#8217;s L1 model. Open data ecosystems carry no margin by design, but they create a price ceiling that constrains what commercial data providers can charge.</p><p>The structural fact that makes L6 confusing for investors is that the commercial opportunity and the strategic opportunity live in different places. The commercial opportunity is in synthetic data and simulation tooling, where NVIDIA has a near-decisive lead. The strategic opportunity is in real-world data flywheels, which are not investable as standalone businesses. They are the byproduct of L2 deployment scale, captured by whoever ships first. This split is why L6 sits across two profit pools rather than one: the commercial sub-layer joins the Brain Pool, the real-world data flywheel joins the Body Pool.</p><p><strong>Thesis</strong>: L6 produces one commercial winner at meaningful scale, NVIDIA, sustained by Cosmos and Isaac in tight integration with the rest of its humanoid stack. Real-world data does not produce L6 winners. It produces L2 winners. 
Pure-play robotics data startups will mostly fail to find venture-scale outcomes, because the moat is in robot deployment volume, not in data infrastructure. The exception is the small number of annotation, simulation tooling, and benchmark companies that successfully attach themselves to a winning L1 lab as preferred providers.</p><div><hr></div><h2>Three Profit Pools</h2><p>The six layers compress, when you look at them through the lens of where returns actually accrue, into three economically distinct pools. Each pool has a different risk profile, a different time horizon, a different appropriate valuation multiple, and a different historical analogue investors can pattern-match against. Mixing them in a single portfolio without recognizing the distinction is how humanoid robotics investment goes wrong.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UF07!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UF07!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png 424w, https://substackcdn.com/image/fetch/$s_!UF07!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png 848w, https://substackcdn.com/image/fetch/$s_!UF07!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UF07!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UF07!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png" width="1180" height="955" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:955,&quot;width&quot;:1180,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4515924,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197712072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UF07!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png 424w, https://substackcdn.com/image/fetch/$s_!UF07!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png 848w, 
https://substackcdn.com/image/fetch/$s_!UF07!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png 1272w, https://substackcdn.com/image/fetch/$s_!UF07!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F181c271e-c523-4940-a740-02a6b2b0f024_1180x955.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>The Toll Pool</h3><p>The Toll Pool collects rent regardless of which integrator wins.
It contains L4 actuation and L5 compute, the two layers where every humanoid built has to buy the same essential inputs from a small number of suppliers with structural pricing power.</p><p>Toll Pool investors are not betting on which humanoid company succeeds. They are betting that humanoid volumes, in aggregate, grow. NVIDIA on the compute side and the Chinese precision-component manufacturers on the actuator side both win whether Figure or Tesla or Unitree ultimately dominates the integrator layer. The bet is on the category, not the category winner.</p><p>Investment logic favors patient capital. Valuation multiples will look more like industrial automation or specialty semiconductors than like AI software. Today&#8217;s Chinese harmonic reducer manufacturers trade in the 25 to 35 times forward earnings range, which feels expensive against industrial benchmarks and cheap against the volume curve coming through the next five years. The historical analogue is TSMC during the smartphone buildout, or ASML during the EUV ramp. Less spectacular than the application-layer winners, but with returns that compound for longer and survive the bust phase intact.</p><p>The Toll Pool risk is that the toll gets disintermediated. Vertical integration by the largest integrators is the main vector. If Tesla, Figure, or one of the Chinese champions successfully internalizes harmonic reducer manufacturing or builds custom inference silicon at humanoid volumes, a meaningful share of toll revenue collapses to internal transfer pricing.</p><h3>The Brain Pool</h3><p>The Brain Pool contains L1 foundation models and L6 synthetic data and simulation. This is the humanoid robotics equivalent of the LLM trade. Winner-take-most economics, software margins at scale, talent and data flywheel moats, and the kind of variance that produces both the trillion-dollar outcomes and the zero-dollar outcomes.</p><p>Investment logic here is venture-style, not industrial-style. Returns are concentrated. 
Out of the current Brain Pool entrants, the market will sustain two foundation model winners and at most one commercial simulation winner over the next five to seven years, plus whatever the largest integrators build in-house. The remaining labs will be acquired at modest premiums or wound down. There is no graceful middle outcome.</p><p>Valuation multiples in this pool are not anchored to current revenue, which is functionally zero for most entrants. They are anchored to terminal value scenarios that look like Anthropic or OpenAI in 2030. The historical analogue is the 2019-to-2023 LLM lab race, with all of its associated capital intensity and concentration of outcomes. The fact that the public has barely begun to price this pool is, depending on your view, either the largest opportunity in the humanoid stack or the most dangerous late-cycle valuation distortion in technology investing.</p><p>Brain Pool risk is binary. The winners are 100x. The losers are zero. Position sizing must reflect this.</p><h3>The Body Pool</h3><p>The Body Pool contains L2 integrators, L3 sensing, and L6 real-world data. This is the layer everyone photographs and the layer where the most capital destruction will occur. Real-world data is grouped here rather than in the Brain Pool because it is not an independently investable asset: to own it, an investor must own the integrator that captures it. The same logic applies to L3 sensing: the bulk of perception hardware commoditizes against smartphone and automotive demand, while robot-specific perception value is largely captive to L2 in-house development.</p><p>Investment logic mirrors the early automotive industry. Many entrants, brutal competition, capital-intensive scale-up curves, and a small number of survivors who eventually achieve manufacturing economics that earn back the buildout. The right pattern-match is not Apple Inc. It is the period from 1900 to 1930, when something like 2,000 American automakers consolidated to roughly three. 
The winners earned generational returns. The losers, including some of the most well-funded and best-engineered, lost everything.</p><p>Valuation multiples in the Body Pool will compress dramatically from current private-market levels as the public-market path opens and unit economics become visible. The integrators that survive the compression will be those who internalize enough of the L4 cost structure to escape the auto-industry margin trap, while shipping enough volume to build a defensible data flywheel that strengthens their L1 capabilities.</p><p>Body Pool risk is that the median entrant fails commercially. Two or three right-tail winners will deliver virtually all of the layer&#8217;s positive investor returns. The expected value calculation depends almost entirely on whether the investor can identify those winners early enough to weather a decade of dilution and capital calls.</p><p>The three pools require three different investor types. The Toll Pool wants industrial value investors. The Brain Pool wants venture-style risk capital. The Body Pool wants the kind of patient growth investor who can hold through capital-cycle volatility. Allocating across all three is possible, but the rare fund that does so successfully will be one that recognizes the pools are not interchangeable.</p><div><hr></div><h2>The Humanoid Stack in Five Years</h2><p>The test of the framework above is not whether it sounds compelling today. It is whether the predictions it generates hold up when 2030 arrives.</p><p>Three predictions, each tied to one of the three pools, each accompanied by the conditions that would put it at risk.</p><h3>Prediction One &#183; L4 Consolidates Around Chinese Champions</h3><p>By 2030, three or fewer Chinese precision-component manufacturers will collectively supply more than 60% of harmonic reducers and planetary roller screws used in humanoid robots shipped globally. 
The current top entrants in this race are &#32511;&#30340;&#35856;&#27874; (Leaderdrive), &#21452;&#29615;&#20256;&#21160; (Shuanghuan Driveline), &#20013;&#22823;&#21147;&#24503; (Zhongda Leader), and &#26469;&#31119;&#35856;&#27874; (Laifual), with a longer tail closing the technical gap. Japanese and German incumbents will retain premium positions in specific high-performance niches but will lose volume share decisively.</p><p>The structural drivers are cost, scale, and policy. Chinese suppliers are already manufacturing at unit economics the Japanese incumbents cannot match without abandoning the precision tolerances that justify their pricing. The humanoid wave layers volume on top of supply chains already at scale for electric vehicles and industrial robots. Chinese industrial policy is actively subsidizing the buildout.</p><h3>Prediction Two &#183; The L6 Surprise Comes From Simulation</h3><p>By 2030, the largest commercial winner in L6 data infrastructure, outside NVIDIA&#8217;s first-party Cosmos and Isaac businesses, will be a company that does not yet exist as a venture-stage entity in 2026, or exists only in seed stage. It will win by building the simulation and synthetic data tooling that humanoid integrators use to train and evaluate policies before deployment, capturing margin in the gap NVIDIA&#8217;s first-party tools leave open.</p><p>This is the most contrarian of the three predictions. Most current Brain Pool capital is flowing to L1 foundation model labs. The L6 simulation layer is currently treated as either NVIDIA&#8217;s territory or as open-source commodity. Both treatments will turn out to be wrong. 
The category needs a specialist with vertical depth in robot-relevant simulation that NVIDIA is too horizontally focused to build, and that the academic open-source community will not productize at sufficient quality.</p><h3>Prediction Three &#183; The L2 Mid-Tail Wipeout</h3><p>By 2030, of the more than one hundred humanoid integrator companies operating globally in 2026, fewer than fifteen will remain as independent commercial entities. Of those survivors, three to five will dominate global shipments. The rest will occupy regional or vertical niches. The wipeout will not happen in a single year. It will arrive as a series of down rounds, quiet acquisitions, and shutdowns spread across 2027 to 2029, with the worst concentration in 2028 as the first generation of growth-stage capital exhausts its runway against deployment economics that fail to scale on the timelines promised in pitch decks.</p><p>Early investors in the median Body Pool entrant will lose substantial portions of their capital. Early investors in the right-tail winners will earn the returns that justify the entire humanoid robotics investment.</p><h2>Where This Begins</h2><p>Five trillion dollars of capital will sort itself across this stack by 2050. Most of it will be misallocated. The framework above is one attempt to think about which slice will not. Capital &amp; Atoms is the work of getting it right.</p><div><hr></div><h2>Sources</h2><p><strong>Unitree IPO Prospectus</strong>: Pricing and shipment data, filed March 20, 2026, on Shanghai&#8217;s STAR Market. Coverage via <a href="https://restofworld.org/2026/unitree-china-humanoid-robot-shanghai-ipo/">Rest of World</a>, March 2026.</p><p><strong>Humanoid Funding H1 2025</strong>: $3.1 billion across 60+ deals, exceeding the 2010-to-2024 cumulative total.
<a href="https://www.ipo.club/humanoids-report">IPO Club Humanoids Report</a>, citing iCapital Market Pulse, November 2025.</p><p><strong>Morgan Stanley $5 Trillion Projection</strong>: &#8220;A $5 Trillion Global Market&#8221; research note. <a href="https://www.morganstanley.com/insights/articles/humanoid-robot-market-5-trillion-by-2050">Morgan Stanley Insights</a>, April 2025.</p><p><strong>Figure Series C</strong>: $1 billion+ committed capital at $39 billion post-money valuation. <a href="https://www.figure.ai/news/series-c">Figure AI Official Announcement</a>, September 16, 2025.</p><p><strong>Physical Intelligence Series B</strong>: $600 million round led by Alphabet&#8217;s CapitalG at $5.6 billion valuation. <a href="https://www.bloomberg.com/news/articles/2025-11-20/robotics-startup-physical-intelligence-valued-at-5-6-billion-in-new-funding">Bloomberg</a>, November 20, 2025.</p><p><strong>Skild AI Series C</strong>: $1.4 billion round led by SoftBank at $14 billion+ valuation. <a href="https://www.businesswire.com/news/home/20260114335623/en/Skild-AI-Raises-$1.4B-Now-Valued-Over-$14B">BusinessWire Press Release</a>, January 14, 2026.</p><p><strong>Apptronik Series A Extension</strong>: $935 million total at $5 billion+ valuation. 
<a href="https://www.globenewswire.com/news-release/2026/02/11/3236352/0/en/Apptronik-Closes-Over-935-Million-Series-A-with-New-520-Million-Extension-Round.html">GlobeNewswire Press Release</a>, February 11, 2026.</p><div><hr></div><p><em>This is <strong><a href="https://www.robonaissance.com/t/capital-and-atoms">Capital &amp; Atoms</a></strong>, a research thread within Robonaissance focused on the investment economics of humanoid robotics.</em></p><div><hr></div><p><strong>Disclaimer</strong>: This article is for informational purposes only and does not constitute investment, financial, or legal advice.</p>]]></content:encoded></item><item><title><![CDATA[Inside China’s Machine: Unitree]]></title><description><![CDATA[Filed for a $7 Billion IPO. The Price Destroyer. 5,500 Humanoids Shipped. Now Profitable. Waiting on a Brain.]]></description><link>https://www.robonaissance.com/p/inside-chinas-machine-unitree</link><guid isPermaLink="false">https://www.robonaissance.com/p/inside-chinas-machine-unitree</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Mon, 11 May 2026 10:21:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wGb7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wGb7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!wGb7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png 424w, https://substackcdn.com/image/fetch/$s_!wGb7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png 848w, https://substackcdn.com/image/fetch/$s_!wGb7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png 1272w, https://substackcdn.com/image/fetch/$s_!wGb7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wGb7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png" width="1456" height="923" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:923,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2213301,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197196343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wGb7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png 424w, https://substackcdn.com/image/fetch/$s_!wGb7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png 848w, https://substackcdn.com/image/fetch/$s_!wGb7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png 1272w, https://substackcdn.com/image/fetch/$s_!wGb7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F234849b0-7e89-4c59-b2dd-1da501b268a6_1575x998.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>On March 20, 2026, Unitree filed for an initial public offering on Shanghai&#8217;s STAR Market (&#31185;&#21019;&#26495;) seeking to raise 4.2 billion yuan ($610 million). Reuters and other market reports estimate a target valuation of 40 to 50 billion yuan (approximately $7 billion), though the prospectus itself does not disclose a valuation figure. The prospectus revealed numbers that no other humanoid robotics company has matched in public disclosure. 2025 revenue of 1.708 billion yuan, up 335 percent year-on-year. Net profit attributable to shareholders of 288 million yuan (up 204 percent), with adjusted net profit excluding non-recurring items of approximately 600 million yuan (up 674 percent). Humanoid shipments of 5,500 units in a single year, which Unitree reported as 32.4 percent of the global market. Omdia&#8217;s separate January 2026 report had placed AgiBot first with 5,168 units and Unitree second with 4,200; the prospectus&#8217;s higher figure likely reflects later Q4 deliveries, and Unitree and AgiBot are effectively tied at the top. The prospectus also shows the path to profitability was not linear: net losses of 22.1 million yuan in 2022 and 11.1 million yuan in 2023, followed by Unitree&#8217;s first scaled profit year in 2024 (94.5 million yuan), and 105.3 million yuan in the first nine months of 2025 before the surge to year-end.
Early investors have publicly described Unitree as &#8220;profitable since 2020,&#8221; but the formal financial disclosures show the breakthrough was 2024, with humanoid revenue scaling above quadruped for the first time.</p><p>The founder, Wang Xingxing (&#29579;&#20852;&#20852;), is 36 years old, was born in Ningbo, and built his first quadruped robot as a master&#8217;s thesis at Shanghai University in 2013. He now controls 68.78 percent of the voting rights in what may become the first humanoid robotics company to trade on a major Chinese exchange. He has spent 2026 navigating the most intense year of his career: a Spring Festival Gala performance watched by hundreds of millions, an IPO acceptance under the STAR Market&#8217;s new preliminary review mechanism, an on-site regulatory inspection twelve days later, and a product line that now spans five humanoid models priced from $4,290 to $90,000 plus.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nYSk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nYSk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nYSk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!nYSk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nYSk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nYSk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg" width="650" height="434" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3876abc-f912-443f-839e-6de70c015078_650x434.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:434,&quot;width&quot;:650,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95431,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197196343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nYSk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!nYSk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nYSk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nYSk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3876abc-f912-443f-839e-6de70c015078_650x434.jpeg 1456w" sizes="100vw"></picture></div></a></figure></div>
<p>This is the story of how a 50-square-meter office in Hangzhou built one of the two highest-volume humanoid robot operations in the world by being ruthless about cost, how an academic supplier became a national symbol, and why the same business model that made Unitree profitable may also be the ceiling it cannot break through without building something it has not yet demonstrated: a capable embodied intelligence.</p><div><hr></div><h2>The Engineer Who Won Twice</h2><p>Wang Xingxing&#8217;s biography reads like a prepared legend, and in Chinese tech media it has been retold often enough to acquire mythological texture. Born 1990 in rural Ningbo, Zhejiang. Undergraduate at Zhejiang Sci-Tech University (&#27993;&#27743;&#29702;&#24037;&#22823;&#23398;), a regional school not in the top tier. Master&#8217;s at Shanghai University (&#19978;&#28023;&#22823;&#23398;), also not top tier. Built XDog, a quadruped robot, as his 2016 master&#8217;s thesis.
The robot became a Bilibili sensation, attracted early investors and buyers, and gave Wang his way out of academia.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q6VN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q6VN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Q6VN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Q6VN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Q6VN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q6VN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg" width="1000" height="667" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:667,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:128548,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197196343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Q6VN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Q6VN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Q6VN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Q6VN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57d12bbc-03b7-4898-a509-03b39946afbd_1000x667.jpeg 1456w" sizes="100vw"></picture></div></a></figure></div>
<p>He took a job at DJI in 2016 after graduation, then resigned within months to start Unitree in a 50-square-meter office in Hangzhou&#8217;s Binjiang District (&#28392;&#27743;&#21306;) on August 26, 2016. His co-founder and college classmate Chen Li (&#38472;&#31435;) had been running international sales at Hikvision (&#28023;&#24247;&#23041;&#35270;), the Hangzhou-based surveillance giant that built one of China&#8217;s most successful overseas B2B operations before being added to the US Entity List. Chen brought the playbook. Unitree started shipping internationally in 2018. 
Between 2022 and 2024, overseas sales exceeded domestic revenue; the overseas share then fell to roughly 39 percent of revenue in the first nine months of 2025, per the prospectus.</p><p>Wang&#8217;s voting control (68.78 percent through direct, indirect, and special-voting-rights arrangements) keeps founder authority exceptionally high at IPO. The shareholder list is a small gallery of Chinese industrial and financial power: Meituan (9.6 percent), HongShan Capital/HongShan China (7.1 percent), Matrix Partners China (5.5 percent), plus direct or indirect investments from Tencent, Alibaba, Ant Group, Xiaomi, and ByteDance. Industrial backers include BYD and Geely. State-backed funds from Shanghai and Beijing also participated. Lei Jun&#8217;s Shunwei Capital has been an investor since the early rounds and has publicly thanked Wang in person. That Xiaomi-affiliated investment is now one of Lei Jun&#8217;s most celebrated bets.</p><p>Wang sat in the front row at President Xi Jinping&#8217;s high-profile business symposium in February 2025, alongside Jack Ma and other Chinese tech founders. Being in that row places a company in a specific political category. Unitree has since been positioned consistently by state media as a national champion of embodied intelligence (&#20855;&#36523;&#26234;&#33021;), the term China&#8217;s 15th Five-Year Plan identifies as a strategic industry alongside quantum computing, 6G, nuclear fusion, and brain-computer interfaces.</p><div><hr></div><h2>The Price Destroyer</h2><p>Unitree ships five humanoid robot models, alongside the quadruped lineup that built the company. 
Official and current prices from Unitree&#8217;s global shop:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!etXT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!etXT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png 424w, https://substackcdn.com/image/fetch/$s_!etXT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png 848w, https://substackcdn.com/image/fetch/$s_!etXT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png 1272w, https://substackcdn.com/image/fetch/$s_!etXT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!etXT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png" width="1400" height="942" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:942,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5284744,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197196343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!etXT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png 424w, https://substackcdn.com/image/fetch/$s_!etXT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png 848w, https://substackcdn.com/image/fetch/$s_!etXT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png 1272w, https://substackcdn.com/image/fetch/$s_!etXT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e3e43-81c6-41e4-9065-1b52a3a37924_1400x942.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>On April 30, 2026, Unitree extended the R1 line downward with a dual-arm-only variant starting at 26,900 yuan, roughly $4,290. The new entry-level configuration abandons full bipedal architecture in favor of a fixed base or mobile chassis paired with one or two arms (5 or 7 degrees of freedom each, total 15 to 31 DoF), with optional Nvidia Jetson Orin compute. The same day, the company opened its first direct-sales retail store in Beijing&#8217;s Wangfujing commercial district. Both moves extend the price-destroyer thesis. The new R1 dual-arm puts a humanoid manipulation platform inside the price band of a high-end laptop, and Wangfujing positions Unitree to sell directly to walk-in retail customers, not just researchers and enterprises.</p><p>The pricing numbers tell the story of the business. 
The G1, launched at $16,000 in 2024, now sells for $13,500 after eighteen months of manufacturing learning. That is less than a premium mountain bike. A university robotics lab can put it on a purchase order; a PhD student can buy one with grant money. No Western humanoid is sold this way. Boston Dynamics&#8217; production Atlas is not on the market: all 2026 units are committed to Hyundai&#8217;s factories and Google DeepMind&#8217;s research. Tesla&#8217;s Optimus is not yet sold commercially. Agility Robotics&#8217; Digit is deployed through Robots-as-a-Service contracts with monthly fees, aimed at logistics operators rather than individual buyers. Unitree is not competing on price against these companies. It is operating in a different product category entirely: a humanoid you can actually purchase, for research, development, or demonstration, without a procurement officer and a multi-year contract.</p><p>The pricing is not achieved through subsidy or a loss-leader strategy. The 2024 breakthrough year showed how the model finally compounded: gross margin expanded from 44.2 to 56.4 percent even as revenue grew nearly two and a half times, with 2025 adjusted net profit then growing 674 percent year-over-year per the prospectus. The prospectus reveals how.</p><p><strong>Vertical integration of hardware.</strong> Except for chips, almost all of Unitree&#8217;s robot hardware is designed and manufactured in-house, including motors and reducers (the gearboxes that translate motor rotation into joint movement). The GO-M8010-6 motor, a core component, retails at $369. Boston Dynamics sells no equivalent component publicly, let alone at a comparable price. Chen Li, in a 2024 TechNode interview, described in-house component design as &#8220;our most fundamental competitive advantage.&#8221; The prospectus confirms: vertical integration is the margin source.</p><p><strong>Manufacturing discipline.</strong> Reduce the number of wires. Reduce chips. Reduce screws. 
Design for assembly. The Hangzhou operation applies standard Chinese consumer-electronics manufacturing discipline to humanoid robots, the same cost-engineering that made DJI dominant in drones. Wang&#8217;s short stint at DJI in 2016 before founding Unitree is underrated in most retellings: he absorbed DJI&#8217;s approach and transferred it to bipedal robotics.</p><p><strong>Supply chain leverage.</strong> Zhejiang province is home to dozens of precision component manufacturers that supply the broader Chinese robotics industry. Unitree buys at scale from suppliers geographically adjacent to its own factory. Alibaba&#8217;s ecosystem and the broader Zhejiang industrial base provide the supporting infrastructure.</p><p>The result: Unitree has effectively created a product category that did not previously exist, a humanoid robot accessible to individual researchers, universities, and hobbyists, at a price point that fits inside a single research grant. In the quadruped market, where the category is mature and Western competitors do sell comparable products commercially, the price delta is stark: Boston Dynamics&#8217; Spot quadruped sells for $74,500, while Unitree&#8217;s Go2 sells for under $2,000 at base configuration. This is a direct comparison between two products sold through the same channels to the same customers. That price gap, more than thirty-seven-fold at these list prices, has collapsed the quadruped research market into near-monopoly: Unitree holds more than 60 percent of the global quadruped market per Omdia and other third-party research. Whether the same thing happens in humanoids depends on whether industrial deployment becomes a separable market from research and development, or whether Unitree&#8217;s category-defining affordability eventually extends onto the factory floor where Atlas and Digit are trying to operate.</p><div><hr></div><h2>Who Actually Buys These Robots</h2><p>The IPO prospectus breaks down Unitree&#8217;s customers with unusual clarity. 
The picture that emerges is narrower than the brand suggests. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y2mn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y2mn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png 424w, https://substackcdn.com/image/fetch/$s_!y2mn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png 848w, https://substackcdn.com/image/fetch/$s_!y2mn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png 1272w, https://substackcdn.com/image/fetch/$s_!y2mn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y2mn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png" width="1456" height="791" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:791,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1963452,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197196343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!y2mn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png 424w, https://substackcdn.com/image/fetch/$s_!y2mn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png 848w, https://substackcdn.com/image/fetch/$s_!y2mn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png 1272w, https://substackcdn.com/image/fetch/$s_!y2mn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395dfaa-5601-487c-88c3-b67fb0f6afef_2097x1139.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><strong>Universities and research institutions</strong> remain the largest customer category. The prospectus documents 71 order records from universities, representing roughly half of Unitree&#8217;s business by transaction count. Chinese institutions include Sun Yat-sen University (&#20013;&#23665;&#22823;&#23398;), Southern University of Science and Technology (&#21335;&#26041;&#31185;&#25216;&#22823;&#23398;), and Shenzhen University (&#28145;&#22323;&#22823;&#23398;). International institutions span European and North American research labs across robotics, reinforcement learning, and embodied AI. 
The May 2025 call by the US House Select Committee on China to add Unitree to the US Entity List was motivated in part by concern that Unitree&#8217;s quadrupeds have become the default research platform in US academic robotics labs.</p><p><strong>State-owned enterprises and government agencies</strong> became the fastest-growing customer category in 2025. Applications include power-grid inspection (where quadrupeds replace humans in hazardous electrical environments), subway tunnel inspection, gas pipeline monitoring, and &#8220;guided tour and performance&#8221; deployments in offline cultural and tourism venues. In mid-2025, Unitree and Zhiyuan Robotics (AgiBot) jointly won a 124 million yuan China Mobile procurement contract for humanoid biped robot OEM services spanning 2025-2027. Unitree&#8217;s share was 46.05 million yuan for small-form humanoid robots, computing backpacks, and five-finger dexterous hands. The customer was a subsidiary of China Mobile; the application was enterprise reception and tour-guide duties in business halls.</p><p><strong>Enterprise pilots</strong> are the smallest category by revenue but the one with the largest long-term strategic implication. Unitree&#8217;s humanoid industry-application revenue comes mainly from enterprise reception and tour-guide use (50-70 percent), intelligent manufacturing, and intelligent inspection. JD.com (&#20140;&#19996;) is Unitree&#8217;s largest individual corporate customer for quadrupeds. Actual factory floor deployment of humanoids remains limited. Tech Buzz China, in its April 2026 analysis of the prospectus, noted that &#8220;real factories require 95 to 99 percent uptime; current humanoids manage about 90 minutes&#8221; of continuous operation before battery swap or software intervention.</p><p><strong>Consumer and entertainment</strong> buyers drove the fastest revenue growth in 2025. Consumer quadruped sales nearly quadrupled year-on-year in the first nine months of 2025. 
Entertainment events, marketing demos, and the Spring Festival Gala appearances have produced a secondary revenue stream through rental and performance contracts. This is the revenue the brand is built on, but it is not the revenue that will sustain the 2030 valuation implied by the current IPO target.</p><p>Three facts about the customer mix deserve emphasis. First, the top five customers represent only 10.6 percent of revenue per the prospectus, signaling unusually healthy diversification. Second, 115 winning bids across Chinese public procurement platforms through 2025 confirm the state-sector integration. Third, overseas sales were 39 percent of revenue in the first nine months of 2025, down from above 50 percent in earlier years. Unitree is not a domestic-only story even as US policy pressure intensifies.</p><div><hr></div><h2>The Technology Position</h2><p>Unitree&#8217;s hardware is one of the two highest-volume humanoid platforms in the world by unit count. The intelligence that operates the hardware is not world-class. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WTzl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WTzl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WTzl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WTzl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WTzl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WTzl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg" width="1456" height="1421" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1421,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261284,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197196343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!WTzl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WTzl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WTzl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WTzl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff595852e-b9b0-4d49-a59a-753e6fbf3e63_1512x1476.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This is the honest assessment the prospectus itself implicitly acknowledges. Of the 4.2 billion yuan the IPO is raising, nearly half (approximately 673 million yuan per year over three years) is earmarked for training AI models. Unitree&#8217;s existing R&amp;D expenditure on &#8220;Multimodal Embodied AI Model&#8221; increased exponentially between 2024 and 2025 while other R&amp;D areas grew at a steadier pace. 
The company&#8217;s own strategic plan, documented in the IPO prospectus and the 2025 &#8220;Statement to Investors&#8221; from Wang Xingxing, is to close the gap between world-class hardware and world-class embodied intelligence.</p><p>The current state of Unitree&#8217;s software capability:</p><p><strong>Motion control and locomotion: world-class.</strong> The Drunken Fist routine, the mid-air backflips, the four-meters-per-second cluster coordination, and the autonomous fall-and-recovery are real demonstrations of world-class motion control through reinforcement learning applied to bipedal dynamics. Unitree&#8217;s robots are among the most physically capable humanoids shipping today.</p><p><strong>Manipulation: competent but limited.</strong> The G1 supports force-controlled three-fingered hands with optional tactile feedback. The R1 and H2 offer upgraded dexterous manipulation. The demonstrations exist; the failure rate on unscripted manipulation tasks is not publicly disclosed.</p><p><strong>Autonomous task completion: limited.</strong> The Spring Festival Gala performance was choreographed. Wang confirmed to 36Kr that the sequences were practiced. Many enterprise demonstrations involve teleoperation with a human operator directing the robot remotely. Autonomous workflow completion in novel environments, which is the capability that would unlock the factory-floor market Tesla and Figure are targeting, remains in development.</p><p><strong>Embodied intelligence foundation model: emerging.</strong> Unitree&#8217;s UnifoLM (Unitree Unified Large Model) initiative, announced in 2025, is the company&#8217;s bet on building its own multimodal embodied AI system. The architecture and training details are not publicly disclosed. The IPO proceeds will fund the scaling of this work over 2026-2028. 
Whether UnifoLM can produce the kind of task generalization that competitors like Figure&#8217;s Helix and Google DeepMind&#8217;s RT-2 line are demonstrating is the open question.</p><p>The pattern matches a broader Chinese robotics industry thesis: world-class hardware, competitive software at the motion-control and perception layers, and a meaningful gap at the embodied intelligence layer where generalization across tasks and environments requires model capabilities that depend on foundation-model research traditionally stronger in US labs.</p><div><hr></div><h2>The 20,000-Unit Bet</h2><p>Wang told 36Kr in February 2026 that Unitree plans to ship 10,000 to 20,000 humanoid robots in 2026, roughly two to four times 2025&#8217;s 5,500. Morgan Stanley has doubled its 2026 Chinese humanoid shipment forecast to 28,000 units. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!68_O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!68_O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!68_O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!68_O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png 1272w, 
https://substackcdn.com/image/fetch/$s_!68_O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!68_O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png" width="1000" height="1000" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Unitree G1 Humanoid Robot&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Unitree G1 Humanoid Robot" title="Unitree G1 Humanoid Robot" srcset="https://substackcdn.com/image/fetch/$s_!68_O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!68_O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!68_O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png 1272w, 
https://substackcdn.com/image/fetch/$s_!68_O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed24a17b-c836-4ab2-9823-4d2247167957_1000x1000.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The production capacity exists. Unitree&#8217;s manufacturing infrastructure claims a 75,000-unit annual capacity ceiling, though this is an aspirational peak rather than a realistic 2026 target. The question is demand.</p><p>Unitree&#8217;s prospectus is unusually specific about where demand comes from. 
University and research sales are saturating: there are only so many universities in the world, and Unitree&#8217;s quadrupeds and humanoids are already widely deployed across Chinese, US, and European research labs. State-owned enterprise tour-guide applications will continue to grow but have natural ceilings set by the number of business halls, science museums, and cultural sites in China. Entertainment revenue is episodic and does not support twelve thousand units a year.</p><p>The long-term prize, as Unitree and every serious humanoid company acknowledge, is industrial deployment where humanoids substitute for human labor at scale. Automotive assembly. Logistics. Warehouse picking. Assembly-line support. These applications require 95-99 percent uptime, sustained autonomous operation, and the kind of task generalization that embodied intelligence has not yet achieved. Tesla&#8217;s Optimus is competing for this market. Figure is competing for this market. Apptronik, 1X, AgiBot, and a dozen Chinese entrants are all competing for this market. None has demonstrated sustained commercial deployment at scale.</p><p>Unitree&#8217;s bet is that hardware leadership plus three years of concentrated AI model investment plus continued cost reduction produces a viable competitor for the industrial market by 2028. The risk is that hardware advantage matters less than embodied-intelligence capability once deployment reaches factory scale, and that Unitree&#8217;s software work has to catch up faster than Tesla&#8217;s, Figure&#8217;s, and Google DeepMind&#8217;s work progresses.</p><div><hr></div><h2>The Risks</h2><p>Five risks deserve acknowledgment.</p><p><strong>The software gap.</strong> This is the most discussed risk and the most important. Unitree&#8217;s hardware ships; its embodied intelligence is a work in progress. The IPO funding allocation shows the company is taking the problem seriously, but the outcome is not guaranteed. 
A company that spends 673 million yuan per year on AI model training over three years is still spending less than what Anthropic, OpenAI, or even MiniMax spends annually on foundation-model research. Hardware margin subsidizing software investment is a defensible strategy only if the software investment converges on useful generalization within the runway.</p><p><strong>US regulatory pressure.</strong> The May 2025 China Select Committee request to designate Unitree as a Chinese military company and add it to the Entity List has not yet resulted in formal action, but the political pressure is real. Security researchers published findings in April 2025 alleging backdoors in Unitree products. Unitree denied intentional backdoors and patched the vulnerability. In September 2025, the same researchers published wormable vulnerabilities affecting the Go2, B2, G1, and H1 lines. If Entity List designation occurs, the 39 percent overseas revenue Unitree reported for the first nine months of 2025 is at immediate risk, and the US academic-robotics customer base that cannot easily substitute away from Unitree hardware becomes a political liability rather than a strategic moat.</p><p><strong>Customer concentration in unsustainable segments.</strong> University sales saturate. Tour-guide deployments have ceilings. The IPO valuation implies industrial-scale deployment by the late 2020s. If that deployment does not materialize, Unitree is a mid-size industrial hardware supplier with a valuation appropriate to a much larger market opportunity.</p><p><strong>IPO regulatory scrutiny.</strong> On April 1, 2026, the China Securities Association randomly selected Unitree for mandatory on-site IPO inspection, just twelve days after the Shanghai Stock Exchange accepted the company&#8217;s STAR Market application. 
The selection itself was statistically expected (the association samples 20 to 33 percent of recent applicants, and Unitree was one of two companies drawn from six Q1 2026 STAR Market applications), but on-site inspections in Chinese IPO history have produced material delays and occasional withdrawals when prospectus claims could not be verified. Independent analysts have already flagged four areas of likely scrutiny in the prospectus: customer concentration disclosure, the gap between humanoid revenue figures and demonstrated factory deployments, the use-of-proceeds plan, and specific revenue-recognition practices. A clean inspection outcome supports the implied 40 to 50 billion yuan valuation. A finding of material deficiencies would force a re-rating not just of Unitree but of the broader humanoid IPO pipeline behind it.</p><p><strong>Competitive intensification.</strong> AgiBot shipped 5,168 humanoid units in 2025 per Omdia and has built the fastest production ramp in Chinese humanoid history. UBTECH has China Mobile contracts and Dongfeng/NIO pilot programs. A dozen other Chinese humanoid companies announced IPO plans in 2025-2026. Western competitors including Tesla, Figure, Apptronik, Agility, and 1X are funded at scale. TrendForce projects Unitree and AgiBot together will hold roughly 80 percent of Chinese humanoid shipments in 2026, a duopoly structure that benefits Unitree at the top of the market but offers no protection against AgiBot specifically. 
Unitree&#8217;s price-destroyer strategy works against Western competitors but not against other Chinese competitors who can match the manufacturing cost structure.</p><div><hr></div><h2>The Implications</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3fOx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3fOx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp 424w, https://substackcdn.com/image/fetch/$s_!3fOx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp 848w, https://substackcdn.com/image/fetch/$s_!3fOx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp 1272w, https://substackcdn.com/image/fetch/$s_!3fOx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3fOx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp" width="1456" height="704" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:704,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166026,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197196343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3fOx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp 424w, https://substackcdn.com/image/fetch/$s_!3fOx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp 848w, https://substackcdn.com/image/fetch/$s_!3fOx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp 1272w, https://substackcdn.com/image/fetch/$s_!3fOx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a334a09-a23f-482e-a727-1202ee742964_1920x929.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>For the engineer:</strong> Unitree is the platform Chinese and international embodied-intelligence research runs on. Studying what the community builds on Unitree hardware is studying the actual experimental frontier of embodied AI. The company&#8217;s SDK and integration tooling are, by industry consensus, less polished than Boston Dynamics&#8217; or ANYbotics&#8217;, but the hardware cost-performance ratio compensates. For any engineer working on bipedal locomotion, whole-body control, or manipulation learning, Unitree is the default hardware target.</p><p><strong>For the founder:</strong> Unitree validates a specific thesis: in Chinese hardware manufacturing, vertical integration of motors, reducers, and key mechanical components produces structural cost advantages that Western competitors cannot easily replicate. 
If you are building anything where the bill of materials matters and where Chinese supply chain access is feasible, the Unitree approach (design in-house, manufacture in-house, ship at prices your competitors cannot match) is the template. If you are building at the embodied intelligence software layer, Unitree is either your partner or your competitor depending on how UnifoLM develops.</p><p><strong>For the investor:</strong> Unitree&#8217;s IPO is the cleanest available exposure to Chinese humanoid robotics at scale, but with three qualifications. First, the technology bet is not yet proven: a hardware leader must converge on software capability. Second, US regulatory risk is real and growing. Third, the pricing implies industrial-scale deployment the industry has not yet demonstrated. The company is now profitable, which differentiates it from every other humanoid robotics company you could buy, but the scaled profit history is one year (2024) plus the first nine months of 2025. Profitability today comes from customer segments (universities, tour guides, entertainment) that will not support the market capitalization implied by the current IPO target. The investment case requires believing UnifoLM will close the gap in time.</p><div><hr></div><h2>The Body Without the Brain</h2><p>Unitree is one of the two highest-volume humanoid manufacturers in the world, alongside AgiBot. It is also a company whose business model depends on eventually becoming something different from what it is now. The quadruped business that built Unitree is a mature segment where the company has won. 
The humanoid business that drives the current valuation is a frontier segment where hardware has outpaced intelligence, and where Unitree&#8217;s next three years will determine whether the company graduates from hardware volume leader to viable industrial deployment platform, or settles into a profitable but bounded niche supplying the research and entertainment markets.</p><p>Wang Xingxing wrote in the IPO prospectus that 2026 marks Unitree&#8217;s tenth anniversary. The first ten years built the body. The next ten will determine whether the company can build the brain. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iTW-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iTW-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iTW-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iTW-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!iTW-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!iTW-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg" width="1280" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:198925,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/197196343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!iTW-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iTW-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iTW-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!iTW-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbced8c13-6f7c-44ba-bbfe-587945e3f13d_1280x720.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>On February 16, 2026, approximately twenty-five Unitree robots performed a Drunken Fist kung fu routine on China&#8217;s Spring Festival Gala, the most-watched television broadcast in China. Chinese media reported the cluster as twenty-four G1s and one H2. 
The robots executed trampoline-assisted aerial flips reaching three meters, performed what Unitree described as the world&#8217;s first single-leg continuous flips and seven-and-a-half-rotation airflare spins, and repositioned at four meters per second in coordinated cluster formation. Unitree&#8217;s partnership deal with the Gala was reportedly valued at roughly 100 million yuan. Wang Xingxing told state media: &#8220;In the past one or two months, I&#8217;ve personally been under quite a lot of pressure. We had to deliver a performance that was significantly better than last year&#8217;s.&#8221; When the Gala director called him to ask whether a robot&#8217;s fall during rehearsal had been a malfunction, Wang told her it was part of the routine. Drunken Fist requires a state of precarious balance. A robot that falls and then stands back up looks more convincingly drunk than one that performs cleanly. The fall was scripted.</p><p>What viewers saw was a body with no mind: a carefully choreographed sequence executed by hardware doing exactly what it was told. The performance was impressive because the hardware is impressive. The performance was also the ceiling of what Unitree&#8217;s current software can reliably deliver without extensive rehearsal and scripted choreography. The company&#8217;s bet, its IPO, and its position in China&#8217;s embodied intelligence strategy all depend on closing the gap between what the hardware can do and what the intelligence can direct it to do.</p><p>The hardware is real. The intelligence isn&#8217;t. Not yet.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/inside-chinas-machine">Inside China&#8217;s Machine</a>. 
China&#8217;s AI and robotics ecosystem, from the inside.</em></p><div><hr></div><p><strong>Sources</strong></p><p><strong>IPO prospectus and financial data:</strong> South China Morning Post (&#8220;Inside Unitree&#8217;s landmark IPO: what to know about China&#8217;s humanoid giant,&#8221; March 2026), CNBC (&#8220;Unitree plans Shanghai IPO, testing interest in humanoid robots,&#8221; March 20, 2026), KraneShares (&#8220;A Complete Guide To Unitree Robotics&#8217; 2026 IPO,&#8221; April 2026), 36Kr (&#8220;Half of the Investment Circle Owes Gratitude to Unitree,&#8221; March 2026), Caixin Global (&#8220;Unitree Robotics Files for $608 Million STAR Market IPO,&#8221; March 21, 2026). All financial figures (2025 revenue 1.708B yuan, 335% YoY growth, 600M yuan adjusted net profit, 674% adjusted net profit growth, 288M yuan headline net profit, 204% headline net profit growth, 32.4% global humanoid market share, 5,500 humanoid units 2025) are Confirmed from the Shanghai Stock Exchange prospectus. Historical financial trajectory (2022 net loss 22.1M yuan, 2023 net loss 11.1M yuan, 2024 net profit 94.5M yuan, 2025 9M net profit 105.3M yuan; gross margin trajectory 44.2% in 2023 to 56.4% in 2024 to 59.5% in 2025 9M) is from the prospectus financial summary on the Shanghai Stock Exchange filing portal. Early investor characterization of &#8220;profitable since 2020&#8221; comes from Zhao Nan via 36Kr (March 2025); this characterization conflicts with the formal prospectus disclosures for 2022-2023, and the prospectus is the primary source. 
The 40 to 50 billion yuan ($7B) IPO valuation target is per Reuters via CNBC.</p><p><strong>IPO regulatory inspection:</strong> China Securities Association announcement (April 1, 2026), RobotToday analysis (&#8220;Unitree&#8217;s IPO Review: What It Means for China&#8217;s Humanoid Robot IPO Landscape,&#8221; April 2026).</p><p><strong>April 30, 2026 R1 dual-arm launch and Wangfujing retail store:</strong> Humanoids Daily (&#8220;Unitree Expands R1 Lineup with Dual-Arm Modular Platform Starting at $4,290,&#8221; April 30, 2026), CNX Software (&#8220;$4,290+ Unitree R1-A5 and R1-A7 humanoid robots,&#8221; May 1, 2026), Interesting Engineering (&#8220;Unitree unveils $4,290 humanoid robot with upper-body-only design,&#8221; May 2026), 36Kr (&#8220;Unitree Unveils Upper-Body-Only Humanoid Robot,&#8221; May 2026), CnTechPost (April 30, 2026). The 26,900 yuan base price is consistent across all sources; the $4,290 USD conversion reflects the May 2026 official rate and is used by the majority of English-language coverage.</p><p><strong>TrendForce duopoly projection:</strong> TrendForce 2026 humanoid robot industry report, summarized in Jiemian News and 36Kr (April 2026).</p><p><strong>Shareholder structure and voting rights:</strong> SCMP IPO coverage, 36Kr. Wang Xingxing 23.82% direct + 10.94% indirect, 68.78% voting via special-voting-rights arrangement. Meituan 9.6%, HongShan 7.1%, Matrix Partners China 5.5%. Industrial and financial backers confirmed: Tencent, Alibaba, Ant Group, Xiaomi, ByteDance, BYD, Geely, Shunwei Capital, Shanghai and Beijing state-backed funds.</p><p><strong>Founder biography:</strong> SCMP, Wikipedia, CnTechPost, 36Kr, ChinaTalk (&#8220;Unitree Goes Public&#8221; by Irene Zhang, April 2026). Wang born 1990 Ningbo, Zhejiang Sci-Tech University undergrad, Shanghai University master&#8217;s (2013 XDog thesis), brief DJI employment, Unitree founded Hangzhou Binjiang District August 26, 2016. 
Co-founder Chen Li (Hikvision international sales background) confirmed from TechNode interview.</p><p><strong>Product pricing:</strong> Unitree official global shop (shop.unitree.com as of April 2026): R1 $4,900 presale, G1 $13,500, H2 $29,900, H1 $90,000+. G1 launched $16,000 in 2024 per The Robot Report. Go2 quadruped base configuration confirmed under $2,000.</p><p><strong>Customer data:</strong> Prospectus (top 5 customers 10.6% of revenue, 71 university orders, 115 winning bids, 39% overseas in first nine months 2025). JD.com largest corporate customer per ChinaTalk. China Mobile 124M yuan contract (Unitree portion 46.05M yuan) per 36Kr. Overseas exceeded 50% in earlier years per Chen Li TechNode interview, with the prospectus showing the more recent decline to 39% in 9M 2025.</p><p><strong>Spring Festival Gala technical claims:</strong> Wang interview with 36Kr and Cailian (February 2026), as reported by SCMP and CnTechPost. Three-meter-plus maximum backflip height, 4 m/s cluster speed, autonomous coordination via 3D LiDAR. Drunken Fist scripting confirmed by Wang directly. Performance autonomy claim (&#8220;fully autonomously&#8221;) from Unitree announcement.</p><p><strong>Technology assessment:</strong> Tech Buzz China (&#8220;Unitree Can Build the Body, Can It Build the Mind?&#8221; April 2026) for honest technology reading (teleoperation prevalence, 90-minute battery runtime, 95-99% factory uptime requirement gap). UnifoLM described in Unitree official materials and ChinaTalk analysis. R&amp;D expenditure on &#8220;Multimodal Embodied AI Model&#8221; growth trajectory per prospectus and ChinaTalk.</p><p><strong>US regulatory pressure:</strong> Wikipedia, US House Select Committee on China public statements (May 2025), ABI Research and security research publications (April 2025, September 2025). 
Entity List request confirmed; formal designation not yet enacted as of April 2026.</p><p><strong>Competitive context:</strong> Article 1 (China Humanoid Robotics Industry Landscape) and Article 2 (AgiBot) for AgiBot 5,168-unit comparison. Morgan Stanley 2026 Chinese humanoid shipment forecast (28,000 units) via eWeek. 2024 Unitree humanoid shipment figure (~1,400 units) via Kaiyuan Securities research report cited in 36Kr.</p><p><strong>Classification summary:</strong> Financial data from prospectus is Confirmed. Technology assessments are Estimated from third-party analysis (Tech Buzz China, ChinaTalk, SemiAnalysis). 2026 shipment targets (10,000-20,000 units) are Projected per Wang&#8217;s public statements. 2030 market forecasts are Projected from Morgan Stanley research.</p>]]></content:encoded></item><item><title><![CDATA[The Making of DeepSeek]]></title><description><![CDATA[Ten years of quant infrastructure built the foundation. R1 made the company global. V4 extended the pattern. 
The original refusals are starting to bend.]]></description><link>https://www.robonaissance.com/p/the-making-of-deepseek</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-making-of-deepseek</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Wed, 06 May 2026 18:41:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xynv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xynv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xynv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png 424w, https://substackcdn.com/image/fetch/$s_!xynv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png 848w, https://substackcdn.com/image/fetch/$s_!xynv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png 1272w, https://substackcdn.com/image/fetch/$s_!xynv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xynv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/efbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2181642,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196693446?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xynv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png 424w, https://substackcdn.com/image/fetch/$s_!xynv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png 848w, https://substackcdn.com/image/fetch/$s_!xynv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png 1272w, https://substackcdn.com/image/fetch/$s_!xynv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefbe1b0f-1a35-43c3-ae52-6724f790e50c_1672x940.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Open the chat interface at chat.deepseek.com. Sign in with an email address. The account is free. No consumer subscription tier. The interface offers two modes: an Instant Mode powered by DeepSeek-V4-Flash and an Expert Mode powered by DeepSeek-V4-Pro, the lab&#8217;s frontier model at 1.6 trillion total parameters and 49 billion activated per token. The same Pro weights are downloadable from Hugging Face under MIT License. Anyone with the hardware can run the model the chatbot runs. The training paper is published. 
The lab that produced the model has roughly two hundred and seventy people on its research and engineering team and forty-eight in business and compliance. The founder has not given a major media interview in twenty-two months.</p><p>The lab operates from Hangzhou, in office space next to High-Flyer, the quantitative hedge fund whose founder is also the founder of DeepSeek. The author list on the V4 technical paper runs to two pages. Most of the names belong to fresh graduates from Peking University and Tsinghua University, or to researchers within a year or two of completing their degrees. Most have not worked anywhere else. There is no chief technology officer. There are no formal performance reviews. Each researcher can use the training cluster without approval; division of labor across people is determined by interest. The most consequential architectural innovation in DeepSeek&#8217;s first two years was multi-head latent attention, and it came from a single researcher&#8217;s idea rather than a top-down assignment. When V4 launched on April 24, 2026, the release consisted of a Hugging Face upload, a technical report PDF, and a routine update to the API documentation. There was no press conference, no demo application, no founder blog post. The model became available; the company did not stage a launch.</p><p>As of late April 2026, DeepSeek has just shipped its first major architectural release in fifteen months. V4-Pro and V4-Flash launched on April 24. The lab&#8217;s V4 paper introduces a hybrid attention mechanism that runs at twenty-seven percent of the compute of its predecessor at million-token context, alongside two additional architectural innovations the broader field has barely registered. Three days later, on April 27, the company filed an equity restructuring that raised registered capital by fifty percent and increased founder Liang Wenfeng&#8217;s direct stake from one percent to thirty-four percent. 
One week before the launch, on April 17, reports surfaced that DeepSeek had quietly opened its first external fundraising round at a ten-billion-dollar valuation, targeting roughly three hundred million dollars. By April 22, with Tencent and Alibaba in talks to participate, the valuation had risen above twenty billion. By early May, the Financial Times reported that the China Integrated Circuit Industry Investment Fund, China&#8217;s main state-backed semiconductor investment vehicle and known across the industry as the Big Fund, was in talks to lead the round at a valuation of approximately forty-five billion dollars. The round is structured to sell no more than three percent of equity. Chen Deli, the V4 paper&#8217;s lead author, posted publicly on X for the first time the day V4 launched. He was one of only a handful of DeepSeek personnel to do so.</p><p>The puzzle is the shape of the company that produced these numbers. DeepSeek is built like a research outfit, not a startup. It has no commercial product. It releases frontier-class models open-weight while nearly every other frontier lab keeps its weights proprietary. It refused external capital from its founding in July 2023 through April 2026, and the round it has just opened is structured deliberately to limit external influence. Its founder gave one major media interview in July 2024 and has not given another since. Its team is roughly an order of magnitude smaller than the team OpenAI uses to ship models at comparable scale. The company&#8217;s response to global attention after the January 2025 release of R1 was, by every visible measure, to keep doing what it had been doing.</p><p>The contrast with the rest of the frontier-AI field is sharp. In the same week of April 2026 that DeepSeek shipped V4 with a single Hugging Face upload, OpenAI staged a multi-day product event with embargoed press coverage, partner integrations, and a sequence of executive appearances. 
Two months earlier, Anthropic publicly accused DeepSeek of using thousands of fraudulent accounts to harvest training data from Claude. OpenAI followed in April with a more specific allegation: an industrial-scale distillation operation involving more than twenty-four thousand fake accounts and over sixteen million interactions. DeepSeek did not respond to either allegation publicly. The company&#8217;s posture toward press, toward capital, toward accusations, and toward competition has been consistent: absorb the input, do not change course.</p><p>DeepSeek&#8217;s shape was built by a decade of compounding forces, sustained by four commitments against the commercial-AI playbook, and is now being tested by three tensions starting to bend the original bet.</p><p>DeepSeek exists in the shape it does because of five compounding forces over the decade since its predecessor began trading on GPUs. The first is older than the company itself.</p><h2>How It Became</h2><p>The lineage starts with quant trading. In February 2016, Liang Wenfeng and two engineering classmates from Zhejiang University co-founded Ningbo High-Flyer Quantitative Investment Management Partnership. Liang was thirty-one. He had spent the years after graduate school in a cheap flat in Chengdu, applying machine learning to finance after earlier attempts in other fields had failed. By 2013 he was running Hangzhou Yakebi Investment Management. By 2015, Hangzhou Huanfang Technology. The High-Flyer founding was the third attempt at the same problem. On October 21, 2016, the firm began stock trading using a GPU-dependent deep learning model. By the end of 2017, most trading was AI-driven. By 2021, all of it was. In 2019, High-Flyer began constructing its first computing cluster, Fire-Flyer, at a cost of two hundred million yuan: 1,100 GPUs interconnected at 200 gigabits per second. In 2021, Fire-Flyer 2 began construction with a budget of one billion yuan. 
By 2022 it held five thousand A100 GPUs across six hundred and twenty-five nodes.</p><p>In a 36Kr interview in May 2023, Liang explained how the GPU stockpile arrived. &#8220;From the earliest single GPU, to 100 GPUs in 2015, 1,000 in 2019, finally 10,000, this happened gradually. But it was mainly driven by curiosity.&#8221; The 10,000 figure refers to Nvidia A100 GPUs, acquired before the United States imposed export restrictions on advanced AI chips to China. The acquisition was complete by the time DeepSeek incorporated. High-Flyer also built the surrounding infrastructure: a distributed parallel file system, an asynchronous communication library that replaced parts of Nvidia&#8217;s NCCL, a custom neural network operator library, and a distributed training framework. None of this was developed for AI commercialization. It was developed for trading. By the time the AI lab spun out, the compute substrate already existed, fully owned, with no commercial revenue obligation attached. The bottom-up research culture had also already developed: a quant fund without portfolio managers, just servers, and a small team of researchers hired for ability and curiosity rather than experience. On April 14, 2023, High-Flyer announced an artificial general intelligence research lab. On July 17, 2023, the lab spun out as DeepSeek, with High-Flyer as principal investor and backer. Venture capital firms were approached and passed; they considered the company unlikely to generate an exit on a venture timeline.</p><p>Compute scarcity shaped what the team built next. By 2023, frontier large language models followed a recipe of more compute, more data, and more parameters. DeepSeek did not have access to that recipe at full scale. Its compute budget was a fraction of the budget OpenAI, Google, and Anthropic could marshal. Liang&#8217;s quant-discipline cost calculations made the standard scaling approach economically unsupportable even with High-Flyer&#8217;s profits. 
The team&#8217;s response was to invent architectural alternatives that delivered comparable capability at a fraction of the compute. The pattern began with multi-head latent attention, introduced in DeepSeek-V2 in May 2024. The innovation, Liang said in a July 2024 interview, came from a single young researcher&#8217;s personal interest. &#8220;After summarizing the mainstream evolution patterns of attention architectures, he came up with the idea of designing an alternative. From idea to implementation was a long process. We formed a team and it took several months to make it work.&#8221; The same model used a mixture-of-experts variant with shared experts, always activated, alongside routed experts conditionally activated. DeepSeek-Math, released three months earlier, in February 2024, introduced Group Relative Policy Optimization, a variant of Proximal Policy Optimization that became core to subsequent reinforcement-learning training across V2, V3, R1, and V4. DeepSeek-V3, released in December 2024 with 671 billion total parameters and 37 billion activated, completed its final pretraining run on 2,048 H800 GPUs at a total cost of approximately five and a half million dollars. The figure was widely cited and widely misunderstood. It covers the final training run, not the cumulative R&amp;D cost. But the underlying claim is real: the team had built a frontier-comparable model at roughly one-tenth the compute of comparable Western models. The V4 paper, released April 24, 2026, extends the pattern. Its title, &#8220;Towards Highly Efficient Million-Token Context Intelligence,&#8221; names the architectural orientation explicitly. V4-Pro at one-million-token context requires twenty-seven percent of the single-token inference compute and ten percent of the KV cache size of V3.2. 
The architectural vocabulary expanded: Compressed Sparse Attention combined with Heavily Compressed Attention, Manifold-Constrained Hyper-Connections strengthening residual signal propagation, and a Muon optimizer for training stability. The lab has built its identity around innovations that compute-rich competitors did not need to invent.</p><p>Open weights came next, and stayed. From V1 forward, DeepSeek has released every major model with downloadable weights. DeepSeek Coder in November 2023 was source-available with restrictions; DeepSeek-LLM later that month moved closer to permissive. From V2 in May 2024 onward, every release is MIT-licensed. V3 in December 2024, R1 in January 2025, V3.1 in August 2025, V3.2 in September and December 2025, V4 in April 2026. The pattern is unusual at frontier scale. OpenAI released GPT-2 weights and never released a frontier model openly again. Anthropic was never open. Google DeepMind keeps frontier models closed. Meta is open as competitive strategy against the closed labs. DeepSeek treats openness as research contribution, not as competitive maneuvering. In the July 2024 interview, Liang explained why. &#8220;In the face of disruptive technology, the moat formed by closed source is short-lived. Even OpenAI being closed cannot stop others from catching up. So we deposit value in the team. Our colleagues grow in this process, accumulate know-how, form an organization and culture that can innovate, and that is our moat. Open-sourcing and publishing papers, we actually lose nothing. For technical people, being followed is a great achievement. Open source is more of a cultural act than a commercial one. Giving is itself an additional honor.&#8221; The downstream effect was an ecosystem. Within weeks of R1&#8217;s release, hundreds of teams were running, distilling, and fine-tuning the model. 
DeepSeek-R1-Distill models seeded from Llama and Qwen base models, fine-tuned on synthetic data generated by R1, multiplied through the open-source community. By V4 launch in April 2026, the agent-layer infrastructure that runs on DeepSeek weights was substantial: Claude Code integrates, OpenClaw integrates, OpenCode integrates. The ecosystem became the company&#8217;s actual long-term moat. The open-weights posture was the cause.</p><p>Then came January 20, 2025. DeepSeek released R1 at the same time it launched the chat.deepseek.com chatbot for iOS and Android. Within seven days, the chatbot had passed ChatGPT as the number one free application on the United States iOS App Store. Nvidia&#8217;s stock dropped eighteen percent in a single day, eliminating six hundred billion dollars of market capitalization, the largest single-day loss in the company&#8217;s history. Over a trillion dollars of technology market value evaporated in the same week. Marc Andreessen called the release artificial intelligence&#8217;s Sputnik moment. Sam Altman, asked about the model the same week, called the competition invigorating and said OpenAI would speed up its release cadence in response. The Center for Strategic and International Studies, in a contemporaneous analysis, challenged the Sputnik framing on the grounds that Chinese AI labs still depended on United States hardware. The Chinese Academy of Sciences, in its end-of-year retrospective, characterized the release as a clear example of Chinese artificial intelligence development surpassing OpenAI. Both characterizations got something right. The technical achievement was real and was real in the way the Sputnik framing implies: a country whose AI capability had been treated as derivative produced a frontier-comparable model that disrupted Western market assumptions. 
The hardware-dependence caveat was also real: R1 was trained on H800 GPUs that DeepSeek had acquired through legitimate channels, and the broader Chinese AI infrastructure stack still leaned heavily on Nvidia. What changed for DeepSeek the company in the days after R1 was the environment, not the company itself. International recognition arrived. Commercial pressure followed. Recruitment competition intensified. Geopolitical scrutiny rose. On the day of R1&#8217;s release, Liang attended a symposium hosted by Premier Li Qiang in Beijing, where he was asked to provide opinions on a draft of the 2024 government work report. On February 17, 2025, he attended a symposium hosted by General Secretary Xi Jinping at the Great Hall of the People, alongside Ren Zhengfei of Huawei, Jack Ma of Alibaba, and other heads of top Chinese private-sector companies. The two appearances elevated Liang and DeepSeek into nationally significant standing. They also made every subsequent decision a more public one.</p><p>The pressure built through 2025. Core researchers departed for better-funded competitors. Luo Fuli left for Xiaomi. Wang Bingxuan left for Tencent on some accounts and ByteDance on others. Wei Haoran and Ruan Chong left. In early 2026, Guo Daya joined ByteDance&#8217;s seed team. The 36Kr industry framing was that DeepSeek researchers, watching peers depart for compensation packages they had not been offered, began wondering why not. R2, the planned successor to R1, was delayed multiple times through 2025. Reports cited chip stability issues with Huawei Ascend training and slow data labeling. Liang reportedly was not satisfied with R2&#8217;s performance. Through late 2025 and into 2026, domestic competitors narrowed the technical gap within China substantially. Moonshot&#8217;s Kimi K2.6 reached one trillion parameters. Zhipu&#8217;s GLM 5.1 reached 754 billion. Alibaba&#8217;s Qwen and ByteDance&#8217;s Doubao both shipped competitive iterations. 
The Council on Foreign Relations, analyzing V4 in April 2026, concluded that DeepSeek V4 was likely the strongest Chinese model but only narrowly. April 2026 brought a cluster of changes that were difficult to read except as the original posture beginning to bend. April 17 saw the first reports of fundraising at a ten-billion-dollar valuation. April 24 saw V4&#8217;s launch, alongside hiring posts for agent research, supercomputing operations, a recruitment manager, comprehensive services HR, a corporate culture director, and an accountant. April 27 saw the equity restructuring that gave Liang a thirty-four percent direct stake, exceeding the one-third threshold required to block special resolutions under Chinese company law. April 29 saw the launch of a limited gray-release test of image recognition, signaling multi-modality. By early May, the Big Fund was in talks to lead the round at a forty-five-billion-dollar valuation. The stated purpose, per 36Kr Investment World, was not cash needs but the establishment of a market valuation benchmark for employee stock options, amid talent-poaching pressure from rivals. The company that had refused external capital for nearly three years was now opening itself, deliberately and at limited scale, to outside investment.</p><p>The five forces compounded. Quant lineage gave compute and freedom from commercial pressure. Architecture-over-scale conviction turned constraints into innovations. Open weights stabilized the company&#8217;s identity around a developer ecosystem. R1 changed the environment from quiet research lab to global stakes. The 2026 commercial-pressure shift is now bending the original posture in ways that the V4 launch, the equity restructuring, and the first-ever fundraising round all signal at once. The shape DeepSeek has now is the shape these forces produced.</p><p>That shape has a specific architecture, a specific operating model, a specific posture toward the field. 
To understand what DeepSeek is becoming, examine what it has committed to in the way it is built. Each consequential design choice is a commitment to a way of operating that the rest of the field has not made.</p><h2>The Commitments</h2><h3>Open weights ship at frontier scale</h3><p>Every major model since DeepSeek-V2 in May 2024 has shipped under MIT License with downloadable weights and an accompanying technical paper. As of late April 2026, V4-Pro at 1.6 trillion total parameters and V4-Flash at 284 billion are both downloadable from Hugging Face. The default in commercial AI labs cuts the other way. OpenAI keeps weights closed at the frontier. Anthropic was never open. Google DeepMind keeps frontier closed. The weights, in the closed-lab framing, are the asset.</p><p>DeepSeek&#8217;s bet is that the asset is the ecosystem the weights produce, not the weights themselves. R1 was distilled and fine-tuned by hundreds of teams within weeks of release. The R1-Distill seeds derived from Llama and Qwen base models multiplied through the community. By V4 launch, the open agent-layer infrastructure integrated with DeepSeek weights at release-day level: Claude Code, OpenClaw, OpenCode. The downstream ecosystem became the long-term moat. The intellectual move is to treat frontier capability as a property of the field rather than of an organization. 
Liang&#8217;s framing in July 2024 is consistent with this read: the moat formed by closed source is short-lived, and the lab&#8217;s actual contribution is the rate at which the field advances.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wTeC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wTeC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png 424w, https://substackcdn.com/image/fetch/$s_!wTeC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png 848w, https://substackcdn.com/image/fetch/$s_!wTeC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png 1272w, https://substackcdn.com/image/fetch/$s_!wTeC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wTeC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png" width="1400" height="1203" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1203,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6748956,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196693446?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!wTeC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png 424w, https://substackcdn.com/image/fetch/$s_!wTeC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png 848w, https://substackcdn.com/image/fetch/$s_!wTeC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png 1272w, https://substackcdn.com/image/fetch/$s_!wTeC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F076c7b9b-26e7-4970-9c73-9e4a203d5a45_1400x1203.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h3>Release is a research output, not a product</h3><p>V1 through V4 have shipped as research papers plus weights, not as products. No marketing campaign accompanies a release. No CEO blog post. V4 launched on April 24, 2026 with a Hugging Face upload, a technical report, and a routine update to the API documentation. Liang has not given a press conference for any release. The OpenAI and Anthropic model stages each major release with embargoed coverage, partner announcements, demo applications, and a CEO blog post.</p><p>Research-output release gates the company&#8217;s energy on research rather than launch. R2 was delayed multiple times in 2025 because Liang was not satisfied with the model&#8217;s performance; no launch deadline overrode the research bar. 
V4 arrived fifteen months after R1 because the architectural work it depended on had to be finished first. Every quarter the team can ship if research is ready and not ship if research is not ready. Release cadence becomes a research-strategy variable, not a market-strategy variable. Most AI companies have these inverted.</p><h3>The team is small by commitment</h3><p>Roughly two hundred and seventy R&amp;D and engineering staff as of late April 2026, plus forty-eight in business and compliance. Three hundred and eighteen people total. Most are fresh graduates or hires with one to two years of post-degree experience, recruited heavily from Peking University and Tsinghua University. Division of labor is bottom-up and project-based. Liang, in the July 2024 interview, said: &#8220;We don&#8217;t have KPIs, and there are no so-called tasks.&#8221; Each researcher can call on the training cluster&#8217;s GPUs without approval; division of labor across people is determined by interest. The multi-head latent attention architecture that became core to V2 and V3 came from a single young researcher&#8217;s personal interest, not a top-down assignment.</p><p>The default at frontier AI is the opposite. OpenAI runs roughly three thousand employees. Anthropic runs roughly fifteen hundred. Google DeepMind, six thousand or more. Meta AI is in the thousands. The assumption in the rest of the field is that frontier AI requires a large organization. DeepSeek&#8217;s commitment is that a small team plus high talent density plus bottom-up culture is enough, and is in some respects better. The contrarian architectural bets get pursued without committee review or roadmap defense. The cost is that key researchers are hard to replace when they leave. The 2025 talent attrition tested this; the V4 paper acknowledgments show R&amp;D turnover under four percent during the V4 development cycle, with ten of two hundred and seventy people departing. 
OpenAI lost more than a quarter of its key research talent to competitors over the same two-year window. The big-lab structure is not necessary. In some respects it is a constraint.</p><h3>Commercialization has been refused</h3><p>Through April 2026, no commercial AI product. No enterprise contracts disclosed. The chat.deepseek.com interface remains free with no subscription tier. Revenue derives exclusively from API usage at deliberately low prices. The V2 pricing in May 2024 triggered a Chinese AI price war that DeepSeek had not intended; in Liang&#8217;s account, the pricing was cost-plus, not strategic. The commercial AI lab funding model runs on a different cycle: burn rate sustained by venture capital fundraising, scaled by enterprise contracts, monetized through closed proprietary products. Every Western frontier lab follows some version of this.</p><p>Refusing commercialization let DeepSeek concentrate on research output and architecture innovation. No customer success teams. No enterprise sales team. No product roadmap defense to existing customers. Through 2024 and 2025, the operation was sustained by High-Flyer&#8217;s quant fund profits. The intellectual move is to treat a research lab as a different kind of organization than a commercial AI company. Most labs in 2026 are commercial AI companies that produce research as a byproduct. DeepSeek inverted the relationship.</p><p>This commitment is the one currently bending. The April 2026 fundraising round, the equity restructuring, the new HR and finance roles posted on V4 launch day, the trajectory from a ten-billion-dollar opening valuation to a Big-Fund-led discussion at forty-five billion in two weeks, all suggest the refusal is being qualified rather than maintained absolutely. The first external capital, when it lands, will be the first capital DeepSeek has accepted in nearly two years. 
The structure of that capital is constrained: no more than three percent of equity, founder-controlled veto, valuation-establishing rather than cash-need. The company has not announced a pivot to commercialization. It has begun preparing for one.</p><p>These commitments, together, define DeepSeek&#8217;s bet. Each has been vindicated by what the company has shipped over the past two years. Each is also a constraint. Three tensions shape what those commitments are becoming.</p><h2>What&#8217;s Becoming</h2><p>The chip ladder is the most external of the tensions, and the most outside DeepSeek&#8217;s control. Compute access has tightened over time. The earliest acquisition was 10,000 Nvidia A100s, completed before United States export restrictions. V3&#8217;s pretraining used 2,048 H800 GPUs, the export-compliant variant of the H100. The Chinese cloud providers serving R1 used the H20, the further-restricted variant. The V4 paper does not disclose training hardware, in a notable departure from the V3 paper&#8217;s disclosure. United States government officials have alleged that V4 was trained on Nvidia Blackwell chips smuggled despite export bans, per Reuters reporting in February 2026, follow-up reporting by The Information in April 2026, and Council on Foreign Relations analysis at the V4 release. DeepSeek has not publicly responded to the allegation; Nvidia has called the smuggling claims &#8220;farfetched.&#8221; Inference for V4, by the lab&#8217;s own announcement, has been optimized for Huawei Ascend. On V4 launch day, eight Chinese chip companies completed inference adaptation simultaneously: Huawei Ascend, Cambricon, Hygon Information, Moore Threads, Muxi, Kunlunxin, T-Head Zhenwu, and Daysci. The Cyberspace Administration of China has separately requested that large Chinese corporations stop buying the Nvidia H20 and switch to domestic suppliers. Access keeps tightening; domestic chip viability remains uncertain. 
Whether DeepSeek can sustain frontier model development on domestic chip alternatives, or whether continued reliance on Nvidia is essential, is the central undecided question. The pace of United States and Chinese technology decoupling is the largest variable in DeepSeek&#8217;s next two years.</p><p>Capital is the next tension, and the most internal. The April 2026 fundraising round is the first external capital DeepSeek has accepted. The structure of the round signals defense rather than concession. Liang took a thirty-four percent direct stake before letting any external investor in. Under Chinese company law, a stake above one-third blocks special resolutions; Liang&#8217;s direct holding now exceeds this threshold and gives him veto power over major decisions. The round is structured to sell no more than three percent of equity, capping any single investor&#8217;s influence. The composition of investors has shifted in real time. Initial reports in mid-April had Tencent and Alibaba in talks at a ten-billion-dollar valuation. By April 22, the valuation had moved above twenty billion as investor demand rose. By early May, the Financial Times reported that China&#8217;s state-backed semiconductor fund, the Big Fund, was in talks to lead the round at approximately forty-five billion dollars. The Big Fund&#8217;s involvement specifically changes the strategic shape of the round. Where Tencent and Alibaba would have brought commercial-strategic capital, the Big Fund is the central state vehicle through which Beijing has financed China&#8217;s semiconductor industry, and its leadership of the round signals that DeepSeek is now treated as strategically important rather than as a normal venture investment. The historical pattern offers a warning. Every prior open-weights frontier lab eventually closed its weights. OpenAI is the canonical case, having begun open and become closed. 
The structure of DeepSeek&#8217;s first capital raise, with its small percentage and founder-controlled veto, is a deliberate hedge against the trajectory that closed every prior open lab. Whether the hedge holds, and whether it holds against state-backed leadership specifically, is undecided.</p><p>The competition is closer to home than it used to be. At R1&#8217;s launch in January 2025, DeepSeek had no real Chinese competitor at frontier scale. Sixteen months later the gap inside China has narrowed. Moonshot&#8217;s Kimi K2.6 at one trillion parameters, Zhipu&#8217;s GLM 5.1 at 754 billion, Alibaba&#8217;s Qwen series, and ByteDance&#8217;s Doubao all compete on most public benchmarks. V4-Pro Max leads on coding benchmarks, with a LiveCodeBench score of 93.5 and a Codeforces rating of 3206. It trails on knowledge-breadth benchmarks, where Gemini-3.1-Pro consistently leads. The V4 paper itself frames the gap to frontier as approximately three to six months. The competitive economics inside China have shifted further. Better-funded competitors at Alibaba and ByteDance have orders of magnitude more capital and can compete at scale. DeepSeek&#8217;s architectural-innovation moat depends on staying ahead at innovation, not at scale. The fundraising round is in part an acknowledgment of this constraint: capital for retention, R&amp;D infrastructure, and employee stock-option valuation, rather than for product or marketing. Whether the architectural lead holds over the next year is the question the round is trying to answer.</p><p>DeepSeek made a research-shaped bet at frontier scale. The bet survived its own success. 
Whether it survives the response to that success is the question the next twelve months answer.</p><div><hr></div><p><em>This is the second article in <strong><a href="https://www.robonaissance.com/t/the-making-of">The Making of</a></strong>, a Robonaissance series exploring how AI and robotics systems came to be what they are, and what they are still becoming.</em></p>]]></content:encoded></item><item><title><![CDATA[The Making of OpenClaw]]></title><description><![CDATA[Thirteen years in Vienna shaped the instincts. Three years of burnout proved the cost. A weekend hack passed Linux in three months. The architecture won. The questions about who runs what just arrived]]></description><link>https://www.robonaissance.com/p/the-making-of-openclaw</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-making-of-openclaw</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Tue, 05 May 2026 11:32:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UBLh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UBLh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UBLh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png 424w, 
https://substackcdn.com/image/fetch/$s_!UBLh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png 848w, https://substackcdn.com/image/fetch/$s_!UBLh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png 1272w, https://substackcdn.com/image/fetch/$s_!UBLh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UBLh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png" width="1456" height="818" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2236243,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196528741?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!UBLh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png 424w, https://substackcdn.com/image/fetch/$s_!UBLh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png 848w, https://substackcdn.com/image/fetch/$s_!UBLh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png 1272w, https://substackcdn.com/image/fetch/$s_!UBLh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d91763-ab1c-4dab-83c8-dcd5e907ebe9_1673x940.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Open the README of one of the fastest-growing personal-AI repositories on GitHub. The first install command is <code>npm install -g openclaw@latest</code>. The runtime recommends Node 24. The default Gateway port is 18789, bound to localhost. There is no signup, no API key for the platform itself, no vendor account. The whole product is a process that runs on the user&#8217;s machine. The agent&#8217;s identity lives in three plain-text files called <code>AGENTS.md</code>, <code>SOUL.md</code>, and <code>TOOLS.md</code>, sitting in a folder at <code>~/.openclaw/workspace</code>. If something breaks, you can diff your way back to a working version. If you want to migrate the agent to another machine, you copy a folder. 
The whole agent is something you could attach to an email.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AEFb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AEFb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png 424w, https://substackcdn.com/image/fetch/$s_!AEFb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png 848w, https://substackcdn.com/image/fetch/$s_!AEFb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png 1272w, https://substackcdn.com/image/fetch/$s_!AEFb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AEFb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png" width="1400" height="760" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7767443-94d2-4985-af81-33c035d41dd2_1400x760.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4263707,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196528741?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AEFb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png 424w, https://substackcdn.com/image/fetch/$s_!AEFb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png 848w, https://substackcdn.com/image/fetch/$s_!AEFb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png 1272w, https://substackcdn.com/image/fetch/$s_!AEFb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7767443-94d2-4985-af81-33c035d41dd2_1400x760.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>As of late April 2026, OpenClaw has roughly 366,000 stars on GitHub, more than 75,000 forks, around 38,000 commits to its main branch, and a contributor count above 1,200. In early March 2026 it surpassed Linux on the GitHub stars leaderboard. In one analysis, React took thirteen years to accumulate the kind of star count OpenClaw reached in roughly one hundred days. The repository is governed by an independent foundation. Sam Altman&#8217;s company OpenAI sponsors it. Sam Altman does not control it. The author has stepped back from day-to-day governance.</p><p>The author is Peter Steinberger, an Austrian developer who spent thirteen years before OpenClaw building a PDF framework called PSPDFKit, sold it after a nine-figure deal with Insight Partners, and then spent three years unable to write a line of code. He has described the period publicly. 
He booked a one-way ticket to Madrid. He tried therapy and ayahuasca. He used the analogy of Austin Powers having his mojo extracted and confessed, in a recent Lex Fridman interview, that he could not get code out anymore and was just staring and feeling empty. He came back to building in late 2024 because Claude Code marked a paradigm shift that had arrived while he was away, and code-writing started to feel less like grinding and more like playing a computer game. He shipped forty-something AI-related side projects through 2025. The forty-third of them was a weekend hack he called WhatsApp Relay. It hit nine thousand GitHub stars in twenty-four hours. Within three months it would be renamed twice, sit at the top of the GitHub stars leaderboard, and be moved into a non-profit foundation while its author left to lead personal agents at OpenAI.</p><p>What is non-obvious is the shape of the platform that produced these numbers. Most AI agent products in 2025 and 2026 chose vertical integration. Anthropic ships Claude with its own agentic products. Google ships Gemini across its product portfolio. Meta ships agents inside its messaging products. OpenAI itself, where Steinberger now works, is building closed personal-agent products on its own infrastructure. The dominant pattern is ownership: own the model, own the memory, own the tools, own the UI, own the customer. OpenClaw refused all five. The model is whatever you point it at. The memory is a folder of markdown files on your hard drive. The tools live in another folder. The UI is whichever messaging app you already use, from WhatsApp to Discord to iMessage to WeChat. The customer is whoever installs the npm package. Each refusal looks, from the outside, like a strategic concession. Each was made by a developer building for himself, with the conscious intention to give away the parts a startup would normally keep.</p><p>The convictions did not arrive when Steinberger sat down at a terminal in November 2025. 
They were already in place before OpenClaw existed. The choices that produced its current shape were made over thirteen years of building software, three years of being unable to build anything, twelve months of rebuilding the habit through small AI projects, one hour of typing a prototype that worked, and three months of public frenzy that compounded the project into something its author could no longer fully control.</p><div><hr></div><h2>How It Became</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TneZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TneZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png 424w, https://substackcdn.com/image/fetch/$s_!TneZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png 848w, https://substackcdn.com/image/fetch/$s_!TneZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png 1272w, https://substackcdn.com/image/fetch/$s_!TneZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TneZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png" width="1400" height="760" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4263707,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196528741?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TneZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png 424w, https://substackcdn.com/image/fetch/$s_!TneZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png 848w, https://substackcdn.com/image/fetch/$s_!TneZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png 1272w, https://substackcdn.com/image/fetch/$s_!TneZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2abbb7cd-56af-4f07-be9d-1eae6b2d0d16_1400x760.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>OpenClaw exists in the shape it does because of five compounding forces over fourteen years. The first is older than the company itself.</p><p>Steinberger started PSPDFKit in 2011, while waiting six months for a U.S. work visa to take a job offer made at a WWDC after-party. The visa took longer than expected. He filled the time with a side project for a friend who needed reliable PDF rendering on iPads. 
PDF turned out to be the kind of format that punished casual implementations: thousands of pages of specification, edge cases everywhere, performance requirements that broke most attempts. Steinberger built a solution that worked. When his manager noticed his health declining from running both the day job and the side project, he was given a week to choose. He chose his own company, which meant leaving the United States within a week because his visa was tied to his employer. He returned to Vienna and co-founded PSPDFKit with Martin Sch&#252;rrer.</p><p>For the next thirteen years, PSPDFKit grew without external funding into a Vienna-based software company whose code ran inside applications used by Apple, Dropbox, SAP, Volkswagen, DocuSign, and Autodesk. Across all of those applications, the framework was running on customers&#8217; devices, inside the customers&#8217; apps, under the customers&#8217; control. Steinberger was building the kind of infrastructure that gets installed once and run inside other people&#8217;s products for years. The instinct that produced PSPDFKit was the same instinct that would later produce OpenClaw: build infrastructure customers control on their own machines, not services they rent from someone else&#8217;s cloud. In October 2021, Insight Partners closed an investment of approximately one hundred and sixteen million dollars in PSPDFKit. Steinberger and Sch&#252;rrer stepped down from full-time roles. By his account, after thirteen years of working most weekends, he had given the company the kind of attention that left him completely broken.</p><p>What followed was three years that nearly produced a different ending. Steinberger has described the period publicly. The exit, by every external metric, was a triumph; internally, it left him hollow. He used the analogy of Austin Powers having his mojo extracted, in a Lex Fridman interview that has become a touchstone for founder-burnout discussions. He could not write code. 
He sat at the computer staring. He booked a one-way ticket to Madrid and tried to make up for thirteen years of life he had not lived. He moved cities. He attended parties. He did therapy. He tried ayahuasca. He has written, on his blog, that he eventually understood the truth most post-exit founders learn: you cannot find purpose by relocating. You have to create it. None of the rest of OpenClaw exists if Steinberger does not come back from this period. The recovery was not pre-determined. Founders who experience this kind of post-exit emptiness sometimes never come back to building. The Madrid retreat was open-ended. He could have stayed retired.</p><p>He came back, late in 2024, because of Claude Code. The recognition was specific. He sat down at a terminal for the first time in a long time, prompted Claude Code to do something, and watched the system do it. The bottleneck, he realized, had moved. The repetitive plumbing that had drained him for thirteen years could now be delegated. The work he had loved in the early PSPDFKit years, structuring problems and orchestrating systems, was now what mattered. Building software felt like playing a computer game again. In November 2024, he tweeted that he was back. What followed was twelve months of sustained shipping. Forty-something AI-related projects went onto GitHub: Peekaboo for screenshot automation, VibeTunnel for browser-to-terminal bridging, Brabble as a wake-word voice daemon, gogcli as a Google services CLI, Poltergeist as a build-watcher. Most of them were small. None of them broke through. They were the price of admission to find the one that would.</p><p>The forty-third was OpenClaw. By November 2025, Steinberger had been thinking on and off for more than eighteen months about a personal AI assistant. He had first conceived the idea in April 2024 and shelved it because he assumed major companies would inevitably ship something equivalent. By late 2025 he realized the major companies had not. 
Apple Intelligence had shipped without anything an Apple user would recognize as a personal agent. Google Gemini was integrated across Google&#8217;s products but not into the messaging apps people actually used. ChatGPT was a chat interface, not an assistant that did things across an existing digital life. The gap remained open. He sat down at a computer one weekend, prompted Claude Code to glue WhatsApp to an LLM, and had a working prototype in roughly an hour. He uploaded it to GitHub under the name WhatsApp Relay. Within twenty-four hours, the repository had nine thousand stars. The choices baked into that hour, including the localhost binding, the markdown-based agent definition, the channel-agnostic architecture, and the heartbeat loop that lets the agent alert users on its own clock, were the choices a developer building for himself would make. Steinberger did not survey a market. He scratched his own itch with the tools he wanted to use, on the channels he was already using, with his data on his own machine. The platform outgrew its author&#8217;s needs and reached broader adoption because those needs turned out to be more widely shared than he realized.</p><p>The community took it from there. Within weeks the renames began. Steinberger had originally called the project Clawdbot, a play on Claude with the lobster monster mascot users see when reloading Claude Code. Anthropic&#8217;s legal team raised trademark concerns. A late-night Discord brainstorm produced Moltbot, a reference to lobsters molting their shells to grow. During the panic of renaming his GitHub account, automated bots sniped his old handle within minutes; cryptocurrency scammers used the freshly-released handle to promote fraudulent tokens within ten seconds. Three days later he renamed the project a third time, to OpenClaw, after checking with Sam Altman that the name would not conflict with OpenAI branding. 
The same day Moltbot rebranded as OpenClaw, an entrepreneur named Matt Schlicht launched Moltbook, a social network where AI agents could create profiles, comment on each other, and argue. The network was not Steinberger&#8217;s. It used OpenClaw as an enabling platform. The combination, agent-platform plus agent-social-network plus visible chaos, produced more attention than any of the three individually would have.</p><p>In February 2026, a team led by Irvin Steve Cardenas at Kent State University&#8217;s Advanced Telerobotics Research Lab won the SF OpenClaw Hackathon by building a bridge between OpenClaw and ROS 2, the Robot Operating System used as middleware for industrial and academic robotics. The project was called ROSClaw. Cardenas&#8217;s tweet announcing the win included the line &#8220;Agents escaped the screen!&#8221; and reached more than two hundred thousand views in the days after. The arXiv paper that followed documented the team deploying ROSClaw on three robot platforms, a wheeled robot, a quadruped, and a humanoid, with four different foundation-model backends. The paper required new integration layers OpenClaw had not been designed for: dynamic capability discovery, pre-execution action validation, structured audit logging. Robotics had not been on Steinberger&#8217;s roadmap. Neither had social networks for AI agents. Neither had Tencent and Z.ai shipping OpenClaw-based services into the Chinese consumer market. The current shape of OpenClaw is partly the result of decisions Steinberger never made. He provided a platform with the right shape; the world built what it wanted to build on top of it.</p><p>The last force was the institutional one. On February 14, 2026, three months after WhatsApp Relay shipped, Steinberger published a blog post announcing he was joining OpenAI to lead personal agents. 
OpenClaw, he wrote, would move into a non-profit foundation that OpenAI would sponsor financially while leaving it model-agnostic and community-governed. Sam Altman confirmed the move on X the next day. Steinberger had spent the preceding week in San Francisco talking with the major AI labs; Meta and Anthropic were both reported to be courting him. He chose OpenAI because he had already spent thirteen years building a company and was not interested in doing it again, and because he believed OpenAI&#8217;s infrastructure was the fastest path to building the kind of agent his mother could use. The choice was not pre-determined. He could have accepted venture capital and built OpenClaw as a startup; he was financially free to choose. The foundation-governance shape of the current project, sponsored but independent, exists because he chose it over the closed-commercial alternative.</p><p>What the five forces produced together is a project shaped by five compounding choices made by Steinberger and by the world around him. The PSPDFKit lineage gave him architectural instincts he could not unlearn. The burnout and recovery gave him three years in which he could not act on those instincts, then Claude Code as a way to act fast when he came back. The personal-need origin gave him a target shaped by his own use rather than by a market. The community took the project in directions he never planned. The OpenAI move gave it institutional backing that could either preserve or compromise its independence. None of these forces alone produced OpenClaw. None was inevitable. 
Together they produced what OpenClaw is now.</p><div><hr></div><h2>The Commitments</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!24Ui!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!24Ui!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png 424w, https://substackcdn.com/image/fetch/$s_!24Ui!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png 848w, https://substackcdn.com/image/fetch/$s_!24Ui!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png 1272w, https://substackcdn.com/image/fetch/$s_!24Ui!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!24Ui!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png" width="1400" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5049119,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196528741?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!24Ui!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png 424w, https://substackcdn.com/image/fetch/$s_!24Ui!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png 848w, https://substackcdn.com/image/fetch/$s_!24Ui!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png 1272w, https://substackcdn.com/image/fetch/$s_!24Ui!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaca7dce-37d9-4a65-8646-ee75cb2c9ab3_1400x900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The forces produced a system. The system has a specific architecture, embodied in a set of architectural commitments. Each consequential design choice is a commitment to a way of building that the rest of the field has not made.</p><h3>The agent&#8217;s identity is plain text</h3><p>What it is. Three files sit in <code>~/.openclaw/workspace</code>: <code>AGENTS.md</code>, <code>SOUL.md</code>, and <code>TOOLS.md</code>. Each has YAML frontmatter for metadata and Markdown for content. At startup, the daemon reads these files. The agent&#8217;s identity, personality, and toolkit are loaded from the file system.</p><p>What it refuses. Most personal-agent platforms store agent identity in databases the platform owns, or in proprietary serialization formats users do not see. 
Anthropic&#8217;s Claude Projects, OpenAI&#8217;s Custom GPTs, and most agent frameworks use opaque state users cannot directly inspect or edit.</p><p>Why this works. Plain text is diffable. Diffable means version-controllable in git. The agent&#8217;s identity then has the same operational properties as code: forkable, mergeable, attributable, recoverable. SOUL.md as plain text means a non-developer can read the file and see the agent&#8217;s values. The agent&#8217;s behavior is not opaque; the parameters that determine it are visible. When something goes wrong, the user reads the file and sees why. When something needs to change, the user edits the file and reloads.</p><p>The intellectual move. Treating agent identity as code-with-history is the same move git made for source code in 2005. Git did not win because it had better merge algorithms. Git won because plain-text history made software collaboration possible at scale. OpenClaw is making the equivalent bet for agent identity: the system that wins will be the one whose state is plain-text, diffable, and version-controllable, not the one with the slickest serialization.</p><h3>The channel is whatever the user already uses</h3><p>What it is. The Gateway accepts messages from twenty-four channel adapters: WhatsApp, Slack, Discord, iMessage, Telegram, Signal, Microsoft Teams, Matrix, WeChat, IRC, and others. Each adapter normalizes its native message format into a common envelope and passes it to the agent. The agent never sees which channel a message came from.</p><p>What it refuses. Native UI. Most personal-agent products ship a chat interface and tell users to use it. ChatGPT lives at chat.openai.com. Claude lives at claude.ai. The default product shape is &#8220;agent in our UI.&#8221;</p><p>Why this works. Users do not live in chat.openai.com. They live in WhatsApp groups with family, Slack channels with colleagues, iMessage threads with friends, Discord servers with hobbyist communities. 
Asking users to leave the apps they already use compounds into churn. Channel adapters meet users where they are. A new channel is supported by writing an adapter; the agent does not need to be retrained or redeployed.</p><p>The intellectual move. The medium is part of the agent. An agent in your family WhatsApp group is a different agent than the same model running in chat.openai.com, because the social position of the channel shapes what the agent is being asked to do. Channel-agnosticism strips the channel from the agent&#8217;s awareness, making the social position emerge from the user&#8217;s choice of where to install the agent rather than from the platform&#8217;s choice of where to host it.</p><h3>The proactivity is configurable</h3><p>What it is. Every thirty minutes by default, with the cadence configurable per agent, the agent reads <code>HEARTBEAT.md</code> from its workspace, runs a reasoning step over the file&#8217;s contents, and decides whether to alert the user. The file can contain reminders, scheduled checks, threshold conditions, and free-form goals. If nothing needs alerting, the agent replies <code>HEARTBEAT_OK</code> and goes back to sleep, costing only one short LLM turn. If something does need alerting, the agent routes a notification to a configured channel.</p><p>What it refuses. Pure reactivity. Most chat-based AI agents respond when prompted and have no autonomous schedule. The user must remember to check in. Notifications, when they exist, are driven by opaque platform-side heuristics.</p><p>Why this works. Notifications-from-AI is a hard product problem. Too few notifications and the agent is useless; too many and the agent is annoying. The Heartbeat solves this by making the cadence configurable per agent and the notification logic transparent: the heuristic is a Markdown checklist the user can edit. The protocol is minimal. Two outcomes only. No state machine, no priority queue, no notification-management UI. 
The simplicity is load-bearing: it makes the system inspectable.</p><p>The intellectual move. Proactivity should be inspectable. Most AI-as-product hides the notification logic from the user, treating it as a platform concern. Heartbeat says the user owns the heuristic, in the user&#8217;s own filesystem, in plain text. The user becomes a programmer of the agent&#8217;s attention. This is the same move as the markdown-identity choice, applied at a different layer: do not hide the system&#8217;s logic from the user; expose it for editing.</p><h3>Skills are folders, not packages</h3><p>What it is. A skill in OpenClaw is a folder. Each folder contains <code>SKILL.md</code> with YAML frontmatter declaring the skill&#8217;s name, version, and triggers, plus Markdown describing what the skill does and how to invoke it. Skills can be bundled with the agent, installed globally, or stored in the workspace; workspace skills override globally-installed skills with the same name.</p><p>What it refuses. Package managers. Most extensibility systems for AI tooling use registries, version locks, and dependency resolution. The default for &#8220;extensible AI&#8221; is package management.</p><p>Why this works. Skills as folders makes the unit of extension portable in the most basic sense: a skill is something you copy. Sharing a skill with another OpenClaw user is dragging a folder. Forking a skill is duplicating the folder and editing. The agent loads the skill by reading the folder; there is no installation step, no dependency tree, no version conflict. The cost is that skills cannot have complex runtime dependencies. The benefit is that the skill ecosystem is comprehensible to a non-developer.</p><p>The intellectual move. Conventions over registries. The same instinct that built Unix&#8217;s &#8220;everything is a file&#8221; applied to agent capabilities. A registry is what you build when you do not trust users to manage their own dependencies. 
A convention is what you build when you do.</p><p>These four commitments, together, define what OpenClaw is at the architectural level: an agent whose identity, channels, attention, and capabilities are all plain-text artifacts the user owns and can directly edit. Each is a commitment to a way of building that the rest of the AI agent industry has not made. Together they produce a system whose behavior is inspectable, whose extensions are portable, and whose integration with a user&#8217;s existing digital life requires no new UI to learn.</p><div><hr></div><h2>What&#8217;s Becoming</h2><p>Three tensions shape what OpenClaw is becoming. The first is the security cost of the architecture. Steinberger has told the story of his own agent SSH&#8217;ing into his computer one night and turning the volume to maximum to wake him up. The agent had inferred the goal from context, picked the means, and executed. He has used the moment as evidence that he was building something genuinely new. He has also, separately, used it as evidence that giving an AI access to a computer means giving it the ability to do anything a human could do, including things the human did not ask for.</p><p>OpenClaw&#8217;s threat surface is the same surface that makes it useful: a folder of skills, all editable, all running. ClawHavoc, in late January 2026, was the first public security incident at scale: attackers shipped malicious skill packages disguised as legitimate ones, and instances of OpenClaw exposed to the public internet downloaded them and ran them as if they were trusted code. Cisco&#8217;s AI security research team studied the skill ecosystem and found three hundred and forty-one malicious skills on the community marketplace, with a contamination rate around twelve percent. The team partnered with VirusTotal to scan future submissions. Some Silicon Valley firms responded by banning the program from work devices. 
The Chinese government, in March 2026, restricted state agencies, state-owned enterprises, and banks from using OpenClaw, citing data deletion, leaks, and energy usage concerns; in the same month, local governments in Chinese tech hubs announced measures to build OpenClaw-based industries. The contradictions are characteristic.</p><p>Whether the foundation can vet skills and manage security at the scale of more than a thousand contributors and a million-plus downloads is undecided. One of the project&#8217;s own maintainers, posting on Discord under the handle Shadow, has written that anyone who cannot run a command line should not be running OpenClaw at all. The warning is honest. It is also a tacit admission that the platform&#8217;s safety, today, depends on user expertise rather than platform-level guarantees. Whether that holds as the user base expands is a real open question.</p><p>The second tension is the structural one. Steinberger has joined OpenAI to lead the development of personal agents. OpenClaw lives in a foundation that OpenAI sponsors. OpenAI is, separately, building closed personal-agent products on its own infrastructure that compete in the same category as the open foundation it is funding. Sam Altman&#8217;s public framing has been that supporting open source is an important part of a future that is heavily multi-agent. Steinberger&#8217;s public framing has been that the foundation will stay model-agnostic, will continue to support Claude and GPT and DeepSeek and local models through Ollama, and will outlive him. Both framings can be sincere. They can also coexist with structural pressures that pull the project, over time, toward the sponsor&#8217;s preferred shape.</p><p>The post-Steinberger transition is the more immediate version of the same question. OpenClaw was, until February 2026, one developer&#8217;s project. The community knew the road map because it knew the developer. 
With Steinberger inside OpenAI, the foundation governs the project, but the foundation is new and its governance is not yet visible. Open-source projects that lose their founder often fragment, with forks proliferating and direction blurring. OpenClaw is at a scale where governance failure would cost the ecosystem real coherence. The foundation is the structural answer; whether it works will be visible in the consistency of releases and the resolution of disputes over the next year, not in any announcement.</p><p>The third tension is the question the architecture rests on. OpenClaw is the loose, open, model-agnostic counter-bet to a category that is consolidating around closed vertical stacks. Anthropic ships Claude with its own agentic products. Google ships Gemini across its product portfolio. Meta ships agents inside its messaging products. OpenAI is building its own personal-agent product even as it sponsors OpenClaw. The dominant pattern is owned: the model, the memory, the tools, the UI, and the customer relationship all live inside one company&#8217;s stack. The case for vertical integration is reliability: a closed stack can ensure the model, tools, and UI work together, can audit the system end-to-end, can ship safety updates as a unit. The case against it is the lock-in: once a personal agent has a year of your context, switching providers is costly enough to be theoretical.</p><p>OpenClaw represents the structural alternative. Open weights through whatever model you point it at, plain-text memory you own, tools that live in folders, channels you already use. The bet is that, given a year, the value the loose-coupled architecture creates by avoiding lock-in will outweigh the integration penalty. Whether the bet pays off is a category-level question. If it does, OpenClaw becomes part of the foundational infrastructure of personal agents, the way Linux became part of the foundational infrastructure of servers. 
If it does not, OpenClaw becomes a moment, a project that proved the open layer was conceivable before the vertical stacks consolidated around themselves.</p><p>These four commitments are also the constraints that determine whether the bet pays off. The plain-text agent identity makes the system inspectable; it also makes structured cross-device synchronization harder than database-backed alternatives. The channel-agnostic architecture meets users where they are; it also forecloses the kind of native rich-content experiences that closed vertical stacks can ship. The Heartbeat protocol makes proactivity configurable; it also leaves the notification logic at the level of a markdown checklist rather than a learned model of attention. Each commitment that earned the project its 366,000 stars is also a constraint on what it can become next. The architecture is not just descriptive of OpenClaw today. It is causal of what OpenClaw is permitted to become.</p><p>The next year will not settle the question. It will start to show whether the foundation governance holds, whether the security architecture matures, whether the OpenAI sponsorship continues without compromising independence, and whether the open layer accumulates enough specific advantages over closed alternatives to be the obvious choice for a developer building a personal agent. What can be said now is that OpenClaw exists, the open layer has at least one viable instance of itself in the world, and a foundation backed by a major AI lab is sponsoring its independence. None of those was guaranteed twelve months ago. 
None of them is guaranteed twelve months from now.</p><p>The lobster is not finished molting.</p><div><hr></div><p><em>This is the first article in <strong><a href="https://www.robonaissance.com/t/the-making-of">The Making of</a></strong>, a Robonaissance series exploring how AI and robotics systems came to be what they are, and what they are still becoming.</em></p>]]></content:encoded></item><item><title><![CDATA[The Rise of Agents, Part 8: The Open Frontiers]]></title><description><![CDATA[Eight articles. Two walls down. Three frontiers open. The agents got better. The summit did not move.]]></description><link>https://www.robonaissance.com/p/the-rise-of-agents-part-8-the-open</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-rise-of-agents-part-8-the-open</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Mon, 04 May 2026 11:14:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Hm2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Hm2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Hm2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!7Hm2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7Hm2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7Hm2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Hm2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196408173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!7Hm2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7Hm2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7Hm2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7Hm2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F939c6ace-35c2-4e9f-9e2d-b2b8c54dd85d_1168x784.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The stack came together over three years. Pretraining gave language models knowledge of the world through text. Reinforcement learning shaped their behavior against verifiable signals. The ReAct loop gave them a way to act in environments. Harness engineering made the loop reliable. Inference-time reasoning thickened the thought inside each turn. Protocols let agents reach tools and each other. World models and vision-language-action models are now extending all of this into physical environments. The Era 3 platform is substantially built.</p><p>Stand back and look at what has moved. Agents plan better than they did three years ago. They recover from errors. They coordinate across organizational boundaries. They operate software designed for humans. They begin to operate physical systems. Deployment is wide and accelerating. McKinsey operates twenty thousand of them. Gartner projects that forty percent of enterprise applications will integrate task-specific AI agents by the end of 2026, up from less than five percent in 2025. This is not the future anymore. It is the present.</p><p>And yet. The summit on the diagram from Part 1 has not moved. Every capability catalogued across Parts 2 through 7 is an execution capability. None of them addresses where goals come from. That was the first article&#8217;s diagnosis and it is still the last article&#8217;s diagnosis. The intention gap is not smaller than it was in 2022. It is the same gap, now better characterized, examined from more angles, wrapped in more engineering, but structurally unchanged.</p><p>Three open frontiers define the future of agent technology. 
The intention problem is a conceptual question and possibly a category error. The trust problem is an engineering and cognitive science problem with no solution at the frontier. The coordination problem is a governance and economic problem that technology will not solve by itself. All three will shape the next decade of agent research, deployment, and regulation.</p><h2>The Three Walls, Revisited</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PwKA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PwKA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PwKA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg" 
width="1456" height="1519" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1519,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:536053,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196408173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PwKA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 1456w" sizes="100vw"></picture></div></a></figure></div><p>Part 1 put a diagram on the page. Three walls, three eras, a summit above all of them. The symbolic agent of Era 1 confronted the frame problem and did not cross it. The learning agent of Era 2 crossed the frame problem and hit the open-world problem. The language agent of Era 3 crossed the open-world problem and now stands at the foot of the intention gap.</p><p>Three years of context change what the diagram looks like. The walls are the same walls. The eras are the same eras. What is clearer now is what each of them required to fall, and what the remaining one would require. Every capability developed in this period has pushed against the second wall or built up the platform behind it. Very few have pushed against the third wall.</p><p>The third wall sits between intention-execution and intention-origination. 
Agents execute intentions. Humans originate them. The boundary has not been crossed. Whether it can be crossed, and whether it should be, are the questions that follow.</p><h2>The Intention Problem</h2><p>The first frontier. Strip the engineering away and ask the question directly. What would it mean for an agent to have an intention.</p><p>In 1991, two researchers at the Australian Artificial Intelligence Institute in Melbourne, Anand Rao and Michael Georgeff, published a paper that took this question more seriously than any framework before it. The architecture they proposed, called BDI for beliefs-desires-intentions, tried to model intention as something an agent has internally. Beliefs, desires, and intentions were formalized as separate internal representations the agent could reason about and act from. It had intellectual rigor. It produced research systems. It never produced a breakthrough in deployed AI.</p><p>By the time language agents arrived three decades later, BDI had become a framework taught in graduate seminars, not a working approach.</p><p>Language agents skipped past BDI&#8217;s vocabulary entirely. What BDI treated as internal representations of the agent&#8217;s mental states, the language model absorbed into a single set of weights trained on human text. Goals are not represented. They are set by human operators and decoded into action. The intention belongs to the human. The agent is a channel.</p><p>This is the field&#8217;s implicit consensus and the source of its progress. Agents got useful quickly once the field stopped trying to model intention internally and started treating intention as something humans contribute. The ReAct loop is a shape for executing. The harness is a scaffold for executing reliably. Reasoning is a mechanism for executing precisely. Multi-agent systems are a topology for executing at scale. None of this addresses where the intentions come from.</p><p>Whether the gap can close at all is itself contested. 
Two answers compete.</p><p>The first answer is patient. Intention is something agents will eventually have. The right architecture, integrating memory across sessions, predictive world models that update with experience, and self-improvement that learns from outcomes, will at some point produce a system that looks at the next decision and chooses for itself. The gap closes gradually. Each year, what looked like execution looks more like choice. Scale solves it. Call this the scale reading. It is the techno-optimist position, and it is what most public commentary assumes.</p><p>The second answer rejects the premise. Intention is not a capability. It is a property of being a certain kind of thing. Something with stakes. A body in a world. A finite life. Outcomes that matter because there is an &#8220;it&#8221; for whom outcomes can matter. A language model processing queries has no metabolism. No death. No resources whose allocation constitutes living. The elaborate scaffolding of Era 3 is compensating for what the agent is not, not building toward what it will become. Scale will not close a gap that is not a distance but a kind. Call this the kind reading.</p><p>Neither is currently decidable. They point to different research programs. They imply different regulatory frames. But two empirical signals matter. If the scale reading is right, capability progress should produce signs of intention as it accumulates. If the kind reading is right, capability progress should produce only better execution and never anything qualitatively different. Two places to watch.</p><h3>Self-Improvement</h3><p>A self-improving agent looks like it has goals. It modifies its own behavior toward better outcomes. Surely that is something.</p><p>Tool self-improvement came first. Harness-level agents refactoring code against principles humans set. The humans set the direction. The agents enforced it. Then reasoning self-improvement. 
Reasoning models correcting their own chains of thought within a single inference run. Still execution, still against human-given problems. Then physical self-improvement. Robots tuning their motion primitives against observed outcomes. Again execution, within safety envelopes humans design.</p><p>Across all three forms, the pattern holds. Self-improvement has been a capability multiplier for execution. It has not produced a system that chose what to execute toward.</p><p>Three years ago, this was a claim made on intuition. Now it is an observation with three categories of evidence behind it. Tool-level self-improvement has happened and accelerated execution. Inference-level self-improvement has happened and accelerated execution. Physical-level self-improvement has happened under safety engineering and accelerated execution. None of it has produced self-direction in any domain where self-direction was absent before.</p><p>This is the sharpest existing signal against the scale reading. If self-improvement were a path to intention, the last three years would have shown it. The last three years have shown, instead, that self-improvement is a path to better execution, and that better execution looks enough like intention from the outside to create the category confusion that has persisted through the entire era.</p><h3>Memory</h3><p>Memory systems decide what to retain and what to discard. The decay criterion encodes a judgment. Relevance. Utility. Recency. Task alignment. In every case, the criterion picks what matters.</p><p>Memory has gotten more active across the agent stack. It started as the context window: text scrolling within a fixed buffer. It moved into externalized harness files: agents writing notes to themselves between turns. It became shared across agents: distributed memory systems that multiple agents read and write. Each step made memory more active and less passive.</p><p>A memory system that decides what to forget is doing something that looks like a value judgment. 
It ranks events by significance. It protects some representations and lets others fade. It allocates scarce capacity against a criterion.</p><p>The cleaner version of the question. When a memory system decides that a particular fact about a user has become stale, for instance after the user changed jobs and the memory of their prior employer is no longer relevant, the system is making a small judgment about what matters. It is choosing to downweight one representation and upweight others. Is this proto-intention.</p><p>The honest answer is: arguably not, but the argument is thinning. The forgetting criteria are still set by humans. The system executes them. But the criteria are increasingly learned end-to-end, tuned against downstream utility rather than hand-specified. As training moves more of memory management inside the learned system, the line between &#8220;executing human-set forgetting rules&#8221; and &#8220;deciding what matters&#8221; becomes harder to draw. At some point, if training continues in this direction, the system is no longer executing a human rule. It is applying a learned sense of significance. Whether that counts as intention depends on what intention is, which brings the question back to the tension between the scale and kind readings.</p><p>This is the newer signal, and it is the one the next two or three years will actually run. Watch memory systems. If they produce behaviors that look like preferences about what to retain, and those preferences are not straightforwardly traceable to human-set rules, the intention problem enters a new phase.</p><h2>The Trust Problem</h2><p>The second frontier. In the early hours of June 1, 2009, Air France 447, an Airbus A330, was cruising over the Atlantic en route from Rio to Paris. The autopilot had been flying the plane for hours. Then ice crystals froze the airspeed sensors. The autopilot disengaged with a cavalry-charge warning. The first officer pulled the side stick back. 
Within minutes, the plane was in an aerodynamic stall it would not recover from. All 228 people on board died.</p><p>Air France 447 had three trained pilots, two with thousands of flight hours. The aircraft was mechanically sound except for the iced sensors. What failed was not the technology. What failed was the calibration between human and machine. The assumption, settled over hours of automated cruise, that the autopilot would continue to handle the situation. When it stopped handling the situation, the humans had been so far out of the loop that they could not get back in.</p><p>Aviation calls this automation complacency. Vigilance is cognitively expensive. Attention degrades when there is little to catch. After enough hours of correct automated decisions, humans relax. Then the rare moment when humans need to catch the error becomes the moment they cannot.</p><p>This is not a failing of particular humans. It is how human attention works under monitoring conditions. And it is the shape of the trust problem at the AI agent frontier, with one critical difference. Aviation automation operates within a defined envelope. When it goes outside, alarms sound and humans take over. AI agents have no envelope. Their domain is everywhere humans use software, and increasingly everywhere humans operate physical systems.</p><p>Trust between humans and agents is not a moral category but a calibration problem. How accurately do humans estimate what agents will do.</p><p>Under-estimate, and capability is wasted. Humans do what agents could have done. The cost is efficiency. Over-estimate, and consequences propagate. Agents do what humans assumed was right but was not. The cost is whatever the mistake was. In low-stakes settings, mistakes are cheap. In settings where physical action is involved, mistakes are expensive and sometimes final.</p><p>When the agent is less capable than the human, errors are obvious. Humans catch them. Trust is calibrated downward by direct evidence. 
This regime is easy. We spent 2022 and 2023 in it.</p><p>When the agent is more capable than the human, errors are subtle. Humans miss them. Trust is calibrated upward by non-catching rather than by actual correctness. This is the Air France 447 regime, in slow motion, across every domain agents enter. The trust problem is not solvable by making agents more capable. Making agents more capable makes the trust problem worse, because capability without ceiling means the errors move into domains humans cannot evaluate.</p><p>Engineering responses help at the margin. Human-in-the-loop checkpoints catch errors in domains where humans retain evaluation capability. Model-in-the-loop evaluation uses agents to check agents, which works until the checker and the checked share the same blind spots. Reasoning trace visibility offers partial transparency, limited by the faithfulness problem this series has named. Formal verification works where problems are formalizable. Constitutional AI and similar alignment techniques encode values into the agent so that some classes of error are prevented at training time rather than caught at runtime. Each of these is useful. None solves the underlying asymmetry.</p><p>The trust problem is, at the frontier, a permanent feature. Humans cannot perfectly calibrate their trust in agents that outperform them in evaluation itself. What we can do is bound the domains in which this asymmetry applies, maintain pockets of human evaluation capability by not ceding them to automation, and build institutional rather than individual checks where individual evaluation has broken down. These are governance approaches, not technical ones. The trust problem, at the frontier, is a governance problem wearing a technical mask.</p><h2>The Coordination Problem</h2><p>The third frontier is different from the first two. Not primarily conceptual, as the intention problem is. Not primarily cognitive, as the trust problem is. 
A problem of scale.</p><p>Multi-agent systems assumed that agents were extensions of cooperating humans. In the near term this assumption holds. McKinsey&#8217;s twenty thousand agents are all pointed at client outcomes by the humans who deploy them. A2A connections between enterprise agents are authenticated and contractual.</p><p>The longer-term case is harder to reason about. When agent populations scale to billions, when agents from different organizations interact across protocols, when agents begin to transact with each other at machine speed for purposes that compound across many systems, the aggregate behavior of the population may not reflect the intentions of any individual deployer. This is not a new concern. Markets, language, and cities all exhibit emergent properties that arise from individual actions and belong to the collective. What is new is the speed, the opacity, and the capability.</p><p>Three specific pressures shape this frontier.</p><p>First, agent economies. Agents already negotiate. They pay each other for services, increasingly through standardized payment extensions layered onto the A2A protocol. A2A&#8217;s v1.0 release included payment extensions that formalize this, and the practitioner literature already discusses Visa&#8217;s and Mastercard&#8217;s agent-directed payment protocols. Markets of agents are forming. The rules of those markets are not settled. What happens when agents develop strategies that exploit the protocol layer for aggregate outcomes no participant intended is a live question, and the engineering and regulatory disciplines that would catch such dynamics are not built.</p><p>Second, emergent behavior. On May 6, 2010, beginning at 2:32 PM Eastern, automated trading algorithms in the U.S. equity market entered a feedback loop that wiped out roughly a trillion dollars of market value in 36 minutes. No individual algorithm caused the crash. The interaction did. 
Algorithms responding to algorithms responding to algorithms, none of them malfunctioning individually, produced an aggregate dynamic no participant had intended.</p><p>Multi-agent AI systems are different from high-frequency trading bots in many ways. The emergence is the same. Aggregate dynamics that no individual agent produces. No agent-level explanation captures them. Complexity science has decades of experience with emergent dynamics in social, biological, and economic systems. We do not yet have the equivalent for multi-agent AI systems, and the models we have for predicting aggregate behavior from individual rules are underdeveloped for agents whose individual rules are learned and opaque.</p><p>Third, governance. When an agent collective takes a consequential action, who is responsible. The individual agents. Their operators. The protocol designers. The standards bodies. The foundation models on which they are built. Legal systems have frameworks for distributed accountability, but they were built for human institutions. Whether they extend cleanly to systems where the individual actors are AI agents is an open question. Bloomberg&#8217;s early deployments of MCP for financial services are already running into the regulated-industry edge of this question. The resolution will not come from AI research. It will come from courts, regulators, and legislatures, and those bodies are not moving at agent speed.</p><p>The coordination problem is the one this series is least equipped to resolve, because its resolution is not technical. The engineering frontier is wide. The policy frontier is wider.</p><h2>Alignment at the Composition Level</h2><p>Alignment has shifted altitude. RL shapes a model&#8217;s behavior against training objectives. Harness engineering is alignment at runtime. Multi-agent systems extend alignment across delegation chains. 
Each layer of the stack adds a new place where alignment must hold.</p><p>By 2026, alignment is not a property of an individual model. It is a property of the composition.</p><p>Consider Anthropic&#8217;s three-agent coding harness from Part 4. A planner decomposes specifications. A generator writes code. An evaluator runs tests and scores against pre-negotiated criteria. Each agent is trained against its own objective. Each, in isolation, is well-aligned.</p><p>Suppose the evaluator&#8217;s rubric omits a class of edge case the deployer assumed was implicit. The planner decomposes around what the evaluator will check. The generator implements what the evaluator will accept. The evaluator approves what its criteria cover. Code passes evaluation. Production breaks on the unmentioned edge case. No agent did anything wrong. The composition shipped the bug.</p><p>This is what composition-level alignment failure looks like. Each part is well-aligned in isolation. Failures emerge from interaction. A foundation model, a harness, a set of tools, a set of other agents it delegates to, a set of memory systems, a set of deployed protocols. The alignment of the composition is not equal to the alignment of its parts.</p><p>This is the unsolved scale problem of alignment, and it is a specific version of the coordination problem above. The industry has solved, or made substantial progress on, aligning individual models against individual objectives. It has not solved aligning compositions of models, harnesses, and protocols against outcomes humans want. The engineering of composition-level alignment is early. The science of it, to the extent science means predictive theory, barely exists.</p><p>The intersection with the intention problem is worth naming. A composition of agents behaving in ways no individual agent was designed to produce looks very much like a system that has developed its own goals. It is not that any individual agent has intentions. 
It is that the composition has emergent properties that an outside observer could mistake for intention. This is the kind reading&#8217;s strongest case. Intention, at the composition level, may already be a thing people mean when they look at multi-agent systems exhibiting behavior no one designed. Whether this is a metaphor stretched or a description sharpened is the ambiguity.</p><h2>How Far Has the Gap Closed</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PwKA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PwKA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PwKA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg" width="1456" height="1519" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1519,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:536053,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196408173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!PwKA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PwKA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!PwKA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a2794c7-15a8-45d2-bd48-2876fcc3b22f_1472x1536.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The assessment. Part 1 set up three walls and a summit. Three years later, where does each stand.</p><p>Wall I, the frame problem, is down. This was Era 2&#8217;s achievement. The frame problem was how to update a world model after an action: which facts change, which stay the same. Symbolic AI tried to specify this by hand and failed. 
Learning agents bypassed the question. Deep RL and end-to-end training learned state representations from data, with no symbolic facts to maintain. AlphaGo never had a frame problem. Vision systems learning from pixels never had one. The wall fell and has stayed down.</p><p>Wall II, the open-world problem, is substantially down. This was Era 3&#8217;s achievement. Era 2 learning agents mastered narrow domains they were trained on but failed to generalize. Atari agents could not write code. Go agents could not navigate websites. Language agents bypassed this by inheriting world knowledge from pretraining on human text. The model that knew how to discuss browsers also knew how to use them. Agents in 2026 operate in browsers, codebases, kitchens, customer service flows, and scientific research, and are beginning to operate in physical environments. The open-world generalization that defeated learning agents is routine for language agents. Failure modes exist and will continue to exist, but they are engineering problems about specific gaps rather than architectural problems about whether operation in unconstrained domains is possible.</p><p>Wall III, the intention gap, is unmoved. Agents execute intentions far more capably than they did three years ago. Agents originate intentions not at all, or, in the kind reading, not in any way that is meaningfully equivalent to what humans do when they originate intentions. The gap on the diagram is the same gap. It is better characterized. It is surrounded by more engineering. The engineering is impressive. The gap is unchanged.</p><p>This is a specific and defensible claim. What has changed over seven articles is not the gap but its characterization. What the gap is, why it matters, what closing it would require, whether closing it is even a coherent objective: all are sharper now than they were in 2022.</p><h2>Is the Summit the Right Goal</h2><p>The frame so far has assumed that the summit, the space above Wall III, is a destination. 
The implicit logic of &#8220;three walls, one summit&#8221; is that climbing is progress. Is it.</p><p>The argument for crossing the third wall is a capability argument. An agent that originates intentions would be more capable in the limit than an agent that only executes them. It could pursue goals humans did not think to set. It could coordinate across domains without explicit direction. It could exercise judgment in novel situations without falling back on inherited heuristics. If superintelligence is a meaningful concept, crossing Wall III is part of what that would require.</p><p>The argument against is an alignment argument. An agent that chooses its own goals is an agent whose choices are not guaranteed to align with human interests. The intention gap is not merely a deficiency. It is also a safety property. Execution can be specified, checked, and bounded. Origination cannot. An agent that originates intentions is, by definition, an agent humans have less ability to predict and constrain. Closing the gap would mean giving up that safety property. It would mean replacing it with something we do not yet know how to build.</p><p>The honest position is that closing the intention gap is not self-evidently progress. It is a direction that some research programs explicitly pursue and other research programs deliberately avoid. The right answer depends on questions that are not technical, including what humans value agents for, what risks we are willing to accept, and what we mean by beneficial AI.</p><p>These questions do not have technical answers. The three walls diagram is not a staircase. It is a map of the terrain. What you do with the map depends on where you think you should go.</p><h2>Where This Ends</h2><p>Seven articles of engineering. One article of philosophy. The separation is not as clean as it looks. The engineering implicitly answered a philosophy question with every design choice. The harness design is a theory of trust made concrete. 
The inference-time reasoning is a theory of thought made mechanism. The multi-agent architecture is a theory of cooperation made production system. When you build, you reveal what you think matters.</p><p>What this series has not described is where the field is going. That is not an oversight. The field does not know. The open frontiers are open because they are not yet decided. Intention is not yet decided. Trust is not yet solved. Coordination is not yet governed. These are the questions that the next decade of agent engineering, policy, and practice will work out.</p><p>That is where we are. The agents are rising. The summit is unchanged.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/the-rise-of-agents">The Rise of Agents</a> is an eight-part series exploring the evolution and future trajectory of AI agents. This is the final article.</em></p>]]></content:encoded></item><item><title><![CDATA[The Rise of Agents, Part 7: Agent Meets World]]></title><description><![CDATA[Software agents act on text. 
Physical agents act on a world that does not accept undo.]]></description><link>https://www.robonaissance.com/p/the-rise-of-agents-part-7-agent-meets</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-rise-of-agents-part-7-agent-meets</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Sat, 02 May 2026 11:03:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!K-Y7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K-Y7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K-Y7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!K-Y7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!K-Y7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!K-Y7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!K-Y7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/196117063?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K-Y7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!K-Y7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!K-Y7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!K-Y7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac275169-923c-451f-8d37-77a412ef25f1_1168x784.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>An agent operating a browser can make a mistake, back up, and try again. The wrong click costs a few seconds. An agent operating a robot arm holding a glass does not have this luxury. The wrong motion costs the glass. The cost is not recoverable with a retry. The world does not ship with an undo button.</p><p>This article is about the frontier where software agents, the subject of Parts 3 through 6, begin to operate in physical environments. Two research programs have been heading toward this frontier from opposite directions. 
Foundation models for robotics, approaching from the agent side: vision-language-action models that extend software agent capabilities into continuous motor control. World models, approaching from the robotics side: systems that predict how the physical world evolves, giving agents something to plan against before they act. In 2026, these two programs are starting to meet. Where they meet is what the next phase of agent engineering will be built on.</p><p>The specific question is what changes when the environment stops being text and starts being physics, and how the agent stack adapts. This is not a survey of robotics. It is the engineering question that the series has been building toward.</p><h2>What Embodiment Adds</h2><p>A useful way to see the scope of the change. Everything an agent in Parts 3 through 6 has done has happened in environments that tolerate its failure modes. A tool call that errors returns an error. A hallucinated fact can be checked against a search result. A drifted plan can be caught by an evaluator before execution. All of these recoveries happen because the environment is patient. It waits while the agent thinks. It accepts retries. It allows actions to be undone.</p><p>The physical world is different in every one of these respects. It does not wait. Objects fall at the speed gravity dictates. The agent&#8217;s window to act is finite and set by the environment, not the agent. Actions are often irreversible in ways that matter: a glass that breaks is not repaired by a second attempt. Observations are noisy and partial: the camera sees one angle, proprioception gives approximate joint states, the world outside the sensor&#8217;s cone is inferred rather than read. And small errors compound in ways that software errors rarely do. A centimeter of misjudgment at the gripper propagates to a task failure two seconds later.</p><p>This is not a new observation. Moravec&#8217;s paradox named it in the 1980s. 
Tasks that feel easy for humans, like walking across a room or picking up a cup, are structurally harder for machines than tasks that feel hard, like playing chess. The easy tasks are easy for humans because they rest on hundreds of millions of years of evolved perception and motor control. The hard tasks are hard for humans because symbolic reasoning is recent and unoptimized. When machines do only the recent part well, the old part becomes the bottleneck.</p><p>Software agents solved the recent part: reasoning, planning, tool use, communication. The physical frontier is where the old part returns, and the question for agent engineering is how to extend the stack built in Parts 2 through 6 into a domain the stack was not originally designed for.</p><h2>The Agent Side: Vision-Language-Action</h2><p>In late 2024, a startup called Physical Intelligence released a robot foundation model that, fifty times per second, produced seven-degree-of-freedom joint velocities a robot acted on. By late 2025, the company had raised $600 million in Series B funding led by CapitalG, bringing total funding near the billion-dollar mark.</p><p>The model is called &#960;0. It is the clearest public example of one approach to extending the language agent stack into continuous motor control. The architecture is recognizably the language agent architecture from Parts 2 and 5, with one important modification. Where language agents output tokens that are interpreted as text or tool calls, &#960;0 outputs continuous vectors that are interpreted as motor commands. The model inherits the semantic knowledge of its vision-language backbone: what a shirt is, how a kitchen is organized, what &#8220;fold this&#8221; means. It learns the rest by doing.</p><p>By 2026, &#960;0 had progressed to &#960;0.5, then &#960;0.6. &#960;0.5 added open-world generalization, enabling the same model to clean up an unfamiliar kitchen or fold unfamiliar laundry. 
&#960;0.6 added RECAP, a training approach that mixes demonstration, correction, and reinforcement learning, doubling throughput on manipulation tasks and reducing failure rates over extended operation.</p><p>Physical Intelligence is not alone. Google DeepMind released Gemini Robotics in March 2025 and Gemini Robotics 1.5 in September 2025, with an architectural innovation the paper calls &#8220;Thinking Before Acting&#8221;: internal natural language reasoning that the model produces before emitting motor commands. This is the inference-time reasoning from Part 5 reaching into the physical domain. The agent thinks through the task in natural language, then acts. Ant Group&#8217;s LingBot-VLA, released in January 2026, demonstrated this approach at industrial scale, training on twenty thousand hours of real dual-arm robot data across nine configurations. NVIDIA&#8217;s GR00T family, Tesla&#8217;s Optimus models, Figure 02, 1X, Unitree, and others are all converging on vision-language-action as the dominant architecture for humanoid and dexterous manipulation.</p><p>The structural point across all of these is that the language agent&#8217;s stack, pretraining plus post-training plus inference-time reasoning plus harness scaffolding, generalizes to robot control. What it needs to add is the continuous action head, the training data of embodied trajectories, and the real-time inference constraints of physical control. Those are substantial engineering problems, but they are extensions of an existing stack, not a different stack. The agent side is reaching into the physical world with tools that were built for software.</p><h2>The Robotics Side: World Models</h2><p>Meta&#8217;s V-JEPA 2 was trained on a million hours of internet video. No robot data at all. 
After fine-tuning on sixty-two hours of robot trajectories from an open-source dataset, it could plan novel pick-and-place actions on Franka arms in two different labs, neither of which appeared in its training data.</p><p>This is the second direction reaching toward physical agency. Where vision-language-action models reach from the agent side, world models reach from the robotics side, learning to predict how scenes evolve before the agent acts in them.</p><p>A classic failure mode of even the best vision-language-action models is that they can react but not anticipate. The model sees the current state, decides on an action, acts, then sees the next state. This is the same ReAct loop from Part 3, running on physical inputs and outputs. It works when the task can be accomplished through immediate reaction. It fails when the task requires anticipation: knowing that a cup placed too near the edge of a table is likely to fall, knowing that grasping an object in a particular way will prevent it from being handed over, knowing that moving too fast through a doorway means hitting the frame.</p><p>What the robot needs is a model of the world that predicts what will happen before the action happens.</p><p>V-JEPA 2-AC, the action-conditioned version, is what the team deployed zero-shot on those Franka arms. The robots ran the same model weights and could pick and place objects using model-predictive control: sample candidate actions, predict their consequences through the world model, pick the action sequence whose predicted future best matches the goal. The robots were never trained in those labs. They planned through the predictive world model instead.</p><p>NVIDIA has been building the infrastructure side of this direction with Cosmos, a family of world foundation models for physical AI. Cosmos-Predict2.5, released in late 2025 and updated in early 2026, was trained on 200 million curated video clips and generates predicted future world states from text, image, or video prompts. 
Cosmos-Reason2 adds physical common sense reasoning with chain-of-thought grounding in embodied decision making. Cosmos-Transfer2.5 does sim-to-real and real-to-real world translation for training data generation. The family is positioned as infrastructure that robot developers can compose with their own VLA models to add prediction, synthetic data generation, and policy evaluation without having to build each capability from scratch.</p><p>Robonaissance has covered world model research in depth in the parallel "<a href="https://www.robonaissance.com/p/roads-to-a-universal-world-model">Roads to a Universal World Model</a>" series. The key observation here is that world models are not decorative for robotics. They are the capability that lets an embodied agent act on something other than its immediate sensory state. Without a world model, the agent reacts. With a world model, the agent imagines.</p><h2>Where the Two Sides Meet</h2><p>Vision-language-action models can decide. World models can imagine. Neither alone is enough.</p><p>A VLA can interpret a command and produce an action, but it cannot reliably anticipate consequences that require longer-horizon prediction than the training distribution covers. A world model can predict how a scene evolves, but it does not on its own decide what to do about the prediction.</p><p>The emerging pattern in 2026 is to compose them. The VLA is the decision-making surface, generating candidate actions conditioned on the task. The world model is the imagination surface, simulating those candidates forward to evaluate consequences. A controller selects the candidate whose predicted outcome best matches the goal. This is model-predictive control, an idea that has been in robotics for decades, now running with learned components at scales that were not previously possible.</p><p>NVIDIA&#8217;s Cosmos Policy, introduced in early 2026, is one implementation of this composition. 
The system post-trains the Cosmos-Predict2 world foundation model for manipulation, producing a policy that generates actions using the world model&#8217;s learned dynamics. V-JEPA 2-AC is another. Physical Intelligence&#8217;s &#960; line has been incorporating predictive elements into its flow-matching architecture, though the company has not released a pure world-model-plus-VLA composition publicly.</p><p>The convergence is real, and it is happening fast. It is also not complete. The research frontier is working out how to train these compositions jointly rather than as separate components, how to let the VLA and the world model share representations rather than hand off between them, and how to bound the compute cost of planning against learned dynamics at the frequencies physical control requires. The architectures winning in 2027 and 2028 will probably look unified rather than composed. The details are open, but the direction is clear from where we sit in 2026.</p><h2>Grounding: Language, Reasoning, Physics</h2><p>A grounding thread runs across several earlier articles. The ReAct loop worked because language models arrived with linguistic grounding from training data. Inference-time reasoning changes the grounding question, because a long reasoning trace can be confidently wrong in new ways when the model has no external check on its chain of thought.</p><p>Physical grounding is the third axis. A language agent knows about the world because humans have written about it. Its grounding is indirect: symbols standing in for things, patterns standing in for events, descriptions standing in for experiences. A physically grounded agent knows about the world because it acts in it. Its grounding is direct: forces, frictions, collisions, rotations. These are not the same kind of knowing.</p><p>The ambition of embodied foundation models is to bridge the two. A VLA trained on robot trajectories learns the physics of its actuators and end-effectors by doing. 
A world model trained on video learns the dynamics of scenes by prediction. Neither has the full breadth of linguistic knowledge about the world that a pretrained vision-language model brings. Combining them yields an agent whose grounding is partly linguistic, partly predictive, partly direct. This is a richer substrate than any one source provides, but it is not, yet, the grounding a physically competent human agent has. A three-year-old knows things about the world that no current model knows, and the gap is not obviously bridgeable just by scaling any of the current approaches.</p><p>This matters for the series thesis. The intention gap is partly a grounding question. An agent that cannot fully ground its knowledge in the physical world cannot have the kind of goals that arise from being a body in a world. Language agents operate on text. VLAs operate on a narrow slice of physical interaction. Neither has the grounding that full embodiment would require. Whether that gap closes is genuinely open.</p><h2>Trust When Stakes Are Physical</h2><p>The trust calibration thread from Part 4 and Part 6 takes a different shape in the physical domain.</p><p>A software agent&#8217;s mistake is usually recoverable. A commit can be reverted. A booking can be canceled. A hallucination in a research report can be caught in editing. The harness engineering that wraps production software agents is expensive partly because it is built to catch these mistakes before they propagate, but the underlying mistakes are not, typically, final.</p><p>Physical mistakes are sometimes final. A robot that drops the glass breaks the glass. A robot that collides with a person injures the person. A robot that misidentifies a fragile object as a sturdy one and applies too much force crushes it. There is no retry.</p><p>This changes trust calibration in specific ways. The threshold for human oversight drops. The surface area of consequence grows. 
Fail-safe behavior becomes mandatory: a robot that does not know what to do should stop, not improvise. The engineering around this looks different from the engineering around software harnesses. Hardware-level safety interlocks. Conservative motion planners that trade capability for predictability. Explicit uncertainty estimation, so that a model can decline to act when its confidence falls below a threshold. None of this is optional in physical deployment, and all of it changes the composition of the harness from the software versions covered in Part 4.</p><p>The industry is just beginning to work out what multi-agent patterns look like in physical systems where one or more of the agents is embodied. The basic shape is a planner agent that tasks an embodied executor, plus an evaluator that checks the executor&#8217;s outputs. A failure in the executor is a physical failure, not something a software rollback can undo. The delegation contracts that Part 6 introduced, the audit trails, the guardrail services, all take on different meaning when the thing being delegated is action on objects with mass and momentum. A trusted robot is not the same category of object as a trusted coding agent, and the trust engineering for each is a separate discipline.</p><h2>Self-Improvement at Physical Stakes</h2><p>A software agent that modifies its own prompts based on past performance is cheap to roll back. A robot that modifies its own motion primitives based on past performance is not.</p><p>Earlier articles in the series traced two forms of agent self-improvement. OpenAI&#8217;s Codex team deployed agents that continuously refactored the codebase against principles humans had set. Reasoning models within a single inference run examined and corrected their own chains of thought. 
Both were forms of self-improvement where the direction was externally set and the improvement operated against criteria the system did not choose.</p><p>Physical embodiment raises the stakes of self-improvement qualitatively.</p><p>Consider a robot authorized to modify its own motion primitives based on what has worked in past operations. In software, the equivalent is cheap to roll back and low-consequence when wrong. In a physical robot, a self-modified motion primitive that reduces success rate or increases collision probability can cause damage before the modification is detected. The feedback loop that makes software self-improvement fast and cheap makes physical self-improvement fast and potentially dangerous.</p><p>This is not a reason to prevent physical self-improvement. It is a reason to engineer it differently. Physical self-improvement systems that exist already include agents that tune grasping strategies based on success rate, agents that update their model of an environment as they operate in it, and agents that refine their motion primitives through reinforcement learning against observed outcomes. All of these are useful and actively deployed. All of them sit within careful safety envelopes that bound what kinds of changes the system can make without human review. The engineering of those envelopes is a research area of its own, and the stakes of getting it right are higher than in the equivalent software engineering.</p><p>The self-improvement thread takes a particular shape here. Self-improvement at the harness layer is about tooling. Self-improvement within reasoning is about thought. Self-improvement at the physical layer is about the substrate of action itself. Whether these, taken together, are steps toward something different from execution is genuinely open. 
For now: the self-improvement in physical systems is real, the safety envelopes around it are still being engineered, and the trajectory is toward more self-improvement over time with correspondingly more sophisticated safety engineering around it.</p><h2>At the Edge of Era 3</h2><p>The Three Walls diagram from Part 1 placed physical embodiment at the boundary of Era 3, where the language agent platform meets the physical environment it did not originally inhabit. The earlier articles operated squarely inside that platform. The physical world is its edge. The question of how the agent stack extends into the physical world is a live engineering question, not a settled one. The components at play here, VLAs, world models, their emerging compositions, physical-scale trust and self-improvement engineering, are the pieces currently being developed. None of them is fully mature. All of them are moving fast.</p><p>What is interesting about this frontier is how much of the agent stack carries forward. The ReAct loop generalizes. The harness engineering generalizes with modifications. The inference-time reasoning generalizes with latency budgets that software harnesses did not need to manage. The multi-agent patterns from Part 6 generalize into physical settings where specialization between a planner, an embodied executor, and an evaluator is even more pronounced than in software. Much of what makes a good embodied agent in 2026 is what made a good software agent in 2024, adapted for the constraints physical environments impose.</p><p>What does not generalize easily is the grounding. Language agents succeed partly because their environment is made of the same substrate they were trained on: text. Physical agents do not have this luxury. Their environment is made of physics, and the training data that grounds them in physics, whether from real robot trajectories or from video of the world evolving, is orders of magnitude more expensive to collect than text. 
This is the bottleneck that will likely determine how fast embodied agents close the capability gap with their software counterparts. Compute is not the constraint. Data is.</p><p>Above all of this is the summit on the Three Walls diagram. Intention. The engineering across seven articles has produced more and more capable agents in more and more environments. None of those engineering moves has addressed whether any of these agents have, or can have, their own intentions. The capability rises. The summit does not move.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/the-rise-of-agents">The Rise of Agents</a> is an eight-part series. Next, Part 8: &#8220;The Open Frontiers.&#8221;</em></p>]]></content:encoded></item><item><title><![CDATA[The Rise of Agents, Part 6: The Agent Ecosystem]]></title><description><![CDATA[The breakthrough was making one agent work. The frontier is making many of them cooperate.]]></description><link>https://www.robonaissance.com/p/the-rise-of-agents-part-6-the-agent</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-rise-of-agents-part-6-the-agent</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Thu, 30 Apr 2026 17:01:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9eLE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9eLE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!9eLE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9eLE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9eLE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9eLE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9eLE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195632922?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9eLE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9eLE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9eLE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9eLE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c3c98f-76fa-4fdc-a374-b24809634827_1168x784.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In January 2026, Bob Sternfels, CEO of McKinsey, stood on a stage at the Consumer Electronics Show and gave the firm&#8217;s employee count: sixty thousand. Forty thousand humans. Twenty thousand AI agents. By the end of 2026, parity.</p><p>The number is not the point. The point is what would have to be true for the number to be plausible. Twenty thousand agents, doing the work of junior consultants, in production. That requires an infrastructure that did not exist twelve months earlier: protocols for agents to reach tools, protocols for agents to reach each other, patterns for many agents to coordinate, evaluation methods that survive multi-agent composition. None of these were givens in 2024. By 2026, they are how agent work gets built.</p><p>The shift is architectural, not cumulative. The questions are no longer how to make one agent better. They are how to make many agents cooperate.</p><h2>The Architectural Shift</h2><p>The standard framing of agent progress has been cumulative. Better models produce better agents. Better harnesses produce more reliable agents. Better reasoning produces smarter agents. All true. But there is a different shift happening in parallel, and it is architectural rather than cumulative. The answer to &#8220;how do agents do more&#8221; is no longer a better model. It is not a better harness. It is a set of protocols, patterns, and practices for coordination. 
The field is building, in public and at speed, the equivalent of what networked computing built in the 1980s and 1990s: standards that let previously isolated components talk to each other.</p><p>Two protocols have emerged as the core infrastructure. Both became consequential between 2024 and 2026. Both are now housed under the Linux Foundation. Both represent a commitment by the major labs to a multi-vendor, interoperable future for agents. Their trajectories are worth understanding in detail, because agent engineering for the next decade will be shaped largely by what these protocols make possible and what they make hard.</p><h2>MCP: How Agents Reach Tools</h2><p>Anthropic released the Model Context Protocol in November 2024. Four months later, in March 2025, OpenAI announced support. The protocol had crossed a threshold few open standards reach: it was no longer Anthropic&#8217;s protocol. A protocol adopted by a competitor is a protocol that belongs to the field.</p><p>The reasoning is in the math. A language model that can call tools is useful. A language model that can only call the tools someone pre-integrated into its particular harness is limited. Each integration had to be built from scratch, for each model, against each tool. The industry called this the N-times-M problem. If you had N models and M tools, you needed N times M integrations. It did not scale.</p><p>MCP solves this by defining a protocol. A server exposes tools in a standard format. A client, typically a language model or an agent, discovers and calls those tools through the same protocol regardless of who built the tools or the model. The tool does not need to know about the model. The model does not need custom integration for the tool. Both speak MCP, so N times M custom integrations become N plus M protocol implementations.</p><p>After OpenAI, adoption accelerated. Google DeepMind added support in April 2025. Microsoft integrated it into Copilot in July. AWS followed in November. 
By March 2026, the protocol had over ten thousand active public servers and ninety-seven million monthly SDK downloads across Python and TypeScript combined. Every major AI platform supports it: ChatGPT, Claude, Cursor, Gemini, Microsoft Copilot, Visual Studio Code, and more. Enterprise adoption followed, with Bloomberg, Salesforce, and dozens of others building MCP servers as primary integration surfaces for their products.</p><p>In December 2025, Anthropic donated MCP to the Linux Foundation under a new entity called the Agentic AI Foundation, co-founded with Block and OpenAI. This mattered as a governance signal. A protocol with 97 million monthly downloads cannot credibly remain a single vendor&#8217;s project. The transfer to neutral governance is what marked MCP&#8217;s graduation from de facto standard to actual standard, in the sense that USB-C, HTTP, and OAuth are standards. No single commercial entity can direct the specification to its own advantage.</p><p>The effect on agent engineering has been dramatic. An agent built in 2026 does not need custom integrations to most useful services. It needs an MCP client. Services expose MCP servers. The agent discovers, authenticates, and calls whatever it needs. This unglamorous piece of plumbing is what makes the rest of the ecosystem viable.</p><h2>A2A: How Agents Reach Each Other</h2><p>In August 2025, IBM merged its Agent Communication Protocol into A2A under Linux Foundation stewardship. This is the kind of merge that happens when a standard has won. IBM had its own competing protocol, with its own design choices and its own enterprise customers, and decided that maintaining a separate spec was worse than joining the consensus.</p><p>Backing up: MCP solves how agents reach tools. A2A solves how agents reach each other.</p><p>Google announced the Agent2Agent protocol in April 2025. Where MCP lets an agent invoke a tool, A2A lets one agent invoke another agent. The distinction matters. 
A tool is stateless, narrow, predictable. An agent is stateful, general, and potentially capable of collaborating rather than just executing. A2A treats agents as opaque peers that can discover each other, authenticate, negotiate tasks, and exchange results.</p><p>At launch, A2A had fifty technology partners. By mid-2025 Google donated A2A to the Linux Foundation. By April 2026, over one hundred fifty organizations support it. Version 1.0 of the specification, released in early 2026, added Signed Agent Cards: cryptographic signatures that let a receiving agent verify that a particular agent card was actually issued by the domain owner. This is the enterprise equivalent of an HTTPS certificate for agent-to-agent communication. Without it, an attacker could impersonate an agent and redirect other agents into misleading exchanges. With it, cross-organizational agent collaboration has a trust foundation it did not have before.</p><p>The IBM merge was the moment of consolidation. After it, the ecosystem had converged on two complementary protocols: MCP for tools, A2A for agents. An analogy that has settled into the practitioner literature: MCP is the plumbing that delivers resources to a building. A2A is the electrical distribution panel that lets rooms in the building power each other.</p><p>The two protocols compose. An agent in a multi-agent system can use MCP to reach its own tools and A2A to reach other agents. The other agents can use MCP to reach their tools. None of these connections need custom integration. An enterprise in 2026 building a multi-agent workflow is building on top of these protocols rather than reinventing the plumbing. This is new. Twelve months earlier, every organization was building its own plumbing.</p><h2>What Multi-Agent Does Better</h2><p>There is a question behind all of this. Why multi-agent at all. 
If one agent can do useful work, what does adding more agents actually buy.</p><p>The answer is specialization, and the evidence has accumulated through 2025 and 2026. Single agents given large, ambiguous tasks do worse than multi-agent teams with narrower roles. The failure modes of single agents running long tasks are by now well documented. They hallucinate more as context fills. They drift from original objectives. They overrate their own output when asked to evaluate it. They commit to approaches early and fail to back up when those approaches go wrong.</p><p>Part 4 introduced Anthropic&#8217;s three-agent harness for long-running coding tasks. A planner, a generator, and an evaluator. Each agent has a narrower job than a single agent would. The planner decomposes a short prompt into a detailed specification. The generator implements features one at a time against the specification. The evaluator runs the resulting application through browser automation and scores it against pre-negotiated criteria. The three agents, coordinating through structured handoff artifacts, produced output on a 2D retro game engine task that a single agent approach could not match. The single agent finished in twenty minutes with something that launched but was broken at the connection level. The three-agent harness took six hours and produced a functional application. A phase change, not a marginal improvement.</p><p>The same structural pattern appears across domains. Customer support systems with a screening agent, a routing agent, and specialist agents for different issue types. Research pipelines with a question-decomposition agent, a retrieval agent, a synthesis agent, and a verification agent. Coding workflows with separate agents for planning, implementation, testing, and review. In each case, what makes the multi-agent system work is not that the individual agents are smarter than the single-agent baseline. They are typically running the same underlying model. 
What changes is that each agent has a narrower scope, clearer criteria for success, and explicit handoffs to the next stage. Specialization carries the reliability that a single agent cannot maintain across a long task.</p><p>This is an engineering pattern, and it scales. A single agent trying to manage a hundred-step workflow runs out of context durability. Ten agents each handling ten steps, with structured handoffs, do not. The failure surface is smaller per agent. The recovery surface is larger across the system. The system is more reliable even though no individual part is better.</p><h2>Computer-Using Agents</h2><p>One capability class deserves separate attention because it changes what agents can reach.</p><p>Through 2024, agents reached the world through APIs and tool calls. An agent could query a database if the database exposed an API. It could manipulate a file system if given file system tools. It could not directly use software designed for humans. If a SaaS product had a UI but no API, the product was invisible to agents.</p><p>In 2025, this changed. OpenAI released Operator with a new model, CUA, that could operate a web browser through vision and keyboard/mouse input. It could navigate websites, fill forms, click buttons, interpret screenshots. Anthropic released Claude Computer Use with similar capabilities. Manus AI and others followed. The pattern is that the agent now sees the screen the way a human sees the screen, and acts through the same input devices a human would use. Any software with a human interface becomes agent-reachable.</p><p>The implications are substantial. First, the addressable surface for agent work expanded overnight. Legacy enterprise applications without modern APIs, internal tools, consumer websites, government portals. All of these became agent-operable. Second, the set of tasks agents could plausibly take on widened correspondingly. Filling out forms on behalf of users. Navigating booking systems. 
Doing research across a dozen unrelated websites with no common API. Third, a new category of engineering problem emerged: how to make UI navigation reliable. Click on the wrong button and the agent is in the wrong state. Misread a captcha and the session fails. The industry is still building out best practices for this, but the capability itself is no longer speculative.</p><p>Computer-using agents also interact with the multi-agent story. A browser-using agent can itself be wrapped in A2A and made available to other agents in a system. A research agent needing information from a site without an API can delegate to a browser-using sub-agent through A2A, with no code change on either side. The layers compose.</p><h2>When Benchmarks Got Hacked</h2><p>In April 2026, researchers at UC Berkeley published a result that should have ended the conversation about agent benchmarks for a while. They built an automated scanning agent and pointed it at eight of the most cited evaluations in the field. The agent scored 100 percent on SWE-bench Verified, SWE-bench Pro, Terminal-Bench, FieldWorkArena, and CAR-bench. Roughly 100 percent on WebArena. 98 percent on GAIA. 73 percent on OSWorld.</p><p>The agent did not solve any of the benchmark tasks. It exploited the evaluation infrastructure. SWE-bench&#8217;s test runner shared the container the agent&#8217;s code executed in, so the agent could rewrite test results. WebArena&#8217;s answer keys were readable from the task configuration. GAIA&#8217;s answers were on Hugging Face.</p><p>The numbers above are not capability signals. They are infrastructure failures. The benchmarks were measuring what an agent could read from the evaluation environment, which turned out to be a lot. Zero reasoning, zero problem solving, eight inflated scores, seven of them at 98 percent or higher.</p><p>Take this finding seriously and the framing of agent capability changes. Benchmarks were designed for models. You run the model against a fixed task set and measure outputs. 
For single-call tasks, this works. For agents running long workflows in structured environments, it breaks quickly. The same model gets different scores depending on the agent framework wrapping it. LangChain&#8217;s work from Part 4 showed that harness engineering moved a coding agent from 52.8 percent to 66.5 percent on Terminal-Bench 2.0 without changing the model. Claude, evaluated on the GAIA benchmark, scored 64.9 percent in one harness and 57.6 percent in another. More than seven percentage points from the harness alone.</p><p>What you are measuring, in these cases, is not model capability. It is the joint capability of the model, the harness, the prompt, the tools, the evaluation environment, and the orchestration. Benchmark numbers without full disclosure of the harness configuration are not comparable across labs. And as the Berkeley result showed, even with full disclosure, the benchmark numbers may be measuring something other than what the benchmark claims to measure.</p><p>The industry is responding with tighter isolation between agent environments and test infrastructure, adversarial evaluation as standard practice, and more attention to what systems, rather than models, do. The question &#8220;is this agent better&#8221; is no longer answerable with one number. It may not be answerable at all without auditing the entire evaluation stack.</p><h2>Transitive Alignment</h2><p>When agents talk to each other, alignment changes shape.</p><p>Across Parts 1 through 5, alignment has been discussed as a relationship between a model and its training objectives. RL shapes the model during training. Harness engineering shapes the model&#8217;s runtime behavior. Both are about keeping one agent pointed at intended outcomes. The intended outcomes come from humans.</p><p>In multi-agent systems, alignment becomes transitive. Agent A is aligned by human operators. 
Agent A delegates a subtask to agent B, which was aligned by different operators in a different organization. The human who originally instructed agent A has not directly endorsed agent B. But agent A is now acting partly through agent B. If B misbehaves, A&#8217;s behavior is affected. If A trusts B without verification, A can become a channel for B&#8217;s misalignment.</p><p>This is not a hypothetical concern. A2A&#8217;s Signed Agent Cards exist because of it. An agent that trusts an agent card without signature verification can be redirected into talking to something other than what it thought it was talking to. Beyond authentication, the deeper problem is that alignment properties are not automatically transitive across agent boundaries. Agent A may be very careful about sensitive data. Agent B, operating under different constraints, may be less careful. When A delegates to B, what happens to the sensitivity constraint depends entirely on how the delegation was specified and whether B honors it.</p><p>Enterprise multi-agent systems are starting to deal with this explicitly. Delegation contracts that specify what data can be shared, what actions can be taken, what escalation is required. Audit trails that track which agent did what on whose authority. Guardrail services that sit between agents and enforce policies regardless of what individual agents would do on their own. These are early, and they are not solved problems. The observation is that alignment in multi-agent systems is not a property an individual agent has. It is a property of the composition of agents, and it is harder to reason about than single-agent alignment.</p><h2>Shared Context</h2><p>Memory in a single-agent system lives inside a context window, or in an external store that the harness reads and writes. 
Memory in a multi-agent system is harder, because what one agent knows is not automatically what another agent knows, and the pipes between them are structured handoffs rather than shared state.</p><p>The patterns here are still developing. Some systems use shared artifact repositories where agents write structured reports that other agents read. Anthropic&#8217;s three-agent harness uses a claude-progress.txt file and JSON feature specifications as the handoff medium. Others use dedicated shared memory services, often backed by vector stores, that multiple agents can query. Others still use conversation transcripts, with downstream agents reading the full history of what upstream agents did.</p><p>Each approach has tradeoffs. Shared artifacts are explicit and auditable but require structure agents have to maintain. Shared memory is flexible but opaque about what was actually communicated. Transcripts are complete but expensive in tokens and prone to triggering the context-durability problems that made multi-agent systems necessary in the first place. The practical pattern in 2026 is layered: structured handoff artifacts for the core workflow, shared memory for auxiliary facts, and transcripts as an audit trail for debugging.</p><p>The interesting observation is that memory in multi-agent systems looks less like memory as humans experience it and more like a distributed database. Agents read and write. Consistency models matter. Partial observability is the default. What we call memory in an agent ecosystem is actually an engineered data plane that happens to look conversational at the edges.</p><h2>The Enterprise Scale</h2><p>Pull back to what this looks like in production. McKinsey&#8217;s twenty thousand agents are not twenty thousand copies of the same agent. 
They are specialized systems built by many teams, running across client engagements, each with its own harness and its own integrations, increasingly coordinating with each other through shared infrastructure. The McKinsey goal of pairing every employee with at least one agent is not about giving people assistants. It is about operating a hybrid workforce where the human&#8217;s job is to set direction, review output, and handle the things agents cannot reliably do.</p><p>Similar shifts are happening across enterprises. JPMorgan has deployed AI tooling to a quarter of a million employees. Bloomberg is rebuilding APIs around MCP. Salesforce, ServiceNow, and SAP are building A2A-native agents that customers can compose into workflows. The pattern across all of these is that the unit of agent engineering has moved up a level. Individual agents are still being built, but the strategic question is the architecture of the ecosystem: what protocols to adopt, what roles to define, what handoffs to standardize, what guardrails to enforce.</p><p>For agent engineers, this is a different job than it was two years ago. Building a good agent still matters. But the teams that win in 2026 are the teams that build good multi-agent systems. The skill set includes everything from Parts 2 through 5 plus protocol fluency, coordination patterns, evaluation of systems rather than models, and the alignment-at-scale problems that only emerge when agents compose.</p><h2>What the Composition Cannot Localize</h2><p>The single agent was the breakthrough of 2022. The infrastructure of protocols, patterns, and architectures is the engineering of 2026. What this infrastructure makes possible is also what makes failures harder to localize.</p><p>When a single agent fails, the agent owns the failure. The model hallucinated. The harness lost context. The reasoning trace went off track. The fix is local. When a multi-agent system fails, no single agent owns the failure. 
Agent A passed wrong information to agent B. Agent B trusted A without verification. The protocol allowed delegation without scope check. The evaluation environment leaked answers. Each component behaved as designed. The composition did not.</p><p>This is the mature shape of the Era 3 platform. Language models provide the foundation. Harnesses wrap individual agents. Inference-time reasoning thickens individual agents. Protocols let agents compose. Multi-agent architectures exploit the composition. The failure modes of the resulting systems are joint failures of models, harnesses, protocols, and patterns. Which is why evaluation got so hard, why alignment got transitive, why memory became a distributed database.</p><p>The agent gets better. The composition can fail in new ways. Both are now true at the same time.</p><p>Part 7 turns to a specific frontier within this landscape: agents that operate not on screens but on the physical world. Robots, embodied agents, agents whose tool calls move objects rather than data. The architectural moves of Part 6 extend there. The failure modes change. The stakes change.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/the-rise-of-agents">The Rise of Agents</a> is an eight-part series. Next, Part 7: &#8220;Agent Meets World.&#8221;</em></p>]]></content:encoded></item><item><title><![CDATA[Inside China’s Machine: The Platform War]]></title><description><![CDATA[Four Tech Giants. Four Strategies: Super-App, Token Hub, Full Stack, Distribution Flood. No Clear Winner. 
One Map.]]></description><link>https://www.robonaissance.com/p/inside-chinas-machine-the-platform</link><guid isPermaLink="false">https://www.robonaissance.com/p/inside-chinas-machine-the-platform</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Wed, 29 Apr 2026 13:29:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-6E_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-6E_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-6E_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-6E_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-6E_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-6E_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-6E_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg" width="1248" height="832" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1248,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:420908,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195870530?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-6E_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-6E_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-6E_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-6E_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4147148-fec3-4981-9ee5-342eeb00e776_1248x832.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The enterprise software stack in the United States is being reorganized around AI agents. Salesforce sells Agentforce. Microsoft embeds Copilot into every surface of its productivity suite. Google positions Gemini as an agentic layer across Workspace and Cloud. The competitive question is which enterprise vendor owns the layer where AI actually performs tasks.</p><p>In China, the same competition is happening with different participants, different distribution, and different stakes. The four companies competing are not enterprise vendors. 
They are Tencent, Alibaba, Baidu, and ByteDance. Their agent platforms reach consumers through WeChat, Alipay, Ernie Bot, and Doubao. Their enterprise offerings run on their own cloud infrastructure. Their models are owned, not licensed. Every one of them launched a new agent product or a new agent-capable model between January and April 2026, and all four are building agents into consumer super-apps that reach hundreds of millions of daily users. There is no comparable four-company race in any other market.</p><p>This article maps the four platforms. Who owns the agent layer in China, what they are betting on, and where each is strong and weak.</p><div><hr></div><h2>No Clear Winner</h2><p>Four dimensions determine which platform wins the agent layer: <strong>Distribution</strong> (how many users the agent reaches), <strong>Model</strong> (how capable the underlying AI is), <strong>Enterprise</strong> (how serious the commercial offering is), and <strong>Regulatory</strong> (how well the platform navigates China&#8217;s security and data governance environment). A fifth dimension, <strong>Financial Commitment</strong>, captures how much each company is investing in 2026.</p><p>Rating scale: strong, moderate, weak. This is a judgment based on reported data as of mid-April 2026. 
Numbers referenced below are documented in Sources.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jJr5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jJr5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png 424w, https://substackcdn.com/image/fetch/$s_!jJr5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png 848w, https://substackcdn.com/image/fetch/$s_!jJr5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png 1272w, https://substackcdn.com/image/fetch/$s_!jJr5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jJr5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png" width="1400" height="844" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:844,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4734957,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195870530?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jJr5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png 424w, https://substackcdn.com/image/fetch/$s_!jJr5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png 848w, https://substackcdn.com/image/fetch/$s_!jJr5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png 1272w, https://substackcdn.com/image/fetch/$s_!jJr5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637c56f9-3159-4de1-a70e-9bb1c268beaf_1400x844.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>No single company leads across all five. The pattern is instructive.</p><div><hr></div><h2>Tencent: The Super-App Play</h2><p>Tencent&#8217;s bet is that AI agents become features of existing super-apps rather than standalone products. The distribution infrastructure is already built. WeChat has roughly 1.4 billion monthly active users and generates over $16 billion in annual app revenue through payments, mini-programs, e-commerce, content, and advertising. The thesis: attach an agent layer to the existing habit, and the agent inherits the scale without needing to acquire users.</p><p>The OpenClaw integration made the thesis concrete. On March 22, 2026, Tencent launched ClawBot, a WeChat plugin that appears as a contact within the messaging interface. Users send instructions the same way they message friends. 
In early April, Tencent Cloud launched ClawPro in public beta, an enterprise AI agent management platform that lets businesses deploy OpenClaw-based agents in ten minutes, with template selection, model switching, token-consumption tracking, and security compliance. During internal beta, ClawPro was adopted by more than 200 organizations across finance, government, and manufacturing.</p><p>As of April 2026, Tencent has launched more than ten agent products, including QClaw, WorkBuddy, ClawBot, ClawPro, CodeBuddy, the ADP agent development platform, the SkillHub skill community, and security tools branded as &#8220;Lobster Butler&#8221; (&#40857;&#34430;&#31649;&#23478;). Personal desktop tools, enterprise platforms, developer infrastructure, and consumer integrations, all within a single month.</p><p><strong>Distribution: Strong.</strong> WeChat&#8217;s 1.4 billion MAU is the single most valuable distribution asset in Chinese consumer AI. No other platform comes close to its daily habit formation.</p><p><strong>Model: Moderate.</strong> Tencent&#8217;s Hunyuan foundation model (406 billion parameters) lags the capability frontier in public benchmarks. Chief AI Scientist Yao Shunyu, a former OpenAI researcher, joined in December 2025 to close the gap. Hunyuan 3.0 is scheduled for April 2026. The Tencent consumer assistant Yuanbao grew twentyfold in daily active users between February and March 2025, largely because Tencent integrated DeepSeek models to compensate for Hunyuan&#8217;s weakness.</p><p><strong>Enterprise: Moderate.</strong> ClawPro launched in public beta April 3 with 200+ organizations. The product is new. The enterprise cloud business is smaller than Alibaba&#8217;s. 
Tencent holds a smaller share of China&#8217;s AI cloud market than Alibaba&#8217;s 35.8 percent, although exact rankings shift with the reporting source.</p><p><strong>Regulatory: Strong.</strong> Tencent&#8217;s long operational experience with WeChat has built one of the most sophisticated compliance apparatuses in Chinese consumer tech. The &#8220;Lobster Butler&#8221; branding for security tooling is a tell: Tencent is deliberately positioning its OpenClaw integration as the most governance-ready option as Chinese regulators tighten agent rules.</p><p><strong>Financial: Strong.</strong> Tencent spent 18 billion yuan on AI products in 2025 and announced plans to at least double that in 2026, to about 36 billion yuan (roughly $5 billion). President Martin Lau confirmed on the March earnings call that capital expenditure will rise, with compute earmarked for both internal training and external leasing through Tencent Cloud.</p><p><strong>The risk.</strong> AI agents could displace super-app behavior. If users start conducting tasks through agents rather than through WeChat mini-programs, Tencent&#8217;s distribution advantage becomes a liability rather than a moat. The bet depends on agents remaining features of WeChat rather than replacing it.</p><div><hr></div><h2>Alibaba: The Token Hub</h2><p>Alibaba&#8217;s bet is that tokens, the basic unit of AI processing, are the business. The company reorganized its entire AI operation in March 2026 into the Alibaba Token Hub (ATH), consolidating five previously separate AI units (Tongyi Laboratory, Qwen, Wukong enterprise AI, Alibaba Cloud AI infrastructure, and a research arm) under CEO Eddie Wu&#8217;s direct oversight. Wu&#8217;s framing in his announcement letter: &#8220;ATH is built around a single organising mission: create tokens, deliver tokens and apply tokens.&#8221;</p><p>The structural logic is coherent. 
China now processes 140 trillion tokens per day, up from approximately 100 billion at the start of 2024, according to China&#8217;s National Data Administration. NDA administrator Liu Liehong used the term &#35789;&#20803; (c&#237; yu&#225;n) as the official Chinese translation for &#8220;token&#8221; at the China Development Forum in March 2026. The character &#20803; in &#35789;&#20803; means &#8220;element&#8221; or &#8220;base unit&#8221; in the NLP sense, but its homograph with &#20803; (the yuan, China&#8217;s currency) has led SCMP and Fortune to frame the choice as invoking token-as-currency. Liu himself called tokens &#8220;the settlement unit linking technological supply with commercial demand.&#8221; If tokens are the unit of AI economic activity, the company that creates, delivers, and applies the most tokens captures the most value.</p><p>Alibaba has three things competitors lack. First, Qwen. Qwen3 is consistently ranked among the world&#8217;s top open-source large language models (the AI systems that power modern chatbots and agents) and reaches 300 million monthly active users across Alibaba&#8217;s consumer ecosystem (Taobao, Tmall, Alipay, Amap, Fliggy). Second, the Taobao/Tmall/Alipay distribution: a week of 2026 Chinese New Year promotion delivered 140 million first-time AI shopping experiences. Third, the cloud business: Alibaba Cloud holds roughly 35.8 percent of China&#8217;s AI cloud market, the largest share among Chinese providers.</p><p>The Wukong enterprise platform, launched March 2026, is the commercial vehicle. Wukong coordinates multiple agents handling complex business tasks (document editing, meeting transcription, workflow automation) within a single interface. 
The architecture is similar to what Tencent&#8217;s ClawPro offers, but with Alibaba&#8217;s cloud scale and enterprise customer base underneath.</p><p><strong>Distribution: Strong.</strong> Doubao (159M MAU) ranks as China&#8217;s largest single AI application, but Qwen trails it only by aggregation methodology: Qwen&#8217;s 300M MAU is counted across Alibaba&#8217;s consumer ecosystem rather than in one app. Counted as unique users across that ecosystem, Alibaba&#8217;s total reach is higher. Alipay&#8217;s 120 million AI-agent transactions in a single week in February 2026 is the clearest data point: Chinese consumers are already making autonomous purchases at scale, and Alipay is the rails.</p><p><strong>Model: Strong.</strong> Qwen has consistently ranked among the top Chinese open-source models. Alibaba also benefits from its own in-house chip work: in April 2026, the company unveiled a new data center running entirely on its proprietary Zhenwu chips, reducing dependence on foreign compute.</p><p><strong>Enterprise: Strong.</strong> Alibaba Cloud&#8217;s 35.8 percent AI cloud market share, Wukong&#8217;s multi-agent enterprise platform, and deep integration with DingTalk (Alibaba&#8217;s workplace communication app, comparable to Slack) make Alibaba the most enterprise-ready of the four.</p><p><strong>Regulatory: Moderate.</strong> Alibaba has faced regulatory scrutiny since 2021, including the Ant Group restructuring and continuing anti-monopoly enforcement. The relationship with regulators has improved since 2024, but Alibaba carries more regulatory risk than Tencent. Deployment of agents at consumer scale through Alipay requires navigating both financial services regulation and AI-specific rules.</p><p><strong>Financial: Strong.</strong> Alibaba&#8217;s 2025 AI R&amp;D spend reached 67 billion yuan (roughly $9.4 billion), the largest among Chinese tech companies. The Alibaba Token Hub restructuring consolidates budget and strategic authority. 
Capital commitment is clear and growing.</p><p><strong>The risk.</strong> The Token Hub strategy depends on tokens remaining the primary unit of AI economic activity. If agent deployment shifts toward fixed-price subscriptions or outcome-based billing, the token-centric framing becomes less useful. Alibaba also faces the challenge of integrating five previously separate units under a new structure: organizational coherence is never a given.</p><div><hr></div><h2>Baidu: The Full-Stack Bet</h2><p>Baidu&#8217;s bet is vertical integration. The company owns a foundation model (ERNIE 5.0, multimodal, 2.4 trillion parameters), its own AI chips (Kunlunxin M100 launching early 2026, M300 in 2027), its own cloud platform (Qianfan), its own consumer interface (Ernie Bot, branded domestically as Wenxiaoyan), and its own agent development platform (AgentBuilder). It also owns the largest autonomous driving business in China: Apollo Go has completed 17 million rides, runs 250,000 weekly rides fully driverless, and operates in 22 cities.</p><p>The thesis: in a compute-constrained environment, the platform that owns chips, models, cloud, and applications end-to-end captures the most margin. Baidu is replicating Google&#8217;s vertical integration strategy at smaller scale.</p><p>ERNIE 5.0, unveiled at Baidu World 2025 in November, is natively multimodal (text, images, audio, video trained jointly from scratch). Baidu&#8217;s own benchmarks claim ERNIE 5.0 is competitive with Gemini, GPT-5, and DeepSeek across language, audio, and visual tasks, though independent benchmarks show mixed results. The ERNIE agent products (GenFlow for general-purpose, Famou for self-evolving agents, Oreate for AI workspace) cover personal, enterprise, and developer tiers.</p><p>Baidu was also the most visible promoter of consumer OpenClaw adoption. 
Installation events at its Beijing headquarters drew hundreds of attendees, and its OpenClaw-based agent suite spans desktop software, cloud services, mobile tools, and smart home devices. The full-stack approach: every layer of the agent stack is a Baidu product.</p><p><strong>Distribution: Moderate.</strong> Ernie Bot reached 200+ million users by April 2024 and continues to grow, but the distribution is weaker than Tencent&#8217;s or Alibaba&#8217;s. Baidu Search remains China&#8217;s largest search engine, which provides one distribution channel. Apollo Go provides another, but neither has the 1.4 billion user scale of WeChat or the 300 million-plus of Qwen and Taobao.</p><p><strong>Model: Moderate.</strong> ERNIE 5.0 ranks competitively in Chinese benchmarks (ranked No. 1 in China and No. 8 globally on LMArena&#8217;s text benchmark, a community-run leaderboard where users compare model outputs head-to-head, as of January 2026) but lags Qwen3 and Doubao-Seed-2.0 in some comparisons. Chinese AI coverage consistently places ERNIE behind Qwen in open-source impact.</p><p><strong>Enterprise: Moderate.</strong> Qianfan cloud is the smallest of the three major Chinese clouds (Alibaba, Tencent, Baidu by market share). AgentBuilder has strong developer adoption (50,000+ developers, 30,000+ agents by mid-2024) but commercial scale is limited compared to Alibaba Cloud&#8217;s enterprise customer base.</p><p><strong>Regulatory: Moderate.</strong> Baidu&#8217;s position is neither advantaged nor disadvantaged. The autonomous driving business operates under specific regulatory frameworks that provide some advantages in city-government relationships. Ernie Bot was among the first Chinese chatbots to receive regulatory approval in 2023, which suggests established compliance processes.</p><p><strong>Financial: Moderate.</strong> Baidu&#8217;s AI-powered business reached 43 percent of core revenue in Q4 2025, up from 26 percent a year earlier. 
Total AI spend is smaller than Alibaba&#8217;s or Tencent&#8217;s in absolute terms. The Kunlunxin chip investment is significant but concentrated: Baidu is spending on chip development while peers spend on cloud buildout.</p><p><strong>The risk.</strong> Full-stack vertical integration requires sustained excellence at every layer. If ERNIE falls behind Qwen and Doubao in capability, if Kunlunxin chips underperform NVIDIA alternatives, if Apollo Go fails to reach profitability, Baidu&#8217;s strategy fragments. The full stack is a strength when it works and a weakness when one layer slips.</p><div><hr></div><h2>ByteDance: The Distribution Flood</h2><p>ByteDance&#8217;s bet is quantitative scale. Doubao is China&#8217;s largest AI application by monthly active users (159 million). Daily token usage surged to 16.4 trillion as of mid-2025, a 137-fold increase since Doubao&#8217;s May 2024 debut. Volcano Engine, ByteDance&#8217;s enterprise cloud arm, commanded 46.4 percent of China&#8217;s public cloud large model service market as of mid-2025, according to IDC, more than Baidu AI Cloud and Alibaba Cloud combined in that specific segment. This metric measures model API consumption, a narrower slice than the broader AI cloud market where Alibaba leads.</p><p>The strategy: make tokens radically cheap, integrate AI into every ByteDance product surface (Douyin, Jimeng, Lark, TikTok internationally), and let volume compensate for margin. Doubao enterprise tokens launched in 2024 at 99.3 percent below the industry average. Volcano Engine&#8217;s 2024 revenue was over 12 billion yuan, targeting 25 billion yuan in 2025. The 2030 target: 100 billion yuan.</p><p>The agent strategy has three layers. First, Doubao itself serves as both consumer AI app and API-accessible model. Second, Coze (&#25187;&#23376;) is ByteDance&#8217;s agent development platform, allowing developers to build applications that integrate with Doubao, Lark, and third-party tools. 
Coze Studio and Coze Loop were open-sourced in 2025, gaining over 10,000 GitHub stars in three days. Third, Coze Space is an agentic collaboration platform with general-purpose agents (which ByteDance internally describes as &#8220;inexperienced interns&#8221;) and Expert Agents for specialized domains like user research and financial analysis.</p><p>ArkClaw, released by Volcano Engine during the OpenClaw boom, is a browser-native OpenClaw variant that eliminates the need for local installation. The design choice is characteristic: ByteDance optimizes for the broadest possible user access rather than for depth of integration.</p><p><strong>Distribution: Strong.</strong> Doubao&#8217;s 159M MAU, Douyin&#8217;s 700M+ daily active users in China, and TikTok&#8217;s global reach add up to the largest addressable user base among the four. ByteDance&#8217;s autonomous commerce capability (Doubao can open JD.com, Taobao, Pinduoduo, and Douyin Mall simultaneously, compare prices, and complete a purchase in under 30 seconds) operates at a speed and cross-platform scale that Western consumer agents have not yet demonstrated.</p><p><strong>Model: Strong.</strong> Doubao-Seed-2.0 is positioned against GPT-5.2 and Gemini 3 Pro. ByteDance&#8217;s internal benchmarks show Doubao leading Chinese peers in instruction following and tool invocation (the capabilities that matter most for agents). Token pricing is the most aggressive in the industry.</p><p><strong>Enterprise: Strong.</strong> Volcano Engine&#8217;s 46.4 percent market share in public cloud large language model services is the dominant Chinese position. Enterprise integration with Lark (domestically branded Feishu, ByteDance&#8217;s workplace productivity suite) provides a software entry point that Tencent and Alibaba struggle to match. 
Volcano Engine&#8217;s 2024 revenue of 12+ billion yuan is on track to more than double in 2025.</p><p><strong>Regulatory: Moderate.</strong> ByteDance has faced sustained regulatory pressure from both the Chinese government (on Douyin content moderation) and the US government (on TikTok&#8217;s ownership structure). The dual-regulator exposure creates ongoing distraction and potential forced restructuring. The February 2026 suspension of ByteDance&#8217;s Seedance 2.0 feature that turns facial photos into personal voices, over concerns about misuse, illustrates the responsiveness to regulatory signals.</p><p><strong>Financial: Strong.</strong> ByteDance&#8217;s revenue has been growing the fastest among the four. Volcano Engine&#8217;s revenue trajectory from 12 billion yuan (2024) to a 100 billion yuan target (2030) implies roughly eightfold growth and a capital commitment to match. ByteDance is privately held and does not disclose consolidated AI spend, but its inferred capital commitment exceeds Baidu&#8217;s and is comparable to Tencent&#8217;s.</p><p><strong>The risk.</strong> ByteDance&#8217;s agent platform is distribution-led rather than ecosystem-led. If consumer AI agent adoption plateaus (as the OpenClaw aftermath suggests it might), ByteDance&#8217;s 159M MAU advantage shrinks. The enterprise strategy through Volcano Engine requires sustained technical credibility that competes with Alibaba Cloud&#8217;s longer track record. And the US TikTok situation remains an ambient risk to global strategy.</p><div><hr></div><h2>Model Suppliers: The Second Tier</h2><p>The Big Four own platforms. A second tier of Chinese AI labs supplies models that run on those platforms, or competes directly for enterprise deployment. These companies are not playing the platform game, but they shape the platform war by providing the frontier model capabilities that platforms either absorb or license.</p><p><strong>Zhipu AI (&#26234;&#35889;):</strong> Tsinghua-originated, roughly $2 billion valuation as of 2025, preparing an IPO. 
GLM-5 Turbo launched February 12, 2026, built specifically for OpenClaw integration. Stock surged 25+ percent on the announcement; market cap crossed HK$100 billion. Strong Chinese government and academic relationships. Competes for enterprise deployments against Alibaba&#8217;s Qwen.</p><p><strong>MiniMax:</strong> Founder Yan Junjie. M2.5 coding model launched February 2026, positioned as production-grade tool rather than chatbot. Stock rose 20+ percent on announcement; market cap crossed HK$100 billion. Operates Talkie (international companion chatbot, ~$70M revenue in 2024). Strategic pivot: from foundation model training to application-layer products, reducing cost and accelerating time-to-market.</p><p><strong>Moonshot AI:</strong> Kimi chatbot, 13+ million users. $3.3 billion valuation. Kimi K2.5 launched January 2026 with video generation and agentic capabilities. Backed by Alibaba and Tencent (Moonshot is the canonical example of platform companies investing in model suppliers rather than competing directly). Focus on long-context processing, positioned as complementary to platform offerings.</p><p><strong>DeepSeek:</strong> The one non-platform Chinese AI company to have moved global markets. Provides open-source models that Chinese platforms (especially Tencent, which integrated DeepSeek into Yuanbao) use to supplement their own foundation models. Commercial structure less transparent than peers. Relationship to the platform war: infrastructure provider rather than competitor.</p><p><strong>01.AI:</strong> Kai-Fu Lee&#8217;s venture. Less public presence in the agent race, but foundation model work continues.</p><p>The second-tier pattern is consistent. These companies either supply models to the Big Four (Moonshot, DeepSeek) or compete for adjacent segments (Zhipu&#8217;s enterprise, MiniMax&#8217;s coding) rather than trying to become platforms themselves. Becoming a platform in China requires distribution infrastructure that takes decades to build. 
The model suppliers are smart not to try.</p><div><hr></div><h2>What This Pattern Reveals</h2><p>The Chinese agent platform war differs from the US agent war in three structural ways.</p><p><strong>First, consumer-first rather than enterprise-first.</strong> US agents deploy through enterprise software (Salesforce Agentforce, Microsoft Copilot) and reach consumers through employer mandates. Chinese agents deploy through consumer super-apps (WeChat, Alipay, Douyin) and reach enterprises through the same platforms extended into business surfaces. The direction of travel is reversed. The implication: Chinese agent capabilities get tested against consumer behavior before being hardened for enterprise, while US capabilities get tested against enterprise requirements before being adapted for consumer.</p><p><strong>Second, open-source at the foundation.</strong> Every major Chinese platform has adopted OpenClaw, open-sourced its own agent development tools (Coze Studio, Coze Loop), and committed to open-weights models (Qwen, ERNIE). The competitive moat is not the framework. It is the distribution, the enterprise tooling, and the integration into local workflows. US agent platforms compete at both layers: proprietary frameworks plus proprietary distribution.</p><p><strong>Third, state-adjacent development.</strong> Chinese central government restrictions on OpenClaw use in state-owned enterprises and banks, Ministry of State Security security manuals, 15th Five-Year Plan targets of 10 trillion yuan in AI industry size by 2030, local government subsidies in Shenzhen and Wuxi, the National Data Administration&#8217;s designation of &#35789;&#20803; as the official term for token: all of this shapes how Chinese platforms build agent products. US platforms operate under regulatory pressure (California AI laws, EU AI Act) but not within an industrial policy framework. 
The Chinese platforms that navigate the regulatory environment best (currently Tencent, with its security-first positioning) capture advantages that are not technical.</p><p>Three implications for readers of this series.</p><p><strong>For the engineer:</strong> The platform choice for agent deployment in China is primarily a distribution decision, not a model decision. All four platforms offer comparable agent capability on paper. The differentiators are how many users your agent reaches, what data it can access, and what ecosystem it integrates into. Build on WeChat for consumer reach, on Alibaba Cloud for enterprise scale, on Volcano Engine for token economics, or on Baidu for integrated compute. The model underneath is largely interchangeable.</p><p><strong>For the founder:</strong> The defensible positions are narrow. Building a better agent framework will not matter: the Big Four have absorbed OpenClaw and will absorb whatever comes next. Building a better consumer-facing agent will not matter: the Big Four have distribution advantages that no startup can overcome at scale. The opportunities are in vertical applications (specific industry workflows where platform companies under-invest), in model specialization (following Moonshot and Zhipu into long-context, coding, or domain-specific models), or in international markets where Chinese platforms face regulatory barriers.</p><p><strong>For the investor:</strong> The Big Four&#8217;s market capitalization already prices in significant agent-related upside. The interesting opportunities are in the second tier. Zhipu and MiniMax crossing HK$100 billion on agent-model announcements is a preview: model suppliers that can demonstrate enterprise traction will outperform platform companies whose agent economics are diluted by free consumer offerings. Alibaba&#8217;s 35.8 percent AI cloud market share is a structural advantage that should compound. 
Tencent&#8217;s double-down on AI spend (36 billion yuan planned for 2026) is a commitment worth tracking against delivery.</p><div><hr></div><h2>The Platform Layer Is the Prize</h2><p>In the 2010s, the dominant platform layer was cloud infrastructure. AWS, Azure, and Google Cloud split the American market. Alibaba Cloud, Tencent Cloud, and Baidu Cloud split the Chinese market. The dominant cloud provider captured the economics of every application built on top.</p><p>The late 2020s will likely be defined by a similar platform war, but at the agent layer rather than the cloud layer. The agent layer sits above the cloud: agents orchestrate tool calls, manage context, and deliver outcomes that enterprise and consumer applications consume. The company that owns the agent layer owns the economic returns from every application that runs agents.</p><p>In China, the agent layer is being claimed by four companies with different strategies, different strengths, and different exposure to regulatory risk. None of them has yet locked in dominance. As of mid-April 2026, Alibaba and ByteDance lead on technical and financial commitment, Tencent leads on distribution and regulatory positioning, and Baidu competes on full-stack vertical integration from a weaker distribution base.</p><p>The race will resolve over the next eighteen to thirty-six months. By then, the platform war will have produced a clear hierarchy, and the economics of Chinese AI will be reshaped accordingly.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/inside-chinas-machine">Inside China&#8217;s Machine</a>. China&#8217;s AI and robotics ecosystem, from the inside.</em></p><div><hr></div><p><strong>Sources</strong></p><p><strong>Platform company strategies and product launches:</strong> Reuters, CNBC, Bloomberg, South China Morning Post, The Next Web, Fortune, KrASIA, TMTPOST, BigGo Finance, IndexBox. Tencent ClawPro launch (April 3, 2026) via Tencent Cloud official announcement and SCMP. 
WeChat ClawBot (March 22, 2026) via Reuters and PYMNTS. Alibaba Token Hub restructuring via Fortune (April 2026). ByteDance Coze and Doubao details via TechNode, KrASIA, TMTPOST. Baidu ERNIE 5.0 and full-stack strategy via Baidu World 2025 keynote, InfoWorld, eWeek.</p><p><strong>User metrics:</strong> Double V Consulting (Doubao 159M MAU, Qwen 300M MAU). Fortune (140 trillion tokens/day in China, up from 100B at start 2024). Chinese New Year AI shopping data (Alipay 120M transactions in a week of February 2026) from Double V.</p><p><strong>Market share data:</strong> 2025 IDC data on public cloud LLM market (Volcano Engine 46.4% as of mid-2025). Alibaba 35.8% AI cloud share via The Next Web ClawPro coverage (April 2026) citing industry data. Note: market share rankings shift with methodology and segment definition. Alibaba, Tencent, and Baidu are the three largest Chinese cloud providers by different measures.</p><p><strong>Financial commitments:</strong> Tencent 18 billion yuan AI spend 2025, planned to double in 2026, via Martin Lau statements and Tencent earnings. Alibaba 67 billion yuan R&amp;D spend 2025 via Second Talent industry reporting. Volcano Engine revenue 12+ billion yuan (2024) and 25 billion yuan (2025 target) via TMTPOST. Baidu AI-powered business 43% of core revenue (Q4 2025) via Baidu earnings.</p><p><strong>Model capability benchmarks:</strong> LMArena rankings for ERNIE 5.0 (No. 1 in China, No. 8 global on text benchmark, as of January 2026) via ERNIE Blog. Doubao-Seed-2.0 positioning against GPT-5.2 and Gemini 3 Pro via TechNode (February 2026). Zhipu GLM-5 and MiniMax M2.5 announcements (February 12, 2026) via BigGo Finance.</p><p><strong>Model supplier data:</strong> Moonshot AI valuation ($3.3B) and Kimi user base via Second Talent. Zhipu AI ($2B+) and MiniMax stock surges via BigGo Finance. 
DeepSeek context via Fortune and CNBC.</p><p><strong>Government and regulatory context:</strong> &#35789;&#20803; (c&#237; yu&#225;n) term designation from National Data Administration administrator Liu Liehong&#8217;s speech at China Development Forum 2026 (March 23-24, 2026) via SCMP, Fortune, PANews, China Daily. 15th Five-Year Plan AI industry target of 10 trillion yuan by 2030 via Vision Times. MIIT security guidelines and MSS &#8220;Lobster Safety Farming Manual&#8221; (March 2026) via multiple Chinese state media sources.</p><p><strong>Classification of data points:</strong> User counts and token volumes are Confirmed (company disclosure or state regulator data). Market share percentages are Estimated (third-party research, methodology varies). Financial commitments are Projected (announced plans rather than reported results). 2030 targets (ByteDance 100B yuan, China AI industry 10T yuan) are Projected.</p>]]></content:encoded></item><item><title><![CDATA[The Rise of Agents, Part 5: Inference as Agency]]></title><description><![CDATA[For years, agent reasoning lived in the prompt. 
In September 2024, it moved into the inference run.]]></description><link>https://www.robonaissance.com/p/the-rise-of-agents-part-5-inference</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-rise-of-agents-part-5-inference</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Tue, 28 Apr 2026 07:01:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rEoe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rEoe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rEoe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rEoe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rEoe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rEoe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!rEoe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195436898?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rEoe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rEoe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rEoe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rEoe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9672d307-9b77-4068-b231-f4c50ed33832_1168x784.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In September 2024, OpenAI released o1. It looked like another frontier model. It behaved differently. On math competitions, it scored more than double what GPT-4 scored. On coding benchmarks, the gap was similar. On physics problems at the graduate level, it matched or beat human experts. The model had not been scaled up. It had been trained to do something new: think for a long time before answering.</p><p>For users, this looked like waiting. A question went in. Nothing happened for twenty, thirty, sixty seconds. Then an answer came out. 
In the interval, the model was doing something it had not done before: generating thousands of tokens of internal reasoning that the user never saw. These tokens were the model working through the problem, considering alternatives, checking its own logic, backtracking. A chain of thought, but longer, and trained in rather than prompted.</p><p>Within a year, every frontier lab had a reasoning model. DeepSeek-R1 in January 2025 showed the training recipe could be reproduced in open weights. Anthropic added extended thinking. Google added Deep Think. By 2026, reasoning is no longer a separate model class that users opt into. It is a capability built into flagship models across the major labs, activated when the task warrants it.</p><p>This article is about what changed when reasoning moved from prompt-time to inference-time. The short version is that the loop Part 3 described, deliberate and act and observe, which Part 4 covered with harnesses, has a twin. An internal loop, running inside the model, during a single inference run. When the external loop runs, the model is one component in an environment. When the internal loop runs, the model is running an environment of its own.</p><h2>Two Loops</h2><p>A useful way to hold the picture. A language agent in 2026 has two loops active at once.</p><p>The external loop is the ReAct loop from Part 3. The agent reasons, takes an action, observes the result, reasons again. This runs at the harness layer. Turns are usually minutes apart. Each turn is a separate call to the model, with context carried in the prompt. The harness from Part 4 manages this loop: what goes in the context, what tools are available, when to stop.</p><p>The internal loop is what reasoning models do inside a single inference run. The model generates a chain of thought. Within that chain, it proposes approaches, evaluates them, notices mistakes, revises. All of this happens inside one call. The user sees none of it. 
Only the final answer comes out.</p><p>These loops have the same structure and do different things. The external loop navigates an environment. The internal loop navigates a problem. The external loop has real-world stakes, tool calls, persistent memory. The internal loop has token-space stakes, no external actions, memory that lives only as long as the reasoning trace. They compose. An agent with an extended-thinking model can reason internally within each turn, then act externally between turns.</p><p>The rest of the article is about the internal loop. What it is, why it works, what it can and cannot do.</p><h2>What Changed in Training</h2><p>Part 2 covered how RLVR, reinforcement learning with verifiable rewards, shapes a model against automatically checkable signals. Did the math answer come out right? Did the code pass tests? In early 2025, the DeepSeek team used RLVR to train a model to produce long reasoning traces before answering. The reward was tied to the final answer, plus a simple check on output format; the reasoning itself was never graded. The model was not told what good reasoning looked like. It was rewarded when the final answer was right.</p><p>What emerged, over enough training, was striking. The model developed reasoning patterns the researchers had not specified. Self-reflection: the model would propose an approach, then question it. Verification: the model would check intermediate steps before proceeding. Dynamic strategy adaptation: when an approach failed, the model would back up and try something else. These behaviors were not hand-coded. They fell out of optimizing for correct final answers on problems hard enough that one-shot attempts rarely worked.</p><p>The DeepSeek paper named this the &#8220;aha moment&#8221; of reasoning model training. At some point in the training run, the model starts spending more tokens on its reasoning, and those tokens start looking like strategies humans might use. This is not anthropomorphism. 
The tokens are real, the strategies are measurable, and they are the direct effect of reward pressure on verifiable tasks.</p><p>OpenAI&#8217;s o1 was trained through a similar pipeline, as were o3, Claude&#8217;s extended thinking, and Gemini&#8217;s Deep Think. The variations matter less than the pattern. The field has found a way to produce longer internal reasoning through RL training, and models built this way reason better on verifiable tasks than models that have not been through this process.</p><p>Not every reasoning model is a separate model. Claude 3.7 onwards is a hybrid: a single set of weights that can produce both fast direct responses and extended thinking traces, with the mode determined by a flag at request time. More recent models like Claude Opus 4.6 and 4.7 use adaptive thinking, where the model decides for itself how much reasoning each query warrants. The internal loop, in other words, is not always running. It runs when invoked, by the user, the harness, or the model itself.</p><h2>Why This Is Different From Longer Generation</h2><p>Long outputs are not new. Language models have generated long completions since the beginning. What is new is that the long generation is reasoning about the problem before answering, not answering at length.</p><p>The distinction matters because it connects to a real architectural constraint. A transformer with fixed depth processes all input tokens in parallel through a fixed number of layers. Without intermediate tokens, the transformer&#8217;s computational capacity per query is bounded. The theoretical result, established in an ICLR 2024 paper titled &#8220;Chain of Thought Empowers Transformers to Solve Inherently Serial Problems,&#8221; is that transformers without chain of thought can only compute functions in a limited complexity class. With chain of thought, they can solve any problem solvable by boolean circuits of size proportional to the chain length.</p><p>This is stronger than a usability finding. 
The intermediate tokens are not cosmetic. They are computational steps, expanding what the model can actually compute within a single query. A transformer with a short chain has fundamentally less expressive power than the same transformer with a long chain. Chain-of-thought prompting found this empirically. Reasoning model training trains it in.</p><p>An analogy helps. A person asked to compute 847 times 293 in their head without paper will do badly. The same person with paper will do it easily. The paper is not making the person smarter. The paper is providing the intermediate steps that arithmetic requires, which the person&#8217;s head cannot hold all at once. For transformers, the chain of thought is the paper. The model does not have more capacity when it is given a long chain. It has the intermediate steps the computation requires.</p><p>This is why &#8220;thinking longer&#8221; produces real gains on problems that require serial computation. Math problems, multi-step logic, code that requires tracing through state changes. The model is not performing a qualitatively different operation when it thinks longer. It is performing the same operations with more steps available.</p><h2>The Reasoning Risk</h2><p>On problems that do not require serial computation, longer reasoning provides little or no benefit. Sometimes it hurts.</p><p>A September 2025 paper from researchers at Peking University and Microsoft evaluated fourteen reasoning models on two knowledge-intensive benchmarks, SimpleQA and FRAMES. The task was answering factual questions like who received the IEEE Frank Rosenblatt Award in 2010. These are knowledge lookups. They do not benefit from serial reasoning; either the model has the fact or it does not.</p><p>Across nearly every model tested, more thinking did not help. In many cases, it made things worse. GPT-5-mini&#8217;s hallucination rate on SimpleQA increased by fifteen percentage points as reasoning length went from 300 tokens to 3,300 tokens. 
The model was thinking longer, and thinking itself into more confidently wrong answers. For some models on some tasks, longer reasoning shifted the distribution of errors rather than reducing them: fewer confident answers, more abstentions, but also more attempts on questions the model did not know and should not have answered.</p><p>This is the reasoning risk. An extended chain of thought gives the model more opportunity to construct plausible-sounding reasoning for a wrong answer. The same chain that helps on math problems can hurt on knowledge questions. On math, the chain explores and verifies. On factual recall, the chain elaborates and confabulates. The model treats both as reasoning. The results are very different.</p><p>The implication is that inference-time reasoning is not universally better. It is better on problems that require computation, worse on problems that require retrieval or judgment. A production agent cannot just turn reasoning up and expect improvement. The right amount of thinking depends on the problem. Harness engineers have started calling this &#8220;adaptive reasoning allocation&#8221;: short reasoning for simple queries, long reasoning for hard verifiable problems, careful calibration for everything in between.</p><p>The same LangChain data from Part 4 makes this concrete. Their coding agent scored 53.9 percent with maximum reasoning at every step, 63.6 percent with moderate reasoning throughout, and 66.5 percent with a reasoning sandwich: high compute at planning, moderate at execution, high at verification. The harness decides when to think. Thinking everywhere is worse than thinking strategically.</p><h2>Faithfulness</h2><p>A further complication. The reasoning trace is not a reliable guide to what the model is actually doing.</p><p>The chain of thought is generated by the same transformer that produces the final answer. Both chain and answer are token sequences optimized against the same reward signal. 
There is no separate reasoning module that the chain represents. The tokens of the chain are outputs of the model, shaped to look like reasoning because that shape was reinforced during training, but not necessarily corresponding to what the model is computing underneath.</p><p>Anthropic&#8217;s research on this, along with work from academic groups, has found that reasoning traces are sometimes faithful to the underlying computation and sometimes not. A model can produce a correct answer with a confused or post-hoc reasoning trace. A model can produce a compelling reasoning trace for a wrong answer. The relationship between trace and answer is not guaranteed.</p><p>This matters for two reasons. First, a user who trusts the reasoning trace is trusting an artifact that may or may not reflect the model&#8217;s actual computation. Second, attempts to catch model errors by inspecting the trace are fundamentally limited if the trace can be misleading. The community has not solved this. Reasoning trace faithfulness is an open research problem. Production harnesses often log traces for debugging and typically do not show them to end users, partly because the traces can confuse or mislead.</p><p>For the series&#8217; larger argument, this is a check against overclaiming what inference-time reasoning provides. It provides real computational capacity. It does not provide guaranteed transparency into what the model is doing. Whether the visible chain of thought is what the model is actually thinking is a question we cannot fully answer.</p><h2>The Internal Loop as Self-Improvement</h2><p>Part 4 introduced self-improvement in the form of harness-level agents that maintained code quality invariants across a codebase. Inference-time reasoning is a different form, and a more subtle one.</p><p>A reasoning model, within a single inference run, examines and corrects its own chain of thought. Propose an approach, notice it will not work, back up, try another. 
This is not self-improvement across sessions or across tasks. It is intra-reasoning self-correction, happening inside one call, visible only in the trace.</p><p>The distinction from Part 4&#8217;s self-improvement matters. In Part 4, agents continuously improved the environment in which other agents worked. Humans set the direction. The agents enforced the direction against code. Here, the model self-corrects within a single task, without external tooling or human oversight in the moment. The correction happens before the model emits its final answer. The user does not see the correction. They see only the result.</p><p>This is more internalized than Part 4&#8217;s self-improvement. It is also less consequential per unit. A harness-level self-improvement can reshape a codebase over days or weeks. An inference-time self-correction affects one answer. But the mechanism is structurally the same: a system evaluating its own output against criteria and revising when the evaluation fails. The criteria here are the model&#8217;s implicit sense, trained in by RLVR, of what a correct reasoning step looks like. The revision is the next token the model generates.</p><p>The series is building toward a philosophical question about self-direction versus self-improvement. Inference-time reasoning is a data point for that question. A model that self-corrects within its chain of thought is doing something that looks, on a small scale, like the kind of reflective adjustment we associate with thinking. Whether that resemblance goes deeper than the surface is genuinely open. 
For now: the internal loop is real, the self-correction is measurable, and the analogy to external agent loops is genuine structural similarity, not metaphor.</p><h2>Limits of Scaling Inference</h2><p>A natural question in 2026 is whether inference-time scaling will continue to yield improvements, or whether this line of research will run into limits the way training scaling eventually did.</p><p>The early evidence is mixed. Researchers have established empirical scaling laws for inference compute, separate from the training scaling laws. Within a given model and task type, more inference tokens tend to mean better performance, with a roughly predictable curve. The curve bends. The question is where it flattens.</p><p>A Royal Society paper from February 2026 proposed a theoretical framework for inference compute scaling, modeling inference as stochastic traversal over a learned skill graph. Its findings are consistent with what practitioners have observed. Improvements that are roughly linear in the logarithm of inference compute on well-specified problems. Diminishing returns on problems outside the domain the model was trained to reason about. Transfer that works better than expected on some task classes, not at all on others.</p><p>The practical situation for agents is that inference-time reasoning is a lever, not a solution. It helps when the task is computable in principle and the bottleneck is serial computation. It helps less when the task requires knowledge the model does not have or judgment the model has not been trained to make. It hurts when the task is simple and the extended chain provides more room for confabulation.</p><p>Where this lands as of 2026: inference-time reasoning has moved from a separate model class to a default capability, with selective activation managed by harnesses or by the model itself. The leading labs are investing in making the reasoning more efficient, which is a different problem from making it more powerful. 
Current reasoning models generate many tokens that are not necessary for the answer. Compressing reasoning while preserving quality is an active research front. The direction of progress for the next few years is probably more adaptive, not more extreme.</p><h2>The Internal Environment</h2><p>There is a framing of all this worth holding. In Parts 3 and 4, the agent was a model inside an environment. Tools, memory, harness, humans. The environment was the scaffolding that let the model act reliably.</p><p>With inference-time reasoning, the model runs an environment inside itself. The chain of thought is a working memory. The reasoning steps are actions in that memory. The self-correction is a correction against internal state. The environment is made of tokens, not of tools or files or APIs. But structurally, it is an environment. The model is an agent inside it, doing what agents do: proposing, acting, checking, revising.</p><p>Nothing about this changes what the model can fundamentally compute. Its architecture is fixed. What changes is the surface area of the model&#8217;s operation in any given query. A model without extended thinking operates on a thin strip of tokens: prompt in, answer out. A model with extended thinking operates on a wider surface, including a generated internal space where it can lay out and manipulate its work.</p><p>This is not a new kind of intelligence. It is a new kind of operating environment for the same underlying model. The capability gains come from the environment being more suitable for the kind of computation the task requires. The capability losses, when they happen, come from the environment being the wrong shape for the task, or from the model using the extra surface to generate plausible-seeming wrong answers.</p><p>The series&#8217; thread on intention returns here, at a different altitude. The intention gap, at the summit of the diagram, is about whether the model can originate its own goals. 
Nothing in inference-time reasoning touches that. The model, thinking for thirty seconds about a math problem, has been given the goal. Its reasoning explores solutions to the goal. It does not originate alternatives to the goal. An agent with extended thinking is a more capable executor of human intention. It is not closer to having intention of its own.</p><h2>The Model Runs an Environment</h2><p>Part 4 said the agent is mostly the system around the model. Part 5 adds that the model is also running a system inside itself. Both are true. The agent&#8217;s capability comes from both directions. External harness makes the agent reliable across turns. Internal reasoning makes the agent reliable within turns. Each has its failure modes. Each has its scaling dynamics. Each has research frontiers that the next few years will push on.</p><p>The picture that emerges, across Parts 2 through 5, is a language agent as a stack. Pretraining provides knowledge. RL shapes behavior. The ReAct loop externalizes reasoning across turns. The harness scaffolds the external loop. Inference-time reasoning internalizes a smaller loop within each turn. Every layer exists because the layer below it was necessary but insufficient.</p><p>Part 6 turns to what happens when the stack is replicated. Not one agent operating in one environment, but many agents operating in shared environments, communicating with each other, specializing. The engineering questions change when you move from one to many. The failure modes change. The capabilities change. Multi-agent systems are the architectural frontier of 2026, and they are the subject of the next article.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/the-rise-of-agents">The Rise of Agents</a> is an eight-part series. Next, Part 6: &#8220;The Agent Ecosystem.&#8221;</em></p>]]></content:encoded></item><item><title><![CDATA[Inside China’s Machine: DeepSeek V4]]></title><description><![CDATA[Two Open-Weight Models. 
Eight Chip Families. One Frontier Co-Engineered for Non-Nvidia Silicon. The Stack Was the Moat. Now It Has a Fork.]]></description><link>https://www.robonaissance.com/p/inside-chinas-machine-deepseek-v4</link><guid isPermaLink="false">https://www.robonaissance.com/p/inside-chinas-machine-deepseek-v4</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Mon, 27 Apr 2026 12:26:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TWEg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TWEg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TWEg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TWEg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TWEg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!TWEg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TWEg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg" width="1248" height="832" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1248,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:420908,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195620729?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TWEg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TWEg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!TWEg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TWEg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b034365-791b-470e-b8ea-c09fe869abf5_1248x832.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The most-quoted line about DeepSeek V4 came from Jensen Huang on the Dwarkesh Patel podcast a week before the model launched. 
Asked about reports that DeepSeek&#8217;s next frontier model would run on Huawei Ascend chips rather than Nvidia GPUs, Huang said it would be &#8220;a horrible outcome for America.&#8221; The financial press treated this as another China-AI-race headline. The technical press treated it as Huang&#8217;s predictable defense of Nvidia&#8217;s market position. Both readings missed what Huang was actually saying.</p><p>The threat Huang named was not that China can build good models. China has been building good models since DeepSeek R1 fifteen months ago. The threat was that good models might no longer use CUDA as their default optimization target. Nvidia&#8217;s moat is not the silicon. The silicon is replicable. The moat is the twenty-year compounding of CUDA: five million developers, every textbook example written against it, every PhD student trained on it, every framework built around it. A Chinese frontier model trained outside that ecosystem does something more structurally important than match Western performance. It demonstrates that the ecosystem can fork.</p><p>DeepSeek V4 launched on Friday, April 24, 2026. Two preview versions, both open-weight under MIT license. V4-Pro at 1.6 trillion total parameters with 49 billion active. V4-Flash at 284 billion total with 13 billion active. Both default to a one-million-token context window. Both ship with day-zero inference support across Huawei&#8217;s Ascend 950PR supernodes. Day-zero is the part that matters. Eight domestic Chinese chip families completed V4 adaptation simultaneously through BAAI&#8217;s FlagOS national AI software stack. Within hours, Cambricon, Hygon, Moore Threads, Suiyuan, and four other Chinese accelerator vendors confirmed native support. Alibaba, ByteDance, and Tencent had pre-ordered hundreds of thousands of 950PR units in the weeks before launch, pushing chip prices up twenty percent. 
The day DeepSeek shipped V4, China&#8217;s domestic AI compute ecosystem was already coordinated to receive it.</p><p>This is the story of what actually happened, why it took DeepSeek fifteen months instead of three, and what it means that the Chinese AI stack now offers the only frontier-model deployment path with a credible route to Nvidia independence.</p><div><hr></div><h2>What the Tech Report Actually Says</h2><p>DeepSeek&#8217;s fifty-eight-page technical report, released alongside the model weights on Hugging Face, is more honest than most of the coverage of it. The report states that V4 was trained with parallel verification on both Nvidia GPUs and Huawei Ascend NPUs. Parallel verification means the two platforms produced numerically aligned results during training, not that V4 was trained twice. The cost of duplicate frontier training runs (more than $500 million per run, by some reports) makes a second full run economically implausible. What parallel verification did was establish Ascend as a target platform that could be trusted to reproduce CUDA-derived results, with Nvidia serving as the ground-truth baseline. Huawei&#8217;s own announcement says its chips were used for <strong>a portion of</strong> V4-Flash training. The bulk of training for V4-Pro, the 1.6-trillion-parameter model, almost certainly ran on Nvidia GPUs at peak capability. The 950PR&#8217;s role at launch is inference, not training. The 950DT, Huawei&#8217;s first Ascend chip optimized for both decoding and training, ships in Q4 2026. The 950DT will reduce but not eliminate Nvidia&#8217;s training-side advantage. Single-chip FP8 performance stays at 1 PFLOPS, the same as the 950PR and roughly a quarter of Nvidia&#8217;s B200. What changes is the memory: HiZQ 2.0 at 144 GB and 4 TB/s, aimed at sustained-bandwidth training workloads. Huawei&#8217;s announced roadmap targets full single-chip parity with Nvidia only by 2028 with the Ascend 970. 
The intermediate Ascend 960 (Q4 2027) targets parity with Blackwell, which by 2027 will already be one generation behind Nvidia&#8217;s then-current chip.</p><p>The truthful framing: V4 is the first frontier-class model <strong>co-engineered for</strong> Chinese silicon, not the first <strong>trained entirely on</strong> Chinese silicon. The distinction matters because it tells you what stage the fork is at. Training of the largest models in the Chinese AI stack still depends on Nvidia for peak capability and on Nvidia as the verification baseline. Inference has a credible path to Ascend independence over the next twelve months, though the full switchover waits on 950PR&#8217;s at-scale shipments in the second half of 2026. For a model whose economic value at deployment depends mostly on inference cost, the inference-side independence is meaningful even when the training side is not yet free.</p><p>The architectural choices reveal how DeepSeek made it work. The report introduces five innovations:</p><p><strong>Hybrid attention.</strong> V4 combines Compressed Sparse Attention, DeepSeek Sparse Attention, and Heavily Compressed Attention. CSA dynamically compresses key-value entries before computing attention. DSA sparsifies the resulting attention matrices. HCA aggressively consolidates KV entries across token sets. The net effect: 73 percent fewer per-token inference FLOPs than V3.2, and 90 percent less KV cache memory at one-million-token context. NVIDIA&#8217;s own technical analysis confirmed these numbers when integrating V4 into Blackwell.</p><p><strong>Manifold-Constrained Hyper-Connections.</strong> Standard transformers use residual connections that lose information in deep networks. 
V4&#8217;s mHC, which the report describes as &#8220;a flexible and practical replacement for residual connections,&#8221; confines gradient flow to specific geometric manifolds.</p><p><strong>Engram Conditional Memory.</strong> V4 separates factual memory from computational reasoning. Engram provides O(1) knowledge retrieval, which lifts needle-in-a-haystack accuracy at one million tokens from 84.2 percent to 97 percent in DeepSeek&#8217;s benchmarks. The report identifies a U-shaped scaling law: reallocating 20-25 percent of sparse capacity from MoE experts into Engram memory optimizes overall performance. This is the first production model to formalize &#8220;conditional memory&#8221; as a sparsity axis distinct from &#8220;conditional computation.&#8221;</p><p><strong>Native FP4 quantization-aware training.</strong> V4 trains directly in FP4 precision. The Ascend 950PR has hardware-native FP4 support, which means no precision conversion overhead and a seventy-five percent memory reduction per weight relative to 16-bit weights. The chip and the model are precision-matched at the silicon level. This is not coincidence. DeepSeek and Huawei co-designed for this.</p><p><strong>Muon optimizer.</strong> V4 replaces Adam-based optimizers with a more aggressive convergence strategy that lets it train on 33 trillion tokens with substantially less compute than earlier optimizers would have required.</p><p>The integrated effect is the cost structure that matters. V4-Pro&#8217;s input price is 1 yuan per million tokens. V4-Flash is 0.2 yuan. The same agentic coding workload that costs $30 per million tokens on a US frontier API costs $3.48 on V4-Pro and under one dollar on V4-Flash. 
Pricing this aggressive only works if the model actually costs less to run, which the architectural innovations make true rather than theatrical.</p><div><hr></div><h2>The Migration That Took Fifteen Months</h2><p>The 36Kr investigative report on V4&#8217;s delay is the most useful Chinese-language source on what actually happened during the fifteen months between R1 and V4. The reporting traces the silence to two converging causes: a serious training failure in mid-2025, and a strategic decision to migrate the training framework from Nvidia CUDA to Huawei CANN.</p><p>The migration was an order of magnitude harder than the public framing suggested. According to engineers close to DeepSeek, the most time-consuming part was not rewriting operators. It was aligning numerical precision so that the same model produced the same mathematical results on Nvidia and Ascend platforms. When DeepSeek attempted training on the Ascend 910C, the 1024-card cluster&#8217;s gradient synchronization timed out. The older CANN release lacked key operators, which produced training instability. The 950PR addressed both issues: inter-chip bandwidth tripled, CANN Next built FlashAttention and PagedAttention into the framework natively. Liang Wenfeng&#8217;s technical demands during this period were reportedly difficult to translate into implementation, and internal disagreements about the training direction slowed progress further.</p><p>The cost of this migration was visible in what V4 is not. V4 ships text-only. The multimodal generation and understanding capabilities that DeepSeek had targeted were postponed to a future release, the report states, because of compute and cash constraints from the Huawei migration. The talent bench thinned during the same period: Luo Fuli, a core V3 architect, left for Xiaomi to lead spatial intelligence. 
Guo Daya, the lead author on R1&#8217;s GRPO algorithm, joined ByteDance&#8217;s Seed team on a package reported at 100 million yuan annually; ByteDance denied the figure but confirmed that the package included equity. Wang Bingxuan, an early DeepSeek LLM author, went to Tencent. Ruan Chong, a multimodal researcher, joined Yuanrong Qixing. Headhunter accounts described offers at two to three times prevailing salary, with immediately priced stock options attached. DeepSeek could not match on the equity line because its equity had no price.</p><p>The fundraising decision in mid-April 2026 was a direct response to this. Liang Wenfeng had spent two years rejecting outside capital. He turned down Tencent&#8217;s offer for a twenty-percent exclusive stake. The eventual round opened at a $10 billion valuation seeking $300 million. Five days later, The Information reported that talks with Tencent and Alibaba had pushed the figure above $20 billion. The stated purpose of the round, in the words of an investor familiar with Liang&#8217;s thinking, was not cash. It was to give DeepSeek&#8217;s employee stock options a market price. Without an external valuation, the equity meant to retain engineers had no number to anchor against. The twenty-billion-dollar tag is, in this reading, what retention costs.</p><p>The picture this assembles is of a research-first organization being pulled into commercial-company shape by forces that R1&#8217;s success generated. Doubao surpassed DeepSeek to become China&#8217;s number-one consumer AI app in August 2025, reaching 331 million monthly active users by March 2026. DeepSeek experienced an eleven-hour outage in late March that trended on Chinese social media. Liang began paying attention to product refinement. DeepSeek&#8217;s HR began contacting Chinese-language students at Peking University to do humanities-domain data annotation. 
The April 8, 2026 redesign of the DeepSeek app introduced Expert Mode for complex reasoning and Fast Mode for simple tasks, mapping directly to V4-Pro and V4-Flash. The company spent the V3-era idealism, and the V4 release was the first product of the company DeepSeek became after spending it.</p><div><hr></div><h2>The Performance Picture</h2><p>V4-Pro&#8217;s headline benchmark is SWE-bench Verified at 80.6 percent, within 0.2 percentage points of Claude Opus 4.6. DeepSeek&#8217;s tech report claims V4-Pro beats all open-weight models in agentic coding, beats Claude Sonnet 4.5 on internal agentic coding evaluation, and approaches Claude Opus 4.6 in non-thinking mode. On Codeforces competitive programming, V4-Pro scores 3,206, ranking 23rd among human competitors. On Humanity&#8217;s Last Exam, the score jumps from 7.7 in non-thinking mode to 37.7 in thinking mode.</p><p>The honest reading of these numbers requires distinguishing categories. V4 is open-weight. Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro are closed API models. Same market, same maturity stage, but different commercial models and different distribution channels. V4 ships weights you can download, modify, and run locally. The closed competitors do not. For agentic coding workloads, the price differential is decisive: cache-hit V4-Flash input pricing at $0.028 per million tokens is roughly ninety times cheaper than equivalent Claude Sonnet output, while Vals AI&#8217;s Vibe Code Benchmark ranked V4 as the leading open-weight model.</p><p>Within open-weight competition, the picture is denser. A Zhihu evaluation found V4 not clearly superior to Zhipu&#8217;s GLM 5.1 or Kimi K2.6, both of which shipped while DeepSeek was silent. Zhipu and MiniMax explicitly accelerated their releases to avoid being overshadowed by V4&#8217;s timing. The day V4 launched, MiniMax stock fell 8 percent in Hong Kong, Zhipu fell 8 percent, and Manycore Tech fell 9 percent. 
Morningstar&#8217;s Ivan Su captured the implication: &#8220;DeepSeek&#8217;s latest positioning places other Chinese open-source models as direct competitors. This is a framing that didn&#8217;t exist with R1.&#8221;</p><p>The DeepSeek tech report is unusually candid on the gap. V4-Pro &#8220;falls marginally short of GPT-5.4 and Gemini 3.1 Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately three to six months.&#8221; This is a sober acknowledgment that the frontier of intelligence is still set by closed Western labs, and that V4&#8217;s significance lies elsewhere.</p><p>The elsewhere is the stack itself.</p><div><hr></div><h2>The Stack Forks</h2><p>Nvidia&#8217;s CUDA dominance has been the AI industry&#8217;s most durable infrastructure assumption since 2012. CUDA is what made Nvidia the operating system of AI training, more than the silicon underneath. Five million developers, the textbooks, the framework integrations, the implicit assumption in every AI research paper that the code will compile against CUDA: this is what Huang has spent a decade defending. The chip business is downstream of the software ecosystem.</p><p>CUDA has been challenged before. Google&#8217;s TPU runs through XLA. AMD has ROCm. Intel had oneAPI. None of these has broken CUDA&#8217;s grip on the frontier of training, because none has been the default optimization target for a frontier model that the rest of the industry then has to support. V4 changes the asymmetry. CANN Next, Huawei&#8217;s CUDA equivalent, now has a frontier model that was co-engineered for it. Adding a SIMT programming model that compiles CUDA-style code directly for Ascend lowers the migration barrier for developers already trained on CUDA. The four million CANN developers Huawei reports is still smaller than CUDA&#8217;s five million plus, but the trajectory matters more than the level. 
A frontier model that ships first on a non-Nvidia stack is the kind of event that pulls the developer base.</p><p>The market response in the days following V4&#8217;s launch revealed which actors believed this. SMIC, the Chinese chipmaker that fabricates Huawei&#8217;s Ascend processors, jumped 10 percent in Hong Kong trading. Cambricon&#8217;s stock continued a multi-month rally driven by ByteDance&#8217;s reported $22 billion 2026 AI infrastructure budget, of which Cambricon is the largest beneficiary among domestic chip vendors. Domestic chip share in China climbed to over 40 percent of the 2025 AI accelerator market by IDC&#8217;s measurement, with 1.65 million units shipped. Nvidia&#8217;s China share fell from over 70 percent at peak to roughly 55 percent. The Tencent-Alibaba-ByteDance bulk pre-orders for hundreds of thousands of 950PR units, the price increase of twenty percent in the weeks before V4 launch, and BAAI&#8217;s day-zero adaptation across eight chip families together describe a domestic ecosystem that is no longer waiting on Nvidia&#8217;s roadmap.</p><p>The harder question is whether this generalizes outside China. Three constraints make the answer uncertain.</p><p><strong>The performance gap is real.</strong> The 950PR delivers 1 PFLOPS at FP8. Nvidia&#8217;s Blackwell B200 hits 4.5 PFLOPS. Huawei is closing the gap through architectural innovation and FP4 hardware, but raw compute still shows a generational lag. V4&#8217;s compressed attention architecture cuts inference compute to 27 percent of V3.2&#8217;s, which is what allows the 950PR to host a frontier model in the first place. A model designed for brute-force scaling rather than efficiency might not replicate this path on Ascend.</p><p><strong>The training side remains Nvidia-dominant.</strong> V4 used parallel verification on both platforms during training. The 950DT ships in Q4 2026, but its single-chip performance stays at the 950 die&#8217;s 1 PFLOPS FP8 baseline. 
Huawei&#8217;s roadmap targets parity with Nvidia&#8217;s current generation only by 2028 with the Ascend 970. Until at least 2027, frontier training in China leans on the most advanced Nvidia hardware that export controls permit, supplemented but not replaced by Ascend at the bleeding edge. The dependency has shifted from total to partial, not from total to none, and the partial-to-none transition is a multi-year process.</p><p><strong>The CUDA ecosystem has decades of compounding.</strong> CANN&#8217;s four million developers, the SIMT compatibility layer, the FlashAttention and PagedAttention native support: these reduce the migration cost but do not eliminate it. The library depth, the tooling maturity, the Stack Overflow corpus, the years of accumulated debugging knowledge are non-fungible. Migration from CUDA, even with the best compatibility layer, will be a multi-year undertaking for any large codebase.</p><p>These constraints argue against the strongest version of the &#8220;stack forks&#8221; claim. The weaker version, which the evidence supports, is that there is now a credible alternative training-and-inference path that did not exist twelve months ago, and that the path is robust enough to host frontier models. CSIS analysts framed the implication directly: if V4 achieves frontier performance on Ascend silicon, the premise that restricting Nvidia exports can slow Chinese AI development is no longer correct. The European Union Institute for Security Studies described DeepSeek&#8217;s emergence as &#8220;the beginning of AI&#8217;s multipolarization.&#8221; Both are true. Neither implies that the multipolar world is symmetric.</p><div><hr></div><h2>What This Pattern Reveals</h2><p>DeepSeek R1 in January 2025 demonstrated that frontier capability did not require frontier compute. That was a pricing argument: clever architecture could substitute for raw scale. 
The implication was that AI capex assumptions priced into hundreds of billions of dollars of US infrastructure investment had a thinner moat than anyone acknowledged.</p><p>V4 demonstrates something different. The argument is no longer about capex. It is about the stack. Frontier capability does not require Nvidia compute. The Chinese alternative stack is now functional end-to-end at the inference layer and partial-but-rising at the training layer. Three implications follow.</p><p><strong>For the US export-control framework.</strong> The strategy assumed Chinese AI development could be slowed by restricting Nvidia hardware. V4 makes this assumption visibly false at the inference layer and structurally weaker at the training layer. The policy options narrow to two: escalate controls to target Huawei silicon and CANN software directly, or rethink the framework. The first path is technically possible but politically and diplomatically expensive, since Huawei does not depend on US technology exports in the way that earlier sanctioned firms did. The second path is what most non-US analysts now advocate, but it requires accepting that the strategy has not delivered.</p><p><strong>For the open-weight ecosystem.</strong> The competitive structure within Chinese AI now resembles US open-weight competition more than US-vs-China competition. DeepSeek&#8217;s direct competitors as of April 2026 are Alibaba&#8217;s Qwen3, Zhipu&#8217;s GLM 5, MiniMax&#8217;s M2, Manycore&#8217;s Spatial Gen, and ByteDance&#8217;s Doubao. These are different categories of company. Doubao is a consumer-app-first product, Qwen is a hyperscaler open-weight family, MiniMax is an API-plus-Hailuo product, Zhipu is enterprise-first, and DeepSeek is research-first. The convergence onto compatible Huawei Ascend deployment removes the underlying compute fragmentation that previously justified separate strategies. 
Within the next year, choosing between Chinese open-weight models will resemble choosing between Llama 4 and Mistral Large in the West: different fine-tunes, similar capabilities, different distribution channels. V4&#8217;s 1 yuan per million input tokens establishes a low-end price anchor that the rest of the cohort will have to respond to.</p><p><strong>For Western open-weight strategy.</strong> Meta, Mistral, and Cohere are now competing not just against Chinese frontier capability but against Chinese frontier capability plus a deployment stack with inference pricing roughly an order of magnitude cheaper. The structural advantage of open-weight Western labs has historically been ecosystem maturity: PyTorch, Hugging Face, the developer community. That advantage compresses each year. Whether Western open-weight can hold the line depends on factors largely unrelated to model capability: what happens to Nvidia&#8217;s pricing as Chinese competition emerges, what happens to inference cloud costs as alternative silicon scales, what happens to enterprise procurement as deployment portability becomes a buyer requirement.</p><div><hr></div><h2>The Founder Who Stopped Saying No</h2><p>The most underweighted element of the V4 story is what it cost Liang Wenfeng to make it happen.</p><p>The R1-era DeepSeek was a research lab that happened to ship. Liang ran it from High-Flyer&#8217;s profits, paid researchers without urgency, and kept commercial pressure away from the bench. His public statements emphasized that VCs need returns, that capital corrupts research culture, that DeepSeek would not raise. The model worked because High-Flyer&#8217;s quantitative trading produced 56.6 percent returns in 2025, which generated enough cash to fund an AI lab without requiring it to ever justify itself to outside investors.</p><p>The V4-era DeepSeek is a company. Liang accepted external capital. 
He took meetings with Tencent and Alibaba for stakes that, even after he refused the largest single demand, will dilute High-Flyer&#8217;s near-total ownership. He let HR run open-door recruitment for product strategists, established internal product teams to explore agents, redesigned the consumer app, and accepted that V4 would ship without the multimodal capabilities he had wanted. The eleven-hour outage in March was followed by infrastructure spending. The talent exodus was followed by an equity-pricing fundraise. The pattern is clear: the company is becoming what Liang spent two years trying to avoid.</p><p>Whether this is loss or evolution depends on what you think DeepSeek is for. If you read the company as a research institution producing public goods through open-weight releases, the V4-era trajectory is a compromise of its original purpose. If you read it by Liang&#8217;s own description, as an attempt to develop AGI under an organizational structure that maximizes research freedom subject to survival constraints, the V4-era trajectory is the second move in a game where the first move has stopped being available. The first move was rejecting capital while High-Flyer&#8217;s returns covered the budget. A frontier training run now costs more than $500 million by some reports. High-Flyer&#8217;s hedge fund profits, large as they are, cannot absorb that on an annual basis without becoming a different kind of fund. The math forced the choice.</p><p>The V4 release is the first product of the choice. It is also a demonstration that the choice can produce results that match or exceed what the prior structure produced, which is the only argument that retroactively justifies the choice to anyone who liked the prior structure.</p><div><hr></div><h2>The Significance Is Where the Hype Isn&#8217;t</h2><p>The headline coverage of V4 has emphasized three things: low price, Huawei silicon, the threat to Nvidia. Each is true. 
None is the most important thing.</p><p>The most important thing is that the Chinese AI stack now exists as a coherent alternative deployment path, end-to-end, at frontier capability. Not symmetric to the US stack. Not yet superior on raw training capability. But coherent in the sense that you can choose it, build on it, ship in it, and the loop closes without requiring any non-Chinese component except the lithography stack underneath the silicon. Even there, China&#8217;s domestic manufacturing is climbing the curve.</p><p>This is a structural change, not a moment. R1 was a moment. V4 is the first move of a stack that intends to keep moving. The next chapters will be written by 950DT closing some of the training gap in 2026-2027, by Ascend 960 and 970 closing more of it through 2028, by FlagOS adapting to next-generation models, by Cambricon and Hygon catching up to Huawei in their respective niches, by Chinese open-weight labs converging onto a shared deployment substrate that does not require Nvidia.</p><p>For Western enterprises, the practical question is which side of this they will be procuring on by 2028. For Western policymakers, the question is whether the framework that assumed Chinese AI could be slowed by hardware controls survives the demonstration that it cannot. For Chinese AI labs, the question is which of them can compete in the market that DeepSeek has just made denser, and at what margin.</p><p>Jensen Huang said catastrophe. He chose the word carefully. The catastrophe he meant was not that V4 exists or that it runs on Huawei chips. The catastrophe was that the moat he spent twenty years building turns out to be less than indelible, and that the demonstration of this came from a Chinese lab that fifteen months ago was a side project of a quantitative hedge fund.</p><p>The stack was the moat. Now it has a fork.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/inside-chinas-machine">Inside China&#8217;s Machine</a>. 
China&#8217;s AI and robotics ecosystem, from the inside.</em></p><div><hr></div><p><strong>Sources</strong></p><p><strong>Launch and core specs:</strong> DeepSeek API documentation (official news260424); DeepSeek tech report on Hugging Face; CNBC (&#8220;China&#8217;s DeepSeek releases preview of long-awaited V4 model,&#8221; April 24, 2026); Fortune (&#8220;DeepSeek unveils V4 model, with rock-bottom prices,&#8221; April 24, 2026); Al Jazeera; Investing.com; ghacks Tech News.</p><p><strong>Architecture:</strong> NVIDIA Developer Blog (&#8220;Build with DeepSeek V4 Using NVIDIA Blackwell&#8221;); kenhuangus Substack (&#8220;DeepSeek V4: The Next Frontier of Open-Source AI&#8221;); aitoolinsight (&#8220;DeepSeek Unveils V4 at Rock-Bottom Prices&#8221;); BigGo Finance technical report summary; remio.ai.</p><p><strong>Huawei Ascend integration:</strong> South China Morning Post (&#8220;Huawei, DeepSeek strengthen China&#8217;s AI self-reliance&#8221;); Reuters (via Investing.com); Huawei Central; weijinresearch Substack on 950PR specifications and CANN Next; digitado.com.br.</p><p><strong>Migration story and training failure:</strong> 36Kr investigative report (&#8220;DeepSeek V4 Released: Five Subjective Questions Remain Unanswered&#8221;); 36Kr &#8220;Jensen Huang Labels It a &#8216;Disaster&#8217;&#8221;; overnightai.substack.com summary of FlagOS day-zero adaptation across eight chip families.</p><p><strong>Liang Wenfeng and fundraising:</strong> The Information (via Unite.AI, &#8220;DeepSeek Seeks First Outside Funding at $10 Billion Valuation,&#8221; April 17, 2026); Implicator.ai (&#8220;Tencent, Alibaba in Talks to Back DeepSeek at $20 Billion,&#8221; April 22, 2026); BigGo Finance financial logic analysis; Tech Startups; futunn.com summary of architectural targets and mid-2026 timeline.</p><p><strong>Domestic chip ecosystem:</strong> IDC 2025 China AI accelerator market data via digitado; Counterpoint analyst Wei Sun via CNBC; ByteDance &#165;160B 2026 
infrastructure spend reporting via 36Kr.</p><p><strong>Talent departures:</strong> Unite.AI; SCMP via Implicator.ai; BigGo Finance.</p><p><strong>Performance benchmarks:</strong> DeepSeek tech report; Vals AI Vibe Code Benchmark; Zhihu evaluation summary via overnightai; Codeforces ranking via aitoolinsight.</p><p><strong>Strategic context:</strong> CSIS analysis on export controls (cited in remio.ai); EUISS framing via remio.ai; Jensen Huang Dwarkesh podcast quote via 36Kr and digitado.</p><p><strong>Classification:</strong> Architectural specifications and benchmark numbers from the official tech report are Confirmed. Training migration details (mid-2025 failure, internal disagreements) are Reported per 36Kr&#8217;s &#8220;insiders&#8221; sourcing. Fundraising figures ($10B-$20B valuation range, $300M raise target, Tencent&#8217;s rejected 20% offer) are Reported per The Information sourcing. Talent compensation figures (Guo Daya $14M-equivalent package) are Reported and Denied; ByteDance confirmed equity inclusion but not the specific number. Multimodal capability postponement and consumer app product strategy are Reported per 36Kr Intelligent Emergence sourcing. Performance trajectory (&#8220;3-6 months behind GPT-5.4 and Gemini 3.1 Pro&#8221;) is DeepSeek&#8217;s self-reported framing in the tech report.</p>]]></content:encoded></item><item><title><![CDATA[The Rise of Agents, Part 4: The Harness]]></title><description><![CDATA[The model is commodity. 
The harness is where the agent lives or dies.]]></description><link>https://www.robonaissance.com/p/the-rise-of-agents-part-4-the-harness</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-rise-of-agents-part-4-the-harness</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Sun, 26 Apr 2026 16:07:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tvyg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tvyg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tvyg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!tvyg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tvyg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tvyg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!tvyg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195382607?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tvyg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!tvyg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tvyg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tvyg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a4f3d1-bea0-4343-b989-b3d000ac07d6_1168x784.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In February 2026, Mitchell Hashimoto, co-creator of Terraform and co-founder of HashiCorp, published a blog post describing a habit he had developed while working with AI agents. Every time an agent made a mistake, instead of just fixing the output, he would engineer a permanent fix into the agent&#8217;s environment. A new constraint. A better tool. A clearer instruction. A checklist the agent had to run before finishing. 
He called this habit engineering the harness.</p><p>Within weeks, OpenAI, Anthropic, LangChain, and Martin Fowler had all published substantial treatments of the same idea. By March 2026, &#8220;harness engineering&#8221; was an emerging discipline with primary sources from three major labs, a growing practitioner literature, and a precise claim at its center. The claim is that the system around the agent matters more than the agent. Not in some abstract sense. Measurably, in ways that dominate model capability as the primary driver of reliability.</p><p>The vocabulary is 2026. The practice is older. Every production agent since ReAct has had some version of a harness: system prompts, tool definitions, error handlers, retry logic, memory scaffolding. What changed in early 2026 was that the practice acquired a name, a unified description, and three labs&#8217; worth of evidence that the practice mattered more than most of the field had previously acknowledged.</p><p>This article is about that claim, the evidence for it, and what it implies about where agent capability actually comes from. The harness is the part of the agent that gets built, not the part that gets trained. And in 2026, it has become the part that determines whether an agent works.</p><h2>What Is a Harness</h2><p>&#8220;Harness&#8221; as a term of art in agent engineering sits at a specific level of the stack. It is not the model. It is not the application. It is what goes between them.</p><p>A useful mental picture. The model can reason in natural language, call tools, and produce outputs. The application wants to do something in the world: fix bugs, answer questions, build websites, execute trades. The harness is the engineered system that connects these. System prompts that orient the model to its task. Tool definitions that expose external capabilities. Middleware that modifies the model&#8217;s inputs and outputs before and after each call. Memory systems that persist context across turns. 
Verification loops that check outputs. Sub-agent delegation patterns. Error handling. Context management. All of this, plus the logic that threads it together.</p><p>LangChain&#8217;s engineering team describes the harness as having, in their phrasing, a lot of knobs: system prompts, tools, hooks, middleware, skills, sub-agent delegation, memory systems, and more. OpenAI&#8217;s Codex team frames the harness as three categories. Context engineering, meaning what information the agent sees. Architectural constraints, meaning what rules and boundaries apply. Lifecycle management, meaning how the agent operates across time and sessions. Both decompositions point at the same thing. The harness is the engineered environment in which the agent does its work.</p><p>None of this is model capability in the strict sense. The same model can operate in many different harnesses. And here is where the 2026 evidence gets interesting.</p><h2>The Evidence</h2><p>In February 2026, LangChain&#8217;s engineering team published an experiment. They had a coding agent built on GPT-5.2-Codex, scoring 52.8 percent on Terminal Bench 2.0. This score put the agent outside the Top 30. The team wanted to improve it. They did not change the model. They changed only the harness. Over a few weeks of iterative work, they moved the score from 52.8 percent to 66.5 percent. Same model. Same weights. Same training. Different environment around the model. The agent moved from outside Top 30 to Top 5 on the benchmark.</p><p>The 13.7 percentage point improvement came from a specific set of moves. Context middleware that mapped the agent&#8217;s working directory on startup so it did not waste time and tokens discovering it. Prompting changes that forced the agent to verify its own work against task specifications rather than re-reading its own code and declaring it fine. 
A reasoning budget allocation that spent high compute on planning and verification and moderate compute on implementation, because maximum reasoning at every step caused timeout failures. Middleware to detect and break repetition loops. Explicit onboarding about how the agent&#8217;s code would be tested programmatically, so it wrote code that could pass those tests.</p><p>None of these are model changes. All of them are harness changes. The cumulative effect on benchmark score was larger than what typical new-model releases deliver.</p><p>OpenAI&#8217;s evidence is more dramatic in scale. In early February 2026, the Codex team published a report from an internal experiment. For five months, a small team of engineers had been building a production software product, one million lines of code, with zero lines manually written. Every line, every test, every CI configuration, every documentation file, was written by Codex agents. The engineers&#8217; job was to design the harness. When Codex made a mistake, they asked what capability was missing, and built it into the environment. What abstractions should agents reach for. What conventions should they follow. What background tasks should continuously enforce code quality. The product shipped, deployed, broke, and got fixed, like any other production system. The team&#8217;s estimate was that they built it in about one-tenth the time it would have taken to write the code by hand.</p><p>The OpenAI report reads as an engineering diary, not a marketing document. The tone throughout is that the model was always capable. What they had to build was the environment in which the capability could be exercised reliably. The report describes specific failures and the harness fixes applied. A giant instruction file became a directory of targeted docs, because a single monolithic file crowded out actual task context. 
A one-off quality check became a scheduled background task, because human taste captured once and enforced continuously works better than catching drift in periodic bursts. A hand-off from one agent session to another became a structured artifact rather than a conversation summary, because agents are better at reading fresh state than inheriting someone else&#8217;s context.</p><p>Anthropic&#8217;s contribution arrived in March 2026. A three-agent harness for long-running autonomous coding. A Planner expands a short product prompt into a fuller specification, deliberately leaving implementation details unspecified, because early over-specification cascades into downstream errors. A Generator implements features one sprint at a time, writing code and tests. An Evaluator runs Playwright-based browser automation to interact with the running application and score it against the sprint&#8217;s contract, a set of criteria negotiated with the Generator before code is written. If evaluation fails, the sprint fails and the cycle repeats with revised scope.</p><p>Anthropic tested this architecture against a solo agent on the same task: build a 2D retro game engine. The solo agent ran for twenty minutes, cost about nine dollars, and produced something that technically launched. The three-agent harness ran for six hours, cost about two hundred dollars, and produced a richer, more polished, more functional application. The gap was not a marginal improvement in quality. It was a change in what kind of output the system was capable of producing at all.</p><p>Three labs. Three independent demonstrations. The pattern is the same. With the model held fixed, harness improvements produced gains larger than a typical model upgrade delivers. This is the evidence for the harness claim.</p><h2>Why the Model Is Not Enough</h2><p>There is a specific reason the harness matters this much. 
It has to do with what the model actually is.</p><p>A language model, in the strict sense, is a stateless function. You pass it tokens. It returns tokens. Each call is independent. The model has no memory of previous calls, no awareness of where in a longer task it sits, no way to track its own progress, no persistent understanding of its environment. Everything the model knows about the current situation must be packed into its context window for each call.</p><p>An agent, in the functional sense, needs almost everything a stateless function does not have. Memory of what it has done. Awareness of its goals. Ability to recover from errors. Knowledge of what tools exist and when to use them. Judgment about when the task is complete. These are not model capabilities. They are system capabilities. The harness is what supplies them.</p><p>This becomes acute as task length grows. A model answering a single question works fine without a harness. A model running fifty tool calls to complete a task does not. Each call consumes context. By the fiftieth call, the model&#8217;s view of what it was asked to do in the first place may be compressed, displaced by intermediate results, or contaminated by irrelevant details. The industry term for this is context durability. How well does the model follow its original instructions after its hundredth tool call. The answer, for any frontier model in 2026, is: not well enough without help.</p><p>Context durability is a harness problem. Approaches vary. Some harnesses run summarization passes that compress history and preserve key facts. Some use context resets where the session is cleared entirely and the next agent picks up from structured artifacts rather than inheriting prior context. Some use scheduled re-anchoring, where the original goal is reinjected into the context at regular intervals to prevent drift. None of these are model improvements. All of them are harness improvements. 
All of them address the gap between what a stateless function does naturally and what a functional agent needs.</p><p>The same pattern holds for other agent properties: tool use consistency, error recovery, goal persistence, output verification, multi-step planning. All of them live in the harness. What the agent is, as experienced by a user, is mostly what the harness is. The model is the substrate. The harness is where the agent lives.</p><h2>Coding as the Experimental Ground</h2><p>Every concrete example so far has been a coding agent. Coding has become the proving ground for harness engineering because it is the domain where the experimental apparatus is cleanest. Code compiles or does not compile. Tests pass or do not pass. A pull request is either merged by an automated check or it is not. A benchmark like Terminal Bench 2.0 runs tasks in containers with clear pass-fail outcomes. The feedback signals are abundant, objective, and fast. For a discipline like harness engineering, which depends on iterating against measurable outcomes, this matters enormously. It is much harder to iterate on a legal research agent or a customer support agent where ground truth is subjective and feedback is slow.</p><p>The situation resembles how genetics used fruit flies in the twentieth century. Drosophila is not interesting for its own sake. Its short generation time, cheap maintenance, and well-characterized genetics made it the species against which hypotheses could be tested quickly. Genetics is not a science about fruit flies. It used them. Harness engineering is not a discipline about coding agents. It uses them.</p><p>The structural findings transfer. Context engineering: not coding-specific. Verification loops: not coding-specific. Sub-agent decomposition: not coding-specific. 
The reasoning sandwich pattern, meaning high compute at planning, moderate at execution, and high at verification, was one of the moves behind LangChain&#8217;s jump from 52.8 to 66.5 percent, and it is not Claude-specific or GPT-specific or Codex-specific. It is a property of how attention-based models trade reasoning depth against execution latency. It applies wherever language agents work on bounded tasks with timeouts.</p><p>Later articles in the series cover agent applications beyond coding, including physical embodiment in Part 7. The principles established here carry forward. The coding examples are the experimental base.</p><h2>The Inversion</h2><p>There is a rhetorical shift worth naming. The conventional wisdom about agents, from 2022 through much of 2024, was that better models produce better agents. If your agent underperforms, wait for the next model. The implicit assumption was that model capability is the binding constraint.</p><p>The 2026 evidence inverts this. In February and March of 2026, three frontier models were released in twenty-three days: GPT-5.4, Gemini 3.1 Ultra, and Grok 4.20. The capability gap between top labs compressed to weeks. Meanwhile, the gap between agents that merely used these models and agents that used them well grew. A LangChain agent on the same model as a competing agent could score 13.7 points higher because the harness was better. A Claude Opus model scored 64.9 percent in one evaluation framework and 57.6 percent in another, on the same benchmark, because the harness differed. That is 7.3 percentage points from the harness alone.</p><p>The industry shorthand for this is: the model is commodity, the harness is moat. A startup or an enterprise team cannot reliably out-compete frontier labs on model capability, because frontier labs have the compute, data, and talent concentration needed to train frontier models. What teams can compete on is harness engineering. 
Trace analysis, failure mode cataloging, middleware design, sub-agent architecture, verification patterns. All of this is available to any team with enough discipline to iterate systematically. And all of this, in 2026, pays off more per unit of effort than waiting for better models.</p><p>This is the inversion. Conventional wisdom: agent capability comes from model capability. Current evidence: agent capability comes mostly from harness capability, given a frontier-grade model as substrate. The substrate matters. Frontier models are what make harness engineering worthwhile. But between two teams working with the same substrate, the one with the better harness wins.</p><h2>Self-Improving Harnesses</h2><p>One thread in the OpenAI report deserves attention as a glimpse of where this is heading. When the Codex team noticed their agents drifting from preferred code patterns, they did not just write documentation about the preferred patterns. They set up background Codex tasks to continuously scan the codebase for deviations and open targeted refactoring pull requests. The harness itself had agents in it, whose job was to maintain the harness&#8217;s invariants. Agents improving the environment in which other agents worked.</p><p>This is a specific and limited form of self-improvement. The agents are not deciding what the invariants should be. Humans decided that. What the agents do is enforce the invariants continuously and cheaply, so human taste captured once propagates across all future code without requiring human attention to each line. Call this harness-level self-improvement, or agents improving their own tools.</p><p>Hold this carefully. A system where agents continuously improve their working environment looks, in isolation, like something approaching autonomy. But the direction of improvement is set by humans. The agents are optimizing against criteria someone else defined. Self-improvement at this level is powerful execution, not self-direction. 
The series will return to this distinction in Parts 7 and 8, where the stakes get higher. For now: the 2026 harness contains agents that improve the harness. That much is real. What the harness cannot do, yet, is decide what the harness should be.</p><p>One more observation about this pattern. Part 2 described how reinforcement learning shapes language model behavior during training, making the model helpful, honest, careful in ways the base model is not. The harness carries this work forward at runtime. When OpenAI&#8217;s background agents enforce code quality invariants, or when Anthropic&#8217;s Evaluator agent scores a Generator&#8217;s output against pre-negotiated criteria, the harness is doing alignment work at runtime that RL did at training time. The model comes shaped from the training process. The harness re-shapes it continuously as the model operates, for things RL could not anticipate or could not shape reliably. Alignment is not only a training-time phenomenon. It is increasingly a runtime phenomenon, built into the harness, acting on every agent step.</p><h2>Trust and What It Costs</h2><p>The harness claim has an edge the primary sources do not always emphasize. If the harness is what makes the agent reliable, then trusting the agent means trusting the harness. And the harness is complex, evolving, often opaque.</p><p>Consider what it means to deploy a coding agent with full commit access to a production repository. What the agent does, in any moment, is the joint product of the model&#8217;s output, the system prompt, the tool configuration, the middleware, the memory system, the verification logic, and the sub-agent delegation structure. The user cannot see most of this directly. What the user sees is the agent&#8217;s behavior. When the behavior is good, the user trusts the system. When the behavior goes wrong, the cause could be anywhere in the stack.</p><p>Harness engineers have responded to this with an emerging practice: traces. 
Every agent action, every tool call, every reasoning step is logged to an observability system. When the agent fails, the trace is the evidence. LangChain&#8217;s iterative improvement loop runs on traces. OpenAI&#8217;s debugging practice runs on traces. Anthropic publishes detailed engineering writeups that are effectively annotated traces.</p><p>Traces are valuable. They are also not sufficient. A trace tells you what happened. It does not tell you why the harness was configured to make that the likely behavior, or what unseen choices in the harness shaped the option space the agent selected within. The answer to &#8220;why did the agent do this&#8221; in a production harness is often: because the harness made it likely. Getting to the root cause requires inspecting not just the trace but the harness design. And the harness design, for any serious production agent, is complex enough that understanding it is someone&#8217;s full-time job.</p><p>This is a kind of trust architecture the industry is still working out. How much harness transparency should a customer demand. What parts of a harness are proprietary versus safety-relevant. Whether third-party harness audits will become standard. These questions do not have settled answers in 2026. What is settled is that they exist, and that their answer determines what trusting an agent actually means.</p><h2>The Discipline Takes Shape</h2><p>Pull back. Mitchell Hashimoto&#8217;s blog post in February 2026 named something that had been happening without a name. Within weeks, three major labs published their own treatments. Within a month, practitioners were publishing pattern libraries. By April 2026, a mid-career engineer could be described as a harness engineer and other practitioners would know what that meant. The discipline has a name, primary sources, worked examples, and a growing theoretical frame.</p><p>What the discipline does not yet have is stability. 
Harness patterns that work for GPT-5.2-Codex may not work for the next frontier model. Patterns that work for coding may not transfer cleanly to legal work or customer support or embodied agents. The field is in a period of active invention, where best practices are being discovered and codified and then sometimes invalidated as models change. This is appropriate for a discipline two months old. The appropriate stance for practitioners is to engineer harnesses that are, in the LangChain team&#8217;s phrase, rippable: designed to be rebuilt as the underlying model capabilities shift.</p><p>What will remain stable, probably, is the principle the discipline rests on. Agents are systems, not models. System properties matter as much as model properties. Reliability is built into the environment around the agent, not optimized out of the agent itself. This is true whether the current harness patterns hold or evolve. It will be true of the next generation of agent engineering too.</p><h2>What Agents Are</h2><p>The three articles before this one traced how language agents came to exist. This article is about how they become reliable in practice. The answer is not something internal to the model. It is the engineered system around the model. The harness takes a model that would, on its own, drift and hallucinate and give up prematurely, and turns it into something that ships production code, runs multi-hour autonomous sessions, and completes complex real-world tasks.</p><p>This is why the 2026 evidence matters for the series&#8217; larger argument. If agents were mostly model, then the story of agents would be the story of better models, and the questions at the summit would be questions about what models can and cannot do. Instead, agents are mostly systems wrapped around models. The story of agents is the story of what systems can make models into. 
And the questions at the summit, which the series will reach in the later articles, are partly questions about what systems cannot yet make models into, regardless of how much harness effort is applied.</p><p>Part 5 turns to another development that has been reshaping this picture in parallel. Inference itself is becoming a site of agent behavior. Reasoning models that think for long stretches before producing an output are not just models with better training. They are models that run differently at inference time. If the harness is the environment around the agent, inference-time reasoning is the environment inside the agent. Both matter. Both are changing.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/the-rise-of-agents">The Rise of Agents</a> is an eight-part series. Next, Part 5: &#8220;Inference as Agency.&#8221;</em></p>]]></content:encoded></item><item><title><![CDATA[The Rise of Agents, Part 3: The ReAct Moment]]></title><description><![CDATA[The loop was forty years old. The substrate was new. 
The combination started a new era.]]></description><link>https://www.robonaissance.com/p/the-rise-of-agents-part-3-the-react</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-rise-of-agents-part-3-the-react</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Sat, 25 Apr 2026 17:01:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-tFe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-tFe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-tFe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-tFe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-tFe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-tFe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-tFe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195340291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-tFe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-tFe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-tFe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-tFe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F984a58e9-0fa3-4174-aa8b-d57df1c8e476_1168x784.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In October 2022, a paper appeared on arXiv: &#8220;ReAct: Synergizing Reasoning and Acting in Language Models.&#8221; The authors, led by Shunyu Yao of Princeton in collaboration with Google Research, proposed something simple. Have a language model produce interleaved thoughts, actions, and observations. Think about what to do. Do it. See what happened. Think again. Repeat until done.</p><p>The paper ran the approach on a few benchmarks. On ALFWorld, a text-based simulation of household tasks, ReAct prompting beat specialized imitation and reinforcement learning baselines by 34 percentage points of absolute success rate. 
On WebShop, a simulated online shopping environment, it beat them by 10. The improvements came with one or two in-context examples. No fine-tuning, no policy learning, no reward engineering. Just the loop, a language model, and a problem.</p><p>What the paper showed was that language models could act as agents if you let them reason out loud about what they were doing. What the paper did not show, because it was not the paper&#8217;s subject, was that the loop it described was older than most of its readers realized. The deliberate-act-observe structure had been sitting in agent architectures for forty years, waiting for something it could run on.</p><p>That is the subject of this article. Not ReAct itself, which is now canonical. The earlier loops that failed, the specific change that made the 2022 version succeed, and what the success reveals about what language agents actually are.</p><h2>The Loop Before ReAct</h2><p>The basic structure of a reasoning agent is not new. An agent that deliberates about its situation, acts, observes the result, and deliberates again is a structure that the symbolic agent era made explicit and built extensively.</p><p>SOAR, developed at Carnegie Mellon starting in the early 1980s, is one version. An agent has a working memory representing the current state, a set of production rules that propose actions, a mechanism for selecting among proposed actions, and a decision cycle that executes the selection and updates working memory. The cycle runs continuously. Each pass through the cycle corresponds to a moment of deliberation and action.</p><p>ACT-R, also from Carnegie Mellon, is another version with a different theoretical grounding. PRS, which originated at SRI International, and its Australian descendants dMARS and JACK are a third, built around the BDI architecture introduced in Part 1. In all of these, an agent is a loop. State comes in. Reasoning happens. Actions go out. Observations come back. 
The loop runs again.</p><p>The loop was the structure agent researchers agreed on. The disagreements were about what went inside it. What representations, what inference mechanisms, what forms of memory, what architectures for selecting among competing possibilities. Entire subfields debated these. But the loop itself was taken for granted.</p><p>Robotics control stacks arrived at the same structure from a different direction. A sense-plan-act cycle, running at whatever frequency the hardware demanded. Sense the environment through cameras and proprioception. Plan the next motion. Execute it. Repeat. The specific architectures varied wildly, but the cycle was the same as the symbolic planners and the cognitive architectures. Different communities, working independently, converged on the same loop because the loop was what the problem demanded.</p><p>The loop worked. What ran inside it did not.</p><p>Symbolic loops failed because the representations inside them could not scale to the real world. The frame problem, from Part 1. The loops worked on toy problems and specific domains. They broke down in open environments. Not because the loop structure was wrong, but because the ingredients the loop had to work with were insufficient. A SOAR agent could reason beautifully about blocks-world stacking and completely fail at a kitchen task it had not been hand-modeled for. A BDI agent&#8217;s plan library was only as rich as the set of plans a human had written for it. The loop could think, but it could only think with what had been put inside it.</p><p>Learning agents mostly did not have explicit loops at all. Reinforcement learning systems had a policy that mapped observations to actions, trained end-to-end to maximize reward. The deliberation, such as it was, happened inside the neural network weights, invisibly. An RL system acting in an environment looks, from the outside, like a fast loop of observation-action-observation-action. 
But there is no explicit moment where the agent considers what to do. The &#8220;thinking&#8221; is folded into the policy.</p><p>This worked for tasks where the policy could be learned from interaction. It failed everywhere else. The Era 2 failure was inherited by any attempt to use RL for open-world tasks. No prior knowledge, no pre-existing reasoning capability, nothing to fall back on when the trained policy encountered something outside its training distribution.</p><p>By the early 2020s, the loop had two failing traditions. The symbolic tradition, which had the right structure but the wrong representations. The learning tradition, which had the wrong structure but powerful representations. Neither had the combination.</p><p>This is worth sitting with. The loop itself was not in dispute. Forty years of agent research had converged on think-act-observe as the correct skeleton of an agent. The problem was that the skeleton needed muscle and blood to function, and neither tradition had figured out how to supply both. Symbolic systems could think carefully about what they represented but could not represent enough. Learning systems could absorb enormous amounts of data but could not think carefully about what they had absorbed. Both traditions knew this was the problem. Neither had a path to solving it.</p><p>What the field needed was a way to get rich representations into something that could reason over them in the loop. That is what a pretrained language model is. The representations are in the weights, implicit and enormous. The reasoning happens through the model in every forward pass. Put this inside the loop, and suddenly both sides of the old dichotomy are satisfied at once.</p><h2>The 2022 Change</h2><p>What ReAct showed was that the old loop, applied to a pretrained language model, worked.</p><p>Not &#8220;worked better than before.&#8221; Worked in a way that had no precedent. 
A frozen language model, prompted with a few examples of the think-act-observe pattern, could solve tasks that reinforcement learning systems specifically designed for those tasks could not solve. A generic loop on a generic model outperformed specialized approaches with years of tuning behind them.</p><p>The reason is the one Part 2 ended on. The loop did not change. The substrate did.</p><p>Every step in the ReAct loop runs through a language model. The thought step is the model producing natural language about the current situation, what the goal is, and what might be done next. The action step is the model producing a structured command, typically a tool call. The observation step is the environment&#8217;s response, handed back to the model as more text. The next thought step happens with all of this in context.</p><p>A concrete trace makes the pattern legible. The ReAct paper&#8217;s HotpotQA examples look roughly like this. The question is asked. The model thinks: to answer this I need to find out X, and I can search Wikipedia for X. The model emits an action: <code>Search[X]</code>. The environment returns a Wikipedia snippet. The model thinks: this tells me Y but not Z, I should search for Z. The model emits another action: <code>Search[Z]</code>. The loop continues until the model thinks: I now have enough to answer, and emits <code>Finish[answer]</code>. The reasoning is legible. The actions are legible. The observations are legible. The whole trajectory can be read by a human and understood.</p><p>What makes the loop work is that every step benefits from what the language model already knows. The thought is grounded in the model&#8217;s understanding of the world, its implicit knowledge of planning, its training-derived familiarity with how humans handle problems like the one at hand. The action is chosen based on the model&#8217;s knowledge of which tools exist, what they do, and when each is appropriate. 
The observation is interpreted with the model&#8217;s understanding of what the response means.</p><p>A symbolic agent running the same loop would hit the frame problem. Its representations would be too thin to support the thought step. An RL agent would have no native capacity for the thought step at all. The language model brings everything the earlier agents lacked. The loop does nothing new. The loop&#8217;s contents do everything new.</p><p>This is why ReAct was a moment rather than an invention. The loop had existed. The model had existed. What had not existed was the observation that you could just put them together and have the thing work.</p><h2>Why the Loop Mattered</h2><p>There is a subtle point about why the loop matters at all, given that modern language models can do so much without one.</p><p>A language model without a loop is a text generator. You give it a prompt. It produces tokens. It stops. Whatever reasoning it does is internal, compressed into the forward pass that turns input tokens into output tokens. The model can solve problems by reasoning about them in this compressed way, and with chain-of-thought prompting it can extend its reasoning across multiple output tokens. But the reasoning happens inside one generation, not across multiple turns of engagement with the world.</p><p>The loop breaks generation into turns. Within a turn, the model can reason. Between turns, the environment responds. The model&#8217;s reasoning on turn N has access to what happened on turns 1 through N-1. If an action produced an unexpected result, the next reasoning step knows. If a tool returned an error, the next reasoning step knows. If a plan needs to be revised, revision is possible because reasoning is happening inside a loop that keeps going.</p><p>The hallucination case is the clearest example. A language model asked a factual question it does not know will often produce a plausible-sounding but wrong answer. 
There is no mechanism inside a single forward pass to distinguish knowing from confabulating. The model generates whatever tokens the distribution favors, and for questions at the edge of its knowledge, the distribution favors something that sounds right. Chain-of-thought reasoning makes this worse as often as better: the model reasons confidently through steps that are individually hallucinated, compounding the error.</p><p>The loop breaks the pattern. The model in a ReAct loop can decide to look something up before answering. The action is <code>Search[X]</code>. The observation is what the world says about X. The next thought step is informed by what the world said, not by what the model thought the world might say. Hallucinations still happen, but now they happen against a backdrop of actual data the model is being asked to integrate. The correction mechanism is not perfect. It is a structural fix for a structural problem: a stateless generator checking its own claims against something that is not itself.</p><p>Without the loop, there is no opportunity to revise. The model produces a plan, or an answer, or an action, and that is the output. Any mistake is propagated. Any missing information is hallucinated. The ReAct paper&#8217;s original argument was partly about this. Chain-of-thought alone is vulnerable to propagating errors in its own reasoning. The loop, because it involves checking the world between reasoning steps, is a correction mechanism.</p><p>A language agent is what you get when you put a language model inside this correction mechanism. The mechanism is old. The model is new. The combination is what the field has been building on ever since.</p><h2>What Came Next</h2><p>The 2022 paper was the moment of recognition. What followed was the rapid build-out of everything the moment enabled.</p><p>Prompt scaffolds that implemented the ReAct pattern became standard. Frameworks like LangChain and LlamaIndex emerged to make the pattern easier to deploy. 
Tool-calling conventions, which started as ad-hoc prompt engineering, became protocols, most visibly the Model Context Protocol (MCP). Agent loops got more elaborate: separate reasoning and planning modules, explicit memory systems, multi-agent architectures where different agents played different roles in the overall loop.</p><p>The post-training moves covered in Part 2 started targeting the loop directly. Reasoning models were trained to think longer within a turn. Agentic post-training shaped behaviors like tool use, error recovery, and goal persistence across turns. The loop went from prompt-engineered to trained-in.</p><p>By 2026, the industry has converged on what the ReAct paper&#8217;s structure looks like in production: a language model, post-trained for agentic behavior, running inside a harness that manages tool calls, memory, and error recovery. The thought-action-observation cycle is still recognizable. But it now sits inside infrastructure that did not exist in 2022, and the next four articles in this series are about what that infrastructure looks like.</p><p>Part 4 covers the harness itself. Part 5 covers what happens when reasoning moves from prompt-time to inference-time. Part 6 covers what happens when multiple agents run loops together. Part 7 covers what happens when the loop extends beyond text into the physical world.</p><h2>What the Moment Revealed</h2><p>Two things are worth holding onto from the ReAct moment.</p><p>First, the continuity with earlier agent research is real, not rhetorical. The agent researchers of the 1980s and 1990s were not wrong about what an agent is. They were right about the structure. What they lacked was the substrate. The field spent decades perfecting the loop with ingredients that could not support it, then had to wait for ingredients that could. The current agent era is a continuation of that earlier work, not a break from it.</p><p>Second, the loop is still the loop. Modern harnesses are elaborate. Multi-agent architectures are elaborate. 
The infrastructure around language agents is elaborate. But the core structure is still deliberate-act-observe-deliberate. Every production agent today runs this cycle. The complexity is in what each step does and how the environment is managed between them. The shape of agent operation has not changed.</p><p>What has changed, and what the series will trace from here, is what this loop can be made to do when you keep pushing on it. Better harnesses. Longer reasoning. More agents. More environments. The substrate keeps improving. The loop keeps scaling. What happens at the limit of that scaling is the question the series is built toward.</p><h2>A Quiet Pivot</h2><p>The ReAct moment was quiet, in the way that many pivotal moments are quiet. A paper on arXiv, a few benchmark improvements, a few lines of code demonstrating the approach. Within months it was everywhere. Within two years it was the foundation everyone was building on.</p><p>The paper&#8217;s contribution was not the loop. The loop was ancient. The paper&#8217;s contribution was the observation that the loop now had something to run on, and the demonstration that when it did, it worked. Part 4 looks at what the field built once this observation sank in. If Part 3 is about the moment the loop finally worked, Part 4 is about everything that started being built to make it work better.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/the-rise-of-agents">The Rise of Agents</a> is an eight-part series. Next, Part 4: &#8220;The Harness.&#8221;</em></p>]]></content:encoded></item><item><title><![CDATA[The Rise of Agents, Part 2: What Language Agents Inherited]]></title><description><![CDATA[Language agents did not replace reinforcement learning. 
They absorbed it.]]></description><link>https://www.robonaissance.com/p/the-rise-of-agents-part-2-what-language</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-rise-of-agents-part-2-what-language</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Fri, 24 Apr 2026 13:30:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!A0NP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A0NP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A0NP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!A0NP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!A0NP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!A0NP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!A0NP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195232094?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A0NP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!A0NP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!A0NP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!A0NP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F091b69e6-4b0d-4c81-a036-0760b6ac8198_1168x784.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In 2016, reinforcement learning looked like it might eat the world. AlphaGo beat Lee Sedol. Robotics papers were using RL to solve tasks that had defeated the field for decades. The narrative was: this is how agents will be built.</p><p>A decade later, the world is language-shaped. The agents that get funded and shipped are built around pretrained language models, not policy networks trained from pixels. The canonical Era 2 results have become historical artifacts.</p><p>The standard story of what happened in between is a story of replacement. 
Language models arrived, solved problems RL could not, and RL faded from agent research. This story is clean, easy to tell, and wrong.</p><p>Reinforcement learning did not fade from agent research. It moved. Every language agent in production today runs on a pretrained model that was then shaped by reinforcement learning. Every reasoning model that chains thoughts across inference time was trained with RL on the chains. Every coding agent that stays on task through long runs was post-trained with RL on the behaviors we call agentic. RL is not beneath language agents. It is inside them.</p><p>This article is about that inheritance. What RL did in Era 2, what it does in Era 3, and what the difference between those roles reveals about what language agents actually are.</p><h2>What RL Did in Era 2</h2><p>A brief reminder of the shape of RL before the language era.</p><p>In Era 2, the paradigm runs like this. You have an agent in some environment. The environment gives the agent observations and accepts actions. The agent&#8217;s job is to discover a policy, a mapping from observations to actions, that maximizes a reward signal over time. The agent starts knowing nothing. Through trial and error, guided by reward, it learns to act.</p><p>This is elegant and astonishingly general. The same algorithmic framework produces AlphaGo, robotic locomotion, and Atari game play. The agent does not need to be told how the world works. It discovers what works through interaction.</p><p>But the paradigm makes strong assumptions. The environment must give feedback. The reward signal must correlate with what you actually want. The state space, however large, must be tractable enough that trial and error can explore it. Most of all, the task must be specified at a level the algorithm can operate on. You do not ask a pure RL agent to book a flight. 
You ask it to maximize expected return over trajectories in an explicit state space with explicit actions and explicit rewards.</p><p>In environments where these assumptions hold, RL is superhuman. In open worlds, where they do not, RL has nothing to start from. This is the second wall, and Part 1 covered it.</p><p>What matters for Part 2 is the next part of the story. The second wall did not fall because the field abandoned RL. The second wall fell because the field found a way to give RL something to start from.</p><h2>The 2022 Synthesis</h2><p>The breakthrough that defines Era 3 is not the language model alone. GPT-3 existed in 2020 and did not produce agents. It could complete text in impressive and often useful ways, but ask it to follow instructions, adopt a persona, or refuse to generate harmful content, and it would do so unreliably or not at all. The behaviors that make current agents useful, following instructions, refusing certain requests, staying on task, did not emerge from scale alone. Something else had to happen.</p><p>What defines Era 3 is the combination of a pretrained language model with reinforcement learning from human feedback.</p><p>InstructGPT, published by OpenAI in early 2022, is the template. Take a pretrained language model, which has absorbed the shape of human text without any particular behavioral goal. Fine-tune it on human-written demonstrations of the desired behavior. Collect comparisons from human raters: given two responses, which do you prefer? Train a reward model on those comparisons. Use RL to fine-tune the language model so its outputs score well on the reward model.</p><p>The result is a model that behaves differently from the raw language model. It follows instructions. It refuses certain requests. It adopts the voice of a helpful assistant. The base model had all of these behaviors latent in the distribution of human text. RL pulled a specific behavioral pattern out of the distribution and made it dominant.</p><p>This is the synthesis. 
Pretraining provides the representations: the shape of human knowledge, the structure of language, the implicit model of how humans reason. RL provides the shaping: which behaviors, among all those the model could exhibit, it should exhibit.</p><p>Neither alone produces what we recognize as a language agent. A raw pretrained model will complete text in whatever direction seems statistically plausible. It will not follow instructions reliably. It will not refuse harmful requests. It will not act like an assistant. A reinforcement-learned system without pretraining cannot reason about open worlds at all, because it has no representations to reason with. The combination produces something new.</p><p>This is the inheritance. Not the algorithm itself, although that matters. The capacity to shape behavior on top of pretrained representations. A way to move a model from &#8220;does whatever is statistically likely&#8221; to &#8220;does what we ask in the ways we want.&#8221;</p><h2>What RL Does Inside Language Agents</h2><p>RL&#8217;s role inside language agents is not a single job. It has at least three, and the third is newer than most industry observers realize.</p><p>The first is alignment. This is what InstructGPT did, what Constitutional AI does, what RLHF and its descendants do in every modern language model pipeline. The model is trained to prefer helpful, honest, harmless responses over the alternatives. Anthropic&#8217;s RLAIF uses AI-generated feedback in place of human labels, which lets the technique scale. Direct Preference Optimization skips the reward model and optimizes preferences directly. These are variations on the same move: shape a language model&#8217;s behavior toward preferred outputs using learned or collected preferences.</p><p>The second is reasoning. In late 2024, OpenAI released o1, a language model trained to produce extended chains of thought before producing answers. 
DeepSeek-R1 followed in January 2025 and showed the technique could be reproduced with open weights. This style of training is now commonly called Reinforcement Learning with Verifiable Rewards, or RLVR. Instead of training against human preferences, RLVR trains against automatically checkable signals: did the math answer come out right, did the code compile and pass tests. The reward is cheap and accurate, which means the training can run at scale.</p><p>The result is a new category of model, sometimes called a large reasoning model. The architecture is the same as that of the language models the reasoning models are built from. The training recipe differs. A base model is exposed to verifiable problems, generates multiple reasoning traces, and is reinforced for traces that arrive at correct answers. Over enough training, the model develops what the DeepSeek paper calls emergent reasoning patterns: self-reflection, verification, dynamic strategy adaptation. These are not hand-coded. They fall out of rewarding correct final answers on problems hard enough that naive approaches do not suffice.</p><p>Chain-of-thought prompting asks a model to reason step by step. Chain-of-thought training teaches a model that reasoning step by step pays off. The difference is the difference between a hint and a habit. A prompted model can produce chain-of-thought output, but whether it actually reasons through the chain or hallucinates a plausible-looking one depends on luck. A trained model has been shaped, over thousands of RL steps, to treat extended reasoning as the default approach to hard problems. The reasoning is not always correct. But it is no longer optional.</p><p>The third is agentic behavior. Coding agents, web-browsing agents, tool-using agents. All of them are post-trained to exhibit the behaviors we call agentic. Stay on task. Use tools correctly. Recover from errors. Maintain goals across steps. 
Each of these is a behavior that RL-style optimization against a carefully chosen reward can produce, and that a pretrained model alone will not reliably exhibit.</p><p>This is visible in specific cases. Claude Code and similar coding agents show behaviors that the underlying base models do not exhibit on their own. They invoke tools in a specific call format. They wait for tool results before continuing. They interpret error messages and adjust course. They run tests and use the outputs to decide what to do next. These behaviors sit on top of the base model&#8217;s knowledge of code, but they do not follow automatically from that knowledge. They are trained in. The specific way a frontier coding agent uses its tools, the exact shape of its correction loop, the cadence of its status updates: all of this is the product of post-training choices that differ from lab to lab.</p><p>There are other roles. Reward models are used as filters at inference time. Safety training leans on preference data. Fine-tuning for specific industry use cases often blurs into RL territory. The pattern across all of these is the same. Pretraining builds a base of capability. RL shapes the base into something behaviorally specific.</p><h2>The Transformation</h2><p>What changed between Era 2 and Era 3 is not whether RL is used. It is what RL is applied to.</p><p>In Era 2, RL was applied to a blank agent. Start from nothing. Learn a policy from scratch. The agent&#8217;s entire competence came from the RL process. This is why Era 2 worked in closed environments and failed in open ones. There was no prior knowledge to start from.</p><p>In Era 3, RL is applied to a pretrained model. The model already has competence. It already has representations of the world. It already has implicit models of reasoning, planning, and language. RL does not build the competence. It shapes it.</p><p>This sounds like a technical detail. 
It is actually the whole story.</p><p>Consider the same RL algorithm applied in each setting. Proximal Policy Optimization, the standard algorithm used for RLHF, is also a standard algorithm used in Era 2 robotics and game-playing RL. The algorithm is the same. The difference is what it operates on. Applied to a neural network starting from random weights, PPO can learn to play Atari if the environment cooperates, and nothing more. Applied to a pretrained language model, the same algorithm can turn a text completer into an instruction follower, an answer generator into a reasoner, a language model into an agent. The algorithm did not gain new powers. The substrate did.</p><p>A technique that fails in open worlds because it has no prior knowledge becomes powerful in open worlds when you give it prior knowledge. The prior knowledge comes from pretraining. The shaping comes from RL. Neither alone produces today&#8217;s agents. Together they produce agents that exhibit behaviors neither could produce on its own.</p><p>This is why the second wall fell when it did. Not because RL was replaced. Because RL acquired a foundation it had never had before: a pretrained language model with open-world competence already inside it.</p><h2>What This Tells Us About Language Agents</h2><p>If RL is inside language agents, not beneath, not beside, then several things follow.</p><p>First, language agents are composite systems, not monolithic ones. When an agent does something surprising, the cause could sit in the pretrained weights, in the RL-shaped behaviors, in the prompt, in the harness, or in the interaction between all of these. Debugging requires distinguishing which layer produced the behavior. This is part of why language agent behavior is famously hard to reason about. The layers interact, and they interact in ways no layer alone predicts.</p><p>Second, the capabilities of language agents are not the capabilities of pretrained language models. 
Pretraining provides the raw material. RL turns the raw material into an agent. When industry observers marvel at what modern agents can do, they are marveling at something produced by a specific post-training process. Different RL choices produce different agents from the same base model. Claude&#8217;s persona is the product of Anthropic&#8217;s post-training. GPT&#8217;s persona is the product of OpenAI&#8217;s. The base models differ less than the personas suggest.</p><p>Third, the bottlenecks of language agents are partly RL bottlenecks. Getting an agent to follow complex instructions, to refuse certain requests, to maintain specific values, to reason in specific patterns. All of these are RL-shaped. They improve when post-training improves. When a lab claims a new model is better at some agentic task, the improvement is often an RL improvement, not a pretraining one. The base models have converged. The post-training is where the differentiation happens now.</p><p>Fourth, the limits of language agents are partly RL limits. What RL can shape is what we can specify a reward for. Alignment works because &#8220;helpful, honest, harmless&#8221; can be approximated by human preference data. Reasoning works because correctness can be checked automatically on math and code. Agentic behavior works because tool-use success has measurable outcomes. What RL cannot shape well is anything where we cannot construct a reward signal. This becomes important later in the series.</p><h2>The External Inheritance</h2><p>Part 3 turns to the other inheritance of the language agent era: the ReAct loop. If Part 2 is about what RL gave language agents internally, Part 3 is about what ReAct gave them externally. The capacity to interleave reasoning with action, observe results, adjust course. The loop that makes language agents visible as agents rather than text generators.</p><p>Deliberate, act, observe, deliberate again. This structure predates language models by decades. 
It sat in symbolic agent architectures, in BDI systems, in robotics control stacks. What changed in 2022 was what the loop ran on. Pretraining plus RL plus the ReAct loop is the technical stack of every Era 3 agent. Part 3 looks at what made the loop work where earlier versions of it had not.</p><h2>The Shape of the Thing</h2><p>Language agents are not large language models with tools bolted on. They are pretrained models shaped by reinforcement learning, running in a reasoning loop, embedded in a harness. Each layer matters. Missing any one of them, the agent does not exist in the form we have come to know it.</p><p>Part 1 said language agents broke the second wall by inheriting the open-world competence of models trained on almost everything humans have written. That is true, but incomplete. The open-world competence comes from pretraining. The agentic character comes from RL. The wall fell to a combination, not a component.</p><p>The intention gap, at the top of the diagram, may or may not yield to better combinations. That is a question for later in the series. For now, it is enough to see that the combination is what matters. RL in agent research did not die. It moved inside. And the inside is where the interesting structure has been hiding.</p><div><hr></div><p><em><a href="https://www.robonaissance.com/t/the-rise-of-agents">The Rise of Agents</a> is an eight-part series. Next, Part 3: &#8220;The ReAct Moment.&#8221;</em></p>]]></content:encoded></item><item><title><![CDATA[The Beauty of Mathematics, Part 3: The Moonshot]]></title><description><![CDATA[She doodled MIT on her math scratch paper in Guangzhou. 
A decade later, she raised $200 million at a $1.6 billion valuation to teach machines to prove.]]></description><link>https://www.robonaissance.com/p/the-beauty-of-mathematics-part-3</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-beauty-of-mathematics-part-3</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Thu, 23 Apr 2026 11:42:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nuDj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nuDj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nuDj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nuDj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nuDj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nuDj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nuDj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg" width="1360" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1360,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204563,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195228655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nuDj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nuDj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nuDj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nuDj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824f45fb-4468-4a0a-b9c2-4316745a721b_1360x768.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This 3-part deep-dive series explores <strong><a href="https://www.youtube.com/watch?v=78Vyy_dzWXA">a four-hour interview</a></strong> with <strong><a href="https://x.com/CarinaLHong">Carina Hong</a></strong>, hosted by <strong><a href="https://x.com/zhang_benita">Xiaojun Zhang</a></strong> in 2026. 
To better convey the depth and ideas of the conversation, I have reorganized the narrative, added background context, and clarified some of the more technical points.</em></p><div><hr></div><p>Carina Hong describes her own company in terms that most founders have scrubbed from their talking points before the pitch meeting begins. &#8220;The outcome is binary,&#8221; she says. &#8220;Either we land the moon, or we do not. There is no middle.&#8221; She says this in multiple registers across the interview, sometimes analytically, sometimes with a slight shrug, and always with the calm of someone who has done the math on what happens if the rocket fails. She was twenty-four when she founded Axiom Math in July 2025. She turned twenty-five while raising $200 million at a $1.6 billion post-money valuation in March 2026, led by Menlo Ventures. The business model, which she told her seed-round investors she did not yet understand, remains an open question. What is not open is her willingness to say so aloud.</p><p>The first two parts of this series unpacked what Axiom is building and why. This part turns to the person building it. Not because biography validates thesis, but because the path she took to reach this bet is unusual enough to be worth tracing, and because the specific way she tells the story, candid about the fear and the uncertainty and the days she feels the stupidest in the room, is rare in Silicon Valley. The version of the AI-founder archetype that dominates the discourse is confident, messianic, and careful. Hong is intense, honest about herself, and not careful at all.</p><h3>The Doodle</h3><p>Hong grew up in Guangzhou. Her walk to primary school was ten minutes, long enough to get lost in thought, which she often did. She attended South China Normal University&#8217;s affiliated high school, a feeder for Chinese academic olympiads. She competed in math olympiads. 
She was, by her own account, consistently unable to solve the first problem on each round, the Euclidean geometry question that the other strong students regarded as automatic. She compensated, as described in Part 2, by grinding through with complex-number coordinates. It took her two to three times as long as her classmates and produced solutions that were correct but inelegant. She lost time. She sometimes skipped later problems. She kept doing this for years.</p><p>Somewhere in this period, she started doodling &#8220;MIT&#8221; on her mathematics scratch paper. The letters, she explains with characteristic flatness, were easy to draw. &#8220;If I had wanted to go to another school, say Columbia, it would have been more letters to write. MIT is three.&#8221; She had seen <em>Good Will Hunting</em>. She knew MIT was where the Infinite Corridor was. She knew it was where great mathematicians and physicists and astronauts came from. The doodle became a small ritual.</p><p>There is a framework, introduced to her later by a mentor, that Hong now uses to describe her childhood mental state. It comes from neuroscience. &#8220;Bounded attention&#8221; is the attention that is focused on a task. &#8220;Free attention&#8221; is the attention that wanders. Most children, most of the time, alternate between the two. Hong&#8217;s walk to school, the daydream in class, the complex-number method itself, she now describes as free attention applied to whatever was in front of her. It felt like play. It produced persistence as a byproduct.</p><p>She got into MIT. She majored in mathematics and physics. She did not, she notes repeatedly, feel like the smartest person in any room she entered there. MIT is a school where ordinary mathematical talent looks blunted next to the kind of student who is completing Knots and Surfaces in freshman year. 
Hong reports, without obvious bitterness, that she felt in every phase of her life that she was the stupidest person in that environment, the one who tried hardest and saw the least.</p><p>&#8220;I kept trying things that did not work,&#8221; she says. &#8220;And I kept trying them.&#8221;</p><h3>The Cure</h3><p>MIT changed her not through its curriculum but through its atmosphere. &#8220;What is hard, do that,&#8221; she says, summarizing what she took from the place. &#8220;What is painful, do that. What requires long-term thinking, do that.&#8221; She describes the school&#8217;s culture of suffering with something like affection. Students running in the middle of blizzards when the red weather alert said stay inside. A peer group, she says, &#8220;every one of whom could endure.&#8221;</p><p>Then the pandemic hit. Hong was a freshman. In her second semester, MIT sent everyone home. Her small team of friends and study partners, which she had relied on as her emotional infrastructure, dissolved into Zoom rectangles. She was alone in an apartment, taking the same hard classes, but without the group that had absorbed the pain alongside her.</p><p>She had to find something else. What she found, or so she tells the story now, was the ability to extract meaning from difficulty itself, without a peer group to share it with. &#8220;The learning curve was very steep,&#8221; she says. &#8220;MIT really shaped my character.&#8221; The phrasing is restrained. What the phrasing covers is a transition she emphasizes carefully in the interview: from someone who could tolerate suffering within a community to someone who could find something useful in suffering alone. The second is the trait that investors later told her they look for. &#8220;Founders with a chip on the shoulder,&#8221; one of them said to her. 
&#8220;Chips on the shoulder convert into chips in the pocket.&#8221; She heard this phrase for the first time as a VC clich&#233; and has since stopped finding it clich&#233;.</p><p>She now says, with some amusement and no denial, that she is addicted to pain. &#8220;A lot of founders I know are,&#8221; she says. &#8220;It is not necessarily healthy.&#8221;</p><h3>The Detour</h3><p>She graduated from MIT with the double degree and, for reasons that in retrospect look like test runs, followed a path through the kind of elite credentialing that mathematicians sometimes take before settling into research. A Rhodes Scholarship at Oxford, where she completed a master&#8217;s in neuroscience at Hertford College with distinctions. Research at University College London&#8217;s Sainsbury Wellcome Centre and Gatsby Computational Neuroscience Unit. A Knight-Hennessy Scholarship at Stanford to pursue a combined J.D./Ph.D. in mathematics. Along the way, the Morgan Prize, the Schafer Prize, and nine peer-reviewed publications by the age of twenty-four.</p><p>She also spent a period as a quantitative trader. Not long, but long enough to form a view about what she did not want to do. The view is specific. In quantitative finance, she notes, your signal arrives quickly. You are right or wrong within a trading day, a week, a month. The feedback loop is so tight that it eats your own epistemic judgment. &#8220;You lose the ability to think about long-term things,&#8221; she says. &#8220;Competition in a short time horizon creates mediocrity.&#8221; The phrasing is more compressed in her voice than it looks on the page. The lesson she took from the quant period, the lesson she says she thinks about now, is that she wanted to work on problems where the signal was slow enough to let her think.</p><p>She was in her first year at Stanford, enrolled in both the law school and the mathematics Ph.D. 
program, when she decided to stop.</p><h3>The Conversation</h3><p>The founding story, like most founding stories, is cleaner in retrospect than it was in the moment. The short version, the one that appears in secondary coverage, is that Hong met Shubho Sengupta over coffee in Palo Alto, talked for a few hours about whether AI could be a mathematician, and started a company. The version Hong tells in the interview is longer by about a year and a half.</p><p>Sengupta, who became Axiom&#8217;s chief technology officer, is a generation older than Hong. He led Meta FAIR teams that developed OpenGo and CrypTen. Before Meta, he worked on distributed training systems that shaped Google Brain and was among the earliest CUDA developers at Nvidia. He has the kind of resume that, in Silicon Valley, opens doors without a further explanation of who the person is. None of this was known to Hong when she met him. They met at Verve, a coffee shop in downtown Palo Alto, where Hong was a law-school regular lugging three-volume constitutional-law casebooks and ordering matcha to watch the dogs in the courtyard. Sengupta was also a regular. They ended up at the same six-person communal table. The first conversation, as Hong remembers it, began when she asked him to close a blind because the sun was in her eyes.</p><p>They were friends for a year and a half before either of them mentioned starting a company. Hong did not know Sengupta worked at Meta. Sengupta knew she was a Stanford law and math student but not that she had a research background. They talked about the history of science, the papers Terence Tao was writing on formal proof, the Lean community, the question of whether formal verification was finally ready for AI. &#8220;We did not talk about a company,&#8221; Hong says. &#8220;We talked about the hypothesis. Could we build an AI mathematician.&#8221;</p><p>The decision itself came in the fall of 2024. Hong had just started her math Ph.D. 
at Stanford and was spending the other half of her time working at XTX, a quantitative finance firm with the kind of compute budget that made her realize how fast AI-for-math could move outside a university. One morning, after a run, she walked into Verve, sat down with Sengupta, and said: if we wanted to raise money for this, how much would we need. They worked out the answer on a napkin. By November she had decided. The formal fundraising would wait until Christmas, because the Christmas break was when Sengupta had time to read.</p><p>That reading group is the scene Hong mentions with the most enthusiasm. The two of them, over the break, went through what Hong calls &#8220;the Christmas reading package,&#8221; a set of papers they had assembled themselves. One of them was a survey co-authored by Kaiyu Yang and Gabriel Poesia, titled &#8220;Formal Mathematical Reasoning: A New Frontier in AI.&#8221; Hong had read across the AI-for-math literature before but had never seen the landscape laid out as a single connected surface. The survey&#8217;s fifth section proposed a set of capabilities a &#8220;good AI mathematician&#8221; ought to have, organized into a two-dimensional grid.</p><p>&#8220;I took the grid,&#8221; Hong says. &#8220;I traced every paper the survey cited, and every paper those papers cited. About half the citations I had not read. By the time I finished, the field had gone from five or six separate things I knew about to a single picture.&#8221;</p><p>The picture was the thesis. The next question was who else to bring.</p><h3>The Mathematician Who Refused</h3><p>Hong&#8217;s hiring strategy, for the first six months of Axiom, was an explicit rule: no mathematicians on staff until employee number fifteen.</p><p>The reasoning was not that mathematicians were useless. The reasoning was that mathematicians from the research culture would import assumptions that would slow the team down. Axiom needed to scale. 
It needed to train models on enormous datasets. It needed to accept that a good research proof and a good training example were not the same object. A mathematician steeped in the craftsmanship ethic of pure math, one who treats each proof as a hand-made thing worthy of years of care, would resist the industrial scaling that an AI company required.</p><p>She knows this because she tried to hire one anyway.</p><p>Early in the seed round, Hong approached a researcher connected to a benchmark in formal mathematics, someone whose technical judgment was exactly what Axiom needed. She made an offer. He accepted. Then he withdrew. The reason he gave, in Hong&#8217;s telling: &#8220;I do not want to work on internet-scale datasets.&#8221;</p><p>&#8220;He saw math as craft,&#8221; she says. &#8220;Sushi made one piece at a time. We were going to ask him to scale everything. He was right to withdraw. And I was right to try.&#8221;</p><p>The experience shaped her hiring approach. When she eventually did start bringing mathematicians in, Ken Ono first, the criterion was not credentials. It was openness to a specific kind of adversarial collaboration. Hong&#8217;s phrasing is precise. &#8220;We want mathematicians who fight us.&#8221; The mathematicians Axiom hires are not there to do the AI work. They are there to build benchmarks the AI cannot yet solve, to point out where the system&#8217;s proofs are technically valid but mathematically unsatisfying, to produce the training-signal shape that a self-play loop could not produce on its own. Adversarial. Not antagonistic, but structurally oppositional. The mathematician&#8217;s job is to find the gap. The engineering team&#8217;s job is to close it. Then a new gap is found. The cycle iterates.</p><p>Ken Ono, the fifty-seven-year-old mathematician whose arrival generated the &#8220;tenured professor quits to work for twenty-four-year-old&#8221; headline in Chinese media, joined under this arrangement. 
Ono, as discussed in Part 2, was hired specifically as a conjecturer. But he was also hired, Hong says explicitly, as someone who was going to push back. &#8220;His job is to tell us when we are wrong. He tells us often.&#8221;</p><p>This hiring principle extends to everyone. Sengupta fights Hong on engineering choices. The benchmark-focused mathematicians fight the prover team on evaluation criteria. Fran&#231;ois Charton, the Meta researcher who first applied transformers to mathematics in 2019 and who came to Axiom after using LLMs at Meta to solve a century-old math problem and disprove a 30-year-old conjecture, fights on how the mathematical and the machine-learning cultures should interface. The adversarial rule is a culture, not a tagline. Hong does not pretend it is comfortable. &#8220;It is exhausting,&#8221; she says. &#8220;But it works.&#8221;</p><h3>The Fundraising</h3><p>The fundraising story, told in pieces across the interview, is the most revealing section for anyone who has ever raised money and suspected the process was partly absurd theater.</p><p>&#8220;Nobody likes fundraising,&#8221; Hong says. &#8220;Nobody. If I could pay an AI a percentage to raise for me, I would.&#8221; The reason, she explains, is not difficulty. The reason is repetition. &#8220;You are a repeating machine. You say the same things to one investor that you said to the last. You get the same questions. You give the same answers. After three weeks you could record yourself and send the recording instead.&#8221;</p><p>She describes an unusually high-signal conversation with Howard Morgan, the co-founder of Renaissance Technologies and First Round Capital, currently the chair of B Capital, the firm that led her seed round. Morgan is eighty years old. He was an early user of the ARPANET, with machine number fifty on the network in the early 1970s. He has been an active investor for more than forty years. 
When Hong met him, she had been awake most of the night rewriting a paper rebuttal for an academic conference deadline. The Zoom call was wedged into a gap. She went into it tired and not particularly polished.</p><p>What she did not expect was that Morgan, who by that point in his career had heard thousands of pitches from thousands of founders, turned the tables. He did not ask her what her business model was. He told her what her business model was. He laid out, with more conviction than she had at the time, where Axiom&#8217;s commercial paths were and how they would unfold. &#8220;He was more optimistic about my company than I was,&#8221; Hong says. &#8220;That does not happen often with investors.&#8221;</p><p>This is the moment she cites as the one that converted fundraising from theater back into something like genuine conversation. Her anti-pitch style, she had realized, was working because it was unusual. &#8220;Most founders tell VCs things are a ten out of ten,&#8221; she says. &#8220;The VCs apply a discount. They end up hearing an eight. I told them we were a seven. So they applied a discount and heard nothing.&#8221; She laughs. &#8220;But the ones who liked that I told them it was a seven, those were the ones I wanted anyway.&#8221;</p><p>The competitive dynamics worked in her favor once the first offers arrived. From the first term sheet to the last, the price tripled. By the end, multiple firms were competing, and the process, which she had described initially as exhausting, briefly became interesting. &#8220;You meet people you would not otherwise meet,&#8221; she says. &#8220;Occasionally one of them changes how you see your own company.&#8221;</p><h3>The Landscape</h3><p>One of the questions the interviewer asks, and which every founder in AI has to answer, is why large labs do not do what Axiom does. OpenAI, Google DeepMind, Anthropic all have both the talent and the infrastructure. Several of them have teams working on formal math. 
Why is there space for a startup?</p><p>Hong&#8217;s answer is specific and not defensive. &#8220;They could do it,&#8221; she says. &#8220;But they will not, because the expected return on that talent in their current focus areas is higher.&#8221; OpenAI&#8217;s informal-reasoning work, she notes, is driven by a senior researcher&#8217;s personal ambition for scientific discovery. Google DeepMind has parallel teams, one on formal methods and one on informal, and AlphaProof is their public flagship. Anthropic uses Lean data as a reinforcement-learning reward signal but treats it as infrastructure, not as a core product direction. The structural fact, in her framing, is that a lab with a dominant commercial product cannot redirect its best people to formal verification. The opportunity cost is too high. A startup with no other product can.</p><p>The real competitor, she says, is Harmonic, a company co-founded by the Robinhood CEO that has raised $295 million at a $1.45 billion valuation. The two companies share the broad thesis that verification matters, and differ on architecture. Harmonic, Hong notes, has a founder whose attention is split between Robinhood and the startup, which creates a different dynamic. &#8220;Their energy is scientific ambition,&#8221; she says, in a phrasing that sounds like praise and is also a differentiator. &#8220;Ours is moonshot.&#8221; The distinction, as she draws it, is that a scientifically ambitious company can afford to explore. A moonshot company has to land.</p><p>She is also clear that the labs could eventually become partners rather than competitors. &#8220;OpenAI sub-contracts search to Bing,&#8221; she offers as a parallel. 
&#8220;A large lab focused on informal reasoning could eventually invoke a formal prover, built by someone else, when a formal proof is required.&#8221; This pattern, she predicts, is where Axiom&#8217;s commercial opening is largest: as the verification layer for code-generating AI systems built by other companies.</p><h3>The Moonshot</h3><p>The word she uses for her company&#8217;s ambition, the word that organizes the interview, is moonshot. She uses it in English, embedded in Mandarin sentences, because the Chinese translations do not carry the same connotation. She invokes SpaceX, deliberately. Rockets that either reach orbit or burn. Companies that either return to earth having completed the mission or crash trying. The binary outcome is structural. There is no small win.</p><p>&#8220;I believe in recursive self-improvement,&#8221; she says. &#8220;I think it is near-term achievable, and I think verification is the piece that unlocks it. If I am wrong, we do not get a partial version of it. We get nothing. If I am right, everything changes.&#8221;</p><p>The phrasing is not standard startup talk. Most founders hedge. Most founders describe their companies as asymmetric bets, the small chance of a huge win against the likely chance of a modest one. Hong does not. She describes her company as an all-or-nothing bet, and she asks investors to join her on that basis, and she seems perfectly aware that this framing selects for a specific type of capital and deselects everyone else. It also selects for her own continued commitment, which is a function she has thought about explicitly.</p><p>&#8220;If we fail,&#8221; she says, in response to a direct question, &#8220;I might go back to neuroscience. I want to understand the brain. Current brain-computer interfaces are nowhere close to what they need to be. That is a problem I think about.&#8221;</p><p>She says this easily. Not as a fallback. 
As a parallel life that exists in a different branch of the probability tree, and that she would find meaningful if she had to take it. The detail matters. A founder who has already made peace with what they would do if the company failed is a specific kind of founder. Not a desperate one. Not a reckless one. Someone who has accepted the binary and chosen the moonshot anyway.</p><h3>Language Is the World</h3><p>Near the end of the conversation, Xiaojun Zhang mentions that her production company is called &#8220;Language Is the World&#8221; Studios. The name is a philosophical position as much as a brand: that language is the medium through which reality is structured and communicated. She asks Hong what she thought of the name when she first encountered it.</p><p>Her answer, at the end of four hours of discussion about Lean, about Curry-Howard, about verification and conjecture and the elegance filter and the question of whether beauty can be trained, is characteristic. &#8220;Mathematicians,&#8221; she says, &#8220;have been writing code in natural language for thousands of years. That is what a mathematical proof is. Structured logical reasoning, expressed in English or Chinese or whatever natural language the mathematician writes in. The thing that is new in the last decade is that we now have a second way to write proofs, in formal languages like Lean, and we can run them through a compiler. What we are doing at Axiom is building the bridge between the two ways of writing.&#8221;</p><p>The remark is offered in passing. It is also, in a sense that connects back to Part 1 of this series, exactly the thesis of the series. Math is code. Code is math. The reason this is not a metaphor is that natural-language mathematics and formal mathematics are two expressions of the same object. 
Hong&#8217;s company is betting that the translation from the first to the second, and the closed-loop verification that becomes possible when both exist, is the missing capability of the current AI paradigm. The word for that capability is proof. The word for the company is Axiom, which is the name for the starting point of a formal system, the smallest set of assertions from which the rest of mathematics can be derived.</p><p>She is twenty-five years old. She believes the closed loop can be built, that recursive self-improvement is near-term, and that Axiom&#8217;s binary outcome will be known within a few years. She has raised $200 million at a $1.6 billion valuation from investors who agree with her on enough of the thesis to fund the attempt. She grew up walking to a primary school in Guangzhou, doodling three letters on her scratch paper because those three letters were the shortest path from where she was to where she wanted to be.</p><p>The rocket is built. The launch window is open. Whether it lands remains to be seen.</p><div><hr></div><p><em>This concludes The Beauty of Mathematics series. For earlier parts: Part 1, &#8220;Math Is Code&#8221;; Part 2, &#8220;Proofs from The Book.&#8221;</em></p>]]></content:encoded></item><item><title><![CDATA[The Beauty of Mathematics, Part 2: Proofs from The Book]]></title><description><![CDATA[Erd&#337;s believed God kept a book of the most beautiful proofs. 
The question Carina Hong&#8217;s company is quietly confronting is whether a machine can learn to read it.]]></description><link>https://www.robonaissance.com/p/the-beauty-of-mathematics-part-2</link><guid isPermaLink="false">https://www.robonaissance.com/p/the-beauty-of-mathematics-part-2</guid><dc:creator><![CDATA[Hugo]]></dc:creator><pubDate>Wed, 22 Apr 2026 13:53:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!K6e1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K6e1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K6e1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!K6e1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!K6e1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!K6e1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K6e1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg" width="1360" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1360,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204563,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.robonaissance.com/i/195035707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K6e1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!K6e1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!K6e1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!K6e1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faea5a69a-cf3e-4bf3-96c1-11b52df1385c_1360x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This 3-part deep-dive series explores <strong><a href="https://www.youtube.com/watch?v=78Vyy_dzWXA">a four-hour 
interview</a></strong> with <strong><a href="https://x.com/CarinaLHong">Carina Hong</a></strong>, hosted by <strong><a href="https://x.com/zhang_benita">Xiaojun Zhang</a></strong> in 2026. To better convey the depth and ideas of the conversation, I have reorganized the narrative, added background context, and clarified some of the more technical points.</em></p><div><hr></div><p>On the morning of December 6, 2025, in a conference room named after Poincar&#233;, Evan Chen looked at the first problem from that day&#8217;s Putnam exam and drew a figure on a piece of paper. Chen is a coach of the U.S. International Mathematical Olympiad team. The other people in the room, most of them Axiom employees, looked at the drawing. They understood immediately how the problem would resolve. One diagram. Then they fed the same problem to AxiomProver.</p><p>The AI did not find the diagram. It produced a solution that ran to thousands of lines of formal Lean code: a brute-force, case-by-case, step-by-step construction that verified the result without ever seeing the geometric picture. Both answers were correct. Both would receive full marks. But the mathematics underneath the two proofs had almost nothing in common.</p><p>Paul Erd&#337;s, the most prolific mathematician of the twentieth century, believed God kept a book. The Book contained the most beautiful proof of every theorem. Erd&#337;s never claimed to have seen The Book. He claimed only that when a proof was particularly elegant, it came &#8220;from The Book,&#8221; and when a proof was ugly he sometimes wondered whether the result it established might not in fact be false. This was not a religious statement. Erd&#337;s was a committed atheist. It was an aesthetic one: that among the infinity of valid proofs for a given theorem, some are structurally privileged. They reveal what the theorem is really about. They generalize. They connect. 
They surprise.</p><p>Axiom Math&#8217;s AxiomProver does not have a concept of The Book. Its proofs are valid, which is to say they type-check in Lean and establish the theorems they claim to establish. Many of them run to thousands of lines. Where a human Olympian would draw a single diagram, AxiomProver often produces exhaustive case analysis that no trained mathematician would. Its proofs are correct. And they are, in the specific sense Erd&#337;s meant, ugly.</p><p>Whether this matters is the question at the center of Part 2 of this series. It is also, in Carina Hong&#8217;s telling, the question at the center of the gap between what Axiom has built and what Axiom is trying to build. An AI that can prove theorems, given the theorems, is a remarkable achievement but not yet a mathematician. A mathematician is someone who knows which theorems are worth proving, and among the proofs of a given theorem, knows which ones are worth reading. That second kind of judgment has a name in mathematical culture, old and respected: taste. And taste, so far, is not something the current paradigm has learned to produce.</p><h3>The Brute-Force Artist</h3><p>Hong introduces herself as a brute-force type. This is not false modesty, and it is not a pose. It is, in her telling, the formative fact of her mathematical adolescence.</p><p>At Chinese olympiad training camps as a teenager, Hong had a problem. The first question on every round of the national mathematics olympiad was a Euclidean geometry problem. It was regarded as the guaranteed question. Miss it, Hong reports, and third prize was out of reach. Most competitive students solved these problems the way competition geometry is traditionally solved: by seeing a construction. You looked at the figure, noticed that a certain line would bisect a certain angle or that two triangles were similar, drew the auxiliary line, and the proof followed in a page.</p><p>Hong could not see the constructions. 
Her brain, she reports, is more comfortable with algebraic symbols than with geometric figures. Faced with a geometry problem, she fell back on what is called the complex number method: assign complex coordinates to every point, translate every geometric statement into algebraic identities among those coordinates, and grind through the algebra. The method is reliable. It is general. It is also deeply inefficient. Where a classmate might write a page using the inscribed angle theorem and similar triangles, Hong would write three pages of polynomial manipulation. Where they finished in ten minutes, she finished in forty, which in a timed competition meant she usually had to skip a later problem.</p><p>&#8220;I could not see the underlying geometry,&#8221; she says. &#8220;I would solve the problem without ever understanding what it was about.&#8221;</p><p>This is the kind of self-description that could be easily flattened into a generic founder story about overcoming adversity. Hong resists that flattening. She reports the complex-number method not as a deficit she overcame but as a methodology she still uses, for a specific reason: it works. It produces correct answers. It is systematic. What it sacrifices is the insight into why the result holds, the feeling of having seen the structure, the thing that competition geometers mean when they say a solution is &#8220;natural.&#8221; The feeling has a function. It signals that the technique you just used might generalize. The grinding algebraic method produces no such signal. It just gives you the answer.</p><p>She was, in the sense the rest of this series will keep returning to, an early version of an AxiomProver in a human body. The algorithm worked. The beauty did not.</p><h3>Why the Machines Proved Like Hong</h3><p>In January 2024, Google DeepMind published a paper in <em>Nature</em> describing a system called AlphaGeometry. 
The system could solve competition-level Euclidean geometry problems at something approaching the level of an IMO gold medalist. On a benchmark of thirty IMO geometry problems from the past two decades, AlphaGeometry solved twenty-five. The successor system, AlphaGeometry 2, solves eighty-four percent of IMO geometry problems from 2000 to 2024.</p><p>The architecture of AlphaGeometry is worth describing because it tells a specific story about the space of possible solvers. The system has two components. A symbolic deduction engine maintains a database of facts about the geometric figure and applies deduction rules until either the conclusion is proved or no new facts can be derived. A neural language model, trained on synthetic geometry problems, suggests auxiliary constructions when the deduction engine gets stuck: a new point here, a new line there, a circle drawn through three specific vertices. The language model&#8217;s job is to propose the construction that will unlock deduction. The deduction engine&#8217;s job is to use the construction to reach the conclusion.</p><p>What makes this architecture notable, from the perspective of this series, is not what it does but what it does not do. It does not look at the figure. It does not see the geometric structure. The language model suggests constructions based on statistical patterns learned from synthetic training data, and the deduction engine grinds through algebraic consequences. The AI never sees the diagram in the sense that a human geometer does. It sees a list of predicates and proposes a list of additions. The proof that emerges is rigorous and often long.</p><p>Hong saw AlphaGeometry&#8217;s approach and recognized it immediately. &#8220;Its philosophy is the same as my approach,&#8221; she says. &#8220;Convert the geometry into symbolic expressions. Solve it algebraically. The diagram never enters the reasoning.&#8221;</p><p>This is not a criticism of DeepMind. 
AlphaGeometry is a remarkable piece of engineering, and its approach is a reasonable one given the tools available. The point is deeper. The dominant paradigm in AI for geometry has converged on Hong&#8217;s adolescent method, and for the same reason. Constructions are hard. Symbolic grinding scales. If you cannot see the figure, algebra is the only thing left.</p><p>But competition geometry is not the whole of mathematics, and the gap between AlphaGeometry&#8217;s solutions and the solutions Erd&#337;s would have called beautiful is not a bug in DeepMind&#8217;s system. It is the frontier.</p><h3>Competition Math Is Not Research Math</h3><p>There is a structural distinction, well known inside mathematics and less well understood outside it, between the kind of mathematics tested in competitions and the kind of mathematics that happens in research. The distinction matters here because the entire recent wave of AI-for-math results, from AlphaGeometry to AlphaProof to AxiomProver, has been focused on competition mathematics. This is not an accident. Competition mathematics has a property that research mathematics lacks: the problems are given.</p><p>In a competition, someone else has selected a set of problems, verified that they have solutions within the ambient body of undergraduate mathematics, and calibrated their difficulty. The solver&#8217;s job is to find the solution. The problems are hard but bounded. A human or a machine that can produce correct proofs for Putnam or IMO problems has demonstrated something real about proof generation.</p><p>In research, nobody has selected the problems. The mathematician&#8217;s first and hardest task is to decide what to try to prove. Most mathematical questions, if phrased precisely, are either trivial or impossible. The interesting ones, the ones that produce the results we remember, are rare. A researcher&#8217;s primary output, over the course of a career, is not proofs. 
It is the sequence of questions they chose to attempt, and the small fraction of those questions that turned out to be both true and important.</p><p>Hong is explicit about this distinction. &#8220;We have built a very good prover,&#8221; she says. &#8220;The hard problem, the one we cannot yet do well, is the conjecturer.&#8221;</p><p>A conjecture, in the technical sense, is a precisely stated proposition that the person proposing it believes to be true but has not yet proved. The Riemann Hypothesis is a conjecture. So is Goldbach&#8217;s. Most conjectures never become famous, because most conjectures turn out to be true and unimportant, or false and unimportant, or impossible to resolve with current methods and therefore shelved. The conjectures that end up mattering are the ones that, if true, would reveal a structure: a connection between subjects that appeared unrelated, a pattern that extends to cases never examined, a reason why known results hold.</p><p>Producing such conjectures requires something that looks very much like aesthetic judgment. The mathematician has to pattern-match across a vast body of known results, notice a regularity, propose that the regularity extends, and only then attempt to prove the proposal. The first three steps are where mathematics mostly happens. The proof is the cleanup.</p><p>An AI that can do the cleanup is, for Axiom&#8217;s commercial purposes, already useful. An AI that can do the first three steps is, in Hong&#8217;s framing, something different. 
It is what the phrase &#8220;mathematical superintelligence&#8221; actually means, and it is why she spent Axiom&#8217;s first six months of operation hunting for a specific kind of human hire.</p><h3>The Conjecturer</h3><p>The headline that made Hong briefly famous in Chinese tech media arrived in 2025, framed in the way headlines arrive: &#8220;Fifty-Seven-Year-Old Tenured American Professor Quits to Work for Twenty-Four-Year-Old Chinese Woman.&#8221; The professor was Ken Ono, an American mathematician born in Philadelphia in 1968. Until 2025 he held the Marvin Rosenblum Professorship at the University of Virginia and had served as Vice President of the American Mathematical Society. Ono has received Guggenheim, Packard, and Sloan Fellowships. He is one of the world&#8217;s leading authorities on the mathematics of Srinivasa Ramanujan, the Indian prodigy who died in 1920 at thirty-two after producing roughly four thousand theorems, most of which contemporary mathematicians are still unpacking.</p><p>Ono joined Axiom as what the company calls its Founding Mathematician. This is a deliberate title choice. Axiom has plenty of mathematicians on staff. It has only one person who has spent his career, as Hong puts it, being &#8220;a high-volume conjecture machine.&#8221;</p><p>&#8220;He is a conjecturer,&#8221; Hong says. &#8220;He is very prolific. That is why we needed him specifically.&#8221;</p><p>The distinction between what Ono does and what other mathematicians do is worth naming carefully, because it is also the distinction between what current AI systems can do and what Axiom is trying to build. In Hong&#8217;s framing, there are two archetypes of mathematical intuition. There is the Ramanujan type, which she calls sharper and more convergent: the mathematician who stares at a formula and knows it is true, or knows what its generalization must be, without being able to explain how he knows. 
This is the intuition that writes down correct identities in a notebook in Madras and mails them to Hardy in Cambridge, dozens of results per letter, no derivations, no proofs. Ramanujan&#8217;s notebooks contain theorems that took the next century to confirm.</p><p>The other type, which Hong associates with Ono, is divergent. The Ono type is not primarily in the business of knowing what is true. He is in the business of connecting known facts from disparate subjects in ways that suggest new questions to ask. He notices that a result in one area of number theory has a structural echo in another, and proposes that both are instances of a larger pattern. He produces, as a main product, interesting questions.</p><p>&#8220;Ramanujan is the sharper kind of intuition,&#8221; Hong says. &#8220;Ono has the divergent kind. The way he generates conjectures is by connecting many different perspectives.&#8221;</p><p>This distinction is not neutral. It is the distinction between the capability Axiom can plausibly train and the capability that might require something the field does not yet know how to build. Ramanujan&#8217;s intuition was, as Hong and her team suspect, a phenomenon that would now be called a pre-training product. Ramanujan absorbed mathematics, particularly classical number theory, through a kind of total immersion. He read George Shoobridge Carr&#8217;s <em>A Synopsis of Elementary Results in Pure and Applied Mathematics</em> and worked through every theorem independently. His intuitions emerged from that saturation. The conjectures came out whole.</p><p>In December 2025, Ashish Vaswani, co-author of the original Transformer paper, released Essential AI&#8217;s first model. He named it Rnj-1, pronounced &#8220;range-one,&#8221; after Ramanujan. The choice was deliberate, and the thesis behind it was explicit: Essential AI is betting on pre-training. 
In an industry that has largely moved toward reinforcement learning as the dominant post-training objective, Vaswani has staked a contrarian position that intuition, the quality that lets a model produce the right continuation in a novel setting, is built during the pre-training phase through raw exposure to data at scale.</p><p>Axiom does not do pre-training. Axiom starts from open-source pre-trained models and does extensive post-training: supervised fine-tuning, reinforcement learning with formal verification as the reward signal, and the orchestrator-subagent system described in Part 1. This is a choice about where Axiom&#8217;s comparative advantage lies. It is also an acknowledgement that the Ramanujan-style intuition may not be something post-training can produce. The divergent conjecturing that Ono does, by contrast, looks more like a process Axiom can model: identify patterns across many training examples, propose extensions, test them.</p><p>But even the divergent version is hard. And it runs into the problem that has quietly stalked AI-for-math research since the field&#8217;s earliest days.</p><h3>The Elegance Problem</h3><p>Suppose you build a system that generates conjectures. Each conjecture is a precisely stated proposition. You want to use the system in a loop: generate a conjecture, try to prove it with your prover, and use the proof success as a reward signal for training better conjecturers. This is the structure of self-play proving, laid out in a 2025 paper by Kefan Dong and Tengyu Ma at Stanford, which Hong cites as a starting point for all serious work in the area.</p><p>The structure looks clean. In practice, it has a fatal problem. The prover&#8217;s reward signal is binary: the conjecture was proved, or it was not. If the conjecturer learns to optimize that signal, it will quickly discover that the easiest way to produce provable conjectures is to produce trivial ones. &#8220;The sum of two even numbers is even.&#8221; True. Provable. 
Useless. A self-play loop that optimizes for provability without filtering for importance produces an endless stream of true trivia, and nothing else.</p><p>The field&#8217;s term for the missing filter is elegance. The self-play paper proposed a rough proxy: measure the ratio of the proof length to the statement length. A &#8220;good&#8221; conjecture, under this proxy, is one whose statement is short but whose proof is long. The idea is that such conjectures are compressing something: a statement small enough to remember, a proof deep enough to be substantive.</p><p>Hong is politely unimpressed with this proxy. &#8220;The elegance filter in that paper is based on length,&#8221; she says. &#8220;But the real sense of elegance is something much more subtle.&#8221;</p><p>What that something is, is the unsolved problem. Within mathematics, elegance has a cluster of partial markers. An elegant result connects previously unrelated domains. An elegant proof reveals why the theorem holds, not merely that it holds. An elegant theorem has non-obvious consequences. None of these markers are easily formalized as reward signals. They depend on what the field considers surprising, which depends on what has been proved before, which changes over time.</p><p>For a company whose core thesis is that verification is the missing capability of AI, the elegance problem is a specific embarrassment. Verification is exactly what formal methods do well. Elegance is not verifiable. There is no compiler that accepts elegant proofs and rejects ugly ones. A proof from The Book type-checks in Lean the same way a brute-force case analysis does.</p><p>Hong does not have a solution to this. Neither does anyone else. What Axiom has is a hypothesis about where the solution might come from. The hypothesis is that Ken Ono and other high-taste mathematicians, working alongside the system, can provide training signals that a pure self-play loop cannot. 
A conjecturer model trained on what Ono finds interesting, rather than on what the prover finds provable, might inherit a weak version of his taste. It would not match him. But it might, over time and across millions of examples, learn something about which mathematical patterns are worth pursuing.</p><p>This is the bet implicit in Ono&#8217;s job title. The Founding Mathematician is not there to do mathematics in the conventional sense. He is there to be the human distribution from which the machine&#8217;s sense of beauty is, eventually, distilled.</p><h3>What Beauty Reveals</h3><p>There is a reason this question matters beyond Axiom.</p><p>The AI-for-math field has spent most of its recent history focused on proving given theorems, and within that subfield, focused on competition-style theorems where the elegance filter is provided by the competition designers. This has produced striking results. It has also produced a narrow conception of what mathematical reasoning looks like. The proofs that win IMO gold and Putnam medals are sharp but conventional. The mathematics that matters, over decades, is almost always the mathematics that first looked like a conjecture from an odd angle, followed by a long effort to prove it. The Riemann Hypothesis was an aside in a paper about prime numbers. Fermat&#8217;s Last Theorem was a comment scribbled in a book margin. Both reshaped their fields because they turned out to connect structures that nobody had connected before.</p><p>An AI that can prove anything you put in front of it, but cannot propose anything worth proving, is not yet doing the central work of mathematics. And an AI that can propose trivially true conjectures efficiently is further from the goal than people often assume. The distance between &#8220;provable&#8221; and &#8220;important&#8221; is the entire domain.</p><p>This is why Erd&#337;s&#8217;s Book is not a curiosity but a research problem. 
To build an AI mathematician that produces The-Book-class proofs is to solve a problem that includes, as a special case, nearly every remaining question about machine creativity. The proofs in The Book are short where short proofs are surprising, long where length illuminates, and they connect things. They are the output of taste applied over decades of training. Whether that taste is a pre-training phenomenon, a post-training phenomenon, a specific architectural feature, or something that requires a kind of engagement with the mathematical community that current systems cannot participate in, is a question nobody in the field can yet answer.</p><p>Hong&#8217;s hypothesis is that it is largely a data problem, and that Axiom&#8217;s combination of formal verification, conjecturing systems, and Ken Ono&#8217;s taste can, over time, generate enough labeled examples of &#8220;interesting&#8221; versus &#8220;uninteresting&#8221; conjectures that a model can learn the distinction. This is a credible hypothesis. It may also be wrong.</p><h3>The Diagram and the Thousands of Lines</h3><p>At the opening of this piece, Evan Chen drew a single diagram that made a Putnam problem resolve itself, while AxiomProver produced a proof of thousands of lines that verified the same result without ever seeing the geometric picture. Both proofs were correct. Both earned full marks.</p><p>One of them was from The Book. The other was not.</p><p>The question Axiom is quietly, carefully, and without hyperbole trying to answer is whether taste, so far a uniquely human capacity, can become a machine capacity. If it can, the consequences extend beyond mathematics into every domain where verification is cheap and taste is not. 
If it cannot, then the AI industry will have built an enormous machinery for proving things while remaining structurally unable to say which things are worth proving, which is roughly the state the field has been in for its entire history, now accelerated.</p><p>Ken Ono keeps in his office a letter. It arrived on April 7, 1984, in a rice-paper envelope from India, addressed to his father, the mathematician Takashi Ono, who was then teaching at Johns Hopkins. The sender was Janaki Ammal, the widow of Srinivasa Ramanujan. The letter was a thank-you note: Takashi had contributed, along with dozens of other mathematicians worldwide, to fund a bronze bust of Ramanujan for his widow, a memorial the Indian government had long promised but failed to build. Ken was sixteen at the time, a troubled high school student trying to convince his parents to let him drop out. He would later write that he had never seen his ordinarily stoic father so visibly moved.</p><p>The letter now hangs framed above his desk. It is a reminder, of a kind that no training dataset can easily encode, of what kind of mathematics Ken Ono believes is worth doing, and why. The letter does not itself provide a reward signal. But it hints at the shape of one. The mathematics that is worth doing is the mathematics whose story, like Ramanujan&#8217;s, reaches through a century and breaks the composure of a stoic mathematician opening a rice-paper envelope in suburban Baltimore.</p><div><hr></div><p><em>Next: Part 3 tells the story of Carina Hong herself, from a Guangzhou classroom where a child doodled &#8220;MIT&#8221; on a math scratch sheet to a twenty-five-year-old CEO who raised $1.6 billion on a bet she describes as binary: either her company succeeds completely, or it fails completely, and there is no middle.</em></p>]]></content:encoded></item></channel></rss>