The Gap

What “deployed” actually means. The chasm between demo videos and commercial reality.

Feb 10, 2026

Chapter 10 of A Brief History of Embodied Intelligence

“It works. But ‘works’ and ‘works at scale’ are different sentences.” — Agility Robotics engineer, 2024

The warehouse in Flowery Branch, Georgia, was the size of several football fields, a concrete cathedral of commerce humming with the din of conveyor belts, the beeping of forklifts backing up, and the squeak of sneakers on polished floors. Three hundred human workers moved through its aisles, picking, packing, and shipping apparel for the women’s brand Spanx. The air smelled of cardboard and shrink-wrap.

In one corner, behind a cordon of yellow safety tape, something different was moving.

A bipedal robot named Digit, five feet nine inches tall, headless, with backward-bending knees that gave it the look of a large flightless bird, lifted a plastic tote from an autonomous mobile robot, carried it three meters, and placed it on a conveyor belt. It did this again. And again. And again. No human hand guided it. No operator sat behind a screen. The robot was working.

Peggy Johnson, CEO of Agility Robotics, had been waiting years for this moment. A former executive vice president at Microsoft, Johnson had joined Agility in 2024 with a mandate to take the company’s research robot and turn it into a commercial product. She had spent her career bringing technology to market: not inventing it, but shipping it. And in June 2024, she shipped Digit to GXO Logistics.

“The only metric that matters,” Johnson said, “is delivering value to our customers by putting Digit to work.”

Agility called it “the world’s first commercial humanoid deployment.” TIME Magazine named Digit one of the Top 200 Inventions of 2024. Press releases described a monumental step toward a future where humanoid robots are indispensable to industry worldwide.

All of that was true. But walk past the cordon, and the picture becomes more complicated.

Digit handled one type of tote, a specific size and weight, and placed it in one type of location along a pre-mapped path. The three hundred human workers on the other side of the yellow tape did everything else: sorting, quality-checking, packing fragile items, dealing with the thousand small surprises that fill a workday. Digit was doing the job of perhaps two of them.

In November 2025, Agility announced that Digit had moved over 100,000 totes, proof of sustained, reliable throughput in a commercial environment rather than a staged demo. But 100,000 totes over roughly eighteen months was about what a single experienced warehouse worker might move in a few weeks. The gap between “first deployment” and “commercially competitive” remained vast.
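For readers who want to check that comparison, here is a rough back-of-envelope sketch in Python. The 100,000 totes and the roughly eighteen-month window come from the text above. The working-day count and the human handling rate are illustrative assumptions, not figures from Agility or GXO, so treat the output as an order-of-magnitude check only.

```python
# Back-of-envelope throughput comparison. Only TOTES_MOVED and MONTHS come
# from the chapter; every other number is an illustrative assumption.
TOTES_MOVED = 100_000          # reported by Agility, November 2025
MONTHS = 18                    # June 2024 through late 2025, roughly
WORKING_DAYS_PER_MONTH = 21    # assumption: five-day weeks

# Assumption: an experienced worker handles one tote every 6 seconds
# for 7 productive hours per shift.
HUMAN_SECONDS_PER_TOTE = 6
HUMAN_PRODUCTIVE_HOURS = 7


def digit_totes_per_day() -> float:
    """Average totes per working day over the whole deployment."""
    return TOTES_MOVED / (MONTHS * WORKING_DAYS_PER_MONTH)


def human_days_to_match() -> float:
    """Working days a single worker would need to move the same total."""
    per_shift = HUMAN_PRODUCTIVE_HOURS * 3600 / HUMAN_SECONDS_PER_TOTE
    return TOTES_MOVED / per_shift


if __name__ == "__main__":
    print(f"Digit average: {digit_totes_per_day():.0f} totes per working day")
    print(f"One worker at the assumed rate: {human_days_to_match():.1f} working days")
```

Under these assumptions the deployment averages a few hundred totes per working day, and one worker would need about five working weeks to match the total. Change the assumed human rate and the answer moves, but it stays in the range of weeks, not years.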

This is the state of the robot revolution in early 2026: real enough to invest in, too limited to depend on.

Fifty Companies, Ten Pilots, Zero Profits

The previous chapters told the story of acceleration. Foundation models gave robots common sense. Startups raised billions. Tesla threw its manufacturing machine at the problem. China deployed its cost advantage. By the end of 2023, more than fifty companies worldwide had declared humanoid robot ambitions. Capital was flowing. Demos were dazzling. The future seemed close.

Now it’s time to tell the honest story of what happened next.

By mid-2025, of those fifty-plus companies, fewer than ten had placed a humanoid robot into a real commercial environment where it performed useful work without constant human supervision. Fewer still had repeat customers. None had reached profitability from robot deployments.

The numbers were real but small. Agility had its Digit fleet at GXO, moving totes. Figure AI had completed an eleven-month deployment at BMW’s Spartanburg plant, where Figure 02 robots ran ten-hour shifts loading sheet metal parts into welding fixtures. In total: 1,250 operational hours, 90,000 parts loaded, contributing to the production of 30,000 BMW X3 vehicles. UBTECH in China had reached a genuine milestone: 1,000 Walker S2 units manufactured at its Liuzhou plant, with over 500 delivered to customers and 800 million yuan in orders.

These were not lab experiments. They were not promotional videos. They were robots doing work that someone was willing to pay for. That distinction mattered enormously. In an industry saturated with demos, any deployment that generated actual revenue was significant.

But in every case, when you looked past the press release, you found the same constraints.

At GXO, Digit worked in isolation: no humans in its zone, one task type, pre-mapped paths. At BMW, Figure 02 performed a single repetitive task: picking sheet metal from racks and placing it on fixtures. Figure itself described this as “a classic pick-and-place task.” The robot met its cycle time targets, each load completed within eighty-four seconds, and achieved a 99 percent success rate per shift in accurate placement. Impressive reliability. But the task itself was something a $50,000 industrial robot arm could also do, and had been doing for decades. The humanoid form factor wasn’t necessary for this particular job. It was being tested to see whether it could be sufficient for this job and, eventually, flexible enough to do others.

And at UBTECH, the volume numbers were striking. A thousand units was a genuine manufacturing milestone. But skepticism persisted. When UBTECH released footage of hundreds of Walker S2 robots moving in synchronized formation inside a warehouse, Figure CEO Brett Adcock publicly questioned whether the video was computer-generated. UBTECH responded with unedited drone footage, and the debate became a microcosm of the industry’s credibility problem: in a field where marketing routinely outpaces engineering, even real achievements were met with suspicion.

The Demo Problem

The credibility problem had a name in the industry, though no one wanted to say it too loudly: the demo problem.

Demo videos had become the currency of the humanoid robot world. A robot folds a shirt. A robot cooks an egg. A robot dances to music, plays drums, pours coffee. Each video went viral, each funding round referenced the latest capabilities shown on camera. Investors watched demos. Journalists wrote about demos. The public formed its expectations from demos.

But demos and deployment are different things.

A demo lasts minutes. A deployment lasts months. A demo can be repeated until it works. A deployment has to work the first time, every time, ten hours a day. A demo can be performed in a controlled lab with perfect lighting, known objects, and a team of engineers standing just off-camera. A deployment happens in a warehouse where forklifts roar past, lighting changes by the hour, and objects arrive in positions no one anticipated.

The gap between these two realities was enormous, and the industry knew it.

Consider Figure’s BMW deployment. After eleven months on the factory floor at Spartanburg, Figure published something unusual in the robotics industry: a candid post-mortem on what had gone wrong.

The robot’s forearm, it turned out, was the top hardware failure point. The problem was mundane and devastating in the way that real engineering problems always are: a microcontroller board inside the forearm struggled with thermal constraints in the tight packaging. Wrist actuators communicated through a distribution board and dynamic cabling that proved unreliable over thousands of cycles. In a demo, you’d never notice. The robot would pick up the sheet metal, place it beautifully, and the camera would cut. In a factory running ten-hour shifts, five days a week, the wrist cabling degraded. Connections loosened. The board overheated.

“Six months of daily runtime yielded invaluable insights for our mechanical and reliability teams,” Figure wrote, a sentence that was corporate understatement for “we discovered problems that no amount of lab testing would have revealed.” For Figure 03, the company completely redesigned the wrist electronics, eliminating both the distribution board and the cabling. The redesign was possible only because of the 1,250 hours of failures that preceded it.

None of this showed up in demo videos. The difference between a robot that loads sheet metal perfectly for a two-minute clip and a robot that loads sheet metal reliably for 1,250 hours is an ocean of engineering. Every cable that breaks after ten thousand cycles, every sensor that drifts with temperature, every motor that overheats under sustained load: these are the realities that separate demonstrations from deployments.

“There is no scale without safety,” Agility Robotics wrote in its milestone announcement. The sentence sounded like a platitude. It was actually a confession. Every hour of reliable operation had been earned through failures that no press release mentioned.

Do Robots Actually Need Legs?

In January 2025, Sanctuary AI released the eighth generation of its Phoenix robot. The announcement included a detail that, in most industries, would be unremarkable. In the humanoid robot industry, it was a bombshell.

Phoenix Gen 8 had wheels.

Not legs. Wheels.

Sanctuary’s previous generations had been designed with bipedal locomotion in mind. Phoenix was, after all, supposed to be a humanoid robot, human-shaped, human-sized, designed for human environments. But Gen 8 rolled on a wheeled base instead. The company explained, with notable understatement, that this was “a design decision informed by customer feedback that bipedal legs are too frail to support a strong torso needed to carry out useful work precisely and safely.”

Customer feedback. The customers, the people who would actually buy and deploy these robots, had looked at bipedal walking and said: no thanks.

Geordie Rose, Sanctuary’s co-founder and then-CEO, had addressed this question earlier, in an interview with IEEE Spectrum. “Value as a worker is not primarily defined by legs,” he argued. Sanctuary’s philosophy was that hands and intelligence were the valuable parts. Legs were, for now, a solved problem you could outsource. Wheels worked fine for most factory floors. When someone eventually perfected a walking algorithm, Sanctuary could add it later. The important thing was to get useful robots into the field now.

By the time Gen 8 shipped, Rose was no longer CEO. In November 2024, Sanctuary’s board had replaced him with James Wells, the company’s chief commercial officer. The reasons were not publicly detailed, but the timing was revealing: the company that had been founded on the vision of “the world’s first human-like intelligence in general-purpose robots” was now led by a commercial executive, shipping a robot on wheels. The visionary had been replaced by the pragmatist. It was a small story about one company, but it was also the story of the entire industry in miniature.

This was more than one company’s product decision. It was a data point in the most interesting design debate of 2025: does a humanoid robot actually need to be humanoid?

The argument for legs was intuitive. Human environments are built for humans. There are stairs, curbs, uneven surfaces, narrow passages, thresholds between rooms. A wheeled robot can’t climb stairs. It can’t step over debris on a construction site. It can’t navigate the cluttered floor of a home. If robots were going to work in the full range of human spaces, they needed human-like mobility.

The argument against legs was practical. Bipedal walking is extraordinarily difficult to do reliably, as Chapter 2 explored at length. It adds cost, complexity, weight, and points of failure. Most near-term commercial environments (warehouses, factories, distribution centers) have flat, smooth floors. In these settings, legs were not an engineering requirement. They were overhead.

Bain & Company’s 2025 analysis put it directly: “The most promising short-term value lies in hybrids, human-like perception with wheeled platforms.” The pragmatic middle ground was emerging: a torso with dexterous arms and hands, mounted on a wheeled base. Cheaper, more stable, adequate for eighty percent of warehouse and factory tasks. The other twenty percent (stairs, construction sites, disaster zones) could wait.

Amazon’s milestone, reached in July 2025, was perhaps the most illuminating data point in the entire debate, though it got less attention than any humanoid demo video.

The company announced its one millionth deployed robot. One million. It was, by any measure, the most successful deployment of autonomous machines in commercial history. These robots moved shelves, sorted packages, transported goods across warehouses that spanned millions of square feet. They had transformed Amazon’s logistics from a labor-intensive operation into a hybrid human-machine system of extraordinary efficiency.

And not a single one of those million robots had legs. Not a single one was humanoid. They were squat wheeled platforms, robotic arms on rails, autonomous carts, purpose-built machines that did specific jobs superbly. The contrast with the humanoid industry was stark: fifty humanoid companies, fewer than ten at pilot scale, perhaps a few thousand units deployed worldwide. Versus one company, one million non-humanoid robots, generating billions in operational value. If you wanted to know what “proven” looked like in robotics, it looked like wheels and rails, not bipedal walking.

The companies betting on legs (Agility, Figure, Tesla, Boston Dynamics) had a response. The warehouse of today might not need legs, but the warehouse of tomorrow would have stairs, mixed levels, and shared spaces with humans. The home needed legs. The construction site needed legs. The military needed legs. The long-term market was vastly larger for a robot that could go anywhere a human could.

This was probably true. But in business, the long term has a way of never arriving for companies that can’t survive the short term.

What Actually Works

Amid the hype and the debates, a pragmatic picture was emerging of where robots could actually deliver value. The picture was narrower than the press releases suggested, but it wasn’t nothing. And it had a pattern.

The clearest success was warehouse logistics: moving totes, handling packages, loading and unloading. Agility’s Digit was the proof point, but the reason went beyond any single robot. A warehouse floor is flat. The objects are standardized. The paths can be mapped. The perception-decision-action loop that the previous chapters traced (the robot sees, decides, and acts) is narrowed to almost nothing: see tote, pick up tote, carry tote, place tote. When you made the loop narrow enough, even current AI could close it reliably, thousands of times.

Automotive assembly was next. At BMW’s Spartanburg plant, Figure 02 loaded sheet metal. At Chinese automakers including BYD, Geely, and FAW Volkswagen, UBTECH’s Walker S2 tended machines. At Mercedes-Benz, Apptronik’s Apollo was being tested. What these factories shared was structure: known layouts, predictable objects, clear success metrics, and tasks that were ergonomically unpleasant for humans. The robot didn’t need to improvise. It needed to repeat.

Factory inspection was emerging as a quieter success, and an instructive one. Robots that walked or rolled through facilities to check equipment, read gauges, and monitor conditions needed perception but no physical interaction. No grasping, no manipulation, no contact. This made the task dramatically easier, and several companies were deploying inspection robots without much fanfare, precisely because the work was unglamorous.

There was an irony here worth noting. The most commercially successful robot in all of healthcare, the da Vinci surgical system, which had generated billions in revenue and was used in hospitals worldwide, was not humanoid at all. It was a set of specialized arms controlled by a surgeon. The most successful warehouse robots were wheeled platforms. The most promising industrial robots were doing one task each. The pattern was clear: success in robotics came not from generality but from specificity, not from mimicking the human form but from solving the human’s problem.

And then there was everything else. Home. Outdoor spaces. Unstructured public environments. For these, the honest assessment was: essentially zero commercial deployment of humanoid robots, with no clear timeline for change. The frontier of the robot revolution was a Spanx warehouse in Georgia. There was something both inspiring and absurd about that.

“General-purpose” was what every company said they were building. “Task-flexible in structured environments” was what was actually being deployed. The gap between those two phrases was the story of the industry.

The Economics of Robot Labor

Understanding why the gap persisted required understanding the economics. And the economics were more nuanced than the simple narrative suggested.

The simple narrative: robots are expensive now, but costs will come down. When a robot costs less per hour than a human worker, adoption will explode. Every investor deck said some version of this. And it wasn’t wrong, exactly. But it missed the point.

Agility pioneered the Robots-as-a-Service model: don’t sell the hardware, sell labor-hours. A warehouse operator didn’t want to buy a $100,000 robot. They wanted to pay for totes moved, hours worked, tasks completed. RaaS converted a capital expense into an operating expense, making the math easier. But the math was still hard. Digit needed to charge for roughly fifteen minutes out of every operating hour. It could do one task. A human worker could switch roles mid-shift, solve problems, and didn’t need to plug in.
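Why the math stays hard is easier to see with numbers. The sketch below is a minimal illustration, not Agility’s pricing model: only the $100,000 price and the fifteen minutes of charging per hour come from the text, while the amortization period, scheduled hours, and support cost are assumptions picked to show the shape of the calculation.

```python
# Sketch of robot cost per productive hour. The $100,000 price and the
# 15-minutes-per-hour charging figure come from the chapter; the rest
# are assumptions chosen only to show the shape of the calculation.
ROBOT_PRICE_USD = 100_000
CHARGE_MINUTES_PER_HOUR = 15

AMORTIZATION_YEARS = 3             # assumption
SCHEDULED_HOURS_PER_YEAR = 4_000   # assumption: two shifts, 250 days
ANNUAL_SUPPORT_USD = 20_000        # assumption: maintenance and support


def availability() -> float:
    """Fraction of each scheduled hour the robot is actually working."""
    return (60 - CHARGE_MINUTES_PER_HOUR) / 60


def cost_per_productive_hour() -> float:
    """Amortized hardware plus support, divided by hours of real work."""
    annual_cost = ROBOT_PRICE_USD / AMORTIZATION_YEARS + ANNUAL_SUPPORT_USD
    productive_hours = SCHEDULED_HOURS_PER_YEAR * availability()
    return annual_cost / productive_hours


if __name__ == "__main__":
    print(f"Availability: {availability():.0%}")
    print(f"Cost per productive hour: ${cost_per_productive_hour():.2f}")
```

The point is less the output than its sensitivity. Charging time cuts productive hours by a quarter before anything else goes wrong, and the hourly figure only holds if the robot has useful work for every scheduled hour, which is hard when it can do a single task.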

The real comparison, though, wasn’t “robot versus human.” It was “robot versus empty station.”

In 2025, warehouse workers were increasingly scarce. Populations in developed countries were aging. Birth rates had fallen below replacement across most of Europe and East Asia. In the United States, warehouse turnover rates exceeded 100 percent annually in some regions, meaning the average worker left within a year. Companies weren’t choosing between a human and a robot. They were choosing between a robot and nobody.

“Available when humans aren’t.” That was the value proposition. Not cheaper. Not better. Present.

But even this framing had limits. A robot that could only move totes was competing not just with scarce human labor but with existing automation (conveyor systems, AMRs, robotic arms) that was often cheaper and more reliable for that specific task. The humanoid form factor was expensive precisely because it was designed for generality. But it could only justify that expense by performing multiple tasks in the same facility: tote moving today, shelf stocking tomorrow, quality inspection next week.

No company had demonstrated this multi-task flexibility in a commercial environment. Not yet.

The Convergence

Despite the gap, there was a pattern forming that the skeptics tended to miss.

In December 2025, Agility announced that Digit would be deployed at Mercado Libre, Latin America’s largest e-commerce company. The task was familiar: tote handling in a fulfillment warehouse. But the significance was in the expansion: a second major customer, a second continent, the same type of work.

“We started with one task at one site,” Daniel Diez, Agility’s chief business officer, said of the Mercado Libre deal. The ambition was clear: prove reliability in narrow tasks first, then expand the task set gradually.

Look at the deployments that actually existed (Digit at GXO, Figure 02 at BMW, UBTECH at Chinese automakers, Apptronik’s Apollo being tested at Mercedes-Benz) and you noticed this same convergence. Semi-structured industrial environments. Repetitive physical tasks. Cordoned safety zones. Single-task deployments with ambitions for multi-task flexibility. The phrase being used internally was “one body, many tasks.” But what was actually happening was “one body, one task, with the option of maybe adding a second task next quarter.”

This was less exciting than the vision of a general-purpose robot that could do anything a human could do. But it was honest. And it mapped onto how previous automation technologies had succeeded. Industrial robot arms started with spot welding and gradually expanded to painting, assembly, inspection. Autonomous mobile robots started with one function before expanding to multiple warehouse workflows. The pattern was: prove one thing works, then add more.

There was a deeper reason this pattern held, and it connected to the central argument of this book. The perception-decision-action loop, the cycle of seeing, choosing, and acting that every chapter has traced in different form, is what determines whether a robot can do a task. The deployments that worked in 2025 were the ones where this loop was narrowest: see tote, decide to pick up tote, pick up tote. The perception was simple (one type of object). The decision was simple (one type of action). The action was simple (one type of movement). Current AI could close that loop reliably. Widen the loop even slightly (add a second object type, an unexpected obstacle, a judgment call) and reliability dropped. The gap between “one narrow task” and “two narrow tasks” was far larger than it appeared.
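One way to see why a slightly wider loop hurts so much is to treat each stage as having its own success rate and multiply them. The sketch below does that in Python. It assumes the stages fail independently, which is a simplification, and the per-stage rates are made-up illustrative values rather than measurements from any deployment described here.

```python
import math

# Illustrative per-stage success rates. These are assumptions, not
# measurements from any deployment described in the chapter.
NARROW_LOOP = {"perceive": 0.999, "decide": 0.999, "act": 0.999}
WIDER_LOOP = {"perceive": 0.99, "decide": 0.99, "act": 0.99}


def loop_success(stages: dict[str, float]) -> float:
    """Chance one see-decide-act cycle succeeds, assuming independent stages."""
    return math.prod(stages.values())


def expected_failures(stages: dict[str, float], cycles: int) -> float:
    """Expected number of failed cycles out of `cycles` attempts."""
    return (1 - loop_success(stages)) * cycles


if __name__ == "__main__":
    for name, stages in (("narrow", NARROW_LOOP), ("wider", WIDER_LOOP)):
        print(name, round(loop_success(stages), 4),
              round(expected_failures(stages, 1000), 1))
```

With these assumed rates, losing less than one percentage point per stage takes the expected failures per thousand cycles from about three to about thirty. In a facility where every failure stops a line or calls for a human, that tenfold difference is the gap between a deployment and a demo.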

What made the current moment different from previous automation waves was the foundation model revolution. As Chapter 11 will explore, the same scaling dynamics that had transformed language AI were beginning to appear in physical AI. Foundation models promised to widen the perception-decision-action loop, to let robots perceive more, decide better, and act more flexibly. The “one task” robots of 2025 were deployed on hardware that was increasingly capable. The limitation wasn’t the body. It was the brain. And the brain was improving on a curve that looked, for the first time, like it might be exponential.

The gap was real. But it was narrowing on a specific vector: semi-structured industrial tasks, foundation model-powered flexibility, RaaS economics. The question was whether it would narrow fast enough.

Watching the Right Numbers

For anyone trying to assess whether this revolution was real or just another cycle of hype and disappointment, there was a simple test: ignore the demos, watch the deployments.

Specifically, watch three numbers.

First, deployment hours. Not units shipped, not production announcements, not demo videos. Actual hours of autonomous operation in commercial environments. Agility’s 100,000 totes at GXO and Figure’s 1,250 hours at BMW were the only independently verifiable numbers in the industry as of late 2025. When those numbers started appearing from more companies, at more sites, the revolution would be real.

Second, repeat customers. A pilot is a test. A second deployment at the same customer is validation. When GXO expanded Digit to additional facilities, or when BMW ordered Figure 03 after retiring Figure 02, that meant the technology was delivering enough value to justify continuation. First-time pilots were common. Repeat deployments were rare. Watch for the ratio to shift.

Third, task count per site. The “general-purpose” promise would be tested not by how many robots shipped, but by how many different tasks a single robot performed in a single facility. One task was a specialized machine. Two tasks were promising. Three or more tasks at the same site, autonomously, would be a genuine inflection point.
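For anyone who wants to track these three numbers in a spreadsheet or script, the sketch below shows one possible shape for the record. The class and function names are hypothetical, and the sample entry is invented for illustration. Only the thresholds (one task, two tasks, three or more) come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class DeploymentRecord:
    """One robot deployment at one customer site (hypothetical schema)."""
    company: str
    autonomous_hours: float   # hours of autonomous commercial operation
    repeat_customer: bool     # True if this is a follow-on deployment
    tasks_at_site: int        # distinct tasks one robot performs at the site


def task_count_reading(tasks_at_site: int) -> str:
    """Map task count to the chapter's reading of what it signifies."""
    if tasks_at_site <= 0:
        return "no deployment"
    if tasks_at_site == 1:
        return "specialized machine"
    if tasks_at_site == 2:
        return "promising"
    return "inflection point"


if __name__ == "__main__":
    # Invented example entry, not a real company or real data.
    record = DeploymentRecord("ExampleCo", 800.0, False, 1)
    print(record.company, task_count_reading(record.tasks_at_site))
```

A fuller version would also record who verified the hours, since the whole test depends on counting operation that someone outside the company can confirm.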

By these measures, the industry in early 2026 was at the beginning of the beginning. Real, but early. Promising, but unproven at scale. The kind of moment that, looking back in a decade, would either be remembered as the false dawn before another winter, or as the first chapter of the most transformative deployment of machines in human history.

The demos couldn’t tell you which. Only the deployments would.

Notes & Further Reading

On Agility Robotics’ GXO deployment: Agility’s blog posts from June 2024 (“Digit Deployed at GXO”) and November 2025 (“Digit Moves Over 100,000 Totes”) provide the company’s account. GXO’s press releases and Supply & Demand Chain Executive’s 2024 award coverage offer the customer perspective. Robot Report’s 2025 RBR50 profile offers independent assessment.

On Figure AI’s BMW deployment: Figure’s November 2025 post-mortem on the Figure 02 retirement is unusually candid about hardware lessons learned. BMW’s August 2024 press release and Manufacturing Dive’s coverage provide the manufacturer’s perspective. Fortune’s reporting on discrepancies between Figure’s claims and BMW’s actual deployment scope offers important context.

On UBTECH’s Walker S2 production: UBTECH’s November 2025 press release and December 2025 1,000-unit milestone announcement detail the production numbers. The controversy over video authenticity, including Brett Adcock’s CGI accusations and UBTECH’s response, is documented across multiple technology publications.

On Sanctuary AI’s wheeled pivot: Sanctuary’s January 2025 Gen 8 announcement explains the design decision. IEEE Spectrum’s extended interview with Geordie Rose from 2023 provides deeper context on the company’s philosophy regarding legs versus intelligence.

On the form factor debate: Bain & Company’s 2025 analysis of humanoid robot markets articulates the hybrid case. IEEE Spectrum’s October 2025 piece “Why Humanoid Robots Aren’t Scaling” presents the strongest skeptical argument. Agility’s counter-arguments appear in various conference presentations and blog posts.

On RaaS economics and labor shortages: The Robots-as-a-Service model and warehouse labor economics are analyzed in McKinsey’s 2025 automation reports and various logistics industry publications. Amazon’s one-millionth robot milestone, announced July 2025, provides scale context for existing non-humanoid automation. U.S. Bureau of Labor Statistics data on warehouse turnover rates documents the labor shortage driving adoption.

On Agility’s expansion: Agility’s December 2025 announcement of the Mercado Libre deployment, and the company’s OSHA-recognized NRTL approval for Digit, mark the expansion from single-customer pilot to multi-customer operations.

On the da Vinci surgical system: Intuitive Surgical’s financial reports document the most commercially successful robot platform in history, a useful counterpoint to the humanoid form factor debate.

Robonaissance