Roads to a Universal World Model, Part 2: The Physicist’s Road
The physics engine path: building the world from first principles
“All models are wrong, but some are useful.” — George Box
In 1929, a twenty-five-year-old from Binghamton, New York, built the world’s first flight simulator out of organ parts.
Edwin Link had gotten his pilot’s license two years earlier, but he was frustrated by the cost and danger of learning to fly in the air. So he did something strange. He took the pneumatic bellows from his father’s piano and organ factory, mounted them beneath a stubby wooden fuselage with short wings, and connected them to a control stick and rudder pedals. When the pilot pushed the stick forward, the bellows inflated and the cockpit tilted; pull back, and it rose. The whole thing sat on a universal joint, rotating through pitch, roll, and yaw. Inside the cockpit, vacuum-driven instruments read airspeed, altitude, and heading, all fabricated from organ valves and compressed air.
The U.S. Army Air Corps ignored him for five years. Flight instructors dismissed the contraption as a toy. Then, in 1934, the Army was ordered to carry the U.S. air mail. Pilots who had trained in clear weather suddenly had to fly in storms and fog. Within weeks, nearly a dozen were dead. The Army remembered the young man from Binghamton and his peculiar machine. They bought six Link Trainers. By the end of World War II, over 500,000 Allied pilots had learned to fly in them.
The Link Trainer was not a world model. It was something more specific: a physics engine. It did not learn the laws of aerodynamics from data. Someone encoded those laws into the machine’s mechanical linkages, its bellows and valves and gyroscopes, so that the instrument readings would track what a real cockpit would show for a given set of control inputs. The world inside the blue box was not discovered. It was constructed.
This is the Physicist’s Road: if you know the equations governing a system, you can build a virtual copy of it and train inside that copy instead of risking the real thing. The approach is as old as engineering itself. What has changed is the scale, the fidelity, and the ambition.
From Bellows to Billions of Polygons
For decades after the Link Trainer, simulation followed a simple trajectory: more compute, more realism. Analog bellows gave way to hydraulics. Hydraulics gave way to digital computers. By the 1980s, military flight simulators cost millions and filled entire rooms, with wrap-around displays, six-degree-of-freedom motion platforms, and computational models of atmospheric physics sophisticated enough to train pilots for combat maneuvers they had never performed in real aircraft.
The same idea spread to every domain where real-world training was expensive, dangerous, or both. Nuclear reactor operators practiced meltdown scenarios in software. Surgeons rehearsed procedures on virtual patients. Automakers crash-tested digital prototypes before bending a single sheet of metal. In every case, the logic was the same: write down the physics, solve the equations, and let the computer produce a faithful replica of reality.
The key word is “write down.” This is what distinguishes the Physicist’s Road from the Dreamer’s Road. A dreamer’s world model is learned from experience. A physicist’s world model is authored from knowledge. You do not need to drop a thousand cups to predict that cups fall. You need Newton’s second law, the gravitational constant, and some information about the cup. The model is precise because the equations are precise. It generalizes perfectly within its domain because the laws of physics do not change between Tuesday and Wednesday.
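The authored-from-knowledge recipe fits in a few lines of code. The toy sketch below (an illustration, not any production simulator) predicts how long a cup takes to hit the floor using nothing but Newton’s second law and the cup’s starting height; no dropped-cup data is involved:

```python
# Toy first-principles prediction: how long does a cup take to fall?
# No training data -- just Newton's second law and the initial state.

G = 9.81  # gravitational acceleration, m/s^2

def simulate_fall(height_m: float, dt: float = 0.001) -> float:
    """Integrate a = -g forward in time; return seconds until impact."""
    y, v, t = height_m, 0.0, 0.0
    while y > 0:
        v -= G * dt   # Newton's second law with F = -mg: a = -g
        y += v * dt
        t += dt
    return t

# A cup nudged off a 0.75 m table hits the floor in about 0.39 s,
# matching the analytic answer sqrt(2h/g) ~= 0.391 s.
print(round(simulate_fall(0.75), 2))
```

The point of the toy is the contrast: a learned model would need many observed falls to make this prediction; the authored model needs only the equation and one measurement of the cup.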
By the 2010s, the games industry had pushed real-time physics simulation to remarkable levels of fidelity. Game engines like Unity and Unreal could render photorealistic environments and simulate rigid body collisions, cloth dynamics, and fluid behavior at interactive frame rates. Robotics researchers noticed. Why build expensive physical test rigs when a game engine could provide an infinite supply of training environments?
The idea was intoxicating. Train a robot in simulation, where data is free, time can be accelerated, and nothing breaks when the robot fails. Then transfer the trained policy to the real robot. Infinite practice at zero cost.
There was just one problem.
The Gap
The simulated world is not the real world.
This sounds obvious. But the depth of the discrepancy surprised almost everyone who tried to cross from simulation to reality. A robot arm trained in a simulated kitchen would fail in a real kitchen, not because it lacked skill, but because the simulated kitchen’s physics were subtly wrong. Friction coefficients were approximated. Contact dynamics between a gripper and an object were simplified. The way light hit surfaces and created shadows, the way cables sagged, the way a table wobbled slightly under pressure: none of these were modeled with perfect accuracy, because perfect accuracy would require solving equations that are either unknown or computationally intractable.
Roboticists called this the sim-to-real gap. The phrase concealed a deeper insight: the physicist’s world model is precise about the physics it includes and silent about the physics it omits. Every simulation is an act of selective attention. The engineer decides which forces matter and which can be ignored. The decision is always wrong at the margins. Gravity is easy. Friction is hard. Contact between deformable objects is harder. The interplay of temperature, humidity, and surface chemistry on the grip between a rubber fingertip and a ceramic mug is harder still.
For simple tasks, the gap was manageable. For complex manipulation, it was devastating.
The Domain Randomization Hack
The first instinct was to make simulations more accurate. Close the gap by modeling more physics, computing more interactions, solving more equations. This worked, to a point. But perfect fidelity is an asymptote. You can always add another decimal of precision, and there is always a phenomenon you have not modeled.
In the mid-2010s, researchers began trying the opposite approach. Instead of making the simulation more accurate, they made it deliberately inaccurate in a wide variety of ways. The technique, called domain randomization, rested on a counterintuitive idea: if you train a robot not in one simulation but in thousands of slightly different simulations, each with randomized friction, randomized lighting, randomized object masses, randomized delays, then the robot learns a policy robust enough to handle whatever the real world throws at it, because the real world is just one more variation it has never seen.
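The core of the idea is almost trivially simple in code. The sketch below is a hypothetical training loop, not any published system: the parameter names, ranges, and the `make_sim` and `policy` objects are all illustrative stand-ins. Each episode samples a fresh, deliberately "wrong" world, and the policy must succeed across all of them:

```python
import random

def randomized_sim_params() -> dict:
    """Sample one deliberately 'wrong' world per episode (illustrative ranges)."""
    return {
        "friction":     random.uniform(0.5, 1.5),    # scale on nominal friction
        "object_mass":  random.uniform(0.05, 0.30),  # kg
        "light_level":  random.uniform(0.2, 1.0),    # rendering brightness
        "action_delay": random.randint(0, 3),        # control steps of latency
    }

def train(policy, episodes: int):
    """Hypothetical loop: make_sim and policy are stand-ins, not a real API."""
    for _ in range(episodes):
        params = randomized_sim_params()   # every episode gets a different world
        env = make_sim(params)             # hypothetical simulator factory
        policy.update(env.run(policy))     # any RL update rule fits here
```

Note what is absent: no attempt to estimate the *true* friction or mass. The distribution of wrong worlds, not the accuracy of any one of them, is what carries the policy across the gap.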
The technique found its most spectacular demonstration in October 2019. OpenAI announced that a robotic hand called Dactyl had solved a Rubik’s Cube, one-handed, using a policy trained entirely in simulation. The hand had never touched a real Rubik’s Cube during training. It had practiced only inside a simulated world, but that world had been randomized along every axis the researchers could think of: the size of the cube, the mass, the friction of each finger, the strength of gravity, even the color and texture of the surfaces.
The system used a technique called Automatic Domain Randomization, which went beyond manual randomization. Each time the neural network mastered the current range of conditions, the simulator automatically widened the distribution, making the next round harder. The robot had to generalize to ever more extreme physics. By the time the randomization had pushed conditions far beyond anything physically plausible, the real world, with its mundane room-temperature friction and ordinary gravity, was comfortably inside the distribution. The researchers tested robustness by tying fingers together, putting a rubber glove on the hand, and poking the cube with a stuffed giraffe mid-solve. The hand kept going.
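The widening mechanism can be sketched for a single parameter. This is an illustrative reconstruction of the idea, not OpenAI’s implementation; the success threshold and step size are invented:

```python
import random

class ADRParameter:
    """Automatic Domain Randomization, sketched for one parameter.

    Start with a degenerate range at the nominal value; each time the
    policy masters the current range, widen it a little further.
    (Threshold and step size are illustrative, not OpenAI's values.)
    """
    def __init__(self, nominal: float, step: float = 0.05):
        self.low = self.high = nominal
        self.step = step

    def sample(self) -> float:
        """Draw this episode's value from the current range."""
        return random.uniform(self.low, self.high)

    def maybe_widen(self, success_rate: float, threshold: float = 0.9):
        """Widen the range only once the policy masters the current one."""
        if success_rate >= threshold:
            self.low -= self.step
            self.high += self.step

# Stand-in for real evaluation: pretend the policy masters every round,
# so the friction range grows from a point to a wide interval.
friction = ADRParameter(nominal=1.0)
for _ in range(20):
    friction.maybe_widen(success_rate=0.95)
```

Run long enough, the sampled conditions drift past anything physically plausible, which is exactly the point: the real world ends up well inside the training distribution.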
The success was celebrated as a triumph of simulation. It was. But the celebration obscured what domain randomization reveals about the Physicist’s Road. The technique works not by making the simulation accurate, but by making accuracy irrelevant. You flood the training process with so much variation that the policy learns to be robust rather than precise. The world model does not need to be right. It needs to be wrong in enough different ways that the real world becomes just another variation.
This is a clever hack. It is not a solution. Domain randomization scales to tasks where the physics can be parameterized: friction, mass, lighting. But many real-world phenomena resist parameterization. The way a wet cloth clings to a surface. The way a screw threads into a stripped hole. The way an egg cracks. For these, you cannot just randomize a coefficient, because the relevant physics is not represented in the simulator at all.
NVIDIA’s Bet
If anyone embodies the ambition of the Physicist’s Road at industrial scale, it is NVIDIA.
The company’s transformation is one of the most remarkable corporate pivots in technology history. Founded in 1993 to make graphics chips for video games, NVIDIA discovered in the 2010s that the parallel processing architecture designed to render millions of polygons per second was also ideal for training neural networks. GPUs became the hardware foundation of the deep learning revolution. But Jensen Huang, NVIDIA’s CEO, saw a further step. If GPUs could train AI, and if AI needed to understand physics, then the company that built the best physics simulators might own the infrastructure layer of the entire physical AI stack.
At GTC 2019, Huang introduced Omniverse, initially pitched as a collaborative 3D design platform built on Pixar’s Universal Scene Description format. By 2021, the vision had expanded. Omniverse was no longer just a design tool. It was, in Huang’s framing, a physics-based simulation operating system: a platform where companies could build digital twins of their factories, warehouses, and robots, then test and train AI systems inside those virtual replicas before deploying them in the real world. BMW built a digital twin of its assembly plant. Ericsson simulated 5G network coverage across entire cities. Siemens integrated its industrial automation tools directly into the platform.
The underlying physics engine, PhysX, handled rigid body dynamics, soft body interactions, and fluid simulation at GPU-accelerated speeds. Isaac Sim, built on Omniverse, provided a dedicated environment for training robot policies in simulation. Isaac Lab layered reinforcement learning frameworks on top. The stack was integrated, purpose-built, and unapologetically comprehensive.
By 2025, Huang had crystallized the vision into what he called the “three-computer” framework for physical AI. One computer, DGX, trains the AI models. A second, Omniverse running on RTX PRO servers, simulates and generates synthetic data. A third, Jetson AGX, runs the trained model on the robot itself. The three form a loop: train, simulate, deploy, collect data, retrain. Physical AI, in this framing, is not a software problem or a hardware problem. It is a systems problem, and NVIDIA intends to supply the entire system.
Then, in January 2025, Huang announced Cosmos.
Cosmos is NVIDIA’s family of open “world foundation models,” trained on twenty million hours of video depicting physical phenomena: driving, walking, manipulating, colliding, pouring, breaking. The models can generate video of plausible physical scenarios from text, image, or video prompts. They can also take the precise but visually sterile output of an Omniverse simulation and transform it into photorealistic synthetic data, bridging the perceptual gap between simulation and reality.
This is where the Physicist’s Road starts to merge with the Cinematographer’s Road, the subject of Part 3. Cosmos is not a physics engine. It does not solve equations. It generates video that looks like physics. But within NVIDIA’s pipeline, it serves a specific function: it takes the structurally accurate output of a physics simulation and wraps it in the visual diversity of the real world, so that robot perception systems trained on synthetic data transfer more effectively to physical environments. By early 2026, Cosmos models had been downloaded over two million times, and companies from Figure AI to Uber were using them to generate training data for robots and autonomous vehicles.
What the Physicist Built
The Physicist’s Road has produced real achievements. Flight simulators train virtually every commercial pilot in the world. Crash simulations have made cars dramatically safer. Digital twins of semiconductor fabrication plants optimize processes worth billions. Domain randomization has transferred dexterous manipulation from simulation to reality. And NVIDIA’s integrated stack has made physics simulation accessible to thousands of robotics developers who would never have built their own simulators from scratch.
But the road has a fundamental constraint, and it is the same constraint Ed Link faced in 1929. Someone has to write down the physics.
The Link Trainer worked because aerodynamics is well understood. Flight simulators work because the equations of flight are known. Car crash simulations work because the material properties of steel and aluminum have been characterized to extraordinary precision. Domain randomization works because the physics it randomizes, friction, mass, inertia, can be parameterized.
The constraint reveals itself at the boundary of the known. If you can write the equations, you can build the simulator. If you cannot write the equations, the simulator has a gap. And the real world is full of phenomena whose equations we do not have, or whose equations are too complex to solve, or whose relevant variables we have not identified. The feel of fabric. The behavior of granular materials. The dynamics of deformable objects in contact. The thousand small interactions that make a real kitchen different from a simulated one.
The Physicist’s Road gives you precision but not generality. You can simulate a world you understand. You cannot simulate a world you do not.
The Road Ahead
The frontier of the Physicist’s Road is not more accurate simulators. It is the fusion of simulation with learning.
NVIDIA’s Cosmos represents one version of this fusion: use physics engines for structure, then use learned models to fill the gaps. The sim-to-real gap that once seemed like a binary divide, simulation or reality, is becoming a spectrum. On one end, pure first-principles simulation: precise, interpretable, and limited to known physics. On the other end, pure learning from data: flexible, generalizable, and unable to guarantee physical consistency. The most interesting work is happening in between.
But a different community started from the other end of the spectrum. Instead of building virtual worlds from equations and then learning to fill the gaps, they trained neural networks on millions of hours of video and discovered that the networks learned physics on their own, implicitly, from pixels alone.
They did not start with the equations. They started with the camera.
Next: Part 3, “The Cinematographer’s Road,” traces the video generation path to world models, from early frame prediction to Sora and beyond, and asks: when a model trained on video learns to obey gravity, has it understood physics, or merely learned to imitate it?