The Beauty of Mathematics, Part 2: Proofs from The Book
Erdős believed God kept a book of the most beautiful proofs. The question Carina Hong’s company is quietly confronting is whether a machine can learn to read it.
This 3-part deep-dive series explores a four-hour interview with Carina Hong, hosted by Xiaojun Zhang in 2026. To better convey the depth and ideas of the conversation, I have reorganized the narrative, added background context, and clarified some of the more technical points.
On the morning of December 6, 2025, in a conference room named after Poincaré, Evan Chen looked at the first problem from that day’s Putnam exam and drew a figure on a piece of paper. Chen is a former coach of the U.S. International Mathematical Olympiad team. The other people in the room, most of them Axiom employees, looked at the drawing. They understood immediately how the problem would resolve. One diagram. Then they fed the same problem to AxiomProver.
The AI did not find the diagram. It produced a solution that ran to thousands of lines of formal Lean code: a brute-force, case-by-case, step-by-step construction that verified the result without ever seeing the geometric picture. Both answers were correct. Both would receive full marks. But the mathematics underneath the two proofs had almost nothing in common.
Paul Erdős, the Hungarian mathematician who authored more papers than any other person in the twentieth century, believed God kept a book. The Book contained the most beautiful proof of every theorem. Erdős never claimed to have seen The Book. He claimed only that when a proof was particularly elegant, it came “from The Book,” and when a proof was ugly, he sometimes wondered whether the result it established might in fact be false. This was not a religious statement. Erdős was a committed atheist. It was an aesthetic one: that among the infinity of valid proofs for a given theorem, some are structurally privileged. They reveal what the theorem is really about. They generalize. They connect. They surprise.
Axiom Math’s AxiomProver does not have a concept of The Book. Its proofs are valid, which is to say they type-check in Lean and establish the theorems they claim to establish. Many of them run to thousands of lines. Where a human Olympian would draw a single diagram, AxiomProver often produces exhaustive case analysis that no trained mathematician would write. Its proofs are correct. And they are, in the specific sense Erdős meant, ugly.
Whether this matters is the question at the center of Part 2 of this series. It is also, in Carina Hong’s telling, the question at the center of the gap between what Axiom has built and what Axiom is trying to build. An AI that can prove theorems, given the theorems, is a remarkable achievement but not yet a mathematician. A mathematician is someone who knows which theorems are worth proving, and among the proofs of a given theorem, knows which ones are worth reading. That second kind of judgment has a name in mathematical culture, old and respected: taste. And taste, so far, is not something the current paradigm has learned to produce.
The Brute-Force Artist
Hong introduces herself as a brute-force type. This is not false modesty, and it is not a pose. It is, in her telling, the formative fact of her mathematical adolescence.
At Chinese olympiad training camps as a teenager, Hong had a problem. The first question on every round of the national mathematics olympiad was a Euclidean geometry problem. It was regarded as the guaranteed question. If you wanted to place in the top 5 percent, you had to get it. Most competitive students solved these problems the way competition geometry is traditionally solved: by seeing a construction. You looked at the figure, noticed that a certain line would bisect a certain angle or that two triangles were similar, drew the auxiliary line, and the proof followed in a page.
Hong could not see the constructions. Her brain, she reports, is more comfortable with algebraic symbols than with geometric figures. Faced with a geometry problem, she fell back on what is called the complex number method: assign complex coordinates to every point, translate every geometric statement into algebraic identities among those coordinates, and grind through the algebra. The method is reliable. It is general. It is also deeply inefficient. Where a classmate might write a page using the inscribed angle theorem and similar triangles, Hong would write three pages of polynomial manipulation. Where they finished in ten minutes, she finished in forty, which in a timed competition meant she usually had to skip a later problem.
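The method can be made concrete. The sketch below, a numerical illustration of my own rather than anything from the interview, proves Thales’ theorem the grinding way: place every point on the unit circle, translate “perpendicular” into an algebraic condition on complex numbers, and check the algebra. The point placements and tolerance are assumptions for the demo.

```python
import cmath

# Thales' theorem by the complex-number method: put the diameter's
# endpoints at a and -a on the unit circle and a third point b on the
# same circle. The geometric claim "the angle at b is a right angle"
# becomes pure algebra: Re(u * conj(v)) == 0 for u = b - a, v = b + a.
def perpendicular(u: complex, v: complex, tol: float = 1e-9) -> bool:
    # two vectors are perpendicular iff their dot product Re(u * conj(v)) vanishes
    return abs((u * v.conjugate()).real) < tol

a = cmath.exp(0.7j)                      # one end of the diameter
for k in range(1, 12):
    b = cmath.exp(1j * (0.7 + 0.5 * k))  # b elsewhere on the unit circle
    assert perpendicular(b - a, b + a)   # the angle subtending the diameter is right
print("Thales holds at all sampled points")
```

The proof goes through without the solver ever “seeing” a circle: the inscribed-angle insight is replaced by symbol manipulation, which is exactly the trade Hong describes.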
“I could not see the underlying geometry,” she says. “I would solve the problem without ever understanding what it was about.”
This is the kind of self-description that could be easily flattened into a generic founder story about overcoming adversity. Hong resists that flattening. She reports the complex-number method not as a deficit she overcame but as a methodology she still uses, for a specific reason: it works. It produces correct answers. It is systematic. What it sacrifices is the insight into why the result holds, the feeling of having seen the structure, the thing that competition geometers mean when they say a solution is “natural.” The feeling has a function. It signals that the technique you just used might generalize. The grinding algebraic method produces no such signal. It just gives you the answer.
She was, in the sense the rest of this series will keep returning to, an early version of an AxiomProver in a human body. The algorithm worked. The beauty did not.
Why the Machines Proved Like Hong
In January 2024, Google DeepMind published a paper in Nature describing a system called AlphaGeometry. The system could solve competition-level Euclidean geometry problems at something approaching the level of an IMO gold medalist. On a benchmark of thirty IMO geometry problems from the past two decades, AlphaGeometry solved twenty-five. The successor system, AlphaGeometry 2, solves eighty-four percent of IMO geometry problems from 2000 to 2024.
The architecture of AlphaGeometry is worth describing because it tells a specific story about the space of possible solvers. The system has two components. A symbolic deduction engine maintains a database of facts about the geometric figure and applies deduction rules until either the conclusion is proved or no new facts can be derived. A neural language model, trained on synthetic geometry problems, suggests auxiliary constructions when the deduction engine gets stuck: a new point here, a new line there, a circle drawn through three specific vertices. The language model’s job is to propose the construct that will unlock deduction. The deduction engine’s job is to use the construct to reach the conclusion.
What makes this architecture notable, from the perspective of this series, is not what it does but what it does not do. It does not look at the figure. It does not see the geometric structure. The language model suggests constructions based on statistical patterns learned from synthetic training data, and the deduction engine grinds through algebraic consequences. The AI never sees the diagram in the sense that a human geometer does. It sees a list of predicates and proposes a list of additions. The proof that emerges is rigorous and often long.
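The alternation the architecture runs can be caricatured in a few lines. This is a toy sketch under my own assumptions, not DeepMind’s code: the “engine” applies a single parallel-transitivity rule to a fixpoint, and the proposer standing in for the neural model simply injects one auxiliary fact when deduction stalls.

```python
# Toy sketch of the AlphaGeometry-style loop: a symbolic engine deduces
# to a fixpoint; when it stalls short of the goal, a proposer (the
# neural model's role) adds an auxiliary construction and deduction resumes.
Fact = tuple  # e.g. ("parallel", "a", "b")

def parallel_trans(facts: set) -> set:
    # the lone deduction rule here: parallelism is transitive
    ps = [f for f in facts if f[0] == "parallel"]
    return {("parallel", x, v) for (_, x, y) in ps for (_, u, v) in ps
            if y == u and x != v}

def deduce(facts: set) -> set:
    # run the symbolic engine until no new facts can be derived
    facts = set(facts)
    while True:
        new = parallel_trans(facts) - facts
        if not new:
            return facts
        facts |= new

def solve(goal: Fact, facts: set, propose, max_constructions: int = 5) -> bool:
    for _ in range(max_constructions + 1):
        facts = deduce(facts)
        if goal in facts:
            return True
        facts = facts | propose(facts)  # an auxiliary construction unlocks deduction
    return False

# a ∥ b is given; the goal a ∥ c is unreachable until the proposer
# "constructs" the bridging fact b ∥ c.
given = {("parallel", "a", "b")}
goal = ("parallel", "a", "c")
print(solve(goal, given, propose=lambda _: {("parallel", "b", "c")}))  # True
```

Nothing in this loop looks at a figure, which is the point: the construction step is a guess that makes the grinding succeed, not an act of seeing.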
Hong saw AlphaGeometry’s approach and recognized it immediately. “Its philosophy is the same as my approach,” she says. “Convert the geometry into symbolic expressions. Solve it algebraically. The diagram never enters the reasoning.”
This is not a criticism of DeepMind. AlphaGeometry is a remarkable piece of engineering, and its approach is a reasonable one given the tools available. The point is deeper. The dominant paradigm in AI for geometry has converged on Hong’s adolescent method, and for the same reason. Constructions are hard. Symbolic grinding scales. If you cannot see the figure, algebra is the only thing left.
But competition geometry is not the whole of mathematics, and the gap between AlphaGeometry’s solutions and the solutions Erdős would have called beautiful is not a bug in DeepMind’s system. It is the frontier.
Competition Math Is Not Research Math
There is a structural distinction, well known inside mathematics and less well understood outside it, between the kind of mathematics tested in competitions and the kind of mathematics that happens in research. The distinction matters here because the entire recent wave of AI-for-math results, from AlphaGeometry to AlphaProof to AxiomProver, has been focused on competition mathematics. This is not an accident. Competition mathematics has a property that research mathematics lacks: the problems are given.
In a competition, someone else has selected a set of problems, verified that they have solutions within the ambient body of undergraduate mathematics, and calibrated their difficulty. The solver’s job is to find the solution. The problems are hard but bounded. A human or a machine that can produce correct proofs for Putnam or IMO problems has demonstrated something real about proof generation.
In research, nobody has selected the problems. The mathematician’s first and hardest task is to decide what to try to prove. Most mathematical questions, if phrased precisely, are either trivial or impossible. The interesting ones, the ones that produce the results we remember, are rare. A researcher’s primary output, over the course of a career, is not proofs. It is the sequence of questions they chose to attempt, and the small fraction of those questions that turned out to be both true and important.
Hong is explicit about this distinction. “We have built a very good prover,” she says. “The hard problem, the one we cannot yet do well, is the conjecturer.”
A conjecture, in the technical sense, is a precisely stated proposition that the person proposing it believes to be true but has not yet proved. The Riemann Hypothesis is a conjecture. So is Goldbach’s. Most conjectures never become famous, because most conjectures turn out to be true and unimportant, or false and unimportant, or impossible to resolve with current methods and therefore shelved. The conjectures that end up mattering are the ones that, if true, would reveal a structure: a connection between subjects that appeared unrelated, a pattern that extends to cases never examined, a reason why known results hold.
Producing such conjectures requires something that looks very much like aesthetic judgment. The mathematician has to pattern-match across a vast body of known results, notice a regularity, propose that the regularity extends, and only then attempt to prove the proposal. The first three steps are where mathematics mostly happens. The proof is the cleanup.
An AI that can do the cleanup is, for Axiom’s commercial purposes, already useful. An AI that can do the first three steps is, in Hong’s framing, something different. It is what the phrase “mathematical superintelligence” actually means, and it is why she spent her first six months of operation hunting for a specific kind of human hire.
The Conjecturer
The headline that made Hong briefly famous in Chinese tech media arrived in 2025, framed in the way headlines arrive: “Fifty-Seven-Year-Old Tenured American Professor Quits to Work for Twenty-Four-Year-Old Chinese Woman.” The professor was Ken Ono, an American mathematician born in Philadelphia in 1968. Until 2025 he held the Marvin Rosenblum Professorship at the University of Virginia and had served as Vice President of the American Mathematical Society. Ono has received Guggenheim, Packard, and Sloan Fellowships. He is one of the world’s leading authorities on the mathematics of Srinivasa Ramanujan, the Indian prodigy who died in 1920 at thirty-two after producing roughly four thousand theorems, most of which contemporary mathematicians are still unpacking.
Ono joined Axiom as what the company calls its Founding Mathematician. This is a deliberate title choice. Axiom has plenty of mathematicians on staff. It has one person who has spent his career, as Hong puts it, being “a high-volume conjecture machine.”
“He is a conjecturer,” Hong says. “He is very prolific. That is why we needed him specifically.”
The distinction between what Ono does and what other mathematicians do is worth naming carefully, because it is also the distinction between what current AI systems can do and what Axiom is trying to build. In Hong’s framing, there are two archetypes of mathematical intuition. There is the Ramanujan type, which she calls sharper and more convergent: the mathematician who stares at a formula and knows it is true, or knows what its generalization must be, without being able to explain how he knows. This is the intuition that writes down correct identities in a notebook in Madras and mails them to Hardy in Cambridge, dozens of results per letter, no derivations, no proofs. Ramanujan’s notebooks contain theorems that took the next century to confirm.
The other type, which Hong associates with Ono, is divergent. The Ono type is not primarily in the business of knowing what is true. He is in the business of connecting known facts from disparate subjects in ways that suggest new questions to ask. He notices that a result in one area of number theory has a structural echo in another, and proposes that both are instances of a larger pattern. He produces, as a main product, interesting questions.
“Ramanujan is the sharper kind of intuition,” Hong says. “Ono has the divergent kind. The way he generates conjectures is by connecting many different perspectives.”
This distinction is not neutral. It is the distinction between the capability Axiom can plausibly train and the capability that might require something the field does not yet know how to build. Ramanujan’s intuition was, as Hong and her team suspect, a phenomenon that would now be called a pre-training product. Ramanujan absorbed mathematics, particularly classical number theory, through a kind of total immersion. He read George Shoobridge Carr’s A Synopsis of Elementary Results in Pure and Applied Mathematics and worked through every theorem independently. His intuitions emerged from that saturation. The conjectures came out whole.
In December 2025, Ashish Vaswani, co-author of the original Transformer paper, released Essential AI’s first model. He named it Rnj-1, pronounced “range-one,” after Ramanujan. The choice was deliberate, and the thesis behind it was explicit: Essential AI is betting on pre-training. In an industry that has largely moved toward reinforcement learning as the dominant post-training objective, Vaswani has staked a contrarian position that intuition, the quality that lets a model produce the right continuation in a novel setting, is built during the pre-training phase through raw exposure to data at scale.
Axiom does not do pre-training. Axiom starts from open-source pre-trained models and does extensive post-training: supervised fine-tuning, reinforcement learning with formal verification as the reward signal, and the orchestrator-subagent system described in Part 1. This is a choice about where Axiom’s comparative advantage lies. It is also an acknowledgement that the Ramanujan-style intuition may not be something post-training can produce. The divergent conjecturing that Ono does, by contrast, looks more like a process Axiom can model: identify patterns across many training examples, propose extensions, test them.
But even the divergent version is hard. And it runs into the problem that has quietly stalked AI-for-math research since the field’s earliest days.
The Elegance Problem
Suppose you build a system that generates conjectures. Each conjecture is a precisely stated proposition. You want to use the system in a loop: generate a conjecture, try to prove it with your prover, and use the proof success as a reward signal for training better conjecturers. This is the structure of self-play proving, laid out in a 2024 paper by Kefan Dong and Tengyu Ma at Stanford, which Hong cites as a starting point for all serious work in the area.
The structure looks clean. In practice, it has a fatal problem. The prover’s reward signal is binary: the conjecture was proved, or it was not. If the conjecturer learns to optimize that signal, it will quickly discover that the easiest way to produce provable conjectures is to produce trivial ones. “The sum of two even numbers is even.” True. Provable. Useless. A self-play loop that optimizes for provability without filtering for importance produces an endless stream of true trivia, and nothing else.
The field’s term for the missing filter is elegance. The self-play paper proposed a rough proxy: measure the ratio of the proof length to the statement length. A “good” conjecture, under this proxy, is one whose statement is short but whose proof is long. The idea is that such conjectures are compressing something: a statement small enough to remember, a proof deep enough to be substantive.
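Under stated assumptions (whitespace tokens as a crude stand-in for whatever length measure the paper actually uses), the proxy is nearly a one-liner, which is part of why it is easy to propose and easy to doubt:

```python
def elegance_proxy(statement: str, proof: str) -> float:
    # length-ratio proxy: reward conjectures whose statement is short
    # relative to their proof — compression as a stand-in for depth
    return len(proof.split()) / max(len(statement.split()), 1)

# A trivial conjecture: short statement, near-trivial proof -> low score.
trivial = elegance_proxy(
    "the sum of two even numbers is even",
    "2a + 2b = 2(a + b)",
)
# A deep statement: short to state, long to prove -> high score.
deep = elegance_proxy(
    "every even number greater than two is the sum of two primes",
    "imagine here a proof running to hundreds of pages " * 50,
)
assert deep > trivial  # the proxy prefers the compressed, deep statement
```

The proxy does filter out the worst trivia, but it is gameable: a conjecturer can pad proofs or compress statements without producing anything a mathematician would call interesting.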
Hong is politely unimpressed with this proxy. “The elegance filter in that paper is based on length,” she says. “But the real sense of elegance is something much more subtle.”
What that something is, is the unsolved problem. Within mathematics, elegance has a cluster of partial markers. An elegant result connects previously unrelated domains. An elegant proof reveals why the theorem holds, not merely that it holds. An elegant theorem has non-obvious consequences. None of these markers are easily formalized as reward signals. They depend on what the field considers surprising, which depends on what has been proved before, which changes over time.
For a company whose core thesis is that verification is the missing capability of AI, the elegance problem is a specific embarrassment. Verification is exactly what formal methods do well. Elegance is not verifiable. There is no compiler that accepts elegant proofs and rejects ugly ones. A proof from The Book type-checks in Lean the same way a brute-force case analysis does.
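The point is easy to exhibit in Lean itself. The two proofs below are my own illustrative example (assuming Lean 4 core lemma names); they establish the same statement, the checker accepts both, and it has no vocabulary for preferring the first:

```lean
-- One statement, two verified proofs. Lean certifies correctness,
-- not elegance: both type-check identically.

-- The short proof: invoke the library lemma that says exactly this.
theorem zero_add_short (n : Nat) : 0 + n = n := Nat.zero_add n

-- The long way: grind it out by induction, case by case.
theorem zero_add_long (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```

Scale the second style up by four orders of magnitude and you have an AxiomProver transcript: every line justified, no line illuminating.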
Hong does not have a solution to this. Neither does anyone else. What Axiom has is a hypothesis about where the solution might come from. The hypothesis is that Ken Ono and other high-taste mathematicians, working alongside the system, can provide training signals that a pure self-play loop cannot. A conjecturer model trained on what Ono finds interesting, rather than on what the prover finds provable, might inherit a weak version of his taste. It would not match him. But it might, over time and across millions of examples, learn something about which mathematical patterns are worth pursuing.
This is the bet implicit in Ono’s job title. The Founding Mathematician is not there to do mathematics in the conventional sense. He is there to be the human distribution from which the machine’s sense of beauty is, eventually, distilled.
What Beauty Reveals
There is a reason this question matters beyond Axiom.
The AI-for-math field has spent most of its recent history focused on proving given theorems, and within that subfield, focused on competition-style theorems where the elegance filter is provided by the competition designers. This has produced striking results. It has also produced a narrow conception of what mathematical reasoning looks like. The proofs that win IMO gold and Putnam medals are sharp but conventional. The mathematics that matters, over decades, is almost always the mathematics that first looked like a conjecture from an odd angle, followed by a long effort to prove it. The Riemann Hypothesis was a passing remark in a paper about the distribution of primes. Fermat’s Last Theorem was a comment scribbled in a book margin. Both reshaped their fields because they turned out to connect structures that nobody had connected before.
An AI that can prove anything you put in front of it, but cannot propose anything worth proving, is not yet doing the central work of mathematics. And an AI that can propose trivially true conjectures efficiently is further from the goal than people often assume. The distance between “provable” and “important” is the entire domain.
This is why Erdős’s Book is not a curiosity but a research problem. To build an AI mathematician that produces The-Book-class proofs is to solve a problem that includes, as a special case, nearly every remaining question about machine creativity. The proofs in The Book are short where short proofs are surprising, long where length illuminates, and they connect things. They are the output of taste applied over decades of training. Whether that taste is a pre-training phenomenon, a post-training phenomenon, a specific architectural feature, or something that requires a kind of engagement with the mathematical community that current systems cannot participate in, is a question nobody in the field can yet answer.
Hong’s hypothesis is that it is largely a data problem, and that Axiom’s combination of formal verification, conjecturing systems, and Ken Ono’s taste can, over time, generate enough labeled examples of “interesting” versus “uninteresting” conjectures that a model can learn the distinction. This is a credible hypothesis. It may also be wrong.
The Diagram and the Thousands of Lines
At the opening of this piece, Evan Chen drew a single diagram that made a Putnam problem resolve itself, while AxiomProver produced a proof of thousands of lines that verified the same result without ever seeing the geometric picture. Both proofs were correct. Both earned full marks.
One of them was from The Book. The other was not.
The question Axiom is quietly, carefully, and without hyperbole trying to answer is whether taste, so far a uniquely human capacity, can become a machine capacity. If it can, the consequences extend beyond mathematics into every domain where verification is cheap and taste is not. If it cannot, then the AI industry will have built an enormous machinery for proving things while remaining structurally unable to say which things are worth proving, which is roughly the state the field has been in for its entire history, now accelerated.
Ken Ono keeps in his office a letter. It arrived on April 7, 1984, in a rice-paper envelope from India, addressed to his father, the mathematician Takashi Ono, who was then teaching at Johns Hopkins. The sender was Janaki Ammal, the widow of Srinivasa Ramanujan. The letter was a thank-you note: Takashi had contributed, along with dozens of other mathematicians worldwide, to fund a bronze bust of Ramanujan for his widow, a memorial the Indian government had long promised but failed to build. Ken was sixteen at the time, a troubled high school student trying to convince his parents to let him drop out. He would later write that he had never seen his ordinarily stoic father so visibly moved.
The letter now hangs framed above Ono’s desk at Axiom’s Palo Alto office. It is a reminder, of a kind that no training dataset can easily encode, of what kind of mathematics Ken Ono believes is worth doing, and why. The letter does not itself provide a reward signal. But it hints at the shape of one. The mathematics that is worth doing is the mathematics whose story, like Ramanujan’s, reaches through a century and breaks the composure of a stoic mathematician opening a rice-paper envelope in suburban Baltimore.
Next: Part 3 tells the story of Carina Hong herself, from a Guangzhou classroom where a child doodled “MIT” on a math scratch sheet to a twenty-five-year-old CEO who raised $1.6 billion on a bet she describes as binary: either her company succeeds completely, or it fails completely, and there is no middle.


