The Age of Symbolic AI
When Rules Were Enough
This is Chapter 2 of A Brief History of Artificial Intelligence.
In the summer of 1956, John McCarthy gathered a small group of researchers at Dartmouth College with an audacious proposal. The goal: solve intelligence. The timeline: eight weeks. The budget: $13,500, requested from the Rockefeller Foundation.
The proposal was breathtaking in its confidence: “We propose that a 2 month, 10 man study of artificial intelligence be carried out... The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”
Every aspect of intelligence. Precisely described. Eight weeks.
It was McCarthy who coined the term “Artificial Intelligence” for this workshop, rejecting alternatives like “complex information processing” or “automata studies.” The name was deliberate—provocative, ambitious, clear about the goal. Not simulating intelligence. Not approximating it. Building it.
Ten men—mathematicians, logicians, computer scientists, cognitive psychologists—arrived in New Hampshire that June, with others drifting in and out over the summer. Among them: Marvin Minsky, then a junior fellow at Harvard; Claude Shannon, already famous as the creator of information theory; Allen Newell and Herbert Simon from RAND and Carnegie Tech; John Nash (yes, that Nash), who stayed only briefly; and several others who would shape computing for decades.
They had a simple theory. Intelligence is reasoning. Reasoning is logic. Computers execute logic flawlessly. Therefore, encode the right rules and you’d have intelligence.
The logic was impeccable. The theory was elegant. And for certain carefully chosen problems, it actually worked.
The Logic Theorist
Newell and Simon arrived with something to prove. Together with the RAND programmer Cliff Shaw, they had spent the previous year building the Logic Theorist, a program that could prove mathematical theorems. Not trivial ones—theorems from Whitehead and Russell’s Principia Mathematica, one of the most rigorous works of formal logic ever written.
At Dartmouth, they demonstrated the program. It worked. Feed it axioms and rules of inference, and it could derive theorems. It proved 38 of the first 52 theorems in chapter 2 of Principia Mathematica. More impressively, for one theorem it found a proof more elegant than the original—so elegant that Newell and Simon submitted it to a logic journal (where it was rejected, not because it was wrong, but because the editors weren’t ready to accept papers authored by machines).
This was extraordinary. A machine, reasoning its way to mathematical truth. Following chains of logic, exploring possible inferences, building proofs step by step. It looked like thinking. It behaved like intelligence. What else could it be?
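To get a feel for what that reasoning looked like mechanically, here is a toy sketch of rule-chaining proof search, written in modern Python rather than the IPL language the Logic Theorist actually used. It applies nothing but modus ponens until a goal statement appears; the real program searched backward from the theorem toward the axioms and relied on heuristics to prune the search, but the flavor is the same, and everything in the sketch beyond that basic idea is invented for illustration.

def forward_chain(axioms, implications, goal):
    """Derive new statements by modus ponens until the goal appears or nothing changes.

    axioms:       a set of statements taken as given
    implications: a list of (premises, conclusion) pairs
    Returns the inference steps taken, or None if the goal is unreachable.
    """
    known = set(axioms)
    proof = []
    changed = True
    while changed and goal not in known:
        changed = False
        for premises, conclusion in implications:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                proof.append((premises, conclusion))
                changed = True
    return proof if goal in known else None

# Example: from A, "A implies B", and "B implies C", derive C.
steps = forward_chain(
    axioms={"A"},
    implications=[(("A",), "B"), (("B",), "C")],
    goal="C",
)
for premises, conclusion in steps:
    print(" and ".join(premises), "therefore", conclusion)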
Newell and Simon were confident they’d cracked something fundamental. Simon later claimed that with the Logic Theorist they had “solved the venerable mind-body problem, explaining how a system composed of matter can have the properties of mind.” Its successor, the General Problem Solver, was meant to extend the same approach to any problem that could be stated formally.
They believed they’d discovered the basic mechanism of intelligence: symbol manipulation according to rules. Physical symbols (data structures in computer memory) manipulated by physical processes (programs) following formal rules (algorithms). That was it. That was thinking.
The Physical Symbol System Hypothesis, as they called it, seemed self-evident. What else could reasoning be? Humans manipulate symbols—words, numbers, concepts. We follow rules—logical inference, mathematical operations, grammatical structures. Computers could do this too, only faster and more reliably. Build a system with the right symbols and the right rules, and you’d have artificial intelligence.
It made perfect sense. And for the next three decades, it seemed to work.
The Expert System Gold Rush
Throughout the 1960s and 70s, symbolic AI racked up impressive demonstrations. Programs that proved geometric theorems. Programs that solved algebra word problems. Programs that played checkers and eventually chess. Each success seemed to confirm the core intuition: intelligence is rule-following, and rule-following is programmable.
But the real explosion came in the 1980s, with expert systems.
The idea was simple and seductive. Human experts—doctors, chemists, engineers—possess expertise. That expertise, fundamentally, is knowledge: facts about the domain and rules for reasoning about those facts. If you could extract that knowledge and encode it in a computer, you’d have artificial expertise. Expertise without the expert. Intelligence you could copy, distribute, and sell.
MYCIN was the poster child. Developed at Stanford in the mid-1970s, it diagnosed bacterial infections and recommended antibiotics. At its core were about 600 rules, encoded as IF-THEN statements:
IF the infection is primary-bacteremia
AND the site of culture is one of the sterile sites
AND the suspected portal of entry is the gastrointestinal tract
THEN there is suggestive evidence (0.7) that the organism is bacteroides
Each rule captured a piece of medical knowledge. The inference engine—the part of the system that applied rules to facts—would chain through the rules, asking questions, gathering information, narrowing down possibilities, and finally suggesting a diagnosis and treatment.
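Here is a minimal sketch of that machinery, in modern Python rather than the Lisp MYCIN was written in. Rules pair premises with a conclusion and a certainty factor, and the engine chains through them, combining evidence roughly the way MYCIN did. The specific rules, fact strings, and numbers below are invented for illustration, and the real system reasoned backward from diagnostic goals, asking the physician for any facts it was missing.

# A toy MYCIN-style inference engine. The rules and certainty values are
# invented for illustration; MYCIN itself had about 600 rules, reasoned
# backward from goals, and interactively asked the physician for missing facts.

RULES = [
    # (premises that must all hold, conclusion, certainty factor of the rule)
    (("infection is primary-bacteremia",
      "culture site is sterile",
      "portal of entry is gastrointestinal tract"),
     "organism is bacteroides", 0.7),
    (("organism is bacteroides",),
     "therapy should cover anaerobes", 0.6),   # invented follow-on rule
]

def combine(cf_old, cf_new):
    """Combine two positive certainty factors the way MYCIN did."""
    return cf_old + cf_new * (1 - cf_old)

def infer(observed_facts):
    """Forward-chain through RULES, accumulating a certainty for each conclusion."""
    beliefs = {fact: 1.0 for fact in observed_facts}
    fired = set()
    changed = True
    while changed:
        changed = False
        for i, (premises, conclusion, cf) in enumerate(RULES):
            if i not in fired and all(p in beliefs for p in premises):
                # A conclusion is only as certain as its weakest premise allows.
                support = cf * min(beliefs[p] for p in premises)
                beliefs[conclusion] = combine(beliefs.get(conclusion, 0.0), support)
                fired.add(i)
                changed = True
    return beliefs

print(infer([
    "infection is primary-bacteremia",
    "culture site is sterile",
    "portal of entry is gastrointestinal tract",
]))
# The organism conclusion comes out at 0.7, the therapy conclusion at 0.42.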
Studies showed MYCIN performed as well as infectious disease specialists. Better, in some evaluations, than general practitioners. It could explain its reasoning by showing the chain of rules it had followed. It never forgot details. It didn’t get tired. It applied its knowledge consistently.
This looked like intelligence. More than that, it looked like useful intelligence. The kind that could save lives and make money.
The 1980s saw an expert system gold rush. DENDRAL, a Stanford project dating back to the 1960s, identified molecular structures from mass spectrometry data. XCON (originally called R1) configured computer systems for Digital Equipment Corporation, handling thousands of orders and saving the company millions of dollars annually. PROSPECTOR, CADUCEUS, and hundreds of other expert systems bloomed across medicine, engineering, finance, and manufacturing.
Companies poured billions into AI. Every Fortune 500 company seemed to start an AI division. Universities expanded AI departments. The Japanese government launched the ambitious Fifth Generation Computer Project, aiming to build massively parallel inference machines. The United States responded with increased DARPA funding. Britain, France, and other nations launched their own initiatives.
This was it. AI had arrived. Intelligence was being productized. The Dartmouth dream was coming true, just thirty years late instead of eight weeks.
Except it wasn’t.
Cracks in the Foundation
The first hints of trouble came from the edges—places where expert systems met reality.
MYCIN was brilliant at diagnosing blood infections. But ask it about a simple headache, and it would dutifully apply its bacterial infection rules, suggesting antibiotics for bacteria that weren’t there. Its knowledge was deep but impossibly narrow. Outside its specialty, it was worse than useless—it was confidently wrong.
This wasn’t unique to MYCIN. Every expert system had boundaries, and those boundaries were sharp. DENDRAL understood mass spectrometry but nothing else about chemistry. XCON could configure computers but couldn’t troubleshoot them. The systems worked perfectly within their domains and failed completely outside them.
The brittleness went deeper. Expert systems couldn’t learn from experience. If MYCIN misdiagnosed a case, it didn’t get better—someone had to manually add or modify rules. Every new piece of knowledge required human encoding. Every edge case required more rules. Every exception required exception-handling rules.
The knowledge acquisition bottleneck, researchers called it. Extracting knowledge from experts was slow, expensive, and often impossible. Experts couldn’t always articulate what they knew. They’d say things like “the patient just looks septic” or “this spectrum feels wrong”—knowledge that existed in their intuition but resisted formalization.
Even when you could extract rules, they interacted in unexpected ways. Add a rule to handle case X, and suddenly case Y breaks. The rule base became a tangled mess, where changing one rule required checking hundreds of others. Systems that started manageable grew into nightmares of maintenance.
But the deepest problem was philosophical, not practical. It was called the frame problem.
The Frame Problem
Imagine you want to program a robot to go through a doorway. Simple enough, right? You need rules about doors, walls, walking, collision detection. But what else?
You need to know that doors can be locked. That some doors push, others pull. That doorknobs turn. That doors have hinges, and the hinges determine which direction they open. That you shouldn’t walk through a door if someone’s standing in the doorway. That if you’re carrying something large, you need to check if it fits. That glass doors can be mistaken for open doorways. That automatic doors require sensors. That revolving doors work differently.
Fine. Add rules for all of that.
But wait. You also need to know that doors are in walls. That walls are solid. That floors support weight. That gravity pulls downward. That walking requires alternating steps. That you shouldn’t walk if the floor is icy or wet. That some floors have stairs. That you need to look where you’re going. That if it’s dark, you might need light.
Keep going. How do you know when you’re done? How many rules does “walk through a door” actually require?
This is the frame problem: the impossibility of specifying everything that’s relevant to any action. (In its original, narrow form, posed by McCarthy and Hayes in 1969, it concerned specifying what does not change when an action occurs; philosophers such as Daniel Dennett broadened it into this more general worry about relevance and common sense.) We can enumerate rules, but we can’t enumerate all the context, all the background knowledge, all the common sense that makes those rules work. There’s no frame—no boundary—where you can say “these are all the rules you need.”
Humans don’t seem to have this problem. We just... know that doors are in walls, that walls are solid, that floors support weight. We’ve learned it from thousands of experiences. It’s not rules we follow. It’s patterns we recognize.
But symbolic AI had no way to acquire those patterns except by having someone explicitly program them. And you can’t explicitly program everything.
Moravec’s Paradox
This limitation revealed something philosophers and AI researchers came to call Moravec’s Paradox, after the roboticist Hans Moravec, who articulated it in the 1980s.
The paradox was this: the things humans find hard are easy for computers, and the things humans find easy are impossibly hard for computers.
A computer could prove theorems from Principia Mathematica—something only a handful of humans could do. It could evaluate chess positions millions of times per second—something no human could do. It could perform engineering calculations in microseconds that would take humans hours.
But it couldn’t tell a cup from a shoe. Couldn’t parse a casual sentence. Couldn’t walk across a room without bumping into furniture. Couldn’t recognize a face in different lighting. Couldn’t learn to ride a bicycle.
The highest achievements of human intellect—mathematics, logic, abstract reasoning—turned out to require relatively little computation. A few thousand lines of code could prove theorems.
But the things every toddler does effortlessly—seeing, moving, understanding language—required astronomical computation. And nobody knew how to program them.
Deep Blue, IBM’s chess computer, illustrated this perfectly. In 1997, it defeated world champion Garry Kasparov in a match that made headlines worldwide. It could evaluate 200 million positions per second. It could see 20 moves deep into some lines of play. It played chess at superhuman levels.
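The core of all that number-crunching is conceptually simple: depth-limited game-tree search, with a handcrafted evaluation function scoring the positions at the search horizon while the two sides alternately maximize and minimize. Here is a minimal sketch in Python; the interface and the toy demo are invented for illustration, and the real Deep Blue layered alpha-beta pruning, search extensions, opening books, endgame databases, and custom chess chips on top of this basic idea.

def minimax(state, depth, maximizing, legal_moves, apply_move, evaluate):
    """Search `depth` plies ahead and return the best evaluation reachable from `state`."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)              # handcrafted scoring of the position
    values = (
        minimax(apply_move(state, m), depth - 1, not maximizing,
                legal_moves, apply_move, evaluate)
        for m in moves
    )
    return max(values) if maximizing else min(values)

# Toy demo: states are integers, a "move" adds 1 or 2, and the evaluation just
# prefers larger totals, so the maximizer adds 2 on its turns and the minimizer
# adds 1. Four plies from zero therefore yields 6. A chess program plugs in real
# move generation and a far more elaborate evaluation of material and position.
print(minimax(
    state=0, depth=4, maximizing=True,
    legal_moves=lambda s: (1, 2),
    apply_move=lambda s, m: s + m,
    evaluate=lambda s: s,
))   # -> 6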
But it couldn’t tell you why it made a move, except by showing its evaluation of millions of positions—which wasn’t an explanation any human could parse. It couldn’t explain strategy in words. It couldn’t teach chess. It couldn’t write about chess. It couldn’t even recognize a chessboard in a photograph.
It played brilliant chess but understood nothing about chess. It had behavior without comprehension. Intelligence without understanding.
This was the core limitation of symbolic AI: you could program behavior, but behavior wasn’t understanding. You could encode rules, but rules weren’t learning. You could specify actions, but specification wasn’t adaptation.
The Reckoning
By the late 1980s, cracks were showing. The expert system boom was slowing. Early promises weren’t materializing. Companies were discovering that maintaining expert systems was harder than building them. The Japanese Fifth Generation Project would run until 1992 before being quietly acknowledged as a failure: it produced fast inference machines, but inference without knowledge is worthless, and the knowledge bottleneck remained unsolved.
The limitations were becoming undeniable:
Brittleness. Systems worked in their narrow domains and failed everywhere else. No generalization. No transfer of knowledge between domains.
No learning. Systems couldn’t improve from experience. Every new capability required explicit programming. Adaptation was impossible.
No common sense. The frame problem defeated attempts to encode everyday knowledge. The gap between formal rules and implicit understanding couldn’t be bridged.
No robustness. Real-world messiness broke clean logical systems. Noise, ambiguity, missing information—all the normal conditions of reality—caused brittle failure.
Maintenance nightmares. Rule bases grew unmanageable. Interactions between rules became unpredictable. Adding knowledge meant debugging the entire system.
But perhaps most fundamentally: wrong theory of intelligence. Intelligence wasn’t symbol manipulation according to rules. Or rather, it wasn’t just that. Something crucial was missing.
The something missing, it would turn out, was learning. The ability to extract patterns from examples. To generalize from experience. To adapt to new situations not by following pre-programmed rules but by recognizing similarities to past situations.
Human intelligence doesn’t come from someone programming rules into our brains. It comes from experience—millions of examples, thousands of hours of practice, constant exposure to patterns in the world. A child learns to recognize faces not from being taught explicit rules about faces, but from seeing faces. Learns language not from grammar rules (those come much later, if at all), but from hearing language spoken.
Symbolic AI tried to build intelligence top-down: specify the rules, encode the knowledge, program the behavior. But intelligence, it seemed, had to be grown bottom-up: show examples, let systems extract patterns, enable learning.
In 1956 at Dartmouth, they thought intelligence was logic and logic was programmable. By 1986, thirty years later, they’d built impressive systems that proved theorems, diagnosed diseases, configured computers, and played master-level chess. They’d achieved remarkable things within carefully constrained domains.
But constraints are not reality. And when reality arrived—messy, ambiguous, open-ended, requiring common sense and learning and adaptation—the beautiful logical systems shattered like glass.
The Lesson Learned
The confidence of 1956 had given way to something more complex by the late 1980s: a mix of genuine achievement, frustrated ambition, and dawning realization. They had built systems that behaved intelligently in specific contexts. But intelligence that only works in controlled conditions isn’t really intelligence. It’s automation.
The lesson was slowly becoming clear, though it would take another decade, and an AI funding winter, to fully sink in:
Intelligence isn’t rules we can write down. It’s patterns we learn from experience. It’s not something you can specify in advance. It’s something that emerges from learning.
You can’t program your way to general intelligence. The knowledge is too vast, too contextual, too implicit. The patterns are too subtle. The adaptation is too necessary.
But if you couldn’t program intelligence, what could you do? In 1986, most researchers didn’t have a good answer. A few were exploring neural networks—systems that could learn from examples—but those had been dismissed as a dead end back in 1969, after Minsky and Papert’s Perceptrons. That dismissal was about to be reconsidered, but not quite yet.
For now, symbolic AI had reached its limits. The systems they’d built were impressive feats of engineering, and equally impressive demonstrations of an approach to intelligence that wasn’t going to work.
They had programmed behavior, not understanding. The difference would prove fatal.
Notes & Further Reading
The Dartmouth Conference (1956):
McCarthy, J., Minsky, M.L., Rochester, N., & Shannon, C.E. (1955). “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence.” The actual workshop ran June-August 1956. Despite the optimistic proposal, participants later acknowledged little was actually “solved” that summer—but it established AI as a field and created a community of researchers.
The Logic Theorist:
Newell, A., & Simon, H.A. (1956). “The Logic Theory Machine: A Complex Information Processing System.” IRE Transactions on Information Theory. Simon’s claim about solving the mind-body problem appears in his memoir: Simon, H.A. (1991). Models of My Life.
Expert Systems:
Buchanan, B.G., & Shortliffe, E.H. (1984). Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Details MYCIN’s development and evaluation. For the broader expert systems boom, see Feigenbaum, E.A., McCorduck, P., & Nii, H.P. (1988). The Rise of the Expert Company.
The Frame Problem:
McCarthy, J., & Hayes, P.J. (1969). “Some Philosophical Problems from the Standpoint of Artificial Intelligence.” First articulation of the frame problem. Dennett’s essay “Cognitive Wheels: The Frame Problem of AI” (1984) provides an accessible philosophical treatment.
Moravec’s Paradox:
Moravec, H. (1988). Mind Children. The paradox is named after Moravec but was observed by many AI researchers in the 1970s-80s. Steven Pinker later expressed it memorably in The Language Instinct (1994): “The main lesson of thirty-five years of AI research is that the hard problems are easy and the easy problems are hard.”
Deep Blue:
The famous match occurred May 3-11, 1997. Kasparov later wrote extensively about the experience, arguing that while Deep Blue played brilliant chess, it demonstrated search and evaluation, not understanding. See Kasparov, G. (2017). Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins.
Physical Symbol System Hypothesis:
Newell, A., & Simon, H.A. (1976). “Computer Science as Empirical Inquiry: Symbols and Search.” Communications of the ACM, 19(3), 113-126. The hypothesis is still considered an important (if incomplete) theory. For critiques, see Searle’s Chinese Room argument (1980) and Brooks, R.A. (1991). “Intelligence Without Representation.”

