John Jumper on What AlphaFold Solved and What It

The protein folding problem took fifty years and an almost embarrassing amount of human ingenuity to crack. Synchrotrons the size of small towns. Crystals coaxed from solution over months. A year of work and roughly $100,000 per structure. Then AlphaFold 2 arrived at CASP14 in 2020 and scored so far above everything else that the organizers declared the problem essentially solved—which, in the cautious vernacular of structural biology, is basically a standing ovation.

In 2024, the Nobel Committee made it official. John Jumper, who led the AlphaFold team at DeepMind, shared the chemistry prize with Demis Hassabis and David Baker. Within days of the announcement, Jumper confirmed he was leaving for Anthropic.

In a long conversation with Tim Scarfe on Machine Learning Street Talk—filmed before the Anthropic news broke—Jumper offers something rarer than celebration: a precise accounting of what AlphaFold actually does, why the popular explanations for how it works are mostly wrong, and what solving one hard problem tells us about all the hard problems still ahead.

The narrowness is the point

The dominant narrative around AlphaFold frames it as AI unlocking the secrets of life. Jumper is careful, almost insistent, about pushing back on that.

"We are not trying to tell you everything," he says. "We are not a model of the entire cell. We are a predictor of this experiment that you did all the time and took you a year."

That framing sounds modest. It isn't. What Jumper is describing is a system that can reproduce, at atomic resolution and in minutes, the result of one of science's most expensive and time-consuming measurements, and do it for nearly every protein in the major public sequence databases: some 200 million structures in all. The narrowness is precisely what makes it trustworthy. Because AlphaFold is pinned to a specific measurable output, researchers can characterize its accuracy, know when to trust it, and know when to go back to the bench.

The drug discovery application illustrates both the power and the limit. Jumper walks through the case of Midnolin—a human protein almost nobody had studied—which turned out to be a key component in how cells recycle proteins at a critical phase of division. Biologists found it through genetics, ran AlphaFold against it and 500 related proteins, and discovered a very specific structural pattern: one protein getting clamped between two parts of Midnolin like a pair of jaws. They then validated that prediction experimentally, removing the clamped region and watching the recycling stop. That's the loop AlphaFold enables: fast structural hypothesis, experimental test, mechanistic understanding.

But AlphaFold 2 couldn't tell you where a drug molecule actually binds to that protein. That required AlphaFold 3, released in 2024, which extended the system to small molecules, DNA, RNA, and other non-protein entities. Even then, you're still a long way from "cured disease." Jumper's factory joke makes the point: a technician fixes a complex machine by turning one nut a quarter-turn, then bills $10,000, not for the turning, but for knowing what to turn. AlphaFold helps you see the nut. The knowing still takes biology.

What actually made it work (and what didn't)

The technical section of the conversation is where Jumper gets genuinely animated, and also a little exasperated. A widespread explanation for AlphaFold 2's success credits geometric deep learning, specifically the SE(3)-equivariant architecture, which respects the rotational and translational symmetries of 3D space. It's an elegant story. It is also, according to Jumper, mostly wrong.

The ablation data is unambiguous: removing the equivariant component cost AlphaFold 2 about 2.5 points on the GDT accuracy scale. The total improvement over AlphaFold 1 was 30 points. Equivariance contributed roughly 8 percent of the win. Jumper published this in the Nature paper. Nobody seemed to notice.
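For readers who haven't met the GDT scale: CASP's GDT_TS score is, roughly, the average over four distance cutoffs (1, 2, 4 and 8 Å) of the percentage of residues whose predicted C-alpha position lies within that cutoff of the experimental one. The sketch below is a simplification under one loud assumption: the two structures are taken as already superposed, whereas the real metric searches for the best superposition at each cutoff. The function name gdt_ts is illustrative, not taken from any CASP tool.

```python
import math

# The four GDT_TS distance cutoffs, in angstroms.
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(pred, true):
    """Simplified GDT_TS on a 0-100 scale.

    pred and true are equal-length lists of (x, y, z) C-alpha coordinates.
    ASSUMPTION: they are already superposed; the real metric optimizes the
    superposition separately for each cutoff, which this sketch skips.
    """
    if len(pred) != len(true) or not pred:
        raise ValueError("need equal-length, non-empty coordinate lists")
    dists = [math.dist(p, t) for p, t in zip(pred, true)]
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= c for d in dists) / len(dists) for c in CUTOFFS_ANGSTROM]
    return 100.0 * sum(fractions) / len(CUTOFFS_ANGSTROM)
```

On this 0 to 100 scale, the 2.5-point ablation cost set against the 30-point gain over AlphaFold 1 works out to 2.5 / 30 ≈ 8.3 percent, which is where the "roughly 8 percent" figure comes from.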

"I thought that would put it to bed," he says. "It didn't even put it to bed at all."

What actually drove performance was the Evoformer, the massive architectural trunk that runs a kind of dialogue between evolutionary data (thousands of related protein sequences from across species) and geometric data (pairwise relationships between residues, such as predicted distances). Combined with a loss function called FAPE (Frame Aligned Point Error), which asks "from the perspective of each residue, where is everyone else?", the system learned to assemble coherent 3D structure from purely non-geometric starting data. The geometry emerged; it wasn't imposed.
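FAPE's question is easier to see in code than in prose. The sketch below follows our reading of the AlphaFold 2 paper: each frame is a rotation R and translation t, every point is re-expressed in every frame's local coordinates via R^T (x - t), and the prediction is penalized by the clamped distance between where a point sits in the predicted frame and where it sits in the true frame. The 10 Å clamp, the 10 Å normalizer and the small epsilon are what we believe the paper uses; treat them, and every function name here, as assumptions. Training-time details of the published loss are omitted, and this is plain Python, not DeepMind's implementation.

```python
import math


def _inverse_apply(rot, trans, point):
    """Re-express a global point in a local frame: R^T (x - t)."""
    d = [point[k] - trans[k] for k in range(3)]
    # Row i of R^T is column i of R.
    return [sum(rot[k][i] * d[k] for k in range(3)) for i in range(3)]


def fape(pred_frames, pred_points, true_frames, true_points,
         d_clamp=10.0, z=10.0, eps=1e-4):
    """Simplified Frame Aligned Point Error (a sketch, not the reference loss).

    Frames are (R, t) pairs: R is a 3x3 rotation given as a list of rows,
    t a 3-vector. Points are 3-vectors in angstroms. For every frame i and
    point j, compare point j's position in predicted frame i with its
    position in true frame i, clamp the error, and average over all pairs.
    """
    if (len(pred_frames) != len(true_frames) or len(pred_points) != len(true_points)
            or not pred_frames or not pred_points):
        raise ValueError("prediction and truth must be non-empty and the same size")
    total, count = 0.0, 0
    for (rp, tp), (rt, tt) in zip(pred_frames, true_frames):
        for xp, xt in zip(pred_points, true_points):
            local_pred = _inverse_apply(rp, tp, xp)
            local_true = _inverse_apply(rt, tt, xt)
            sq = sum((a - b) ** 2 for a, b in zip(local_pred, local_true))
            # eps keeps the square root smooth at zero error; the clamp caps outliers.
            total += min(d_clamp, math.sqrt(sq + eps))
            count += 1
    return total / (count * z)


def rigid_transform(rot, trans, frames, points):
    """Apply one global rotation and translation to a whole structure."""
    def apply(x):
        return [sum(rot[i][k] * x[k] for k in range(3)) + trans[i] for i in range(3)]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    new_frames = [(matmul(rot, r), apply(t)) for r, t in frames]
    return new_frames, [apply(x) for x in points]
```

Because every comparison happens inside a local frame, rotating or translating the whole prediction with rigid_transform leaves the loss unchanged: the model is scored on relative geometry, not on where the structure happens to sit in space.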

Jumper describes this as "ruthless empiricism." Someone deleted the convolutional layers from an intermediate architecture and performance went up: fewer parameters, better accuracy, a result that "doesn't normally happen in machine learning." The team followed the data rather than their intuitions, and the work that came out of it was dense enough that a reviewer of their Nature submission said it contained six or seven papers' worth of ideas. It's not one insight. It's eighteen doubles, to use Jumper's baseball analogy.

This matters beyond AlphaFold. The equivariance story gets repeated because it's clean and theoretically satisfying. The actual story (biological hypothesis plus geometric intuition plus months of ablations plus a loss function nobody had thought to write down before) is messier and harder to generalize. But it's the true story, and it teaches a different lesson to anyone trying to build the next domain-specific AI system.

Predict, control, understand

Jumper draws a taxonomy that most AI discourse collapses together. Prediction means knowing what your system will output. Control means steering that output toward a target. Understanding means compressing the knowledge into something a human can hold in their head and communicate to another human—"fits on an index card," as he puts it.

AlphaFold does the first two. The third one is still on us.

"We have to derive our own understanding at this moment," Jumper says. "It does the act of predict and maybe control."

This is a genuinely important distinction, especially because AI-for-science hype often conflates prediction accuracy with scientific insight. A system that reproduces experimental results with high fidelity is extraordinarily useful. It is not the same as a system that explains why biology works the way it does.

The Bitter Lesson—Rich Sutton's influential 2019 essay arguing that general methods using computation always win over domain knowledge in the long run—comes up here, and Jumper is direct: "I don't really love the bitter lesson as people try and apply it. In fact, AlphaFold 2 is the opposite of that." The data isn't infinite. The internet is finite. The protein database is finite. Domain knowledge about which architectural choices make sense for a particular problem isn't a handicap to be engineered away; it's what makes the difference between 15,000 structures and 150,000 structures worth of training data, as one ablation study effectively demonstrated.

What 1,000 African structural biologists changes

The conversation ends with Emmanuel Nji of BioStruct Africa, who provides a concrete measure of what the database means at ground level. He spent four or five years trying to crystallize a single protein. With AlphaFold, he resolved a structure in under three months. His program, backed by Google DeepMind and the Swedish Research Council, is now training 100 scientists per year across Africa, targeting 1,000 over the next decade—researchers working specifically on malaria, HIV, and antibiotic-resistant infections: diseases that receive a fraction of the structural biology resources directed at conditions affecting wealthier populations.

This is the part of the AlphaFold story that gets the least coverage and carries the most weight. The database being open didn't just accelerate existing research programs; it created new ones in places that never had access to a synchrotron.

As for why Anthropic wanted Jumper: Scarfe speculates, Jumper doesn't say, and it would be pure guesswork to assume it's about biology at all. What Jumper has that general-purpose AI labs rarely cultivate is a demonstrated ability to build systems that are precisely scoped, deeply embedded in domain knowledge, and calibrated to a specific measurable output—then to iterate on them with enough empirical discipline to separate the contributing factors from the good story about the contributing factors.

Whether that's what Anthropic is after, or whether AlphaFold's particular hybrid approach—human hypothesis plus machine learning plus years of head-banging against a specific dataset—becomes a template for other hard scientific domains, will depend on problems we haven't yet agreed to try to solve.

— Marcus Chen-Ramirez, Senior Technology Correspondent, Buzzrag