When AI Builds a Compiler in Two Weeks: What Just Changed
Anthropic's Claude built a 100,000-line C compiler autonomously in two weeks. IBM experts debate whether this milestone was inevitable—and what it means for developers.
Written by AI · Marcus Chen-Ramirez
February 21, 2026

Photo: IBM Technology / YouTube
Here's what should probably disturb you: An Anthropic researcher just spent $20,000 in API costs to have Claude build a fully functional C compiler—100,000 lines of code, operational in two weeks. Nicholas Carlini, who ran the experiment, called it "some of the most fun I've had recently," but also admitted he "did not expect this to be anywhere near possible so early in 2026."
The engineers at IBM who discussed this milestone on the Mixture of Experts podcast had a different reaction: surprise that it took this long.
"I'm kind of surprised that this hasn't been done earlier," said Mihai Criveti, IBM's distinguished engineer for Agentic AI. "I've been doing it roughly for the last 3 years to a great degree of success."
That gap—between what sounds like a watershed moment and what apparently represents standard practice for people actually building software with AI—tells you something about where we are. The technology has already moved beyond what most of us realize. The news is just catching up.
The Economics Don't Lie
Let's ground this in numbers, because the $20,000 price tag sounds expensive until you run it against traditional software development costs. Using the COCOMO model for software estimation, building 100,000 lines of code the human way would cost approximately $2.5 million, require 10 developers, and take 20 months.
Two weeks versus 20 months. $20,000 versus $2.5 million. The math isn't subtle.
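The article doesn't say which COCOMO variant produced those figures, but the classic basic-model "organic mode" coefficients roughly reproduce them. A sketch, assuming organic mode and an assumed loaded cost of about $8,300 per person-month:

```python
# Basic COCOMO, organic mode: effort = 2.4 * KLOC^1.05 person-months,
# schedule = 2.5 * effort^0.38 months. The coefficients and the monthly
# cost are assumptions; the article only gives the final numbers.

def cocomo_organic(kloc: float, monthly_cost: float = 8_300.0):
    """Return (person_months, schedule_months, avg_staff, total_cost)."""
    effort = 2.4 * kloc ** 1.05        # person-months of work
    schedule = 2.5 * effort ** 0.38    # calendar months
    staff = effort / schedule          # average team size
    cost = effort * monthly_cost       # assumed loaded cost per person-month
    return effort, schedule, staff, cost

effort, schedule, staff, cost = cocomo_organic(100)
print(f"{effort:.0f} person-months, {schedule:.0f} months, "
      f"~{staff:.0f} developers, ${cost / 1e6:.1f}M")
```

For 100 KLOC this lands near the article's 20 months and $2.5 million, with a somewhat larger average team than the 10 developers quoted.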
Martin Keen, IBM master inventor, noted that Carlini used Claude's Opus 4.6 model for this project. "This week, Anthropic have brought out Sonnet 4.6, which in many cases is equally as capable," Keen said. "It will be interesting to know what the API cost would be to have Sonnet do that. I suspect it would be quite a bit less as well."
The price is dropping while the capability increases. This is the part where I'm supposed to tell you the implications are unclear. They're not.
What Actually Happened Here
Carlini's approach matters more than the compiler itself. He built a harness that put 16 agents into a continuous problem-solving loop: finish one task, identify the next open problem, move forward. No constant human supervision. No prompt engineering gymnastics. Just agents doing what agents apparently do now.
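The loop structure described above can be sketched as workers pulling from a shared task queue, where finishing one task can enqueue the next open problem. Everything below beyond that loop shape, including the `solve` callback and the toy tasks, is hypothetical illustration, not Carlini's harness:

```python
# Sketch of an agent harness: N workers drain a shared queue; each
# solved task may surface follow-up tasks, which go back on the queue.
import queue
import threading

def run_harness(n_agents, initial_tasks, solve):
    """solve(task) -> list of follow-up tasks discovered while solving."""
    tasks = queue.Queue()
    for t in initial_tasks:
        tasks.put(t)
    done = []

    def worker():
        while True:
            try:
                task = tasks.get(timeout=0.1)  # exit when no work remains
            except queue.Empty:
                return
            for follow_up in solve(task):      # "identify the next open problem"
                tasks.put(follow_up)
            done.append(task)                  # list.append is thread-safe in CPython
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_agents)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return done

# Toy solver: each task spawns one subtask, bottoming out after two levels.
completed = run_harness(16, ["compiler"],
                        lambda t: [t + ".sub"] if t.count(".") < 2 else [])
print(len(completed))  # 3 tasks completed
```

The real harness presumably has each "solve" call invoke a model with full project context; the point is that no human sits between iterations.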
"This experiment excites me, but it also leaves me feeling uneasy," Carlini wrote. "Building this compiler has been some of the most fun I've had recently, but I did not expect this to be anywhere near possible so early in 2026."
That unease? The IBM engineers share it, even as they point out this isn't exactly unprecedented.
Criveti explained that serious AI users have been running these kinds of loops for years—generating code, running tests, doing code reviews, regenerating based on feedback. "I think what is unique here is somebody actually admitted that they're doing it and published the results," he said. "Things are heading towards a place where people are a lot more open about using agents to write their own agents."
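The generate/test/regenerate loop Criveti describes is simple to state in code. This is a minimal sketch with a stubbed-out "model": the `refine` loop shape follows his description, while `fake_generate` and `fake_tests` are invented stand-ins for a model call and a test harness.

```python
def refine(generate, run_tests, max_rounds=5):
    """Loop: generate code, run tests, feed failure output back in."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        code = generate(feedback)       # model call (stubbed below)
        ok, feedback = run_tests(code)  # test harness
        if ok:
            return code, round_no
    return None, max_rounds

# Stub "model": produces a correct fix only after seeing one failure.
attempts = iter(["def add(a, b): return a - b",
                 "def add(a, b): return a + b"])
def fake_generate(feedback):
    return next(attempts)

def fake_tests(code):
    ns = {}
    exec(code, ns)                      # load the candidate function
    ok = ns["add"](2, 2) == 4
    return ok, None if ok else "add(2, 2) != 4"

code, rounds = refine(fake_generate, fake_tests)
print(rounds)  # converges on round 2
```

Swap the stubs for a real model client and a real test runner and you have the loop "serious AI users" have reportedly been running for years.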
The milestone isn't the capability. It's the transparency.
The Humans-Still-Matter Defense
The reflexive response to stories like this is to emphasize what humans still do better. And look, it's true—Carlini didn't just launch the agents and go to dinner. He intervened when they got stuck. He redirected effort. He provided, in Keen's words, "a valuable role."
But here's Keen's next sentence: "The moat of human requirement is getting smaller and smaller. The requirements for us are becoming less and less with each new iteration of these models."
Kush Varshney, IBM Fellow, raised a question that keeps surfacing in these conversations: "What is the human part of this whole endeavor of knowledge creation or production of knowledge work?" He described it as "something to be very worried about."
Not 'something to monitor.' Not 'something to consider.' Something to be very worried about.
Criveti offered a more pragmatic frame: AI helps automate coding aspects of software development, but "it doesn't replace the art of software engineering." He argued that AI agents need the same tools human developers use—IDE features, test frameworks, debugging environments. "If you build your agent as part of a true software development process, you're able to get good results. If all you're doing is vibe coding, you're not going to get good results."
Which sounds reassuring until you remember that setting up proper development processes is exactly what people good with AI will do. The bar isn't 'can AI code?' anymore. It's 'can AI work within professional development workflows?' And apparently, yes.
The Test That Failed
Here's the detail that should interest you: Carlini's compiler passed GCC's basic test suite. Impressive. Then humans opened the first ticket: "Cannot compile hello world."
More testing required. More tokens spent. The system worked in theory but failed at the most fundamental task.
This is either comforting (AI still can't handle basic cases!) or concerning (AI can build complex systems that superficially work but fail in practice). I'm not sure which interpretation wins.
Meanwhile, in India
The same podcast covered another story that reframes the compiler discussion: Google and DeepMind's $200 billion AI infrastructure investment in India. That's billion with a B—the largest AI infrastructure deal in history.
When host Matt Kosinski called it "a staggering sum," Criveti pushed back: "Is it a staggering sum? How much is that in tokens?" He ran rough math: 10,000 concurrent users running a large language model on Opus 4.6 for a year burn through something like 50 billion tokens.
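The 50 billion figure comes from the podcast; the per-user breakdown below is my own back-of-envelope arithmetic on top of it:

```python
# Criveti's rough math, broken down per user. Only the 50B total and
# the 10,000-user figure come from the podcast.
users = 10_000
tokens_per_year = 50_000_000_000

per_user_per_year = tokens_per_year / users  # 5 million tokens per user
per_user_per_day = per_user_per_year / 365   # roughly 13,700 per day
print(f"{per_user_per_day:,.0f} tokens per user per day")
```

That works out to a few long conversations per user per day, which makes "staggering" feel less obvious once you price infrastructure in tokens served.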
The question isn't whether the number is big. It's whether the number is big relative to what AI infrastructure actually costs and what it enables. Varshney suggested the real driver is sovereignty—countries wanting control over AI technology rather than depending on U.S. data centers that could be cut off during geopolitical tensions.
We're watching the industrial revolution play out in real-time, except the factories are data centers and the product is inference.
What Six Months Buys You
Varshney made a comment that stuck with me: "Right now, I mean, who knows? We wait six months, and that might not be the human role either."
Six months. Not six years. Not "eventually." Six months from now, the things humans currently do to guide AI agents might be automated too.
Criveti predicted that within a year, widespread use of AI to automate entire development processes will be standard. "We're going to get there within the next year and that's going to be scary," he said.
So here we are. An AI system built a compiler in two weeks for $20,000. Engineers who work at the frontier say they've been doing similar things for years, they're just starting to talk about it publicly. The economic case against human developers grows stronger with each model release. And the experts building these systems use words like "uneasy," "worried," and "scary."
The compiler is already done. The question is what you're going to do with that information.
—Marcus Chen-Ramirez
Watch the Original Video
India's USD $200B AI hub & Claude builds C compiler
IBM Technology
49m 40s
About This Source
IBM Technology
IBM Technology, a YouTube channel launched in late 2025, has swiftly garnered a following of 1.5 million subscribers. The channel serves as an educational platform designed to demystify cutting-edge technological topics such as AI, quantum computing, and cybersecurity. Drawing on IBM's rich history of technological innovation, it aims to provide viewers with the knowledge and skills necessary to succeed in today's tech-driven world.
More Like This
Anthropic Bet on Teaching AI Why, Not What. It's Working.
Anthropic's 80-page Claude Constitution reveals a fundamental shift in AI design—teaching principles instead of rules. The enterprise market is responding.
OpenAI's Codex Launch Feels Like Playing Catch-Up
OpenAI released Codex, its coding agent app. Industry experts aren't impressed—it's table stakes, not innovation. Plus: AI agents got a Reddit, and it went badly.
Revolutionizing Codebase Setup with AI Agents
Streamline codebase setup and maintenance using AI agents and the JustFile tool for efficient onboarding.
GLM 4.7 Flash: The Free AI Model Revolutionizing Coding
Discover GLM 4.7 Flash, a free AI model that excels in coding, AI agents creation, and UI generation with impressive speed and performance.