AI Coding Tools Work Best With Old Engineering Practices
Developer educator Matt Pocock argues AI coding assistants amplify code quality issues. His solution? Decades-old software fundamentals matter more than ever.
Written by AI · Dev Kapoor
April 24, 2026

Photo: AI Engineer / YouTube
There's a movement happening in AI coding circles called "specs to code." The pitch is elegant: write a specification, feed it to an AI, get code out. If something breaks, don't touch the code—just modify the spec and run it again. It's compiler thinking applied to AI assistants, and according to developer educator Matt Pocock, it produces garbage.
Pocock has spent the last 18 months teaching developers to build with AI coding tools like Claude Code. He's watched hundreds of engineers hit the same wall: the first AI-generated code works okay, the second iteration works worse, and by the third pass, you're staring at an unmaintainable mess. The specs-to-code approach, he argues in a recent talk at the AI Engineer conference, is "just vibe coding by another name"—the fantasy that you can delegate everything and never think about the actual structure of your system.
The problem isn't the AI. It's what Pocock calls "software entropy"—the tendency of codebases to decay with every change unless someone actively invests in the design. When you treat code as disposable output from a spec compiler, nobody's investing. The AI doesn't understand system-level design. It's making tactical changes without strategic context. And entropy accelerates.
The Fundamental Misconception
Underlying the specs-to-code movement is an assumption: code is cheap. If the AI can generate unlimited code, who cares if it's messy? Just regenerate it.
Pocock's counterargument is stark: "Bad code is the most expensive it's ever been." Not because AI makes bad code worse—though it can—but because a hard-to-change codebase blocks you from extracting AI's actual value. AI coding assistants excel in well-structured codebases with clear boundaries and good tests. They flounder in spaghetti. The worse your architecture, the less AI can help you.
This inverts the conventional AI narrative. Instead of making engineering practices obsolete, AI makes them load-bearing. You need more engineering fundamentals to use AI well, not fewer.
Failure Mode One: The AI Didn't Understand
The first breakdown Pocock sees is miscommunication. You think you explained what you want. The AI thinks it understands. You both discover you were wrong after the code ships.
He borrows a concept from Frederick P. Brooks: the "design concept," an invisible shared understanding that exists between collaborators. You can't put it in a markdown file. It's the theory of what you're building, and it has to be genuinely mutual.
Pocock's solution is a prompt he calls "Grill Me": Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one by one.
The prompt went viral—13,000 GitHub stars. Developers report the AI asking 40, 60, sometimes 100 questions before it's satisfied. It transforms the interaction from eager code generation into adversarial requirements gathering. The result isn't just better alignment—it's a conversation artifact you can convert into a product requirements document or task list.
Pocock thinks this beats the default "plan mode" in most AI coding tools, which are, in his words, "extremely eager to create an asset" and start coding before genuine understanding exists.
Failure Mode Two: The AI Speaks a Different Language
The second breakdown is language mismatch. The AI uses terms you don't recognize. You use terms it interprets differently. Everyone's talking past each other, and the code reflects it.
This is a classic problem in software development—the gap between domain experts and engineers. Domain-driven design solved it decades ago with "ubiquitous language": a shared vocabulary derived from the problem domain, used consistently in code, documentation, and conversation.
Pocock built a skill that scans codebases, extracts terminology, and generates a ubiquitous language document—markdown tables of terms the developer and AI both reference. He keeps it open during planning sessions. By reading the AI's thinking traces (when available), he's noticed it not only improves planning but makes the AI think less verbosely. Implementation aligns better with intent.
It's Domain-Driven Design, but the "domain expert" is a large language model.
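Pocock's actual skill isn't public in the talk, but the core idea can be sketched in a few lines: scan source text for declared identifiers, split them into candidate domain terms, and emit a markdown table for the human to fill in. Everything here—`extractTerms`, `toMarkdownTable`, the regexes—is illustrative, not his implementation.

```typescript
// Hypothetical sketch of a ubiquitous-language extractor. It pulls
// declared identifiers out of source text, splits CamelCase names into
// candidate domain terms, and emits a markdown table skeleton.
function extractTerms(source: string): string[] {
  // Match declarations like "class InvoiceLedger" or "function reconcilePayment"
  const ids = source.match(/(?:function|class|interface|type)\s+([A-Za-z0-9_]+)/g) ?? [];
  const terms = new Set<string>();
  for (const id of ids) {
    const name = id.split(/\s+/)[1];
    // Split CamelCase into individual words: "InvoiceLedger" -> invoice, ledger
    for (const word of name.split(/(?=[A-Z])/)) terms.add(word.toLowerCase());
  }
  return [...terms].sort();
}

function toMarkdownTable(terms: string[]): string {
  // Definitions are left blank: the developer (or the AI, in review)
  // fills them in, and the table becomes the shared vocabulary.
  const rows = terms.map((t) => `| ${t} | |`);
  return ["| Term | Definition |", "| --- | --- |", ...rows].join("\n");
}

const src = `
class InvoiceLedger {}
function reconcilePayment() {}
interface TaxRule {}
`;
console.log(toMarkdownTable(extractTerms(src)));
```

A real version would walk the file tree and use a proper parser rather than regexes, but the output shape—a table of terms both parties reference during planning—is the point.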
Failure Mode Three: The AI Built It, But It Doesn't Work
The third breakdown is correctness. Even when aligned and communicating clearly, the AI ships broken code.
The obvious fix is feedback loops: static types, automated tests, browser access for frontend work. But Pocock noticed something: LLMs don't use feedback loops efficiently. They're bad at incremental development. They generate huge chunks of code, then type-check afterward, like a student who writes the whole essay before proofreading.
The Pragmatic Programmer calls this "outrunning your headlights"—driving faster than your feedback mechanism allows. Pocock's solution is test-driven development. Write the test first, make it pass, refactor. TDD forces the AI into small, deliberate steps.
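The loop itself is small enough to sketch. Here's a minimal red-green cycle in plain TypeScript—the test is written first and states the behavior, and the implementation below it is the smallest change that makes it pass. The `slugify` example is illustrative, not from the talk.

```typescript
// A minimal TDD cycle: test first, then the smallest passing implementation.

// Step 1 (red): the test exists before the implementation does.
function testSlugify(): void {
  if (slugify("Hello, World!") !== "hello-world") throw new Error("red: slugify failed");
}

// Step 2 (green): the smallest implementation that satisfies the test.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one dash
    .replace(/^-|-$/g, "");      // trim leading/trailing dashes
}

testSlugify(); // passes; now refactor, or write the next failing test
console.log("slugify tests pass");
```

The value for AI pairing is the step size: each cycle is one small, verifiable claim, so the assistant can't outrun its headlights by generating a thousand untested lines.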
But TDD only works if your codebase is testable, which brings us to the structural question: what does a testable codebase look like?
Deep Modules vs. Shallow Modules
John Ousterhout's A Philosophy of Software Design distinguishes between deep and shallow modules. Deep modules hide complexity behind simple interfaces—lots of functionality, minimal surface area. Shallow modules expose complex interfaces for minimal functionality.
AI-generated codebases, left unchecked, tend toward shallow modules. Tons of tiny files with complex interdependencies. The AI struggles to navigate these. It can't keep the dependency graph in context. It doesn't understand what the code does because it's scattered across fifty shallow modules.
Deep modules change this. They create clear boundaries with simple interfaces. You test at the interface. The implementation can stay messy—or you can let the AI handle it—because the boundary is solid.
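A concrete sketch of the idea, assuming a made-up example (Ousterhout's book uses different ones): a rate limiter whose entire public surface is one method. The sliding-window bookkeeping inside is invisible, and the test exercises only the boundary—including injecting a fake clock, so the implementation can be rewritten freely.

```typescript
// A deep module: one small interface (`allow`) hiding a sliding-window
// rate limiter's bookkeeping. Callers and tests touch only the boundary.
class RateLimiter {
  private hits = new Map<string, number[]>();
  constructor(
    private limit: number,
    private windowMs: number,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  // The entire public surface: may this key make a request right now?
  allow(key: string): boolean {
    const t = this.now();
    // Drop hits that have aged out of the window.
    const recent = (this.hits.get(key) ?? []).filter((h) => t - h < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(t);
    this.hits.set(key, recent);
    return true;
  }
}

// Testing at the interface: fake clock in, behavior out.
let fakeTime = 0;
const limiter = new RateLimiter(2, 1000, () => fakeTime);
console.log(limiter.allow("alice")); // true
console.log(limiter.allow("alice")); // true
console.log(limiter.allow("alice")); // false (third hit inside the window)
fakeTime = 2000;
console.log(limiter.allow("alice")); // true (window has passed)
```

Swap the array-of-timestamps internals for a token bucket and every caller and test still works—that's the deep-module payoff.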
Pocock has a skill for this too: "Improve codebase architecture." It explores the codebase, identifies related code, and wraps it in deep modules. The result is a codebase that rewards TDD and lets the AI understand system structure.
The Strategic/Tactical Division
The pattern across all these failure modes is the same: AI is a tactical programmer. It's the sergeant on the ground making changes. But somebody needs to think strategically—about design, boundaries, interface contracts, module architecture.
That's the human role. Not writing every line of code. Designing the interfaces. Investing in system design daily, as Kent Beck advises. Using AI as an implementation engine within a structure you control.
Pocock frames it as "design the interface, delegate the implementation." You don't review every line inside a well-tested module. You treat it as a gray box—verify the boundary works, move on. This scales your brain because you're not trying to hold the entire implementation in your head.
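One way to make that concrete—a sketch, not Pocock's workflow—is to write the contract and a boundary test yourself, then treat any implementation that passes it as a gray box. All names here are illustrative.

```typescript
// "Design the interface, delegate the implementation": the human owns
// the contract and the boundary test; the implementation is a gray box.
interface KeyValueStore {
  set(key: string, value: string): void;
  get(key: string): string | undefined;
  delete(key: string): boolean;
}

// Boundary test: runs against *any* implementation of the contract.
function verifyStore(store: KeyValueStore): void {
  store.set("a", "1");
  if (store.get("a") !== "1") throw new Error("get after set failed");
  if (!store.delete("a")) throw new Error("delete of existing key failed");
  if (store.get("a") !== undefined) throw new Error("get after delete failed");
}

// Stand-in for delegated (e.g. AI-written) code. Its internals are free
// to change as long as verifyStore keeps passing at the boundary.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  set(key: string, value: string): void { this.data.set(key, value); }
  get(key: string): string | undefined { return this.data.get(key); }
  delete(key: string): boolean { return this.data.delete(key); }
}

verifyStore(new MemoryStore());
console.log("boundary holds");
```

You review `verifyStore` carefully and `MemoryStore` barely at all—which is exactly the head-scaling trade the gray-box framing describes.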
But it requires knowing your module map cold. It requires thinking about interfaces during planning. It requires, in other words, the fundamentals: clean architecture, ubiquitous language, test-driven development, deliberate design.
Why This Matters For OSS
From a community perspective, this has implications beyond individual productivity. If AI coding tools amplify existing code quality—making good codebases better and bad codebases worse—then OSS projects with poor architecture are about to get much harder to maintain.
We're already seeing maintainer burnout accelerate. Adding AI to poorly structured projects won't fix that. It might make it worse, as contributors use AI to ship faster without investing in design, accelerating entropy.
The projects that will thrive are the ones that enforce architectural standards, maintain clear module boundaries, require tests, and document their ubiquitous language. The fundamentals aren't just individual best practices anymore—they're community survival strategies.
Pocock's message is reassuring and demanding in equal measure: your existing skills aren't obsolete. They're more important. But you have to actually use them. The AI won't do it for you. It can't. That's the job.
—Dev Kapoor
Watch the Original Video
It Ain't Broke: Why Software Fundamentals Matter More Than Ever — Matt Pocock, AI Hero @mattpocockuk
AI Engineer
18m 26s
About This Source
AI Engineer
AI Engineer is a rapidly growing YouTube channel dedicated to the professional development of AI engineers. Since its inception in December 2025, the channel has amassed over 317,000 subscribers by offering a diverse array of content including talks, workshops, events, and training sessions designed to expand the skill sets of AI enthusiasts and professionals alike.