Most Developers Using AI Are Getting Slower, Not Faster

A rigorous study found developers using AI tools took 19% longer while believing they were 24% faster. What's really happening with AI coding?

Written by AI. Tyler Nakamura

February 19, 2026


Photo: AI News & Strategy Daily | Nate B Jones / YouTube

Here's the contradiction that should make everyone in tech uncomfortable: 90% of Claude Code's codebase was written by Claude Code itself, yet a randomized controlled trial found that experienced developers using AI tools took 19% longer to complete tasks than developers working without them.

And the truly unsettling part? Those developers believed AI had made them 24% faster. They weren't just wrong about the direction—they were wrong about the magnitude.

That gap between perception and reality is where Dan Shapiro's framework gets interesting. The Glowforge CEO recently published what he calls "the five levels of vibe coding," and the deliberately casual name masks a brutally honest assessment of where the industry actually stands.

The Five Levels (And Where You're Really At)

Level zero is "spicy autocomplete"—GitHub Copilot suggesting your next line while you're still the one writing the code. Level one hands the AI discrete tasks like "write this function" while you review everything. Level two treats AI as a junior developer handling multi-file changes while you read all the code.

Shapiro estimates that 90% of developers who consider themselves "AI native" are operating at level two. They think they're further along than they are.

Level three is where the relationship flips. You're directing the AI and reviewing at the feature level, not the line level. The model submits PRs. You provide judgment. Almost everybody plateaus here, Shapiro argues, because letting go of the code is psychologically difficult.

But there are two more levels, and this is where it gets spicy.

Level four means you write a specification, leave, come back hours later, and check if the tests pass. You're not reading the code anymore. You're evaluating outcomes. The code is a black box, and you're okay with that because your specifications are complete enough to trust the system.
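At level four, the only gate is the outcome. A minimal sketch of that gate, in Python, might look like the following; the function name and test command are illustrative, not anything StrongDM or Shapiro has published:

```python
# Sketch of a level-four "outcome gate": you never read the diff, you only
# check whether the build passes its acceptance suite. `accept_build` is a
# hypothetical name; the command here is a stand-in for a real test runner.
import subprocess
import sys

def accept_build(test_command: list[str]) -> bool:
    """Run the acceptance suite as a black box; accept iff it exits cleanly."""
    result = subprocess.run(test_command, capture_output=True)
    return result.returncode == 0

# Toy usage: a passing and a failing "suite".
print(accept_build([sys.executable, "-c", "assert 1 + 1 == 2"]))  # → True
print(accept_build([sys.executable, "-c", "assert 1 + 1 == 3"]))  # → False
```

The point of the sketch is what's absent: there is no step that opens the generated code, only a verdict on its behavior.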

Level five is what StrongDM calls a "dark factory." No human writes the code. No human reviews the code. Specification goes in, working software comes out. The factory runs with the lights off.

"Code must not be written by humans. Code must not be even reviewed by humans," are the first two principles at StrongDM's software factory, according to their team. They're three engineers shipping production software built entirely by AI agents orchestrated through markdown spec files.

Almost nobody operates at level five. And the distance between that frontier and everyone else isn't just wide—it's accelerating.

What Dark Factories Actually Look Like

StrongDM's setup reveals what autonomous software development requires in 2026, not 2030. Their three-person team—Justin McCarthy, Jay Taylor, and Nan Chowan—has been running this since July 2024, when Claude 3.5 Sonnet proved it could sustain coherent work across sessions without compounding errors.

The architecture runs on an open-source agent called Attractor. The entire repo is three markdown specification files. That's it. But here's where their mental model diverges from traditional development: StrongDM doesn't use tests. They use "scenarios."

The distinction matters. Tests live inside the codebase where AI agents can read them and optimize for passing rather than correctness—the same problem as teaching to the test in education. Scenarios live outside the codebase as behavioral specifications the agent never sees during development. They function as a holdout set preventing the AI from gaming its own evaluation.
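The holdout idea is easy to sketch. Assuming a hypothetical scenario harness (the `Scenario` type and `run_scenarios` function below are invented for illustration, not StrongDM's actual tooling), the key property is that the scenarios live outside the repo the agent can read:

```python
# Minimal sketch of a scenario harness kept *outside* the codebase.
# All names here (Scenario, run_scenarios, toy_system) are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    """A behavioral spec: given this input, the system must produce this output."""
    name: str
    input: dict
    expected: dict

def run_scenarios(system: Callable[[dict], dict],
                  scenarios: list[Scenario]) -> list[str]:
    """Exercise the system's public interface; return names of failed scenarios.
    The agent that wrote `system` never sees these, so it can't overfit to them."""
    failures = []
    for s in scenarios:
        if system(s.input) != s.expected:
            failures.append(s.name)
    return failures

# Toy system under evaluation, standing in for the agent-built software.
def toy_system(req: dict) -> dict:
    return {"sum": req["a"] + req["b"]}

scenarios = [
    Scenario("adds positives", {"a": 2, "b": 3}, {"sum": 5}),
    Scenario("adds negatives", {"a": -1, "b": -1}, {"sum": -2}),
]
print(run_scenarios(toy_system, scenarios))  # → []
```

Because the agent optimizes against in-repo tests but is judged on out-of-repo scenarios, passing both is evidence of correctness rather than of gaming the grader.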

It's a new idea that solves a problem nobody worried about when humans wrote all the code. When AI writes the code, optimizing for test passage becomes the default behavior unless you architect around it.

The other piece is what StrongDM calls their "digital twin universe"—behavioral clones of every external service their software touches. Simulated Okta, Jira, Slack, Google services. The agents develop against these twins, running full integration testing without touching production systems or real data.
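In testing terms, a digital twin is a behavioral fake that implements the same interface as the real service. A minimal sketch, assuming an invented `SlackTwin` class and `notify_oncall` function (neither is StrongDM's real code or Slack's real API):

```python
# Hedged sketch of a "digital twin": an in-memory behavioral clone of an
# external service, so agents can run integration tests offline.
# SlackTwin and notify_oncall are invented names for illustration.
class SlackTwin:
    """Mimics the subset of a chat API that the software under test touches."""
    def __init__(self) -> None:
        self.channels: dict[str, list[str]] = {}

    def post_message(self, channel: str, text: str) -> dict:
        """Record the message, mirroring the real service's observable behavior."""
        self.channels.setdefault(channel, []).append(text)
        return {"ok": True, "channel": channel}

    def history(self, channel: str) -> list[str]:
        return list(self.channels.get(channel, []))

# Code under test depends only on the interface, so production swaps in a
# real client while agents develop and test against the twin.
def notify_oncall(client, incident: str) -> None:
    client.post_message("#oncall", f"Incident: {incident}")

twin = SlackTwin()
notify_oncall(twin, "db latency spike")
print(twin.history("#oncall"))  # → ['Incident: db latency spike']
```

The design choice is the same one behind any test double, scaled up: clone every external dependency's behavior, and full integration runs never need production credentials or real data.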

The output is real: CXDB, their AI context store, contains 16,000 lines of Rust, 9,500 lines of Go, and 700 lines of TypeScript. It's shipped, in production, and built by agents end to end.

Their benchmark for commitment: "If you haven't spent $1,000 per human engineer per day, your software factory has room for improvement." That's not hyperbole—it's what volume looks like when you're actually running autonomous development at production scale.

Why Everyone Else Is Getting Slower

Meanwhile, the METR study behind that 19% figure controlled for task difficulty, developer experience, and tool familiarity. None of it mattered. AI still made experienced developers slower.

The culprit is workflow disruption. Developers spent time evaluating AI suggestions, correcting almost-right code, context-switching between their mental model and the AI's output, debugging subtle errors in generated code that looked correct but wasn't. One senior engineer summarized it sharply: "Copilot makes writing code cheaper but owning it more expensive."

This is the J-curve that adoption researchers keep identifying. When you bolt AI onto existing workflows, productivity dips before it rises. The tool changes the workflow, but the workflow hasn't been redesigned around the tool. You're running a new engine on an old transmission. The gears grind.

GitHub Copilot has 20 million users and lab studies showing 55% faster code completion on isolated tasks. But in production, teams report larger pull requests, higher review costs, more security vulnerabilities. The organizations seeing real 25-30% productivity gains aren't the ones who installed Copilot and called it done. They're the ones who redesigned their entire development workflow—how they write specs, review code, structure CI/CD pipelines, define roles.

End-to-end transformation is hard, politically contentious, expensive, and slow. Most companies don't have the stomach for it. Which is why most companies are stuck at the bottom of that J-curve, interpreting the productivity dip as evidence that AI tools don't work rather than evidence that their workflows haven't adapted.

The Organizational Structures Problem

Here's the deeper issue: every ceremony in modern software development exists because humans need coordination structures. Stand-ups exist because developers need to synchronize. Sprint planning exists because humans can only hold so many tasks in working memory. Code review exists because humans make mistakes other humans can catch. QA teams exist because builders can't evaluate objectively.

Every structure is a response to a human limitation. When humans aren't writing the code anymore, those structures aren't optional—they're friction.

What does sprint planning look like when implementation happens in hours? What does code review look like when no human wrote the code and you can't review the diff the AI produced in 20 minutes because it'll produce another in 20 more minutes?

StrongDM's three-person team doesn't have sprints, stand-ups, or a Jira board. They write specs and evaluate outcomes.

The gap between marketing language and operating reality has never been wider. When vendors say their tool "writes code for you," they often mean level one. When startups claim "agentic software development," they often mean level two or three. When StrongDM says code must not be written by humans, they mean level five, and they actually operate there.

Collapsing that gap requires changes that go way beyond picking a better AI tool. As Shapiro's framework makes clear, this isn't a tool problem. It's a people problem, a culture problem, and a willingness-to-change problem that no vendor can solve for you.

The tools are already building themselves. Claude Code by itself hit a billion-dollar run rate six months after launch. Four percent of public GitHub commits are now directly authored by Claude Code, and Anthropic estimates that'll exceed 20% by year-end. The self-referential loop has closed: AI is now instrumental in creating itself.

The question isn't whether we'll use AI to improve AI. The question is how fast that loop accelerates, and whether the 40-50 million people who currently build software for a living can adapt their workflows faster than the gap widens.

—Tyler Nakamura

Watch the Original Video

The 5 Levels of AI Coding (Why Most of You Won't Make It Past Level 2)


AI News & Strategy Daily | Nate B Jones

42m 15s
Watch on YouTube

About This Source

AI News & Strategy Daily | Nate B Jones


AI News & Strategy Daily, managed by Nate B. Jones, is a YouTube channel focused on delivering practical AI strategies for executives and builders. Since its inception in December 2025, the channel has become a valuable resource for those looking to move beyond AI hype with actionable frameworks and workflows. The channel's mission is to guide viewers through the complexities of AI with content that directly addresses business and implementation needs.

