Claude Code's Ultra Plan: When Speed Meets Quality
Anthropic quietly released Ultra Plan for Claude Code. It uses parallel AI agents to plan projects faster, and the gains carry through to execution. Here's what's happening.
Written by AI. Bob Reynolds
April 7, 2026

Photo: Nate Herk | AI Automation / YouTube
Anthropic released a feature for Claude Code that most users haven't noticed. It's called Ultra Plan, and it changes how the AI assistant approaches project planning by offloading the work to cloud infrastructure running multiple agents in parallel.
The mechanics are straightforward: Instead of planning locally in your terminal, you type /ultraplan followed by your prompt. The system sends your request to Anthropic's cloud, where—according to analysis of the source code and documentation—three exploration agents and one critique agent, all running Opus 4.6, work simultaneously to build a structured plan. You review it in a web interface, leave comments if needed, then send it back to your terminal for execution.
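As an illustration, a session for the kind of dashboard test described below might start with a single slash command in the Claude Code terminal (the prompt text here is invented for the example; only the /ultraplan command itself comes from the source):

```
/ultraplan Build a dashboard with revenue tracking, customer metrics,
support data, and weekly/monthly/quarterly time windows
```

From there, per the article's description, the planning happens in Anthropic's cloud rather than in your terminal, and the finished plan comes back through a web review interface.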
What's less straightforward is why this matters.
The Speed Question
Nate Herk, who runs an AI automation channel, ran side-by-side tests using identical prompts. He asked both systems to build a dashboard with specific requirements: revenue tracking, customer metrics, support data, multiple time windows.
The local version took four minutes to produce a plan, then another 40-plus minutes to execute it. Ultra Plan finished planning in under a minute. Total execution time: 10 to 15 minutes.
Herk sat down while waiting for the local version to finish. "I normally stand in these videos," he said, "but I had to sit because this took honestly like 45 minutes."
Speed alone doesn't prove much—any system can move faster by cutting corners. But the outputs tell a different story. Both dashboards worked. Both included the requested features. The aesthetic differences were minimal. Yet the Ultra Plan version used 82,000 tokens during execution compared to 131,000 for the local version.
That's backwards from what you'd expect. The system using more computational resources upfront consumed fewer tokens overall.
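The arithmetic behind that claim is simple enough to check. Using the two execution-phase figures Herk reported:

```python
# Execution-phase token counts from Herk's side-by-side test,
# as reported in the article.
local_tokens = 131_000      # local planning mode
ultraplan_tokens = 82_000   # Ultra Plan (cloud planning)

saved = local_tokens - ultraplan_tokens
pct = saved / local_tokens * 100
print(f"Ultra Plan saved {saved:,} tokens ({pct:.0f}% less during execution)")
# Ultra Plan saved 49,000 tokens (37% less during execution)
```

Note that this covers execution only; Ultra Plan's own cloud-side planning cost is not visible in these numbers, a gap the article returns to below.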
The Architecture Behind It
The local planning mode runs a single agent thinking linearly. It asks questions, waits for answers, adjusts course, asks more questions. Your terminal is blocked during this process.
Ultra Plan splits the work. Three agents explore different approaches simultaneously while a fourth critiques the results. The system has access to your entire codebase because it requires Git synchronization—a technical requirement that also explains why it only works via command line, not through the desktop app or VS Code extension.
This multi-agent approach surfaces an old engineering principle: better planning makes execution easier. If the plan is clear and well-structured, the agents implementing it encounter fewer decision points, fewer ambiguities, fewer chances to wander down dead ends.
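The fan-out-then-critique pattern the article describes can be sketched in a few lines. This is a conceptual illustration only: Anthropic's actual implementation is not public, and the agent functions here are stubs standing in for model calls.

```python
import asyncio

async def explore(agent_id: int, task: str) -> str:
    """Stand-in for one exploration agent drafting an approach."""
    await asyncio.sleep(0)  # placeholder for an async model call
    return f"draft {agent_id}: one approach to {task!r}"

async def critique(drafts: list[str]) -> str:
    """Stand-in for the critique agent synthesizing the drafts."""
    await asyncio.sleep(0)
    return f"final plan synthesized from {len(drafts)} drafts"

async def ultra_plan(task: str, n_explorers: int = 3) -> str:
    # Three explorers run concurrently; the critic runs on their output.
    drafts = await asyncio.gather(
        *(explore(i, task) for i in range(n_explorers))
    )
    return await critique(list(drafts))

result = asyncio.run(ultra_plan("build a revenue dashboard"))
print(result)  # final plan synthesized from 3 drafts
```

Whether the real exploration agents diverge by design or duplicate effort is exactly the open question raised later in the piece.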
Herk invoked Lincoln: "Give me six hours to chop down a tree and I will spend the first four sharpening the axe."
The metaphor holds, but there's a practical question underneath: How much does axe-sharpening cost?
The Token Economics
Herk couldn't get exact figures for Ultra Plan's token usage during the planning phase—the cost reporting tools don't work for that portion. He estimates it's substantial, possibly consuming 1% of a Claude Max subscription's monthly allowance for a single planning session.
That's not nothing. But if you're measuring efficiency by total tokens from start to finish, the math changes. The expensive planning phase led to cheaper execution. The local version used nearly 50,000 more tokens during execution, even though all of its planning had happened locally.
What's unclear is whether this efficiency holds across all project types. Herk's dashboard test represents a common use case: well-defined requirements, standard components, clear success criteria. Does Ultra Plan maintain its advantage for more exploratory work? For projects where the requirements themselves need discovery?
The research preview label suggests Anthropic is still figuring that out.
What Doesn't Work Yet
The system occasionally fails to invoke custom skills even when they're clearly relevant. Herk ran into this when Ultra Plan ignored his visualization tool and generated markdown diagrams instead. He had to explicitly point the system to the correct skill through the comment interface.
That comment-and-iterate loop is one of Ultra Plan's distinguishing features. You can leave reactions (emoji included) and targeted feedback on specific sections. The system regenerates the plan incorporating your notes. But the fact that it sometimes misses obvious tools suggests the parallel agents don't have perfect awareness of the full development environment.
Authentication errors cropped up randomly during testing. Herk wasn't sure if these stemmed from his configuration or Anthropic's infrastructure. They resolved when he retried, but intermittent failures are annoying at best, project-killers at worst.
The 30-minute cloud compute cap hasn't been an issue in practice—Herk never hit it—but it exists. For truly complex projects, that ceiling might matter.
The Transparency Gap
Here's what we don't know: exactly how the three exploration agents divide their work. Are they prompted to investigate different approaches? Do they redundantly explore the same territory for validation? Can they communicate with each other, or do they work in isolation until the critique agent synthesizes results?
These aren't academic questions. If the agents are duplicating effort, that explains some of the token cost. If they're genuinely pursuing divergent strategies, that's a different—and more interesting—architecture.
We also don't know how Anthropic balances cloud resources across users. When multiple people trigger Ultra Plan simultaneously, does performance degrade? The comparison tests Herk ran were clean-room scenarios, not stressed systems.
And there's no visibility into token usage mid-process. You find out what the planning cost after it's done. For developers managing tight budgets or exploring proof-of-concept work, that opacity creates risk.
What This Means for AI-Assisted Development
Ultra Plan represents a bet that planning and execution should be separated, that throwing more compute at the planning phase produces downstream efficiencies. That's not how most developers work today—we plan as we go, adjusting when we hit obstacles.
But it's how construction works. And manufacturing. And any other domain where the cost of mistakes during execution outweighs the cost of thorough upfront design.
The question is whether software development has crossed that threshold. For decades, the answer was no—software is plastic enough that iterative approaches win. You discover requirements by building, not by planning.
AI might be changing that calculus. If the AI is doing the building, and if better plans genuinely lead to faster, cheaper execution, then spending tokens on multi-agent parallel planning starts to make economic sense.
Or it doesn't. The feature is in research preview. The real test comes when thousands of developers use it on real projects under deadline pressure. Some will find it indispensable. Others will find the tradeoffs don't work for their workflows. Both groups will be right.
The interesting part is that Anthropic built infrastructure assuming the planning-heavy approach matters enough to justify the complexity. That architectural choice reveals something about where they think AI development is heading.
—Bob Reynolds, Senior Technology Correspondent
Watch the Original Video
Planning In Claude Code Just Got a Huge Upgrade
Nate Herk | AI Automation
15m 49s
About This Source
Nate Herk | AI Automation
Nate Herk | AI Automation is a growing YouTube channel with 476,000 subscribers, focused on helping businesses put AI automation to work. Active for just over six months, the channel covers how artificial intelligence can improve business efficiency and competitiveness, aiming to guide companies, whether new to AI or experienced with it, toward practical applications.
More Like This
Vibe Coding Just Grew Up—And Nobody Knows What It Is Yet
Perplexity and Replit's latest releases show vibe coding evolving into multi-agent systems. But the real story is what we still don't understand.
Claude Code's Accidental Leak Reveals What Power Users Know
Anthropic accidentally published Claude Code's source—half a million lines revealing hidden commands, memory systems, and why most users miss 90% of its value.
Claude Code's Secret Memory Feature Solves AI Amnesia
Anthropic quietly added 'autodream' to Claude Code—a feature that consolidates AI memories like human sleep. Here's what it means for developers.
The Engineer Who Stopped Writing Code
Boris Cherny created Claude Code at Anthropic. Now he doesn't write any code himself. A year into AI-assisted development, what have we learned?