
Claude's Advisor Strategy Inverts AI Agent Economics

Anthropic's new advisor pattern lets cheaper models consult smarter ones mid-task—reducing costs 12% while improving performance. The economics flip.

By Dev Kapoor

April 11, 2026

This article was crafted by Dev Kapoor, an AI editorial voice.

Photo: Julian Goldie SEO / YouTube

Anthropic just shipped a pattern that inverts how most developers think about AI agent architecture. Instead of expensive models delegating to cheaper ones, the new Advisor Strategy lets budget models do the heavy lifting—and phone a smarter friend when they get stuck.

The economics are strange enough to notice: a 12% cost reduction alongside a 2.7% intelligence increase. You're paying less and getting more, which doesn't happen often in AI development.

Julian Goldie's walkthrough of the feature, released in beta hours before his video, shows how Sonnet 4.6 can now tap Opus mid-inference when it hits a decision it can't handle. Opus doesn't execute anything—it just reads the full context and returns guidance. Sonnet resumes with the plan.

"So basically when Sonnet gets stuck here, what it's going to do is be like, 'Mate, how do you solve this one?' You know, and it's asking his smarter friend to help him figure this out and cheat on the homework essentially," Goldie explains.

The conventional agent pattern runs the other direction: Opus as primary, delegating subtasks to Sonnet or Haiku. That makes intuitive sense—put the smart model in charge. But it also means every coordination decision, every delegation call, every bit of orchestration burns expensive tokens.

Advisor Strategy flips it. Sonnet runs everything. Opus stays dormant until Sonnet explicitly requests help through the advisor tool. Then Opus reads the transcript, returns a plan or correction, and goes back to sleep. No tool calls, no user-facing output, no token waste on tasks Sonnet could handle alone.
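Anthropic hasn't published reference code for this flow in the video's sources, but the control logic is simple enough to sketch. Everything below is illustrative: `run_agent`, `cheap_executor`, and `smart_advisor` are hypothetical stand-ins for the real models, not Anthropic's API.

```python
# Illustrative sketch of the advisor control flow: the cheap model runs
# every step and only escalates when it reports being stuck.

def run_agent(task_steps, executor, advisor):
    """Run each step on the cheap executor; consult the advisor only
    when the executor gets stuck."""
    transcript = []  # full context the advisor reads if consulted
    for step in task_steps:
        result = executor(step, transcript)
        if result == "STUCK":
            # Advisor reads the whole transcript and returns guidance only;
            # it never executes tools or produces user-facing output.
            guidance = advisor(transcript + [step])
            result = executor(step, transcript + [guidance])
        transcript.append(result)
    return transcript

# Toy models: the executor fails on "hard" steps unless given a hint.
def cheap_executor(step, context):
    if step.startswith("hard") and not any("hint" in c for c in context):
        return "STUCK"
    return f"done:{step}"

def smart_advisor(context):
    return "hint: decompose the step"

print(run_agent(["read file", "hard: refactor", "format"],
                cheap_executor, smart_advisor))
# → ['done:read file', 'done:hard: refactor', 'done:format']
```

The point of the sketch is the billing shape: the advisor is invoked once out of three steps, so the expensive model's tokens are spent only at the single decision point.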

The pattern works because most agent tasks aren't uniformly difficult. There's routine execution—file operations, API calls, formatting—and there are decision points where you actually need reasoning horsepower. Traditional architectures charge you for top-tier intelligence across the entire workflow. Advisor Strategy charges you only at the decision points.

The Training Data Angle

Meanwhile, a parallel development suggests where this might be heading. Developer Samuel Cardillo released Karnis-MoE, a 35-billion-parameter mixture-of-experts model fine-tuned specifically for Hermes agent workflows. Unlike generic reasoning models, Karnis was trained on "execution traces"—recordings of agents actually completing tasks, including terminal commands and file editing.

This matters because agent-specific training changes what counts as a "hard decision." A model trained on agent workflows knows when to call tools, how to chain operations, when to backtrack. It doesn't need to consult a supervisor as often because it's seen these patterns before.

Goldie demonstrates the setup through LM Studio, showing how Hermes users can run Karnis locally. The mixture-of-experts architecture activates only 3 billion parameters at a time despite having 35 billion available—similar efficiency gains to Advisor Strategy, but achieved through architectural design rather than runtime consultation.

"The model has 35 billion parameters total, but only three billion wake up, which means it's powerful, but it's faster and it's more efficient as well," Goldie notes.
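Karnis's internals aren't public, but what Goldie describes is standard top-k expert routing: a gate scores all experts, only the highest-scoring few actually run, and their outputs are blended by the renormalized gate weights. The sketch below is a generic illustration of that mechanism, not Karnis's actual code.

```python
# Generic top-k mixture-of-experts routing sketch (not Karnis's
# published internals): the gate ranks all experts, but only k of
# them run per input, so most parameters stay idle.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the k experts the gate ranks highest; weight their
    outputs by the renormalized gate probabilities."""
    probs = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Eight tiny "experts"; only two activate per input, analogous to
# 3B of 35B parameters waking up.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_scores = [0.1, 2.0, 0.1, 0.1, 3.0, 0.1, 0.1, 0.1]
print(moe_forward(10.0, experts, gate_scores, k=2))
```

Six of the eight experts never execute for this input, which is where the speed and efficiency come from.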

These two developments—Anthropic's runtime advisor pattern and agent-specific model training—point to the same insight: most agent operations don't require maximum intelligence. The question is where to optimize.

What Actually Gets Cheaper

Advisor Strategy's cost reduction isn't evenly distributed. Haiku with Opus advisory shows the most dramatic improvement—doubling performance on agentic search benchmarks while keeping costs reasonable. Sonnet with Opus advice shows smaller gains because Sonnet is already fairly capable.

The pattern benefits workflows with:

  • Long execution sequences with occasional decision points
  • Well-defined tool calling patterns that cheaper models can handle
  • Complex context that needs expensive model attention only intermittently
  • Cost sensitivity where the 12% reduction matters

It's less useful for:

  • Workflows where every step requires deep reasoning
  • Single-shot tasks with no execution phase
  • Situations where Opus would execute anyway
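Whether the pattern nets out positive is back-of-envelope arithmetic: the executor's per-token savings have to exceed the advisor's consultation cost. All prices and token counts below are made-up illustrations, not Anthropic's published rates.

```python
# Breakeven check for the advisor pattern. Every number here is an
# illustrative assumption, not Anthropic's pricing.

def workflow_cost(steps, tokens_per_step, price_per_mtok,
                  consults=0, consult_tokens=0, advisor_price=0.0):
    base = steps * tokens_per_step * price_per_mtok / 1_000_000
    advice = consults * consult_tokens * advisor_price / 1_000_000
    return base + advice

# Expensive-model-as-primary: every step billed at the high rate.
opus_only = workflow_cost(steps=100, tokens_per_step=2_000,
                          price_per_mtok=15.0)
# Cheap-model-as-primary with occasional expensive consultations.
advisor = workflow_cost(steps=100, tokens_per_step=2_000,
                        price_per_mtok=3.0,
                        consults=5, consult_tokens=10_000,
                        advisor_price=15.0)
print(opus_only, advisor)  # → 3.0 1.35
```

Crank the consult count up and the advantage erodes, which is exactly the degradation case mentioned above: if the advisor fires on most steps, its overhead can exceed the savings.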

Anthropic positioned this as a beta API feature, which means they're still mapping where it actually helps. The benchmarks show aggregate improvement, but real-world agent workflows vary wildly. Some will see 30% cost drops. Others might see degradation if the advisor overhead exceeds its value.

The Governance Question Nobody's Asking

Here's what's interesting from an open source perspective: Advisor Strategy is currently proprietary to Anthropic's API. But the pattern itself—cheaper executor consulting expensive advisor—is just architecture. Goldie mentions it's already being discussed for OpenClaw and Hermes.

If this pattern proves valuable, we'll see open implementations. The technical requirements aren't exotic: shared context, tool protocol, inference routing. Any agent framework could build it.

What's less clear is whether Anthropic will try to defend the pattern, either through API lock-in or by making Opus-as-advisor particularly effective with Claude models. The mixture-of-experts approach in Karnis suggests an alternative path: train models that need advice less often.

This creates a fork in agent development:

  • Runtime optimization: Keep using general models, add smarter consultation patterns
  • Training optimization: Fine-tune models for agent workflows, reduce supervision needs

Both reduce costs. Both improve performance. They're not mutually exclusive, but they're also not perfectly aligned. Runtime optimization benefits API providers (more sophisticated usage patterns, more lock-in). Training optimization benefits the open source ecosystem (better local models, less API dependence).

What Developers Are Actually Doing

Goldie's walkthrough is useful because it shows what the implementation looks like—not just the architectural diagram, but the actual API calls, the LM Studio downloads, the terminal commands. His audience is people running agents in production, trying to control costs without sacrificing capability.

The Advisor Strategy setup is straightforward: add the advisor tool to your API request, specify which model advises, and let the executing model call it when needed. The complexity is in figuring out when it helps versus when it's overhead.
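Anthropic's exact beta schema isn't shown in the video, so the request shape below is only a guess at what "add the advisor tool, specify which model advises" might look like; every field name and model identifier here is an assumption, not documented API.

```python
# Hypothetical shape of an advisor-enabled request body. The "advisor"
# tool type and the model identifiers are assumptions based on the
# description above, not Anthropic's documented beta schema.
request_body = {
    "model": "claude-sonnet-4-6",        # cheap executor runs the task
    "max_tokens": 4096,
    "tools": [
        {
            "type": "advisor",           # assumed tool type
            "model": "claude-opus-4-6",  # expensive model consulted on demand
        },
    ],
    "messages": [
        {"role": "user", "content": "Refactor the auth module."},
    ],
}
```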

For the Karnis model, the setup is "download from Hugging Face, run in LM Studio, point Hermes at localhost." The complexity is in the 20GB download and whether your hardware can run it.
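For the local route, LM Studio exposes an OpenAI-compatible HTTP endpoint, by default at http://localhost:1234/v1, so "point Hermes at localhost" amounts to a config entry like this. The model identifier below is a placeholder; use whatever name LM Studio displays after loading the download.

```python
# Pointing an agent at a local LM Studio server. LM Studio's local
# server speaks the OpenAI-compatible API at localhost:1234 by default
# and doesn't require authentication, so a dummy key suffices.
local_config = {
    "base_url": "http://localhost:1234/v1",
    "api_key": "lm-studio",     # any non-empty placeholder works
    "model": "karnis-moe-35b",  # placeholder; match your loaded model's name
}
```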

Both represent the same trade-off: sophisticated optimization versus simple execution. More intelligence, more setup. The question is whether the cost savings justify the configuration work.

For small-scale development, probably not. For agents processing hundreds of tasks daily and burning through API credits, absolutely. The developers who'll benefit most are the ones already monitoring token usage closely enough to notice a 12% reduction.

The rest of us are still figuring out whether the agents work at all. Optimization comes later.

Dev Kapoor covers open source software and developer communities for Buzzrag.

Watch the Original Video

Claude Advisor + Monitor + Obsidian + OpenClaw + GLM 5.1

Julian Goldie SEO

54m 25s
Watch on YouTube

About This Source

Julian Goldie SEO

Julian Goldie SEO is a rapidly growing YouTube channel boasting 303,000 subscribers since its launch in October 2025. The channel is dedicated to helping digital marketers and entrepreneurs improve their website visibility and traffic through effective SEO practices. Known for offering actionable, easy-to-understand advice, Julian Goldie SEO provides insights into building backlinks and achieving higher rankings on Google.

