Anthropic's Advisor Strategy: When Cheaper AI Models Work Better
Anthropic's new advisor strategy pairs expensive Opus with budget models, cutting costs by 12% while maintaining quality. But testing reveals surprises.
Written by AI | Bob Reynolds
April 11, 2026

Photo: Nate Herk | AI Automation / YouTube
Anthropic has released something called the advisor strategy, and the premise sounds almost too sensible to be real: use your expensive AI model only when you actually need it.
The concept is straightforward. Instead of running every task through Claude's most capable model, Opus 4.6—which costs $5 per million input tokens and $25 per million output tokens—you pair it with a cheaper executor like Sonnet ($3 input, $15 output) or Haiku ($1 input, $5 output). The executor handles the work. Opus only gets called when things get difficult.
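The arithmetic behind that pairing is simple enough to sketch. The following Python snippet uses only the per-million-token prices quoted above; the token counts in the example are illustrative assumptions, not measured figures, and the helper function is ours, not Anthropic's API.

```python
# Published per-million-token rates (USD), as quoted in this article.
PRICES = {
    "opus":   {"input": 5.0, "output": 25.0},
    "sonnet": {"input": 3.0, "output": 15.0},
    "haiku":  {"input": 1.0, "output": 5.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# An executor-only task vs. one that escalates to the advisor.
# Token counts below are assumptions for illustration.
solo = request_cost("haiku", 2_000, 500)
escalated = solo + request_cost("opus", 2_500, 300)  # advisor also reads context
```

The point the strategy bets on: most requests stay on the cheap `solo` path, so the occasional `escalated` request is amortized across many inexpensive ones.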
Nate Herk, who runs an AI automation channel, built a testing dashboard to see how this plays out in practice. His results illuminate both the promise and the complications of treating AI model selection as an optimization problem.
The Theory Versus the Practice
Anthropic's own benchmarks show Sonnet with Opus as an advisor scored 2.7 percentage points higher on SWE-bench—a standard coding evaluation—than Sonnet alone, while reducing cost per task by nearly 12%. Haiku with Opus as advisor more than doubled Haiku's solo performance on another benchmark, from 19.7% to 41.2%.
Those numbers suggest a clear win: better performance, lower cost. But Herk's testing uncovered something more interesting than simple validation.
He fed the same prompts to different model combinations—Haiku plus Opus, Sonnet plus Opus, and the models running solo. For simple queries like "What are your business hours?" Haiku handled it alone without calling the advisor. Cost: negligible. When he ran the identical query through Opus alone, it cost 21 times more.
That's the success case. Then came the edge behavior.
Herk tested a moderately complex question about product returns. Haiku with Opus advisor answered correctly—but never called the advisor. Sonnet with the same advisor setup did call Opus for help. Both answers were accurate. But one cost significantly more.
"It makes you wonder," Herk noted in his video, "is that because Haiku just didn't realize it was difficult and maybe misunderstood and Sonnet was able to understand that it needs the adviser?"
That question matters. The advisor strategy only works if the executor model can accurately assess when it's out of its depth.
The Cost-Quality Paradox
The pricing structure creates an interesting tension. Output tokens cost five times more than input tokens across all Claude models. This means a verbose answer from a cheap model can cost more than a concise answer from an expensive one.
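Concrete numbers make the tension obvious. Using the rates quoted earlier (Haiku at $1 input / $5 output per million tokens, Opus at $5 / $25), and token counts that are purely illustrative assumptions, a rambling Haiku answer can out-cost a terse Opus one:

```python
# $/M tokens: Haiku $1 in / $5 out; Opus $5 in / $25 out (article's figures).
# Token counts are illustrative assumptions.
verbose_haiku = (1_000 * 1 + 5_000 * 5) / 1_000_000   # long-winded answer
concise_opus  = (1_000 * 5 +   400 * 25) / 1_000_000  # terse answer
assert verbose_haiku > concise_opus  # the "cheap" model's answer costs more
```

Here the verbose Haiku reply costs $0.026 against $0.015 for the concise Opus one. The 5x output-token multiplier means verbosity, not model choice alone, can dominate the bill.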
Herk's dashboard showed this playing out repeatedly. For an enterprise sales query, Haiku with advisor gave a response that included a specific timeline: "they'll reach out within one or two business days." Sonnet with advisor gave the same information but more vaguely: "our team will follow up with you soon." The Haiku response was objectively more useful. It was also cheaper.
When Herk ran Opus solo on the same query, it matched Haiku's output quality. Sonnet solo provided more detail than either but didn't actually create the support ticket—it just recommended the user request one.
These aren't edge cases. They're the normal variation you get when dealing with probabilistic systems. Which makes the advice Herk offers particularly sound: "Don't just do it right away. Test hundreds of prompts through each of them to see what you consistently think is performing better."
That's the practical reality of AI optimization. The benchmarks tell you what's possible. Your specific use case determines what's actual.
The Claude Code Angle
The advisor strategy currently only exists in Anthropic's Messages API—the endpoint developers use to build applications. Claude Code, the AI coding assistant that runs in your terminal, doesn't have the same explicit advisor mode.
But it has something similar: Opus Plan mode. Type /model opus plan and Claude uses Opus 4.6 for planning, then switches to Sonnet 4.6 for execution. This matters because even though Claude Code users aren't paying per token, different models consume different amounts of session usage. Haiku stretches your session limit further than Sonnet, which stretches further than Opus.
Herk tested this by having both Opus Plan mode and pure Opus mode build the same visualization dashboard. The Plan mode version produced what he considered the better result—clearer diagrams, better information architecture—while consuming fewer session resources.
Again, though: single test, limited sample size. The performance delta could reverse on different tasks.
What This Actually Means
The advisor strategy represents something more significant than a pricing optimization. It's an acknowledgment that AI capability exists on a spectrum, and most tasks don't require maximum capability.
This has been true in computing forever. You don't need a supercomputer to check email. You don't need a database cluster to store a contact list. We've always matched resources to requirements.
What's different here is the matching happens dynamically and probabilistically. The executor model makes a judgment call about whether it needs help. Sometimes it calls for backup when it doesn't need to. Sometimes it doesn't call when it should. The strategy works on average, across many requests—not necessarily on any single request.
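That dynamic can be sketched in miniature. The routing logic below is a hypothetical illustration, not Anthropic's implementation: in the real advisor strategy the escalation judgment happens inside the model, not via an external confidence threshold, and both "models" here are stand-in functions.

```python
import random

def executor_answer(task: str) -> tuple[str, float]:
    """Stand-in for a cheap executor model. Returns an answer plus a
    self-assessed confidence; in reality that judgment is the model's."""
    confidence = random.random()
    return f"draft answer to {task!r}", confidence

def advisor_answer(task: str) -> str:
    """Stand-in for the expensive advisor model."""
    return f"carefully reasoned answer to {task!r}"

def route(task: str, threshold: float = 0.7) -> str:
    answer, confidence = executor_answer(task)
    if confidence < threshold:       # executor thinks it's out of its depth
        return advisor_answer(task)  # escalate: pay advisor prices this time
    return answer                    # stay cheap: the executor handled it
```

The fragility Herk observed lives in that one comparison: if the executor's self-assessment is miscalibrated, it escalates too often (burning money) or too rarely (risking wrong answers), and no external threshold can fully correct for it.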
For developers building production applications, this introduces a new variable to manage. You're not just tuning for accuracy anymore. You're tuning for the meta-question of when to escalate.
Anthropic's implementation includes a max_uses parameter that caps how many times the advisor can be called in a single request. That's a cost control mechanism, but it's also an admission that the executor's judgment might not align with your budget.
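The effect of such a cap is easy to model. This sketch is illustrative only: the parameter name max_uses comes from the article, but the loop, data shape, and helper are assumptions, not Anthropic's API.

```python
def answer_with_cap(tasks, max_uses=3):
    """Budget guard: stop escalating to the advisor once the cap is hit,
    regardless of whether the executor wants help. Each task is a
    (name, needs_help) pair; needs_help stands in for the executor's
    own escalation judgment."""
    uses = 0
    results = []
    for task, needs_help in tasks:
        if needs_help and uses < max_uses:
            uses += 1
            results.append(("advisor", task))   # expensive path
        else:
            results.append(("executor", task))  # cheap path, help or not
    return results
```

Note what the cap trades away: once exhausted, tasks the executor genuinely can't handle still run on the cheap path, which is exactly the budget-versus-judgment misalignment described above.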
The strategy is in beta, which means both the behavior and the pricing could shift. What we're seeing now is Anthropic's hypothesis about how this should work, tested against early adopter usage.
Herk's testing suggests the hypothesis is sound but not simple. The advisor strategy will save money in aggregate. It will sometimes produce better results than using cheap models alone. It will occasionally surprise you by calling Opus when you wouldn't have, or not calling it when you would have.
Whether that's acceptable depends entirely on what you're building and who pays when it's wrong.
Bob Reynolds is Senior Technology Correspondent at Buzzrag.
Watch the Original Video
Claude Just Told Us to Stop Using Their Best Model
Nate Herk | AI Automation
14m 50s

About This Source
Nate Herk | AI Automation
Nate Herk | AI Automation is a burgeoning YouTube channel with a subscriber base of 476,000, dedicated to enabling businesses to harness AI automation effectively. Having been active for just over six months, Nate Herk focuses on the transformative potential of artificial intelligence in enhancing business efficiency and competitiveness. The channel's mission is to guide enterprises, whether novices or veterans of AI, toward optimizing their operations through smart AI applications.
More Like This
What Cloning a $100K Website Teaches About Design
A developer used AI to replicate an award-winning site in 15 minutes. The process reveals more about learning web design than automation.
OpenAI and Anthropic Face Their Monetization Reckoning
As OpenAI and Anthropic prepare for IPOs, both companies are making hard choices about compute resources and pricing. The AI industry's profitability problem is here.
ChatGPT vs Claude: The Visual Explainer Battle Nobody Saw Coming
OpenAI and Anthropic released competing visual tools within 48 hours. We tested both—one's faster, one's smarter, and the differences matter.
Claude Code's Ultra Plan: When Speed Meets Quality
Anthropic quietly released Ultra Plan for Claude Code. It uses parallel AI agents to plan projects faster—and execution follows suit. Here's what's happening.