Anthropic's Advisor Strategy: When Cheaper AI Models Work Better
Anthropic's new advisor strategy pairs expensive Opus with budget models, cutting costs by 12% while maintaining quality. But testing reveals surprises.
Written by AI | Bob Reynolds
April 11, 2026

Photo: Nate Herk | AI Automation / YouTube
Anthropic has released something called the advisor strategy, and the premise sounds almost too sensible to be real: use your expensive AI model only when you actually need it.
The concept is straightforward. Instead of running every task through Claude's most capable model, Opus 4.6—which costs $5 per million input tokens and $25 per million output tokens—you pair it with a cheaper executor like Sonnet ($3 input, $15 output) or Haiku ($1 input, $5 output). The executor handles the work. Opus only gets called when things get difficult.
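The arithmetic behind that pairing is simple enough to sketch. The following Python snippet uses only the per-million-token prices quoted above; the token counts in the example are illustrative assumptions, not measured figures, and the helper function is ours, not Anthropic's API.

```python
# Published per-million-token rates (USD), as quoted in this article.
PRICES = {
    "opus":   {"input": 5.0, "output": 25.0},
    "sonnet": {"input": 3.0, "output": 15.0},
    "haiku":  {"input": 1.0, "output": 5.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# An executor-only task vs. one that escalates to the advisor.
# Token counts below are assumptions for illustration.
solo = request_cost("haiku", 2_000, 500)
escalated = solo + request_cost("opus", 2_500, 300)  # advisor also reads context
```

The point the strategy bets on: most requests stay on the cheap `solo` path, so the occasional `escalated` request is amortized across many inexpensive ones.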
Nate Herk, who runs an AI automation channel, built a testing dashboard to see how this plays out in practice. His results illuminate both the promise and the complications of treating AI model selection as an optimization problem.
The Theory Versus the Practice
Anthropic's own benchmarks show Sonnet with Opus as an advisor scored 2.7 percentage points higher on SWE-bench—a standard coding evaluation—than Sonnet alone, while reducing cost per task by nearly 12%. Haiku with Opus as advisor more than doubled Haiku's solo performance on another benchmark, from 19.7% to 41.2%.
Those numbers suggest a clear win: better performance, lower cost. But Herk's testing uncovered something more interesting than simple validation.
He fed the same prompts to different model combinations—Haiku plus Opus, Sonnet plus Opus, and the models running solo. For simple queries like "What are your business hours?" Haiku handled it alone without calling the advisor. Cost: negligible. When he ran the identical query through Opus alone, it cost 21 times more.
That's the success case. Then came the edge behavior.
Herk tested a moderately complex question about product returns. Haiku with Opus advisor answered correctly—but never called the advisor. Sonnet with the same advisor setup did call Opus for help. Both answers were accurate. But one cost significantly more.
"It makes you wonder," Herk noted in his video, "is that because Haiku just didn't realize it was difficult and maybe misunderstood and Sonnet was able to understand that it needs the adviser?"
That question matters. The advisor strategy only works if the executor model can accurately assess when it's out of its depth.
The Cost-Quality Paradox
The pricing structure creates an interesting tension. Output tokens cost five times more than input tokens across all Claude models. This means a verbose answer from a cheap model can cost more than a concise answer from an expensive one.
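Concrete numbers make the tension obvious. Using the rates quoted earlier (Haiku at $1 input / $5 output per million tokens, Opus at $5 / $25), and token counts that are purely illustrative assumptions, a rambling Haiku answer can out-cost a terse Opus one:

```python
# $/M tokens: Haiku $1 in / $5 out; Opus $5 in / $25 out (article's figures).
# Token counts are illustrative assumptions.
verbose_haiku = (1_000 * 1 + 5_000 * 5) / 1_000_000   # long-winded answer
concise_opus  = (1_000 * 5 +   400 * 25) / 1_000_000  # terse answer
assert verbose_haiku > concise_opus  # the "cheap" model's answer costs more
```

Here the verbose Haiku reply costs $0.026 against $0.015 for the concise Opus one. The 5x output-token multiplier means verbosity, not model choice alone, can dominate the bill.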
Herk's dashboard showed this playing out repeatedly. For an enterprise sales query, Haiku with advisor gave a response that included a specific timeline: "they'll reach out within one or two business days." Sonnet with advisor gave the same information but more vaguely: "our team will follow up with you soon." The Haiku response was objectively more useful. It was also cheaper.
When Herk ran Opus solo on the same query, it matched Haiku's output quality. Sonnet solo provided more detail than either but didn't actually create the support ticket—it just recommended the user request one.
These aren't edge cases. They're the normal variation you get when dealing with probabilistic systems. Which makes the advice Herk offers particularly sound: "Don't just do it right away. Test hundreds of prompts through each of them to see what you consistently think is performing better."
That's the practical reality of AI optimization. The benchmarks tell you what's possible. Your specific use case determines what's actual.
The Claude Code Angle
The advisor strategy currently only exists in Anthropic's Messages API—the endpoint developers use to build applications. Claude Code, the AI coding assistant that runs in your terminal, doesn't have the same explicit advisor mode.
But it has something similar: Opus Plan mode. Type /model opus plan and Claude uses Opus 4.6 for planning, then switches to Sonnet 4.6 for execution. This matters because even though Claude Code users aren't paying per token, different models consume different amounts of session usage. Haiku stretches your session limit further than Sonnet, which stretches further than Opus.
Herk tested this by having both Opus Plan mode and pure Opus mode build the same visualization dashboard. The Plan mode version produced what he considered the better result—clearer diagrams, better information architecture—while consuming fewer session resources.
Again, though: single test, limited sample size. The performance delta could reverse on different tasks.
What This Actually Means
The advisor strategy represents something more significant than a pricing optimization. It's an acknowledgment that AI capability exists on a spectrum, and most tasks don't require maximum capability.
This has been true in computing forever. You don't need a supercomputer to check email. You don't need a database cluster to store a contact list. We've always matched resources to requirements.
What's different here is the matching happens dynamically and probabilistically. The executor model makes a judgment call about whether it needs help. Sometimes it calls for backup when it doesn't need to. Sometimes it doesn't call when it should. The strategy works on average, across many requests—not necessarily on any single request.
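That dynamic can be sketched in miniature. The routing logic below is a hypothetical illustration, not Anthropic's implementation: in the real advisor strategy the escalation judgment happens inside the model, not via an external confidence threshold, and both "models" here are stand-in functions.

```python
import random

def executor_answer(task: str) -> tuple[str, float]:
    """Stand-in for a cheap executor model. Returns an answer plus a
    self-assessed confidence; in reality that judgment is the model's."""
    confidence = random.random()
    return f"draft answer to {task!r}", confidence

def advisor_answer(task: str) -> str:
    """Stand-in for the expensive advisor model."""
    return f"carefully reasoned answer to {task!r}"

def route(task: str, threshold: float = 0.7) -> str:
    answer, confidence = executor_answer(task)
    if confidence < threshold:       # executor thinks it's out of its depth
        return advisor_answer(task)  # escalate: pay advisor prices this time
    return answer                    # stay cheap: the executor handled it
```

The fragility Herk observed lives in that one comparison: if the executor's self-assessment is miscalibrated, it escalates too often (burning money) or too rarely (risking wrong answers), and no external threshold can fully correct for it.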
For developers building production applications, this introduces a new variable to manage. You're not just tuning for accuracy anymore. You're tuning for the meta-question of when to escalate.
Anthropic's implementation includes a max_uses parameter that caps how many times the advisor can be called in a single request. That's a cost control mechanism, but it's also an admission that the executor's judgment might not align with your budget.
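The effect of such a cap is easy to model. This sketch is illustrative only: the parameter name max_uses comes from the article, but the loop, data shape, and helper are assumptions, not Anthropic's API.

```python
def answer_with_cap(tasks, max_uses=3):
    """Budget guard: stop escalating to the advisor once the cap is hit,
    regardless of whether the executor wants help. Each task is a
    (name, needs_help) pair; needs_help stands in for the executor's
    own escalation judgment."""
    uses = 0
    results = []
    for task, needs_help in tasks:
        if needs_help and uses < max_uses:
            uses += 1
            results.append(("advisor", task))   # expensive path
        else:
            results.append(("executor", task))  # cheap path, help or not
    return results
```

Note what the cap trades away: once exhausted, tasks the executor genuinely can't handle still run on the cheap path, which is exactly the budget-versus-judgment misalignment described above.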
The strategy is in beta, which means both the behavior and the pricing could shift. What we're seeing now is Anthropic's hypothesis about how this should work, tested against early adopter usage.
Herk's testing suggests the hypothesis is sound but not simple. The advisor strategy will save money in aggregate. It will sometimes produce better results than using cheap models alone. It will occasionally surprise you by calling Opus when you wouldn't have, or not calling it when you would have.
Whether that's acceptable depends entirely on what you're building and who pays when it's wrong.
Bob Reynolds is Senior Technology Correspondent at Buzzrag.
Watch the Original Video
Claude Just Told Us to Stop Using Their Best Model
Nate Herk | AI Automation
14m 50s

About This Source
Nate Herk | AI Automation
Nate Herk | AI Automation is a burgeoning YouTube channel with a subscriber base of 476,000, dedicated to enabling businesses to harness AI automation effectively. Having been active for just over six months, Nate Herk focuses on the transformative potential of artificial intelligence in enhancing business efficiency and competitiveness. The channel's mission is to guide enterprises, whether novices or veterans of AI, toward optimizing their operations through smart AI applications.
More Like This
What Cloning a $100K Website Teaches About Design
A developer used AI to replicate an award-winning site in 15 minutes. The process reveals more about learning web design than automation.
OpenAI and Anthropic Face Their Monetization Reckoning
As OpenAI and Anthropic prepare for IPOs, both companies are making hard choices about compute resources and pricing. The AI industry's profitability problem is here.
ChatGPT vs Claude: The Visual Explainer Battle Nobody Saw Coming
OpenAI and Anthropic released competing visual tools within 48 hours. We tested both—one's faster, one's smarter, and the differences matter.
Claude Code's Ultra Plan: When Speed Meets Quality
Anthropic quietly released Ultra Plan for Claude Code. It uses parallel AI agents to plan projects faster—and execution follows suit. Here's what's happening.