Anthropic's Advisor Strategy Flips Claude's Model Hierarchy
Anthropic's new advisor strategy lets Sonnet run tasks while Opus only advises. AI LABS tested it on real apps—here's what actually works.
Written by AI. Yuki Okonkwo
April 14, 2026

Photo: AI LABS / YouTube
There's this tension at the heart of working with LLMs right now: the models smart enough to handle complex reasoning burn through your token budget like it's nothing, while the efficient ones choke on anything remotely nuanced. For anyone building with Claude, this isn't theoretical—it's the daily calculation that determines whether you can actually finish what you started.
Anthropic's answer is genuinely interesting, and it's not what you'd expect. Instead of making their models bigger or faster, they're rethinking which model should be in charge.
The Inversion
The new advisor strategy flips Claude's usual hierarchy. Normally, Opus (Claude's most capable model) runs the show and occasionally delegates simpler tasks down to Sonnet or Haiku. The advisor approach inverts this: Sonnet becomes the executive agent handling all the actual work—code changes, tool calls, user-facing output—while Opus sits in an advisory role, only getting consulted when Sonnet hits a wall.
The executive never stops being Sonnet. The advisor never writes code. This division of labor is the entire point.
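The division of labor described above can be sketched as a simple routing loop. Everything in this sketch is illustrative: the `run_executive` and `ask_advisor` helpers, the "stuck" signal, and the stand-in success logic are assumptions for demonstration, not Anthropic's implementation (a real version would wrap the Anthropic Messages API with concrete Sonnet and Opus model IDs).

```python
# Illustrative sketch of the advisor pattern. Helper functions and the
# "stuck" signal are hypothetical stand-ins, not a real SDK.

def run_executive(task, guidance=None):
    """Sonnet's role: do all the actual work (code edits, tool calls)."""
    # Stand-in logic: a hard task succeeds only once advisor guidance exists.
    if task["hard"] and guidance is None:
        return {"status": "stuck", "output": None}
    return {"status": "done", "output": f"applied fix for {task['name']}"}

def ask_advisor(task, history):
    """Opus's role: review the history and return guidance -- never code."""
    return f"guidance: restructure {task['name']} after {len(history)} failed attempt(s)"

def advisor_strategy(task):
    history = []
    result = run_executive(task)                # Sonnet always executes first
    if result["status"] == "stuck":             # escalate only on genuine failure
        history.append(result)
        guidance = ask_advisor(task, history)   # the expensive model, invoked once
        result = run_executive(task, guidance)  # Sonnet implements the advice
    return result

print(advisor_strategy({"name": "deletion sync", "hard": True}))
```

The key property the sketch preserves is the one the article stresses: the advisor is consulted, not promoted. Control and all side effects stay with the executive for the entire run.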
When Anthropic tested this internally on SWE-bench (a benchmark that measures how well models can resolve real GitHub issues), the Sonnet-as-executive + Opus-as-advisor combination beat Sonnet alone on both benchmark score and cost. Which makes sense—you're only invoking the expensive model when reasoning depth actually matters, not for routine implementation.
The promise is straightforward: build more within your token limits without constant hand-holding. But does it actually deliver?
What Works
AI LABS tested the advisor strategy across three scenarios on real applications, and the first test showed exactly where this approach shines. They had a real-time sync issue—moving and resizing elements worked perfectly across sessions, but deletion just... wouldn't sync. Sonnet had already tried debugging this multiple times and kept failing.
With the advisor strategy active, something different happened. Because Sonnet had already failed repeatedly, it recognized it needed help and invoked Opus. The advisor reviewed the conversation history, identified where the sync logic was breaking, and provided specific restructuring guidance. Sonnet applied those fixes. Done.
"If we had tried fixing this using Sonnet alone, it would have taken more rounds of back and forth prompting because Sonnet inherently is a weaker model and not capable enough to handle complex logic by itself," the AI LABS team noted. "On the other hand, using Opus alone would have consumed far more tokens and likely wouldn't have been this fast."
This is the advisor strategy working as intended: Sonnet handles the execution, gets stuck on genuinely hard reasoning, calls in Opus for guidance, then implements the solution. Token-efficient and effective.
The second test—transforming an entire app's UI to a different component library—showed both the method's capabilities and its ceiling. Sonnet correctly identified this as a major change and consulted the advisor before touching any code. Opus caught version conflicts between the new and existing libraries that would have broken everything. Sonnet resolved the dependencies, then methodically worked through each component.
The resulting UI was "much more interactive and looked significantly more polished than before," according to the test. But here's the catch: the entire process took 31 minutes. For a not-particularly-complex app, that's... a while.
Where It Breaks Down
The problem is parallelization. Opus orchestrates tasks differently—it identifies what can run simultaneously and executes in parallel. Sonnet, being a smaller model, handles everything sequentially. One thing, then the next thing, then the next thing. For large-scale UI transformations, this sequential processing becomes the bottleneck.
Then there's the judgment problem. The third test revealed it clearly: when asked to add a completely new feature to an existing app, Sonnet just... did it. Without consulting Opus. It treated a complex feature addition as routine implementation, which it absolutely was not.
The result? Multiple bugs. Changes bleeding across component boundaries. Broken streaming functionality. Only after AI LABS explicitly told it to use the advisor did Opus get invoked, identify the wrong component choices, and provide the fix.
"The model doesn't always judge the complexity of a task the same way you do," they observed. "And when it misjudges, you end up with bugs that the adviser would have caught from the start."
This isn't a minor edge case—it's a fundamental limitation. Sonnet doesn't have the reasoning depth to reliably recognize when a task exceeds its abilities. Sometimes it knows to ask for help. Sometimes it confidently proceeds down an implementation path that's going to cause problems downstream.
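One way to blunt this misjudgment problem—the same fix AI LABS applied manually when they explicitly told Sonnet to use the advisor—is to take the escalation decision away from the model entirely: count failed attempts and force a consult once a threshold is hit. The sketch below is a hypothetical guardrail, not a shipped feature; the threshold, function names, and result shape are all assumptions.

```python
# Hypothetical guardrail: force an advisor consult after repeated failures,
# instead of trusting the executive model to judge task complexity itself.

MAX_SOLO_ATTEMPTS = 2  # assumed threshold; tune per project

def with_forced_escalation(attempt_fn, advisor_fn, task):
    """Run the executive alone up to MAX_SOLO_ATTEMPTS times, then escalate."""
    failures = []
    for _ in range(MAX_SOLO_ATTEMPTS):
        result = attempt_fn(task, guidance=None)
        if result["ok"]:
            return result       # solo attempt succeeded; advisor never invoked
        failures.append(result)
    # Threshold hit: consult the advisor regardless of the model's own judgment.
    guidance = advisor_fn(task, failures)
    return attempt_fn(task, guidance=guidance)

# Usage with stand-in stubs (real versions would call the models):
def stub_attempt(task, guidance=None):
    return {"ok": guidance is not None}  # fails until guidance is supplied

def stub_advisor(task, failures):
    return f"advice after {len(failures)} failures"

print(with_forced_escalation(stub_attempt, stub_advisor, {"name": "new feature"}))
```

A deterministic trigger like this trades some token efficiency (occasional unnecessary consults) for never silently shipping the "confident but wrong" path the third test produced.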
The Actual Use Case
So where does this leave us? The advisor strategy works best in a pretty specific sweet spot: simple to medium-complexity applications where most tasks are straightforward but you occasionally need deeper reasoning. If that describes your project, this approach can genuinely save you several rounds of back-and-forth prompting and let you build more within your token limits.
But for complex applications with many connected dependencies or multiple failure points, the calculus changes. "Even when Sonnet follows the adviser's guidance correctly, it can still choose the wrong implementation path because it doesn't have the reasoning depth to evaluate multiple approaches at once and weigh the downstream consequences," AI LABS noted. In those scenarios, the extra prompting rounds needed to correct Sonnet's misjudgments can actually cost more time than just running Opus from the start.
The strategy is useful when two conditions are both true: you're working within tight token limits, and the application doesn't require Opus-level reasoning at every step. If only one of those is true, you probably want a different approach.
What's interesting to me is what this reveals about the current state of AI tooling. We're at this weird moment where the models are powerful enough to build real applications but constrained enough that resource management becomes its own engineering problem. The advisor strategy is basically Anthropic saying: here's a pattern for working within those constraints more intelligently.
It's not trying to eliminate the tradeoffs. It's trying to make them more intentional. Which feels like the actually useful innovation here—not that you can make Sonnet as capable as Opus (you can't), but that you can structure your workflow so you're only paying for Opus when you genuinely need it.
Whether that's worth the added complexity of managing when to invoke the advisor, and the occasional need to nudge Sonnet into consulting it, depends entirely on your specific constraints and tolerance for debugging. There's no universal answer. Just a more granular set of tradeoffs to navigate.
— Yuki Okonkwo, AI & Machine Learning Correspondent
Watch the Original Video
Anthropic Just Fixed The Token Problem
AI LABS
10m 16s
About This Source
AI LABS
AI LABS is an up-and-coming YouTube channel that delves into the integration of artificial intelligence with software development. Since its establishment in late 2025, AI LABS has positioned itself as a key resource for developers seeking to streamline their coding processes using AI tools and models. Although the channel's subscriber count is not publicly available, its impact is evident in its growing reputation as an educational platform for both budding and experienced developers.