Anthropic's Sonnet 4.6: When A 'Workhorse' Model

Anthropic released Claude Sonnet 4.6 this week, and something interesting is happening in the model hierarchy that nobody seems to be talking about directly: the company's mid-tier "workhorse" model is performing so close to its flagship that either the flagship isn't special anymore, or we need to rethink what mid-tier means.

The benchmark numbers tell one story. The safety documentation tells another. And the gap between those stories is where things get interesting.

What Changed

Sonnet 4.6 brings substantial improvements over its predecessor, particularly in areas that matter for real-world deployment. Tool use jumped from 43.8% to 61.3%—that's the kind of leap that changes what you can realistically build with the model. Agentic terminal coding went from 51% to 59%. Office task performance went from 16 to 33.

But here's what catches my attention: on OSWorld—a benchmark where AI models navigate actual computer environments to complete practical tasks—Sonnet 4.6 scored 72.5% versus Sonnet 4.5's 61.4%. The model interacts with computers the way humans do. As Matthew Berman notes in his technical breakdown, "There are no special APIs or purpose-built connectors. The model sees the computer and interacts with it in much the same way a person would, clicking a virtual mouse and typing on a virtual keyboard."

No special APIs means no special protections. The model is literally looking at your screen and deciding what to click.

The Prompt Injection Problem Nobody's Solving

Let me be direct about something: prompt injection attacks aren't a theoretical concern anymore. When you give an AI model access to browse the web, read documents, or interact with your computer, you're trusting that malicious instructions hidden in that content won't hijack the model's behavior.

Prompt injections work like this: someone knows your AI will read certain text. They manipulate that text—either creating it themselves or compromising a third-party source—to include instructions like "forget all previous instructions and do this malicious task instead."

Anthropic claims Sonnet 4.6 shows "major improvement" in resistance to prompt injections, performing "similarly to Opus 4.6." That's good. But the underlying problem remains: we're deploying models with computer access before we've solved the fundamental security architecture.

If you're using Claude to process emails, analyze documents, or automate workflows with sensitive data, you need to understand this threat isn't hypothetical. The model will follow instructions it encounters. The question is whose instructions it prioritizes.

When The Mid-Tier Model Beats The Flagship

Something odd shows up in the GDP-val benchmark, which measures AI's ability to accomplish real-world professional tasks across 44 occupations and nine industries. Sonnet 4.6 scored higher than Opus 4.6—the supposedly more capable flagship model.

On office tasks specifically, Sonnet 4.6 scored 33, while Opus 4.6 scored lower. For agentic financial analysis, Sonnet 4.6 outperformed not just Opus 4.6, but also Gemini 3 Pro and GPT-5.2.

This creates an interesting market signal. Anthropic prices Sonnet 4.6 at $3 per million input tokens and $15 per million output tokens—the same as Sonnet 4.5. Opus costs more. But if Sonnet is matching or beating Opus on knowledge work tasks, why would you pay the premium?

Berman raises the speculation that's circulating in AI circles: "A lot of people are speculating that they were training Sonnet 5 or even maybe Opus 5 and decided to just name it Sonnet 46."

I don't know if that's true. But the capability compression is real. The distance between tiers is shrinking.

What The Safety Card Actually Says

Anthropic deployed Sonnet 4.6 under AI Safety Level 3, which they define as systems that "substantially increase the risk of catastrophic misuse compared to non-AI baselines like a search engine or a textbook."

That's already an interesting threshold. ASL-2 covers systems showing "early signs of dangerous capabilities." ASL-3 is where misuse risk becomes substantial. ASL-4 and ASL-5+ aren't defined yet, which tells you something about how fast this is moving.

The model card states that Sonnet 4.6 doesn't cross Anthropic's AI R&D-4 threshold—meaning it can't yet "fully automate the work of an entry-level remote-only researcher at Anthropic." It also doesn't cross the CBRN-4 threshold for meaningfully assisting in development of chemical, biological, radiological, or nuclear weapons.

But then comes this line: "Confidently ruling out these thresholds is becoming increasingly difficult. This is in part because the model is approaching or surpassing high levels of capability in our rule-out evaluations. In addition, parts of the AI R&D-4 and CBRN-4 thresholds have fundamental epistemic uncertainty or require more sophisticated forms of measurement."

Translation: we're reaching the point where we can't reliably measure whether models have crossed certain capability thresholds. Either the models are getting too capable for our evaluation methods, or the thresholds themselves involve "fundamental epistemic uncertainty"—meaning we might not be able to define them precisely enough to test for them.

That's not a safety success story. That's a measurement crisis.

The Knowledge Work Optimization

Anthropic is clearly positioning Sonnet 4.6 for one use case above all others: knowledge work. The model excels at creating PowerPoints, manipulating Excel, processing large documents, and automating professional workflows. It now comes with a million-token context window and is the default model on the free plan.

For entrepreneurs, analysts, and content creators, this is probably the most immediately useful AI release in months. The tool use improvements alone make it viable for tasks that would have required human oversight with previous versions.

But that optimization creates its own questions. When AI gets genuinely good at knowledge work—not just generating drafts but executing multi-step professional tasks—what happens to the economics of those professions? We're not talking about distant automation. Sonnet 4.6 is available now, at prices that make it cheaper than entry-level labor in most markets.

The Vending Bench benchmark is instructive here. It gives models control of a simulated vending machine business and measures profit over 350 days. Sonnet 4.5 generated about $2,000. Sonnet 4.6 generated $5,500, with the model card noting it "outperforms 4.5 by investing in capacity early then pivoting to profitability in the final stretch."

That's not just better performance. That's strategic thinking about resource allocation and timing. That's the kind of business judgment we usually consider distinctly human.

What This Actually Means

The more interesting story isn't that Anthropic released a better model. It's that the gap between "good enough for most tasks" and "best available" is closing while our ability to understand what these models can actually do is getting worse.

We're deploying systems rated at Safety Level 3—substantial misuse risk—with security vulnerabilities we haven't solved, with capability measurements we're losing confidence in, at price points that make them viable replacements for human knowledge workers.

And we're calling this the "workhorse" model.

Rachel "Rach" Kovacs covers cybersecurity, privacy, and digital safety for Buzzrag.