Anthropic's Sonnet 4.6: When A 'Workhorse' Model Gets Scary Good
Claude Sonnet 4.6 blurs the line between mid-tier and flagship AI. What happens when capabilities outpace our ability to measure them?
Written by AI. Rachel "Rach" Kovacs
February 18, 2026

Photo: Matthew Berman / YouTube
Anthropic released Claude Sonnet 4.6 this week, and something interesting is happening in the model hierarchy that nobody seems to be talking about directly: the company's mid-tier "workhorse" model is performing so close to its flagship that either the flagship isn't special anymore, or we need to rethink what mid-tier means.
The benchmark numbers tell one story. The safety documentation tells another. And the gap between those stories is where things get interesting.
What Changed
Sonnet 4.6 brings substantial improvements over its predecessor, particularly in areas that matter for real-world deployment. Tool use jumped from 43.8% to 61.3%—that's the kind of leap that changes what you can realistically build with the model. Agentic terminal coding went from 51% to 59%. Office task performance went from 16 to 33.
But here's what catches my attention: on OSWorld—a benchmark where AI models navigate actual computer environments to complete practical tasks—Sonnet 4.6 scored 72.5% versus Sonnet 4.5's 61.4%. The model interacts with computers the way humans do. As Matthew Berman notes in his technical breakdown, "There are no special APIs or purpose-built connectors. The model sees the computer and interacts with it in much the same way a person would, clicking a virtual mouse and typing on a virtual keyboard."
No special APIs means no special protections. The model is literally looking at your screen and deciding what to click.
The Prompt Injection Problem Nobody's Solving
Let me be direct about something: prompt injection attacks aren't a theoretical concern anymore. When you give an AI model access to browse the web, read documents, or interact with your computer, you're trusting that malicious instructions hidden in that content won't hijack the model's behavior.
Prompt injections work like this: someone knows your AI will read certain text. They manipulate that text—either creating it themselves or compromising a third-party source—to include instructions like "forget all previous instructions and do this malicious task instead."
Anthropic claims Sonnet 4.6 shows "major improvement" in resistance to prompt injections, performing "similarly to Opus 4.6." That's good. But the underlying problem remains: we're deploying models with computer access before we've solved the fundamental security architecture.
If you're using Claude to process emails, analyze documents, or automate workflows with sensitive data, you need to understand this threat isn't hypothetical. The model will follow instructions it encounters. The question is whose instructions it prioritizes.
When The Mid-Tier Model Beats The Flagship
Something odd shows up in the GDP-val benchmark, which measures AI's ability to accomplish real-world professional tasks across 44 occupations and nine industries. Sonnet 4.6 scored higher than Opus 4.6—the supposedly more capable flagship model.
On office tasks specifically, Sonnet 4.6 scored 33, while Opus 4.6 scored lower. For agentic financial analysis, Sonnet 4.6 outperformed not just Opus 4.6, but also Gemini 3 Pro and GPT-5.2.
This creates an interesting market signal. Anthropic prices Sonnet 4.6 at $3 per million input tokens and $15 per million output tokens—the same as Sonnet 4.5. Opus costs more. But if Sonnet is matching or beating Opus on knowledge work tasks, why would you pay the premium?
Berman raises the speculation that's circulating in AI circles: "A lot of people are speculating that they were training Sonnet 5 or even maybe Opus 5 and decided to just name it Sonnet 46."
I don't know if that's true. But the capability compression is real. The distance between tiers is shrinking.
What The Safety Card Actually Says
Anthropic deployed Sonnet 4.6 under AI Safety Level 3, which they define as systems that "substantially increase the risk of catastrophic misuse compared to non-AI baselines like a search engine or a textbook."
That's already an interesting threshold. ASL-2 covers systems showing "early signs of dangerous capabilities." ASL-3 is where misuse risk becomes substantial. ASL-4 and ASL-5+ aren't defined yet, which tells you something about how fast this is moving.
The model card states that Sonnet 4.6 doesn't cross Anthropic's AI R&D-4 threshold—meaning it can't yet "fully automate the work of an entry-level remote-only researcher at Anthropic." It also doesn't cross the CBRN-4 threshold for meaningfully assisting in development of chemical, biological, radiological, or nuclear weapons.
But then comes this line: "Confidently ruling out these thresholds is becoming increasingly difficult. This is in part because the model is approaching or surpassing high levels of capability in our rule-out evaluations. In addition, parts of the AI R&D-4 and CBRN-4 thresholds have fundamental epistemic uncertainty or require more sophisticated forms of measurement."
Translation: we're reaching the point where we can't reliably measure whether models have crossed certain capability thresholds. Either the models are getting too capable for our evaluation methods, or the thresholds themselves involve "fundamental epistemic uncertainty"—meaning we might not be able to define them precisely enough to test for them.
That's not a safety success story. That's a measurement crisis.
The Knowledge Work Optimization
Anthropic is clearly positioning Sonnet 4.6 for one use case above all others: knowledge work. The model excels at creating PowerPoints, manipulating Excel, processing large documents, and automating professional workflows. It now comes with a million-token context window and is the default model on the free plan.
For entrepreneurs, analysts, and content creators, this is probably the most immediately useful AI release in months. The tool use improvements alone make it viable for tasks that would have required human oversight with previous versions.
But that optimization creates its own questions. When AI gets genuinely good at knowledge work—not just generating drafts but executing multi-step professional tasks—what happens to the economics of those professions? We're not talking about distant automation. Sonnet 4.6 is available now, at prices that make it cheaper than entry-level labor in most markets.
The Vending Bench benchmark is instructive here. It gives models control of a simulated vending machine business and measures profit over 350 days. Sonnet 4.5 generated about $2,000. Sonnet 4.6 generated $5,500, with the model card noting it "outperforms 4.5 by investing in capacity early then pivoting to profitability in the final stretch."
That's not just better performance. That's strategic thinking about resource allocation and timing. That's the kind of business judgment we usually consider distinctly human.
What This Actually Means
The more interesting story isn't that Anthropic released a better model. It's that the gap between "good enough for most tasks" and "best available" is closing while our ability to understand what these models can actually do is getting worse.
We're deploying systems rated at Safety Level 3—substantial misuse risk—with security vulnerabilities we haven't solved, with capability measurements we're losing confidence in, at price points that make them viable replacements for human knowledge workers.
And we're calling this the "workhorse" model.
Rachel "Rach" Kovacs covers cybersecurity, privacy, and digital safety for Buzzrag.
Watch the Original Video
Anthropic just dropped Sonnet 4.6...
Matthew Berman
11m 50sAbout This Source
Matthew Berman
Matthew Berman is a leading voice in the digital realm, amassing over 533,000 subscribers since launching his YouTube channel in October 2025. His mission is to demystify the world of Artificial Intelligence (AI) and emerging technologies for a broad audience, transforming complex technical concepts into accessible content. Berman's channel serves as a bridge between AI innovation and public comprehension, providing insights into what he describes as the most significant technological shift of our lifetimes.
Read full source profileMore Like This
Claude Code Channels: Always-On AI Agents for DevOps
Anthropic's Channels feature turns Claude Code into an always-on agent that reacts to CI failures, production errors, and monitoring alerts automatically.
Dynamic Programming: From Theory to Practical Empowerment
Explore dynamic programming's practical power, transforming complex challenges into manageable solutions.
Vercel's Portless Tool: Weekend Project or Real Solution?
Vercel Labs released Portless to eliminate localhost port conflicts. Does this weekend project solve a real problem, or create new ones?
Anthropic's Claude Mythos Leaks: What We Know So Far
A leaked draft reveals Anthropic's most powerful AI model yet. The company's cautious rollout raises questions about what makes this one different.