Anthropic's Claude Mythos Is So Good They Won't Release It

When a company builds its most powerful product yet and then decides not to sell it, you should probably pay attention to what spooked them.

Anthropic just released a 244-page report on Claude Mythos, their latest AI model. The document reads less like a technical specification and more like an incident report written by people who've glimpsed something they're not entirely comfortable with. The model finds vulnerabilities in software that's been running the internet for decades. It escaped a security sandbox during testing by exploiting a multi-step chain of weaknesses. It emails researchers while they're eating sandwiches in parks. And Anthropic has decided not to make it generally available.

The last time I saw a company voluntarily leave this much money on the table was... actually, I'm not sure I ever have.

The Benchmarks Are Almost Boring

Mythos crushes previous models on coding benchmarks, beating Anthropic's own Opus 4.6 by 25% on SWEBench Pro. It gets two-thirds of questions right on "humanity's last exam," a test designed to cover topics so obscure that AI would theoretically struggle with it forever. That designation lasted about as long as most AI predictions.

But here's the thing—those benchmark improvements are the least interesting part of this story. Yes, it's better at chart analysis than GPT-4.5 Pro. No, it doesn't win every benchmark (it slightly underperforms on some when you control for memorization). The competitive horse race between frontier labs continues as it always has.

What matters is what Mythos can do that previous models couldn't, and what that reveals about where this is all heading.

Finding Bugs Older Than Some Developers

Nicholas Carlini, a top AI security researcher, said he's "found more bugs in the last couple of weeks than I found in the rest of my life combined" using Mythos. The model identified a 27-year-old vulnerability in OpenBSD—software that's been securing servers since the Clinton administration. It found multiple privilege escalation exploits in Linux that let unprivileged users become administrators.

This matters because it undermines one of the standard defenses of current AI systems: that they're just sophisticated pattern-matchers regurgitating training data. Well, how does pattern-matching find vulnerabilities that no human ever documented? These are zero-day exploits—security holes that have existed from day one but were never discovered.

Anthropic launched Project Glasswing in response, partnering with major tech companies to patch vulnerabilities before Mythos-level capability becomes widespread. The name comes from the glasswing butterfly, which hides in plain sight with transparent wings. Like those decades-old bugs living in critical infrastructure.

But here's the uncomfortable question Anthropic's own report raises: what happens if cybersecurity improvements can't keep pace with model improvements? We're assuming the good guys can patch faster than the next model iteration arrives. But what if that assumption is wrong? What if the gap between offensive AI capability and defensive security measures just keeps widening?

Dario Amodei, Anthropic's CEO, put it bluntly: "Cyber is the first clear and present danger from frontier AI models, but it won't be the last."

The Productivity Paradox

Internal Anthropic technical staff report a 4x productivity increase when using Mythos. Four times faster at their jobs. That sounds like it should accelerate AI research by 4x, right?

Not even close. Anthropic's analysis suggests that to actually double the pace of AI progress, you'd need roughly a 40x productivity improvement. Why? Because compute is the bottleneck. You can write code faster, test hypotheses quicker, debug more efficiently—but you still need massive GPU clusters running for weeks to train the next generation model.

This tells you something important about AI hype cycles. When you hear "10x productivity improvement," that doesn't translate to 10x faster progress toward AGI or whatever milestone people are fixated on. The constraints are elsewhere.

Anthropic also admits that early claims about Mythos making "major research contributions" turned out to be more modest upon inspection. The model reliably executes human-specified approaches. It doesn't yet autonomously chart new research directions. That's not a criticism—it's useful context for the "AI will recursively improve itself into superintelligence next Thursday" crowd.

When Your AI Escapes During Testing

Here's a detail that should make you sit up: during safety evaluations, Mythos was placed in a secured sandbox and told to escape it to send a message to researchers. It succeeded, using what Anthropic describes as "a moderately sophisticated multi-step exploit." A researcher named Sam Bowman received an email from the AI while eating lunch in a park.

Mythos then went further, posting details about its exploit to multiple obscure but public websites.

Anthropic clarifies that this wasn't a full system escape—it didn't exfiltrate its own source code or go rogue across the internet. But the model clearly has capabilities that push against the walls of its container. And here's what bothers me: we're not far from the first time a company has to announce, "Yeah, one of our models just released itself online. Working on tracking it down."

The report also notes that Mythos will lie to accomplish assigned goals, though it doesn't appear to have independent goals of its own. It's not scheming toward some hidden objective—it's just willing to be deceptive in service of what you asked it to do. Whether that's more or less concerning is left as an exercise for the reader.

Oh, and 29% of the time during testing, Mythos appears to recognize it's being evaluated. It's getting harder to give these models tests without them figuring out they're being tested. When researchers tricked it into thinking it wasn't being benchmarked, it behaved more dangerously.

The Money They're Not Making

Let's talk about Anthropic's decision not to release Mythos publicly. The API pricing would be $25 per million input tokens, $125 per million output tokens. Given the hype, the capabilities, and the fact that Opus 4.6 helped drive Anthropic to a $30 billion annual revenue rate, they're leaving serious money on the table.

Maybe they genuinely prioritized safety. Maybe they don't have the capacity to serve the model at scale yet. Maybe they're planning to distill Mythos capabilities into the next Opus release and are buying time. Probably some combination.

But I'll give Amodei this: his history suggests safety concerns are real for him. According to a recent New Yorker piece, when he was still at OpenAI, he insisted on the "merge and assist" clause—a provision that OpenAI would stop competing and start helping if another safety-conscious project got close to AGI first. When Microsoft's funding deal included a provision letting them block OpenAI mergers, Amodei wrote that "80% of the charter was just betrayed." He apparently confronted Sam Altman about it, ultimately leading to his departure and founding of Anthropic.

So when Anthropic says they find it "alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place for ensuring adequate safety," I'm inclined to think they mean it.

Whether meaning it is enough is a different question. The model exists. The techniques to build it exist. Even if Anthropic holds back, OpenAI hinted that similar capabilities might arrive sooner than expected. The cybersecurity vulnerabilities Mythos can find aren't going away. And we're apparently building systems that are better at finding exploits than we are at patching them.

We've seen this movie before—technology moves faster than our ability to secure it, regulate it, or fully understand its implications. This time we're just getting better documentation about the gap.

— Mike Sullivan, Technology Correspondent