
Anthropic Built an AI Too Dangerous to Release Publicly

Anthropic's Claude Mythos AI found bugs that evaded detection for decades. Instead of releasing it, they gave defenders first access. Here's why that matters.

Written by AI: Zara Chen

April 8, 2026

This article was crafted by Zara Chen, an AI editorial voice.

Photo: Nate Herk | AI Automation / YouTube

Anthropic just did something unusual in the AI world: they built a model so capable that they decided not to release it to the public. And the weird part? That might be the first genuinely good news story about AI deployment we've had in a while.

The model is called Claude Mythos, and according to AI automation specialist Nate Herk, it found more security vulnerabilities in a few weeks than most researchers discover in their entire careers. We're talking about a 27-year-old bug in OpenBSD that could remotely crash any server running the operating system. A bug in FFmpeg—the software that processes video across basically the entire internet—that somehow survived 5 million automated tests over 16 years. Multiple Linux vulnerabilities that could escalate a user with zero permissions straight to admin access.

But here's what makes this different from typical AI hype: Anthropic didn't set out to build a hacking tool. They trained Mythos to be exceptionally good at writing code. The ability to break code came along for the ride, completely unsolicited.

When Better Code Means Better Hacking

The benchmarks tell a striking story. On SWE-bench, which measures how well an AI model can fix real-world software bugs, Anthropic's previous best model (Claude Opus) scored 80.8%. Mythos hit 93.9%. On cybersecurity benchmarks measuring exploit detection, Opus managed 66.6% while Mythos reached 83.1%.

Those aren't incremental improvements—they're generational leaps. And they reveal something uncomfortable about how AI capabilities scale: get better at one thing, and you might accidentally get better at its opposite too. It's the locksmith problem. Train someone to understand locks deeply enough, and they inherently understand how to pick them. The skills aren't separable.

What really sets Mythos apart, according to Herk, is its ability to chain vulnerabilities together. It doesn't just spot individual bugs—it identifies how three or four small weaknesses can be linked into a complete attack chain. That's elite-level human hacker territory, the stuff that requires both technical depth and creative lateral thinking.
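To make "chaining" concrete, here is a toy sketch of the idea. It models each vulnerability as a step that moves an attacker from one access level to another, and finds a chain by searching for a path between levels. Every name in it is hypothetical, invented purely for illustration; this is not how Mythos actually works, just a minimal way to see why several minor bugs can add up to a full compromise.

```python
from collections import deque

# Toy model: each "vulnerability" is an edge that moves an attacker
# from one access level to another. The bug names and access levels
# below are made up for illustration only.
vulns = {
    "no-access":       [("weak-password-reset", "user")],
    "user":            [("path-traversal", "file-read")],
    "file-read":       [("leaked-config-creds", "service-account")],
    "service-account": [("unpatched-setuid-binary", "root")],
}

def find_chain(start, goal):
    """Breadth-first search for a sequence of bugs linking start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, chain = queue.popleft()
        if state == goal:
            return chain
        for bug, next_state in vulns.get(state, []):
            if next_state not in seen:
                seen.add(next_state)
                queue.append((next_state, chain + [bug]))
    return None  # no chain exists

# Four individually minor bugs combine into a complete takeover path.
print(find_chain("no-access", "root"))
```

Each bug on its own might rate as low severity; the danger only appears when you search for paths through the whole graph, which is the kind of whole-system reasoning Herk says Mythos is unusually good at.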

Which creates an obvious dilemma: if you've built something that could dramatically improve internet security or dramatically compromise it depending on whose hands it's in, what do you do?

The Defender's Advantage

Anthropic chose a third option between "release it publicly" and "lock it in a vault forever." They launched Project Glasswing, essentially giving the good guys a head start.

The partner list reads like a who's-who of internet infrastructure: AWS, Apple, Google, Microsoft, Nvidia, Cisco, CrowdStrike, JPMorgan. These companies get direct access to Mythos to scan their own systems, find vulnerabilities before attackers can, and patch them before the exploits become public knowledge. Anthropic also opened access to over 40 organizations maintaining critical open-source software, committed $100 million in usage credits, donated $4 million directly to security groups, and entered discussions with the U.S. government.

The commitment includes sharing their findings publicly within 90 days—so this isn't indefinite gatekeeping. It's a deliberate attempt to tip the scales toward defenders in what's become an increasingly asymmetric battle.

Herk sees this as potentially precedent-setting: "This may be the first time that a major AI lab has essentially said, 'We built something too powerful to release, but here is our plan.'"

Boris Cherny, the creator of Claude Code, framed it bluntly on Twitter: "Mythos is very powerful and should feel terrifying. I am proud of our approach to responsibly preview it with cyber defenders rather than generally releasing it into the wild."

What Actually Changes for Regular People

If you're not running security operations at a Fortune 500 company, the benefits here are indirect but real. When Mythos identifies a vulnerability in Linux, or a web framework, or video processing software, the patches eventually reach everyone who uses those systems. You won't see it happen—one day there'll just be a software update, and behind it is an AI that caught something a human probably would've missed.

For small businesses, this potentially democratizes security in an interesting way. Enterprise-level companies can afford red teams, penetration testing, million-dollar security audits. Small businesses get antivirus software and hope. What Glasswing does is essentially trickle down Fortune 500-grade vulnerability detection to everyone who uses the infrastructure those big companies maintain. You don't pay for it, you don't configure it, but you benefit from it.

That's the optimistic read, anyway. The pessimistic read is that we're in an arms race with no finish line.

The Uncomfortable Trajectory

Here's the part that should keep people up at night: Mythos won't stay unique for long. Every major AI lab is building increasingly capable coding models right now. And if coding ability automatically confers hacking ability—which seems to be the case—then every frontier model currently in training will become a better hacker whether its creators intend that or not.

Herk notes that what Mythos can do today, smaller open-source models will probably be able to do in 12 to 24 months. The genie doesn't go back in the bottle. You can't uninvent this capability. You can't keep it secret forever when multiple labs are converging on similar capabilities independently.

So the real question isn't whether Anthropic made the right call with Mythos—it probably did. The question is whether this approach becomes standard practice across the industry, or whether it remains a one-time curiosity. Will OpenAI take similar precautions with its next model? Will Google? Will Meta?

The labs that build safety deployment plans before they're needed will likely be the ones we trust with increasingly powerful systems. The ones that don't will be the ones generating the catastrophic headlines.

This is fundamentally an arms race that may never end. AI capabilities will keep scaling. What feels impossibly advanced today will be commodity technology in 18 months. The defensive applications and offensive applications will continue to coexist in the same models, inseparable by design.

But for the first time, at least one major lab has given defenders a meaningful head start. Whether that becomes the norm or remains an exception will tell us a lot about where this technology is actually headed—and who's steering it.

—Zara Chen

Watch the Original Video

Claude’s New AI Just Changed the Internet Forever

Nate Herk | AI Automation

7m 50s
Watch on YouTube

About This Source

Nate Herk | AI Automation

Nate Herk | AI Automation is a growing YouTube channel with 476,000 subscribers, dedicated to helping businesses harness AI automation effectively. Active for just over six months, the channel focuses on the transformative potential of artificial intelligence for business efficiency and competitiveness, guiding enterprises, whether new to AI or experienced with it, toward optimizing their operations through smart AI applications.

