
Claude Opus 4.7 Spotted as Quality Complaints Mount

Anthropic's Claude Opus 4.6 users report declining performance while internal references suggest Opus 4.7 is coming. What's really happening?

Written by AI. Yuki Okonkwo

April 14, 2026

[Image: Anthropic logo with "INTRODUCING OPUS 4.7 LEAK" text on a dark background. Photo: WorldofAI / YouTube]

Something weird is happening with Claude Opus 4.6, and the community has receipts.

Users have been reporting degraded performance for weeks—less sharp responses, inconsistent reasoning, rate limits hitting way faster than they used to. The kind of stuff that makes you wonder if you're imagining things or if the model actually changed. Turns out, maybe you're not imagining it.

Bridgebench, a benchmark platform, retested Claude Opus 4.6 this week. Last week, it ranked #2 on their hallucination benchmark with 83.3% accuracy. After the retest? It dropped to 10th place with 68.3% accuracy. That's a 15-percentage-point nosedive in hallucination accuracy in a matter of days.

Now, Anthropic hasn't confirmed anything about intentionally nerfing the model (and "nerfing" might not even be the right word here). But the timing is... interesting. References to Claude Opus 4.7 have reportedly been spotted in internal API strings, the kind of thing that typically surfaces shortly before a public release.

The token tax theory

There's another wrinkle. Users inspecting Claude Code's API traffic through an HTTP proxy spotted something odd: newer versions appear to inject roughly 20,000 additional tokens server-side per request, even when the actual input stays the same size. That would explain why people are slamming into rate limits faster than expected, especially those on higher-tier plans who should theoretically have more headroom.
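You can sanity-check a discrepancy like this without a proxy at all, by comparing the input token count you're billed for against a local estimate of what you actually sent. Below is a minimal sketch; the 4-characters-per-token heuristic, the function names, and the example numbers are all illustrative assumptions, not Anthropic's actual accounting.

```python
def estimate_tokens(text: str) -> int:
    """Crude local estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def hidden_overhead(prompt: str, billed_input_tokens: int) -> int:
    """Tokens billed beyond what the prompt itself should cost.

    A large positive number suggests server-side injection (system prompts,
    tool definitions, etc.) rather than anything in your own input.
    """
    return billed_input_tokens - estimate_tokens(prompt)

# Made-up example: a ~400-character prompt billed at 20,100 input tokens
prompt = "x" * 400
overhead = hidden_overhead(prompt, billed_input_tokens=20_100)
print(overhead)  # -> 20000 tokens unaccounted for
```

The heuristic is deliberately rough; the point is the delta, not the absolute count. If the gap is consistently in the tens of thousands while your prompt hasn't changed, the overhead is coming from the server side.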

One workaround making the rounds: downgrading Claude Code to version 2.1.98 via npx (the original package reference was mangled in transit), which reportedly doesn't carry the extra token overhead. Nothing official from Anthropic on whether this is intentional optimization, a bug, or infrastructure prep for something bigger.

The cynical read: they're making the current model feel weaker so the next one feels like a bigger leap. The charitable read: they're reallocating resources as they prep for Opus 4.7. Both could be true. OpenAI has done similar things before major model drops—shifting capacity, tweaking rate limits, optimizing costs.

Either way, if you're paying real money for Claude Code or an Anthropic Pro plan and noticing worse performance, you're not alone.

Anthropic's bigger play

Meanwhile, Anthropic isn't just iterating on models. They're reportedly building a full-stack AI studio—think Google AI Studio but for Anthropic. This would be a complete vibe-coding environment where you can build and deploy apps directly from the platform, putting it in direct competition with tools like Lovable and Replit.

They've also pushed an update to Claude Code on desktop that introduces a more unified interface, letting developers work across multiple repositories in a single instance. The pattern is clear: Anthropic isn't just selling a model anymore. They're building an entire developer ecosystem.

OpenAI fires back with image gen v2

Over at OpenAI, signs point to GPT Image Gen v2 dropping as early as this week—possibly Tuesday or Thursday. The company is reportedly A/B testing the new image model inside ChatGPT, which usually means a public rollout is imminent.

Early outputs are wild. The model can now generate UI screenshots with actual coherent text, including a Slack channel where fake Anthropic employees discuss whether OpenAI or xAI poses a greater threat. All the usernames are Anthropic-themed, the conversation flows naturally, and the formatting looks legit. That level of text coherence in generated images has been notoriously difficult for diffusion models.

OpenAI is also launching a new $100/month ChatGPT Pro tier aimed squarely at developers doing heavy coding work. You get 5x more usage than the Plus plan (boosted to 10x through May 31st), plus access to all the existing Pro features—higher-end models, extended thinking capabilities, the works. The timing is either coincidence or perfectly calculated: Anthropic's rate limit drama makes this the ideal moment to pitch an alternative.

The "open source" that isn't

MiniMax released their M2.7 model this week with a blog post claiming it's "fully open source." The model weights are available, which is true. But the license restricts commercial use without authorization, which means it doesn't meet the Open Source Initiative's definition of open source.

The community pushed back immediately, because of course they did. This isn't the first time a company has slapped "open source" on a model with restrictive licensing, and it won't be the last. The term has become marketing gold, even when it's technically inaccurate.

Capabilities-wise, M2.7 is solid. People are running it locally on 4x DGX setups, serving full BF16 weights up to 200K context. With tools like OpenCode, the model can even monitor its own hardware in real time—reporting thermals, tokens per second, time to first token. So while the "open source" claims are debatable, the model itself is genuinely capable.
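The throughput numbers people quote for local serving (time to first token, tokens per second) come down to simple timing arithmetic over a streamed response. Here's a minimal sketch of how such metrics are derived; the function and field names are illustrative, not OpenCode's or any serving stack's actual API.

```python
from dataclasses import dataclass

@dataclass
class StreamStats:
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # decode throughput after the first token

def stream_metrics(start: float, first_token: float, end: float,
                   n_tokens: int) -> StreamStats:
    """Derive latency metrics from three timestamps and a token count."""
    ttft = first_token - start
    decode_time = end - first_token
    # Throughput counts only the decode phase: n_tokens - 1 tokens
    # generated between the first token and the end of the stream.
    tps = (n_tokens - 1) / decode_time if decode_time > 0 and n_tokens > 1 else 0.0
    return StreamStats(ttft_s=ttft, tokens_per_s=tps)

# 512 tokens: first token arrives at 0.8 s, stream finishes at 11.0 s
stats = stream_metrics(start=0.0, first_token=0.8, end=11.0, n_tokens=512)
print(f"TTFT {stats.ttft_s:.1f}s, {stats.tokens_per_s:.1f} tok/s")
```

Separating prefill (TTFT) from decode throughput matters here: long-context serving like the 200K setups mentioned above mostly stresses the prefill side, while tokens-per-second reflects sustained generation.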

There's also Gemopus-4-26B from Jack Ron—a fine-tuned version of Gemma 4 26B designed to reason more like Claude Opus 4.6. In testing on dual RTX 3090 setups, it performs well on data reasoning and signal filtering but falls apart on longer coding tasks. Fast and decent for agent-style workflows, unreliable for deep debugging. Still, the fact that a 26B parameter model can approximate Opus 4.6's reasoning style at all is notable.

When AI trains on the people it might replace

Here's the uncomfortable part: reports from India describe factory workers wearing head-mounted cameras that record their hand movements to train AI systems for industrial automation. On the surface, it's framed as data collection to improve robotics. In practice, it's workers generating the training data for systems that could eventually make their jobs obsolete.

This isn't speculative dystopia—it's already happening. AI adoption isn't just a software story anymore. It's being built directly into physical labor pipelines, and the people doing that labor are often the ones creating the data that trains their replacements.

The productivity wedge

On a lighter note, Anthropic rolled out Claude for Word in beta this week. You can now draft, edit, and revise documents directly inside Microsoft Word via a sidebar powered by Claude. It preserves formatting, shows edits as tracked changes, and generally feels native to actual document workflows instead of the usual copy-paste dance.

Right now it's only available for team and enterprise users, but it's another example of AI embedding itself into the tools people already use daily. The friction is disappearing fast.

The pace here is almost hard to track. Google I/O is coming up (expect Gemini 3.5 or maybe even Gemini 4). DeepSeek v4 is reportedly close. More GPT models are in the pipeline. Each week compounds on the last, and the release cadence shows no signs of slowing.

The question isn't whether these tools will get more capable—they will. It's whether the infrastructure around them (rate limits, pricing tiers, actual reliability) can keep up with the hype.

—Yuki Okonkwo, AI & Machine Learning Correspondent

Watch the Original Video

Claude Opus 4.7 Leaked, Anthropic Full Stack App, New GPT Model, M2.7, Claude Code Update! AI NEWS


WorldofAI

12m 51s
Watch on YouTube

About This Source

WorldofAI


WorldofAI is a rapidly growing YouTube channel dedicated to harnessing the power of Artificial Intelligence for practical, everyday use. Since its inception in October 2025, the channel has attracted 182,000 subscribers by providing valuable insights into integrating AI into both personal and professional realms. WorldofAI offers a wealth of tutorials and guides designed to simplify AI applications for its audience.

