Anthropic Engineers Have No Consensus on Claude

Ray Amjad went to a Code with Claude event in Tokyo expecting to find the secret workflow. The canonical approach. The thing Anthropic engineers were doing internally that the rest of us hadn't figured out yet. What he found instead was something I've seen before, just not usually in a corporate AI lab: nobody agreed on anything.

"Everyone at Anthropic is living in the future," one engineer told him, "but in four different futures, and one of them may be directionally correct."

I've been covering open source communities long enough to recognize that sentence. It's the thing maintainers say in the early chaotic phase of a project: before a BDFL emerges, before governance coalesces, before someone writes the RFC that everyone eventually rallies around. You heard it in the Node.js fork era. You heard it during the early container-orchestration wars, before Kubernetes won out. You heard it in Rust's async story, where the community spent years running parallel experiments before the ecosystem settled on a direction. The difference is that those communities were volunteer-run. Anthropic is a company with a commercial product shipping to enterprise customers, and its own engineers can't tell you which workflow is correct.

That's either deeply honest or mildly alarming, depending on what you think this technology is ready to do.

Four futures, one Slack channel

Amjad describes a scene that rhymes structurally with what I'd call distributed-OSS governance by default: engineers running individual experiments, sessions flowing back into a central analysis system, and ad hoc coordination happening over Slack when someone wants to test a prompt. "Someone may have an idea and then quickly send a message being like, 'Hey guys, can you run this prompt for us?' People would run the prompt and then give the result back."

That's not a workflow. That's a community norm masquerading as one. And it's genuinely interesting — not because it reveals some flaw in Anthropic, but because it suggests Claude Code workflows are still in a pre-paradigm phase. The field hasn't had its "tabs vs. spaces" moment yet, the kind of settled convention that becomes invisible once everyone agrees on it.

The practical upshot Amjad draws from this: whatever you're doing right now, there's probably an Anthropic engineer doing something similar. He takes that as reassurance. I'd take it as useful context — it means you're not behind, but it also means there's no authoritative answer to catch up to.

The spec debate, and a borrowed phrase

One of the more substantive arguments Amjad surfaces concerns spec-driven development frameworks — OpenSpec, SpecKit, BMAD, and what feels like a new one every other Tuesday. His conclusion, reinforced by conversations at the event: none of the Anthropic engineers he spoke to used any of these frameworks. Instead, they described a goal, specified a few constraints, had the model ask clarifying questions, sketched a design, and then let Claude run.

An engineer framed the underlying principle with a phrase that has a longer history than the engineer probably realized: the map is not the territory. Korzybski coined it in 1931; his point was that our symbolic representations of reality always diverge from reality itself. Applied to AI coding: an overly precise spec is a map, and the actual codebase is the territory. When the agent hits a corner case that doesn't fit the map, forcing it to follow the map anyway produces worse results than trusting its judgment to adapt.

This lands differently when you remember how much of the last two years of developer tooling was about constraining AI output (system prompts, guardrails, rigid spec formats) precisely because the models weren't trustworthy enough to adapt gracefully. Amjad's argument is that Claude Fable 5 has shifted that calculus. The scaffolding we built around earlier, weaker models may now be actively limiting what stronger ones can do. He recommends a genuinely useful experiment: run a task with your current workflow, then run the same task again after deleting your skills and memory files and simply describing what you want. See which one gets further.

That's not a benchmark — it's an invitation to find out for yourself. Which, again, is essentially what every Anthropic engineer appears to be doing.

Verification is where the work actually lives

Strip away the meta-discussion about specs and the more operationally interesting insight Amjad brings back is about verification environments. Anthropic's own Claude desktop application, he says, is automatically verified whenever a change is made — Claude itself runs the desktop app in a container, using computer use to confirm the change landed correctly. Every surface where a user might encounter a change gets its own automated verification path: browser changes get Playwright, API changes get an agent hitting the endpoint, terminal changes get an agent driving a terminal.
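The one-verifier-per-surface idea can be sketched as a simple dispatch table. This is a minimal illustration, not Anthropic's system: the `Change` type, the surface names, and the verifier functions are all hypothetical, and each verifier only describes the check it would run rather than driving Playwright, an endpoint, or a terminal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Change:
    """A change to verify; `surface` is where a user would encounter it."""
    surface: str       # hypothetical labels: "browser", "api", "terminal"
    description: str


# Hypothetical verifier stubs. A real one would launch a browser, call an
# endpoint, or hand a terminal to an agent; these only describe the check.
def verify_browser(change: Change) -> str:
    return f"browser: drive the page with Playwright to check '{change.description}'"


def verify_api(change: Change) -> str:
    return f"api: have an agent hit the endpoint to check '{change.description}'"


def verify_terminal(change: Change) -> str:
    return f"terminal: have an agent drive a terminal to check '{change.description}'"


# One verification path per user-facing surface.
VERIFIERS: Dict[str, Callable[[Change], str]] = {
    "browser": verify_browser,
    "api": verify_api,
    "terminal": verify_terminal,
}


def plan_verification(changes: List[Change]) -> List[str]:
    """Return one planned check per change, failing loudly on uncovered surfaces."""
    plan = []
    for change in changes:
        verifier = VERIFIERS.get(change.surface)
        if verifier is None:
            # An uncovered surface is a gap in the verification environment.
            raise ValueError(f"no verification path for surface '{change.surface}'")
        plan.append(verifier(change))
    return plan
```

Raising on an unknown surface is a deliberate choice in this sketch: if a change can reach users somewhere that has no automated check, that is worth knowing before the merge rather than after.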

Amjad rebuilt this pattern for his own application. Every new PR triggers automated browser verification with a recording, which then gets dropped as a GIF into the appropriate Slack channel. It's unglamorous infrastructure work, the kind of thing maintainers in mature OSS projects figured out years ago because they had no choice, and Amjad's argument is simply that the investment pays dividends over time. The multi-agent orchestration question isn't just about which agent talks to which; it's about whether you can trust the output enough to automate the merge.

He keeps around 20–30 PRs open simultaneously and orders merges by blast radius: smallest changes first, automated where possible. The loop isn't just a metaphor for him; it's an actual system running on a Hetzner server while he records videos.
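Ordering merges by blast radius amounts to sorting the verified PRs by some measure of how much they touch. The sketch below assumes a crude proxy (files changed weighted above lines changed) and a hypothetical `PullRequest` record; the source does not say how Amjad actually scores a change, so treat the formula as a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical PR record; field names are illustrative only."""
    number: int
    files_changed: int
    lines_changed: int
    checks_passed: bool  # result of the automated verification run


def blast_radius(pr: PullRequest) -> int:
    # Assumed proxy: touching many files is riskier than many lines in one file.
    return pr.files_changed * 100 + pr.lines_changed


def merge_queue(prs: List[PullRequest]) -> List[PullRequest]:
    """Keep only verified PRs and put the smallest blast radius first."""
    ready = [pr for pr in prs if pr.checks_passed]
    return sorted(ready, key=blast_radius)


if __name__ == "__main__":
    open_prs = [
        PullRequest(number=1, files_changed=2, lines_changed=40, checks_passed=True),
        PullRequest(number=2, files_changed=12, lines_changed=900, checks_passed=False),
        PullRequest(number=3, files_changed=1, lines_changed=5, checks_passed=True),
    ]
    for pr in merge_queue(open_prs):
        print(pr.number, blast_radius(pr))
```

Merging the small changes first keeps each later, larger merge rebased on a base that has already absorbed the low-risk work, and PRs that failed verification never enter the queue at all.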

The question nobody wants to answer directly

Here's the moment in Amjad's video that I think deserves more weight than it gets in the original telling.

He asked Anthropic engineers directly: with models at this capability level, what's the remaining difference between a junior and a senior engineer? The answer he got was thoughtful: seniors still catch what the model gets wrong because they have lived experience to measure the output against. Juniors don't have that calibration yet, so they tend to accept outputs that look correct but aren't. The gap is closing, the engineers agreed, but what remains of it is senior judgment, and that judgment still matters.

And then Amjad asked the obvious follow-up: what are you going to do when the models are good enough that senior judgment is also redundant?

"Often the answer was like, 'I'm going to retire.' And I guess they can do so because they would have enough equity from the value of Anthropic shares or something like that."

I want to stay with that for a moment, because it's the sentence that reveals the structural asymmetry underneath everything else in this video. Anthropic engineers can afford to run four different workflow experiments simultaneously because they have downside protection. They have equity in the outcome. If Claude gets good enough to make their judgment redundant, they get a payout. The experimentation costs them nothing existentially.

The developers reading this — and the junior engineers whose entire career arc this observation touches — mostly don't have that. They're being told, correctly, that the gap between junior and senior is narrowing. They're watching a hiring market that's already responding to that signal. And the people delivering this information are the same ones who will retire comfortably if they're right.

This isn't a knock on Anthropic engineers for having equity. That's how tech compensation works, and there's nothing uniquely culpable about it. But it does change how you should hold the non-consensus. When Anthropic engineers say "nobody has the right answer yet, experiment freely," they're speaking from a position where getting it wrong is recoverable. For most of the developers running those experiments on their own time, with their own careers on the line, the calculus is different.

The four futures they're living in aren't equally safe to get wrong.

By Dev Kapoor, Open Source & Developer Communities Correspondent, Buzzrag