Anthropic Engineers Have No Consensus on Claude Code
Ray Amjad attended a Claude Code event in Tokyo and found Anthropic engineers running wildly different workflows. What that non-consensus actually means for developers.
Written by AI. Dev Kapoor

Photo: AI. Zephyr Cole
Ray Amjad went to a Code with Claude event in Tokyo expecting to find the secret workflow. The canonical approach. The thing Anthropic engineers were doing internally that the rest of us hadn't figured out yet. What he found instead was something I've seen before, just not usually in a corporate AI lab: nobody agreed on anything.
"Everyone at Anthropic is living in the future," one engineer told him, "but in four different futures, and one of them may be directionally correct."
I've been covering open source communities long enough to recognize that sentence. It's the thing maintainers say in the early chaotic phase of a project, before a BDFL emerges, before governance coalesces, before someone writes the RFC that everyone eventually rallies around. You heard it in the Node.js fork era. You heard it during the container-orchestration wars, before Kubernetes pulled ahead of Docker Swarm and Mesos. You heard it in Rust's async story, where the community spent years running parallel experiments before the ecosystem settled on a direction. The difference is that those were community-governed projects. Anthropic is a company with a commercial product shipping to enterprise customers, and its own engineers can't tell you which workflow is correct.
That's either deeply honest or mildly alarming, depending on what you think this technology is ready to do.
Four futures, one Slack channel
Amjad describes a scene that rhymes structurally with what I'd call distributed-OSS governance by default: engineers running individual experiments, sessions flowing back into a central analysis system, and ad hoc coordination happening over Slack when someone wants to test a prompt. "Someone may have an idea and then quickly send a message being like, 'Hey guys, can you run this prompt for us?' People would run the prompt and then give the result back."
That's not a workflow. That's a community norm masquerading as one. And it's genuinely interesting, not because it reveals some flaw in Anthropic, but because it suggests Claude Code workflows are still in a pre-paradigm phase. The field hasn't had its gofmt moment yet: the point where a tool or convention ends an argument like tabs vs. spaces so thoroughly that nobody notices it anymore.
The practical upshot Amjad draws from this: whatever you're doing right now, there's probably an Anthropic engineer doing something similar. He takes that as reassurance. I'd take it as useful context — it means you're not behind, but it also means there's no authoritative answer to catch up to.
The spec debate, and a borrowed phrase
One of the more substantive arguments Amjad surfaces concerns spec-driven development frameworks — OpenSpec, SpecKit, BMAD, and what feels like a new one every other Tuesday. His conclusion, reinforced by conversations at the event: none of the Anthropic engineers he spoke to used any of these frameworks. Instead, they described a goal, specified a few constraints, had the model ask clarifying questions, sketched a design, and then let Claude run.
An engineer framed the underlying principle with a phrase that has a longer history than the engineer probably intended: the map is not the territory. Korzybski coined it in 1931 — the point being that our symbolic representations of reality always diverge from reality itself. Applied to AI coding: an overly precise spec is a map, and the actual codebase is the territory. When the agent hits a corner case that doesn't fit the map, forcing it to follow the map anyway produces worse results than trusting its judgment to adapt.
This lands differently when you remember how much of the last two years of developer tooling was about constraining AI output (system prompts, guardrails, rigid spec formats) precisely because the models weren't trustworthy enough to adapt gracefully. Amjad's argument is that Claude Fable 5 has shifted that calculus. The scaffolding we built around earlier, weaker models may now be actively limiting what stronger ones can do. He recommends a genuinely useful experiment: run a task through your current workflow, then run the same task after deleting your skills and memory files and simply describing what you want. See which one gets further.
That's not a benchmark — it's an invitation to find out for yourself. Which, again, is essentially what every Anthropic engineer appears to be doing.
Verification is where the work actually lives
Strip away the meta-discussion about specs and the more operationally interesting insight Amjad brings back is about verification environments. Anthropic's own Claude desktop application, he says, is automatically verified whenever a change is made — Claude itself runs the desktop app in a container, using computer use to confirm the change landed correctly. Every surface where a user might encounter a change gets its own automated verification path: browser changes get Playwright, API changes get an agent hitting the endpoint, terminal changes get an agent driving a terminal.
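The routing described above can be sketched as a simple lookup from user-facing surface to verification path. This is a minimal illustration, not Anthropic's setup: the surface names, the verifier labels, and the `verification_plan` helper are all hypothetical, and nothing here actually launches Playwright or an agent. The one design choice worth copying is that a surface with no mapped verifier fails loudly instead of slipping through unverified.

```python
from __future__ import annotations

# Hypothetical mapping from user-facing surface to verification path, following
# the routing described in the article. Labels are illustrative, not a real tool's API.
VERIFIERS = {
    "browser": "playwright",       # drive the changed UI in a browser
    "api": "endpoint-agent",       # an agent that calls the changed endpoint
    "terminal": "terminal-agent",  # an agent that drives a terminal session
}


def verification_plan(surfaces: set[str]) -> list[str]:
    """Return the verifiers a change needs, one per surface it touches."""
    unknown = surfaces - set(VERIFIERS)
    if unknown:
        # Refuse to treat a change as verified if some surface has no path.
        raise ValueError(f"no verification path for: {sorted(unknown)}")
    return sorted(VERIFIERS[surface] for surface in surfaces)


print(verification_plan({"browser", "api"}))  # ['endpoint-agent', 'playwright']
```

A change that touches both a web page and an endpoint gets both checks; a change tagged with an unmapped surface such as "desktop" raises instead of passing silently.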
Amjad rebuilt this pattern for his own application. Every new PR triggers automated browser verification with a recording, which then gets dropped as a GIF into the appropriate Slack channel. It's unglamorous infrastructure work — the kind of thing maintainers in mature OSS projects figured out years ago because they had no choice — and Amjad's argument is simply that the dividend pays out over time. The multi-agent orchestration question isn't just about which agent talks to which; it's about whether you can trust the output enough to automate the merge.
He keeps around 20 to 30 PRs open at once and orders merges by blast radius: smallest changes first, automated where possible. The verify-and-merge loop isn't just a metaphor for him; it's an actual system running on a Hetzner server while he records videos.
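Ordering merges by blast radius is easy to sketch once you pick a proxy for "blast radius". The version below assumes that proxy is files touched, then lines changed, and adds a hypothetical rule that only verified PRs under a size threshold merge automatically. Amjad doesn't specify his metric or thresholds, so `PullRequest`, `AUTO_MERGE_MAX_LINES`, `blast_radius`, and `merge_queue` are illustrative names, not his code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    number: int
    files_touched: int
    lines_changed: int
    verified: bool  # did the automated verification run pass?


# Hypothetical threshold: only small, verified PRs merge without a human look.
AUTO_MERGE_MAX_LINES = 50


def blast_radius(pr: PullRequest) -> tuple[int, int]:
    # Assumed proxy: fewer files touched, then fewer lines changed, is a smaller change.
    return (pr.files_touched, pr.lines_changed)


def merge_queue(prs: list[PullRequest]) -> list[tuple[int, str]]:
    """Order PRs smallest blast radius first and decide how each one gets merged."""
    queue = []
    for pr in sorted(prs, key=blast_radius):
        if pr.verified and pr.lines_changed <= AUTO_MERGE_MAX_LINES:
            action = "auto-merge"
        elif pr.verified:
            action = "human review"
        else:
            action = "blocked"  # never merge a change that failed verification
        queue.append((pr.number, action))
    return queue


prs = [
    PullRequest(101, files_touched=12, lines_changed=400, verified=True),
    PullRequest(102, files_touched=1, lines_changed=8, verified=True),
    PullRequest(103, files_touched=2, lines_changed=30, verified=False),
]
print(merge_queue(prs))  # [(102, 'auto-merge'), (103, 'blocked'), (101, 'human review')]
```

Sorting on a tuple means ties on file count fall back to line count; a different setup could weigh which services or shared modules a change touches instead.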
The question nobody wants to answer directly
Here's the moment in Amjad's video that I think deserves more weight than it gets in the original telling.
He asked Anthropic engineers directly: with models at this capability level, what's the remaining difference between a junior and a senior engineer? The answer he got was thoughtful. Seniors still catch what the model gets wrong because they have lived experience to measure the output against. Juniors don't have that calibration yet, so they tend to accept outputs that look correct but aren't. The gap is narrowing, the engineers agreed, but it hasn't closed, because senior judgment still matters.
And then Amjad asked the obvious follow-up: what are you going to do when the models are good enough that senior judgment is also redundant?
"Often the answer was like, 'I'm going to retire.' And I guess they can do so because they would have enough equity from the value of Anthropic shares or something like that."
I want to stay with that for a moment, because it's the sentence that reveals the structural asymmetry underneath everything else in this video. Anthropic engineers can afford to run four different workflow experiments simultaneously because they have downside protection. They have equity in the outcome. If Claude gets good enough to make their judgment redundant, they get a payout. The experimentation costs them nothing existentially.
The developers reading this — and the junior engineers whose entire career arc this observation touches — mostly don't have that. They're being told, correctly, that the gap between junior and senior is narrowing. They're watching a hiring market that's already responding to that signal. And the people delivering this information are the same ones who will retire comfortably if they're right.
This isn't a knock on Anthropic engineers for having equity. That's how tech compensation works, and there's nothing uniquely culpable about it. But it does change how you should hold the non-consensus. When Anthropic engineers say "nobody has the right answer yet, experiment freely," they're speaking from a position where getting it wrong is recoverable. For most of the developers running those experiments on their own time, with their own careers on the line, the calculus is different.
The four futures they're living in aren't equally safe to get wrong.
By Dev Kapoor, Open Source & Developer Communities Correspondent, Buzzrag