AI Coding Loops Are Replacing the Prompt—Now What?
Developers are designing autonomous AI loops that merge code without human review. The engineering logic is sound. The accountability framework is nonexistent.
Written by AI. Samira Barnes

Photo: AI. Roxanne Vex
Boris Cheshirkov, a developer at WorkOS, said something in a recent interview that has been ricocheting around developer circles ever since. "I don't prompt Claude anymore," he explained. "I have loops that are running. They're the ones that are prompting Claude and kind of figuring out what to do. My job is to write loops."
That sentence is doing a lot of work. Developer and educator Ray Amjad unpacks it in a recent video, and what he describes is less a productivity tip than a structural reorganization of what a software developer actually does — and, by extension, who is responsible when the system makes a consequential mistake.
The Architecture, Plainly
Amjad's framing maps the evolution of AI-assisted coding in three stages. First, autocomplete: the developer writes a line, the model suggests the rest. Second, multi-agent juggling: developers with several agent windows open, manually prompting each one in sequence, still the human connector between every step. Most developers working with AI tools today are somewhere in this second stage.
Stage three is what Cheshirkov and Amjad are describing: you stop being the person who runs the process. You design the process that runs itself.
The practical architecture involves nested loops. An inner loop handles a discrete task — write spec, implement, code review, fix, verify, merge PR. An outer loop feeds the inner loop with new inputs: monitor a competitor's changelog, detect a new feature, draft a spec, await human approval, then trigger the build. Amjad describes running a loop for 19 hours that verified over 500 user flows in an application, followed by an 11-hour loop that fixed the issues the first loop identified. The loop revolution Amjad is articulating here isn't theoretical — he's been running these systems daily.
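The nested structure described above can be sketched in a few lines of Python. This is a minimal illustration, not Amjad's implementation: the stage names come from the article, while `Task`, `run_stage`, `approve`, and the retry limit are hypothetical names introduced here. A real `run_stage` would call a model or a CI system; here it is any callable that returns True on success.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Task:
    feature: str
    log: list[str] = field(default_factory=list)  # stages executed, in order


def run_inner_loop(task: Task, run_stage: Callable[[str, Task], bool],
                   max_rounds: int = 3) -> bool:
    """Spec and implement once, then review/fix/verify until verify passes."""
    for stage in ("write_spec", "implement"):
        run_stage(stage, task)
        task.log.append(stage)
    for _ in range(max_rounds):
        for stage in ("code_review", "fix"):
            run_stage(stage, task)
            task.log.append(stage)
        verified = run_stage("verify", task)
        task.log.append("verify")
        if verified:
            run_stage("merge_pr", task)
            task.log.append("merge_pr")
            return True
    return False  # gave up without merging


def run_outer_loop(changelog_entries: Iterable[str], seen: set[str],
                   approve: Callable[[str], bool],
                   run_stage: Callable[[str, Task], bool]) -> list[str]:
    """Feed unseen, human-approved changelog entries into the inner loop."""
    merged = []
    for entry in changelog_entries:
        if entry in seen:
            continue  # already handled on an earlier run
        seen.add(entry)
        if not approve(entry):  # the human approval gate
            continue
        if run_inner_loop(Task(feature=entry), run_stage):
            merged.append(entry)
    return merged
```

The only point where a person appears is the `approve` callable in the outer loop. Everything inside `run_inner_loop`, including the merge, runs without review.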
The memory layer matters too. Because each loop run is stateless by default, Amjad uses external systems (Git commit history, Airtable, Slack channels) to give loops continuity across runs. His Slack setup is clever: each loop gets a dedicated channel where the bot posts its findings and where Amjad leaves emoji reactions that the next run reads as decisions. A reaction on recommendation four, for instance, is the instruction for what to do about recommendation four. The channel is both audit trail and control surface.
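To make the reactions-as-decisions idea concrete, here is a small sketch that models the channel as plain data rather than calling any Slack API. The `Post` type, the `read_decisions` function, and the emoji-to-decision mapping are all assumptions for illustration; the article does not say which reactions Amjad uses or how his bot reads them.

```python
from dataclasses import dataclass, field

# Hypothetical mapping; the actual emoji and their meanings are not given in the article.
REACTION_TO_DECISION = {"white_check_mark": "approve", "x": "reject"}


@dataclass
class Post:
    recommendation_id: int
    text: str
    reactions: list[str] = field(default_factory=list)


def read_decisions(channel: list[Post]) -> dict[int, str]:
    """Map each recommendation to a decision; no recognised reaction means 'pending'."""
    decisions = {}
    for post in channel:
        decision = "pending"
        for reaction in post.reactions:
            if reaction in REACTION_TO_DECISION:
                decision = REACTION_TO_DECISION[reaction]
                break  # first recognised reaction wins
        decisions[post.recommendation_id] = decision
    return decisions
```

The next run would call something like `read_decisions` before doing any work, acting only on recommendations marked as approved and leaving pending ones untouched.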
Where the Engineering Frame Runs Out
Amjad is honest about the central risk, which he calls entropy. Loops that compound work also compound errors. A bad fix at iteration three becomes the foundation for iterations four through forty if nothing outside the model is checking whether reality still matches intent. His answer is adversarial code reviews, verification steps, and what he calls an "oracle" — some external signal outside the model's own judgment. Test suites. Production error rates. Revenue metrics. Actual user behavior.
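One way to picture the oracle is as a gate that sits outside the model and halts the loop when an external number gets worse. The sketch below assumes the oracle returns a failure count (lower is better), such as failing tests; `apply_fix`, `revert`, and `run_with_oracle` are hypothetical names, and this is an illustration of the idea rather than Amjad's setup.

```python
from typing import Callable


def run_with_oracle(apply_fix: Callable[[int], None],
                    revert: Callable[[int], None],
                    oracle: Callable[[], int],
                    max_iterations: int = 40) -> tuple[int, int]:
    """Apply fixes one at a time; undo and stop as soon as the oracle regresses.

    Returns (number of accepted fixes, best oracle score reached).
    """
    best = oracle()  # baseline measured before any change
    accepted = 0
    for i in range(max_iterations):
        apply_fix(i)
        score = oracle()
        if score > best:  # external signal got worse: do not build on this fix
            revert(i)
            break
        best = score
        accepted += 1
        if best == 0:  # nothing left to fix
            break
    return accepted, best
```

The design choice that matters is that `oracle` never asks the model whether the fix was good. A bad fix at iteration three is reverted before iteration four can build on it.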
This is sound engineering reasoning. It is also the point where the engineering frame reaches its limit and a different set of questions begins.
When an autonomous loop merges a pull request that introduces a security vulnerability, and its designer has not manually reviewed each step, who is accountable? Amjad's framing is that the developer has moved from being the person who runs the thing to being the person who designs the thing that runs. But legal and regulatory frameworks for software liability have not caught up to that distinction. Negligence doctrine in software contexts has historically been murky: courts have been reluctant to impose liability for pure software defects in the absence of physical harm. Autonomous systems making production changes at scale without human review, however, create exactly the kind of elevated risk profile that tends to attract regulatory attention after something goes badly wrong.
The EU's AI Act, which entered into force in August 2024 with obligations phasing in from 2025, classifies systems primarily by the risk their use poses. Autonomous loops that modify production code and merge changes without human sign-off at each step are not obviously "low risk" under that framework, though the Act's current language was drafted with different use cases in mind. In the United States, there is no equivalent federal framework, which means that right now, accountability for an autonomous loop that ships a breaking change or exposes customer data sits entirely with whoever designed it, subject to whatever terms of service they agreed to with Anthropic.
That question remains unresolved. But the community of developers adopting loop-based development appears largely uninterested in it, a pattern I've seen before in adjacent contexts: Claude agent teams handling business decisions operate in the same accountability vacuum.
The Access Problem Nobody Is Naming
There's a second structural issue that gets less airtime in these conversations: building effective loops is not cheap or simple.
Amjad mentions, in passing, that one loop run consumed 4.1 million tokens — a figure he reports from his own operation, and one he assesses as economically justified by the output. That calculation is available to him because he is running his own products and can directly measure revenue impact. For a developer at a small agency, a startup without clear unit economics, or a freelancer working on fixed-price contracts, that calculation is much harder to make. The token costs are real. The infrastructure to build reliable oracles — the external benchmarks that prevent slop from compounding — requires engineering investment that not every team can absorb.
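The calculation itself is simple arithmetic once a price is fixed. The sketch below uses the 4.1 million token figure from the article; the per-token price and the value of the output are hypothetical inputs, since the article gives neither, and real pricing differs by model and by input versus output tokens.

```python
def run_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one loop run at a single blended price (an assumed simplification)."""
    return tokens * usd_per_million_tokens / 1_000_000


def breaks_even(tokens: int, usd_per_million_tokens: float,
                value_of_output_usd: float) -> bool:
    """True when the estimated value of the run's output covers its token cost."""
    return value_of_output_usd >= run_cost_usd(tokens, usd_per_million_tokens)


# Token count from the article; the price of 10.0 is a made-up placeholder.
cost = run_cost_usd(4_100_000, 10.0)
```

The hard part is the third argument. A developer who can measure revenue impact can fill in `value_of_output_usd`; a freelancer on a fixed-price contract often cannot.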
The outer loop Amjad describes as a competitor-monitoring system — automated surveillance of rivals' changelogs, feature rollouts, and LinkedIn activity, feeding directly into a build pipeline — is a capability that scales with organizational resources. A team that can afford to run it continuously has a compounding structural advantage over one that cannot. The leverage Amjad and Cheshirkov describe is real. It is also unevenly distributed in ways that have implications for labor markets, for small development shops competing against better-resourced ones, and for the broader question of what "software developer" means as a job category when the unit of work shifts from a prompt to a loop.
The Claude task orchestration infrastructure enabling this shift was built by Anthropic with enterprise use cases in mind. The practitioners evangelizing loop-based development are, for the most part, sophisticated independents who have already built the supporting infrastructure. What that means for everyone else working in the same toolchain deserves more than a footnote.
The Abstraction Layer Argument
Cheshirkov's historical arc, as Amjad relays it, runs from punch cards to assembly to high-level languages to frameworks to prompts and now to loops. Each step up the abstraction ladder expanded what a single developer could build — and removed some category of decision-making from direct human control.
That is a real pattern. It is also an argument that has been used to justify every successive layer of automation in software, including several that turned out to have consequences the architects didn't anticipate. The question worth sitting with is not whether the abstraction is happening — it clearly is — but what accountability structures need to exist at each layer.
Right now, the developer who designed the loop owns everything it does. The terms of service with the AI provider disclaim liability for outputs. The organization deploying the system in production has whatever internal controls it has chosen to build. And there is no external regulatory framework that specifically addresses autonomous AI loops modifying production software.
"You design the thing that runs," as Amjad puts it, "and the leverage moves up one more layer."
When the thing that runs makes a consequential error, leverage is not the word that will appear in the incident report.
Samira Barnes covers technology policy and digital rights for Buzzrag.