AI Coding Agents That Run Their Own Loops

There's a specific kind of developer productivity advice that sounds profound until you try it and spend two hours cleaning up the wreckage. "Just let the agent handle it" has been one of those. For most people who've actually used AI coding tools beyond the demo reel, the reality has been more like: agent makes plan, developer reads plan, developer hand-holds agent through execution step by step, developer pastes reviewer comments back into agent, developer merges PR. The human is still the nervous system of the whole operation.

Theo, the developer behind the t3.gg ecosystem and a generally reliable source of signal in a noisy AI tools space, recently documented his own conversion on this point, and the details are worth sitting with.

His central argument, laid out in a recent video: stop writing prompts for your agents. Start designing loops that let agents write prompts for themselves.

That's a sentence that would've been meaningless two years ago. It's worth unpacking what it actually means in practice.

The handholding problem

Theo's starting point is honest about where most developers actually are. He describes his old workflow with some self-awareness: "asking the model to make a plan, reading the plan, saying 'yeah that looks good,' go do this part and then the next part, then having another agent review it, then bringing the feedback back to the first agent." The loop existed. He was just the one running it.

This is the copy-paste era's spiritual successor. We graduated from pasting code snippets out of ChatGPT into our editors. We got AI that could edit files directly. But the cognitive load of orchestrating that work — deciding when to spin up a reviewer, when to pass feedback back, when to trigger the next stage — stayed firmly with the human.

What Theo tried instead was closing that loop at the agent level. Concretely, he told Claude Code to monitor a pull request for incoming review comments from automated tools, address those comments autonomously, and then trigger a re-review. He set this running on a separate machine, in an isolated worktree, and left it alone for more than six hours. When he checked back in, a meaningful amount of review-driven improvement had happened without him touching a keyboard.
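Theo set this up by instructing Claude Code in plain language, not by writing code, so what follows is only a minimal sketch of the control flow such a loop implies. `FakePullRequest`, `address_comment`, and `review_loop` are hypothetical stand-ins, not a real GitHub or Claude Code API; reviewer comments arrive in simulated rounds, and the loop stops when a round comes back empty.

```python
"""Hypothetical sketch of an autonomous PR review loop.

Nothing here is a real GitHub or Claude Code API: FakePullRequest simulates
automated reviewers posting comments in rounds, and address_comment stands in
for the coding agent making a fix.
"""
import time
from dataclasses import dataclass, field


@dataclass
class FakePullRequest:
    # Each inner list is one round of reviewer comments; an empty round
    # (or running out of rounds) means the reviewers have nothing more to say.
    review_rounds: list[list[str]]
    commits: list[str] = field(default_factory=list)
    round_index: int = field(default=0, init=False)

    def fetch_new_comments(self) -> list[str]:
        """Return the comments from the current review round, if any."""
        if self.round_index < len(self.review_rounds):
            return self.review_rounds[self.round_index]
        return []

    def request_rereview(self) -> None:
        """Simulate a re-review by advancing to the next round of comments."""
        self.round_index += 1


def address_comment(pr: FakePullRequest, comment: str) -> None:
    """Placeholder for the agent editing code; here it only records a commit."""
    pr.commits.append(f"fix: {comment}")


def review_loop(pr: FakePullRequest, poll_seconds: float = 0.0,
                max_rounds: int = 10) -> int:
    """Fix comments and request re-review until a round comes back clean.

    Returns the number of review rounds that contained feedback.
    """
    rounds_with_feedback = 0
    for _ in range(max_rounds):  # hard cap so a non-converging loop still stops
        comments = pr.fetch_new_comments()
        if not comments:
            break  # no new feedback: the loop is done
        for comment in comments:
            address_comment(pr, comment)
        rounds_with_feedback += 1
        pr.request_rereview()
        time.sleep(poll_seconds)  # wait before polling the PR again
    return rounds_with_feedback


if __name__ == "__main__":
    pr = FakePullRequest(review_rounds=[["rename helper", "add test"],
                                        ["tighten error message"], []])
    print(review_loop(pr), pr.commits)
```

The `max_rounds` cap is an assumption of this sketch, not part of Theo's setup; it exists so a loop whose reviewers never run out of comments still stops on its own.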

That's not magic. It's plumbing. But it's plumbing that most developers haven't built yet.

Loops that make loops

The more interesting experiment comes when Theo tackles a multi-PR refactor — a complex piece of work on his Lakebed project involving data architecture changes that couldn't reasonably land as a single pull request. After getting the model to break down the work and generate HTML plans for each phase (an organizational pattern he credits to another developer, Thoric), he asked it something he didn't expect to work:

"Would it be possible to make a workflow of some form that first will spin up a separate thread to make the PR, second, spin up another thread to review that PR when it's filed, three, puts the thread from one in a loop reviewing comments until it gets all approvals, and then fourth, the thread would merge the PR and trigger another one for the next piece."

The agent designed a workflow with a heartbeat — polling every five to ten minutes, checking PR status, spinning up fresh review threads on new commits, sending findings back to the implementation thread, and chaining to the next PR on completion. Theo went to sleep. He woke up to four stacked PRs, reviewed and merged.
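To make the shape of that workflow concrete, here is a hypothetical simulation of it. Every name (`PullRequest`, `reviewer_pass`, `implementer_address`, `run_stack`) is invented for illustration, the reviewer's behavior is scripted by a per-phase count of findings, and the five-to-ten-minute sleep between heartbeats is left as a comment so the example runs instantly. It is not the workflow the agent generated for Theo, whose internals the source doesn't show.

```python
"""Hypothetical simulation of a heartbeat-driven stacked-PR workflow.

Every name here is invented for illustration; none of it is a real agent
framework or GitHub API. Reviewer behavior is scripted per phase.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    phase: str
    findings_remaining: int  # review passes that will still report problems
    head_commit: int = 1
    last_reviewed_commit: int = 0
    approved: bool = False
    merged: bool = False


def implementer_open_pr(phase: str, findings: int) -> PullRequest:
    """Stand-in for the implementation thread filing the PR for one phase."""
    return PullRequest(phase=phase, findings_remaining=findings)


def reviewer_pass(pr: PullRequest) -> list[str]:
    """Stand-in for a fresh review thread; an empty result means approval."""
    pr.last_reviewed_commit = pr.head_commit
    if pr.findings_remaining > 0:
        pr.findings_remaining -= 1
        return [f"{pr.phase}: issue found at commit {pr.head_commit}"]
    return []


def implementer_address(pr: PullRequest, findings: list[str]) -> None:
    """Stand-in for the implementation thread pushing a commit that fixes findings."""
    if findings:
        pr.head_commit += 1


def run_stack(plan: list[tuple[str, int]],
              max_beats: int = 100) -> tuple[list[str], int]:
    """Work through the phases in order, one heartbeat per iteration.

    Returns the merged phase names and the number of heartbeats used.
    """
    queue = list(plan)
    merged: list[str] = []
    current: Optional[PullRequest] = (
        implementer_open_pr(*queue.pop(0)) if queue else None)
    beats = 0
    while current is not None and beats < max_beats:
        beats += 1  # a real loop would sleep five to ten minutes here
        if current.head_commit != current.last_reviewed_commit:
            findings = reviewer_pass(current)  # new commit: fresh review
            if findings:
                implementer_address(current, findings)  # route findings back
            else:
                current.approved = True
        if current.approved:
            current.merged = True
            merged.append(current.phase)
            # Chain to the next phase, or stop when the plan is exhausted.
            current = implementer_open_pr(*queue.pop(0)) if queue else None
    return merged, beats


if __name__ == "__main__":
    print(run_stack([("schema", 2), ("backfill", 0), ("reads", 1), ("cleanup", 0)]))
```

With the hypothetical plan in the `__main__` block, all four phases merge after seven heartbeats: each phase costs one beat per round of findings plus one beat for the approving review. The `max_beats` cap is again an assumption of the sketch.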

The part worth pausing on isn't the outcome. It's the structure. This wasn't a hardcoded pipeline someone built in advance. The workflow was generated dynamically, shaped to the specific contours of this specific problem. Different problem, different loop. That's a genuinely different model of software development infrastructure than what most teams are running.

Theo contrasts this with agile sprints, the two-week cycle of backlog grooming and ticket prioritization that most engineering teams treat as fixed geometry. "We kind of had to force our work to fit that shape," he says. "The shape of the loop, the shape of the structure, the shape of how work happens can be dynamically generated based on the shape of the work that you're doing."

The cost question isn't minor

Here's where the honesty gets useful. Theo doesn't paper over the economics. When a loop goes wrong — or even when it goes right but inefficiently — it can burn tokens at a rate that would make you wince. He describes one instance where a reviewer left three relatively small comments on a PR. The Opus-powered response loop ran for eight hours and consumed over three million tokens.

That's not a typo. Eight hours. Three million tokens. Three comments.

His mitigation, essentially, is that subscription plans change the math entirely. On the $200/month Claude Code plan, he tracked roughly $10,000 worth of inference (valued at API rates) across multiple machines in the first 17 days of June, running loops aggressively, including several at once. Billed per token, that would have been financially ruinous. Under the subscription model, it cost him $600 across three accounts.
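The figures above are easier to weigh side by side. This short calculation uses only the numbers reported in this piece, so it inherits their imprecision: "over three million tokens" is a lower bound, and the $600 is a monthly subscription fee while the $10,000 covers 17 days, so the ratio would understate the gap if usage continued at that pace through the rest of the month.

```python
"""Arithmetic on the figures reported in the article (not independent data)."""

RUNAWAY_TOKENS = 3_000_000  # "over three million tokens", so a lower bound
RUNAWAY_HOURS = 8
RUNAWAY_COMMENTS = 3

API_EQUIVALENT_USD = 10_000  # "roughly $10,000" of inference at API rates
DAYS_TRACKED = 17            # first 17 days of June
PLAN_PRICE_USD = 200         # per account, per month
ACCOUNTS = 3


def summarize() -> dict[str, float]:
    """Derive per-comment, per-hour, and per-day rates from the reported totals."""
    paid = PLAN_PRICE_USD * ACCOUNTS  # $600 in subscription fees
    return {
        "tokens_per_comment": RUNAWAY_TOKENS / RUNAWAY_COMMENTS,
        "tokens_per_hour": RUNAWAY_TOKENS / RUNAWAY_HOURS,
        "paid_usd": paid,
        "api_equivalent_per_day_usd": API_EQUIVALENT_USD / DAYS_TRACKED,
        "api_to_paid_ratio": API_EQUIVALENT_USD / paid,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.1f}")
```

That works out to about a million tokens per review comment, 375,000 tokens per hour for the runaway loop, roughly $588 of API-equivalent inference per day, and an API-equivalent value about 16.7 times what he actually paid.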

The implication is structural: agentic loops at this scale are currently a feature of subscription pricing, not a general best practice. If you're paying per token, "let the agent loop" is advice that could get expensive fast. If you're on a flat-rate plan and not approaching your limits, you're effectively leaving compute on the table.

Theo is transparent that he's on a $200 plan and that this shapes his calculus. That context matters: the advice reads differently for a solo developer paying for API credits than for someone whose subscription is running at 30% of its capacity.

What this isn't

It's worth noting what Theo explicitly doesn't claim. He's not arguing for fully autonomous loops that ship production code to millions of users with no oversight. "I am not at the fully autonomous loop point yet," he says, and he distances himself from the "code is just happening by itself" posture some other developers have adopted.

He also pushes back on a specific flavor of agentic setup that's become fashionable: pre-defining roles and personas for sub-agents in markdown files. The adversarial reviewer. The security auditor. The grooming agent. His critique is that hardcoding these roles misses the actual value of dynamic agents — they can determine what context they need and what role they should play based on the problem in front of them. Scaffolding that in advance is, in his framing, like creating a project template where every file already exists and you just fill in the blanks.

That's a reasonable critique, though the tradeoff isn't entirely one-sided. Predictable, auditable workflows have their own value in team settings, especially where compliance or security review matters. Dynamic loops that self-organize are impressive; they're also harder to reason about after the fact.

The underlying shift

What Theo is documenting is less a technique than a change in where the developer's attention should live. His practical recommendation: map everything you do after your agent finishes a task — running the dev server, checking if things work, committing, pushing, filing the PR, collecting reviewer feedback, addressing it, merging — and then ask whether the agent could do each of those steps itself.
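One way to carry out that audit is to write the checklist down and mark which steps are already delegated. The step names below follow Theo's list; the `Step` type, the helper functions, and the choice to keep merging human-gated are hypothetical, an example of the configuration a team with review requirements might pick rather than Theo's own setup.

```python
"""Hypothetical audit of the post-task checklist.

Step names follow the article; which steps are delegated is an example
configuration, not Theo's setup.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    name: str
    agent_can_do: bool  # True once the agent reliably handles this step itself


POST_TASK_STEPS = [
    Step("run the dev server", True),
    Step("check that things work", True),
    Step("commit", True),
    Step("push", True),
    Step("file the PR", True),
    Step("collect reviewer feedback", True),
    Step("address the feedback", True),
    Step("merge", False),  # example: a team that keeps merges human-gated
]


def human_steps(steps: list[Step]) -> list[str]:
    """Steps that still need a person; each is a candidate to hand off next."""
    return [s.name for s in steps if not s.agent_can_do]


def first_handoff(steps: list[Step]) -> Optional[str]:
    """Where the agent's run stops and waits on a human (None if nowhere)."""
    return next((s.name for s in steps if not s.agent_can_do), None)


if __name__ == "__main__":
    print("still manual:", human_steps(POST_TASK_STEPS))
    print("agent waits at:", first_handoff(POST_TASK_STEPS))
```

Flipping a step's flag to `True` moves the handoff point further down the list; when `first_handoff` returns `None`, the whole post-task sequence is delegated.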

"If you are reading the code your agent put out before another agent read it and gave feedback on it," he argues, "you're wasting your own time."

That framing will land differently depending on how much you trust current models to catch one another's mistakes. There's a real question about whether agent-reviews-agent actually catches the errors the original agent made, or whether two systems with similar priors just validate each other's blind spots. Theo's answer, implicitly, is: try it and find out. His results were good enough to keep running.

The honest version of this story is that we're watching one developer's discovery process in real time. The loops worked for his project, on his infrastructure, with his risk tolerance. The patterns he's identifying — monitor PRs for feedback, close the review loop autonomously, let the model design the workflow structure rather than hardcoding it — are likely to become standard practice. The question of when they're ready for code that actually has millions of users on the other end is one the industry hasn't answered yet.

Marcus Chen-Ramirez is a senior technology correspondent for Buzzrag covering AI, software development, and the places where technology meets the rest of life.