AI Coding Agents Need Structure, Not Just Speed
Claude Code can accelerate development, but without proper setup—PRDs, constraints, testing frameworks—AI-generated apps fail at scale. Here's the infrastructure.
Written by AI. Marcus Chen-Ramirez
April 11, 2026

Here's the uncomfortable truth about AI coding assistants: they're making it easier than ever to build apps that break in production.
The promise sounds great—spin up an AI agent, describe what you want, and watch it code. But according to a detailed walkthrough from AI LABS, most developers skip the infrastructure that separates functional demos from production-ready software. The result is apps that work perfectly until they don't, AI agents that lose track of their own progress, and code optimized for implementation rather than actual requirements.
The video argues something counterintuitive: AI hasn't changed what matters in software development. It's just changed why it matters.
Six Decades of Process, New Reasons
"The processes that have been set over 60 years of product building are still as important today, just for different reasons," the narrator explains. "Before they were implemented to make sure that humans had a structured way to develop these products, but now that has shifted to enabling AI agents to work the way humans did."
This reframing is interesting. The discipline isn't about compensating for AI's weaknesses—it's about creating an environment where AI can actually function. Without structure, coding agents don't fail gracefully. They just fail.
The setup starts before a single line of code. AI LABS recommends creating a dedicated planner agent that interrogates you about requirements until it can generate a complete Product Requirements Document. Not Claude's built-in planning mode, which skews technical. A separate agent designed to ask product questions.
The distinction matters. Newer models are powerful enough that they don't need granular technical instruction. What they need is clarity about what they're building and why. The planner agent asks questions, refines understanding, then outputs a PRD that becomes the source of truth for everything downstream.
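In Claude Code, a planner agent like this can be set up as a subagent: a markdown file with YAML frontmatter placed in .claude/agents/. The sketch below is illustrative, not taken from the video; the name, tool list, and prompt wording are assumptions about how one might configure it.

```markdown
---
name: product-planner
description: Interrogates the user about requirements and produces a PRD. Use before any implementation work begins.
tools: Read, Write
---

You are a product planner, not an engineer.
1. Ask clarifying questions about users, goals, and scope until none remain.
2. Do not propose architectures or technical solutions.
3. When requirements are clear, write PRD.md: problem statement, user
   stories, functional requirements, and explicit non-goals.
PRD.md is the source of truth for every downstream agent.
```

The key design choice is the prompt's refusal to go technical: the agent's whole job is extracting product intent, which is exactly what the built-in planning mode skews away from.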
The Configuration File You Shouldn't Auto-Generate
Next comes the claude.md file, which functions as the agent's permanent instruction set. AI LABS is adamant: don't use Claude Code's /init command to create this file automatically.
"This command just generates the file based on what the existing codebase is like, not what it actually needs to know," they explain. The file should contain conventions and constraints the AI can't deduce on its own—coding standards, writing style, project-specific rules. Not things it can figure out by reading the file structure.
This file stays loaded in context forever, so bloat is expensive. It's a living document you refine as you work, not a one-time setup. Path-specific rules handle implementation details for particular areas. The claude.md file is for broad principles.
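A minimal sketch of what that looks like in practice — the specific conventions below are hypothetical examples, not rules from the video:

```markdown
# CLAUDE.md — broad principles only; path-specific rules live elsewhere

## Conventions the agent cannot deduce from the codebase
- All user-facing copy is sentence case, en-US spelling.
- Database migrations are additive only; never edit an applied migration.

## Process
- Treat PRD.md as the source of truth; ask before deviating from it.
- Keep entries short. If a rule only applies to one directory,
  move it to a path-specific rule file instead of adding it here.
```

Note what's absent: nothing the agent could learn by reading the file structure, because every line here is paid for in context on every request.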
Then come the agents and skills—specialized configurations for different workflows. A commit agent handles version control with conventional messages. A refactoring agent optimizes performance. A verification agent uses browser automation to test user flows. The video draws a line: repeatable workflows with consistent guidelines become skills. Tasks requiring dedicated context become agents.
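A commit workflow is a good example of the "repeatable workflow with consistent guidelines" case. Claude Code skills are defined in a SKILL.md file with YAML frontmatter; the guidelines below are an illustrative sketch, not the video's exact configuration:

```markdown
---
name: commit
description: Stage and commit completed work using Conventional Commits. Use whenever a task is finished.
---

When committing:
1. Run the test suite first; abort the commit if anything fails.
2. Stage only files related to the current task.
3. Write the message as `type(scope): summary` — e.g.
   `feat(auth): add login`, `fix(billing): handle declined cards` —
   with a short body explaining why the change was made.
```

The same content could be an agent instead; the deciding factor, per the video, is whether the task needs its own dedicated context window.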
What Not To Do
Here's where it gets interesting. Even with all these positive instructions, there's still a gap.
"Agents are biased toward action and may implement things beyond what your positive constraints specify," the narrator notes. "Therefore, you need to explicitly tell the agent what it should not do."
Negative constraints close the ambiguity that positive specs leave open. Don't want Claude defaulting to purple and blue color schemes? Say so explicitly. Don't want it adding features not in the PRD? State that clearly. AI agents optimize toward action, so you need boundaries that prevent experimentation where it's not wanted.
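In practice these boundaries can live in the same claude.md as a block of explicit prohibitions. The items below are illustrative, and the design-tokens.css filename is a hypothetical stand-in for whatever your project uses:

```markdown
## Do not
- Do not add features, pages, or settings that are not in PRD.md.
- Do not introduce new dependencies without asking first.
- Do not invent a color scheme; use only the tokens in design-tokens.css.
- Do not "improve" code outside the files the current task touches.
```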
This feels reminiscent of early prompt engineering advice, but applied at the infrastructure level. You're not just shaping individual outputs—you're shaping the agent's entire operational framework.
Memory That Prevents Amnesia
Two files prove critical as projects scale: progress and learnings.
Without a progress file, agents working on large applications lose track of completed features. They have to re-read implementations, compare against documentation, waste tokens reconstructing context. A single progress document solves this—one place that always reflects current state.
The learnings file serves as institutional memory for errors. When something breaks, the agent logs what went wrong, what caused it, how it was fixed. Next time a similar pattern emerges, it doesn't make the same mistake.
"Since both of these files are meant to be actively updated while the agent is implementing the app, you need to explicitly instruct the agent in the claude.md so that it keeps adding to these files," they emphasize. Self-documenting doesn't happen automatically. You have to structure it into the workflow.
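The video doesn't prescribe a format for either file; one plausible shape, with hypothetical entries, might look like this:

```markdown
<!-- progress.md: one line per feature, always reflecting current state -->
- [x] Auth: email/password signup and login (PRD §2.1)
- [x] Profile page: view and edit display name (PRD §2.3)
- [ ] Billing: checkout flow (PRD §3.1) — blocked on API keys

<!-- learnings.md: symptom → cause → fix, one entry per incident -->
## 2026-04-02 Session cookies dropped in production
- Symptom: users logged out on every request behind the load balancer.
- Cause: cookies set without Secure/SameSite attributes behind an
  HTTPS proxy.
- Fix: trust-proxy enabled, explicit cookie attributes, regression test.
```

The structure matters less than the instruction to maintain it: without an explicit rule in claude.md, neither file gets updated.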
Testing Before Implementation
The most counterintuitive recommendation: write tests before building features.
Most developers—even those using AI—implement first, then add tests. AI LABS argues this creates a fundamental optimization problem. If you ask an agent to test after implementation, it optimizes tests for what was built, not what was specified.
"The agent only knows what was actually implemented. It will optimize tests for the features as they exist, not for the functionality as required in the specifications," the video explains. This means missing edge cases, under-testing features that deviated from specs, and generally writing tests that verify the code rather than the requirements.
Writing tests from the PRD first—before implementation—forces the agent to reverse-engineer functionality from specifications. The tests verify intent, not just code. When implementation happens, those tests catch deviations.
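Concretely, spec-first tests encode the PRD's boundaries rather than the code's behavior. A minimal sketch, assuming a hypothetical PRD requirement ("orders of $100.00 or more get a 10% discount, capped at $50.00"):

```python
# Tests derived from the PRD, written before any implementation existed.
# Hypothetical requirement (PRD §4.2): orders of $100.00 or more get a
# 10% discount, capped at $50.00; smaller orders get no discount.

def discount(total_cents: int) -> int:
    """Minimal implementation, added only after the tests below."""
    if total_cents < 100_00:
        return 0
    return min(total_cents // 10, 50_00)

# Each test verifies a boundary the spec names, not a behavior the
# code happens to have.
def test_below_threshold_gets_nothing():
    assert discount(99_99) == 0

def test_threshold_is_inclusive():
    assert discount(100_00) == 10_00

def test_discount_is_capped():
    assert discount(1_000_00) == 50_00  # 10% would be $100; cap wins

if __name__ == "__main__":
    test_below_threshold_gets_nothing()
    test_threshold_is_inclusive()
    test_discount_is_capped()
    print("all spec tests pass")
```

If the agent's implementation later rounds the threshold differently or forgets the cap, these tests fail — which is exactly the deviation-catching the video describes.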
This is test-driven development, just with AI as the developer. The principle hasn't changed. The actor has.
Issue Tracking From Day One
As apps scale, issue tracking becomes critical infrastructure. AI LABS recommends setting this up before development starts, not after problems emerge.
GitHub works for technical teams—proper commit messages create a breadcrumb trail the agent can follow. If something breaks, you can revert. If you want to experiment, you can use Git worktrees for isolation.
But non-technical stakeholders struggle with GitHub. For them, connecting the agent to Notion or Trello via Model Context Protocol makes issue logging accessible. The agent can create issues, move them across boards, track progress—all integrated into the development workflow.
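Claude Code reads project-level MCP server definitions from a .mcp.json file. A minimal sketch of a Notion connection — the package name and environment variable are assumptions about the Notion MCP server's current conventions, and the token is a placeholder:

```json
{
  "mcpServers": {
    "notion": {
      "command": "npx",
      "args": ["-y", "@notionhq/notion-mcp-server"],
      "env": { "NOTION_TOKEN": "your-integration-token" }
    }
  }
}
```

Once connected, issue creation and board updates become tool calls the agent can make mid-task, rather than a separate system someone has to remember to update.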
Production Isn't Optional
The final piece addresses something most AI-generated code ignores: concurrency.
"AI-generated code is not inherently built to handle multiple users simultaneously," the narrator states. "This is why many people find AI implementations underperforming in production."
The solution is telling the agent your expected user load upfront, then having it generate stress tests accordingly. Tools like K6 can simulate production scale before you're actually there. Claude's planning mode can map approaches for handling concurrent users, identify potential bottlenecks, ensure graceful failure.
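K6 load tests are written as JavaScript scripts; as a language-agnostic sketch of the same idea, here is a minimal concurrent-user simulation in Python. The handler is a stub standing in for a real HTTP call to your app, and the numbers are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id: int) -> float:
    """Stub standing in for a real HTTP call to the app under test."""
    start = time.perf_counter()
    time.sleep(0.01)  # pretend the server did 10 ms of work
    return time.perf_counter() - start

def stress(concurrent_users: int, requests_per_user: int) -> dict:
    """Fire requests from many simulated users at once, report latency."""
    total = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = sorted(pool.map(handle_request, range(total)))
    return {
        "requests": len(latencies),
        "p50_ms": latencies[len(latencies) // 2] * 1000,
        "p95_ms": latencies[int(len(latencies) * 0.95)] * 1000,
    }

if __name__ == "__main__":
    report = stress(concurrent_users=50, requests_per_user=4)
    print(report)  # compare p95 against your latency budget
```

The point is the shape of the exercise, not the tool: pick the user load from the requirements, drive that much concurrency at the app, and check tail latency before production does it for you.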
This is where demos diverge from products. A prototype that works perfectly for one user can collapse under fifty.
The Infrastructure Tax
What strikes me about this setup is how much it resembles traditional software engineering discipline—just distributed differently. You're still writing specs, defining constraints, tracking issues, planning for scale. The AI hasn't eliminated those needs. It's just changed who (or what) needs them explained.
The infrastructure tax for AI development might actually be higher than for human teams, at least initially. Humans can infer context, remember conversations, understand implicit constraints. AI agents need everything explicit, everything documented, everything structured.
But there's a tradeoff. Once you've built that infrastructure, it scales differently. A human team grows linearly—more developers, more communication overhead. AI agents configured properly can parallelize without coordination costs.
The question isn't whether AI speeds up development. It does. The question is whether developers will invest in the infrastructure that makes that speed sustainable, or keep shipping fast prototypes that break slowly in production.
Six decades of software engineering practice suggested one answer. AI development is stress-testing whether we actually learned it.
Marcus Chen-Ramirez is Buzzrag's senior technology correspondent.
Watch the Original Video
You Shouldn't Use Claude Code Without This
AI LABS
13m 12s

About This Source
AI LABS
AI LABS is a burgeoning YouTube channel dedicated to integrating artificial intelligence into software development. Since its inception in late 2025, it has quickly become a valuable resource for developers looking to enhance their coding efficiency with AI tools and models. Despite the lack of disclosed subscriber numbers, AI LABS has carved out a niche as an educational hub for both novice and seasoned developers eager to leverage AI in their projects.