Dark Code: When AI Writes Software Nobody Actually Understands
AI-generated code is shipping to production with no human comprehension. It's not a security problem—it's an organizational capability crisis.
Written by AI · Dev Kapoor
April 14, 2026

Photo: AI News & Strategy Daily | Nate B Jones / YouTube
There's code running in production right now—at companies you use daily—that nobody can fully explain. Not the engineer who shipped it. Not the team that owns the service. Not the CTO. The code works. It passes tests. And no human on the payroll understands what it does, why it does it, or what happens if it stops.
The industry calls this "dark code"—behavior in production that nobody can trace end to end. It's not buggy code or technical debt. It's code that was never understood by anyone at any point because it was generated by AI, passed automated checks, and shipped. The comprehension step simply didn't happen, not because someone was careless, but because the process no longer requires it.
Nate B Jones, who covers AI strategy, frames this as bigger than a security or quality issue. "It's really an organizational capability problem. It's got regulatory exposure elements. It's got business liability elements," he argues in a recent video breaking down the dark code phenomenon. His central claim: if you're building software now, there's a fundamental shift in what competence looks like, and dark code sits at the heart of it.
Why Dark Code Multiplies
The problem has two reinforcing components. First, there's a structural element: when AI generates code, it's inherently harder to understand because you didn't write it yourself. You didn't think through the trade-offs or wrestle with the implementation details. Second, there's velocity pressure. AI enables speed, and the industry is making trillion-dollar bets on that speed. When you combine velocity with structural opacity, comprehension decouples from authorship unless you take deliberate measures to prevent it.
Even simple uses of AI coding tools create dark code. Type "please make this in Lovable" and watch it generate something functional—that's dark code. You shipped it. You probably can't explain all of it.
The Obvious Solutions Don't Work
The instinct is to treat this as a tooling problem. Make everything observable—instrument every service, measure what breaks. Jones acknowledges the value of telemetry but draws a sharp line: "That doesn't mean the same thing as comprehension. That doesn't solve your dark code problem. It just means you can measure what dark code is breaking for you in production."
Another common response: better agentic pipelines with guardrails and orchestration. Again, useful for reducing risk, but it doesn't solve comprehension. "If you're adding layers to your agent pipeline, that is also not actually solving your dark code problem. That is just adding a layer," Jones notes. Now when something breaks, you're troubleshooting both the dark code and the pipeline.
Then there's the YOLO approach—just ship it and deal with consequences later. Jones points to Factory.ai as a disciplined version of this, where extraordinary testing and evaluation might proxy for human understanding. But most organizations yoloing code into production lack that discipline. The result: distributed authorship without clear ownership. Marketing teams manage websites, PMs can vibe code to a point, then engineering takes over. Nobody owns the complete package.
The issue compounds as AI gets stronger. Better models make it easier to convince yourself everything's fine—the AI understands its own code, the AI will fix problems. But as Jones points out, it's hard to know when AI is overconfident. Even AI-native companies like Anthropic and OpenAI invest heavily in evals, telemetry, and pipeline understanding while still having engineers commit PRs, review code, and maintain comprehension. They don't assume AI is magical.
Meanwhile, layoffs across the industry are making dark code worse. Fewer engineers, more code to maintain, less time for understanding. And this isn't just an engineering problem—it's a board-level concern touching SOC 2 compliance, encryption at rest, regulatory liability.
What Actually Works
Jones proposes a three-layer approach that treats dark code as an organizational capability problem rather than a technical one.
Layer One: Spec-Driven Development
Force understanding before code exists. Not the 2010s-era overdocumentation for its own sake, but enough clarity to write down what you're building. "As long as you understand what you want to build and can write it out clearly, go," Jones says. This is organizational discipline—resisting both the urge to bury yourself in artifacts and the urge to just start vibing code.
Amazon learned this the expensive way. After their December outage, they rebuilt their coding tool Kiro to lead with spec-driven development, turning prompts into requirements and task lists before generating code. "When the company that learned this lesson the hardest bakes it into the product, maybe we should all learn that lesson," Jones suggests.
The spec becomes the eval. Write a clear specification, and you have the test that the agent keeps trying to pass. It's not complicated—it just requires discipline.
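To make "the spec becomes the eval" concrete, here is a minimal sketch in Python. The `shorten` function, its requirements, and the check structure are hypothetical illustrations, not from the video; the point is that each sentence of the spec becomes one executable check that any generated implementation must pass before shipping.

```python
# Hypothetical spec for a URL-shortening helper, written as prose that
# maps one-to-one onto executable checks below.
SPEC = """
shorten(url) returns an 8-character slug.
The same URL always maps to the same slug (deterministic).
Invalid URLs raise ValueError instead of returning a slug.
"""

import hashlib

def shorten(url: str) -> str:
    # Candidate implementation (could be AI-generated; the checks,
    # not the author, decide whether it ships).
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not a valid URL: {url!r}")
    return hashlib.sha256(url.encode()).hexdigest()[:8]

def run_spec_checks() -> list[str]:
    """Each requirement in the spec becomes one concrete check."""
    failures = []
    if len(shorten("https://example.com")) != 8:
        failures.append("slug must be 8 characters")
    if shorten("https://example.com") != shorten("https://example.com"):
        failures.append("slug must be deterministic")
    try:
        shorten("not-a-url")
        failures.append("invalid URLs must raise ValueError")
    except ValueError:
        pass
    return failures

if __name__ == "__main__":
    print(run_spec_checks())  # an empty list means the spec is satisfied
```

An agent can loop against `run_spec_checks` until the list is empty, which is exactly the "spec as the test the agent keeps trying to pass" discipline described above.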
Layer Two: Self-Describing Systems
Make comprehension embedded in the code itself through what Jones calls "context engineering." This isn't agent self-reporting—it's restructuring your codebase so understanding is immediately legible to humans and agents.
This breaks into three components: structural context (manifests describing what each module does, its dependencies, and dependents), semantic context (rules of engagement like performance expectations, failure modes, retry semantics), and behavioral contracts. The goal is code understanding that doesn't live in people's heads but in the system itself.
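One way to make those three components concrete is a per-module manifest checked in next to the code and validated in CI. The field names, module names, and values below are hypothetical, not a format Jones prescribes; what matters is that dependencies, dependents, performance expectations, and contracts live in the repo rather than in anyone's head.

```python
# A hypothetical per-module manifest: structural context (dependencies,
# dependents), semantic context (latency budget, retry semantics, failure
# mode), and a behavioral contract, all machine-readable and reviewable.
MANIFEST = {
    "module": "billing.invoices",
    "purpose": "Generates and persists customer invoices.",
    "depends_on": ["billing.tax", "storage.postgres"],
    "depended_on_by": ["notifications.email"],
    "semantic": {
        "p99_latency_ms": 250,
        "retries": "3 attempts, exponential backoff",
        "failure_mode": "queue invoice for replay; never drop",
    },
    "contract": "Every persisted invoice has a unique, immutable ID.",
}

REQUIRED_KEYS = {"module", "purpose", "depends_on", "depended_on_by",
                 "semantic", "contract"}

def validate_manifest(manifest: dict) -> list[str]:
    """Flag missing context so incomplete manifests fail CI, not review."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS - manifest.keys()]
    if not manifest.get("contract"):
        problems.append("behavioral contract must not be empty")
    return problems

print(validate_manifest(MANIFEST))  # → []
```

Because the manifest is structured data, both humans and agents can read it, and a drift check (does `depends_on` match actual imports?) can be automated later.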
Layer Three: The Comprehension Gate
Create a filter that catches what the first two layers miss—a mechanism that surfaces the questions a senior engineer would ask before approving a PR. Why did you call that dependency here? Why cache in a location other services can't read? How are you thinking about separation of concerns?
This gate serves two purposes: it makes code readable and accountable, and it feeds back into your evaluation process to improve code quality over time. The promise is improving both speed and quality simultaneously, which sounds impossible until you realize the alternative is shipping code nobody understands.
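A comprehension gate could start as something very simple: a CI check that refuses to merge until the PR description contains a non-empty answer to each senior-engineer question. The question list and description format below are hypothetical illustrations of the idea, not a prescribed implementation.

```python
# Hypothetical CI gate: the PR description must answer every
# comprehension question before the merge is allowed.
QUESTIONS = [
    "Why this dependency?",
    "Why this cache location?",
    "How are concerns separated?",
]

def comprehension_gate(pr_description: str) -> list[str]:
    """Return the questions left unanswered in the PR description."""
    unanswered = []
    for q in QUESTIONS:
        # A question counts as answered if it appears in the
        # description and is followed by some text.
        pos = pr_description.find(q)
        if pos == -1 or not pr_description[pos + len(q):].strip():
            unanswered.append(q)
    return unanswered

pr = """
Why this dependency? We need retry semantics the old client lacks.
Why this cache location? Shared Redis so sibling services can read it.
How are concerns separated? Parsing lives in parse.py, I/O in client.py.
"""
print(comprehension_gate(pr))  # an empty list means the gate passes
```

The answers themselves become an artifact: they can be sampled into the evaluation process described above, so the gate improves code quality over time rather than just blocking merges.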
The Stakes For Different Roles
For engineering leaders, the question isn't about observability or pipeline quality—those are table stakes. The real question: "Do I have mechanisms that enable me to make the dark code that I'm producing legible so that I know where I'm driving?" Otherwise, as Jones puts it, "you really are driving with the headlights off."
Founders face both competitive advantage and existential risk. Many vibe code to market fast, listening only to the speed part of startup advice, trying to sell products built on code they don't understand. Jones argues transparency becomes a differentiator: "You can stand out so easily as a founder if you just know the code in this day and age. Make it legible, explain your trade-offs, be transparent."
For senior engineers used to reviewing code by hand, the adjustment is massive. But Jones argues you can't avoid using AI to help understand code because volume expectations won't decrease. You need "lenses on the code that help you to see more farther clearer" rather than glancing at PR reviews from Cursor or Claude and thinking, "Ah, it's probably fine."
That last instinct—the one where you trust the automated review and move on—is where the organizational capability problem lives. Dark code isn't a technical challenge waiting for better tools. It's a question of whether organizations can maintain comprehension at AI speed, or whether we're collectively comfortable with production systems nobody fully understands.
The industry is making that choice right now, one shipped PR at a time, mostly without realizing there's a choice being made.
—Dev Kapoor
Watch the Original Video
I Looked At Amazon After They Fired 16,000 Engineers. Their AI Broke Everything.
AI News & Strategy Daily | Nate B Jones
18m 41s

About This Source
AI News & Strategy Daily | Nate B Jones
AI News & Strategy Daily, spearheaded by Nate B. Jones, offers a focused exploration into AI strategies tailored for industry professionals and decision-makers. With two decades of experience as a product leader and AI strategist, Nate provides viewers with pragmatic frameworks and workflows, bypassing the industry hype. The channel, which launched in December 2025, has quickly become a trusted resource for those seeking to effectively integrate AI into their business operations.