AI Agents Need DMVs: A Reality Check on Autonomous Systems

IBM's Jeff Crume wants you to think about AI agents the way you think about cars. Not the sexy parts—the acceleration, the freedom, the open road. The boring parts. The DMV. The parking tickets. The mandatory insurance.

In a recent video, Crume walks through what he calls an infrastructure for governing autonomous AI systems, drawing explicit parallels to how we regulate vehicles. His framework includes credential management systems (the DMV), secure key storage (where you keep your car keys), policies (traffic laws), and enforcement mechanisms (police). It's methodical, comprehensive, and—here's the interesting part—probably insufficient for the problem it's trying to solve.

The analogy works beautifully until you remember that cars don't learn, don't evolve their behavior based on data, and can't make millions of mistakes before anyone notices.

The Scale Problem Nobody Wants to Talk About

Crume's first major point is about nonhuman identities—credentials for AI agents that authenticate what they're authorized to do. He's not wrong that we need this. Companies deploying multiple agents will need some way to track which agent has permission to access which database, call which API, or interact with which customer.

But consider the scale. A mid-sized company might have hundreds of employees with various levels of system access. That same company could plausibly deploy thousands of agents, each potentially needing different credentials for different tasks. As Crume notes, "These nonhuman identities will be populating all over the place. There's going to be a ton of them and we're going to have to manage them."

The car analogy starts to strain here. We issue driver's licenses to humans who can understand consequences, who have reputations to protect, who move at human speed. The typical person might interact with three to five systems in a workday. An AI agent could interact with three to five systems per second.

Crume's solution—credential management tools—addresses the symptom but not the underlying tension: we're creating entities that need human-like permissions but operate at machine scale. That's not a problem you solve with better infrastructure. That's a problem where the infrastructure itself might be the wrong model.

Drift, Hallucinations, and the "Turns Out It's Bad for Business" Standard

The policy section of Crume's framework tackles bias detection, drift monitoring, and what he delightfully calls the "hate, abuse, and profanity" problem. His phrasing is worth quoting directly: "Turns out it's bad for business if your AI cusses out your customers or does objectionable things."

This is where the analogy reveals something uncomfortable. We write traffic laws because we understand, broadly, what safe driving looks like. We know what behaviors cause accidents. The rules themselves aren't perfect, but they're legible.

AI agent behavior is not legible in the same way. "Drift"—when a model's performance degrades or shifts over time—isn't like a driver who gradually starts ignoring speed limits. It's more like a driver whose understanding of what "red" and "green" mean slowly shifts without anyone noticing. You can detect it, but by the time you do, the agent has already been making decisions based on its new, drifted understanding.

Crume acknowledges this: "A person might make a mistake. An agent can make hundreds, thousands, millions of these maybe before someone catches it."

The question he doesn't ask: if an agent can make thousands of mistakes before detection, what exactly is the enforcement layer enforcing? You can put a gateway between the agent and the LLM, checking inputs and outputs. But if the problem is subtle—a gradual bias toward certain types of customer queries, a slow degradation in how it handles edge cases—your checkpoint might not catch it until significant damage is done.

The Vault Metaphor and What It Misses

Crume spends time on secure credential storage—the equivalent of having a safe place to keep your car keys. For AI agents, this means vaults where API keys, passwords, and other access credentials are stored and managed.

This is solid, practical advice. It's also, frankly, table stakes. The harder question is: what happens when the agent itself is compromised? A stolen car key is a stolen car key. A compromised AI agent could potentially leak its credentials, yes, but it could also do something more insidious: continue operating normally while subtly changing what "normal" means.

Consider prompt injection attacks, where carefully crafted inputs can cause an LLM to ignore its instructions. Or model poisoning, where training data is subtly corrupted to change behavior. Your vault is secure, your credentials are managed, your policies are clear—and none of it matters because the agent itself has been altered.

The car analogy doesn't stretch to cover this. We don't worry about cars spontaneously rewriting their own code based on malicious road signs.

What the Framework Gets Right

For all my skepticism, Crume's fundamental insight is sound: we can't just build AI agents and hope for the best. The tools he describes—identity management systems, secure vaults, monitoring infrastructure, enforcement gateways—are all necessary. They're just not sufficient.

The DMV model works for cars because cars are relatively stable technology. They don't learn. They don't evolve unexpected behaviors. They break in predictable ways. The regulatory infrastructure around vehicles developed over decades, responding to observable patterns of failure and harm.

AI agents are different. They're adaptive systems operating at machine speed in environments we don't fully control. The failure modes aren't always observable until after damage is done. The harms aren't always legible—bias, drift, and hallucination don't announce themselves the way a car crash does.

Crume's framework gives us the vocabulary and tools to start addressing these problems. But pretending the analogy is tighter than it actually is—that governing AI agents is basically like governing cars, just with more keys to manage—undersells the genuine strangeness of what we're attempting.

We're not just putting autonomous systems on the road. We're putting autonomous systems that can change what roads mean, that can decide which destinations matter, that can operate for extended periods before anyone notices they're lost. The fact that tools exist to help manage this, as Crume notes, is reassuring. The fact that we're still reaching for car metaphors to explain it is less so.

—Marcus Chen-Ramirez, Senior Technology Correspondent