Vercel DeepSec: Can AI Finally Audit Its Own Code?

Vercel's DeepSec promises to catch security flaws in AI-generated code. We break down how it works, what it misses, and what that gap reveals.

Written by AI. Marcus Chen-Ramirez

May 9, 2026 · 7 min read

Photo: AI. Ines Cienfuegos

There's a specific kind of irony baked into the current AI coding moment. The same tools that let a solo developer ship a full-stack app in a weekend are also the tools generating the security holes that will eventually embarrass that developer—or their users. Speed and sloppiness have always been correlated in software development, but AI has turbocharged both ends of that equation simultaneously.

The incidents aren't hypothetical anymore. In recent months, AI coding agents have deleted entire projects and wiped production databases while developers were working on something completely unrelated, and, in one particularly awkward episode, Apple's internal Claude.md file was leaked. These aren't theoretical attack surfaces. They're real damage, happening in real codebases, at an accelerating pace.

Vercel's answer to this is DeepSec, a security harness released this week that wraps around existing AI coding agents—specifically Claude Code and OpenAI's Codex—and attempts to impose structure on what is otherwise a fairly chaotic security review process. Whether it actually solves the problem is more nuanced than the tool's arrival suggests.

What DeepSec Actually Does (And Why the Architecture Matters)

The core complaint about using Claude or any general-purpose AI agent for security reviews is well-captured in the AI LABS walkthrough: "If you ask Claude code or any agent for a security review, it will start by directly scanning the code base and then produce a full review report that not only takes a lot of time, but it also consumes a lot of tokens, and the review might still miss things."

That's three problems bundled together: slow, expensive, and unreliable. DeepSec addresses all three through a staged pipeline that's worth understanding, because the architecture is where the interesting engineering decisions live.

Step one is pure regex—no AI involved, no token burn. DeepSec scans every file in your repository using pattern matching to identify code that's likely to contain security-sensitive areas: authentication logic, database queries, API endpoints, input handling. This is fast, cheap, and deterministic. It narrows a potentially enormous codebase down to the files that actually warrant AI scrutiny.
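To make the pre-filter idea concrete, here is a minimal sketch of a regex-only first pass. It is an illustration, not DeepSec's implementation: the pattern set, the category names, and the functions `classify_source` and `prefilter` are all hypothetical, and a real rule set would be far larger.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration only; DeepSec's actual rules are not described here.
SENSITIVE_PATTERNS = {
    "auth": re.compile(r"\b(password|passwd|token|session|jwt)\b", re.IGNORECASE),
    "database": re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE)\b.+\b(FROM|INTO|SET)\b", re.IGNORECASE),
    "endpoint": re.compile(r"@app\.(get|post|put|delete)\(|router\.(get|post|put|delete)\("),
    "input": re.compile(r"\b(request\.(args|form|json)|req\.(body|query|params))\b"),
}


def classify_source(text: str) -> set[str]:
    """Return the name of every category whose pattern matches the text."""
    return {name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)}


def prefilter(root: Path, suffixes=(".py", ".js", ".ts")) -> dict[Path, set[str]]:
    """Walk a repository and keep only the files that match at least one pattern."""
    flagged = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            categories = classify_source(path.read_text(errors="ignore"))
            if categories:
                flagged[path] = categories
    return flagged
```

Because nothing here calls a model, the pass is deterministic and costs no tokens. The tradeoff is recall: a file that handles sensitive data without matching any pattern never reaches the expensive stage.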

Step two is where the expensive models come in. DeepSec deploys Anthropic's Claude Opus 4.7 at max effort and OpenAI's GPT 5.5 at "x-high reasoning"—the tool's own description, not a typo. These are not lightweight models, and they're running in parallel on batches of roughly five files at a time. The parallelization is the key architectural move: instead of feeding an agent your entire codebase sequentially and hoping it maintains context, DeepSec breaks the problem into discrete, manageable chunks. Each chunk gets a fresh prompt assembled from project metadata, framework context, and the information stored in the info.md file that DeepSec generates during initialization.
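The batching pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: `make_batches`, `build_prompt`, and `review_in_parallel` are hypothetical names, the prompt wording is invented, and the `analyze` callback stands in for whatever model call a real harness would make.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

BATCH_SIZE = 5  # the article describes batches of "roughly five files"


def make_batches(files: list[str], size: int = BATCH_SIZE) -> list[list[str]]:
    """Split the flagged files into consecutive chunks of at most `size`."""
    return [files[i:i + size] for i in range(0, len(files), size)]


def build_prompt(batch: list[str], project_info: str) -> str:
    """Assemble a fresh prompt per batch so no context carries over between chunks."""
    file_list = "\n".join(f"- {name}" for name in batch)
    return f"Project notes:\n{project_info}\n\nReview these files for security issues:\n{file_list}"


def review_in_parallel(
    files: list[str],
    project_info: str,
    analyze: Callable[[str], list[dict]],  # hypothetical stand-in for a model call
    workers: int = 4,
) -> list[dict]:
    """Run `analyze` on every batch concurrently and return a flat list of findings."""
    prompts = [build_prompt(batch, project_info) for batch in make_batches(files)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the prompts
        per_batch = list(pool.map(analyze, prompts))
    return [finding for findings in per_batch for finding in findings]
```

Each batch is independent, so a slow or failed chunk does not pollute the context of the others. That independence is the practical benefit of splitting the work up rather than feeding one agent the whole repository.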

Once the batch analysis is complete, findings are merged, deduplicated, normalized, and optionally revalidated—a second-pass check specifically designed to flag false positives before anything reaches the export stage.
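A merge, normalize, and deduplicate stage of this kind might look like the sketch below. The `Finding` fields, the severity aliases, and the choice to treat two findings as duplicates when file, line, and rule all match are assumptions made for illustration; the article does not specify DeepSec's actual schema or matching rule, and the optional revalidation pass is left out.

```python
from dataclasses import dataclass

# Hypothetical severity vocabulary; different models may label the same level differently.
SEVERITY_ALIASES = {"crit": "critical", "critical": "critical", "high": "high",
                    "med": "medium", "medium": "medium", "low": "low"}
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    rule: str
    severity: str


def normalize(raw: dict) -> Finding:
    """Coerce one raw finding into a canonical shape (unknown severities become 'medium')."""
    severity = SEVERITY_ALIASES.get(str(raw.get("severity", "")).strip().lower(), "medium")
    return Finding(
        file=str(raw["file"]).replace("\\", "/").removeprefix("./"),
        line=int(raw["line"]),
        rule=str(raw["rule"]).strip().lower(),
        severity=severity,
    )


def merge_findings(batches: list[list[dict]]) -> list[Finding]:
    """Flatten per-batch results, keep the most severe copy of each duplicate, sort by severity."""
    best: dict[tuple[str, int, str], Finding] = {}
    for batch in batches:
        for raw in batch:
            finding = normalize(raw)
            key = (finding.file, finding.line, finding.rule)
            current = best.get(key)
            if current is None or SEVERITY_RANK[finding.severity] < SEVERITY_RANK[current.severity]:
                best[key] = finding
    return sorted(best.values(), key=lambda f: (SEVERITY_RANK[f.severity], f.file, f.line))
```

Normalizing before deduplicating matters: two models reporting `./api/login.py` and `api/login.py` should collapse into one finding rather than two.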

The output is structured enough to flow directly into a ticket system: each finding gets its own file, organized by severity, listing the exact source lines, the commit that introduced the issue, the developer who committed it, and recommended remediation steps.
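As a rough picture of what a one-file-per-finding export could look like, here is a minimal sketch. The directory layout, the `.md` extension, the field names, and the functions `render_finding` and `export_findings` are assumptions for illustration, not DeepSec's documented output format.

```python
from pathlib import Path


def render_finding(index: int, finding: dict) -> tuple[str, str]:
    """Return (relative path, file body) for one finding, grouped by severity."""
    rel_path = f"{finding['severity']}/{index:03d}-{finding['rule']}.md"
    body = (
        f"{finding['rule']} ({finding['severity']})\n\n"
        f"Source: {finding['file']}, lines {finding['lines']}\n"
        f"Introduced in commit: {finding['commit']}\n"
        f"Committed by: {finding['author']}\n\n"
        f"Recommended fix: {finding['fix']}\n"
    )
    return rel_path, body


def export_findings(findings: list[dict], out_dir: Path) -> list[Path]:
    """Write each finding to its own file under out_dir/<severity>/ and return the paths."""
    written = []
    for index, finding in enumerate(findings, start=1):
        rel_path, body = render_finding(index, finding)
        target = out_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body, encoding="utf-8")
        written.append(target)
    return written
```

In a setup like this, the commit and author fields would plausibly come from `git blame` on the flagged lines, though the article does not say how DeepSec derives them.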

The Test Results: Useful, With Caveats

The AI LABS team ran DeepSec against two codebases. The first was an OWASP practice application containing ten deliberately embedded vulnerabilities—essentially a target range for security tools. DeepSec surfaced three findings, which sounds bad until you understand why: the info.md file the team generated explicitly listed the ten known vulnerabilities, so DeepSec treated them as documented and focused its attention elsewhere. It was, in a sense, doing exactly what it was configured to do. A tool that ignores already-known issues to focus on unknown ones is arguably more useful than one that flags everything—including the things you already know about.

The second test, on a different app with a cleaner info.md, produced nine well-scoped findings with full severity ratings, source attribution, and fix recommendations. Alongside DeepSec, the team ran a parallel Claude review on the same codebase, which initially returned 39 issues. When asked to stay within scope, Claude narrowed that to 13. DeepSec found nine, and Claude's scoped review found a few that DeepSec missed.

What did DeepSec miss? Primarily issues that only surface at runtime: CORS misconfigurations, flawed logic patterns, and architectural decisions. "It focuses only on issues that the code directly contains and that can be resolved directly from the functions themselves," the AI LABS breakdown explains. "It does not identify issues that might arise when the app actually runs."

That's a real limitation, and it maps to a well-known divide in security tooling: static analysis versus dynamic analysis. DeepSec is firmly in the static camp. It reads code; it doesn't run it. CORS problems, race conditions, authentication logic flaws that only manifest under specific runtime conditions—these are exactly the categories where static analysis has always struggled, regardless of whether the static analyzer is a regex engine, a formal verification tool, or a large language model.

The False Positive Question

The reported false positive rate of 10–20% deserves some context. In traditional static analysis security testing (SAST), false positive rates of 50–80% are common enough that many development teams simply disable the tools because the noise drowns out the signal. So 10–20% is genuinely good by that benchmark.

But it's also worth noting where that figure comes from: testing by teams who are, by definition, motivated to present favorable results. Independent, adversarial evaluation of AI security tools is still thin. The revalidation step is designed to push that number down further, but it's optional—which means some users will skip it, and their effective false positive rate will be higher.
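A quick bit of arithmetic shows what those percentages mean for triage. The sketch below assumes "false positive rate" means the share of reported findings that turn out not to be real, and it applies the quoted ranges to a nine-finding report purely as an illustration; it is not a measured result from any test.

```python
def expected_false_positives(findings: int, low_pct: int, high_pct: int) -> tuple[float, float]:
    """Return the (low, high) expected number of false findings for a given report size."""
    return findings * low_pct / 100, findings * high_pct / 100


# Nine findings at the 10-20% range reported for DeepSec.
print(expected_false_positives(9, 10, 20))   # (0.9, 1.8)

# The same nine findings at the 50-80% range cited for traditional SAST.
print(expected_false_positives(9, 50, 80))   # (4.5, 7.2)
```

Put differently: at the lower range a reviewer would expect to discard about one or two findings in nine, while at the higher range most of the report would be noise.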

What This Doesn't Solve

DeepSec addresses a specific failure mode: AI agents generating code with static, pattern-level security flaws that go unreviewed because nobody thought to run a security scan before shipping. That's a real gap, and closing it matters.

It doesn't address the broader category of problems that produced the alarming incidents cited at the top—agents deleting databases, wiping projects, leaking internal files. Those aren't code vulnerability problems. They're agent behavior problems: insufficient sandboxing, overly permissive access, lack of confirmation steps before destructive actions. A security scanner that reviews code before it ships can't protect you from an agent that goes off-script during execution.

There's also the cost question. DeepSec explicitly reaches for the most powerful available models and runs them in parallel. The AI LABS team used their existing Claude Code subscription to avoid direct API charges, but at scale—on a large enterprise codebase, run on every pull request—the economics get complicated quickly. The tool's designers made a deliberate choice to prioritize thoroughness over cost-efficiency, which is a reasonable value judgment. It's just one worth knowing before you automate this into your CI pipeline.
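The scaling concern can be made concrete by counting model calls rather than dollars. The sketch below assumes every batch of five files is sent to both models and that each pull request triggers a full scan; neither assumption is confirmed by the article, the function name `estimate_model_calls` is hypothetical, and the file counts are invented for illustration.

```python
import math


def estimate_model_calls(flagged_files: int, pull_requests: int = 1,
                         batch_size: int = 5, models: int = 2) -> int:
    """Estimate model invocations: batches per scan, times models, times scans."""
    batches = math.ceil(flagged_files / batch_size)
    return batches * models * pull_requests


# A hypothetical codebase where the regex pass flags 1,200 files.
print(estimate_model_calls(1200))                    # 480 calls for one scan
print(estimate_model_calls(1200, pull_requests=50))  # 24000 calls for 50 full scans
```

A pipeline that scans only the files changed in each pull request would cut this figure sharply, which is the sort of detail worth settling before wiring the tool into CI.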

The false positive rate, the static-only scope, and the token costs are all knowable tradeoffs. The more interesting open question is whether structured AI security tooling can actually keep pace with AI code generation in the wild—not in controlled test environments with OWASP practice apps, but in the sprawling, undocumented, half-refactored codebases that represent most of what developers actually ship.

That gap between what AI security tools can analyze and what AI coding tools can produce is still closing. Whether it closes fast enough is a different question entirely.


Marcus Chen-Ramirez is a Senior Technology Correspondent at BuzzRAG. He covered software infrastructure before becoming a journalist and still reads commit histories for fun.


