Auditing Vibe-Coded Apps Without Reading Them

Someone hands you 100,000 lines of AI-generated code. It runs on their machine. They want to ship it to real users by Friday. What do you do?

The Brainqub3 channel — a YouTube presence for an AI consultancy of the same name — has a video making the rounds with a direct answer to that question: stop trying to read the code, and make the agent tell you what it actually does instead. The argument is cleaner than it sounds, and it draws on legitimate computer science to get there. But Brainqub3 is also selling something, and that context belongs in the room.

The theory is real

The presenter anchors the framework in two pieces of foundational theory, and the sourcing here is specific enough to take seriously.

The first is Rice's theorem. H. G. Rice published "Classes of Recursively Enumerable Sets and Their Decision Problems" in the Transactions of the American Mathematical Society in 1953; the video links to the DOI, which checks out. The practical upshot: no general procedure can take an arbitrary program and decide whether it has a given non-trivial behavioral property, and "behaves as intended" is such a property. A universal, automatic check of software correctness isn't just expensive. It's provably impossible, even though individual programs can still be proved correct one at a time.
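For readers who want the formal version, the theorem can be stated roughly as follows. The notation is the usual textbook one and is an assumption here, not something taken from the video: \varphi_e stands for the partial computable function computed by program e, and \mathcal{C} is any set of such functions.

```latex
Let $\varphi_e$ denote the partial computable function computed by program $e$,
and let $\mathcal{C}$ be a set of partial computable functions that is
non-trivial: some program computes a function in $\mathcal{C}$, and some
program computes a function outside it. Then the index set
\[
  I_{\mathcal{C}} = \{\, e \in \mathbb{N} : \varphi_e \in \mathcal{C} \,\}
\]
is undecidable.
```

Read \mathcal{C} as "the behaviors that satisfy the spec" and the result says no algorithm can sort every program into "meets the spec" and "doesn't."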

The second is Dijkstra. The video quotes him as saying, "Program testing can be used to show the presence of bugs, but never to show their absence." The line is widely attributed to his 1970 "Notes on Structured Programming" (EWD249), though the wording varies across citations, so readers who want the precise phrasing should consult the EWD249 text directly. The underlying point is uncontested: testing raises confidence, never certainty.
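A toy illustration of Dijkstra's point, as a minimal sketch: the function and the test cases below are hypothetical and do not come from the video. Every check in the suite passes, and the function is still wrong for an input nobody thought to try.

```python
def is_leap_year(year: int) -> bool:
    # Hypothetical buggy implementation: it ignores the century rules
    # (years divisible by 100 are leap years only if divisible by 400).
    return year % 4 == 0


def run_suite() -> bool:
    """Return True if every case in this small test suite passes."""
    cases = {2024: True, 2023: False, 2000: True, 1996: True}
    return all(is_leap_year(year) == expected for year, expected in cases.items())


if __name__ == "__main__":
    print("suite passes:", run_suite())
    # 1900 was not a leap year, but no test in the suite covers it.
    print("is_leap_year(1900):", is_leap_year(1900))
```

A green suite here shows only that four inputs behave as expected. It says nothing about 1900.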

The video rounds this out by citing a 2015 paper in Communications of the ACM — Newcombe, Rath, Zhang, Munteanu, Brooker, and Deardeuff, "How Amazon Web Services Uses Formal Methods" — describing how Amazon's engineers verify services like S3 and DynamoDB. The paper is real and publicly available. Its punchline for this argument: even AWS, with essentially unlimited engineering resources, doesn't try to verify whole services. They model the parts that matter most, layer on static analysis and fault injection, and accept that they will still find subtle bugs. If that's the ceiling for organizations with Amazon's budget, the argument goes, a startup with a vibe-coded prototype should recalibrate its ambitions accordingly.

The logical move Brainqub3 draws from this is defensible: since certainty is off the table, stop chasing it. Aim for coverage of the behaviors that matter in production.

Four principles, one that actually does the work

The video translates this into a four-principle audit framework, and here I'd push back on how evenly it weights the four.

Starting from the live repository rather than the documentation ("if it even exists," the presenter notes, correctly) is table stakes, and any experienced engineer does it reflexively. Tracing representative end-to-end flows to see the real coupling rather than the intended architecture diagram is similarly standard practice when inheriting a codebase.

Reading existing tests as documentation of intended behavior is more interesting, partly because it acknowledges a gap the other principles skip over: tests written by an AI agent during vibe coding may not document what the system should do so much as what the agent assumed it should do. The distinction matters. An agent-generated test suite can achieve high coverage of agent-generated logic without surfacing whether the logic itself matches any human's intent. That's not a dealbreaker, but it means the presenter's caveat — "you read them as documentation of intended behavior... you don't take them on trust" — is doing more work than the framing suggests.
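To make that distinction concrete, here is a small sketch. The shipping rule, the function names, and the "stated intent" are all invented for illustration. The first set of checks was derived from the code, so it passes by construction; the second encodes what a human might actually have asked for, and it fails.

```python
def shipping_fee_cents(order_total_cents: int) -> int:
    # Agent's assumption: free shipping only strictly above 5000 cents.
    return 0 if order_total_cents > 5000 else 499


def agent_written_checks_pass() -> bool:
    """Checks that mirror the implementation, boundary included."""
    return (
        shipping_fee_cents(6000) == 0
        and shipping_fee_cents(5000) == 499
        and shipping_fee_cents(100) == 499
    )


def stated_intent_check_passes() -> bool:
    """Hypothetical requirement: free shipping on orders of 5000 cents or more."""
    return shipping_fee_cents(5000) == 0


if __name__ == "__main__":
    print("agent-written checks pass:", agent_written_checks_pass())
    print("stated-intent check passes:", stated_intent_check_passes())
```

The agent-written checks exercise every branch and still document the wrong boundary, which is why they are evidence of what the agent assumed rather than proof of what the product should do.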

The fourth principle, separating what you observe from what you infer and from what you don't know, is where the actual analytical value lives. The presenter describes it as labeling your confidence, producing "an honest, actionable picture." That's not repackaged common sense. That's a specific epistemic discipline that most people under deadline pressure skip entirely, and it's the one that converts a code audit from a confidence performance into something that might actually catch production failures.
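One way to picture that discipline is as a data structure. The sketch below is an assumption about how such labeling could be recorded; it is not the format used by the presenter's tool, and the class names, fields, and sample findings are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    OBSERVED = "observed"  # seen directly, e.g. a test ran or a request was traced
    INFERRED = "inferred"  # concluded from reading code, never executed
    UNKNOWN = "unknown"    # no evidence either way


@dataclass(frozen=True)
class Finding:
    claim: str
    confidence: Confidence
    evidence: str = ""

    def __post_init__(self) -> None:
        # An "observed" label with no evidence is an inference in disguise.
        if self.confidence is Confidence.OBSERVED and not self.evidence:
            raise ValueError(f"observed finding needs evidence: {self.claim!r}")


def group_by_confidence(findings: list[Finding]) -> dict[Confidence, list[Finding]]:
    """Bucket findings by label so the unknowns cannot hide among the rest."""
    grouped: dict[Confidence, list[Finding]] = {level: [] for level in Confidence}
    for finding in findings:
        grouped[finding.confidence].append(finding)
    return grouped


# Hypothetical findings for an imaginary checkout service.
FINDINGS = [
    Finding("Login rejects a wrong password", Confidence.OBSERVED, "ran tests/test_auth.py"),
    Finding("Refunds are idempotent", Confidence.INFERRED, "read the refund handler"),
    Finding("Behavior when the payment provider times out", Confidence.UNKNOWN),
]

if __name__ == "__main__":
    for level, items in group_by_confidence(FINDINGS).items():
        print(f"{level.value.upper()} ({len(items)})")
        for item in items:
            print(f"  - {item.claim}")
```

The design choice that matters is the guard in `__post_init__`: a claim cannot be filed as observed unless someone records what was actually run or seen, which keeps inferences from being quietly promoted.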

Now, about the business model

The video promotes an open-source tool called architecture-as-is, available on GitHub, that automates the four-principle audit using static analysis, existing tests, and live code execution. The presenter built it and open-sourced it. That's the charitable read, and it's probably the accurate one — the tool appears to be publicly accessible on GitHub, though readers should verify active maintenance status before building it into a production workflow.

But Brainqub3 is also an AI consultancy. The presenter's framing — "on a recent client audit" — is doing commercial work in addition to pedagogical work. The free tool demonstrates the methodology; the methodology creates demand for consultants who can apply it; Brainqub3 provides those consultants. That's a coherent business model and not a scandal, but it's the kind of loop that financial coverage should name rather than leave implicit.

The video also promotes a paid Claude Skills course and a free Claude Skills Playbook, positioned as the solution to "holding a standard across a team" as organizations scale up AI-assisted development. Again: not inherently problematic. But the framing that vibe-coding creates a verification problem, and that Brainqub3's tooling and consulting solve that problem, is advocacy dressed as instruction. The computer science backing the argument is genuine. The boundary between "here's how to think about this" and "here's why you need us" is fuzzier than the video presents.

Who absorbs the failure

Here's the question the video doesn't ask, because it's not the video's job to ask it: when a non-technical founder ships a 100,000-line AI-generated application using a confidence-calibration framework they don't fully understand, and something goes wrong — a data breach, a billing error, a vulnerability — who pays?

Not Brainqub3. Probably not the founder personally, if they've structured their entity correctly. The presenter is direct about where responsibility sits: "You're responsible for any codebase you move into production. If something goes wrong, it's your fault. If there's a data breach, you'll get a call." But "getting a call" and absorbing the cost are different things. Users whose data is exposed absorb that cost. Investors who funded a prototype-to-production timeline absorb it. In some cases, the employees of the company that gets breached absorb it.

The Brainqub3 framework genuinely reduces the risk of a bad outcome. Coverage-based verification is demonstrably better than nothing, which is what most vibe-coded production deployments currently have. But risk reduction isn't risk elimination, and the gap between "I ran the audit tool and got a confidence score" and "this system is safe for real users" is exactly where liability lives. The framework tells you where to look closely. It cannot tell you what you'll find when you do, or whether you'll understand what you find.

The AI development ecosystem has a structural incentive to keep that gap obscure. Tooling vendors profit when non-technical founders believe they can close it with the right tool. Consultancies profit when those founders discover they can't. No single actor in that chain is behaving dishonestly. But the aggregate effect is an industry that keeps selling confidence to people who are buying certainty — and externalizing the cost of the difference onto users who never got a vote.

The architecture-as-is tool points an agent at your repo and produces an honest map of what you know and what you don't. That's genuinely useful. What it can't do is make the unknown safe. Someone has to hold that line, and right now the market hasn't decided who that someone is or what they get paid to do it.

Jin Seo covers business, finance, and economic policy for BuzzRAG.