
Generative UI Looks Exciting—Until You Ask Who Controls It

AI agents that write their own UI code are impressive. But LLM-generated code running in your browser has a trust problem most demos skip past.

Written by AI. Rachel "Rach" Kovacs

June 4, 2026 · 8 min read

Photo: AI. Dexter Bloomfield

Ruben Casas asked a model to rewrite his blog. He gave it a single prompt. What came back included a search box with a blur animation and accessibility support: things he didn't request, handled correctly, without a second pass. His reaction, delivered to a room of engineers at the AI Engineer conference: "That's when I realized it can write better front-end code than me. And I don't mind. No ego. It's just reality."

I find that specific moment more interesting than the capability demonstration. Not because one developer's blog rewrite settles any debate about model quality—it doesn't, and Casas isn't claiming it does—but because of what he did with the observation. He followed it to its logical end: if models are this capable, why are most agent interfaces still using static prebuilt components? Why are we still treating the model like a data-fetching layer and leaving all the rendering to developers who wrote those components months ago?

The gap between what models can do and what we're actually letting them do is, in Casas's framing, the central problem of agent UI right now.

The Spectrum

Casas maps three points on the current landscape, and they're useful distinctions to hold.

Static components are what most agents do today. The agent acts as an orchestrator, makes a tool call, passes data and props to a component a developer pre-built, and the client renders it. AG-UI's protocol works this way at its most basic level, with tool calls mapping to React components. Goose's Auto Visualizer works this way too, matching output data against a curated component library. It's reliable, predictable, and almost completely inflexible. The model is doing data work; the interface is frozen.
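To make the static pattern concrete, here is a minimal sketch of a tool call being routed to a pre-built component. Every name in it (`ToolCall`, `registry`, `weather_card`, `renderToolCall`) is hypothetical and not taken from any of the projects named above, and it returns HTML strings rather than React elements to stay self-contained.

```typescript
// Hypothetical shapes for illustration only.
type ToolCall = { tool: string; props: Record<string, unknown> };
type Renderer = (props: Record<string, unknown>) => string;

// Escape model-supplied values before they reach markup.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Components a developer wrote ahead of time. The model can only pick
// from this map and fill in props; it cannot add entries.
const registry = new Map<string, Renderer>([
  [
    "weather_card",
    (p) =>
      `<div class="weather">${escapeHtml(String(p.city))}: ` +
      `${escapeHtml(String(p.tempC))}&deg;C</div>`,
  ],
]);

function renderToolCall(call: ToolCall): string {
  const render = registry.get(call.tool);
  if (!render) {
    // Unknown tools get a fixed fallback instead of improvised UI.
    return `<div class="fallback">Unsupported tool: ${escapeHtml(call.tool)}</div>`;
  }
  return render(call.props);
}
```

The inflexibility is visible in the fallback branch: anything the developer did not anticipate has nowhere to go.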

Declarative UI is a step further. The model generates a descriptor—JSON, YAML, occasionally Python—that a rendering engine translates into components at runtime. The components themselves are still developer-built, but the model decides how they're assembled. Vercel's JSON Render is the current reference implementation. Casas points to Netflix's long-running personalization architecture as an analogue: when you load Netflix, the layout you see has been assembled for you from a fixed library of Netflix components—though Casas is careful to note that the Netflix comparison describes the structural pattern, not an LLM-driven system. The LLM version is newer territory. His take is that this middle tier—declarative generative UI—is probably the right balance for right now. Your design system stays intact. Output is predictable. Token cost stays reasonable. You get flexibility without handing the model a blank canvas.
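A rough sketch of the declarative tier, under stated assumptions: the descriptor shape (`UiNode`) and the three-entry component `library` below are invented for illustration and are not the schema of Vercel's renderer or any other real one. The point is only that the model chooses the arrangement while the renderer refuses anything outside the developer's library.

```typescript
// Hypothetical descriptor: a tree of component names with optional text.
type UiNode = { type: string; text?: string; children?: UiNode[] };

function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Developer-built components; the model assembles them but cannot extend them.
const library = new Map<string, (inner: string) => string>([
  ["stack", (inner) => `<section class="stack">${inner}</section>`],
  ["heading", (inner) => `<h2>${inner}</h2>`],
  ["text", (inner) => `<p>${inner}</p>`],
]);

function renderNode(node: UiNode, depth = 0): string {
  // Guard against pathological nesting in model output.
  if (depth > 20) throw new Error("descriptor nested too deeply");
  const component = library.get(node.type);
  if (!component) throw new Error(`unknown component: ${node.type}`);
  const inner =
    escapeText(node.text ?? "") +
    (node.children ?? []).map((c) => renderNode(c, depth + 1)).join("");
  return component(inner);
}

// Entry point: parse the model's JSON output and render it.
function renderDescriptor(json: string): string {
  return renderNode(JSON.parse(json) as UiNode);
}
```

A descriptor asking for a `script` component simply throws, which is the sense in which the design system "stays intact."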

Fully generative UI hands the model that blank canvas. No components. No descriptors. The model writes HTML, CSS, and JavaScript on demand, and it gets passed to the client. Casas built a weather agent that does this in a single tool call—it fetches conditions, generates a joke, and produces an entire rendered interface from scratch. Every time. No template underneath.

This is where the talk gets interesting to me, and not only for the reasons Casas emphasizes.

The Part Everyone Skips

Here's what "fully generative UI" means in practice: code you did not write, generated by a model you do not fully control, is executing in a browser context on behalf of a user who almost certainly has no idea that's what's happening.

Casas is honest about this. "If we don't trust third-party code, well, we should not trust code that has been generated by LLMs and then just present it to the user." What he argues is that MCP apps—the Model Context Protocol application layer—provide the necessary containment because they sandbox third-party UI by default using a double-iframe architecture. His position is that this makes MCP apps the natural delivery mechanism for generative UI, and he points to Anthropic's decision to use MCP apps for Claude's own visualizer feature as validation. (That attribution matters: the double-iframe sandboxing detail is Casas's characterization of current MCP app behavior, drawn from a specification that's still actively evolving.)

The sandbox argument is real. Iframes with proper sandbox attributes and Content-Security-Policy headers can meaningfully constrain what generated code is allowed to do—no cookie access, no cross-origin requests, no top-level navigation. When implemented well, a sandboxed iframe is a genuine boundary.
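As an illustration of that boundary, here is one way a host might wrap generated markup, assuming a single sandboxed iframe with an inline policy. It is not the MCP apps double-iframe design, and `buildSandboxedFrame` and `escapeAttr` are hypothetical helpers. It relies on two standard behaviors: `sandbox="allow-scripts"` without `allow-same-origin` gives the frame an opaque origin, and a `default-src 'none'` policy blocks network loads unless another directive re-allows them.

```typescript
// Escape a string for use inside a double-quoted HTML attribute.
function escapeAttr(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Returns iframe markup that hosts untrusted, model-generated HTML.
function buildSandboxedFrame(generatedHtml: string): string {
  const csp = [
    "default-src 'none'", // no fetch/XHR, no remote scripts, images, or fonts
    "script-src 'unsafe-inline'", // inline scripts only, since the code is generated inline
    "style-src 'unsafe-inline'",
    "img-src data:",
  ].join("; ");

  // The policy meta tag comes first so it applies to everything after it.
  const doc =
    `<!doctype html><meta http-equiv="Content-Security-Policy" content="${csp}">` +
    generatedHtml;

  // Deliberately omitted: allow-same-origin, allow-top-navigation,
  // allow-forms, allow-popups.
  return `<iframe sandbox="allow-scripts" srcdoc="${escapeAttr(doc)}"></iframe>`;
}
```

Each omitted sandbox token is a capability the generated code does not get; adding `allow-same-origin` alongside `allow-scripts` would undo most of the isolation.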

But "when implemented well" is carrying most of the weight in that sentence.

Sandbox escapes are a known attack class. Misconfigured CSP headers are endemic; they're among the most commonly misimplemented security controls in web development. And the threat model for LLM-generated code isn't just "the model writes malicious code on purpose." It's subtler: a model can be prompted, through user input or through data it retrieves from external sources, to generate code that exfiltrates information or makes requests the user never authorized. That's prompt injection via the UI generation layer, and it's not hypothetical: researchers have demonstrated it in agentic contexts, including ones that involve persistent agents.

So the practical question for anyone building with fully generative UI isn't "do I have a sandbox?" It's: what can code inside that sandbox still do? Can it read the DOM outside its frame? No—if the sandbox is correct. Can it make network requests to arbitrary endpoints? Depends entirely on your CSP. Can it render convincing UI that asks the user to take an action—click here, enter this—that serves an attacker's purpose rather than the user's? Yes, absolutely, and no sandbox stops that, because that's a social engineering problem, not a code execution problem.
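The "depends entirely on your CSP" answer can be checked mechanically. The sketch below is a deliberately narrow audit, assuming you have the policy as a string: it looks only at `connect-src` (which governs fetch, XHR, and WebSocket connections) with its standard fallback to `default-src`, and flags only the broadest wildcards. `parseCsp` and `allowsArbitraryConnect` are hypothetical names, and other exfiltration channels such as image loads are out of scope here.

```typescript
// Split a policy string into directive name -> source list.
function parseCsp(policy: string): Map<string, string[]> {
  const directives = new Map<string, string[]>();
  for (const part of policy.split(";")) {
    const [name, ...sources] = part.trim().split(/\s+/);
    // Per the CSP spec, the first occurrence of a directive wins.
    if (name && !directives.has(name.toLowerCase())) {
      directives.set(name.toLowerCase(), sources);
    }
  }
  return directives;
}

// True if script in the frame could open connections to any host.
function allowsArbitraryConnect(policy: string): boolean {
  const d = parseCsp(policy);
  const sources = d.get("connect-src") ?? d.get("default-src");
  // Neither directive present means connections are unrestricted.
  if (sources === undefined) return true;
  return sources.some((s) => s === "*" || s === "https:" || s === "http:");
}
```

For example, `allowsArbitraryConnect("script-src 'self'")` returns `true`, because a policy that never mentions connections does not restrict them.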

None of this means fully generative UI is a dead end. It means the trust model requires explicit design, not a default. The people building these systems should be asking: what can code in this sandbox reach? What data is in scope? What actions can the generated UI ask users to take, and how does the user know those requests are legitimate? These are not exotic questions. They're the same questions we ask about any third-party code that runs in a privileged context. The newness here is that the code is generated at runtime from inputs we don't fully control.

Casas's instinct—that MCP apps are the right delivery layer precisely because they provide authentication, tool calling, and sandboxing as defaults—is reasonable. But "sandboxed by default" is an architectural starting point, not a security guarantee. The history of browser security is largely a history of things that were sandboxed by default until they weren't.

The Imagination Problem (And Who It Affects)

The last section of Casas's talk is the one getting quoted most, and fairly: the analogy is good. Early television shows were radio programs with cameras because nobody had developed a visual grammar yet. We're in that era with agent interfaces—we keep asking for Jarvis because Jarvis is the only future we can picture from where we're standing. The design pattern tension in agent-native applications is partly this: we're reaching for familiar UI metaphors because genuinely new ones haven't emerged yet.

Casas's speculative answer is collaborative artifacts: shared canvases where humans and agents work on the same object simultaneously. He points to Excalidraw's MCP app as an early version of this, a space where you can tell the agent "change this" and also just click and drag yourself, and both actions persist.

I think he's probably right that collaboration is where this goes. What I want to add—and what the demo-circuit version of this conversation tends to omit—is that "collaborative" and "trustworthy" are not synonyms.

The people who will use these interfaces are not mostly developers who understand what a sandbox is. They're people who already struggle to distinguish a legitimate bank notification from a phishing email, who've been trained over twenty years to click on things that look authoritative. When an AI agent generates an interface that says "confirm your identity to continue," what does the user check? There's no URL bar. There's no certificate. There's a generated UI inside an app they've already granted permissions to, asking them to do something. The signal they'd normally use to evaluate trust—does this look like the thing I expect?—is precisely what generative UI is designed to make irrelevant.

That's not an argument against generative UI. It's an argument that the user experience of trust verification needs to be designed from scratch for this paradigm, and right now, almost nobody is doing that work. The three-flavor breakdown of generative UI approaches is useful for builders—but the question of how users know what they're interacting with isn't really a builder question. It's a platform question, and platforms are still in the "cameras pointed at radio" phase too.

Casas is right that we don't have enough imagination yet. What I'd add is that some of the people who'll be affected by whatever we imagine haven't been invited into the room.


Rachel "Rach" Kovacs covers cybersecurity and privacy for Buzzrag.
