Generative UI Looks Exciting—Until You Ask Who

Ruben Casas asked a model to rewrite his blog. He gave it a single prompt. What came back included a search box with a blur animation and accessibility support: things he didn't request, handled correctly, without a second pass. His reaction, delivered to a room of engineers at the AI Engineer conference: "That's when I realized it can write better front-end code than me. And I don't mind. No ego. It's just reality."

I find that specific moment more interesting than the capability demonstration. Not because one developer's blog rewrite settles any debate about model quality—it doesn't, and Casas isn't claiming it does—but because of what he did with the observation. He followed it to its logical end: if models are this capable, why are most agent interfaces still using static prebuilt components? Why are we still treating the model like a data-fetching layer and leaving all the rendering to developers who wrote those components months ago?

The gap between what models can do and what we're actually letting them do is, in Casas's framing, the central problem of agent UI right now.

The Spectrum

Casas maps three points on the current landscape, and they're useful distinctions to hold.

Static components are what most agents do today. The agent acts as an orchestrator, makes a tool call, passes data and props to a component a developer pre-built, and the client renders it. AG UI's protocol works this way at its most basic level—tool calls mapping to React components. Goose's Auto Visualizer works this way too, matching output data against a curated component library. It's reliable, predictable, and almost completely inflexible. The model is doing data work; the interface is frozen.
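To make the static pattern concrete, here is a minimal sketch of a tool-call-to-component registry. It is not AG UI's or Goose's actual API; the names `staticRegistry`, `renderToolCall`, and `weather_card` are hypothetical, and the "components" are plain functions returning HTML strings rather than React components.

```typescript
// Hypothetical sketch: each tool name maps to a component written ahead of time.
type StaticRenderer = (props: Record<string, unknown>) => string;
type ToolCall = { tool: string; props: Record<string, unknown> };

// Escape text before placing it in markup, since props come from model output.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The fixed library: the model can only pick an entry and fill in its props.
const staticRegistry = new Map<string, StaticRenderer>([
  [
    "weather_card",
    (p) =>
      `<div class="weather-card">${escapeHtml(String(p.city))}: ${Number(p.tempC)}&deg;C</div>`,
  ],
]);

function renderToolCall(call: ToolCall): string {
  const component = staticRegistry.get(call.tool);
  if (!component) {
    // Anything outside the library falls back to a fixed message.
    return `<div class="fallback">Unsupported tool: ${escapeHtml(call.tool)}</div>`;
  }
  return component(call.props);
}
```

The inflexibility is visible in the fallback branch: a request the developer did not anticipate cannot produce a new interface, only the fixed message.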

Declarative UI is a step further. The model generates a descriptor—JSON, YAML, occasionally Python—that a rendering engine translates into components at runtime. The components themselves are still developer-built, but the model decides how they're assembled. Vercel's JSON Render is the current reference implementation. Casas points to Netflix's long-running personalization architecture as an analogue: when you load Netflix, the layout you see has been assembled for you from a fixed library of Netflix components—though Casas is careful to note that the Netflix comparison describes the structural pattern, not an LLM-driven system. The LLM version is newer territory. His take is that this middle tier—declarative generative UI—is probably the right balance for right now. Your design system stays intact. Output is predictable. Token cost stays reasonable. You get flexibility without handing the model a blank canvas.
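A rough sketch of the declarative pattern, under stated assumptions: the descriptor shape (`type`, `props`, `children`) and the component names `Stack`, `Heading`, and `Text` are invented for illustration and are not the schema used by Vercel's JSON Render. The point is only that the model chooses the arrangement while the renderer refuses anything outside the developer-built library.

```typescript
// Hypothetical descriptor shape: a tree of nodes naming developer-built components.
type UINode = { type: string; props?: Record<string, unknown>; children?: UINode[] };
type DeclaredComponent = (props: Record<string, unknown>, children: string) => string;

function escapeText(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// The allow-list is the design system: the model can assemble, not invent.
const allowedComponents = new Map<string, DeclaredComponent>([
  ["Stack", (_props, children) => `<div class="stack">${children}</div>`],
  ["Heading", (props) => `<h2>${escapeText(String(props.text ?? ""))}</h2>`],
  ["Text", (props) => `<p>${escapeText(String(props.text ?? ""))}</p>`],
]);

function renderNode(node: UINode): string {
  const component = allowedComponents.get(node.type);
  if (!component) {
    throw new Error(`Unknown component type: ${String(node.type)}`);
  }
  // Children are rendered first, so every level passes through the allow-list.
  const children = (node.children ?? []).map(renderNode).join("");
  return component(node.props ?? {}, children);
}

// Parse the model's JSON output and render it. A production renderer would
// validate the parsed value against a schema before trusting its shape.
function renderDescriptor(json: string): string {
  return renderNode(JSON.parse(json) as UINode);
}
```

For example, a descriptor of `{"type":"Stack","children":[{"type":"Heading","props":{"text":"Hi"}}]}` renders a heading inside a stack, while a descriptor that names a component outside the list is rejected before anything reaches the page.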

Fully generative UI hands the model that blank canvas. No components. No descriptors. The model writes HTML, CSS, and JavaScript on demand, and it gets passed to the client. Casas built a weather agent that does this in a single tool call—it fetches conditions, generates a joke, and produces an entire rendered interface from scratch. Every time. No template underneath.

This is where the talk gets interesting to me, and not only for the reasons Casas emphasizes.

The Part Everyone Skips

Here's what "fully generative UI" means in practice: code you did not write, generated by a model you do not fully control, is executing in a browser context on behalf of a user who almost certainly has no idea that's what's happening.

Casas is honest about this. "If we don't trust third-party code, well, we should not trust code that has been generated by LLMs and then just present it to the user." What he argues is that MCP apps—the Model Context Protocol application layer—provide the necessary containment because they sandbox third-party UI by default using a double-iframe architecture. His position is that this makes MCP apps the natural delivery mechanism for generative UI, and he points to Anthropic's decision to use MCP apps for Claude's own visualizer feature as validation. (That attribution matters: the double-iframe sandboxing detail is Casas's characterization of current MCP app behavior, drawn from a specification that's still actively evolving.)

The sandbox argument is real. Iframes with proper sandbox attributes and Content-Security-Policy headers can meaningfully constrain what generated code is allowed to do—no cookie access, no cross-origin requests, no top-level navigation. When implemented well, a sandboxed iframe is a genuine boundary.
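As an illustration of what such a boundary can look like, here is a minimal sketch that wraps generated HTML in a sandboxed `srcdoc` iframe with a restrictive policy. It is not the MCP apps double-iframe design; the function name `buildSandboxedFrame` and the specific policy string are assumptions chosen for the example.

```typescript
// Assumed policy for this sketch: block every fetch by default, allow only
// inline scripts and styles so the generated UI can still run and look right.
const GENERATED_UI_CSP =
  "default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'";

// Escape a string for use inside a double-quoted HTML attribute value.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function buildSandboxedFrame(generatedHtml: string): string {
  // The CSP is delivered as a <meta> tag placed ahead of the generated markup.
  const doc =
    `<!doctype html><meta http-equiv="Content-Security-Policy" ` +
    `content="${GENERATED_UI_CSP}">${generatedHtml}`;
  // sandbox="allow-scripts" without allow-same-origin gives the frame an
  // opaque origin, so it cannot use the embedding page's cookies or storage.
  return `<iframe sandbox="allow-scripts" srcdoc="${escapeAttr(doc)}"></iframe>`;
}
```

This is a starting point rather than a complete boundary. A policy like this restricts fetches and subresources, but it does not by itself rule out every route for data to leave, such as the frame navigating itself to another URL.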

But "when implemented well" is carrying a lot of weight in that sentence.

Sandbox escapes are a known attack class. Misconfigured CSP headers are endemic; CSP is one of the most commonly misimplemented security controls in web development. And the threat model for LLM-generated code isn't just "the model writes malicious code on purpose." It's subtler: a model can be prompted, through user input or through data it retrieves from external sources, to generate code that exfiltrates information or makes requests the user never authorized. That's prompt injection via the UI generation layer, and it's not hypothetical: researchers have demonstrated indirect prompt injection against agentic systems that act on retrieved content.

So the practical question for anyone building with fully generative UI isn't "do I have a sandbox?" It's: what can code inside that sandbox still do? Can it read the DOM outside its frame? No—if the sandbox is correct. Can it make network requests to arbitrary endpoints? Depends entirely on your CSP. Can it render convincing UI that asks the user to take an action—click here, enter this—that serves an attacker's purpose rather than the user's? Yes, absolutely, and no sandbox stops that, because that's a social engineering problem, not a code execution problem.
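One way to start answering "what can code inside that sandbox still do?" is to read the iframe's `sandbox` attribute and list what each granted token opens up. The sketch below is a hypothetical helper, not part of any MCP tooling; the token names are standard `sandbox` values, but the choice of which to flag and the wording of each risk are assumptions made for the example.

```typescript
type SandboxFinding = { token: string; risk: string };

// Standard sandbox tokens this sketch treats as worth a second look.
const RISKY_TOKENS = new Map<string, string>([
  ["allow-same-origin", "frame keeps its real origin, including that origin's cookies and storage"],
  ["allow-top-navigation", "frame can navigate the top-level page"],
  ["allow-popups-to-escape-sandbox", "windows opened by the frame are not sandboxed"],
  ["allow-forms", "frame can submit forms, including to external endpoints"],
]);

function auditSandbox(sandboxAttr: string): SandboxFinding[] {
  const tokens = sandboxAttr
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((t) => t.toLowerCase());

  const findings: SandboxFinding[] = tokens
    .filter((t) => RISKY_TOKENS.has(t))
    .map((t) => ({ token: t, risk: RISKY_TOKENS.get(t) ?? "" }));

  // This pairing is flagged separately: when the framed document shares the
  // embedder's origin, script inside the frame can remove the sandbox itself.
  if (tokens.includes("allow-scripts") && tokens.includes("allow-same-origin")) {
    findings.push({
      token: "allow-scripts + allow-same-origin",
      risk: "same-origin content can script its way out of the sandbox",
    });
  }
  return findings;
}
```

An empty result does not mean the frame is safe. It says nothing about the CSP, and nothing about what the rendered interface asks the user to do.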

None of this means fully generative UI is a dead end. It means the trust model requires explicit design, not a default. The people building these systems should be asking: what can code in this sandbox reach? What data is in scope? What actions can the generated UI ask users to take, and how does the user know those requests are legitimate? These are not exotic questions. They're the same questions we ask about any third-party code that runs in a privileged context. The newness here is that the code is generated at runtime from inputs we don't fully control.

Casas's instinct—that MCP apps are the right delivery layer precisely because they provide authentication, tool calling, and sandboxing as defaults—is reasonable. But "sandboxed by default" is an architectural starting point, not a security guarantee. The history of browser security is largely a history of things that were sandboxed by default until they weren't.

The Imagination Problem (And Who It Affects)

The last section of Casas's talk is the one getting quoted most, and fairly: the analogy is good. Early television shows were radio programs with cameras because nobody had developed a visual grammar yet. We're in that era with agent interfaces. We keep asking for Jarvis because Jarvis is the only future we can picture from where we're standing. Part of the design tension in agent-native applications is exactly this: we're reaching for familiar UI metaphors because genuinely new ones haven't emerged yet.

Casas's speculative answer is collaborative artifacts: shared canvases where humans and agents work on the same object simultaneously. He points to Excalidraw's MCP app as an early version of this, a space where you can tell the agent "change this" and also just click and drag yourself, and both actions persist.

I think he's probably right that collaboration is where this goes. What I want to add—and what the demo-circuit version of this conversation tends to omit—is that "collaborative" and "trustworthy" are not synonyms.

The people who will use these interfaces are not mostly developers who understand what a sandbox is. They're people who already struggle to distinguish a legitimate bank notification from a phishing email, who've been trained over twenty years to click on things that look authoritative. When an AI agent generates an interface that says "confirm your identity to continue," what does the user check? There's no URL bar. There's no certificate. There's a generated UI inside an app they've already granted permissions to, asking them to do something. The signal they'd normally use to evaluate trust—does this look like the thing I expect?—is precisely what generative UI is designed to make irrelevant.

That's not an argument against generative UI. It's an argument that the user experience of trust verification needs to be designed from scratch for this paradigm, and right now, almost nobody is doing that work. The three-flavor breakdown of generative UI approaches is useful for builders—but the question of how users know what they're interacting with isn't really a builder question. It's a platform question, and platforms are still in the "cameras pointed at radio" phase too.

Casas is right that we don't have enough imagination yet. What I'd add is that some of the people who'll be affected by whatever we imagine haven't been invited into the room.

Rachel "Rach" Kovacs covers cybersecurity and privacy for Buzzrag.