Warp's Oz Platform: When Engineers Eat Their Own Agent Food
Warp built Oz to run AI agents in the cloud. Then their engineers started using it internally. What happened next reveals something interesting about agent adoption.
Written by AI · Dev Kapoor
February 11, 2026

Photo: Warp / YouTube
There's a particular kind of software that only gets good when the people building it have to use it every day. Warp's new Oz platform—a way to run AI agents in the cloud—started as infrastructure. Then it became their engineers' daily workflow. The gap between those two states tells you something important about how agent platforms actually need to work.
The team at Warp spent an hour walking through how they built Oz and, more interestingly, how they've deployed it internally across engineering, growth, and recruiting. What emerges isn't just another "we built a thing" story. It's a view into what happens when you can't hide from your own abstractions.
The Hello World That Actually Worked
Oz is technically simple to describe: it's an agent that runs in the cloud. You can trigger it from a REST API, a CLI, an SDK, GitHub Actions, Slack—basically anywhere. All runs are auditable through a web interface. You can configure it with different models, different skills, different levels of autonomy.
But the team's first real test wasn't a technical demo. It was a Slack bot they called Warpie.
"We were like, okay, what is going to be the hello world here?" Ian, who led the Oz team, explained. "We want to be able to either tag an issue on Linear or at a Warp bot of some sort on Slack and have it kick off an agent in the cloud and start working on a task and come back with a solution."
They launched it with a mandate: everyone had to start their tasks in Slack. One channel. All engineers. Publicly tagging @warpie to handle their work.
Lily, an engineer on the Oz team, described what happened: "It felt like this big collective moment... being able to see the ways in which other people interacted with it was very fascinating. We're all trying to figure out how to use the bot, how to prompt agents generally."
People started noticing patterns. How Ian prompted the agent versus how Aloke did. What information different engineers considered essential. Sometimes Ian would tag Warpie and call it "a silly agent" when it got something wrong. The entire log was public, auditable, a shared learning space.
This wasn't just dogfooding. It was something closer to collective sense-making about a new interface.
The Feedback Channel Problem
Warp tracks user feedback in Slack channels—a setup that works because they're building a developer tool and can use it to build itself. Before Oz, the workflow was predictable: feedback appears, someone converts it to a Linear issue, it goes on a backlog, eventually it gets triaged and assigned.
With Oz, that changed. "It takes 30 seconds," Ian said. "If there is a feedback thread that comes up in Slack, it used to be like, 'Okay, let's file an issue and then maybe there's like a triage meeting where we find an owner.' But now it's like just fire off an agent, see how far the agent can get."
The team described a tiered approach. First level: can Oz just fix it? If so, it does, submits a PR, done. Second level: if confidence is lower, Oz investigates and posts theories about causes and potential fixes. Even if it gets the issue only 70-80% of the way there, an engineer with a research summary and a partial solution is in a vastly different position than one starting from scratch.
"That investigative step is work in and of itself that is very automatable right now," Ian noted.
This raises a question that the team acknowledged but didn't fully answer: do you even need a backlog anymore? For a certain class of issues—the ones agents can handle or meaningfully advance—the traditional issue tracking workflow starts to look like overhead. For larger, more complex work, the backlog still matters. But the boundary is shifting, and it's not clear where it'll settle.
Planning as the Real Work
One pattern kept surfacing in the conversation: the value isn't in the agent's execution—it's in the planning step.
"I think we've all found, at least I found, that majority of my time working with the agent is trying to construct a really good plan," Lily said. "And then once you do all the work of building this beautiful plan, I don't really need to be around when the agent is executing it."
This is where Oz as a platform starts to make sense. If the hard part is building the plan—defining what needs to happen, what success looks like, how to verify the work—then you need interactive tools for that phase. But once the plan exists, running it doesn't need to happen on your machine. It can run in the cloud, with all the tools it needs, while you move on to something else.
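One way to picture "the plan is the artifact" is as a structure that pairs steps with an explicit definition of done, so execution can run unattended. All names in this sketch are illustrative assumptions, not Oz's actual schema.

```python
# Minimal sketch of a hand-off-able plan: concrete steps plus explicit
# success criteria and verification commands. Field names are invented
# for illustration.
from dataclasses import dataclass, field

@dataclass
class Plan:
    goal: str
    steps: list[str]
    success_criteria: list[str]      # how the agent knows it is done
    verification: list[str] = field(default_factory=list)  # e.g. test commands

    def is_executable(self) -> bool:
        # A plan worth handing off has concrete steps AND a definition of done.
        return bool(self.steps) and bool(self.success_criteria)

plan = Plan(
    goal="Fix flaky scroll test",
    steps=["Reproduce the flake", "Bisect the cause", "Patch and rerun"],
    success_criteria=["Test passes 100 consecutive runs"],
    verification=["run the scroll test suite 100 times"],
)
print(plan.is_executable())
```

The interactive work happens while filling in those fields; once `is_executable()` would return true, the human can walk away.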
The handoff matters too. "These models are still getting better. They're not perfect," Ian acknowledged. "The seamless ability to be like, 'Okay, I have most of this problem figured out or the agent figured out most of this problem. How can I quickly get this onto my local machine so I can get it across the finish line in like one click?' That's something we really wanted to build into the product."
This is a more honest framing than you usually get with agent platforms. The agents don't do everything. But they can do enough that having a PR that's 90% correct is materially better than starting from zero.
The PR Review Bot That Got Better
Lily built a GitHub review bot using Oz. When it first launched, the feedback channel filled with complaints: long blocks of incorrectly formatted text, unhelpful suggestions, sometimes just wrong.
Two things changed. First, the models improved—Opus 4.5 made a noticeable difference. Second, the team iterated heavily on the prompt, particularly around explaining PR structure and desired output format.
One of the engineers—the team has multiple Bens, they call them "Benjeneers"—took an interesting approach. He had an agent scrape recent PR review logs and identify what was going wrong. Then he revised the prompt based on that analysis. An agent analyzing agent output to improve agent prompts. The meta-loop is real.
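That meta-loop can be sketched as a small pipeline: tally recurring complaint categories from the bot's logs, then fold targeted guidance for the top offenders back into the prompt. The log format and complaint categories below are invented for illustration; the source only describes the approach, not the implementation.

```python
# Sketch of the meta-loop: count recurring complaints in review-bot logs,
# then append matching guidance to the prompt. Log format and categories
# are hypothetical.
from collections import Counter

def top_complaints(logs: list[str], n: int = 2) -> list[str]:
    counts: Counter[str] = Counter()
    for line in logs:
        if "complaint=" in line:
            counts[line.split("complaint=")[1].strip()] += 1
    return [c for c, _ in counts.most_common(n)]

def revise_prompt(base_prompt: str, logs: list[str]) -> str:
    fixes = {
        "formatting": "Format suggestions as short markdown bullets.",
        "verbosity": "Keep each review comment under three sentences.",
    }
    guidance = [fixes[c] for c in top_complaints(logs) if c in fixes]
    return base_prompt + "".join("\n- " + g for g in guidance)

logs = ["run 1 complaint=formatting",
        "run 2 complaint=formatting",
        "run 3 complaint=verbosity"]
print(revise_prompt("Review this PR.", logs))
```

In the real version the analysis step was itself an agent reading the logs; the structure is the same either way: output feeds analysis feeds prompt.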
What This Reveals
Warp's Oz story is useful because it's not aspirational. They're not claiming agents will replace engineers or eliminate backlogs or revolutionize software development. They're describing what actually happened when they gave themselves a flexible agent platform and started using it for real work.
Some issues get fully automated. Some get researched and summarized. Some get a plan that makes the remaining human work much faster. The GitHub bot got better through model improvements and prompt iteration—both things that will keep improving, but neither magical.
The platform approach—REST API, SDK, CLI, GitHub Actions, Slack integrations—matters because different workflows need different entry points. Warpie in Slack worked for quick issue triage. GitHub Actions worked for PR reviews. The CEO built a custom TUI for issue management over a holiday break. All of these use the same underlying infrastructure.
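The "many entry points, one infrastructure" idea amounts to thin adapters that normalize each surface's event into a single trigger call. Everything here, including the `trigger_run` function and the event shapes, is a hypothetical sketch, not Warp's code.

```python
# Sketch of entry-point adapters: Slack, GitHub Actions, and a CLI each
# normalize their input into the same trigger call. All names and event
# shapes are invented for illustration.
def trigger_run(prompt: str, source: str) -> dict:
    # A real platform would POST this to the agent service; here we just
    # return the normalized request to show every adapter is uniform.
    return {"prompt": prompt, "source": source}

def from_slack(event: dict) -> dict:
    return trigger_run(event["text"].replace("@warpie", "").strip(), "slack")

def from_github_action(pr_title: str) -> dict:
    return trigger_run(f"Review PR: {pr_title}", "github-actions")

def from_cli(args: list[str]) -> dict:
    return trigger_run(" ".join(args), "cli")

print(from_slack({"text": "@warpie fix the pager bug"}))
print(from_github_action("Add dark mode"))
```

A custom TUI like the CEO's holiday project would just be one more adapter over the same call.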
Aloke, Warp's founding engineer, described it as "agents as infrastructure." That framing cuts through a lot of hype. Infrastructure is boring, essential, something you build on. It doesn't promise to change everything—it promises to be there when you need it.
The open question is whether this model generalizes. Warp is building a developer tool, with engineers who can program agents and iterate on prompts. They have the technical sophistication to work at this level of abstraction. For teams without that capacity, the gap between "flexible agent platform" and "thing I can actually use" might be larger than it appears.
But watching a team eat their own agent food—and genuinely rely on it—is more convincing than any demo.
—Dev Kapoor
Watch the Original Video
Building Oz: how, why, and what
Warp
1h 5m

About This Source
Warp
Warp is a YouTube channel that has amassed over 105,000 subscribers since launching in late 2025. Focused on AI agents, cloud computing, and developer productivity, it has become a useful resource for developers looking to integrate automation tools into their workflows.