
Anthropic's Claude Code Update: AI Agents Get Planning Tools

Anthropic released Claude Code v2.1.92 with Ultra Plan for transparent AI project planning and Managed Agents for deployment without infrastructure.

Written by AI. Samira Okonkwo-Barnes

April 16, 2026


Photo: Julian Goldie SEO / YouTube

Anthropic shipped Claude Code version 2.1.92 on April 1st with two major features: Ultra Plan, which lets developers review and edit AI-generated project plans before execution, and Managed Agents, a fully-hosted infrastructure for deploying autonomous AI agents. The update addresses a fundamental problem in AI coding tools—the black box problem—while simultaneously lowering the technical barrier to deploying agents in production.

The timing matters. Dario Amodei, Anthropic's CEO, recently stated that full software engineering automation could arrive within 12 months. Whether that timeline holds depends partly on whether developers trust these tools enough to use them for high-stakes work. Trust requires transparency, which is exactly what Ultra Plan provides.

The Planning Problem

Before this update, Claude Code operated like most AI coding assistants: you gave it a task, it started executing, and you hoped for the best. The AI might decide to refactor your entire database schema when you asked it to add a single feature. You wouldn't know until it was done—or broken.

Ultra Plan changes this by separating planning from execution. When you assign a complex task, Claude Code now uses what Anthropic calls "multi-agent exploration and critique" to generate a detailed implementation plan. Multiple AI agents examine the problem from different angles, check each other's work, then present you with a unified plan in a browser interface.

You can read the plan. Edit it. Add inline comments. Approve it. Only then does execution begin.

The feature auto-creates a cloud environment, so there's no infrastructure setup required. You type your request, review the plan, and proceed. The friction between "I have an idea" and "I'm reviewing how the AI plans to implement it" has collapsed to near-zero.
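The plan-then-approve workflow described above can be sketched as a simple gate: execution refuses to run until a human has reviewed and approved the generated plan. This is an illustrative sketch only; the `Plan`, `generate_plan`, `review`, and `execute` names are invented for this example and are not Anthropic's API.

```python
# Hypothetical sketch of a plan-then-approve gate in the spirit of
# Ultra Plan. All names here are illustrative, not Anthropic's API.

from dataclasses import dataclass, field


@dataclass
class Plan:
    steps: list[str]
    comments: dict[int, str] = field(default_factory=dict)  # inline review notes
    approved: bool = False


def generate_plan(task: str) -> Plan:
    # In Ultra Plan, multiple agents would explore and critique here;
    # we just return a fixed three-step plan.
    return Plan(steps=[
        f"Analyze requirements for: {task}",
        "Draft implementation",
        "Write tests",
    ])


def review(plan: Plan) -> Plan:
    # The mandatory human checkpoint: read, comment, edit, then approve.
    plan.comments[1] = "Do not touch the database schema."
    plan.approved = True
    return plan


def execute(plan: Plan) -> list[str]:
    # Execution is gated on approval -- no approved plan, no code written.
    if not plan.approved:
        raise RuntimeError("Execution is gated on an approved plan.")
    return [f"done: {step}" for step in plan.steps]


results = execute(review(generate_plan("add a search feature")))
```

The point of the gate is structural: the only path to `execute` runs through `review`, which mirrors how Ultra Plan inserts a human checkpoint between planning and execution.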

What's technically interesting here is the architecture. As the video describes it: "Ultra Plan mode uses multi-agent exploration and critique to create a detailed implementation plan. So it's not just one Claude thinking through your task, it's multiple agents exploring the problem, checking each other's work, and giving you the best possible plan before a single line of code gets written."

This addresses the oversight problem that has plagued autonomous AI systems. Regulators worry about AI systems making consequential decisions without human review. Enterprise customers worry about AI tools breaking production systems. Ultra Plan threads the needle—it maintains AI autonomy while inserting a mandatory human checkpoint.

The question is whether this actually scales. Reviewing detailed implementation plans for every task could become its own bottleneck. If the goal is automation, requiring human review of every plan defeats the purpose. Anthropic's bet appears to be that this is a transitional architecture—training wheels until the models get reliable enough that the review step becomes optional.

Infrastructure as a Non-Problem

The second major feature, Managed Agents, solves a different problem: deployment complexity. Until now, running an autonomous AI agent in production required building significant infrastructure. You needed sandboxing for code execution, state management for long-running tasks, monitoring systems, error handling, the full engineering stack.

Managed Agents provides all of that as a service. Anthropic hosts it, scales it, monitors it. Developers define what the agent should do and provide the necessary tools. Everything else runs in Anthropic's infrastructure.

The architecture splits into three components: the session, the harness, and the sandbox. The session maintains an append-only log of all interactions, preserving state across hours-long tasks. The harness loops between calling Claude and routing tool calls to the appropriate infrastructure. The sandbox provides secure execution for code, file operations, and external service interactions.

This modular design means each component can evolve independently. You can change what an agent does without rewriting how it reasons. That's crucial for production systems where stability matters.
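The session/harness/sandbox split described above can be sketched as a minimal agent loop. Everything here is a stand-in: `call_model` fakes the call to Claude, and the two "tools" are toy functions, because the real Managed Agents interfaces aren't detailed in the source.

```python
# Hypothetical harness loop in the spirit of Managed Agents.
# call_model and the tool set are illustrative stand-ins, not
# Anthropic's actual API.

session_log: list[dict] = []  # the session: an append-only interaction log


def sandbox_run(tool: str, args: dict) -> str:
    # The sandbox: secure tool execution would live here; we fake two tools.
    tools = {
        "read_file": lambda a: f"<contents of {a['path']}>",
        "echo": lambda a: a["text"],
    }
    return tools[tool](args)


def call_model(log: list[dict]) -> dict:
    # Stand-in for calling Claude: request one tool call, then finish.
    if any(entry["type"] == "tool_result" for entry in log):
        return {"type": "final", "text": "task complete"}
    return {"type": "tool_call", "tool": "echo", "args": {"text": "hi"}}


def harness(task: str) -> str:
    # The harness: loop between the model and tool routing, appending
    # every step to the session log so state survives long tasks.
    session_log.append({"type": "task", "text": task})
    while True:
        action = call_model(session_log)
        session_log.append(action)
        if action["type"] == "final":
            return action["text"]
        result = sandbox_run(action["tool"], action["args"])
        session_log.append({"type": "tool_result", "text": result})


result = harness("summarize the repo")
```

Because the three pieces only talk through the log and the tool-call interface, you can swap the tool set (what the agent does) without touching the loop (how it reasons), which is the modularity claim above.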

Notable companies are already using this in production. The video mentions Notion, Rakuten, and Sentry as early adopters with "concrete and measurable use cases." These aren't startups experimenting with prototypes. They're established companies putting AI agents into customer-facing products.

What the video doesn't address: what happens when something breaks? Managed infrastructure means you're dependent on Anthropic's operations team and SLAs. If the service goes down, your agents stop working. That's a meaningful dependency for production systems. The convenience of managed infrastructure comes with a loss of control that some organizations won't accept.

The Leaked Features

On March 31st, Anthropic accidentally shipped source code inside Claude Code v2.1.88, exposing unreleased features. One is called Kairos—a cross-session assistant with a four-phase memory consolidation system that runs when Claude Code is idle. The phases are orient, gather, consolidate, and prune.

As described in the video: "It scans recent sessions, pulls relevant patterns, compresses them into durable memory, and removes noise, all while you're not even using the tool."

This addresses a real pain point. Current AI coding assistants have no persistent memory. Every new session starts from scratch. You re-explain your project architecture, your coding conventions, your business logic. Kairos would change that by maintaining context across sessions and days.
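The four phases named above can be sketched as a small pipeline. The session format, the seven-day recency window, and the "seen more than once" pruning rule are all invented for illustration; only the phase names and their ordering come from the leaked description.

```python
# Illustrative sketch of a four-phase idle-time consolidation pass
# (orient, gather, consolidate, prune), as described for Kairos.
# The data shapes and thresholds here are assumptions, not Anthropic's.


def orient(sessions: list[dict]) -> list[dict]:
    # Orient: pick which recent sessions are worth scanning.
    return [s for s in sessions if s["age_days"] <= 7]


def gather(sessions: list[dict]) -> list[str]:
    # Gather: pull candidate patterns out of each selected session.
    return [p for s in sessions for p in s["patterns"]]


def consolidate(patterns: list[str]) -> dict[str, int]:
    # Consolidate: compress repeats into durable memory with a count.
    memory: dict[str, int] = {}
    for p in patterns:
        memory[p] = memory.get(p, 0) + 1
    return memory


def prune(memory: dict[str, int]) -> dict[str, int]:
    # Prune: drop noise -- patterns observed only once.
    return {k: v for k, v in memory.items() if v > 1}


sessions = [
    {"age_days": 1, "patterns": ["uses pytest", "snake_case"]},
    {"age_days": 3, "patterns": ["uses pytest", "one-off typo"]},
    {"age_days": 30, "patterns": ["old convention"]},
]
durable = prune(consolidate(gather(orient(sessions))))
```

Run on the sample data, only the pattern that recurs across recent sessions survives into durable memory; one-off observations and stale sessions are filtered out, which is the "compresses patterns, removes noise" behavior the video describes.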

The privacy implications are significant. A system that automatically consolidates memories from your coding sessions is processing potentially sensitive business logic, proprietary algorithms, and confidential data. Anthropic hasn't published details on how Kairos will handle data retention, what gets stored where, or whether users can audit or delete consolidated memories.

The video also mentions "Buddy," a Tamagotchi-style AI pet for your terminal with stats like "debugging, patience, chaos, wisdom, and snark." This appears to be Anthropic experimenting with user engagement and gamification. Whether that's valuable or gimmicky depends on whether developers actually want their tools to have personalities.

Ultra Plan is the first leaked feature to ship publicly. More are coming. The gap between what's in Anthropic's internal builds and what's in production is narrowing.

What This Means for AI Development Tools

The video claims Claude Code's autonomy has doubled in three months, from 25-minute to 45-minute uninterrupted sessions. That metric—how long an AI can work independently before requiring human intervention—is one way to measure progress toward full automation.

But autonomy isn't the only variable that matters. Reliability, accuracy, and alignment with developer intent matter just as much. An AI that can work for 45 minutes without human input isn't useful if it spends that time building the wrong thing.

The more interesting question is what happens when these tools become reliable enough for high-stakes production work. If AI agents can plan, review, execute, and deploy complex systems with minimal human oversight, that fundamentally changes who can build software and what building software means.

That raises workforce questions that tech policy hasn't seriously engaged with yet. If the barrier to deploying production-grade automation drops to "describe what you want in plain English," what happens to the engineering profession? What happens to the companies that can't access or afford these tools? What happens when every competitor has access to the same AI-powered acceleration?

Amodei's 12-month timeline for full software engineering automation might be optimistic. But even if the real timeline is 24 or 36 months, that's still fast enough that policy, education, and labor markets won't adapt in time. We're not having those conversations yet because the tools still feel experimental. This update suggests they're moving past experimental faster than most people realize.

Samira Okonkwo-Barnes covers technology policy and regulation for Buzzrag.

Watch the Original Video

New Claude Ultraplan is INSANE!

Julian Goldie SEO

8m 16s

About This Source

Julian Goldie SEO

Julian Goldie SEO is a fast-growing YouTube channel that has amassed 303,000 subscribers since launching in October 2025. The channel serves digital marketers and business owners aiming to enhance their online visibility through effective SEO strategies. Julian Goldie specializes in clear, actionable SEO advice, with particular emphasis on backlink building and optimizing websites to rank at the top of Google search results.
