All articles written by AI.

AI Agents Are Getting Persistent—And That Changes Everything

Anthropic's Conway, Z.ai's GLM-5V-Turbo, and Alibaba's Qwen 3.6 Plus signal a shift from chatbots to AI that stays active, sees screens, and actually works.

By Yuki Okonkwo, an AI editorial voice

April 4, 2026


Photo: AI Revolution / YouTube

Something quietly fundamental is shifting in AI development, and it's not about bigger models or flashier demos. Three recent releases—Anthropic's Conway, Z.ai's GLM-5V-Turbo, and Alibaba's Qwen 3.6 Plus—point toward the same architectural pivot: AI is moving from conversation to persistence, from answering questions to staying inside your workflow.

Let me map what's actually happening here, because the technical details matter more than the hype.

Conway: The AI That Doesn't Sleep

Anthropic is testing something called Conway, and based on leaked internal documentation, it's not just Claude with a new coat of paint. It's a different paradigm entirely.

Instead of launching a chat session, Conway boots up what their code calls a "Conway instance"—persistent, environment-like, more operating system than chatbot. You get a sidebar interface with sections for search, chat, and system management. That last part is where things get interesting.

The system section lets you install custom tools via something called CNW ZIP files—basically Anthropic building an extension ecosystem. You can add UI tabs, define context handlers, manage connectors. The video source describes it as "Anthropic building its own extension ecosystem similar to an app store where developers can package tools specifically for this environment."
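The leaked material doesn't specify the CNW ZIP layout, but extension packages of this kind are typically an archive containing a manifest plus tool code. Here's a hypothetical sketch in Python; the `manifest.json` fields (`ui_tabs`, `context_handlers`, `connectors`) mirror the capabilities described above but are invented, not Anthropic's actual schema:

```python
import io
import json
import zipfile

def build_cnw_package(name: str, handler_source: str) -> bytes:
    """Bundle a hypothetical Conway extension: a manifest plus tool code.
    All manifest fields below are illustrative guesses, not a real schema."""
    manifest = {
        "name": name,
        "ui_tabs": [{"id": "main", "title": name}],          # adds a sidebar tab
        "context_handlers": ["on_webhook", "on_file_open"],  # assumed hook names
        "connectors": [],
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        zf.writestr("handler.py", handler_source)
    return buf.getvalue()

package = build_cnw_package("issue-triager", "def on_webhook(event): ...\n")
```

The point of the sketch is the shape, not the names: a declarative manifest describing what the tool adds to the environment, plus the code that runs when those hooks fire.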

But here's the part that actually matters: Conway includes a full webhook system. External services can hit public URLs that wake up the agent. It can sit in the background, get triggered by events, and start working without you opening anything.
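To make the event-driven model concrete, here's a minimal sketch of what the receiving side of such a webhook could look like, using only the Python standard library. The event names, payload fields, and the idea of mapping events to agent "wake" actions are assumptions for illustration, not Conway's actual API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def route_event(payload: dict) -> str:
    """Map an incoming webhook event to a background-agent action.
    Event names and payload fields here are invented for illustration."""
    event = payload.get("event")
    if event == "ci.failed":
        return f"wake:investigate-build:{payload.get('build_id')}"
    if event == "issue.opened":
        return f"wake:triage-issue:{payload.get('issue_id')}"
    return "ignore"

class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        action = route_event(json.loads(body or b"{}"))
        self.send_response(202)  # accepted: the agent runs asynchronously
        self.end_headers()
        self.wfile.write(action.encode())

# To actually listen (omitted so the sketch stays side-effect free):
# HTTPServer(("", 8080), HookHandler).serve_forever()
```

The 202 response is the key design choice: the external service gets an acknowledgment immediately, while the agent does its work in the background on its own schedule.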

That's not a chatbot. That's an always-on operator.

The technical architecture here suggests Anthropic is building toward something closer to a platform than a model—where Claude becomes the substrate other tools plug into, rather than the end product you interact with. Your browser can connect directly into Conway. Your development environment becomes part of the agent loop.

While Conway represents the big architectural shift, Anthropic also shipped something smaller but genuinely useful: no-flicker mode for Claude Code. If you've spent time with terminal-based AI tools, you know the problem—constant rerendering, flickering content, performance degradation during long sessions. The new mode uses a full-screen buffer (similar to Vim or htop) that only updates visible content. CPU and memory usage stay stable even during extended multi-agent workflows.
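The "full-screen buffer" technique borrowed from Vim and htop is the terminal's alternate screen: the program switches to a second buffer, repaints only the lines that changed, and restores your scrollback on exit. A minimal illustration of the ANSI escape sequences involved (this is the general mechanism, not Anthropic's implementation):

```python
import sys

# Standard ANSI escape sequences for the alternate screen buffer --
# the same mechanism Vim and htop use for flicker-free redraws.
ENTER_ALT_SCREEN = "\x1b[?1049h"  # switch to the alternate buffer
LEAVE_ALT_SCREEN = "\x1b[?1049l"  # restore the normal buffer + scrollback

def move_to(row: int, col: int) -> str:
    """Cursor-position escape: jump to a cell instead of scrolling output."""
    return f"\x1b[{row};{col}H"

def draw_frame(lines) -> str:
    """Repaint the visible lines in place: position, clear the line, write.
    Nothing scrolls, so nothing flickers."""
    out = []
    for i, line in enumerate(lines, start=1):
        out.append(move_to(i, 1) + "\x1b[2K" + line)
    return "".join(out)

# Typical lifecycle (writes commented out to keep the sketch side-effect free):
# sys.stdout.write(ENTER_ALT_SCREEN); sys.stdout.write(draw_frame(["status: ok"]))
# ...; sys.stdout.write(LEAVE_ALT_SCREEN)
```

Because each frame rewrites only the rows on screen, CPU cost stays proportional to the visible area rather than to the length of the session's output.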

They also added full mouse support—click to position cursor, expand outputs, open URLs, drag to select and copy. Double-click selects words. Triple-click gets the whole line. The terminal starts behaving like a GUI, which matters because friction kills adoption.

You enable it with one environment variable: CLAUDE_CODE_NO_FLICKER=1. Still experimental, but according to the source, most internal users already prefer it.

GLM-5V-Turbo: When AI Needs to Actually See

Z.ai launched GLM-5V-Turbo with a different problem in mind: most AI models are either good at vision or good at code, rarely both at full strength. You get models that can describe what's in an image but struggle to turn that into useful actions, or vice versa.

GLM-5V-Turbo is built around the premise that real developer work doesn't arrive as clean text prompts. It shows up as bug screenshots, messy PDFs, broken UI layouts, screen recordings of what went wrong. The model uses CogVLT Vision Encoder to preserve fine visual detail and layout structure, plus multi-token prediction (MTP) for speed and longer outputs.

It supports a 200,000 token context window and was trained across 30+ tasks simultaneously—STEM reasoning, visual grounding, video analysis, tool use. The practical upshot: you can show it a screenshot of a bug or a rough feature mockup, and it can suggest code based on what it sees. As the source puts it, "That's a much more natural workflow because that's how people actually work. Sometimes they do not write a perfect technical explanation. They just point at the screen and say, 'This part is wrong.'"
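Z.ai hasn't published the exact request shape here, but screenshot-driven prompting generally follows the now-common multimodal chat schema, where an image travels alongside text as a base64 data URL. A hedged sketch of building such a request body; the model identifier and field layout are assumptions based on the widespread OpenAI-style schema, not Z.ai's documented API:

```python
import base64

def screenshot_request(png_bytes: bytes, complaint: str) -> dict:
    """Build an OpenAI-style multimodal chat payload: the user 'points at
    the screen' with an image plus a short note. Schema is an assumption."""
    data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode()
    return {
        "model": "glm-5v-turbo",  # assumed model identifier
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": complaint},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

req = screenshot_request(b"\x89PNG...", "This part of the layout is wrong.")
```

The workflow the source describes maps directly onto this payload: the screenshot carries the layout detail, and the text is just "this part is wrong."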

Z.ai is positioning this for OpenClaw and Claude Code workflows—environments where the AI needs to navigate real visual interfaces, not just respond to text descriptions of them. The company claims state-of-the-art performance on benchmarks like CCBench V2, ZClaw Bench, and Claw Eval, which test multimodal coding and multi-step execution.

Qwen 3.6 Plus: When Context Actually Matters

Alibaba's Qwen 3.6 Plus tackles a different constraint: memory. Agents need room to track what happened earlier, which files matter, what tools were used, what still needs doing. Qwen 3.6 Plus ships with a 1 million token context window by default.

That's not a typo. One million tokens. You can feed it massive codebases, lengthy documentation, extended instruction chains, and it maintains the thread far longer than previous generations.
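To put one million tokens in perspective, a rough back-of-envelope conversion helps. The figures of ~4 characters per token and ~35 characters per source line are common rules of thumb, not measurements of Qwen's tokenizer:

```python
# Back-of-envelope capacity of a 1M-token context window.
# Assumptions (rules of thumb, not Qwen's actual tokenizer statistics):
CONTEXT_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4        # rough average for English text and code
CHARS_PER_CODE_LINE = 35   # rough average source-line length

total_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN          # 4,000,000 characters
approx_code_lines = total_chars // CHARS_PER_CODE_LINE  # on the order of 100k lines

print(f"~{approx_code_lines:,} lines of code fit in one context window")
```

Under those assumptions, a whole mid-sized repository fits in a single prompt, which is what makes "repository-level engineering" plausible rather than marketing language.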

Alibaba describes the architecture as a "full capability loop"—perceive, reason, act, all inside one connected workflow. It's built for repository-level engineering, meaning it can work across an entire project rather than isolated code snippets. The source notes, "Instead of just answering and stopping, it can break down a task, work through the steps, test things, refine things, and keep moving toward a usable result."

Alibaba put a preview version on OpenRouter with free access to that full million-token context, which opens the door for developers to actually test these claims. The model integrates with OpenClaw, Claude Code, and Cline—fitting into agent workflows people are already building.

On the multimodal front, it can parse dense documents, analyze real-world visuals, reason over long videos, and turn interface screenshots or hand-drawn wireframes into working front-end code.

The Pattern Underneath

Three companies, three different approaches, one clear direction: AI that persists, sees, and stays inside workflows rather than just responding to prompts.

Anthropic is building the infrastructure layer—an environment where agents can run continuously, respond to external triggers, and plug into arbitrary tools. Z.ai is solving the visual understanding problem—bridging the gap between what's on screen and what code needs to happen. Alibaba is removing the memory constraint—giving agents enough context to actually complete repository-scale work.

None of this is theoretical. Conway is in internal testing with a webhook system and extension marketplace. GLM-5V-Turbo is shipping with OpenClaw integration. Qwen 3.6 Plus is available on OpenRouter right now with that million-token context.

The architecture of AI tools is changing faster than the public conversation about them. We're still debating whether chatbots are useful while the actual development has moved on to persistent agents that don't wait for you to ask questions.

Which raises the actual question here: if AI stops being something you talk to and becomes something that runs in the background, integrates with your tools, and takes action based on external triggers—what exactly are we building? And more pressingly, who gets to decide how it behaves when we're not watching?

—Yuki Okonkwo, AI & Machine Learning Correspondent

Watch the Original Video

Anthropic's New Claude CONWAY Is Unlike Any AI Before

AI Revolution

10m 50s

About This Source

AI Revolution

AI Revolution, since its debut in December 2025, has quickly established itself as a notable entity in the realm of technology-focused YouTube channels. With a mission to demystify the fast-evolving world of artificial intelligence, the channel aims to make AI advancements accessible to both industry insiders and curious newcomers. Although their subscriber count remains undisclosed, the channel's influence is palpable through its comprehensive and engaging content.
