Chinese AI Agents GLM 5.2, Kimi K2.7, N2: What to

Before you pipe your business content through a free API that expires in two weeks, let me ask you something: do you know where that data goes?

That's the question sitting underneath Julian Goldie's recent demo of three Chinese AI agents (Next AGI's N2, GLM 5.2, and Kimi K2.7), and it's the question the demo doesn't answer. That's not a knock on Goldie; he's an SEO and automation practitioner, not a privacy auditor. But if you came here to understand what adopting these tools actually costs you, not just in dollars but in data exposure and workflow risk, then we need to go further than the demo reel.

So let's do that. Because the capabilities on display are genuinely worth your attention, and the security questions are genuinely worth your caution.

What Goldie Actually Showed

The demo covers three models routed through an "agent operating system" — a structured workflow layer using tools like Hermes and OpenRouter that lets multiple AI agents coordinate, critique each other's outputs, and build projects end-to-end. Games, websites, apps, fully produced videos. The outputs shown are more polished than you'd expect if you still have a 2023-era mental model of what open-weight Chinese models can produce.

Goldie's framing: "These are probably the best versions of Chinese AI that I've ever seen." He recommends GLM 5.2 as his overall pick and N2 as the best free option, with Kimi K2.7 worth watching as it moves toward wider availability.

Capability gains from Chinese labs have been accelerating for months, and this demo fits that pattern. The performance case isn't hard to make. The privacy and operational case is more complicated.

The "Free" API Question

N2 from Next AGI is currently available at no cost via OpenRouter — Goldie describes it as free "for the next two weeks, something like that." He also mentions a context window somewhere in the 200K+ range, though the exact figure varies in the transcript. Before you build a workflow dependency on it, check Next AGI's current OpenRouter listing directly; context window specs for newly released models get revised frequently, and demo videos capture a snapshot that may already be outdated by the time you're reading this.

But the more important question isn't the context window. It's the data handling terms.

When you route requests through OpenRouter, your prompts — and everything in them — transit a third-party aggregation layer before reaching the underlying model. OpenRouter has its own privacy policy and data retention practices, and so does Next AGI. Those are two separate documents you'd want to read before sending anything sensitive. "Free API" in this context means Next AGI is absorbing inference costs, presumably to build adoption. That's a reasonable business strategy. It also means you should be clear-eyed about what they may be doing with request logs, and for how long.

If you're using these agents for low-stakes personal projects, your risk surface is low. If you're routing client data, internal business logic, proprietary code, or anything commercially sensitive through them — even as part of a multi-agent pipeline — you're making a data-sharing decision that deserves more than a two-minute setup tutorial.

The Judge Agent Is Interesting. Its Data Footprint Is Too.

One of the more technically substantive things Goldie demonstrates is the "judge agent" — a separate model that reviews outputs from other agents and drives iterative improvement. "The judge can look at the work, look at what's been created, and then actually give feedback and tell the team to iterate until it's completed and the job is actually done properly," he explains.

From a quality-control standpoint, this is a smart architecture. From a privacy standpoint, it raises a question most workflow demos skip entirely: what gets stored across those iteration loops?

Every time the judge reviews an output and sends it back for revision, that's another API call. Depending on how the orchestration layer handles logging — and Goldie describes an Obsidian-based memory system that persists context across sessions — your iteration history, your intermediate drafts, and the content of your judge's critiques may all be accumulating somewhere. "Memory" in an agent workflow isn't a metaphor; it's actual stored data. Before you rely on any persistent memory feature, you want to know who holds that storage, what their retention policy is, and whether you can delete it.
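To make that accumulation concrete, here is a minimal sketch of a judge loop that records every payload it would transmit. It is not Goldie's setup or any real orchestration library: run_judge_loop, CallLog, fake_worker, and fake_judge are hypothetical names, and the two "models" are offline stand-ins so the example runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CallLog:
    """Hypothetical audit log: one entry per payload that would leave the machine."""
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, recipient: str, payload: str) -> None:
        self.entries.append((recipient, payload))


def run_judge_loop(
    task: str,
    worker: Callable[[str], str],
    judge: Callable[[str, str], Tuple[bool, str]],
    log: CallLog,
    max_rounds: int = 3,
) -> str:
    """Alternate worker and judge until the judge approves or rounds run out."""
    prompt = task
    draft = ""
    for _ in range(max_rounds):
        log.record("worker", prompt)  # the full prompt goes to the worker model
        draft = worker(prompt)
        log.record("judge", draft)    # the full draft goes to the judge model
        approved, feedback = judge(task, draft)
        if approved:
            break
        # The next prompt carries the task, the rejected draft, and the critique.
        prompt = f"{task}\n\nPrevious draft:\n{draft}\n\nFeedback:\n{feedback}"
    return draft


# Offline stand-ins (assumptions for illustration, not real model calls).
def fake_worker(prompt: str) -> str:
    return "draft v2" if "Feedback:" in prompt else "draft v1"


def fake_judge(task: str, draft: str) -> Tuple[bool, str]:
    return (draft == "draft v2", "add more detail")


log = CallLog()
result = run_judge_loop("Write a landing page", fake_worker, fake_judge, log)
print(result, len(log.entries))
```

Even in this toy run, a single revision produces four outbound payloads, and the second worker prompt contains the rejected first draft plus the judge's critique. In a real pipeline each of those payloads is a candidate for someone else's request log.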

The swarm architecture that makes these multi-agent systems powerful is exactly what makes their data footprint harder to trace. More agents, more API calls, more potential log entries — across multiple providers if you're using the fusion panel approach Goldie describes.

The Fusion Panel and That Benchmark Claim

Goldie walks through a setup where N2, GLM 5.2, and Kimi K2.7 run in parallel via OpenRouter's fusion API, with a judge model synthesizing the outputs. He claims this approach outperforms "Fable's results" on something called the "Draco score." To his credit, he says explicitly: "Don't believe me, don't believe that benchmark, test it yourself."

I'd extend that skepticism a step further: "Fable" and the "Draco score" don't correspond to widely recognized public benchmarks in AI evaluation literature. They may be proprietary metrics, internal leaderboards, or evaluation frameworks specific to Goldie's community. That doesn't make them meaningless, since internal benchmarks can be useful for comparing models on your specific task distribution, but it does mean you can't compare these claims against external standards without more context. If the benchmark methodology isn't public, the performance claim is interesting but not independently verifiable.

What is verifiable: the fusion approach itself — combining multiple models and having a judge synthesize outputs — has legitimate theoretical backing. Ensemble methods have improved performance in ML for decades. Whether this specific implementation delivers on that promise depends on the task and the models, and that's exactly what Goldie is inviting you to test.
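If you want to test the idea yourself, the pattern is simple enough to sketch without touching any provider. The example below is an assumption-laden illustration, not OpenRouter's API: fusion_panel and pick_longest are hypothetical names, the panel members are stand-in functions, and the "judge" is a deliberately naive rule (pick the longest answer) where a real setup would call another model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

ModelFn = Callable[[str], str]
JudgeFn = Callable[[str, Dict[str, str]], str]


def fusion_panel(
    prompt: str, panel: Dict[str, ModelFn], judge: JudgeFn
) -> Tuple[str, Dict[str, str]]:
    """Send one prompt to every panel model in parallel, then let a judge choose."""
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in panel.items()}
        candidates = {name: fut.result() for name, fut in futures.items()}
    return judge(prompt, candidates), candidates


def pick_longest(prompt: str, candidates: Dict[str, str]) -> str:
    """Naive stand-in judge: prefer the longest candidate answer."""
    return max(candidates.values(), key=len)


# Stand-in models (assumptions for illustration, not real endpoints).
panel = {
    "model_a": lambda p: "short",
    "model_b": lambda p: "a longer answer",
    "model_c": lambda p: "mid answer",
}

answer, candidates = fusion_panel("Summarize the brief", panel, pick_longest)
print(answer, sorted(candidates))
```

Swap the stand-ins for your own model calls and the judge for your own scoring rule, then compare the fused answer against each single model on tasks you care about. Note the privacy side effect: the same prompt is delivered to every panel member, so a three-model panel means three providers see your input before the judge sees anything.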

The Open-Source Question

Goldie describes both GLM 5.2 and Kimi K2.7 as "open-source projects." This is where I'd pump the brakes slightly, and it matters to my readers more than it might to a general tech audience.

"Open-source" and "open-weights" are not synonymous. Open-weights means the model parameters are publicly released — you can download and run them. Open-source means the training code, data, and methodology are also available for inspection and modification. Many Chinese model releases, including earlier versions of GLM and Kimi, have released weights without releasing full training pipelines. The distinction matters if you're doing security or compliance evaluation: open-weights lets you run the model locally, which changes your data flow entirely, but it doesn't necessarily let you audit what the model was trained to do.

Verify the current licensing status for GLM 5.2 and Kimi K2.7 on their respective Hugging Face or GitHub repositories before assuming you know what "open" means in this context. This is a fast-moving space and the terms shift between versions.

What This Actually Costs to Run Right

The capability story here is real. Goldie demonstrates 3D games, full websites, produced video content — all generated through coordinated agent workflows. The performance trajectory of these models has moved fast enough that skepticism about Chinese open-weight AI needs to be updated.

But the orchestration layer Goldie advocates — Hermes, OpenRouter fusion, Obsidian memory, Claude Code integration — is not trivial to set up securely. Each component in that stack is a potential point of data exposure, a potential point of vendor dependency, and a potential point of failure when any one service changes its terms or goes paid.

That last point applies directly to N2. If you build a workflow that depends on a free API with a two-week promotional window, you're not building a workflow. You're building a demo. The moment Next AGI flips the pricing switch — which is their prerogative and likely their plan — you're either paying whatever rate they set or rebuilding. That's not a reason to avoid N2. It's a reason to test it without depending on it.
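One way to test without depending is to keep the promotional endpoint behind a fallback chain from day one. This is a minimal sketch under stated assumptions: complete_with_fallback and ProviderUnavailable are hypothetical names, and the two providers are stubs standing in for whatever wrappers you write around a hosted endpoint and a backup (for example, a locally run open-weights model).

```python
from typing import Callable, List, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error a wrapper raises when an endpoint is gone or paid-only."""


def complete_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, completion) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stubs (assumptions for illustration): the free tier has lapsed, the backup works.
def free_tier(prompt: str) -> str:
    raise ProviderUnavailable("promotional window ended")


def local_model(prompt: str) -> str:
    return f"local completion for: {prompt}"


used, text = complete_with_fallback(
    "Draft a product description",
    [("free_tier", free_tier), ("local_model", local_model)],
)
print(used)
```

The design choice is that the workflow names a capability, not a vendor. When the free window closes, the chain degrades to the backup instead of failing, and you decide on pricing from a position of having an alternative.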

The agents are impressive. The orchestration is genuinely useful. Just know what you're signing up for — technically, contractually, and in terms of where your data lands — before you automate anything you can't afford to have logged.

Rachel "Rach" Kovacs is Buzzrag's cybersecurity and privacy correspondent.