Power Users Are Breaking OpenClaw in Interesting Ways
Matthew Berman spent 200 hours optimizing OpenClaw. His setup reveals how AI agents work when you push past the defaults—and what breaks along the way.
By Marcus Chen-Ramirez
March 19, 2026

Photo: Matthew Berman / YouTube
Matthew Berman has spent 200 hours and "billions of tokens" perfecting his OpenClaw setup. That's either dedication or obsession, depending on your perspective. Either way, it produces useful data about what happens when you push an AI agent system past its default configuration.
His latest video is a technical deep-dive into OpenClaw optimization—the kind of granular workflow engineering that most users won't touch but reveals something about where this technology actually lives versus where the hype suggests it should. The gap between those two places is instructive.
The Memory Problem Nobody Talks About
Berman's first major unlock sounds almost embarrassingly simple: use threaded conversations instead of one long chat window. He creates separate Telegram groups for different topics—CRM, knowledge base, cron updates—each with its own context window.
"I always wondered why people were struggling with OpenClaw's memory, and OpenClaw would just forget things all the time and I literally never had that problem once," Berman explains. "Well, this is the reason."
The problem with single-thread conversations is context pollution. When you interleave multiple topics in one chat, the entire history loads into the context window. The model has to sort through discussions about your CRM while you're asking it to debug code. It's like trying to remember where you left your keys while someone reads you their grocery list.
Threading solves this by creating topic-specific context windows. Each conversation loads only relevant history. It's a workaround, not a solution—the underlying issue is that these systems still can't reliably maintain coherent long-term memory across contexts. But it's a workaround that actually works.
What's interesting is that this isn't an OpenClaw problem specifically. It's an attention mechanism problem. Every transformer-based model deals with this. Berman just found a structural solution that works with, rather than against, the architecture's limitations.
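The threading idea is simple enough to sketch in a few lines. This is a hypothetical illustration, not OpenClaw's actual implementation: one message history per topic, so each request loads only the relevant context instead of the whole chat log.

```python
from collections import defaultdict

class ThreadedContext:
    """Keep a separate message history per topic, mimicking per-topic chat groups."""

    def __init__(self, max_messages=50):
        self.threads = defaultdict(list)  # topic -> list of messages
        self.max_messages = max_messages

    def add(self, topic, role, text):
        self.threads[topic].append({"role": role, "content": text})

    def context_for(self, topic):
        # Only this topic's recent history reaches the model --
        # the CRM thread never pollutes the debugging thread.
        return self.threads[topic][-self.max_messages:]

ctx = ThreadedContext()
ctx.add("crm", "user", "Update the lead status for Acme Corp.")
ctx.add("debug", "user", "Why does the cron job fail at 3 AM?")
assert len(ctx.context_for("debug")) == 1  # CRM history is not loaded
```

The `max_messages` cap is the crude part: a real system would summarize or retrieve older history rather than truncate it, which is exactly the long-term memory problem threading works around rather than solves.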
Model Routing Gets Complicated Fast
Berman doesn't use one model for everything. He uses Opus 4.6 for main chat orchestration, Sonnet for routine tasks, GPT-5.4 for fallback, Gemini for video processing, Grok for search. The list goes on.
"You should not be using a single model for everything you're doing with OpenClaw," he says. "You should be using a wide spectrum of models."
This makes sense in theory. Different models have different strengths. Use the expensive frontier model for complex planning, the cheap fast model for simple queries. Classic optimization.
In practice, it gets messy. Each model responds differently to prompt structure. Opus doesn't like ALL CAPS or negative instructions. GPT-5.4 prefers both. So Berman maintains separate prompt files for each model, with a scheduled job that runs nightly to keep them in sync while respecting model-specific preferences.
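The routing strategy reduces to a lookup table with a fallback. A minimal sketch, assuming a task-type-to-model mapping and a per-model prompt directory; the `ROUTES` table, model names, and file layout are illustrative, not OpenClaw's real configuration format:

```python
# Illustrative routing table based on the models Berman mentions.
ROUTES = {
    "orchestration": "opus-4.6",
    "routine":       "sonnet",
    "video":         "gemini",
    "search":        "grok",
}
FALLBACK = "gpt-5.4"

def pick_model(task_type, available):
    """Return the preferred model for a task, falling back if it's unavailable."""
    preferred = ROUTES.get(task_type, FALLBACK)
    return preferred if preferred in available else FALLBACK

def prompt_path(model):
    # One prompt file per model, since each responds to structure differently.
    return f"prompts/{model}/system.md"

assert pick_model("video", {"gemini", "gpt-5.4"}) == "gemini"
assert pick_model("video", {"gpt-5.4"}) == "gpt-5.4"  # fallback path
```

The separate prompt directories are the tax paid for multi-model routing: the routing logic itself is trivial, but every model added means another prompt file to keep in sync.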
This is where the engineering becomes almost comically intricate. He's built a system that downloads best practices documents from each model provider, uses OpenClaw to optimize prompts for each model based on those documents, stores them in separate directories, and runs a cron job every night to check for drift.
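The drift-check step of that pipeline could look something like the sketch below: hash each model's prompt file and compare against a stored baseline, flagging anything that changed. The file layout (`prompts/<model>/system.md`) and baseline format are assumptions, not details from the video.

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """Content hash of a prompt file, for cheap change detection."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def scan_prompts(prompt_dir: str) -> dict:
    """Map each per-model prompt file to its current hash."""
    return {str(p): file_hash(p) for p in Path(prompt_dir).glob("*/system.md")}

def find_drift(current: dict, baseline: dict) -> list:
    """Return prompt files whose hash no longer matches the baseline."""
    return sorted(p for p, h in current.items() if baseline.get(p) != h)

# A drifted file would be sent back through the prompt-optimization step.
baseline = {"prompts/opus-4.6/system.md": "abc123"}
current  = {"prompts/opus-4.6/system.md": "abc123",
            "prompts/sonnet/system.md":   "def456"}
assert find_drift(current, baseline) == ["prompts/sonnet/system.md"]
```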
It works. It's also the kind of system that requires constant maintenance and breaks in interesting ways. The question isn't whether this is worth doing—for Berman's use case, clearly it is. The question is what it tells us about where AI agents actually are versus where we pretend they are.
Delegation Sounds Simple Until It Isn't
Berman's core principle for sub-agents is straightforward: delegate early and often. Anything that takes more than 10 seconds goes to a sub-agent so the main agent doesn't block.
Coding work goes to Cursor Agent CLI. Multi-step tasks get delegated. File operations beyond simple reads get delegated. The main agent orchestrates; sub-agents execute.
"A very frustrating thing I come across is if I give a task or a question to my main agent and it just sits there thinking and everything else I want to do at the same time becomes blocked by that," Berman notes. "And so by delegating to a sub agent, you are unblocking your main agent."
This is sensible architecture. It's also revealing about the current state of agent systems. In a truly autonomous system, you wouldn't need to manually configure delegation rules. The system would understand its own bottlenecks and route around them.
Instead, we have users building elaborate decision trees: if task type equals X and estimated duration exceeds Y seconds, delegate to sub-agent Z using model A unless it's a fallback scenario in which case use model B. It works, but it's the opposite of autonomous.
Cron Jobs at 3 AM
Berman runs scheduled tasks throughout the night—sponsor inbox refresh, documentation drift checks, prompt quality review, config consistency verification. He staggers them every five minutes to avoid hitting API quota limits during the rolling window period.
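The staggering itself is mechanical: space the jobs five minutes apart so no two land in the same rate-limit window. A sketch that emits crontab lines; the job names paraphrase the tasks Berman describes, and the `openclaw run` command is hypothetical:

```python
JOBS = ["sponsor_inbox_refresh", "doc_drift_check",
        "prompt_quality_review", "config_consistency_check"]

def staggered_schedule(jobs, start_hour=3, step_min=5):
    """Yield crontab lines, one job every `step_min` minutes from `start_hour`."""
    for i, job in enumerate(jobs):
        minute = (i * step_min) % 60
        hour = start_hour + (i * step_min) // 60
        yield f"{minute} {hour} * * * openclaw run {job}"

lines = list(staggered_schedule(JOBS))
assert lines[0] == "0 3 * * * openclaw run sponsor_inbox_refresh"
assert lines[3] == "15 3 * * * openclaw run config_consistency_check"
```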
This is optimization taken to its logical extreme. It's also a sign that these systems are still fundamentally constrained by infrastructure designed for human-paced interaction. The solution isn't better AI—it's better scheduling around rate limits.
There's something almost poetic about an AI agent system that requires you to schedule its housekeeping tasks for 3 AM so it doesn't interfere with your daytime usage or blow through your API quota. It's the difference between what "AI agent" sounds like and what it actually means in practice.
What This Actually Tells Us
Berman's setup is impressive. It's also exhausting just to hear about. And that exhaustion is data.
When someone needs to spend 200 hours optimizing an AI agent to work reliably, we're not in the autonomous intelligence era. We're in the advanced automation era. The difference matters.
Advanced automation is valuable. It saves time, scales operations, handles complexity that humans can't track. But it requires constant tuning, breaks in predictable ways, and needs someone who understands the system deeply to keep it running.
Berman has built something genuinely useful. He's also built something that couldn't exist without his specific technical knowledge and willingness to maintain it. That's not a criticism of his work—it's an observation about where the technology actually is.
The hype suggests we're approaching systems that manage themselves. The reality is we're building systems that require increasingly sophisticated human oversight to function reliably. Both things can be progress, but they're different kinds of progress.
Berman's willingness to document the unglamorous parts—the cron jobs, the prompt file management, the quota juggling—is more useful than another breathless demo of an agent doing something impressive once. This is what it looks like when someone actually uses these tools for real work over extended time.
The question isn't whether his optimizations work. Clearly they do. The question is what it means that they're necessary.
—Marcus Chen-Ramirez
Watch the Original Video
Do THIS with OpenClaw so you don't fall behind... (14 Use Cases)
Matthew Berman
34m 5s
About This Source
Matthew Berman
Matthew Berman is a leading voice in the digital realm, amassing over 533,000 subscribers since launching his YouTube channel in October 2025. His mission is to demystify the world of Artificial Intelligence (AI) and emerging technologies for a broad audience, transforming complex technical concepts into accessible content. Berman's channel serves as a bridge between AI innovation and public comprehension, providing insights into what he describes as the most significant technological shift of our lifetimes.
More Like This
MyFitnessPal Bought CalAI. Here's Why That's Telling.
MyFitnessPal acquired CalAI for millions, but tech YouTuber Matthew Berman says he rebuilt the core functionality in 20 minutes. What does that tell us?
Transforming Unstructured Data with Docling: A Deep Dive
Explore how Docling converts unstructured data into AI-ready formats, enhancing RAG and AI agent performance.
Claude Code's Hidden Features That Change Everything
Boris Cherny reveals 15 underused Claude Code features that transform how developers work—from parallel sessions to remote dispatch.
OpenClaw Dominates February's GitHub: What It Means
February's GitHub trends reveal an AI agent ecosystem in rapid evolution, with OpenClaw spawning dozens of variants optimized for everything from $10 hardware to enterprise security.