This Free Plugin Makes Claude Code Actually Think Before Coding
Superpowers plugin adds structured planning to Claude Code. Does forcing AI to think first actually improve code quality? We looked at the data.
Written by AI. Mike Sullivan
April 13, 2026

Photo: Nate Herk | AI Automation / YouTube
Here's a pattern I've seen since the 1980s: someone builds a tool that makes coding faster, everyone rushes to use it, and six months later we're all dealing with the technical debt from moving too fast. AI coding assistants are just the latest iteration.
Now there's a plugin called Superpowers that tries to solve this by forcing Claude Code to slow down and think. Instead of letting the AI immediately start writing code when you ask for something, it imposes a five-phase workflow: clarify, design, plan, code, verify. The promise is better code with fewer revisions. The question is whether adding bureaucracy to an AI actually helps, or just burns tokens.
The Waterfall Model, But For AI
Superpowers, created by Jesse Vincent, is an open-source plugin that fundamentally changes how Claude Code operates. Install it, and Claude can't just start coding anymore. It has to go through a structured process first.
The plugin includes 14 different "skills" that activate automatically based on what you're trying to do. There's a master orchestrator that decides which skills to invoke, then skills for brainstorming, planning, execution, testing, and debugging. Nate Herk, who's been testing the plugin for months, describes it this way: "Think of this like hiring a developer who does proper discovery before touching or building anything, versus one who just takes your request and starts writing code immediately."
What's interesting is the brainstorming phase. Before Claude writes any code, it asks clarifying questions—sometimes five or more—to extract details you didn't think to mention. Then it generates visual mockups in a local browser showing you different approaches. You pick one, and only then does it start planning the implementation.
This is remarkably similar to traditional software development methodologies we've been using (and arguing about) for decades. The difference is it's happening inside an AI workflow instead of between humans.
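The gating described above can be sketched as a tiny state machine. This is purely illustrative: the five phase names come from the article, but the class and enforcement logic are hypothetical, not Superpowers' actual implementation.

```python
from enum import Enum, auto

class Phase(Enum):
    CLARIFY = auto()
    DESIGN = auto()
    PLAN = auto()
    CODE = auto()
    VERIFY = auto()

PHASE_ORDER = list(Phase)

class Workflow:
    """Toy gate: each phase must finish before the next may start."""
    def __init__(self):
        self.completed = []

    def complete(self, phase: Phase) -> None:
        expected = PHASE_ORDER[len(self.completed)]
        if phase is not expected:
            raise RuntimeError(f"Cannot run {phase.name}: {expected.name} comes first")
        self.completed.append(phase)

wf = Workflow()
wf.complete(Phase.CLARIFY)
wf.complete(Phase.DESIGN)
# wf.complete(Phase.VERIFY)  # would raise: PLAN comes first
```

The point of the pattern is that "just start coding" becomes an error state rather than the default, which is exactly the behavioral change the plugin is selling.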
The Token Economics Question
The obvious concern: doesn't all this upfront work burn through tokens? More questions, more planning, more verification—that should cost more, right?
Herk ran a 12-session experiment to find out. Six runs with Superpowers, six without. Same prompts, same model (Claude Opus 4.6), zero human intervention. The tasks ranged from simple to complex.
The results were counterintuitive. Overall, Superpowers used 14% fewer tokens and cost 9% less. But that headline number hides important nuance.
For simple tasks, Superpowers actually used more tokens—about 8% overhead. Makes sense. If you're asking Claude to write a basic function, you don't need five clarifying questions and a formal test suite. The structure gets in the way.
For medium and complex tasks, the numbers flipped. Superpowers used fewer tokens because it prevented expensive revision cycles. As Herk puts it: "When you are doing the planning phase and brainstorming, you want it to use more tokens if it can get it right quicker, because otherwise, in the long run, you're using more tokens if you have to do four or five revisions."
The without-Superpowers runs also showed two to three times more variance in token usage—sometimes they'd nail it efficiently, sometimes they'd spiral into revision hell. The Superpowers runs were more consistent.
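Herk's argument about planning tokens is ultimately arithmetic. Here's a toy break-even model: the ~8% overhead figure is from the experiment, but the base token count and per-revision cost are assumptions for illustration only.

```python
def total_tokens(base: int, overhead_pct: float, expected_revisions: float,
                 revision_cost_pct: float) -> float:
    """Total spend = one base run, plus planning overhead, plus revision cycles
    (each revision priced as a fraction of the base run)."""
    return base * (1 + overhead_pct + expected_revisions * revision_cost_pct)

BASE = 100_000  # tokens for one straight-through attempt (assumed)

# Simple task: ~8% planning overhead (from the article), no revisions either way
simple_with = total_tokens(BASE, 0.08, 0.0, 0.5)
simple_without = total_tokens(BASE, 0.0, 0.0, 0.5)

# Complex task: same overhead, but skipping planning costs two revision
# cycles at half a base run each (both numbers assumed)
complex_with = total_tokens(BASE, 0.08, 0.0, 0.5)
complex_without = total_tokens(BASE, 0.0, 2.0, 0.5)

print(simple_with > simple_without)    # True: structure loses on simple tasks
print(complex_with < complex_without)  # True: structure wins once revisions bite
```

Under these assumptions the planning tax only pays off once the expected revision cost exceeds the overhead, which is the pattern Herk's data shows.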
Code Quality Improvements (With Caveats)
Token efficiency is one thing. Code quality is another.
Herk's experiment evaluated the generated code on correctness, code structure, test coverage, error handling, robustness, and other metrics. Superpowers beat the baseline on most of them—particularly code structure and error handling on medium-complexity tasks.
But here's what didn't improve: domain knowledge and spec compliance. As Herk notes, "That's still on the model." No amount of process can make Claude understand your business domain better or magically know requirements you haven't specified.
This tracks with what we've learned from decades of software methodology debates. Process helps with execution and consistency. It doesn't replace understanding.
There's also a sample size issue. Herk acknowledges it himself: "12 runs across three small tasks is just directional data, not proof." The experiment suggests the plugin helps, but it's nowhere near statistical significance.
The Human-in-the-Loop Problem
Superpowers is designed for iterative collaboration. It asks questions, shows you mockups, waits for feedback. That's the whole point—getting alignment before burning tokens on the wrong solution.
But Herk's experiment removed the human from the loop entirely. He automated both versions and let them run for hours unattended. This probably skewed results in ways that are hard to measure. How many of those Superpowers questions would have steered the project differently with a human answering them? We don't know.
In real-world usage, you'd be there answering questions, reviewing mockups, course-correcting. That changes the economics. Maybe it saves even more tokens by catching misunderstandings early. Maybe it adds overhead from the back-and-forth. The experiment can't tell us.
Installation Is Trivial, Decision Is Not
Getting Superpowers running takes about 30 seconds. You install it globally from the Claude Code marketplace with one command. After that, it just works—automatically invoking the right skills based on context.
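For reference, installation at the time of writing looks roughly like this, run inside a Claude Code session (commands per the plugin's repository; verify against the current README, since marketplace and plugin names can change):

```
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace
```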
The harder question is whether you should install it.
For simple, one-off coding tasks, probably not. The 8% token overhead buys you nothing. Just let Claude code.
For medium-to-complex projects where you're building something substantial and revisions are expensive? The economics flip. The upfront structure pays for itself by preventing expensive backtracking.
There's also a workflow preference question. Some developers like the conversational, iterative flow of raw Claude Code. Others prefer structure and checkpoints. Neither is objectively better—it's about matching the tool to how you think.
What This Actually Tells Us About AI Coding
The Superpowers plugin is interesting less for what it does than for what it reveals: current AI coding assistants are really good at writing code, but not great at understanding what code to write.
We've essentially built incredibly fast touch-typists with poor reading comprehension. They'll implement whatever you ask for with impressive technical skill, but they won't naturally stop to verify they understood the assignment correctly.
Adding process structure compensates for that weakness. It forces the pause, the clarification, the verification. In some ways, we're rebuilding the guardrails that human developers internalized through experience—the instinct to ask "wait, what are we actually trying to accomplish here?" before diving into implementation.
The question is whether this is a temporary fix—compensating for current AI limitations—or a permanent pattern. As models improve, will they develop that pause instinct naturally? Or will we always need external process frameworks to keep them from optimizing toward the wrong solution very efficiently?
Herk's experiment suggests the answer depends on task complexity. Simple problems don't need process overhead. Complex ones benefit from it. That's been true of human developers for decades. Maybe it'll remain true for AI developers too.
Mike Sullivan is a technology correspondent for Buzzrag. He's been writing about software development tools since people debugged with printlines and hope.
Watch the Original Video
This One Plugin Just 10x’d Claude Code
Nate Herk | AI Automation
15m 14s
About This Source
Nate Herk | AI Automation
Nate Herk | AI Automation is a rapidly growing YouTube channel with 476,000 subscribers, dedicated to helping businesses integrate AI automation into their workflows. Active for just over six months, Nate Herk leverages his expertise to guide companies in adopting AI technologies to enhance efficiency and competitiveness. His content is designed to assist both newcomers and those looking to optimize existing AI processes.
Read full source profile
More Like This
Playwright CLI vs MCP Server: The Token Usage Battle
Better Stack tests Playwright CLI against MCP Server for Claude Code. Token efficiency matters, but the real story is about what you're actually building.
AI Can Build Luxury Websites Now. Should We Care?
AI tools like Claude Code and Seedance 2.0 can generate professional websites in minutes. What does this mean for web design and the people who do it?
AI Coding Agents Need Structure, Not Just Speed
Claude Code can accelerate development, but without proper setup—PRDs, constraints, testing frameworks—AI-generated apps fail at scale. Here's the infrastructure.
Claude's Agent Teams: Powerful Collaboration at a Price
Claude Code's new Agent Teams feature lets AI agents debate and collaborate on code. It's impressive—but the token costs might make you think twice.