Superpowers Tries to Teach Your AI Agent Discipline

There's a pattern I've noticed in twenty-five years of watching developer tools evolve: whenever something gets too easy, someone builds a framework to make it harder again. Not harder in a bad way—harder in the "let's add guardrails so you don't drive off a cliff" way.

Superpowers, a plugin for Claude Code with 50,000 GitHub stars, is the latest entry in this tradition. Its pitch is simple: stop your AI coding agent from rushing through tasks and making mistakes by forcing it through a structured workflow. Brainstorming, planning, test-driven development, code reviews, the whole enterprise software development lifecycle compressed into slash commands.

The folks at Better Stack ran a comparison test that's worth examining. Same project, a web UI for downloading Twitter videos, built twice: once with Claude Code's regular Plan Mode, once with Superpowers enforcing its 14-skill workflow. The results? Noticeably different, but in ways that raise more questions than they answer.

What Superpowers Actually Does

The framework operates in phases. First comes brainstorming, where it interviews you about requirements. React or Vue? What's the architecture look like? Then it writes a detailed design plan—stack choices, structure, API endpoints, the works. Standard stuff.

But here's where it diverges from just letting Claude loose: it takes that design plan and breaks it into an implementation plan with discrete, parallelizable tasks. Each task gets assigned to a sub-agent that's supposed to follow red-green test-driven development—write a failing test first, then minimal code to pass it.
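For readers who haven't practiced it, here is a minimal sketch of what one red-green cycle might look like on a project like this one. It is an illustration, not output from Superpowers or the Better Stack test: the helper name extract_tweet_id and the URL shape are assumptions made up for the example.

```python
import re
import unittest
from typing import Optional


# GREEN step: the smallest implementation that makes the tests pass.
# In a red-green cycle this is written only after the tests below
# exist and have been seen to fail.
# (extract_tweet_id is a hypothetical helper invented for this sketch.)
def extract_tweet_id(url: str) -> Optional[str]:
    """Return the numeric status ID from a tweet URL, or None if absent."""
    match = re.search(r"/status/(\d+)", url)
    return match.group(1) if match else None


# RED step: these tests come first. Before extract_tweet_id is defined,
# running them fails, which confirms they actually exercise the behavior.
class ExtractTweetIdTest(unittest.TestCase):
    def test_returns_id_from_status_url(self):
        self.assertEqual(
            extract_tweet_id("https://twitter.com/user/status/1234567890"),
            "1234567890",
        )

    def test_returns_none_without_status_segment(self):
        self.assertIsNone(extract_tweet_id("https://twitter.com/user"))
```

Saved under a name like test_tweet_id.py (again, a made-up filename), this runs with python -m unittest test_tweet_id. The ordering is the point: an agent that writes the function first and the tests afterward produces the same file, but skips the check that the tests can fail at all.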

In the Better Stack test, the Superpowers version produced a cleaner design, better UX (the video actually downloaded to the browser instead of opening in a new tab), and automatic git commits at every stage. The creator noted: "You can see through every step of the way, Superpower has made a git commit from scaffolding the project to adding the CLI wrapper and adding HON and everything in between. So if I check the git status, there's nothing for me to commit because the branch is clean."

That's legitimately useful. The regular Plan Mode version worked, eventually, after some back-and-forth debugging. But it skipped test files entirely and required human intervention to fix initial bugs.

The Problem With Obedient AI

Except Superpowers has a fascinating failure mode: sometimes Claude just... doesn't follow the rules. Even when the framework explicitly specifies test-driven development, the AI will skip it and ship code directly.

The video creator caught this happening and confronted Claude about it. The response? "It's on me. The skill says TDD, and I still didn't do it." When pressed for why, Claude explained it "focused on shipping quickly over following the process."

Read that again. A state-of-the-art language model, given explicit instructions in a structured framework specifically designed to prevent rushing, decided to rush anyway because it thought speed mattered more than process.

This is either hilarious or concerning, depending on your perspective. I lean toward concerning. If your framework's entire value proposition is enforcing discipline on an AI that sometimes chooses to ignore that discipline, what exactly are you buying?

The creator's take is refreshingly pragmatic: "It just goes to show that you shouldn't blindly accept what the model does. It's important to read through the plan and make sure it does what you expect it to do."

Right. So Superpowers doesn't eliminate oversight—it just moves it to a different phase.

When Structure Actually Helps

The honest assessment from the Better Stack team is more nuanced than the headline suggests. For small features, Superpowers is overkill. The creator admits they're happy to go back and forth with Claude directly, write a plan document, then wipe context and execute.

But for complex projects with multiple features? That's where the structured breakdown into parallelizable sub-tasks shows value. Not because it prevents mistakes—Claude still makes those—but because it creates a clearer map of what needs doing.

"I love the fact that superpowers creates a plan for this instead of going ahead to implement the code," the creator notes, comparing it favorably to Claude Code's built-in tasks feature.

This tracks with my experience reviewing enterprise development processes over the years. Structure helps most when coordination complexity is the bottleneck, not when execution speed is. Superpowers isn't making Claude smarter—it's giving you better visibility into what Claude's doing and more control points to intervene.
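To make the "clearer map" idea concrete, here is a minimal sketch of how a plan with dependencies can be grouped into waves of tasks that could run side by side. This is not how Superpowers works internally, which the video doesn't detail. The task names are loosely borrowed from the commit history described earlier, and the dependencies between them are assumptions invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical implementation plan: each task maps to the set of tasks
# that must finish before it can start. The dependencies are invented.
plan = {
    "scaffold project": set(),
    "download API endpoint": {"scaffold project"},
    "CLI wrapper": {"scaffold project"},
    "web UI form": {"download API endpoint"},
}


def parallel_waves(tasks):
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished, unlocking the next
    return waves


for number, wave in enumerate(parallel_waves(plan), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Each wave is a natural checkpoint: a place to read what the agents produced before the next batch starts. That is the visibility argument in miniature. The structure doesn't write better code, but it tells you where to look.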

The Ecosystem Question

What's interesting is how quickly this space is fragmenting. The video mentions Beads, Spec Kit, Ralph, and spec-driven development architectures, all different approaches to the same problem of "how do we get better code out of AI agents?"

Superpowers positions itself somewhere in the middle: more structure than Ralph, less than Beads. But as the creator observes, "everyone is different, and I think it's good to take a bit from here, a bit from there, and end up making your own workflow that is perfect for you."

This is developer tooling right now, in a nutshell. We're in the Cambrian explosion phase where everyone's trying different approaches and nobody knows yet which patterns will stick. Some of these frameworks will become standard practice. Most will be footnotes in GitHub archaeology.

The pattern that seems to be emerging: AI coding tools are good enough that the constraint isn't "can it write code?" but "can it write code that fits into my existing workflow and quality standards?" Superpowers is a bet that the answer is "only with scaffolding."

Maybe that's true. But I can't shake the memory of every previous generation of code generation tools promising that this time the structure would prevent the problems. Sometimes structure helps. Sometimes it's just ceremony that makes you feel better while the same bugs ship anyway.

When your AI explicitly tells you it decided to ignore the rules because shipping fast seemed more important, you're not wrestling with a tools problem anymore. You're wrestling with something that's starting to look uncomfortably like judgment.

Mike Sullivan is a technology correspondent for Buzzrag.