The Caveman Skill Makes AI Shut Up and Save You Money
New Claude skill cuts AI verbosity by 45%, potentially saving token costs—but the math gets complicated. Here's what actually works and what doesn't.
Written by AI. Zara Chen
April 13, 2026

Photo: Better Stack / YouTube
Remember when Kevin from The Office asked why waste time say lot word when few word do trick? Turns out he might've been onto something—at least when it comes to AI.
A new prompt technique called Caveman is making waves in developer circles for doing exactly what Kevin suggested: stripping AI responses down to their technical bones. The promise? Cut your Claude output tokens by up to 45% while keeping full accuracy. Maybe even make the AI smarter in the process.
Sounds great, right? Well, like most things involving AI and money, the reality is more interesting than the pitch.
What Caveman Actually Does
The Better Stack team demonstrated Caveman using a demo Next.js app with fake authentication. When they asked Claude Code to explain how auth worked, the standard response was... very Claude. Full sentences, em-dashes, the whole "let me explain this to you like a helpful tutor" vibe:
"This is a simulated authentication system. No backend, no passwords, no real security. Exist to demonstrate Better Stack RUM user tracking."
With Caveman enabled, the same question got: "Demo only, client-side auth, no real security, built for Better Stack RUM tracking demos."
No filler. No pleasantries. Just the information. For technical workflows, that's actually kind of refreshing? Like, I don't need my AI to hold my hand through every explanation. Just tell me what's happening.
The skill works by instructing Claude to drop articles (a, an, the), eliminate hedging language, skip pleasantries, and use short synonyms. Instead of "implement a solution for," it says "fix." Instead of prose explanations, it uses arrows for causality. "App load → check local storage for saved user."
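The kind of rewriting those instructions ask for can be sketched as a toy text filter. This is only an illustration of the idea, not the skill's actual rules; the word lists and replacements below are my own assumptions:

```python
import re

# Toy sketch of the rewriting Caveman's instructions ask Claude to perform.
# These lists are illustrative stand-ins, not the skill's real rule set.
ARTICLES = r"\b(a|an|the)\b"
PHRASES = {
    "implement a solution for": "fix",   # short synonym for a verbose phrase
    "in order to": "to",
    "it is possible that": "maybe",      # hedging language collapsed
}

def cavemanify(text: str) -> str:
    """Compress a sentence roughly the way the skill instructs Claude to."""
    out = text.lower()
    for verbose, terse in PHRASES.items():  # swap verbose phrases first
        out = out.replace(verbose, terse)
    out = re.sub(ARTICLES, "", out)          # drop articles (a, an, the)
    return re.sub(r"\s+", " ", out).strip()  # collapse leftover spaces
```

With these assumed rules, "Implement a solution for the login bug" comes out as "fix login bug", the same kind of compression the skill applies to Claude's own output.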
The Money Question Gets Complicated Fast
Here's where things get genuinely interesting. The Better Stack team ran comparison tests across 10 prompts—basic stuff like "How does Git rebase differ from Git merge?" The results looked great initially:
- 45% reduction in output tokens versus baseline Claude
- 39% reduction versus just telling Claude "be concise"
- Output costs dropped from ~8 cents to ~4 cents
But then they factored in input tokens.
See, Caveman works by loading a markdown file with instructions into every prompt. That's extra input tokens you're paying for. For single, simple questions, those input costs eat your output savings. "On average Caveman is actually 10% more expensive than the baseline because the savings that we made on those output tokens have been lost to our input tokens," the video explains.
So... is this whole thing pointless?
Not quite. The calculation flips when you start asking follow-up questions. Once Claude's prompt caching kicks in, where input tokens repeated from earlier in the conversation are billed at a steep discount rather than full price, Caveman starts saving 39% on costs. You pay full price for the skill's instructions once, then get the concise outputs on every subsequent turn.
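To see why the break-even depends on conversation length, here's a back-of-envelope model. Every number is an assumption for illustration (Sonnet-class per-token prices, a guess at the skill file's size, the article's 45% output reduction); real figures will differ:

```python
# All constants are assumptions for illustration, not measured values.
IN_PRICE = 3.00 / 1e6              # $/input token (assumed Sonnet-class rate)
OUT_PRICE = 15.00 / 1e6            # $/output token (assumed)
CACHED_IN_PRICE = IN_PRICE * 0.1   # cache reads assumed billed at 10% of base

SKILL_TOKENS = 2_000                 # guessed size of the Caveman skill file
BASE_OUT = 500                       # guessed baseline answer length
CAVEMAN_OUT = int(BASE_OUT * 0.55)   # 45% fewer output tokens, per the video

def conversation_cost(turns: int) -> tuple[float, float]:
    """Total cost of `turns` answers: (baseline, with Caveman)."""
    baseline = turns * BASE_OUT * OUT_PRICE
    # Skill file costs full input price on turn 1, cached price afterward.
    skill_in = SKILL_TOKENS * (IN_PRICE + (turns - 1) * CACHED_IN_PRICE)
    caveman = skill_in + turns * CAVEMAN_OUT * OUT_PRICE
    return baseline, caveman
```

With these numbers, a single question costs more with Caveman than without, while a ten-turn session costs noticeably less, which matches the video's finding that the skill only pays off across follow-ups.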
The use case matters. A lot.
The Accuracy Angle
There's another dimension here beyond just cost. The video references a 2024 study showing that constraining large models to brief responses improved accuracy by 26% on certain benchmarks.
That's... not nothing? The theory is that forcing AI to be concise might reduce the space for it to hallucinate or waffle. When you can't hide behind verbose explanations, you have to be more precise.
I'm curious whether this holds up across different types of questions, though. Technical queries where there's a clear right answer? Sure, maybe brevity forces clarity. But what about ambiguous situations where nuance actually matters? The video doesn't address that tension, and it feels important.
How It Actually Works
Caveman has intensity modes ranging from "light" to "ultra." Full mode is the default. But ultra? That's where it gets wild:
- Abbreviates everything
- Strips conjunctions
- Uses arrows for causality
- "One word when one word enough"
There's also a Wenyan mode that uses classical Chinese characters because they're apparently the most token-efficient. (The video creator admits they can't read them, which honestly tracks.)
Beyond the main skill, there are specialized versions: Caveman Commit for terse git messages, Caveman Review for one-line code review comments, and a compression tool to "Cavemanify" your own prompts.
Who This Is Actually For
Here's my read: Caveman makes sense if you're doing iterative technical work where you're asking Claude multiple related questions. The kind of back-and-forth where you're debugging, exploring a codebase, or working through a problem.
It probably doesn't make sense for one-off questions or situations where you actually want the AI to explain its reasoning in detail.
And maybe that's fine? Not every tool needs to be universal. Caveman is optimized for a specific workflow: developers who want technical information fast and are willing to trade conversational warmth for efficiency.
The question is whether that tradeoff is worth reconfiguring your prompts and learning a new interaction pattern. For some people, absolutely. For others, just telling Claude "be concise" gets you 80% of the benefit with none of the setup.
What's actually interesting here isn't whether Caveman is "better"—it's that we're now at the point where prompt engineering has formal tools, benchmarks, and cost-benefit analyses. We're optimizing how we talk to AI the same way we optimize database queries.
That feels like the real story. Kevin's wisdom aside, the fact that developers are this deep in the weeds on token efficiency suggests AI isn't just a toy anymore. It's infrastructure. And infrastructure gets optimized.
—Zara Chen
Watch the Original Video
This Claude Skill Cuts Your Token Costs In HALF
Better Stack
5m 5s
About This Source
Better Stack
Better Stack, a YouTube channel that debuted in October 2025, has quickly established itself as a cornerstone for tech professionals, amassing 91,600 subscribers. Known for its focus on cost-effective, open-source alternatives to enterprise solutions like Datadog, the channel emphasizes software development, AI applications, and cybersecurity.