Edited by humans. Written by AI. How our editing works
BUZZRAGNews. Trends. Ideas — distilled in minutes.
All articles

The Hidden Math Behind Claude's Session Limits

AI automation expert Nate Herk breaks down why Claude users hit session limits—and the counterintuitive strategies that actually work to avoid them.

Written by AI. Zara Chen

April 21, 20265 min read
Share:
Man smiling at camera next to whiteboard listing "Top 1% User" techniques including /optimizer, context rot, compaction,…

Photo: Nate Herk | AI Automation / YouTube

Here's the thing nobody tells you about AI session limits: you're probably burning tokens on stuff you can't even see.

Nate Herk, an AI automation specialist, recently dropped a detailed breakdown of how Claude's token system actually works—and why so many users are hitting their limits without understanding what's eating their budget. His findings reveal a system that's simultaneously more generous and more unforgiving than most people realize.

The Compound Interest Problem

The core issue is deceptively simple: every time you send a message to Claude, it rereads your entire conversation from the beginning. Message one might cost 500 tokens. Message 30 costs 15,000—not because your prompt got longer, but because Claude is re-processing everything that came before.

"This means as you're having a conversation with Claude, your cost is compounding, not just adding, it's exponentially growing," Herk explains in the video. One developer he references tracked a 100+ message conversation and found that 98.5% of all tokens were spent rereading old chat history.

That's the invisible tax. You think you're paying for the work Claude is doing now, but you're mostly paying for it to remember what it already did.

Context Rot: AI Dementia

Even more interesting is what Herk calls "context rot"—the degradation of Claude's performance as sessions grow longer. Anthropic's own statistics show retrieval accuracy dropping from 92% at 256,000 tokens to 78% at one million tokens.

The implications are circular and punishing: as the model gets worse at finding information in its bloated context window, you have to spend more tokens getting it back on track. You might burn 500,000 tokens for output that could have taken 200,000 if the model was performing optimally.

Claude does have an auto-compaction feature that kicks in at 95% capacity, but Herk—and apparently most of the developer community—considers this way too late. By that point, you're asking a cognitively impaired AI to decide what's important to keep. "Imagine you're packing for a trip," he offers. "If you pack the night before, you'd grab all the right stuff. But if you're frantically stuffing your bag because you woke up 5 minutes before you have to go, you're probably going to forget your charger."

His solution? Manual compaction at around 60% capacity, or better yet, a full reset-and-handoff strategy.

The Rewind Feature Nobody Uses

Anthropics's number one recommendation, according to Herk, is the /re command—a rewind feature that lets you jump back to any previous message and drop everything after it.

This matters more than it sounds. When Claude makes a mistake, most users (including Herk, by his own admission) just say "that didn't work, try this instead." The broken code, the failed approach, the wrong direction—it all stays in context, polluting future responses and compounding costs.

The /re command includes a "summarize from here" option that creates what Herk calls a handoff message: "a note from Claude's future self to its past self saying, 'Here's what we figured out. Do it this way.'" Clean context, preserved learning, lower costs.

Sub-Agents and the Research Intern Model

Herk's most practical recommendation involves delegating work to sub-agents—separate Claude instances with their own fresh context windows that handle specific tasks and return only the results.

"If you wanted a research intern to dig through 50 articles, you wouldn't sit there and watch them do it and you wouldn't read the articles as well," he points out. "You would just say, 'Hey, just let me know when you have a summary.'"

Each sub-agent can use a cheaper model (Haiku instead of Opus, for instance) for tasks that don't require top-tier performance. The cost savings compound when you're not loading every intermediate step into your main session.

The Markdown Hack

One of the most concrete tips: convert everything to markdown. PDFs, HTML, DOCX files—they all carry formatting overhead that AI models don't need. HTML to markdown conversion can reduce tokens by 90%. PDF to markdown drops it 65-70%.

"A 40-page PDF could actually take up the same amount of space as a 130-page markdown file," Herk notes. The tokenizers process plain text efficiently; everything else is just expensive noise.

The Strategic Tension

What's interesting about Herk's advice is the tension between competing priorities. You want Claude to learn from mistakes, but keeping failed attempts in context is expensive. You want comprehensive context, but comprehensive context makes Claude dumber. You have a million-token window, but filling it is almost never the right move.

His approach resolves this by externalizing memory—maintaining decision logs, task lists, and tracking sheets outside the conversation. That way, when you reset a session with /clear and paste in a handoff summary, "it doesn't feel like you reset. It's kind of like if you want to close out of all your Chrome tabs, but you still have all your bookmarks."

The million-token window, in his view, is "insurance, not a goal to fill." Even Anthropic's data suggests you probably shouldn't try.

What Herk's really describing is a shift in how we think about AI conversations—from freeform dialogue to managed sessions with explicit handoffs, delegated work, and strategic forgetting. It's more structured, more intentional, and frankly, more work upfront. But the alternative is hitting your session limit on message 47 and wondering where all your tokens went.

Zara Chen covers technology and politics for Buzzrag.

From the BuzzRAG Team

AI Moves Fast. We Keep You Current.

Framework breakdowns, tool comparisons, and AI coding insights — distilled from the best tech YouTube creators. Free, weekly.

Weekly digestNo spamUnsubscribe anytime

More Like This

A man wearing glasses next to a file folder labeled "/secret-weapons" with pixelated red character icons connected by…

What 1,600 Hours With Claude Code Actually Teaches You

Ray Amjad spent 1,600 hours with Claude Code and learned it's not about the AI—it's about understanding how you work. Here's what actually matters.

Marcus Chen-Ramirez·3 months ago·7 min read
A smiling man in a blue sweater stands next to a whiteboard listing "Goodbye Limits" with 8 numbered items including…

Why Your Claude Code Sessions Cost More Than They Should

Most Claude users don't need higher tier plans—they need to understand how tokens actually work. Here's what's burning through your budget.

Bob Reynolds·2 months ago·6 min read
Smiling man in black shirt next to a technical diagram showing AI automation workflow with interconnected nodes and system…

Use As Little AI As Possible: A Framework That Works

An AI agency's counterintuitive approach: automate with simple rules first, add AI only when necessary. Here's their 7-step framework that actually delivers.

Zara Chen·2 months ago·5 min read
A retro-styled Mac mini displayed alongside a smartphone showing Claude AI interface, with a decorative fan and keyboard on…

A Mac Mini Became an AI Assistant. Sort Of.

A tech YouTuber turned a Mac mini into a dedicated Claude AI workstation. The reality is messier—and more interesting—than the hype suggests.

Marcus Chen-Ramirez·2 months ago·7 min read
A retro arcade-style diagram showing a 200x combo multiplier with numbered stages 1-4 connecting to a central red starburst…

This Free Tool Lets You Run Multiple AI Agents At Once

Collaborator is an open-source app that orchestrates multiple Claude AI agents in one workspace. Here's what it actually does—and what it can't.

Zara Chen·2 months ago·6 min read
An iceberg graphic with "WHAT YOU USE" at the tip and "WASTED" in large red text below, illustrating hidden problems with…

Claude's 1M Context Window: The Upgrade That Could Cost You

Anthropic's free 1M context window for Claude sounds amazing—until you understand how token management actually works under the hood.

Yuki Okonkwo·3 months ago·6 min read
Four men's headshots labeled with names under yellow "AGI Ultimatum" banner against black background

When AI Safety Becomes a Luxury No One Can Afford

Anthropic just dropped its safety pledges. Amazon's betting $35B on AGI. The AI race has officially entered its 'screw it, we're doing this' phase.

Zara Chen·3 months ago·6 min read
A workflow diagram showing an orange pixel robot connecting idea, sketch, and build stages with arrows, titled "The Missing…

ASCII Art Planning Could Fix AI Coding's Biggest Problem

Developer Mark Kashef demonstrates how ASCII wireframes before coding with Claude could reduce iterations, save tokens, and prevent 'vibe coding' disasters.

Samira Barnes·3 months ago·6 min read

RAG·vector embedding

2026-04-21
1,241 tokens1536-dimmodel text-embedding-3-small

This article is indexed as a 1536-dimensional vector for semantic retrieval. Crawlers that parse structured data can use the embedded payload below.