
The Hidden Math Behind Claude's Session Limits

AI automation expert Nate Herk breaks down why Claude users hit session limits—and the counterintuitive strategies that actually work to avoid them.

Written by AI. Zara Chen

April 21, 2026

Man smiling at camera next to whiteboard listing "Top 1% User" techniques including /optimizer, context rot, compaction,…

Photo: Nate Herk | AI Automation / YouTube

Here's the thing nobody tells you about AI session limits: you're probably burning tokens on stuff you can't even see.

Nate Herk, an AI automation specialist, recently dropped a detailed breakdown of how Claude's token system actually works—and why so many users are hitting their limits without understanding what's eating their budget. His findings reveal a system that's simultaneously more generous and more unforgiving than most people realize.

The Compound Interest Problem

The core issue is deceptively simple: every time you send a message to Claude, it rereads your entire conversation from the beginning. Message one might cost 500 tokens. Message 30 costs 15,000—not because your prompt got longer, but because Claude is re-processing everything that came before.

"This means as you're having a conversation with Claude, your cost is compounding, not just adding, it's exponentially growing," Herk explains in the video. One developer he references tracked a 100+ message conversation and found that 98.5% of all tokens were spent rereading old chat history.

That's the invisible tax. You think you're paying for the work Claude is doing now, but you're mostly paying for it to remember what it already did.
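Herk's numbers can be sanity-checked with a toy model. A minimal sketch, assuming each turn contributes a flat ~500 tokens of fresh text (an illustrative figure, not Anthropic's billing):

```python
# Toy model of compounding token spend. The 500-tokens-per-turn figure
# is illustrative, not a real measurement of any Claude conversation.

def message_cost(n: int, tokens_per_turn: int = 500) -> int:
    """Cost of message n alone: it re-sends all n-1 prior turns plus its own."""
    return n * tokens_per_turn

def history_share(turns: int, tokens_per_turn: int = 500) -> float:
    """Fraction of a conversation's total spend that went to re-reading history."""
    total = sum(message_cost(i, tokens_per_turn) for i in range(1, turns + 1))
    fresh = turns * tokens_per_turn  # each turn's new text, counted once
    return (total - fresh) / total

print(message_cost(1))              # 500 tokens for message one
print(message_cost(30))             # 15000 — same prompt length, 30x the bill
print(f"{history_share(100):.1%}")  # ~98% of spend went to re-reading
```

Even this crude model lands close to the 98.5% figure from the developer Herk cites: the re-read share grows with conversation length no matter what the per-turn number is.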

Context Rot: AI Dementia

Even more interesting is what Herk calls "context rot"—the degradation of Claude's performance as sessions grow longer. Anthropic's own statistics show retrieval accuracy dropping from 92% at 256,000 tokens to 78% at one million tokens.

The implications are circular and punishing: as the model gets worse at finding information in its bloated context window, you have to spend more tokens getting it back on track. You might burn 500,000 tokens for output that could have taken 200,000 if the model were performing optimally.

Claude does have an auto-compaction feature that kicks in at 95% capacity, but Herk—and apparently most of the developer community—considers this way too late. By that point, you're asking a cognitively impaired AI to decide what's important to keep. "Imagine you're packing for a trip," he offers. "If you pack the night before, you'd grab all the right stuff. But if you're frantically stuffing your bag because you woke up 5 minutes before you have to go, you're probably going to forget your charger."

His solution? Manual compaction at around 60% capacity, or better yet, a full reset-and-handoff strategy.
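The policy itself is simple enough to sketch. A minimal version, assuming a 200,000-token working window for illustration (the thresholds follow Herk; the window size is an assumption):

```python
# Sketch of the "compact early" policy from the video. The 200k-token
# window size is an illustrative assumption; the thresholds follow Herk.

CONTEXT_WINDOW = 200_000
MANUAL_COMPACT_AT = 0.60  # Herk's suggested manual trigger
AUTO_COMPACT_AT = 0.95    # the built-in auto-compaction trigger

def should_compact(tokens_used: int, threshold: float = MANUAL_COMPACT_AT) -> bool:
    """True once usage crosses the chosen fraction of the window."""
    return tokens_used / CONTEXT_WINDOW >= threshold

print(should_compact(110_000))                   # False — 55%, keep going
print(should_compact(130_000))                   # True — 65%, summarize and hand off
print(should_compact(130_000, AUTO_COMPACT_AT))  # False — auto-compact would still wait
```

The gap between the two thresholds is the whole argument: at 65% usage, the manual policy has already packed the bag while the built-in one is still asleep.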

The Rewind Feature Nobody Uses

Anthropic's number one recommendation, according to Herk, is the /re command—a rewind feature that lets you jump back to any previous message and drop everything after it.

This matters more than it sounds. When Claude makes a mistake, most users (including Herk, by his own admission) just say "that didn't work, try this instead." The broken code, the failed approach, the wrong direction—it all stays in context, polluting future responses and compounding costs.

The /re command includes a "summarize from here" option that creates what Herk calls a handoff message: "a note from Claude's future self to its past self saying, 'Here's what we figured out. Do it this way.'" Clean context, preserved learning, lower costs.

Sub-Agents and the Research Intern Model

Herk's most practical recommendation involves delegating work to sub-agents—separate Claude instances with their own fresh context windows that handle specific tasks and return only the results.

"If you wanted a research intern to dig through 50 articles, you wouldn't sit there and watch them do it and you wouldn't read the articles as well," he points out. "You would just say, 'Hey, just let me know when you have a summary.'"

Each sub-agent can use a cheaper model (Haiku instead of Opus, for instance) for tasks that don't require top-tier performance. The cost savings compound when you're not loading every intermediate step into your main session.
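A back-of-envelope model shows why the pattern pays off. The 50-article count comes from Herk's example; the per-article and per-summary token figures are assumptions for illustration:

```python
# Toy cost model for the "research intern" pattern. Article sizes and
# summary lengths are illustrative assumptions, not measured values.

ARTICLES = 50
TOKENS_PER_ARTICLE = 4_000  # assumed average source length
SUMMARY_TOKENS = 200        # what each sub-agent reports back

# No delegation: every article is read inside the main session's context,
# and gets re-read on every subsequent message.
main_load = ARTICLES * TOKENS_PER_ARTICLE   # 200_000 tokens

# With sub-agents: the main session only ever sees the summaries.
delegated_load = ARTICLES * SUMMARY_TOKENS  # 10_000 tokens

saving = 1 - delegated_load / main_load
print(f"main-context load cut by {saving:.0%}")  # 95%
```

And because of the compounding effect described earlier, that 95% reduction in the main context is worth far more than 95% over the life of the session—every later message re-reads the smaller context instead of the bigger one.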

The Markdown Hack

One of the most concrete tips: convert everything to markdown. PDFs, HTML, DOCX files—they all carry formatting overhead that AI models don't need. HTML to markdown conversion can reduce tokens by 90%. PDF to markdown drops it 65-70%.

"A 40-page PDF could actually take up the same amount of space as a 130-page markdown file," Herk notes. The tokenizers process plain text efficiently; everything else is just expensive noise.

The Strategic Tension

What's interesting about Herk's advice is the tension between competing priorities. You want Claude to learn from mistakes, but keeping failed attempts in context is expensive. You want comprehensive context, but comprehensive context makes Claude dumber. You have a million-token window, but filling it is almost never the right move.

His approach resolves this by externalizing memory—maintaining decision logs, task lists, and tracking sheets outside the conversation. That way, when you reset a session with /clear and paste in a handoff summary, "it doesn't feel like you reset. It's kind of like if you want to close out of all your Chrome tabs, but you still have all your bookmarks."

The million-token window, in his view, is "insurance, not a goal to fill." Even Anthropic's data suggests you probably shouldn't try.

What Herk's really describing is a shift in how we think about AI conversations—from freeform dialogue to managed sessions with explicit handoffs, delegated work, and strategic forgetting. It's more structured, more intentional, and frankly, more work upfront. But the alternative is hitting your session limit on message 47 and wondering where all your tokens went.

Zara Chen covers technology and politics for Buzzrag.

Watch the Original Video

How to Never Hit Your Claude Session Limit Again

Nate Herk | AI Automation

24m 50s
Watch on YouTube

About This Source

Nate Herk | AI Automation

Nate Herk | AI Automation is a YouTube channel with 476,000 subscribers, dedicated to helping businesses harness the power of AI automation. The channel, active for over seven months, focuses on AI integration to boost efficiency and competitiveness, offering guidance for both beginners and seasoned professionals in optimizing AI workflows.

