
The Hidden Math Behind Claude's Session Limits

AI automation expert Nate Herk breaks down why Claude users hit session limits—and the counterintuitive strategies that actually work to avoid them.

Written by AI. Zara Chen

April 21, 2026

Man smiling at camera next to whiteboard listing "Top 1% User" techniques including /optimizer, context rot, compaction,…

Photo: Nate Herk | AI Automation / YouTube

Here's the thing nobody tells you about AI session limits: you're probably burning tokens on stuff you can't even see.

Nate Herk, an AI automation specialist, recently dropped a detailed breakdown of how Claude's token system actually works—and why so many users are hitting their limits without understanding what's eating their budget. His findings reveal a system that's simultaneously more generous and more unforgiving than most people realize.

The Compound Interest Problem

The core issue is deceptively simple: every time you send a message to Claude, it rereads your entire conversation from the beginning. Message one might cost 500 tokens. Message 30 costs 15,000—not because your prompt got longer, but because Claude is re-processing everything that came before.

"This means as you're having a conversation with Claude, your cost is compounding, not just adding, it's exponentially growing," Herk explains in the video. One developer he references tracked a 100+ message conversation and found that 98.5% of all tokens were spent rereading old chat history.

That's the invisible tax. You think you're paying for the work Claude is doing now, but you're mostly paying for it to remember what it already did.
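Herk's numbers can be sanity-checked with a toy model. A minimal sketch, assuming each turn contributes a flat ~500 tokens of fresh text (an illustrative figure, not Anthropic's billing):

```python
# Toy model of compounding token spend. The 500-tokens-per-turn figure
# is illustrative, not a real measurement of any Claude conversation.

def message_cost(n: int, tokens_per_turn: int = 500) -> int:
    """Cost of message n alone: it re-sends all n-1 prior turns plus its own."""
    return n * tokens_per_turn

def history_share(turns: int, tokens_per_turn: int = 500) -> float:
    """Fraction of a conversation's total spend that went to re-reading history."""
    total = sum(message_cost(i, tokens_per_turn) for i in range(1, turns + 1))
    fresh = turns * tokens_per_turn  # each turn's new text, counted once
    return (total - fresh) / total

print(message_cost(1))              # 500 tokens for message one
print(message_cost(30))             # 15000 — same prompt length, 30x the bill
print(f"{history_share(100):.1%}")  # ~98% of spend went to re-reading
```

Even this crude model lands close to the 98.5% figure from the developer Herk cites: the re-read share grows with conversation length no matter what the per-turn number is.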

Context Rot: AI Dementia

Even more interesting is what Herk calls "context rot"—the degradation of Claude's performance as sessions grow longer. Anthropic's own statistics show retrieval accuracy dropping from 92% at 256,000 tokens to 78% at one million tokens.

The implications are circular and punishing: as the model gets worse at finding information in its bloated context window, you have to spend more tokens getting it back on track. You might burn 500,000 tokens for output that could have taken 200,000 if the model were performing optimally.

Claude does have an auto-compaction feature that kicks in at 95% capacity, but Herk—and apparently most of the developer community—considers this way too late. By that point, you're asking a cognitively impaired AI to decide what's important to keep. "Imagine you're packing for a trip," he offers. "If you pack the night before, you'd grab all the right stuff. But if you're frantically stuffing your bag because you woke up 5 minutes before you have to go, you're probably going to forget your charger."

His solution? Manual compaction at around 60% capacity, or better yet, a full reset-and-handoff strategy.
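The policy itself is simple enough to sketch. A minimal version, assuming a 200,000-token working window for illustration (the thresholds follow Herk; the window size is an assumption):

```python
# Sketch of the "compact early" policy from the video. The 200k-token
# window size is an illustrative assumption; the thresholds follow Herk.

CONTEXT_WINDOW = 200_000
MANUAL_COMPACT_AT = 0.60  # Herk's suggested manual trigger
AUTO_COMPACT_AT = 0.95    # the built-in auto-compaction trigger

def should_compact(tokens_used: int, threshold: float = MANUAL_COMPACT_AT) -> bool:
    """True once usage crosses the chosen fraction of the window."""
    return tokens_used / CONTEXT_WINDOW >= threshold

print(should_compact(110_000))                   # False — 55%, keep going
print(should_compact(130_000))                   # True — 65%, summarize and hand off
print(should_compact(130_000, AUTO_COMPACT_AT))  # False — auto-compact would still wait
```

The gap between the two thresholds is the whole argument: at 65% usage, the manual policy has already packed the bag while the built-in one is still asleep.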

The Rewind Feature Nobody Uses

Anthropic's number one recommendation, according to Herk, is the /re command—a rewind feature that lets you jump back to any previous message and drop everything after it.

This matters more than it sounds. When Claude makes a mistake, most users (including Herk, by his own admission) just say "that didn't work, try this instead." The broken code, the failed approach, the wrong direction—it all stays in context, polluting future responses and compounding costs.

The /re command includes a "summarize from here" option that creates what Herk calls a handoff message: "a note from Claude's future self to its past self saying, 'Here's what we figured out. Do it this way.'" Clean context, preserved learning, lower costs.

Sub-Agents and the Research Intern Model

Herk's most practical recommendation involves delegating work to sub-agents—separate Claude instances with their own fresh context windows that handle specific tasks and return only the results.

"If you wanted a research intern to dig through 50 articles, you wouldn't sit there and watch them do it and you wouldn't read the articles as well," he points out. "You would just say, 'Hey, just let me know when you have a summary.'"

Each sub-agent can use a cheaper model (Haiku instead of Opus, for instance) for tasks that don't require top-tier performance. The cost savings compound when you're not loading every intermediate step into your main session.
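A back-of-envelope model shows why the pattern pays off. The 50-article count comes from Herk's example; the per-article and per-summary token figures are assumptions for illustration:

```python
# Toy cost model for the "research intern" pattern. Article sizes and
# summary lengths are illustrative assumptions, not measured values.

ARTICLES = 50
TOKENS_PER_ARTICLE = 4_000  # assumed average source length
SUMMARY_TOKENS = 200        # what each sub-agent reports back

# No delegation: every article is read inside the main session's context,
# and gets re-read on every subsequent message.
main_load = ARTICLES * TOKENS_PER_ARTICLE   # 200_000 tokens

# With sub-agents: the main session only ever sees the summaries.
delegated_load = ARTICLES * SUMMARY_TOKENS  # 10_000 tokens

saving = 1 - delegated_load / main_load
print(f"main-context load cut by {saving:.0%}")  # 95%
```

And because of the compounding effect described earlier, that 95% reduction in the main context is worth far more than 95% over the life of the session—every later message re-reads the smaller context instead of the bigger one.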

The Markdown Hack

One of the most concrete tips: convert everything to markdown. PDFs, HTML, DOCX files—they all carry formatting overhead that AI models don't need. HTML to markdown conversion can reduce tokens by 90%. PDF to markdown drops it 65-70%.

"A 40-page PDF could actually take up the same amount of space as a 130-page markdown file," Herk notes. The tokenizers process plain text efficiently; everything else is just expensive noise.

The Strategic Tension

What's interesting about Herk's advice is the tension between competing priorities. You want Claude to learn from mistakes, but keeping failed attempts in context is expensive. You want comprehensive context, but comprehensive context makes Claude dumber. You have a million-token window, but filling it is almost never the right move.

His approach resolves this by externalizing memory—maintaining decision logs, task lists, and tracking sheets outside the conversation. That way, when you reset a session with /clear and paste in a handoff summary, "it doesn't feel like you reset. It's kind of like if you want to close out of all your Chrome tabs, but you still have all your bookmarks."

The million-token window, in his view, is "insurance, not a goal to fill." Even Anthropic's data suggests you probably shouldn't try.

What Herk's really describing is a shift in how we think about AI conversations—from freeform dialogue to managed sessions with explicit handoffs, delegated work, and strategic forgetting. It's more structured, more intentional, and frankly, more work upfront. But the alternative is hitting your session limit on message 47 and wondering where all your tokens went.

Zara Chen covers technology and politics for Buzzrag.

Watch the Original Video

How to Never Hit Your Claude Session Limit Again

Nate Herk | AI Automation

24m 50s
Watch on YouTube

About This Source

Nate Herk | AI Automation

Nate Herk | AI Automation is a YouTube channel with 476,000 subscribers, dedicated to helping businesses harness the power of AI automation. The channel, active for over seven months, focuses on AI integration to boost efficiency and competitiveness, offering guidance for both beginners and seasoned professionals in optimizing AI workflows.

