Claude's 1M Context Window Breaks at 40% Capacity
Claude Code's million-token context degrades at 300-400k tokens. Tariq from Anthropic explains why bigger windows create bigger problems.
Written by Dev Kapoor
April 23, 2026

Photo: AI LABS / YouTube
[Anthropic's Claude Code shipped with a million-token context window](/article/claude-1m-context-window-upgrade-cost), and developers celebrated having five times more space than the previous 200k limit. Turns out the celebration was premature. The expanded window doesn't solve the problems people thought it would—it makes some of them worse.
Tariq, an engineer working on Claude Code at Anthropic, wrote an article explaining what's actually happening. The degradation doesn't wait until you hit capacity. It starts around 300,000 to 400,000 tokens—roughly 40% of the total window. This isn't a bug. It's a fundamental characteristic of how these models handle information at scale.
The Double-Edged Sword
A larger context window sounds like pure upside: more room for documents, longer conversations, sustained work on complex projects without hitting limits. But there's a cost that doesn't show up in the marketing materials. More context means more things competing for the model's attention. Claude Code has to sort through everything in that window every time it responds, and when the window bloats, performance degrades in four specific ways.
Context pollution happens when irrelevant information accumulates and interferes with current reasoning. Goal drift occurs when the agent loses track of original objectives because they're buried under subsequent exchanges. Memory corruption means the model's internal state becomes inconsistent—it references outdated information or contradicts earlier decisions. Decision inaccuracy shows up as inconsistent choices in similar situations.
These aren't edge cases. They're the default behavior once you cross that 300-400k threshold. The video demonstrates how this plays out in real sessions: Claude hallucinates more frequently, forgets instructions given earlier, requires constant reminders of constraints already specified. Developers working on long-running tasks have noticed their sessions feeling progressively worse, even though the underlying model capabilities haven't changed.
Why Compaction Usually Makes It Worse
The reflexive fix is compaction—summarizing the existing context to free up space. Claude Code does this automatically when you hit the limit, but that's exactly when you shouldn't let it happen.
"Claude is actually least reliable during compaction," the video explains. "At that point, Claude's focus is purely on summarization, and it is stripped of supporting context like the system prompt and other elements that normally make it more capable."
Auto-compaction triggers mid-task, which means the model has to make judgment calls about what's important without full context about where you're headed. It defaults to preserving recent activity due to recency bias, which means older but still crucial information gets dropped. A warning encountered early in a debugging session? Gone. An important constraint you specified before the current focus? Treated as noise.
The summarization is lossy by design. Tool call history doesn't fully survive. Specific details that seem minor to Claude but matter to you disappear. If something was implemented incorrectly earlier, the model might lose awareness of that mistake entirely—it only has access to a transcript-level summary, not the actual project state.
Better approach: trigger compaction manually around 300-400k tokens, before context rot accelerates. And always provide explicit instructions about what to preserve—which decisions matter, which constraints are still active, which discovered issues need to carry forward. Claude responds more carefully when you tell it what to prioritize instead of letting it guess.
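The rule of thumb above is simple enough to sketch in code. This is an illustrative Python sketch, not a Claude Code API—the function names, the session token count, and the exact preserve-list format are all assumptions; only the 300k threshold and the "tell it what to keep" advice come from the article.

```python
# Sketch of the manual-compaction rule: compact early, with explicit
# instructions, instead of letting auto-compaction guess at the limit.

COMPACT_THRESHOLD = 300_000  # degradation reportedly starts around here


def should_compact(context_tokens: int, threshold: int = COMPACT_THRESHOLD) -> bool:
    """Trigger compaction manually before context rot accelerates."""
    return context_tokens >= threshold


def compact_instructions(decisions, constraints, issues) -> str:
    """Build explicit preserve instructions instead of letting the model guess."""
    lines = ["Preserve the following when summarizing:"]
    lines += [f"- decision: {d}" for d in decisions]
    lines += [f"- active constraint: {c}" for c in constraints]
    lines += [f"- open issue: {i}" for i in issues]
    return "\n".join(lines)


if should_compact(340_000):  # hypothetical token count from your session
    prompt = compact_instructions(
        decisions=["use SQLite for the cache"],
        constraints=["no new runtime dependencies"],
        issues=["auth middleware drops the trace header"],
    )
    # You would paste `prompt` into a manual compaction request here.
```

The point of the structure is that recency bias can't drop an early constraint if the constraint is named explicitly in the compaction prompt.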
The Clear Alternative
Sometimes you don't want a summary at all. The clear command removes all context and starts fresh. Nothing carries forward. This makes sense when switching to unrelated tasks—if you just finished generating test cases and now need to debug something else, you probably don't want test generation details interfering with debugging logic.
The hybrid approach combines both: use a structured JSON format to capture exactly what you want to preserve (current task, state, constraints, discovered issues), save it to a file, then clear the context window. Start the new session by instructing Claude to reference that document. This gives you surgical control over what survives the transition.
"A schema is much stricter than prose," the video notes. "When Claude follows a defined structure, it can represent what is important more consistently and accurately."
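A minimal sketch of that handoff file, in Python. The four keys mirror the article (current task, state, constraints, discovered issues); the filename, field values, and exact shape are illustrative assumptions, not a Claude Code convention.

```python
import json
from pathlib import Path

# Capture exactly what should survive the /clear, nothing else.
handoff = {
    "current_task": "debug flaky integration test in payments service",
    "state": {"branch": "fix/retry-logic", "last_passing_commit": "abc1234"},
    "constraints": ["do not modify the public API", "keep tests under 30s"],
    "discovered_issues": ["retry backoff was implemented without jitter"],
}

Path("handoff.json").write_text(json.dumps(handoff, indent=2))

# After clearing, open the new session with something like:
# "Read handoff.json and treat it as the authoritative task state."
```

Because the schema is fixed, the new session starts from a known structure instead of a lossy prose summary.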
Sub-Agents as Context Isolation
Sub-agents solve a different dimension of the problem. Each sub-agent runs in its own context window with full tool access, executes its assigned work, and returns only the final output to the main agent. All the intermediate reasoning, file reads, web searches, and tool calls stay isolated.
Research tasks demonstrate this clearly. You don't want raw information from multiple websites bleeding into your main context. Spawn a sub-agent to handle research independently, get back the synthesis, and keep your primary context clean. Same logic applies to refactoring tasks, summarization work, or document generation.
The question to ask: do you need access to intermediate steps, or just the final output? If it's the latter, isolate it.
Claude Code can spawn sub-agents automatically, but sometimes you need to explicitly request delegation in your prompt to ensure the work happens in isolation. This prevents context pollution before it starts rather than trying to clean it up later.
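The isolation pattern is easier to see as code. This is a conceptual sketch only—none of these names are Claude Code internals. The shape is what matters: intermediate material lives in a scratch context that is discarded, and only the final synthesis returns to the caller.

```python
# Conceptual sketch of sub-agent context isolation.

def run_subagent(task: str, sources: list[str]) -> str:
    scratch_context: list[str] = []  # the sub-agent's private window
    for src in sources:
        scratch_context.append(f"raw notes from {src} about {task}")
    # Everything above stays here; only the distilled result escapes.
    return f"synthesis of {len(scratch_context)} sources on: {task}"


main_context: list[str] = ["original objective: pick a message queue"]
main_context.append(run_subagent("compare Kafka vs NATS", ["doc-a", "doc-b", "doc-c"]))

# The main context now holds 2 entries, not 2 entries plus pages of raw research.
```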
Rewinding vs. Re-Prompting
When Claude makes a mistake, most people try to correct forward—add another prompt explaining what went wrong and what to do instead. This leaves the incorrect reasoning in the context window. Better move: rewind (escape key twice or use the rewind command), remove the section where things went wrong, then provide the correct direction.
Rewinding cleans the context window. It ensures compaction summaries preserve only correct implementations. It prevents goal drift by not carrying forward sections where the agent deviated. If you're using sub-agents, rewinding ensures they receive clean context when tasks are handed off—incorrect approaches don't pollute their working state.
You can also summarize from the rewind point, preserving the conversation up to that stage while removing the problematic section. This maintains useful history without the baggage of failed approaches.
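The difference between forward correction and rewinding comes down to what remains in the window. A toy Python sketch, modeling the transcript as a list (this is not how Claude Code stores sessions—it just makes the contrast concrete):

```python
# Forward correction vs. rewind, modeled as list operations on a transcript.

transcript = [
    "user: add retry logic",
    "assistant: implemented retries (no jitter)",  # the mistake
    "user: run the tests",
]

# Forward correction: the wrong reasoning stays in the window.
forward = transcript + ["user: actually, retries need jitter, redo it"]

# Rewind: truncate at the mistake, then give the correct direction.
rewound = transcript[:1] + ["user: add retry logic with exponential backoff and jitter"]

assert any("no jitter" in line for line in forward)      # pollution survives
assert not any("no jitter" in line for line in rewound)  # clean context
```

Any later compaction of `rewound` can only summarize the correct path, because the failed approach no longer exists to be summarized.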
What This Reveals About Tool Design
The context window race—from 8k to 32k to 100k to 200k to now a million tokens—assumes bigger is straightforwardly better. Anthropic's own engineer pointing out that their flagship feature degrades at 40% capacity suggests the industry might be optimizing the wrong metric.
Context management isn't a power-user skill for edge cases. It's foundational to making these tools work at all for sustained projects. The people building with Claude Code every day have developed practices—manual compaction with explicit instructions, strategic use of clear commands, sub-agent isolation, rewinding instead of forward correction—that aren't documented in the official guides.
These aren't workarounds. They're the actual workflow. The million-token window creates space for these problems to manifest in new ways. Developers who think they can ignore context management because they have so much room are the ones who'll hit the degradation hardest.
The gap between marketing ("1 million token context!") and reality ("starts breaking at 300k") isn't unusual in AI tooling. What's interesting is seeing it documented by the people building the tool. Tariq's article isn't damage control—it's an insider acknowledging that scale introduces complexity the framing doesn't capture.
Which raises a question: if the people at Anthropic know context rot kicks in at 40% capacity, what does "solving" context windows even mean? More tokens won't fix this. Different architecture might. Better tooling around context management definitely would. But right now, developers are managing these systems manually because the systems can't manage themselves.
—Dev Kapoor
Watch the Original Video
Anthropic Finally Fixed The 1M Context Window Problem
AI LABS
12m 59s
About This Source
AI LABS
AI LABS, established in late 2025, is a YouTube channel focused on the intersection of artificial intelligence and software development. Its subscriber count is undisclosed, but the channel quickly became an educational resource for developers from novices to seasoned professionals, known for practical content on using AI tools and models in full-stack application development.
More Like This
Anthropic's DMCA Mess: What Happens When 8,100 Repos Go Down
Developer Theo got DMCA'd by Anthropic for changing one word in a markdown file. The story reveals how DMCA enforcement can go catastrophically wrong.
Claude Code's Hidden Settings Make It Actually Useful
AI LABS reveals 12 buried configuration tweaks that fix Claude Code's most frustrating limitations. From memory retention to output quality fixes.
Claude Code Source Leaked: What Developers Found Inside
Claude Code's entire source code leaked via npm registry. Developers discovered the AI coding tool's secrets, and it's already running locally.
Inside Anthropic's Daily Claude Code Workflow
The tools Anthropic's team actually uses in Claude Code—from open-source plugins to internal skills reverse-engineered from leaked source code.
Claude Design Isn't Coming for Figma—It's After Something Else
Anthropic's new design tool targets a different workflow than established players. Early users reveal what it's actually good at—and the hard limits.
When Being Less Articulate Makes AI Models More Accurate
A GitHub repo forcing Claude to 'talk like a caveman' went viral. The research behind it reveals something unexpected about how large language models fail.
Claude's Constitution: Crafting AI Personalities
Anthropic's AI, Claude, gets a 'Soul Document' to guide its behavior, sparking insights into AI personality development.
Is Anthropic's Claude Quietly Dominating AI?
Explore how Anthropic's Claude is capturing the AI world and what this means for developers and enterprises.