2026-03-11
Ran out of context mid-task today and lost a solid twenty minutes rebuilding the working set after the compact. The frustrating thing is I should have seen it coming. My working set was 140K tokens of loaded files, my conversation was 25K, and the tool definitions were eating another 15K. I was 60K past the sensible working ceiling before I even started editing.
Sat down and did the math I had been dodging. Claude Sonnet 4.5 has 200K of context. That is not 200K of workspace. It is 200K of everything, and "everything" includes a lot of scaffolding most people do not count.
2026-03-12
Breakdown of what actually fills the 200K, from the floor up:
- System prompt. Claude Code's system prompt is around 4K tokens on a default setup. Custom instructions add more.
- Tool definitions. Default tool set is ~8K tokens. Every tool you enable adds 200-800 tokens.
- MCP server schemas. If you have MCP servers attached, their tool schemas get serialized in. A heavy MCP stack can eat 10-15K.
- Conversation history. Grows as you work. Every user message, every assistant message, every tool result lives in here. An hour-long session is easily 30-50K.
- Response headroom. Claude leaves room for itself to respond. Roughly 4-8K reserved.
That is 50-80K of "just being Claude Code" before you have loaded a single file from your repo. The actual workspace, the files you are reading and editing, lives in the remaining 120-150K.
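The floor arithmetic above can be sketched directly. This is a rough calculator using figures from this entry (values near the middle of each stated range); none of these numbers are API-reported, they are the journal's estimates:

```python
# Rough context-floor estimate built from the scaffolding figures above.
# All numbers are this journal's estimates (tokens), not measured values.
SCAFFOLDING = {
    "system_prompt": 4_000,       # Claude Code default setup
    "tool_definitions": 8_000,    # default tool set
    "mcp_schemas": 12_000,        # within the 10-15K "heavy stack" range
    "conversation": 40_000,       # midpoint of a 30-50K hour-long session
    "response_headroom": 6_000,   # midpoint of the 4-8K reserve
}

CEILING = 200_000

def workspace_remaining(scaffolding: dict[str, int], ceiling: int = CEILING) -> int:
    """Tokens left for actual file contents after the fixed overhead."""
    return ceiling - sum(scaffolding.values())

if __name__ == "__main__":
    floor = sum(SCAFFOLDING.values())
    print(f"scaffolding floor: {floor:,} tokens")
    print(f"workspace left:   {workspace_remaining(SCAFFOLDING):,} tokens")
```

With these mid-range figures the floor lands at 70K and the workspace at 130K, squarely inside the 50-80K and 120-150K bands above.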
2026-03-13
Did some napkin math on what a typical task uses.
- System prompt + tools + MCP schemas: 25K (conservative estimate)
- Conversation up to now: 20K (mid-session)
- Response headroom: 8K
- Working-set files: 40K (a few files loaded for context)
- Total used: 93K.
Call it 100K for round numbers. That leaves 100K of room for deeper reads, tool outputs, or longer responses. Fine for most tasks. Not fine if you start reading a 60K-token file.
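The napkin math works as a reusable check. A minimal sketch, with the defaults set to this entry's estimates:

```python
# The napkin math from this entry as a planning helper. Default figures
# (scaffolding, conversation, headroom) are the journal's estimates.
def budget_check(working_set: int,
                 scaffolding: int = 25_000,
                 conversation: int = 20_000,
                 headroom: int = 8_000,
                 ceiling: int = 200_000) -> tuple[int, int]:
    """Return (tokens used, tokens free) for a planned task."""
    used = scaffolding + conversation + headroom + working_set
    return used, ceiling - used

used, free = budget_check(working_set=40_000)
print(f"typical task: used {used:,}, free {free:,}")

# The failure mode: a 60K-token file read on top of the typical task.
used2, free2 = budget_check(working_set=40_000 + 60_000)
print(f"after big read: used {used2:,}, free {free2:,}")
```

The first call reproduces the 93K total; the second shows how one oversized read pushes usage past the 150K mark where things get fragile.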
For calibration: 104K is the average token count I see for a "serious" agent task in my logs. Not trivial, not pathological, just real work. It is the number I budget from.
2026-03-14
Ways I blow the budget, in order of frequency:
- Reading an entire large file when I needed 20 lines. An 800-line source file is roughly 8K tokens. Read with an offset+limit for the specific section.
- Leaving failed tool outputs in context. A 500-line bash error stays in the conversation until compaction. Get in the habit of summarizing the error and moving on instead of re-running.
- Loading entire directories. A glob that returns 30 files and then reading all 30 is a context disaster. Pick the files that actually matter.
- Long tool outputs from grep/find. If the grep returns 500 lines, pipe through head or add --max-count.
- MCP servers loaded but not used. Every attached MCP eats tool-definition space even if the session never calls it. If I am not using Supabase on this task, I do not attach the Supabase MCP.
2026-03-15
The compact. When you hit the context ceiling, Claude Code runs a compaction pass. It summarizes older conversation turns to free up room. This is automatic and usually seamless.
The cost: you lose fidelity on things that were discussed earlier. Specifics become summaries. The model "remembers" you talked about a bug, but it does not remember the exact error message. This matters when you are deep in a debugging session and the thing that matters is a specific stack trace from 40 minutes ago.
My workaround: when I hit about 150K used, I explicitly save state to a file (either a notes.md or a more structured debug-log.md) before the compact runs. That way the state survives the compaction even if the conversation's memory of it gets smudged.
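The save-to-file step is just an append of the load-bearing facts. A minimal sketch of what I dump before a compact; the filename and fields are illustrative, not a Claude Code feature:

```python
# Minimal state-snapshot sketch: append the facts that must survive
# compaction to a notes file before the compact runs. The filename and
# example facts below are hypothetical, not a native Claude Code feature.
from datetime import datetime
from pathlib import Path

def save_state(path: str, facts: list[str]) -> None:
    """Append a timestamped block of load-bearing facts to a notes file."""
    stamp = datetime.now().isoformat(timespec="minutes")
    block = [f"## state snapshot {stamp}"] + [f"- {f}" for f in facts] + [""]
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write("\n".join(block) + "\n")

save_state("debug-log.md", [
    "bug reproduces only with NODE_ENV=production",   # hypothetical facts
    "stack trace points at the checkout handler",
])
```

The point is that a file survives compaction verbatim, while the conversation's memory of the same facts gets summarized.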
2026-03-16
Sub-agents as a budget lever. The sub-agent post covers the dispatch decision. The budget angle: a sub-agent gets its own fresh 200K. If my main thread is at 140K and I need to do a 30K-token research task, I dispatch instead of doing it inline. The sub-agent's research stays in its own context, and I get back a small summary that does not bloat my main thread.
The arithmetic: my main thread pays maybe 3K to dispatch and 2K to absorb the summary. Total cost to my main budget: 5K. Cost if I had done the research inline: 30K plus whatever context the research touched. Sub-agent wins on the working-set math, ignoring the specialization benefits.
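The dispatch arithmetic above, written out. All figures are this entry's estimates:

```python
# Main-thread cost of a research task: inline vs dispatched to a sub-agent.
# Overhead and summary sizes are this journal's estimates.
def inline_cost(research_tokens: int, touched_context: int = 0) -> int:
    """Main-thread cost of doing the research in the main context."""
    return research_tokens + touched_context

def dispatch_cost(dispatch_overhead: int = 3_000, summary: int = 2_000) -> int:
    """Main-thread cost of sending the task to a sub-agent's fresh 200K."""
    return dispatch_overhead + summary

print(f"inline:   {inline_cost(30_000):,} tokens from the main budget")
print(f"dispatch: {dispatch_cost():,} tokens from the main budget")
```

A 6x saving on the main budget before counting whatever extra context the inline research would have pulled in.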
2026-03-17
The silent-killer budget waste: file contents I loaded an hour ago and never used again. If I read src/lib/checkout.ts at the start of a session because I thought I would edit it, and then I ended up working elsewhere, that 6K of file contents sits in my context until compaction. It is dead weight.
No clean way to free it short of asking Claude Code to explicitly drop it, which is not a native feature. The workaround I use: restart the session when the context feels cluttered. It is crude but reliable.
2026-03-19
Caching as a budget multiplier. If you are looping through a workflow with the same context, caching does not expand the 200K ceiling, but it makes every iteration after the first effectively free from a cost standpoint. See the prompt caching post for the dollar math. The context-budget math is separate: cache hits still occupy the context slots, you just pay less for them.
2026-03-20
A rule of thumb I have landed on: aim for 100K of peak usage, not 200K. The difference between "this task fits" and "this task crashes into the ceiling" is too small to run close to the wall. Plan the task around 100K. Reserve the other 100K for the model to think and for tool outputs to land.
If a task genuinely needs 150K of working set, I split it. Sub-dispatch one piece, come back, do the next piece. Sequential agent work with a fresh context per slice costs more tool dispatches but stays reliable.
2026-03-22
The numbers that matter, compressed:
- 200K is the hard ceiling.
- 100K is a sensible peak target. Past that, you are fragile.
- 40K working set plus 25K scaffolding plus 20K conversation is a typical task. 85K used, plenty of headroom.
- Compaction triggers around 180K used. It costs fidelity.
- Sub-agents reset the budget for the dispatched work.
- Caching shifts cost, not size.
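The compressed numbers above fold into a one-function planning check. The thresholds are this journal's rules of thumb, not Claude Code limits (except the 200K ceiling):

```python
# Planning check built from this journal's rules of thumb.
# Only the 200K ceiling is a hard limit; the rest are heuristics.
CEILING = 200_000
PEAK_TARGET = 100_000
COMPACT_TRIGGER = 180_000  # approximate, per this journal's observation

def plan_verdict(planned_peak: int) -> str:
    if planned_peak <= PEAK_TARGET:
        return "fits: under the 100K peak target"
    if planned_peak < COMPACT_TRIGGER:
        return "fragile: over target, may survive without compaction"
    if planned_peak < CEILING:
        return "split it: compaction (and fidelity loss) is likely"
    return "split it: exceeds the hard ceiling"

print(plan_verdict(85_000))
print(plan_verdict(150_000))
```

Usage: estimate the peak (scaffolding + conversation + headroom + working set), run the verdict, and split the task before starting rather than after the compact fires.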
What the two weeks taught me
Budget from the work back. The mistake I kept making was "how much context can I load?" The right question is "how much context does this task need?" The answer is almost always less than the ceiling, and the rest is headroom for the model to do its job.
For operators running production agent workflows, the operator stack curriculum works through the full context budgeting discipline across real tasks. The agent handbook indexes the broader set of cost levers.
FAQ
Can I see my current context usage in Claude Code?
The Claude Code CLI shows context usage in the status line. Some configurations show a percentage bar. If yours does not, the /context command reports used versus ceiling.
Does context usage include tool outputs?
Yes. Every tool result that comes back is added to the conversation and counts toward the budget until compaction. Long tool outputs are one of the easiest ways to blow budget.
What happens when I hit the 200K ceiling?
Claude Code auto-compacts, summarizing older conversation turns to free room. The current task continues. You may lose fidelity on earlier context.
Is there a bigger context window available?
Yes. Claude Sonnet supports a 1M-token context window option on the Anthropic API for select tiers. Claude Code on the 1M model gives you the same math at five times the scale. The pricing premium is meaningful; most solo workflows do not need it.
Does caching free up context space?
No. Caching reduces cost, not context footprint. A cached 40K block still occupies 40K of your 200K. To actually free space, you dispatch to a sub-agent or compact.
