When to spawn a Claude Code sub-agent versus a tool call

I spent the first month with Claude Code sub-agents spawning one for every task, no matter how small: reading a file, running a grep, checking a config. My runs were slow and my token bill was twice what it should have been. Sub-agents have a cost, and that cost is not visible at the moment you spawn one.

This is the decision log I use now. Four questions, one answer, clear about when sub-agents pay rent and when they are just overhead wearing a fancy coat.

The fork

Every task in Claude Code lands at the same decision point. Do I run this in the main thread with a regular tool call, or do I spawn a sub-agent via the Task tool? The two options feel similar. They are not.

A tool call is a single function invocation. The model calls it, the tool returns, the model continues in the same context. Cheap, fast, no overhead.

A sub-agent is a fresh context window with its own system prompt, its own tool definitions, and its own conversation. The spawn itself costs tokens (the tool definitions get serialized in). The return trip costs tokens (the summary comes back into the main thread). If the task is small, you pay more to dispatch than you save by offloading.

Option A: just make the tool call

What this gives you: zero dispatch overhead, instant result, everything stays in one context.

What it costs you: context pollution. Every file you read, every grep result, every error output stays in the main thread's context window. If the main thread is already loaded up, you are eating budget that the main task needs.

The tool-call route is the right call for:

Reading a specific file whose path you already know
Running a one-shot grep or glob for a known pattern
Running a short bash command whose output is a few lines
Any task where the result is a flat string or number you are going to use immediately

Option B: spawn a sub-agent

What this gives you: a fresh context, isolation from the main thread, the ability to do long multi-step work without polluting your working set. Sub-agents can also run in parallel, which is the single biggest lever in a multi-agent workflow.

What it costs you: dispatch overhead (2-4K tokens of tool definitions at spawn time), the round-trip latency of the Task tool, and the summarization cost when the sub-agent returns. You also lose the interactive back-and-forth: once spawned, the sub-agent runs to completion, and if it goes off-track, you find out at the end.

The sub-agent route is the right call for:

Any research task that involves reading 5+ files to answer a question
Any build task that takes multi-step reasoning and produces a summary
Any work that can run in parallel with other work (see parallel versus sequential dispatch)
Any task that pollutes the main context with information the main task does not need to retain

The four signals I use now

After burning budget on a lot of unnecessary dispatches, I landed on four questions that decide for me.

1. Does this task need its own context window?

If the answer is "I am going to read ten files and come back with a paragraph," spawn. The ten-file read pollutes the main thread. The summary is what the main thread actually needs.

If the answer is "I need one file and I know its path," just make the Read call.

2. Does this task run in parallel with others?

If you are dispatching three tasks to three sub-agents and waiting on all three, that is textbook sub-agent work. The time savings alone pay for the dispatch cost.

If you are doing one thing at a time, check the other three signals before deciding.
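A rough way to see the time savings from signal 2 is to compare wall-clock time for the same tasks run one after another versus side by side. This is a minimal sketch under simplifying assumptions: the durations are made up, and the model ignores dispatch latency and token overhead.

```python
def wall_clock(durations_s: list[float], parallel: bool) -> float:
    """Estimate wall-clock seconds for a batch of independent tasks.

    Parallel work finishes when the slowest task finishes;
    sequential work takes the sum of all tasks.
    """
    if not durations_s:
        return 0.0
    return max(durations_s) if parallel else sum(durations_s)


# Hypothetical durations for three independent research tasks, in seconds.
tasks = [40.0, 55.0, 30.0]

sequential_s = wall_clock(tasks, parallel=False)
parallel_s = wall_clock(tasks, parallel=True)
print(f"sequential: {sequential_s:.0f}s, parallel: {parallel_s:.0f}s")
```

The gap only exists when the tasks are independent. If one task needs another's output, the batch is sequential no matter how it is dispatched.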

3. Do you need tools the main thread does not have loaded?

The main Claude Code thread has a default tool set. Sub-agents can be spawned with a narrower or different tool set. If the task needs a specific tool (say, a Playwright browser or a Supabase MCP action) and the main thread does not already have it loaded, a sub-agent scoped to that tool keeps the tool's output isolated from the main thread.

In practice this is the least common signal. Most of my sub-agents use the same tool set as the main thread.

4. Is the task big enough to justify dispatch overhead?

Rough math: a sub-agent dispatch costs about 2-4K tokens on the spawn and another 1-2K on the return summary. If the task itself is under 2K tokens of work, the dispatch overhead is larger than the work. Just do it in the main thread.

If the task is 10K+ tokens of work, the dispatch overhead is the smaller share of the bill, it keeps shrinking as the task grows, and you get context isolation as a free bonus.

The token budget math post works the numbers in more detail.
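The rough math for signal 4 can be sketched as a one-line ratio: what share of a sub-agent's total spend is dispatch overhead? The defaults below are midpoints of the rough ranges quoted above, chosen for illustration; they are not measured constants, and the function name is my own.

```python
# Midpoints of the rough ranges quoted above; illustrative assumptions only.
SPAWN_TOKENS = 3_000   # roughly 2-4K on the spawn
RETURN_TOKENS = 1_500  # roughly 1-2K on the return summary


def overhead_share(task_tokens: int,
                   spawn: int = SPAWN_TOKENS,
                   ret: int = RETURN_TOKENS) -> float:
    """Fraction of a sub-agent's total token spend that is dispatch overhead."""
    overhead = spawn + ret
    return overhead / (overhead + task_tokens)


for task in (1_000, 2_000, 10_000, 50_000):
    print(f"{task:>6} tokens of work: {overhead_share(task):.0%} overhead")
```

Plug in your own measured spawn and return costs. The shape of the curve matters more than the exact numbers: overhead dominates small tasks and fades as the task grows.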

Option C: the middle path, skills

There is a third option I did not mention at the top. Skills auto-activate when the model detects relevant work. A skill does not require explicit dispatch. It slots into the main thread and contributes patterns without a full context handoff.

For common recurring tasks (formatting a PR description, running a specific kind of audit, generating a specific kind of scaffold), a skill is cheaper than a sub-agent and more persistent than a one-off tool call. Building your first Claude skill covers the template I use.

The decision ladder I follow now:

If a skill covers it, the skill handles it.
If a single tool call covers it, make the tool call.
If neither, spawn a sub-agent.

What I chose and why

The big change in my workflow came from counting. I started logging every sub-agent dispatch, what it cost, and what I got back. After two weeks, the pattern was obvious. Half my dispatches were returning single-line summaries that I could have gotten from a tool call. The other half were paying their cost ten times over.
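A dispatch log does not need to be elaborate. Here is a minimal sketch of the kind of record that surfaces the pattern, assuming you can note the tokens spent and the summary returned for each dispatch. The field names and sample entries are hypothetical, not taken from my actual log.

```python
from dataclasses import dataclass


@dataclass
class Dispatch:
    task: str
    tokens_spent: int  # spawn + work + return, as best you can measure it
    summary: str       # what came back to the main thread


def is_one_liner(d: Dispatch) -> bool:
    """True when the returned summary has at most one non-empty line."""
    lines = [ln for ln in d.summary.splitlines() if ln.strip()]
    return len(lines) <= 1


def one_liner_share(log: list[Dispatch]) -> float:
    """Fraction of dispatches that returned a single-line summary."""
    if not log:
        return 0.0
    return sum(is_one_liner(d) for d in log) / len(log)


# Made-up entries for illustration.
log = [
    Dispatch("check config flag", 5_200, "DEBUG is set to false."),
    Dispatch(
        "trace auth flow",
        48_000,
        "Login goes through middleware.\n"
        "Tokens refresh in the client.\n"
        "Two dead code paths found.",
    ),
]
print(f"{one_liner_share(log):.0%} of dispatches returned a single line")
```

A single-line summary is only a proxy for "this could have been a tool call", but it is cheap to compute and it pointed me at the right dispatches to cut.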

Now I use the four signals above as a mental filter. If two or more say "yes, spawn," I spawn. Otherwise I stay in the main thread.
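Written down, the filter is just a vote count. The sketch below is one way to express it. The class and field names are my own shorthand for the four questions, and the threshold of two matches the rule above.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    needs_own_context: bool  # 1. Does this task need its own context window?
    runs_in_parallel: bool   # 2. Does it run in parallel with others?
    needs_other_tools: bool  # 3. Does it need tools the main thread lacks?
    big_enough: bool         # 4. Is it big enough to justify dispatch overhead?


def should_spawn(s: Signals, threshold: int = 2) -> bool:
    """Spawn a sub-agent when at least `threshold` signals say yes."""
    votes = sum([
        s.needs_own_context,
        s.runs_in_parallel,
        s.needs_other_tools,
        s.big_enough,
    ])
    return votes >= threshold


# Ten-file research task run alongside other work: two yes votes, so spawn.
print(should_spawn(Signals(True, True, False, False)))    # True
# One known file, read once: no yes votes, so stay in the main thread.
print(should_spawn(Signals(False, False, False, False)))  # False
```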

The results: my per-task token count dropped about 30 percent over two weeks. Latency dropped because I was waiting on fewer round-trips. And the main thread stayed coherent because it was not polluted with research detritus.

What I would revisit

Two things I am not sure about yet.

The skill-versus-sub-agent line. Some tasks I handle as skills that could just as well be sub-agents. I lean on skills because they are cheaper per invocation, but the line is blurry and I change my mind about specific tasks as I use them more.

The parallelism threshold. I dispatch to parallel sub-agents at three-plus tasks. Below that I go sequential. But I have not actually measured whether three is the right breakpoint or whether two tasks already pay off. The parallel dispatch post covers what I know; the edge I am still learning is where the break-even actually sits.

For shops building agent-first workflows, the patterns here show up in the Operator's Stack curriculum and in the broader agent handbook that organizes all of this.

FAQ

How many tokens does a sub-agent dispatch actually cost?

In my measurements, roughly 2-4K tokens on the spawn (tool definitions + system prompt) plus 1-2K on the return summary. Exact numbers depend on how many tools you load for the sub-agent. A narrowed sub-agent with four tools is cheaper than one with fifteen.

Can sub-agents call other sub-agents?

Not in Claude Code as currently documented: sub-agents cannot spawn other sub-agents, which prevents runaway nesting. Nested dispatch would also multiply the overhead and make debugging harder. If a sub-agent needs to split work, I have it return a plan and dispatch the sub-tasks from the main thread.

What happens if a sub-agent runs out of context mid-task?

It compacts, same as the main thread. But you lose granular visibility into what it compacted. This is one of the reasons I prefer sub-agents for bounded work rather than open-ended research.

Is there a way to kill a running sub-agent?

The harness has TaskStop. In practice, I set a time budget in the sub-agent's prompt and let it exit on its own. A hard kill from the harness is available but rarely needed if the prompt is bounded.
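Bounding the prompt can be as simple as wrapping the task description with explicit stop conditions. This is a hypothetical helper, not part of Claude Code. The file cap is my own addition alongside the time budget, and a model can only approximate elapsed time, so treat both limits as soft.

```python
def bounded_prompt(task: str, max_minutes: int, max_files: int) -> str:
    """Wrap a task description with explicit, soft stop conditions."""
    return (
        f"{task}\n\n"
        "Constraints:\n"
        f"- Stop after roughly {max_minutes} minutes of work.\n"
        f"- Read at most {max_files} files.\n"
        "- If you hit either limit, return what you have "
        "and list what is left.\n"
    )


prompt = bounded_prompt(
    "Find where session tokens are refreshed and summarize the flow.",
    max_minutes=10,
    max_files=15,
)
print(prompt)
```

The last constraint matters most: a sub-agent that stops early and says what it skipped is far easier to follow up on than one that silently truncates its work.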

Should I use sub-agents for code review?

Yes, this is one of the clearest wins. Running three parallel reviewers with different roles is cheaper than running one sequential reviewer through three passes. The parallel code review post covers the pattern.