Most Claude Code articles you find online treat agents like a novelty. You get a screenshot of something cool, a prompt, maybe a YouTube video of someone typing "write me a todo app" and squealing when it works. That is the demo layer. It is not the operator layer.
The operator layer is the one where an agent has to ship a real change to a real repo, in parallel with two other agents, with a CI budget, a token budget, and a deadline. The operator layer is where you learn which of Claude Code's features pay rent and which are scaffolding.
I have been shipping production work with Claude Code agents for about 14 months. Eleven of those months have been since sub-agents shipped. This handbook is the set of patterns I keep returning to, organized so you can find the one you need without reading the whole thing.
What this handbook covers
Eleven entries, each one tactical. No theory. No "the future of AI" pieces. Each one is either a decision (when to do X vs Y), a pattern (this shape shows up repeatedly), or a tutorial (here is how I actually do it).
The thread running through all eleven: an operator running a one-to-three person shop cannot afford to burn context on bad dispatch decisions or rebuild the same skill twice. The cost of doing it wrong is not a failed demo. It is a wasted day. I write these after wasting enough days to learn the shape.
When to reach for a sub-agent
Sub-agents are the single biggest shift in how I work. Before sub-agents, every task ran in the same context window. You hit the token ceiling, you compacted, you lost detail, you slowed down. With sub-agents, you dispatch a task to a fresh context, get a summary back, and keep moving in your main thread.
But dispatch is not free. A sub-agent pays a tool-definition tax every time it spins up. If the task is a single read or a single grep, a plain tool call is cheaper. I walk through the decision in when to spawn a Claude Code sub-agent versus a tool call, with the specific signals I use: task surface area, context requirement, tool overlap with main thread.
The short version: if the task needs its own context, spawn. If the task needs one tool call and returns a flat result, do not spawn.
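That rule is small enough to write down. A toy sketch of the heuristic in shell — this is my paraphrase of the decision, not anything Claude Code exposes, and the function name is made up:

```shell
# Dispatch heuristic, paraphrased: a task that needs its own context, or
# more than one tool call, gets a sub-agent; a single flat call does not.
should_spawn() {  # $1 = tool calls the task needs, $2 = needs its own context (yes/no)
  if [ "$2" = yes ] || [ "$1" -gt 1 ]; then echo spawn; else echo tool-call; fi
}

single_grep=$(should_spawn 1 no)      # flat result, main thread keeps it
investigation=$(should_spawn 6 yes)   # multi-file work, fresh context
echo "$single_grep / $investigation"  # tool-call / spawn
```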
MCP servers, skills, and tools
Claude Code has three extension points. Most people conflate them. The naming does not help.
- Tools are single function calls. The model calls them, gets a response, continues.
- Skills are packaged patterns. A SKILL.md plus a few helper files. Auto-activated when the model detects relevant work.
- MCP servers are stateful protocols that expose tools and resources over a standard interface. Think of them as a daemon you can share across agents.
I picked the wrong one more than once before the shapes clicked. The decision matrix lives in MCP vs skills vs tools: picking the right Claude extension. The one-line rule: skills own a task, MCP servers own a protocol, tools own a call. If you are writing Python to glue two local things together, it is almost always a skill.
For the full architecture of how an MCP server actually works, including the stdio versus SSE transport decision and when to even build one, MCP server architecture basics walks through a real server from scratch.
Building your first skill
Skills are the extension point I reach for most. They are cheap to write, cheap to throw away, and they compound. The five files I start with every time live in building your first Claude skill: the five-file template. The whole piece is a walkthrough, not a concept piece. You end with a working skill.
The part most people miss: the description field in SKILL.md is the most important line in the whole skill. The model matches work to skills based on that line. If it is vague, the skill never auto-activates. If it is specific, the skill fires exactly when you want it.
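To make that concrete, here is a hypothetical minimal SKILL.md. The skill name, description, and steps are all illustrative, not taken from the template entry — the thing to notice is how the description names the work it should fire on and the work it should not:

```markdown
---
name: changelog-entry
description: Draft a CHANGELOG.md entry from staged git changes. Use when the
  user asks to update the changelog, cut release notes, or summarize a diff
  for a release. Do not use for commit messages.
---

# Changelog entry

1. Run `git diff --staged` and group the changes by Added / Changed / Fixed.
2. Write one bullet per change in Keep a Changelog format.
3. Append the entry under the `## [Unreleased]` heading in CHANGELOG.md.
```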
Agent teams and parallel review
Three reviewers looking at the same code find things one reviewer misses. This is a cliché in human code review. It is also true for agents. The parallel reviewer pattern I use now lives in agent teams for code review: the parallel-reviewer pattern. The trick is that each agent gets a different role (security, performance, style) and writes to a separate output file. Then a coordinator reads the three outputs and merges them.
The cost math works out because agent review is cheap compared to my time reviewing the same diff three times from three angles. Three parallel 5K-token reviews cost less than one 15K-token review that has to load three mental models into the same context.
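The shape of the pattern fits in a few lines of shell. This is a skeleton with a stub standing in for the real agent call — swap review_as (a made-up function) for however you actually dispatch a reviewer:

```shell
# Parallel-reviewer skeleton. review_as is a stand-in for a real agent
# invocation; each role writes to its own file so outputs never collide.
review_as() {   # $1 = reviewer role
  sleep 1       # simulate model latency
  echo "## $1 findings" > "review-$1.md"
}

start=$(date +%s)
for role in security performance style; do
  review_as "$role" &          # one process per role, fresh context each
done
wait                           # wall time ~= the slowest reviewer, not the sum
elapsed=$(( $(date +%s) - start ))

cat review-*.md > merged-review.md   # coordinator pass (naive concat here)
echo "3 reviews in ${elapsed}s"
```

The real coordinator step is another agent call that deduplicates findings; plain concatenation is just the cheapest version of the merge.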
For shops building products, this pattern shows up in the operator stack curriculum as one of the load-bearing workflows.
Token budgets and caching
Claude Code runs under a token budget whether you track it or not. The budget is roughly 200K context on the model, but the operational ceiling per task is much lower once you account for system prompt, tool definitions, working-set files, and the conversation itself. Token budget math for Claude Code: what 104K per task means works the numbers from first principles.
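The subtraction is simple even if the individual numbers vary by setup. An illustrative breakdown that lands on the 104K figure — every line item below is an assumption for the sketch, not a measurement:

```shell
# Illustrative context budget; each line item is an assumed number.
context_window=200000     # model context window
system_prompt=12000       # Claude Code's own preamble (assumed)
tool_defs=20000           # tool + MCP definitions (assumed)
working_set=40000         # files loaded for the task (assumed)
reserve=24000             # headroom for the conversation itself (assumed)

usable=$(( context_window - system_prompt - tool_defs - working_set - reserve ))
echo "usable per task: ${usable} tokens"   # usable per task: 104000 tokens
```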
The single biggest lever is prompt caching. Claude's cache has a 5-minute TTL by default, which sounds short until you realize how often you loop through the same workflow within five minutes. Prompt caching economics: when the 5-minute TTL pays rent shows when caching is a line item worth optimizing and when it is not.
A cached token is roughly 10 percent of the cost of an uncached token on a read. Once you have a workflow that loops, prompt caching turns your bill from real money into a rounding error.
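The arithmetic behind that claim, with placeholder prices — the per-million rates below are assumptions for the sketch; the 10 percent cache-read ratio is the only input taken from the text:

```shell
# Cost of re-reading a 40K-token prefix 20 times inside the cache TTL.
# Rates are placeholders in $/1M input tokens; cache reads at 10% of base.
base_rate=3.00; cache_rate=0.30
prompt=40000; loops=20

cold=$(awk -v p="$prompt" -v l="$loops" -v r="$base_rate" \
  'BEGIN { printf "%.2f", p * l * r / 1e6 }')
warm=$(awk -v p="$prompt" -v l="$loops" -v b="$base_rate" -v c="$cache_rate" \
  'BEGIN { printf "%.2f", (p * b + p * (l - 1) * c) / 1e6 }')
echo "no cache: \$${cold}  with cache: \$${warm}"
```

With these placeholder rates the looping workflow drops from $2.40 to $0.35 — the warm path pays full price once and the cached rate nineteen times.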
Parallel versus sequential dispatch
Parallel dispatch is seductive. You spawn three agents, they all work at once, you get results in the time it takes the slowest to finish. But parallel is cheaper only if the tasks are actually independent. If they share state, or if one agent's output feeds another, you pay the coordination cost on top of the dispatch cost.
Parallel versus sequential agent dispatch: the real tradeoffs catalogs the specific shapes where each wins. The quick rule: parallel for research and review, sequential for builds that share a filesystem.
When agents fail
Every operator running Claude Code in production has a mental catalog of how agents fail. Mine has four shapes: context loss, tool confusion, infinite retry, and silent output drift. Each one has a recovery pattern that keeps the pipeline moving instead of burning the whole run. Agent failure modes and the recovery patterns that keep shipping walks through all four with the specific signals I watch for.
The hardest one is silent output drift, because by definition you do not know it is happening. I catch it with a verification pass at the end of every multi-agent run.
Background agents and worktrees
Two patterns for running more than one thing at once.
Background agents: setting run_in_background to true lets you start a long-running task (a build, a test run, a dev server) and keep working in your main thread. The agent notifies you when the task finishes. I used to just sit through those runs. Now I background every command that takes more than 30 seconds. Background agents with run_in_background: when it pays off covers the trade-offs and the commands I always background.
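The shell-level analog of the same idea, with a sleep standing in for the slow command:

```shell
# Stand-in for a slow build or test run; swap in the real command.
( sleep 2; echo "tests passed" > /tmp/agent_task.log ) &
task_pid=$!

echo "main thread keeps working"   # other work happens here meanwhile

wait "$task_pid"                   # block only when the result is needed
result=$(cat /tmp/agent_task.log)
echo "$result"                     # tests passed
```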
Git worktrees: if you are running two or three Claude Code agents in parallel on the same repo, they will step on each other's file edits unless they are in separate worktrees. Git worktrees for parallel agents: the isolation pattern is the tutorial I wish I had when I first tried parallel dev. Worktrees plus file-ownership claims is the pattern that actually scales to three-plus agents.
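The worktree setup itself is two commands per agent. A throwaway demo — the scratch repo and the branch names are illustrative; in practice you run the worktree commands from your real repo root:

```shell
# Scratch repo so the demo is self-contained.
repo=$(mktemp -d) && cd "$repo"
git init -q
git -c user.email=op@example.com -c user.name=op \
  commit -q --allow-empty -m "init"

# One worktree + branch per agent; edits in one never touch the other.
git worktree add -b agent/security    "$repo-sec"
git worktree add -b agent/performance "$repo-perf"

git worktree list   # main checkout plus one checkout per agent
```

Teardown after a run is `git worktree remove <path>` followed by deleting or merging the branch.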
Where this ties together
The operator stack product line is built on these eleven patterns. The Claude Code skills pack bundles the ones I ship most often. If you want the full curriculum, the Operator's Stack course covers this material with video walkthroughs and production templates.
For related operator reading, the solo brand hub covers the creative-tech discipline that makes all of this deployable, and the pricing hub for productized work covers how to bill for the output once you can ship it this fast.
FAQ
Do I need to know Python or TypeScript to use Claude Code skills?
You can write skills in either, but most of mine are pure bash plus a SKILL.md. The minimum viable skill is a single markdown file with a description and a body. You add scripting as the work requires it.
What is the minimum token budget I should plan for per agent task?
Plan for 20-40K input per serious task, with a peak of 100K on complex multi-file work. If you go past 100K routinely, something is wrong with how you are loading context. The token budget post works the numbers.
Is parallel dispatch always faster than sequential?
No. Parallel is faster only when tasks are actually independent. If agents share files, share branches, or need to coordinate, sequential plus a queue is usually cheaper and more reliable. The dispatch post shows the specific shapes.
How do I stop an agent that goes in circles?
Use TaskStop if you are running the harness. In practice, a hard timeout on the background call plus a max-iteration counter in the system prompt catches 95 percent of infinite loops. The failure modes post covers the pattern.
Can I run all of this on Claude Code's free tier?
Most of it, yes. Prompt caching, skills, background agents, and worktrees all work on the free tier. Sub-agents and long-running MCP servers start to push against usage limits. Max or Pro is where serious operator work lives.
Read next
- When to spawn a Claude Code sub-agent versus a tool call
- MCP server architecture: when to build one versus use a skill
- Building your first Claude skill: the five-file template
- Prompt caching economics: when the 5-minute TTL pays rent
- Parallel versus sequential agent dispatch: the real tradeoffs
- Git worktrees for parallel agents: the isolation pattern
