Most Claude Code articles you find online treat agents like a novelty. You get a screenshot of something cool, a prompt, maybe a YouTube video of someone typing "write me a todo app" and squealing when it works. That is the demo layer. It is not the operator layer.
The operator layer is the one where an agent has to ship a real change to a real repo, in parallel with two other agents, with a CI budget, a token budget, and a deadline. The operator layer is where you learn which of Claude Code's features pay rent and which are scaffolding.
I have been shipping production work with Claude Code agents for about 14 months. Eleven of those months have been since sub-agents shipped. This handbook is the set of patterns I keep returning to, organized so you can find the one you need without reading the whole thing.
What this handbook covers
Eleven entries, each one tactical. No theory. No "the future of AI" pieces. Each one is either a decision (when to do X vs Y), a pattern (this shape shows up repeatedly), or a tutorial (here is how I actually do it).
The thread running through all eleven: an operator running a one-to-three person shop cannot afford to burn context on bad dispatch decisions or rebuild the same skill twice. The cost of doing it wrong is not a failed demo. It is a wasted day. I write these after wasting enough days to learn the shape.
When to reach for a sub-agent
Sub-agents are the single biggest shift in how I work. Before sub-agents, every task ran in the same context window. You hit the token ceiling, you compacted, you lost detail, you slowed down. With sub-agents, you dispatch a task to a fresh context, get a summary back, and keep moving in your main thread.
But dispatch is not free. A sub-agent pays a tool-definition tax every time it spins up. If the task is a single read or a single grep, a plain tool call is cheaper. I walk through the decision in when to spawn a Claude Code sub-agent versus a tool call, with the specific signals I use: task surface area, context requirement, tool overlap with main thread.
The short version: if the task needs its own context, spawn. If the task needs one tool call and returns a flat result, do not spawn.
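That rule is small enough to write down. A toy sketch of the heuristic in shell — this is my paraphrase of the decision, not anything Claude Code exposes, and the function name is made up:

```shell
# Dispatch heuristic, paraphrased: a task that needs its own context, or
# more than one tool call, gets a sub-agent; a single flat call does not.
should_spawn() {  # $1 = tool calls the task needs, $2 = needs its own context (yes/no)
  if [ "$2" = yes ] || [ "$1" -gt 1 ]; then echo spawn; else echo tool-call; fi
}

single_grep=$(should_spawn 1 no)      # flat result, main thread keeps it
investigation=$(should_spawn 6 yes)   # multi-file work, fresh context
echo "$single_grep / $investigation"  # tool-call / spawn
```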
MCP servers, skills, and tools
Claude Code has three extension points. Most people conflate them. The naming does not help.
- Tools are single function calls. The model calls them, gets a response, continues.
- Skills are packaged patterns. A SKILL.md plus a few helper files. Auto-activated when the model detects relevant work.
- MCP servers are stateful protocols that expose tools and resources over a standard interface. Think of them as a daemon you can share across agents.
I picked the wrong one more than once before the shapes clicked. The decision matrix lives in MCP vs skills vs tools: picking the right Claude extension. The one-line rule: skills own a task, MCP servers own a protocol, tools own a call. If you are writing Python to glue two local things together, it is almost always a skill.
For the full architecture of how an MCP server actually works, including the stdio versus SSE transport decision and when to even build one, MCP server architecture basics walks through a real server from scratch.
Building your first skill
Skills are the extension point I reach for most. They are cheap to write, cheap to throw away, and they compound. The five files I start with every time live in building your first Claude skill: the five-file template. The whole piece is a walkthrough, not a concept piece. You end with a working skill.
The part most people miss: the description field in SKILL.md is the most important line in the whole skill. The model matches work to skills based on that line. If it is vague, the skill never auto-activates. If it is specific, the skill fires exactly when you want it.
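To make that concrete, here is a hypothetical minimal SKILL.md. The skill name, description, and steps are all illustrative, not taken from the template entry — the thing to notice is how the description names the work it should fire on and the work it should not:

```markdown
---
name: changelog-entry
description: Draft a CHANGELOG.md entry from staged git changes. Use when the
  user asks to update the changelog, cut release notes, or summarize a diff
  for a release. Do not use for commit messages.
---

# Changelog entry

1. Run `git diff --staged` and group the changes by Added / Changed / Fixed.
2. Write one bullet per change in Keep a Changelog format.
3. Append the entry under the `## [Unreleased]` heading in CHANGELOG.md.
```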
Agent teams and parallel review
Three reviewers looking at the same code find things one reviewer misses. This is a cliché in human code review. It is also true for agents. The parallel reviewer pattern I use now lives in agent teams for code review: the parallel-reviewer pattern. The trick is that each agent gets a different role (security, performance, style) and writes to a separate output file. Then a coordinator reads the three outputs and merges them.
The cost math works out because agent review is cheap compared to my time reviewing the same diff three times from three angles. Three parallel 5K-token reviews cost less than one 15K-token review that has to load three mental models into the same context.
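The shape of the pattern fits in a few lines of shell. This is a skeleton with a stub standing in for the real agent call — swap review_as (a made-up function) for however you actually dispatch a reviewer:

```shell
# Parallel-reviewer skeleton. review_as is a stand-in for a real agent
# invocation; each role writes to its own file so outputs never collide.
review_as() {   # $1 = reviewer role
  sleep 1       # simulate model latency
  echo "## $1 findings" > "review-$1.md"
}

start=$(date +%s)
for role in security performance style; do
  review_as "$role" &          # one process per role, fresh context each
done
wait                           # wall time ~= the slowest reviewer, not the sum
elapsed=$(( $(date +%s) - start ))

cat review-*.md > merged-review.md   # coordinator pass (naive concat here)
echo "3 reviews in ${elapsed}s"
```

The real coordinator step is another agent call that deduplicates findings; plain concatenation is just the cheapest version of the merge.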
For shops building products, this pattern shows up in the operator stack curriculum as one of the load-bearing workflows.
Token budgets and caching
Claude Code runs under a token budget whether you track it or not. The budget is roughly 200K context on the model, but the operational ceiling per task is much lower once you account for system prompt, tool definitions, working-set files, and the conversation itself. Token budget math for Claude Code: what 104K per task means works the numbers from first principles.
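The subtraction is simple even if the individual numbers vary by setup. An illustrative breakdown that lands on the 104K figure — every line item below is an assumption for the sketch, not a measurement:

```shell
# Illustrative context budget; each line item is an assumed number.
context_window=200000     # model context window
system_prompt=12000       # Claude Code's own preamble (assumed)
tool_defs=20000           # tool + MCP definitions (assumed)
working_set=40000         # files loaded for the task (assumed)
reserve=24000             # headroom for the conversation itself (assumed)

usable=$(( context_window - system_prompt - tool_defs - working_set - reserve ))
echo "usable per task: ${usable} tokens"   # usable per task: 104000 tokens
```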
The single biggest lever is prompt caching. Claude's cache has a 5-minute TTL by default, which sounds short until you realize how often you loop through the same workflow within five minutes. Prompt caching economics: when the 5-minute TTL pays rent shows when caching is a line item worth optimizing and when it is not.
A cached token is roughly 10 percent of the cost of an uncached token on a read. Once you have a workflow that loops, prompt caching turns your bill from real money into a rounding error.
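The arithmetic behind that claim, with placeholder prices — the per-million rates below are assumptions for the sketch; the 10 percent cache-read ratio is the only input taken from the text:

```shell
# Cost of re-reading a 40K-token prefix 20 times inside the cache TTL.
# Rates are placeholders in $/1M input tokens; cache reads at 10% of base.
base_rate=3.00; cache_rate=0.30
prompt=40000; loops=20

cold=$(awk -v p="$prompt" -v l="$loops" -v r="$base_rate" \
  'BEGIN { printf "%.2f", p * l * r / 1e6 }')
warm=$(awk -v p="$prompt" -v l="$loops" -v b="$base_rate" -v c="$cache_rate" \
  'BEGIN { printf "%.2f", (p * b + p * (l - 1) * c) / 1e6 }')
echo "no cache: \$${cold}  with cache: \$${warm}"
```

With these placeholder rates the looping workflow drops from $2.40 to $0.35 — the warm path pays full price once and the cached rate nineteen times.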
Parallel versus sequential dispatch
Parallel dispatch is seductive. You spawn three agents, they all work at once, you get results in the time it takes the slowest to finish. But parallel is cheaper only if the tasks are actually independent. If they share state, or if one agent's output feeds another, you pay the coordination cost on top of the dispatch cost.
Parallel versus sequential agent dispatch: the real tradeoffs catalogs the specific shapes where each wins. The quick rule: parallel for research and review, sequential for builds that share a filesystem.
When agents fail
Every operator running Claude Code in production has a mental catalog of how agents fail. Mine has four shapes: context loss, tool confusion, infinite retry, and silent output drift. Each one has a recovery pattern that keeps the pipeline moving instead of burning the whole run. Agent failure modes and the recovery patterns that keep shipping walks through all four with the specific signals I watch for.
The hardest one is silent output drift, because by definition you do not know it is happening. I catch it with a verification pass at the end of every multi-agent run.
Background agents and worktrees
Two patterns for running more than one thing at once.
Background agents: setting run_in_background to true lets you start a long-running task (a build, a test run, a dev server) and keep working in your main thread. The agent notifies you when the task finishes. I used to just sit through those runs. Now I background every command that takes more than 30 seconds. Background agents with run_in_background: when it pays off covers the trade-offs and the commands I always background.
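The shell-level analog of the same idea, with a sleep standing in for the slow command:

```shell
# Stand-in for a slow build or test run; swap in the real command.
( sleep 2; echo "tests passed" > /tmp/agent_task.log ) &
task_pid=$!

echo "main thread keeps working"   # other work happens here meanwhile

wait "$task_pid"                   # block only when the result is needed
result=$(cat /tmp/agent_task.log)
echo "$result"                     # tests passed
```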
Git worktrees: if you are running two or three Claude Code agents in parallel on the same repo, they will step on each other's file edits unless they are in separate worktrees. Git worktrees for parallel agents: the isolation pattern is the tutorial I wish I had when I first tried parallel dev. Worktrees plus file-ownership claims is the pattern that actually scales to three-plus agents.
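The worktree setup itself is two commands per agent. A throwaway demo — the scratch repo and the branch names are illustrative; in practice you run the worktree commands from your real repo root:

```shell
# Scratch repo so the demo is self-contained.
repo=$(mktemp -d) && cd "$repo"
git init -q
git -c user.email=op@example.com -c user.name=op \
  commit -q --allow-empty -m "init"

# One worktree + branch per agent; edits in one never touch the other.
git worktree add -b agent/security    "$repo-sec"
git worktree add -b agent/performance "$repo-perf"

git worktree list   # main checkout plus one checkout per agent
```

Teardown after a run is `git worktree remove <path>` followed by deleting or merging the branch.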
Where this ties together
The operator stack product line is built on these eleven patterns. The Claude Code skills pack bundles the ones I ship most often. If you want the full curriculum, the Operator's Stack course covers this material with video walkthroughs and production templates.
For related operator reading, the solo brand hub covers the creative-tech discipline that makes all of this deployable, and the pricing hub for productized work covers how to bill for the output once you can ship it this fast.
FAQ
Do I need to know Python or TypeScript to use Claude Code skills?
You can write skills in either, but most of mine are pure bash plus a SKILL.md. The minimum viable skill is a single markdown file with a description and a body. You add scripting as the work requires it.
What is the minimum token budget I should plan for per agent task?
Plan for 20-40K input per serious task, with a peak of 100K on complex multi-file work. If you go past 100K routinely, something is wrong with how you are loading context. The token budget post works the numbers.
Is parallel dispatch always faster than sequential?
No. Parallel is faster only when tasks are actually independent. If agents share files, share branches, or need to coordinate, sequential plus a queue is usually cheaper and more reliable. The dispatch post shows the specific shapes.
How do I stop an agent that goes in circles?
Use TaskStop if you are running the harness. In practice, a hard timeout on the background call plus a max-iteration counter in the system prompt catches 95 percent of infinite loops. The failure modes post covers the pattern.
Can I run all of this on Claude Code's free tier?
Most of it, yes. Prompt caching, skills, background agents, and worktrees all work on the free tier. Sub-agents and long-running MCP servers start to push against usage limits. Max or Pro is where serious operator work lives.
Read next
- When to spawn a Claude Code sub-agent versus a tool call
- MCP server architecture: when to build one versus use a skill
- Building your first Claude skill: the five-file template
- Prompt caching economics: when the 5-minute TTL pays rent
- Parallel versus sequential agent dispatch: the real tradeoffs
- Git worktrees for parallel agents: the isolation pattern
