The Claude Bible
Home / Multi-agent orchestration
Level: Expert · 11 lessons

Multi-agent orchestration

Fan-out, pipelines, adversarial verification, judge panels. Putting fleets of agents to work.

Open the interactive course212 lessons, quizzes, exercises, 3 languages, free.

Parallel fan-out vs pipeline

Orchestrating several agents is choosing a topology. The two primitives:

The classic trap: putting a barrier (parallel) where a pipeline would do, just because the code looks cleaner. A barrier is only justified if stage N needs the complete result of stage N-1 (merge, dedup, early-exit if zero). Otherwise, pipeline.

Concrete application at Pierre's: his multi-language SEO audits (Eskimoz in 4 languages) are fan-outs; one agent per language, aggregated at the end. His model rule applies: Haiku/Sonnet for the mass agents, Opus for the synthesis.

Key points
  • Parallel/fan-out: N agents at once + barrier, when you want everything together
  • Pipeline: each item travels the stages with no barrier (default for multi-stage)
  • Barrier justified only if stage N needs the complete result of N-1
  • Mass agents in Haiku/Sonnet, synthesis in Opus

Adversarial verification and judge panels

An agent that finds bugs or findings produces plausible-but-false output (hallucination, again). The orchestration fix: have each finding verified by other agents before keeping it.

Quality patterns:

Guiding principle: confidence comes from independent perspectives that contradict each other, not from one self-assured agent. It is exactly the spirit of Pierre's reflex "reproduce via Playwright before patching": verify before believing, applied at agent scale.

Key points
  • Verify each finding with other agents before keeping it
  • Adversarial: N skeptics refute; keep if the majority fails to refute
  • Multi-perspective: different lenses; judge panel: N scored solutions
  • Loop-until-dry for unknown-size searches; verify before believing

Workflows: deterministic orchestration

When orchestration becomes complex (loops, conditions, fan-out, budgets), you move from an improvising agent to a workflow: a script that orchestrates the sub-agents deterministically. The control flow (who runs, when, in parallel or serial) is coded, not decided by the model.

Typical building blocks of a workflow engine:

The point versus one big agent: structure (decompose and cover in parallel), confidence (verify before concluding) and scale (migrations or audits a single context could not hold). You stay in the loop: you read each result before deciding the next step. It is the most advanced rung, to reserve for tasks that truly justify it, because it consumes a lot of tokens.

Key points
  • Workflow = script that orchestrates sub-agents deterministically
  • Blocks: agent(schema), pipeline, parallel, until-count/dry/budget loops
  • For: structure, confidence (verify), scale (massive migrations/audits)
  • Token-heavy: reserve for tasks that justify it

Barriers vs no barrier

In a multi-agent pipeline (a chain of AI agents where each does one job), you must decide at every handoff: does the next step need to wait for all previous results, or can it start as soon as any one result arrives? That decision is called placing a barrier (or not).

A barrier is a synchronization point. No agent downstream of the barrier starts until every agent upstream has finished. This is the right choice when the next step genuinely needs the complete picture before it can act. A no-barrier (also called streaming or fan-in without wait) lets results flow through one by one as they arrive, so downstream work begins immediately.

Ask yourself one question: "Can the next step produce a correct output with only partial data?" If yes, skip the barrier. If no, add one. Getting this wrong in either direction costs you: an unnecessary barrier serializes what could run in parallel, wasting time; a missing barrier corrupts results because downstream agents act on incomplete information.

Key points
  • A barrier holds all downstream agents until every upstream agent finishes.
  • Skip the barrier when each result is independently actionable.
  • Unnecessary barriers serialize parallel work and waste time.
  • Quorum barriers (wait for N of M) are a valid middle ground.

Loop until dry

Some tasks have an unknown boundary: you do not know how many items exist until you have finished collecting them. Pagination, recursive directory scans, and iterative web crawls all share this shape. The right pattern is a dry-run loop: repeat a search or fetch round, collect new results, and stop only when a round returns nothing new.

In a multi-agent context (where several Claude instances hand work to each other), the orchestrator agent runs the loop and dispatches each batch to worker agents. The orchestrator tracks a seen set, a deduplicated collection of everything already processed, and compares each new round against it. When the set stops growing, the loop exits.

Claude Code supports this pattern through chained shell commands and subagent calls. A minimal loop in a Claude Code task looks like this:

  1. Run a search or API call and capture the output.
  2. Diff the output against the seen set.
  3. If the diff is non-empty, add new items to the seen set, dispatch work, then go to step 1.
  4. If the diff is empty, stop and report.

Two safeguards are mandatory: a max-rounds cap (for example, 50 iterations) to prevent infinite loops caused by bugs or API quirks, and idempotent workers (workers that produce the same result if they accidentally process the same item twice). Without these, a dry-run loop can run forever or corrupt results.

Key points
  • Dry-run loop: repeat until a round returns nothing new
  • Seen set: deduplicated record of already-processed items
  • Orchestrator dispatches; workers are idempotent
  • Always cap max rounds to prevent infinite loops

Worktrees for parallel agents

When you run multiple Claude Code agents at once, they all operate on the same repository files by default. If two agents edit the same file simultaneously, one will overwrite the other. Git worktrees solve this: a worktree is an additional working directory linked to the same repository, checked out at its own branch, so each agent gets isolated files with no overlap.

You create a worktree with git worktree add. Each worktree has its own branch and its own copy of the working files on disk. The agents run in separate directories and never touch each other's files. When their work is done, you merge the branches normally.

Claude Code supports this pattern directly. The /worktrees command (and the --worktree flag when launching a sub-agent) tells an agent which worktree path to operate in. The orchestrator agent creates the worktrees, assigns one to each sub-agent, then waits for all to finish before merging.

Key points
  • git worktree add creates an isolated working directory on a separate branch
  • each parallel agent is pointed at one worktree so files never collide
  • the orchestrator merges branches after all agents finish
  • git worktree remove cleans up when done

Dispatching parallel agents

When a job can be split into independent pieces, running those pieces one after another wastes time. Fan-out means launching several agents (or sub-processes) at the same moment, each handling a separate slice of the work, then collecting all results when they finish. Claude Code supports this pattern through the Agent tool, which lets one orchestrating agent spawn child agents.

The key rule is independence: tasks you fan out must not depend on each other's output. If task B needs task A to finish first, those two must stay sequential. Good candidates for fan-out include: auditing separate files, translating the same content into several languages, running the same prompt against different data sets, or fetching several URLs in parallel.

A typical fan-out workflow has three stages:

  1. Decompose: the orchestrator breaks the goal into N independent sub-tasks.
  2. Dispatch: it calls the Agent tool N times, one call per sub-task, without waiting between calls.
  3. Collect: once all agents return, the orchestrator merges or summarises the results.

In Claude Code you can also fan out at the shell level using --print (non-interactive mode) and background processes, then join the outputs. This works well for simple tasks where you control the shell environment directly.

Key points
  • Fan-out: launching independent sub-tasks simultaneously instead of sequentially.
  • Orchestrator: the parent agent that dispatches and later collects child agents.
  • Independence check: fan-out only works when sub-tasks share no dependencies.
  • Collect phase: merging or summarising all agent outputs after they complete.

Scaling fan-out to a budget

A fan-out is when an orchestrator (the coordinating agent) spawns multiple sub-agents in parallel to tackle different parts of a problem at once. Each sub-agent consumes tokens, so the total cost of a fan-out run equals the sum of every agent's input and output tokens. Without a plan, costs compound fast.

The first lever is model selection per task. Not every sub-agent needs the most capable model. Assign claude-opus-4-8 only to tasks that require deep reasoning, such as architectural decisions or ambiguous analysis. Use claude-sonnet-4-6 for mid-complexity work like code generation, and claude-haiku-4-5 for high-volume, simple tasks like classification, formatting, or extraction. This alone can reduce a run's cost by 80 percent or more.

The second lever is context trimming. Each agent's input is billed in full. Pass only the slice of context that agent actually needs: a relevant file, a short summary, or a structured object rather than the entire conversation history. Prompt caching (reusing a shared prefix across agents) further cuts repeated-context charges when multiple agents share a large system prompt or reference document.

Practical budget controls to apply before launching a fleet:

Key points
  • Assign models by task complexity, not by habit
  • Trim each agent's context to only what it needs
  • Cap max_tokens and agent count before launching
  • Use prompt caching for shared prefixes across agents

Schemas for clean agent data

In a multi-agent pipeline (a chain of AI models passing results to each other), one agent's output becomes the next agent's input. If that output is free-form text, the receiving agent must guess at the structure, which causes silent errors. The fix is structured output: forcing the model to return data in a strict, machine-readable shape such as JSON.

Claude supports structured output through tool use. You define a JSON Schema (a formal description of the fields, types, and required properties you expect) and pass it as a tool definition. Claude then fills in that schema instead of writing prose. The result is a JSON object your code can parse and validate without any string manipulation.

Key reasons to enforce schemas in agent chains:

In Claude Code, the Claude API (the HTTP interface your agent calls programmatically) lets you pass a tools array with one tool whose input_schema defines exactly what you want back. Setting tool_choice to {"type":"tool","name":"your_tool"} forces Claude to call that tool every time, guaranteeing structured output on every request.

Key points
  • Structured output removes ambiguity between agents
  • JSON Schema defines the exact fields and types Claude must return
  • tool_choice forces a specific tool call on every request
  • Validate the schema immediately to catch errors before they propagate

Resuming and caching a workflow

A multi-agent workflow (a pipeline where several AI sub-agents handle different tasks in sequence) can be expensive to rerun from scratch every time you change one step. The solution is partial resumption: re-running only the steps whose inputs changed, and reusing the outputs of everything else.

Claude Code supports this through two complementary mechanisms. Prompt caching (an Anthropic API feature) stores the token-level computation for a long, stable system prompt or context block so the model skips reprocessing it on the next call. This cuts both latency and cost. Cache hits are billed at roughly 10 percent of the normal input-token rate. The cache is keyed by the exact prefix text, so even a single character change in the cached block invalidates it.

At the workflow level, you control resumption with checkpoints: saved outputs from each agent step written to disk or a store. When you rerun the pipeline, each step checks whether its checkpoint is still valid (inputs unchanged) before calling the model at all. Common patterns include:

In Claude Code, you can script this logic in a shell or Node orchestrator that calls claude with the --print flag (non-interactive, prints the response and exits) and writes each output to a file. On the next run, read the file first and skip the claude call entirely if the checkpoint is fresh.

Key points
  • Prompt caching cuts cost by reusing stable context across API calls
  • Checkpoints save each step output so only changed steps rerun
  • Hash or timestamp the inputs to decide whether a checkpoint is still valid
  • Use --print for non-interactive claude calls inside orchestration scripts

The completeness critic

In a multi-agent pipeline (a chain of AI agents each doing one job), the last bottleneck is rarely wrong content. It is missing content. A completeness critic is a final agent whose only job is to ask: "What should be here that is not?" It reviews the output of all previous agents against the original brief and flags gaps before the result reaches the user.

This agent is deliberately narrow. It does not rewrite, improve tone, or check facts. It only compares the scope of the brief against the scope of the output and returns a structured list of omissions. Keeping it narrow makes it fast, cheap (a Haiku-class model is usually enough), and easy to test.

Common things a completeness critic catches:

The critic feeds its findings back into the pipeline as a structured diff (a machine-readable list of differences). A second pass agent, or the orchestrator itself (the agent that coordinates all other agents), then decides which gaps to fill, which to accept, and which to escalate to the human.

Key points
  • Completeness critic: agent that finds missing content, not errors
  • Scope diff: comparing what the brief requested vs. what was delivered
  • Narrow role keeps the critic fast and testable
  • Output is a structured list fed back to the orchestrator
Work with me

Master Claude, Claude Code and LLMs, from your first prompt to multi-agent orchestration.

Like this course? I built it end to end. Need a web app, mobile app, AI automation or SEO/GEO? Let us talk.

Contact me on LinkedInSee a site I built