Orchestrating several agents is choosing a topology. The two primitives:
Parallel (fan-out): launch N agents at once on independent tasks, wait for everyone (a barrier), then aggregate. Use it when you need all the results together (global dedup, total count).
Pipeline: each item travels through all the stages independently, with no barrier between stages. Item A can be at stage 3 while B is still at stage 1. The default for multi-stage work: total time is the slowest single chain, not the sum of the slowest per stage.
The classic trap: putting a barrier (parallel) where a pipeline would do, just because the code looks cleaner. A barrier is only justified if stage N needs the complete result of stage N-1 (merge, dedup, early-exit if zero). Otherwise, pipeline.
Concrete application at Pierre's: his multi-language SEO audits (Eskimoz in 4 languages) are fan-outs; one agent per language, aggregated at the end. His model rule applies: Haiku/Sonnet for the mass agents, Opus for the synthesis.
Key points
Parallel/fan-out: N agents at once + barrier, when you want everything together
Pipeline: each item travels the stages with no barrier (default for multi-stage)
Barrier justified only if stage N needs the complete result of N-1
Mass agents in Haiku/Sonnet, synthesis in Opus
Adversarial verification and judge panels
An agent that finds bugs or findings produces plausible-but-false output (hallucination, again). The orchestration fix: have each finding verified by other agents before keeping it.
Quality patterns:
Adversarial verification: for each finding, launch N skeptic agents whose instruction is to refute it. Keep the finding only if the majority fails to refute it. Kills plausible false positives.
Multi-perspective verification: if a finding can fail in several ways, give each verifier a different lens (correctness, security, performance, reproducibility) rather than N identical copies.
Judge panel: generate N independent solutions from different angles, score them with parallel judges, synthesize from the winner while grafting the best ideas of the others.
Loop-until-dry: for a search of unknown size (bugs, edge cases), relaunch finders until K consecutive rounds return nothing new.
Guiding principle: confidence comes from independent perspectives that contradict each other, not from one self-assured agent. It is exactly the spirit of Pierre's reflex "reproduce via Playwright before patching": verify before believing, applied at agent scale.
Key points
Verify each finding with other agents before keeping it
Adversarial: N skeptics refute; keep if the majority fails to refute
Multi-perspective: different lenses; judge panel: N scored solutions
Loop-until-dry for unknown-size searches; verify before believing
Workflows: deterministic orchestration
When orchestration becomes complex (loops, conditions, fan-out, budgets), you move from an improvising agent to a workflow: a script that orchestrates the sub-agents deterministically. The control flow (who runs, when, in parallel or serial) is coded, not decided by the model.
Typical building blocks of a workflow engine:
agent(prompt, schema): launch a sub-agent, get a validated structured output.
pipeline(items, ...stages): pass each item through the stages with no barrier.
parallel(thunks): a barrier, everything at once.
Loops: loop-until-count (accumulate to N), loop-until-dry (until exhaustion), loop-until-budget (until a token quota).
The point versus one big agent: structure (decompose and cover in parallel), confidence (verify before concluding) and scale (migrations or audits a single context could not hold). You stay in the loop: you read each result before deciding the next step. It is the most advanced rung, to reserve for tasks that truly justify it, because it consumes a lot of tokens.
Key points
Workflow = script that orchestrates sub-agents deterministically
In a multi-agent pipeline (a chain of AI agents where each does one job), you must decide at every handoff: does the next step need to wait for all previous results, or can it start as soon as any one result arrives? That decision is called placing a barrier (or not).
A barrier is a synchronization point. No agent downstream of the barrier starts until every agent upstream has finished. This is the right choice when the next step genuinely needs the complete picture before it can act. A no-barrier (also called streaming or fan-in without wait) lets results flow through one by one as they arrive, so downstream work begins immediately.
Ask yourself one question: "Can the next step produce a correct output with only partial data?" If yes, skip the barrier. If no, add one. Getting this wrong in either direction costs you: an unnecessary barrier serializes what could run in parallel, wasting time; a missing barrier corrupts results because downstream agents act on incomplete information.
Use a barrier when you are aggregating scores, joining datasets, writing a final summary, or any operation that is undefined on a subset.
No barrier needed when each result is independently actionable: translating documents, resizing images, sending individual notifications, or streaming partial answers to a user.
Partial barriers are also valid: wait for the first N results (a quorum), then proceed, discarding stragglers.
Key points
A barrier holds all downstream agents until every upstream agent finishes.
Skip the barrier when each result is independently actionable.
Unnecessary barriers serialize parallel work and waste time.
Quorum barriers (wait for N of M) are a valid middle ground.
Loop until dry
Some tasks have an unknown boundary: you do not know how many items exist until you have finished collecting them. Pagination, recursive directory scans, and iterative web crawls all share this shape. The right pattern is a dry-run loop: repeat a search or fetch round, collect new results, and stop only when a round returns nothing new.
In a multi-agent context (where several Claude instances hand work to each other), the orchestrator agent runs the loop and dispatches each batch to worker agents. The orchestrator tracks a seen set, a deduplicated collection of everything already processed, and compares each new round against it. When the set stops growing, the loop exits.
Claude Code supports this pattern through chained shell commands and subagent calls. A minimal loop in a Claude Code task looks like this:
Run a search or API call and capture the output.
Diff the output against the seen set.
If the diff is non-empty, add new items to the seen set, dispatch work, then go to step 1.
If the diff is empty, stop and report.
Two safeguards are mandatory: a max-rounds cap (for example, 50 iterations) to prevent infinite loops caused by bugs or API quirks, and idempotent workers (workers that produce the same result if they accidentally process the same item twice). Without these, a dry-run loop can run forever or corrupt results.
Key points
Dry-run loop: repeat until a round returns nothing new
Seen set: deduplicated record of already-processed items
Orchestrator dispatches; workers are idempotent
Always cap max rounds to prevent infinite loops
Worktrees for parallel agents
When you run multiple Claude Code agents at once, they all operate on the same repository files by default. If two agents edit the same file simultaneously, one will overwrite the other. Git worktrees solve this: a worktree is an additional working directory linked to the same repository, checked out at its own branch, so each agent gets isolated files with no overlap.
You create a worktree with git worktree add. Each worktree has its own branch and its own copy of the working files on disk. The agents run in separate directories and never touch each other's files. When their work is done, you merge the branches normally.
Claude Code supports this pattern directly. The /worktrees command (and the --worktree flag when launching a sub-agent) tells an agent which worktree path to operate in. The orchestrator agent creates the worktrees, assigns one to each sub-agent, then waits for all to finish before merging.
No file collisions: each agent writes only to its own directory.
No branch conflicts: each worktree is on its own branch.
Clean merge point: the orchestrator merges all branches after the agents report completion.
Easy cleanup:git worktree remove deletes the directory and deregisters it.
Key points
git worktree add creates an isolated working directory on a separate branch
each parallel agent is pointed at one worktree so files never collide
the orchestrator merges branches after all agents finish
git worktree remove cleans up when done
Dispatching parallel agents
When a job can be split into independent pieces, running those pieces one after another wastes time. Fan-out means launching several agents (or sub-processes) at the same moment, each handling a separate slice of the work, then collecting all results when they finish. Claude Code supports this pattern through the Agent tool, which lets one orchestrating agent spawn child agents.
The key rule is independence: tasks you fan out must not depend on each other's output. If task B needs task A to finish first, those two must stay sequential. Good candidates for fan-out include: auditing separate files, translating the same content into several languages, running the same prompt against different data sets, or fetching several URLs in parallel.
A typical fan-out workflow has three stages:
Decompose: the orchestrator breaks the goal into N independent sub-tasks.
Dispatch: it calls the Agent tool N times, one call per sub-task, without waiting between calls.
Collect: once all agents return, the orchestrator merges or summarises the results.
In Claude Code you can also fan out at the shell level using --print (non-interactive mode) and background processes, then join the outputs. This works well for simple tasks where you control the shell environment directly.
Key points
Fan-out: launching independent sub-tasks simultaneously instead of sequentially.
Orchestrator: the parent agent that dispatches and later collects child agents.
Independence check: fan-out only works when sub-tasks share no dependencies.
Collect phase: merging or summarising all agent outputs after they complete.
Scaling fan-out to a budget
A fan-out is when an orchestrator (the coordinating agent) spawns multiple sub-agents in parallel to tackle different parts of a problem at once. Each sub-agent consumes tokens, so the total cost of a fan-out run equals the sum of every agent's input and output tokens. Without a plan, costs compound fast.
The first lever is model selection per task. Not every sub-agent needs the most capable model. Assign claude-opus-4-8 only to tasks that require deep reasoning, such as architectural decisions or ambiguous analysis. Use claude-sonnet-4-6 for mid-complexity work like code generation, and claude-haiku-4-5 for high-volume, simple tasks like classification, formatting, or extraction. This alone can reduce a run's cost by 80 percent or more.
The second lever is context trimming. Each agent's input is billed in full. Pass only the slice of context that agent actually needs: a relevant file, a short summary, or a structured object rather than the entire conversation history. Prompt caching (reusing a shared prefix across agents) further cuts repeated-context charges when multiple agents share a large system prompt or reference document.
Practical budget controls to apply before launching a fleet:
Set max_tokens per agent to the minimum needed for that task type.
Cap the number of parallel agents: more concurrency raises cost without always raising quality.
Add a dry-run estimate step: count tokens in planned inputs before committing to a full run.
Use early termination: if an intermediate result already meets the success criterion, cancel remaining agents.
Log token usage per agent call and set a hard ceiling in the orchestrator loop.
Key points
Assign models by task complexity, not by habit
Trim each agent's context to only what it needs
Cap max_tokens and agent count before launching
Use prompt caching for shared prefixes across agents
Schemas for clean agent data
In a multi-agent pipeline (a chain of AI models passing results to each other), one agent's output becomes the next agent's input. If that output is free-form text, the receiving agent must guess at the structure, which causes silent errors. The fix is structured output: forcing the model to return data in a strict, machine-readable shape such as JSON.
Claude supports structured output through tool use. You define a JSON Schema (a formal description of the fields, types, and required properties you expect) and pass it as a tool definition. Claude then fills in that schema instead of writing prose. The result is a JSON object your code can parse and validate without any string manipulation.
Key reasons to enforce schemas in agent chains:
Reliability: downstream agents receive predictable keys and types, not ambiguous text.
Validation: you can reject or retry a response the moment a required field is missing, before bad data propagates.
Observability: structured logs are easier to search, diff, and monitor in production.
Composability: any agent that speaks the same schema can be swapped in or out without rewriting the pipeline glue code.
In Claude Code, the Claude API (the HTTP interface your agent calls programmatically) lets you pass a tools array with one tool whose input_schema defines exactly what you want back. Setting tool_choice to {"type":"tool","name":"your_tool"} forces Claude to call that tool every time, guaranteeing structured output on every request.
Key points
Structured output removes ambiguity between agents
JSON Schema defines the exact fields and types Claude must return
tool_choice forces a specific tool call on every request
Validate the schema immediately to catch errors before they propagate
Resuming and caching a workflow
A multi-agent workflow (a pipeline where several AI sub-agents handle different tasks in sequence) can be expensive to rerun from scratch every time you change one step. The solution is partial resumption: re-running only the steps whose inputs changed, and reusing the outputs of everything else.
Claude Code supports this through two complementary mechanisms. Prompt caching (an Anthropic API feature) stores the token-level computation for a long, stable system prompt or context block so the model skips reprocessing it on the next call. This cuts both latency and cost. Cache hits are billed at roughly 10 percent of the normal input-token rate. The cache is keyed by the exact prefix text, so even a single character change in the cached block invalidates it.
At the workflow level, you control resumption with checkpoints: saved outputs from each agent step written to disk or a store. When you rerun the pipeline, each step checks whether its checkpoint is still valid (inputs unchanged) before calling the model at all. Common patterns include:
Content hash check: hash the step inputs and compare to the hash stored with the checkpoint. Match means skip.
Timestamp check: skip if the checkpoint file is newer than every source file the step reads.
Explicit invalidation: pass a --from step-name flag to your orchestrator to force rerun from a named step onward.
Dependency graph: model which steps depend on which outputs; invalidate only the downstream steps when an upstream output changes.
In Claude Code, you can script this logic in a shell or Node orchestrator that calls claude with the --print flag (non-interactive, prints the response and exits) and writes each output to a file. On the next run, read the file first and skip the claude call entirely if the checkpoint is fresh.
Key points
Prompt caching cuts cost by reusing stable context across API calls
Checkpoints save each step output so only changed steps rerun
Hash or timestamp the inputs to decide whether a checkpoint is still valid
Use --print for non-interactive claude calls inside orchestration scripts
The completeness critic
In a multi-agent pipeline (a chain of AI agents each doing one job), the last bottleneck is rarely wrong content. It is missing content. A completeness critic is a final agent whose only job is to ask: "What should be here that is not?" It reviews the output of all previous agents against the original brief and flags gaps before the result reaches the user.
This agent is deliberately narrow. It does not rewrite, improve tone, or check facts. It only compares the scope of the brief against the scope of the output and returns a structured list of omissions. Keeping it narrow makes it fast, cheap (a Haiku-class model is usually enough), and easy to test.
Common things a completeness critic catches:
A section mentioned in the brief that never appeared in the output
An example promised in the introduction but never written
A constraint (word count, audience, language) that was silently dropped
An action item from a meeting summary that was paraphrased out of existence
The critic feeds its findings back into the pipeline as a structured diff (a machine-readable list of differences). A second pass agent, or the orchestrator itself (the agent that coordinates all other agents), then decides which gaps to fill, which to accept, and which to escalate to the human.
Key points
Completeness critic: agent that finds missing content, not errors
Scope diff: comparing what the brief requested vs. what was delivered
Narrow role keeps the critic fast and testable
Output is a structured list fed back to the orchestrator
Work with me
Master Claude, Claude Code and LLMs, from your first prompt to multi-agent orchestration.
Like this course? I built it end to end. Need a web app, mobile app, AI automation or SEO/GEO? Let us talk.