Home / Multi-agent orchestration

Level: Expert · 14 lessons

Multi-agent orchestration

Fan-out, pipelines, adversarial verification, judge panels. Putting fleets of agents to work.

Open the interactive course237 lessons, quizzes, exercises, a final exam with a diploma, 3 languages, free.

Parallel fan-out vs pipeline

Orchestrating several agents is choosing a topology. The two primitives:

Parallel (fan-out): launch N agents at once on independent tasks, wait for everyone (a barrier), then aggregate. Use it when you need all the results together (global dedup, total count).
Pipeline: each item travels through all the stages independently, with no barrier between stages. Item A can be at stage 3 while B is still at stage 1. The default for multi-stage work: total time is the slowest single chain, not the sum of the slowest per stage.

The classic trap: putting a barrier (parallel) where a pipeline would do, just because the code looks cleaner. A barrier is only justified if stage N needs the complete result of stage N-1 (merge, dedup, early-exit if zero). Otherwise, pipeline.

Concrete application at Pierre's: his multi-language SEO audits (Eskimoz in 4 languages) are fan-outs; one agent per language, aggregated at the end. His model rule applies: Haiku/Sonnet for the mass agents, Opus for the synthesis.

Key points

Parallel/fan-out: N agents at once + barrier, when you want everything together
Pipeline: each item travels the stages with no barrier (default for multi-stage)
Barrier justified only if stage N needs the complete result of N-1
Mass agents in Haiku/Sonnet, synthesis in Opus

Adversarial verification and judge panels

An agent that finds bugs or findings produces plausible-but-false output (hallucination, again). The orchestration fix: have each finding verified by other agents before keeping it.

Quality patterns:

Adversarial verification: for each finding, launch N skeptic agents whose instruction is to refute it. Keep the finding only if the majority fails to refute it. Kills plausible false positives.
Multi-perspective verification: if a finding can fail in several ways, give each verifier a different lens (correctness, security, performance, reproducibility) rather than N identical copies.
Judge panel: generate N independent solutions from different angles, score them with parallel judges, synthesize from the winner while grafting the best ideas of the others.
Loop-until-dry: for a search of unknown size (bugs, edge cases), relaunch finders until K consecutive rounds return nothing new.

Guiding principle: confidence comes from independent perspectives that contradict each other, not from one self-assured agent. It is exactly the spirit of Pierre's reflex "reproduce via Playwright before patching": verify before believing, applied at agent scale.

Key points

Verify each finding with other agents before keeping it
Adversarial: N skeptics refute; keep if the majority fails to refute
Multi-perspective: different lenses; judge panel: N scored solutions
Loop-until-dry for unknown-size searches; verify before believing

Workflows: deterministic orchestration

When orchestration becomes complex (loops, conditions, fan-out, budgets), you move from an improvising agent to a workflow: a script that orchestrates the sub-agents deterministically. The control flow (who runs, when, in parallel or serial) is coded, not decided by the model.

Typical building blocks of a workflow engine:

agent(prompt, schema): launch a sub-agent, get a validated structured output.
pipeline(items, ...stages): pass each item through the stages with no barrier.
parallel(thunks): a barrier, everything at once.
Loops: loop-until-count (accumulate to N), loop-until-dry (until exhaustion), loop-until-budget (until a token quota).

The point versus one big agent: structure (decompose and cover in parallel), confidence (verify before concluding) and scale (migrations or audits a single context could not hold). You stay in the loop: you read each result before deciding the next step. It is the most advanced rung, to reserve for tasks that truly justify it, because it consumes a lot of tokens.

Key points

Workflow = script that orchestrates sub-agents deterministically
Blocks: agent(schema), pipeline, parallel, until-count/dry/budget loops
For: structure, confidence (verify), scale (massive migrations/audits)
Token-heavy: reserve for tasks that justify it

Barriers vs no barrier

In a multi-agent pipeline (a chain of AI agents where each does one job), you must decide at every handoff: does the next step need to wait for all previous results, or can it start as soon as any one result arrives? That decision is called placing a barrier (or not).

A barrier is a synchronization point. No agent downstream of the barrier starts until every agent upstream has finished. This is the right choice when the next step genuinely needs the complete picture before it can act. A no-barrier (also called streaming or fan-in without wait) lets results flow through one by one as they arrive, so downstream work begins immediately.

Ask yourself one question: "Can the next step produce a correct output with only partial data?" If yes, skip the barrier. If no, add one. Getting this wrong in either direction costs you: an unnecessary barrier serializes what could run in parallel, wasting time; a missing barrier corrupts results because downstream agents act on incomplete information.

Use a barrier when you are aggregating scores, joining datasets, writing a final summary, or any operation that is undefined on a subset.
No barrier needed when each result is independently actionable: translating documents, resizing images, sending individual notifications, or streaming partial answers to a user.
Partial barriers are also valid: wait for the first N results (a quorum), then proceed, discarding stragglers.

Key points

A barrier holds all downstream agents until every upstream agent finishes.
Skip the barrier when each result is independently actionable.
Unnecessary barriers serialize parallel work and waste time.
Quorum barriers (wait for N of M) are a valid middle ground.

Loop until dry

Some tasks have an unknown boundary: you do not know how many items exist until you have finished collecting them. Pagination, recursive directory scans, and iterative web crawls all share this shape. The right pattern is a dry-run loop: repeat a search or fetch round, collect new results, and stop only when a round returns nothing new.

In a multi-agent context (where several Claude instances hand work to each other), the orchestrator agent runs the loop and dispatches each batch to worker agents. The orchestrator tracks a seen set, a deduplicated collection of everything already processed, and compares each new round against it. When the set stops growing, the loop exits.

Claude Code supports this pattern through chained shell commands and subagent calls. A minimal loop in a Claude Code task looks like this:

Run a search or API call and capture the output.
Diff the output against the seen set.
If the diff is non-empty, add new items to the seen set, dispatch work, then go to step 1.
If the diff is empty, stop and report.

Two safeguards are mandatory: a max-rounds cap (for example, 50 iterations) to prevent infinite loops caused by bugs or API quirks, and idempotent workers (workers that produce the same result if they accidentally process the same item twice). Without these, a dry-run loop can run forever or corrupt results.

Key points

Dry-run loop: repeat until a round returns nothing new
Seen set: deduplicated record of already-processed items
Orchestrator dispatches; workers are idempotent
Always cap max rounds to prevent infinite loops

Worktrees for parallel agents

When you run multiple Claude Code agents at once, they all operate on the same repository files by default. If two agents edit the same file simultaneously, one will overwrite the other. Git worktrees solve this: a worktree is an additional working directory linked to the same repository, checked out at its own branch, so each agent gets isolated files with no overlap.

You create a worktree with git worktree add. Each worktree has its own branch and its own copy of the working files on disk. The agents run in separate directories and never touch each other's files. When their work is done, you merge the branches normally.

Claude Code supports this pattern directly. The /worktrees command (and the --worktree flag when launching a sub-agent) tells an agent which worktree path to operate in. The orchestrator agent creates the worktrees, assigns one to each sub-agent, then waits for all to finish before merging.

No file collisions: each agent writes only to its own directory.
No branch conflicts: each worktree is on its own branch.
Clean merge point: the orchestrator merges all branches after the agents report completion.
Easy cleanup: git worktree remove deletes the directory and deregisters it.

Key points

git worktree add creates an isolated working directory on a separate branch
each parallel agent is pointed at one worktree so files never collide
the orchestrator merges branches after all agents finish
git worktree remove cleans up when done

Dispatching parallel agents

When a job can be split into independent pieces, running those pieces one after another wastes time. Fan-out means launching several agents (or sub-processes) at the same moment, each handling a separate slice of the work, then collecting all results when they finish. Claude Code supports this pattern through the Agent tool, which lets one orchestrating agent spawn child agents.

The key rule is independence: tasks you fan out must not depend on each other's output. If task B needs task A to finish first, those two must stay sequential. Good candidates for fan-out include: auditing separate files, translating the same content into several languages, running the same prompt against different data sets, or fetching several URLs in parallel.

A typical fan-out workflow has three stages:

Decompose: the orchestrator breaks the goal into N independent sub-tasks.
Dispatch: it calls the Agent tool N times, one call per sub-task, without waiting between calls.
Collect: once all agents return, the orchestrator merges or summarises the results.

In Claude Code you can also fan out at the shell level using --print (non-interactive mode) and background processes, then join the outputs. This works well for simple tasks where you control the shell environment directly.

Key points

Fan-out: launching independent sub-tasks simultaneously instead of sequentially.
Orchestrator: the parent agent that dispatches and later collects child agents.
Independence check: fan-out only works when sub-tasks share no dependencies.
Collect phase: merging or summarising all agent outputs after they complete.

Scaling fan-out to a budget

A fan-out is when an orchestrator (the coordinating agent) spawns multiple sub-agents in parallel to tackle different parts of a problem at once. Each sub-agent consumes tokens, so the total cost of a fan-out run equals the sum of every agent's input and output tokens. Without a plan, costs compound fast.

The first lever is model selection per task. Not every sub-agent needs the most capable model. Assign claude-opus-4-8 only to tasks that require deep reasoning, such as architectural decisions or ambiguous analysis. Use claude-sonnet-4-6 for mid-complexity work like code generation, and claude-haiku-4-5 for high-volume, simple tasks like classification, formatting, or extraction. This alone can reduce a run's cost by 80 percent or more.

The second lever is context trimming. Each agent's input is billed in full. Pass only the slice of context that agent actually needs: a relevant file, a short summary, or a structured object rather than the entire conversation history. Prompt caching (reusing a shared prefix across agents) further cuts repeated-context charges when multiple agents share a large system prompt or reference document.

Practical budget controls to apply before launching a fleet:

Set max_tokens per agent to the minimum needed for that task type.
Cap the number of parallel agents: more concurrency raises cost without always raising quality.
Add a dry-run estimate step: count tokens in planned inputs before committing to a full run.
Use early termination: if an intermediate result already meets the success criterion, cancel remaining agents.
Log token usage per agent call and set a hard ceiling in the orchestrator loop.

Key points

Assign models by task complexity, not by habit
Trim each agent's context to only what it needs
Cap max_tokens and agent count before launching
Use prompt caching for shared prefixes across agents

Schemas for clean agent data

In a multi-agent pipeline (a chain of AI models passing results to each other), one agent's output becomes the next agent's input. If that output is free-form text, the receiving agent must guess at the structure, which causes silent errors. The fix is structured output: forcing the model to return data in a strict, machine-readable shape such as JSON.

Claude supports structured output through tool use. You define a JSON Schema (a formal description of the fields, types, and required properties you expect) and pass it as a tool definition. Claude then fills in that schema instead of writing prose. The result is a JSON object your code can parse and validate without any string manipulation.

Key reasons to enforce schemas in agent chains:

Reliability: downstream agents receive predictable keys and types, not ambiguous text.
Validation: you can reject or retry a response the moment a required field is missing, before bad data propagates.
Observability: structured logs are easier to search, diff, and monitor in production.
Composability: any agent that speaks the same schema can be swapped in or out without rewriting the pipeline glue code.

In Claude Code, the Claude API (the HTTP interface your agent calls programmatically) lets you pass a tools array with one tool whose input_schema defines exactly what you want back. Setting tool_choice to {"type":"tool","name":"your_tool"} forces Claude to call that tool every time, guaranteeing structured output on every request.

Key points

Structured output removes ambiguity between agents
JSON Schema defines the exact fields and types Claude must return
tool_choice forces a specific tool call on every request
Validate the schema immediately to catch errors before they propagate

Resuming and caching a workflow

A multi-agent workflow (a pipeline where several AI sub-agents handle different tasks in sequence) can be expensive to rerun from scratch every time you change one step. The solution is partial resumption: re-running only the steps whose inputs changed, and reusing the outputs of everything else.

Claude Code supports this through two complementary mechanisms. Prompt caching (an Anthropic API feature) stores the token-level computation for a long, stable system prompt or context block so the model skips reprocessing it on the next call. This cuts both latency and cost. Cache hits are billed at roughly 10 percent of the normal input-token rate. The cache is keyed by the exact prefix text, so even a single character change in the cached block invalidates it.

At the workflow level, you control resumption with checkpoints: saved outputs from each agent step written to disk or a store. When you rerun the pipeline, each step checks whether its checkpoint is still valid (inputs unchanged) before calling the model at all. Common patterns include:

Content hash check: hash the step inputs and compare to the hash stored with the checkpoint. Match means skip.
Timestamp check: skip if the checkpoint file is newer than every source file the step reads.
Explicit invalidation: pass a --from step-name flag to your orchestrator to force rerun from a named step onward.
Dependency graph: model which steps depend on which outputs; invalidate only the downstream steps when an upstream output changes.

In Claude Code, you can script this logic in a shell or Node orchestrator that calls claude with the --print flag (non-interactive, prints the response and exits) and writes each output to a file. On the next run, read the file first and skip the claude call entirely if the checkpoint is fresh.

Key points

Prompt caching cuts cost by reusing stable context across API calls
Checkpoints save each step output so only changed steps rerun
Hash or timestamp the inputs to decide whether a checkpoint is still valid
Use --print for non-interactive claude calls inside orchestration scripts

The completeness critic

In a multi-agent pipeline (a chain of AI agents each doing one job), the last bottleneck is rarely wrong content. It is missing content. A completeness critic is a final agent whose only job is to ask: "What should be here that is not?" It reviews the output of all previous agents against the original brief and flags gaps before the result reaches the user.

This agent is deliberately narrow. It does not rewrite, improve tone, or check facts. It only compares the scope of the brief against the scope of the output and returns a structured list of omissions. Keeping it narrow makes it fast, cheap (a Haiku-class model is usually enough), and easy to test.

Common things a completeness critic catches:

A section mentioned in the brief that never appeared in the output
An example promised in the introduction but never written
A constraint (word count, audience, language) that was silently dropped
An action item from a meeting summary that was paraphrased out of existence

The critic feeds its findings back into the pipeline as a structured diff (a machine-readable list of differences). A second pass agent, or the orchestrator itself (the agent that coordinates all other agents), then decides which gaps to fill, which to accept, and which to escalate to the human.

Key points

Completeness critic: agent that finds missing content, not errors
Scope diff: comparing what the brief requested vs. what was delivered
Narrow role keeps the critic fast and testable
Output is a structured list fed back to the orchestrator

Subagents now run in the background

As of Claude Code v2.1.198 (released July 1, 2026), subagents (helper Claude instances you delegate a sub-task to, such as "run the test suite and report back") run in the background by default. The main conversation keeps working while the subagent runs, and you get a notification when it finishes. That notification shows up in the claude agents view, a panel listing every subagent you have running or completed, so you can check status without interrupting your own work.

This is a real shift from the old mental model. Before this release, spawning a subagent blocked the main thread: you asked Claude to delegate work, and the whole session waited until that subagent returned a result before you could type anything else. That mental model is now obsolete. Since July 1, 2026, delegation is fire-and-continue by default: you can hand off a task to a subagent and immediately keep chatting, editing files, or launching a second subagent, while the first one runs in parallel.

The same release removed the /agents creation wizard, the old interactive command that walked you through creating a subagent step by step. As of July 1, 2026, there are two ways left to create a subagent: ask Claude in plain language to create one for you (for example, "create a subagent that reviews pull requests for security issues"), or edit the definition files directly under .claude/agents/ in your project. There is no more guided wizard between those two paths.

Agent teams (a feature for coordinating multiple subagents on a shared task) were overhauled earlier, on June 15, 2026 in v2.1.178. The TeamCreate and TeamDelete tools, which used to let you spin up or tear down a named team explicitly, were removed. Instead, every session now has one implicit team automatically: there is nothing to create or name. That team model added teammate plan approval (a step where a teammate's proposed plan must be approved before it executes) and team lifecycle hooks (scripts that fire automatically at points like team start or team end, letting you log or gate what happens).

Dynamic workflows (Claude Code's mechanism for automatically deciding how many subagents to spin up and how to sequence them for a given task) gained two user-facing controls after that. First, a "Dynamic workflow size" setting appeared in /config as of v2.1.202 on July 6, 2026, letting you cap or tune how large these automatic workflows are allowed to grow. Second, the same line of releases added OpenTelemetry attributes workflow.run_id and workflow.name. OpenTelemetry is a standard format for exporting traces and metrics that observability tools (like Grafana or Datadog) can ingest; with these two attributes attached, an orchestration run (one execution of a multi-subagent workflow) becomes traceable as a named, identifiable unit inside whatever standard observability stack your team already uses.

Practically, this changes three habits. First, lean into fire-and-continue delegation: for independent sub-tasks (a lint pass, a doc update, a background research query), hand them to a subagent and keep working rather than waiting idle. Second, still wait synchronously when the next step genuinely depends on the subagent's result, for example if you cannot write the summary until the research subagent returns its findings; background-by-default does not mean every step should run unattended. Third, adjust your review loop: instead of watching a blocking spinner, you now periodically check the claude agents view for completion notifications, which means reviewing finished subagent work becomes a distinct, deliberate step rather than something forced on you the moment a task finishes.

Key points

Since Claude Code v2.1.198 (July 1, 2026), subagents run in the background by default and notify you on completion in the claude agents view.
The /agents wizard was removed in the same release; create subagents by asking Claude or by editing files in .claude/agents/.
Agent teams (v2.1.178, June 15, 2026) dropped TeamCreate/TeamDelete for one implicit team per session, with plan approval and lifecycle hooks.
Dynamic workflow size (v2.1.202, July 6, 2026) plus OpenTelemetry attributes workflow.run_id and workflow.name make orchestration runs configurable and traceable.

Mass fan-out that actually finishes

A fan-out (spawning many agents at once to work on independent pieces of a job) sounds simple until you try it at scale. This course you are reading right now was built by a 166-agent fan-out, one agent per lesson, and getting that to actually finish (all 166 files landing on disk, correct and complete) required four hard-won rules. Break any one of them and the run either stalls, silently drops work, or burns money for no gain.

Rule 1: never ask one agent to produce an entire large deliverable in a single response. This is the anti-monolith rule. If you tell one agent "write all 166 lessons and return them to me," it will hit the per-message output limit (the maximum amount of text a model can generate in one reply) partway through, the response gets cut off, nothing gets written to disk, and the whole task looks impossible even though each individual lesson was easy. The fix is to fractionate by axis: one agent per language, one agent per file, one agent per lesson. Each agent writes its own file to disk using its own Write tool call, and the parent orchestrator only collects a tiny status line back ("done, p9l13.js, 3200 bytes") instead of the full content. The parent's context window never has to hold 166 lessons at once, only 166 one-line receipts.

Rule 2: know the concurrency ceiling before you plan the run. Orchestrators do not run unlimited agents in parallel. As of July 2026, Claude Code workflows run at most about 16 agents simultaneously, queuing the rest to start as slots free up, with a lifetime cap of 1,000 agents per workflow. That means a 166-agent job runs in roughly 10-11 waves of 16, not as one instant burst. Beyond roughly 80 agents launched in a single burst, server-side throttling (the infrastructure slowing or rejecting requests to protect itself) shows up even when using cheaper models like Sonnet or Haiku, not just the expensive ones. The practical move is to chunk big jobs into deliberate waves (for example, batches of 15-20) rather than firing everything at once and hoping the scheduler sorts it out.

Rule 3: verify coverage by artifact, not by agent claims. Agents miscount, especially under load: one may claim success while writing to the wrong path, skipping a file, or silently truncating. Never trust the stream of "done!" messages as proof the job is complete. Instead, after the wave finishes, diff the set of expected output keys or filenames (the list you planned before launch) against what actually exists on disk. For a 166-lesson run that means listing the expected 166 filenames and comparing against a directory listing. In practice this usually surfaces 1-2 gaps, not dozens, so the fix is to fill those by hand or with one targeted retry agent rather than relaunching the entire 166-agent job.

Rule 4: switch to the Batch API once a job is embarrassingly parallel and exceeds about 50 calls. "Embarrassingly parallel" means each unit of work is fully independent of the others, no shared state, no ordering requirement. The Batch API (Anthropic's asynchronous bulk-processing endpoint) runs on its own rate-limit pool, completely separate from your normal interactive quota, at a 50 percent discount versus standard pricing. The tradeoff is that results come back asynchronously (you submit the whole batch, then poll or wait for completion) rather than streaming back turn by turn. For a one-off 166-lesson run interactive fan-out was the right call because iteration speed mattered more than cost; for a recurring job of 50+ independent calls with no urgency, Batch API is the better default.

Two more habits pay off on every large run. First, route mechanical stages (formatting, extraction, simple rewrites, straightforward file generation) to cheaper models such as Sonnet or Haiku, and reserve the most expensive model for the stages that require judgment: reviewing quality, resolving conflicts, synthesizing final output. Second, always declare any silent caps out loud: if you only sampled the top 20 results, or processed the first 50 files and stopped, say so explicitly in the output. Nobody downstream should mistake a partial run for a complete one just because the summary reads confidently.

Key points

Never make one agent write an entire large deliverable in one response; fractionate by axis and have each agent write its own file to disk.
Orchestrators cap concurrency (about 16 agents at once, 1,000 lifetime per workflow as of July 2026); chunk bursts past roughly 80 agents into waves.
Verify by diffing expected files against what's actually on disk, not by trusting agent status claims; patch the 1-2 gaps by hand instead of relaunching everything.
Past about 50 independent API calls, consider the Batch API: separate rate-limit pool, 50 percent discount, asynchronous results.

Re-audit persisted workflows before relaunch

A workflow script that ran perfectly last week can quietly cost you five times more today, without a single line having changed. This lesson covers a failure mode specific to persisted orchestration: saved workflow scripts, reusable pipelines, any automation file that outlives the session that wrote it.

The mechanism is inheritance. In a workflow script, an agent() call without an explicit model option inherits the model of the session that runs it, not the model of the session that wrote it. Write the script during a session on a mid-tier model and the mechanical stages implicitly run cheap. Relaunch the same file next week from a session on the top-tier model, and every one of those stages now runs on the most expensive brain available: silently, with identical results on the easy work, at several times the cost and latency. Nothing errors. Nothing warns. The bill is the only symptom.

The author hit exactly this in July 2026: a persisted pipeline from a previous session, relaunched under a flagship-model session, sent a fleet of mechanical extraction agents to the flagship model because the script's early stages carried no explicit routing. The fix took two minutes; noticing it took longer. Hence the rule that generalizes:

Any persisted script gets re-audited before relaunch, even if it already ran. The checklist is short:

Every mechanical stage carries an explicit model (and a low effort where supported). Extraction, classification, formatting, translation: smallest model that passes.
Judge and verification stages carry an explicitly chosen strong model, upgraded in capability rather than multiplied in count.
Nothing relies on inheritance for anything that matters: model, effort, output paths, caps. Inherited defaults are context, and context changes between sessions.
Stated caps still match reality: a top-N bound or a skip-retry policy that made sense at authoring time may silently truncate today's larger inputs.

The deeper principle reaches beyond model routing: a persisted automation is a snapshot of assumptions. The session model, the quota landscape, the directory layout, the API's rate limits, even which files exist: all of it was true when the script was written and none of it is guaranteed today. Version-pin what must not drift (explicit options), and re-verify at relaunch what cannot be pinned (the environment). Treat a dusty workflow like a dusty deployment script: you would not point last month's deploy at today's production without reading it first.

A practical convention that makes the audit near-free: keep a header comment in every persisted workflow stating its routing table (which stages run on which model and why) and the date it was last audited. The relaunch check then takes thirty seconds of diffing the header against the code, instead of ten minutes of re-deriving intent from the pipeline structure.

Key points

agent() without an explicit model inherits the RUNNING session's model, not the authoring session's: a persisted script relaunched under a bigger model silently upgrades every unrouted stage
Re-audit every persisted workflow before relaunch, even if it already ran: explicit model + effort on mechanical stages, chosen strong model on judge stages
A persisted automation is a snapshot of assumptions (model, quotas, paths, caps): pin what must not drift, re-verify what cannot be pinned
Keep a routing-table header comment with a last-audited date in every persisted workflow: turns the audit into a 30-second diff

Work with me

Need this level of execution on your project?

I am Pierre Bottazzi. I built this entire course solo, end to end: 237 lessons in 3 languages, the app, the design, the SEO, the accounts system. That is what I do for clients too: web apps, mobile apps, AI automation, SEO/GEO. First call is free, no strings attached.

Contact me on LinkedIn See sept-tools.com (industry)See totemsauvage.com (art gallery)

Inspiration

Inspired by 0xloucash

One of my inspirations. Loucash (0xloucash) has a gift for always digging up the sharpest AI tips and tricks, then turning them into setups that actually work. With InstallClaw he configures your own OpenClaw AI agent, at your place, in 48 hours.

His Instagram InstallClaw