The Claude Agent SDK

The Claude Agent SDK (Software Development Kit) is the official Anthropic library that lets you build programs where Claude acts autonomously: it reads inputs, decides which tools to call, executes them, reads the results, and loops until the task is done. This is different from a single chat message because the model drives a multi-step workflow, not just a single reply.

The SDK is published as @anthropic-ai/claude-agent-sdk for JavaScript/TypeScript, with a Python package (claude-agent-sdk) that mirrors the same interface. At its core it exposes a streaming function called query() that sends a prompt to Claude, streams back every assistant turn and every tool use block (each discrete action Claude decides to take), and lets your code react to each event in real time.

What you can build covers a wide range:

Headless coding agents: run Claude Code non-interactively inside a CI/CD pipeline (Continuous Integration / Continuous Delivery, an automated build-and-test system) to write, fix, or review code on every commit.
Custom IDE integrations: embed Claude inside any editor by piping its output into your own UI instead of the terminal.
Multi-agent pipelines: one Claude instance acts as an orchestrator that spawns sub-agents, each with narrower permissions and a specific subtask.
Approval workflows: intercept every tool call before execution so a human (or another model) can approve, reject, or modify it.

The SDK enforces the same permission model as interactive Claude Code: you declare which tools are allowed (file reads, shell commands, MCP servers) and the agent cannot exceed that scope. This makes it much safer to run in automated environments where no human is watching.
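The allow-list and approval ideas above can be sketched in a few lines. This is a minimal illustration, not the SDK's API: ToolCall, gate, the tool names, and the reviewer policy are all hypothetical, chosen only to show the order of checks (scope first, then approval).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation the agent wants to make (hypothetical shape)."""
    tool: str
    args: dict


def gate(call: ToolCall, allowed: set[str],
         approve: Callable[[ToolCall], bool]) -> str:
    """Return 'deny', 'reject', or 'run' for a proposed tool call."""
    if call.tool not in allowed:
        return "deny"    # outside the declared scope: never runs
    if not approve(call):
        return "reject"  # in scope, but the reviewer said no
    return "run"


# Hypothetical declared scope and reviewer policy.
allowed_tools = {"read_file", "run_tests"}


def approve_reads_only(call: ToolCall) -> bool:
    """Stand-in for a human or model reviewer: approve reads only."""
    return call.tool == "read_file"
```

The reviewer is passed in as a plain function so the same gate can be backed by a human prompt, a policy file, or another model.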

Key points

The SDK lets code control Claude as an autonomous agent, not just a chatbot.
query() streams assistant turns and tool-use blocks your program can intercept.
Headless mode runs Claude Code without a terminal UI, ideal for CI pipelines.
Permission declarations keep the agent inside a safe, declared scope.

Anatomy of an agent loop

An agent loop is the repeating cycle that lets an AI model do multi-step work autonomously. Instead of answering once and stopping, the model runs through four phases over and over until the task is done or it decides to stop.

The four phases are:

Perceive: the model reads its current context (your goal, previous results, tool outputs, memory).
Decide: it chooses the next action (call a tool, write a file, ask a clarifying question, or declare done).
Act: it executes that action, for example running a shell command or editing a file.
Observe: it reads the result (stdout, error message, file diff) and adds it to context, then loops back to Perceive.

Claude Code (the CLI and IDE coding agent) is built around this loop. Each iteration is called a turn. The loop ends when the model emits a final text reply instead of another tool call, or when a stop condition you set is met (token budget, max turns, or an explicit exit).
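The loop and its two exit conditions can be sketched with a scripted stand-in for the model. Nothing here is Claude Code's real implementation: run_agent, scripted_model, and the add tool are hypothetical names, and the model is a plain function so the example runs without network access.

```python
from typing import Callable


def run_agent(goal: str, model: Callable[[list], tuple],
              tools: dict, max_turns: int = 5) -> str:
    """Run perceive/decide/act/observe until a final reply or max_turns."""
    context = [("goal", goal)]
    for _turn in range(max_turns):
        decision = model(context)          # Perceive + Decide
        if decision[0] == "final":
            return decision[1]             # final text reply ends the loop
        _, name, args = decision
        result = tools[name](**args)       # Act
        context.append(("observation", name, result))  # Observe
    return "stopped: max turns reached"    # stop condition you set


def scripted_model(context: list) -> tuple:
    """Stand-in for Claude: request one tool call, then answer."""
    observations = [c for c in context if c[0] == "observation"]
    if not observations:
        return ("tool", "add", {"a": 2, "b": 3})
    return ("final", f"The sum is {observations[-1][2]}")


# Hypothetical tool registry.
tools = {"add": lambda a, b: a + b}
```

Calling run_agent("add 2 and 3", scripted_model, tools) takes two turns: one tool call, then the final reply. A model that never stops asking for tools hits the max_turns cap instead.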

Understanding the loop matters because nearly every failure you will debug in an agent, whether it spins forever, gives up too early, or misses a step, traces back to one of these four phases going wrong.

Key points

Agent loop: perceive, decide, act, observe, repeat
Each iteration is called a turn
The loop ends on a final text reply or a stop condition
Bugs trace to one broken phase in the loop

Defining tools an agent can trust

A tool is a function you expose to an agent so it can take actions beyond generating text: querying a database, calling an API, reading a file. The agent decides when to call each tool based entirely on the information you give it in the tool definition. If that definition is vague, the agent guesses and often guesses wrong.

Every tool definition has three parts the agent reads before deciding whether to use it:

Name: short, unambiguous, in snake_case (for example get_order_status rather than tool1 or fetch). The name alone should hint at what the tool does.
Description: one or two sentences explaining what the tool does, when to use it, and when NOT to use it. This is the most important field. The model reads it as reasoning context, not just documentation.
Schema: a JSON Schema object listing every parameter, its type, whether it is required, and a short description of each field. Ambiguous or missing parameter descriptions cause silent hallucinations where the agent fills gaps with plausible-looking values.

A well-defined tool is self-contained: another developer (or another model) should be able to read the definition alone and know exactly when and how to call it. Treat the description as a contract, not a comment.
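As an illustration of the three parts, here is one possible definition of the get_order_status tool named above, written as a Python dict. The key names (name, description, input_schema) follow the shape used by Anthropic's Messages API; the parameters and the lint_tool helper are assumptions made up for this sketch.

```python
get_order_status = {
    "name": "get_order_status",
    "description": (
        "Look up the current fulfilment status of a single order by its ID. "
        "Use when the user asks where an order is. "
        "Do NOT use to list orders or to change an order."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {  # hypothetical parameter
                "type": "string",
                "description": "Order identifier exactly as shown on the receipt.",
            },
            "include_history": {  # hypothetical parameter
                "type": "boolean",
                "description": "If true, also return past status changes.",
            },
        },
        "required": ["order_id"],
    },
}


def lint_tool(tool: dict) -> list[str]:
    """Return problems that would force an agent to guess."""
    problems = []
    if not tool.get("description"):
        problems.append("missing tool description")
    schema = tool.get("input_schema", {})
    props = schema.get("properties", {})
    for name, spec in props.items():
        if not spec.get("description"):
            problems.append(f"parameter '{name}' has no description")
    for name in schema.get("required", []):
        if name not in props:
            problems.append(f"required parameter '{name}' is not declared")
    return problems
```

Running a check like lint_tool over every definition before shipping is one cheap way to catch the vague cases described above.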

Key points

Tool name must be unique and self-explanatory
The description tells the agent when to call and when to skip
JSON Schema parameters need their own descriptions, not just types
Ambiguous definitions cause the agent to hallucinate parameter values

Headless automation in scripts

Headless mode means running Claude Code without any interactive prompt: no keyboard, no terminal waiting for you. You pipe input in, Claude processes it, and your script reads the output. This is how you embed Claude inside CI pipelines, cron jobs, or any automated workflow.

The key flag is --print (short: -p), which tells Claude Code to print the final answer and exit immediately. Combine it with --output-format json to get structured output your script can parse reliably. Use --model to pin a specific model id so your pipeline never silently upgrades.

A few flags matter in automation:

--print: non-interactive, print result and exit.
--output-format json: wrap the response in a JSON envelope with result, cost, and session_id fields.
--model claude-sonnet-4-6: pin the model (options: claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5).
--max-turns N: cap the number of agentic turns so a runaway loop cannot spin forever.
--no-ansi: strip color codes so log files stay clean.

Stdin works too: pipe a file or a generated prompt directly into claude. The process exits with code 0 on success and a non-zero code on error, so your shell or Node script can branch on failure the normal way.
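A script wrapping these flags might look like the sketch below. It only builds the argument list and parses the envelope; the real invocation is left as a comment because it assumes the claude CLI is installed and authenticated. The sample output is made up, and only the result and session_id fields mentioned above are read.

```python
import json


def build_command(model: str, max_turns: int) -> list[str]:
    """Build argv for a non-interactive run; the prompt is piped via stdin."""
    return [
        "claude", "--print",
        "--output-format", "json",
        "--model", model,
        "--max-turns", str(max_turns),
    ]


def parse_envelope(stdout: str) -> tuple[str, str]:
    """Extract (result, session_id) from the JSON envelope."""
    envelope = json.loads(stdout)
    return envelope["result"], envelope["session_id"]


# To run for real (assumption: `claude` is on PATH and authenticated):
#   import subprocess
#   proc = subprocess.run(build_command("claude-sonnet-4-6", 3),
#                         input=prompt, capture_output=True, text=True)
#   if proc.returncode != 0:          # non-zero exit code means failure
#       raise SystemExit(proc.stderr)
#   result, session_id = parse_envelope(proc.stdout)

# Hypothetical envelope, for illustration only.
sample_stdout = '{"result": "2 files need fixes", "session_id": "abc123"}'
```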

Key points

--print runs non-interactively and exits after printing the final result
--output-format json for machine-readable output
Pin model with --model to avoid silent upgrades
Exit code signals success or failure to the shell

Claude in GitHub Actions

GitHub Actions is a CI/CD platform (Continuous Integration / Continuous Delivery) built into GitHub. Every push, pull request, or scheduled trigger can run a workflow, a YAML file that executes steps inside a container. Claude can be one of those steps, turning a human-level review into an automated gate that runs on every pull request without waiting for a teammate.

The official entry point is claude-code-action, an open-source GitHub Action published by Anthropic. You add it to your workflow YAML, pass your ANTHROPIC_API_KEY as a secret, and the action spins up Claude Code in a headless container. Claude reads the diff, your repo files, and any instructions you provide, then posts its findings as a PR comment or sets a failing check status.

Common automation patterns in CI include:

PR review on every open or update: Claude reads the diff and posts a structured code-review comment.
Security scan gate: Claude checks for hardcoded secrets or obvious injection risks and fails the check if found.
Auto-labelling: Claude reads the diff and applies GitHub labels such as bug, feature, or docs.
Release notes draft: on a push to main, Claude summarises all merged PR titles into a changelog entry.

Model choice matters for cost and speed. The claude-haiku-4-5 model (the fastest, cheapest tier) handles labelling and short summaries well. claude-sonnet-4-6 is the recommended default for full PR reviews. claude-opus-4-8 is best reserved for deep security audits where precision outweighs the higher cost per token.
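One way to keep that choice explicit is a small routing table that a workflow script consults before launching a step. The task names and the pick_model helper are hypothetical; the model ids are the ones listed above.

```python
# Task names are hypothetical; model ids are taken from the text above.
MODEL_BY_TASK = {
    "label": "claude-haiku-4-5",
    "summary": "claude-haiku-4-5",
    "review": "claude-sonnet-4-6",
    "security_audit": "claude-opus-4-8",
}
DEFAULT_MODEL = "claude-sonnet-4-6"


def pick_model(task: str) -> str:
    """Return the model id for a task, falling back to the default."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)
```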

Key points

claude-code-action runs Claude Code headless inside a GitHub Actions container
Pass ANTHROPIC_API_KEY as an encrypted repository secret, never hardcode it
Match model to task: Haiku for labelling, Sonnet for reviews, Opus for deep audits
Claude posts results as PR comments or sets a check status to block merges

Scheduled cloud agents

The /schedule skill lets you create cloud agents (also called routines) that run Claude Code on a recurring schedule without you being present. The agent executes in the cloud, reads your repo or files, performs work, and can commit results or send notifications, all on a cron schedule (a time-based trigger format; for example, "0 9 * * *" means every day at 9 am).
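To make the cron format concrete, here is a deliberately simplified matcher for five-field expressions (minute, hour, day, month, weekday). It is a sketch for reading schedules, not how /schedule works internally: it supports only * and comma-separated numbers, and it requires every field to match, whereas real cron also accepts ranges and steps and treats the day and weekday fields specially when both are restricted.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*' or comma-separated numbers only."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` falls on 'minute hour day month weekday'."""
    minute, hour, day, month, weekday = expr.split()
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        # cron counts Sunday as 0; isoweekday() counts it as 7
        and field_matches(weekday, when.isoweekday() % 7)
    )


DAILY_9AM = "0 9 * * *"
WEEKLY_MONDAY_6AM = "0 6 * * 1"
HOURLY = "0 * * * *"
```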

You invoke the skill in Claude Code by typing /schedule followed by a plain-English description of what you want done and when. Claude translates that into a scheduled routine stored in your account. You can also use it for one-time future runs ("run this once at 3 pm") without any recurring pattern.

Common use cases for scheduled cloud agents include:

Daily SEO or performance audits that commit a report to your repo each morning
Weekly dependency checks that open a pull request when updates are found
Hourly monitoring of an API or service that alerts you on anomalies
Nightly data pipeline jobs that process logs and update a dashboard file

You manage your routines with the same skill: list them to see what is scheduled, update a routine to change its timing or instructions, or delete one when it is no longer needed. Each routine runs as a full Claude Code agent session, so it has access to tools, can read and write files, and can call external services within your configured permissions.

Key points

/schedule skill creates recurring cloud agent runs
Cron syntax defines the timing (e.g. daily, weekly, hourly)
Agents run headlessly: no human in the loop during execution
Use list, update, and delete to manage existing routines

The /loop pattern

The /loop skill in Claude Code lets you run a prompt (or another slash command) repeatedly, either on a fixed time interval or at a pace the model itself decides between iterations. Think of it as a cron job (a scheduled repeating task) built directly into your Claude session, with no external scheduler required.

To start a loop, type /loop followed by an optional interval and a prompt or command. If you omit the interval, Claude self-paces: it finishes one run, decides how long to wait based on context, then fires again. This is useful for monitoring tasks where the right cadence depends on what the model finds.

Common use cases for /loop:

Polling a deploy log every 2 minutes until a keyword appears
Running a linter or test suite on a recurring interval while you edit
Checking an API endpoint for a status change without writing a shell script
Executing a custom slash command (like /babysit-prs) on a schedule

To stop a running loop, use /stop or press Ctrl+C. Each iteration is a normal Claude Code turn, so the model has full access to tools (file reads, shell commands, web fetch) on every cycle.
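For comparison, the fixed-interval polling case looks like this when written by hand. It is a plain-Python sketch of the pattern, not the /loop implementation; poll_until and the fake deploy log are hypothetical, and the sleep function is injectable so the example can run without actually waiting.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], str], keyword: str, interval_s: float,
               max_iterations: int,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `check` until its output contains `keyword`.

    Returns the 1-based iteration on which the keyword appeared,
    or -1 if it never appeared within max_iterations.
    """
    for iteration in range(1, max_iterations + 1):
        if keyword in check():
            return iteration
        if iteration < max_iterations:
            sleep(interval_s)  # wait before the next cycle
    return -1


# Hypothetical deploy log that reports success on the third read.
_log_lines = iter(["building...", "uploading...", "deploy complete"])


def read_deploy_log() -> str:
    return next(_log_lines)
```

The max_iterations cap plays the same role as a stop condition in an agent loop: without it, a keyword that never appears would keep the poller running forever.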

Key points

/loop runs a prompt or slash command repeatedly inside a Claude Code session
Specify an interval (e.g. 5m) or omit it for self-paced execution
Each iteration is a full agent turn with access to all tools
Use /stop or Ctrl+C to cancel the loop

Agents in isolated worktrees

A git worktree is a second (or third, or fourth) checked-out copy of the same repository, living in its own directory on disk, sharing the underlying git object store. Each worktree has its own working files and its own branch, so changes in one worktree cannot touch another until you explicitly merge.

When you run multiple Claude Code agents in parallel, letting them all mutate the same files on the same branch is a recipe for conflicts. The safe pattern is one agent, one worktree, one branch. The agents work in complete isolation; you review and merge when they finish.

Claude Code ships three built-in slash commands for this workflow, and plain git works as well:

/worktree create <name> creates a new worktree and a matching branch, then drops the agent into it.
/worktree list shows every active worktree and which agent (if any) is running inside it.
/worktree remove <name> deletes the worktree directory after the agent finishes.
Standard git also works: git worktree add ../repo-feat-a feat/a creates a worktree manually if you prefer.

The payoff is safe parallel mutation: three agents can refactor three different modules at the same time, each on its own branch, with zero risk of one agent overwriting another's in-progress edits. You collect their pull requests and merge in sequence.
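The one-agent, one-worktree, one-branch rule can be scripted with standard git. The sketch below only builds the git worktree add commands, using -b to create each branch; the feat/ prefix and sibling-directory layout mirror the manual example above, and worktree_commands is a hypothetical helper name.

```python
def worktree_commands(repo_name: str, tasks: list[str]) -> list[list[str]]:
    """Build one `git worktree add` argv per task.

    Each task gets its own new branch and its own sibling directory,
    so no two agents share working files.
    """
    commands = []
    for task in tasks:
        branch = f"feat/{task}"
        path = f"../{repo_name}-{task}"
        commands.append(["git", "worktree", "add", "-b", branch, path])
    return commands


# To execute from inside the repository (assumption: git is installed):
#   import subprocess
#   for cmd in worktree_commands("repo", ["auth", "billing"]):
#       subprocess.run(cmd, check=True)
```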

Key points

One worktree per agent, one branch per worktree
/worktree create launches an isolated agent environment
Agents share the git object store but not the working tree
Merge branches sequentially after agents finish

Production MCP and remote tools

The Model Context Protocol (MCP) is an open standard that lets an AI agent call external tools, read resources, and receive structured data, all through a common interface. Instead of hardcoding API calls inside a prompt, you expose them as MCP tool definitions that any compliant agent can discover and invoke at runtime.

A production MCP setup separates concerns clearly. The MCP server owns the connection to your backend (database, REST API, file system). The MCP client inside Claude Code reads the server manifest, knows which tools exist and what arguments they expect, and decides when to call them. Claude never sees your credentials directly; the server handles auth and returns only the data the agent needs.

Registering an MCP server in Claude Code requires a single entry in your project or global settings. A local server runs as a child process and communicates over stdio (standard input/output); a remote server is reached over SSE (Server-Sent Events, an HTTP streaming transport). Key reliability practices include:

Schema validation: every tool must declare its JSON Schema so the agent can form correct calls and reject bad responses early.
Idempotent tools: write operations should be safe to retry, because the agent may call a tool again after a timeout.
Scoped permissions: list only the tools the agent needs for a given workflow; a smaller surface means fewer mistakes and easier audits.
Structured error responses: return machine-readable error objects, not plain strings, so the agent can decide whether to retry, escalate, or stop.

Key points

MCP server exposes tools; MCP client (Claude Code) calls them
Use SSE transport for remote servers, stdio for local processes
Declare strict JSON Schema per tool to prevent malformed calls
Return structured errors so the agent can handle failures correctly

The parallel execution mindset

Most people give an agent one task, wait for the answer, then give the next task. That is serial thinking, and it is slow. The parallel execution mindset treats your goal as a tree: break it into independent branches, run all branches at once, then merge the results.

The four-step pattern is decompose, fan out, verify, synthesize. Decompose means splitting the goal into sub-tasks that share no blocking dependency on each other. Fan out means launching all of them simultaneously, either by asking Claude to spawn sub-agents or by sending multiple calls yourself. Verify means checking each result before trusting it (spot wrong answers early, not after synthesis). Synthesize means combining the verified outputs into the final deliverable.
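The four steps map naturally onto asyncio. In this sketch the sub-agent is a stand-in coroutine that just upper-cases its input; a real one would call a model. The function names, the semicolon-based decomposition, and the non-empty check in verify are hypothetical simplifications.

```python
import asyncio


def decompose(goal: str) -> list[str]:
    """Split a goal into independent parts (here: on semicolons)."""
    return [part.strip() for part in goal.split(";")]


async def sub_agent(task: str) -> dict:
    """Stand-in for one sub-agent call; a real one would call a model."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"task": task, "answer": task.upper()}


def verify(result: dict) -> bool:
    """Hypothetical check: accept only non-empty answers."""
    return bool(result["answer"].strip())


async def run_parallel(parts: list[str]) -> str:
    # Fan out: start every independent sub-task at once.
    results = await asyncio.gather(*(sub_agent(p) for p in parts))
    # Verify: drop results that fail the check before merging.
    verified = [r for r in results if verify(r)]
    # Synthesize: combine verified outputs in their original order.
    return " | ".join(r["answer"] for r in verified)
```

asyncio.gather returns results in the order the tasks were passed in, so the synthesis step does not need to re-sort them.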

In Claude Code you drive this with the Task tool inside an agent prompt, which lets one Claude instance spawn parallel sub-agents. For non-interactive runs, the --dangerously-skip-permissions flag suppresses permission prompts so unattended sub-tasks do not stall waiting for input. The Batch API (Anthropic's dedicated endpoint) is the right layer when you need hundreds of independent calls at 50 percent cost and without touching your per-minute rate limit.

Common patterns where this mindset pays off:

Research fan-out: send the same question to multiple search queries at once, then reconcile.
Multi-file audit: assign each file or module to a separate sub-agent, merge findings.
Translation pipeline: translate to four languages simultaneously instead of sequentially.
Adversarial verify: after a sub-agent writes an answer, a second sub-agent critiques it before synthesis.

Key points

Decompose into dependency-free sub-tasks before assigning work
Fan out: launch all independent sub-tasks simultaneously
Verify each branch output before merging
Synthesize: combine verified results into one coherent deliverable

Adversarial verification at scale

When a single AI call produces a finding, you cannot know whether it is correct, hallucinated, or biased by the phrasing of your prompt. Adversarial verification addresses this by running multiple independent agents on the same task and then reconciling their outputs, so that uncorrelated errors are outvoted instead of propagating silently.

The core pattern is a judge panel: you send the same question (or the same piece of evidence) to several Claude instances, each with a slightly different system prompt or temperature setting (temperature controls how random the model's word choices are). Each judge returns a verdict. A majority vote aggregator then picks the answer that appears most often. If the panel is split, the system can escalate to a stronger model such as claude-opus-4-8 for a tie-break, rather than blindly accepting any single answer.

At scale this becomes a pipeline. A fan-out step dispatches one task to N parallel agents simultaneously, using Claude Code's --dangerously-skip-permissions flag or a headless batch script to avoid interactive prompts. A reducer step collects all responses and applies the voting rule. The reducer itself can be a Claude call with a strict prompt that only counts explicit verdicts, ignoring hedged language.

Key design choices for a reliable panel:

Odd panel size: use 3 or 5 judges to avoid ties on binary questions.
Prompt diversity: vary wording across judges so they are not all fooled by the same framing.
Structured output: force each judge to return a machine-readable label such as PASS or FAIL before any explanation, so the reducer parses cleanly.
Confidence weighting: optionally ask each judge for a 0-to-10 confidence score and weight the vote accordingly instead of treating all votes equally.
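A reducer for such a panel can be very small. The sketch below assumes each judge has already returned a clean label such as PASS or FAIL; the function names and the ESCALATE sentinel are hypothetical. Note that majority_vote requires a strict majority, which for a binary question with an odd panel is the same as picking the most common verdict.

```python
from collections import Counter


def majority_vote(verdicts: list[str]) -> str:
    """Return the label held by a strict majority, else 'ESCALATE'."""
    if not verdicts:
        return "ESCALATE"
    label, top = Counter(verdicts).most_common(1)[0]
    if top * 2 <= len(verdicts):
        return "ESCALATE"  # split panel: hand off to a stronger model
    return label


def weighted_vote(votes: list[tuple[str, int]]) -> str:
    """Votes are (label, confidence 0-10); highest total wins, ties escalate."""
    if not votes:
        return "ESCALATE"
    totals: dict[str, int] = {}
    for label, confidence in votes:
        totals[label] = totals.get(label, 0) + confidence
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "ESCALATE"
    return ranked[0][0]
```

With confidence weighting, one very sure judge can outweigh two unsure ones: weighted_vote([("PASS", 3), ("PASS", 2), ("FAIL", 9)]) returns "FAIL", whereas an unweighted vote on the same labels would return "PASS".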

Key points

Judge panels run the same task on multiple agents to catch errors
Majority voting picks the most common verdict across judges
Prompt diversity prevents all judges from sharing the same blind spot
Structured output labels make the reducer reliable and fast

When not to use agents

An agent is a loop: the model plans, calls tools, reads results, and repeats until the task is done. That loop costs time and tokens, and it introduces failure points at every step. Many tasks do not need it.

A one-shot task is one where you can supply all necessary context upfront and the model can return a complete, correct answer in a single call. Wrapping that in an agent adds overhead without benefit. Use the simplest tool that gets the job done.

Reach for a plain prompt (no agent, no tools) when the task fits any of these:

The answer requires no external data beyond what you paste in.
No file system, browser, or API access is needed.
The output is short enough to review in one read.
Retrying on failure is cheaper than building retry logic into an agent loop.
Latency matters and a round-trip agent loop would feel slow.

A useful rule: if you could answer the question yourself given a search result or a pasted document, a single Claude.ai chat or a claude CLI call with piped input is enough. Reserve Claude Code agents and multi-step pipelines for tasks that genuinely require planning across many unknown steps.

Key points

Agents add cost and latency; one-shot calls are often enough.
Use an agent only when the number or identity of steps is unknown upfront.
Self-contained text tasks (summarize, translate, classify) are one-shot by nature.
Simpler pipelines are easier to debug and cheaper to run.

Inspired by 0xloucash