The Claude Bible
Home / Advanced prompt engineering
Level: Advanced · 12 lessons

Advanced prompt engineering

Chaining, tool use, multi-vendor composition (Frankenstein), citation, anti-hedging.

Open the interactive course212 lessons, quizzes, exercises, 3 languages, free.

Chaining prompts

A complex task run in a single monolithic prompt fails more often than a chain of simple prompts, each with one responsibility. The principle: split, and feed the output of one as the input of the next.

Example: "write a complete SEO article" gives something lukewarm. A chain gives something solid:

  1. Prompt 1: research and list the angles + keywords.
  2. Prompt 2: from that list, produce a detailed outline.
  3. Prompt 3: write section by section from the outline.
  4. Prompt 4: proofread and fix (a critic agent).

Each link is verifiable and fixable independently. It is also the mental basis of multi-agent workflows: a pipeline where each step is a specialized agent. Pierre's rule "Opus for architecture, Haiku for the repetitive" plays out here: you chain by assigning the right model to each link.

Key points
  • Split a complex task into a chain of simple prompts
  • Output of one link = input of the next; each link verifiable
  • Basis of workflows: a pipeline of specialized agents
  • Assign the right model to each link

Tool use: giving the model hands

Tool use (tool calling, or function calling) lets Claude call functions you define: query a database, hit an API, do a calculation, read a file. You describe the tool (name, description, parameter schema), Claude decides when and how to call it, you execute and return the result.

It is the engine of every agent (Claude Code, Cowork, MCP). Best practices for defining a tool, drawn from the best harnesses:

Structured output (forcing Claude to call a tool that validates a JSON schema) is the most reliable way to get program-usable data, better than parsing free text.

Key points
  • Tool use = Claude calls functions you define (the engine of agents)
  • Clear description + strict parameter schema
  • Don't name the tool to the user; parallelize if independent
  • Schema-validated structured output > parsing free text

Composing a system prompt: the Frankenstein case

Pierre's most advanced technique: composing a system prompt by assembling the best rules from several elite system prompts. His "Frankenstein" fuses eight sources (the Fable 5 roleplay, plus the disciplines of Cursor, GPT-5, Perplexity, Lovable, v0, Cline, Devin) and layers his own absolute rules on top, with priority.

It is not fine-tuning, it is pure prompt engineering: you don't change the model's weights, you change its behavior by instruction. Structure of the document, in descending priority:

  1. Identity and absolute user rules (override everything).
  2. Tool-use discipline (don't name tools, read before editing, max 3 attempts).
  3. Anti-hedging: an explicit list of forbidden openings and closings.
  4. Style, code rules, UI/UX directives, search, citation, refusal, error recovery, workspace safety.

Transferable lessons, even without copying his setup:

Key points
  • Composing a system prompt = assembling the best rules from several sources
  • Pure prompt engineering, no fine-tuning
  • Priority rules at the top, declared as priority
  • Forbid by explicit list > ask vaguely; encode each incident into a rule

Citation and anti-hedging in daily use

Two style disciplines that change the perceived quality, drawn straight from the Frankenstein.

Citation (Perplexity discipline), whenever you do research:

Anti-hedging (GPT-5 + Cline): ban empty openings ("Sure", "Of course") and opt-in closers ("would you like me to..."). At most one clarification question at the start if necessary, never at the end. If the next step is obvious, execute it rather than propose it.

Why it matters: hedging dilutes the signal and slows the reader. An answer that acts (or explains) directly respects the user's time. It is exactly the tone of this Bible, by construction.

Key points
  • Citation: inline brackets, one source per bracket, max 3, no References section
  • Anti-hedging: no empty opening or opt-in closer
  • At most one clarification question, at the start, never at the end
  • If the next step is obvious, execute it

Prompt chaining patterns

A single prompt has limits. When a task is complex, breaking it into a prompt chain (a sequence of linked prompts where each output feeds the next) produces far better results than cramming everything into one giant instruction. Each link in the chain has a narrow, well-defined job, which makes it easier to spot and fix mistakes.

The core technique is result passing: you take the output of step N and paste it (or inject it programmatically) as context into step N+1. In Claude Code, you can build chains inside scripts, using shell variables or files as the bridge between calls. In a chat session you simply copy the relevant part of the answer into your next message.

Common chain patterns worth knowing:

Keep each prompt in the chain atomic (doing one thing only) and include a brief context header at the top of each subsequent prompt so Claude is not starting cold. A chain of three focused prompts consistently beats one sprawling mega-prompt for accuracy and editability.

Key points
  • Prompt chain: a sequence of prompts where each output feeds the next step
  • Result passing: injecting a previous answer as context into the following prompt
  • Atomic prompt: a prompt with exactly one clearly bounded task
  • Draft-critique pattern: generate first, then review in a separate call

The tool use loop in depth

When you give Claude a tool (a function it can call, such as a web search or a code executor), the model does not just answer once. It enters a tool use loop: it decides to call a tool, reads the result, then decides what to do next, repeating until it can give a final answer. Each round trip is called a turn.

The sequence inside one loop iteration is always the same:

  1. Tool call: Claude emits a structured request naming the tool and its arguments (for example, {"name": "search", "input": {"query": "Claude Opus 4 release date"}}).
  2. Execution: Your code (or the host environment) runs the tool and returns a tool result block containing the output.
  3. Continuation: Claude reads the result as part of the conversation context and either calls another tool or produces a final text response.

Three things control how the loop behaves. The system prompt tells Claude what tools exist and when to use them. The tool definition (name, description, JSON schema for inputs) shapes whether Claude picks the right tool with the right arguments. The tool result you return must be clear and complete, because Claude cannot ask the tool a follow-up question: it can only call it again with different arguments.

Common failure modes: vague tool descriptions cause Claude to skip the tool or pass wrong arguments; truncated or error-free-looking results (when the real call failed) cause Claude to hallucinate the next step; and loops that never terminate happen when the tool keeps returning ambiguous output. A well-designed tool description is often more important than prompt length.

Key points
  • Tool use loop: call, execute, read result, repeat or finish
  • Tool definition quality controls argument accuracy
  • Tool result clarity prevents hallucinated follow-up steps
  • The host environment runs the tool, not the model itself

Structured outputs with a schema

A schema is a formal description of the shape you want your data to take: which fields exist, what type each field holds (string, number, boolean, array), and which fields are required. When you attach a schema to a Claude prompt, you are telling the model exactly what JSON (JavaScript Object Notation, a lightweight text format for structured data) to return, and nothing else.

Claude supports schema enforcement in two ways. First, you can describe the schema inside your prompt as a plain JSON object and instruct Claude to follow it. Second, when calling the Anthropic API directly, you can use tool use (also called function calling): you define a tool whose input schema matches the object you want, then instruct Claude to call that tool. The API guarantees the response fits the schema, so you get machine-readable output without parsing free text.

Even with schema enforcement, outputs can still fail validation in edge cases: a required field may be null, a number may arrive as a string, or an enum value may be misspelled. A robust pipeline therefore adds a validation and retry loop: parse the JSON, run a validator (such as a JSON Schema library), and if it fails, send the error message back to Claude in a follow-up turn so it can correct only the broken fields.

Key principles for reliable structured output:

Key points
  • A schema defines the exact shape (fields, types, required status) of the JSON you want back.
  • Tool use (function calling) enforces schema compliance at the API level.
  • Always validate the output programmatically and retry with the error message if it fails.
  • Flat schemas with explicit enum values produce the fewest errors.

Writing evals

An eval (short for evaluation) is a small, structured test you run against your prompt to measure whether it reliably produces the output you want. Without evals, you are guessing: a prompt that works on one example might silently break on ten others.

The core idea is to build a test set: a fixed collection of inputs paired with the expected outputs (or a scoring rule). You run every test case through your prompt and track the pass rate. When you revise the prompt, you run the test set again and compare scores. This turns prompt improvement from intuition into measurement.

A minimal eval has three parts:

Even a five-row spreadsheet beats zero structure. Start small, add cases each time a real user finds a bug, and never remove a case once it catches a regression (a previously working behavior that breaks after a prompt change).

Key points
  • Eval: a repeatable test set that scores prompt quality
  • Test set: fixed inputs paired with expected outputs or scoring rules
  • Pass rate: fraction of cases where the output meets the criteria
  • Regression: a behavior that worked before and silently breaks after a prompt change

Meta-prompting

Meta-prompting means using a language model (LLM) to write, critique, or improve a prompt, rather than writing that prompt entirely by hand. The idea is recursive: the model becomes a collaborator in shaping the instructions it will later follow.

This technique is useful when you are stuck on phrasing, when a prompt works but you suspect it could work better, or when you need to generate many prompt variants quickly. The model has seen enormous amounts of text about how models respond, so it can often spot weaknesses you would miss.

A basic meta-prompt has three parts:

You can go further by chaining steps: first ask the model to critique, then ask it to rewrite based on its own critique, then ask it to generate three variations ranked by expected clarity. Each step costs tokens but narrows in on a stronger prompt without you having to guess what is wrong.

Key points
  • Meta-prompting: a prompt whose job is to improve another prompt.
  • Include context, the draft, and a specific instruction.
  • Chain critique then rewrite for sharper results.
  • Treat the output as a draft, not a final answer, and test it.

Guardrails and validation

A model can produce fluent, confident output that is factually wrong, structurally broken, or unsafe to act on. Guardrails are checks you add between the model's raw output and any action that consumes it. They turn a blind trust in the model into a controlled pipeline.

The simplest guardrail is a format check: verify that the output is the shape you asked for (valid JSON, a specific number of items, no forbidden strings) before passing it downstream. A second layer is semantic validation: ask a second, cheaper model call to judge whether the answer is coherent, on-topic, or safe. This is sometimes called an LLM-as-judge pattern.

In Claude Code (the CLI and IDE coding agent), you can chain validation steps in a shell pipeline or a script. Common approaches include:

Guardrails add latency and cost, so apply them proportionally. High-stakes actions (sending an email, writing to a database, deploying code) deserve hard checks. Low-stakes actions (drafting a summary for human review) can rely on a lighter touch or none at all.

Key points
  • Guardrails check model output before it is acted on
  • Format checks and schema assertions are the first line of defense
  • LLM-as-judge uses a second model call to validate the first
  • Apply stricter guardrails to irreversible or high-stakes actions

Self-consistency and voting

Most prompting strategies send one request and trust the first answer. Self-consistency breaks that assumption: you sample the same question several times, let the model reason independently each time, then pick the answer that appears most often. That majority vote is statistically more reliable than any single reply, especially on math, logic, and multi-step reasoning tasks.

The core idea comes from a 2022 paper (Wang et al.) showing that language models do not always land on the same reasoning path twice. Some paths are wrong. If you run the same prompt five times and four runs agree, the odds that all four share the same error are low. Voting (also called majority aggregation) exploits that independence.

When to use it:

Trade-off: cost and latency multiply by the number of samples. Use temperature (the randomness knob, where 0 is deterministic and 1 is creative) above 0 so each sample diverges. A value around 0.7 works well. Then parse the answers programmatically and count the most common one.

Key points
  • Self-consistency: sample the same prompt N times, take the majority answer
  • Temperature above 0 is required so each run produces a different reasoning path
  • Voting filters noise without changing the model or the prompt
  • Cost scales linearly with sample count, so reserve this for hard or high-stakes questions

Adversarial self-check

An adversarial self-check is a technique where you ask the model to argue against its own answer immediately after it produces one. Instead of treating the first response as final, you prompt the model to act as a critic and find flaws, gaps, or errors in what it just said. This exploits the model's reasoning ability to catch mistakes that a single forward pass often misses.

Why does this work? Language models (LLMs, meaning large language models trained on text) are prone to confirmation bias in generation: once a reasoning chain starts in one direction, each token makes the next token more likely to continue that direction. A separate critic pass resets that momentum and can surface contradictions, missing edge cases, or overconfident claims.

There are two main forms of adversarial self-check:

After the critic produces objections, you run a synthesis pass: ask the model (or judge yourself) which objections are valid, then request a revised answer that addresses only the valid ones. Three turns total: generate, critique, synthesize.

Key points
  • Adversarial self-check catches errors that a single response misses
  • Use an inline devil's advocate section or a separate critic message
  • Reset confirmation bias by treating the critic as a fresh perspective
  • Always run a synthesis pass to filter valid objections from noise
Work with me

Master Claude, Claude Code and LLMs, from your first prompt to multi-agent orchestration.

Like this course? I built it end to end. Need a web app, mobile app, AI automation or SEO/GEO? Let us talk.

Contact me on LinkedInSee a site I built