Tier 6

Tier 6 — Orchestration: Run many agents at once, to ship more work in parallel

Reach for this tier when one agent is too slow or floods its own context, and the build is too big for one pass. The next gain is parallelism — many subagents on independent slices, specialist roles, and a second model catching what the first missed.

The model toolkit — bring in more models (the multi-model playbook)

Using more than one model is usable at any tier — but you lean on it hardest here (per-subagent models, racing agents across labs, specialist roles), which is why it leads off Tier 6.

It works for one reason: different models have different blind spots, so a second model catches what the first missed.

Reach for it when a task is high-stakes, decomposable, or verifiable; skip it when one strong model already clears the bar (see "when not to" at the end).

You met the basics in Tier 2 — opusplan (smart model plans, cheap model builds) and the spec cross-check. This is the rest of the toolkit, in two levels — get fluent with the first before reaching for the second:

Within the Anthropic family (Opus / Sonnet / Haiku) — explicit model choice, per-subagent models, opusplan, advisor. Fully supported, built in, no extra infra. This is where most of the value is.
Beyond Anthropic — other frontier labs (GPT-5.5-class, Gemini 3.x), open-weight models (DeepSeek/Qwen/Llama-class), or models you self-host. More power and more cost/privacy control, but you leave the natively-supported path and take on compatibility + governance work.

Level 1 — Multiple Anthropic models inside Claude Code

The native path: built in, fully supported, costs nothing but a flag or a frontmatter line. Default division of labor: Opus for reasoning/planning/judgment, Sonnet for implementation, Haiku for search/explore/classification.

Beyond the /model switching and opusplan you already saw in Tier 2, two moves add real leverage:

Assign a model per subagent. (see Primitives)

Instead of: every subagent inheriting your expensive main model.

Prefer: set model: in each .claude/agents/*.md — routing a grep-heavy explorer to Haiku instead of Opus is free money.

How: per-subagent frontmatter, or a session-wide default for everything set to inherit:

# .claude/agents/explorer.md
---
name: explorer
description: Search the repo and report which files matter. Use proactively.
tools: Read, Grep, Glob
model: haiku        # cheap + fast for bulk search
---

# default for all subagents that use `model: inherit`
export CLAUDE_CODE_SUBAGENT_MODEL="claude-sonnet-4-6"

Keep a stronger model on call during execution — the advisor.

Instead of: paying for a top model for the whole run, or letting a cheap one guess at the hard moments.

Prefer: run a cheaper main model and let it consult a stronger one only at decision points (before committing to an approach, when stuck on a recurring error, before declaring done).

flowchart LR
  E[Executor · Sonnet/Haiku<br/>drives task, calls tools] -->|hard decision| Ad[Advisor · Opus<br/>short plan, no tools]
  Ad -->|guidance| E
  E --> R[Result]

How: claude --advisor opus with a Sonnet/Haiku main (or the /advisor command / advisorModel setting). Anthropic's own numbers: Sonnet + an Opus advisor beat Sonnet alone by 2.7 pts on SWE-bench Multilingual while cutting cost per task ~12% — the advisor reads the shared context and writes only a short plan/correction (~400–700 tokens), never tools or user-facing output. (Needs Claude Code v2.1.98+ on the Anthropic API.)

Keep the fleet alive when a model is overloaded — fallback chains.

Instead of: a run that dies on a single overloaded / unavailable response.

Prefer: declare an ordered fallback so the turn retries on the next model instead of failing.

How: claude --fallback-model sonnet,haiku (or "fallbackModel": ["sonnet","haiku"] in settings) — tried in order on overload, capped at 3, lasts the turn. This is a reliability lever (availability), distinct from the advisor's quality/cost lever — both matter once runs go unattended (Tier 7–8).

Level 2 — Reaching beyond Anthropic (other frontier, open-weight, self-hosted)

The advanced case. The payoff is real — a different lab's blind spots aren't your model's blind spots, and open-weight/self-hosted models can cut cost or keep code on your own hardware — but you leave Claude Code's fully-supported native path.

Do this once Level 1 is second nature.

Have a different lab review the diff. (the cross-model form of the review-agent pattern in Tier 8)

Instead of: the model that wrote the code reviewing its own code — it carries the same blind spots into the review.

Prefer: Claude writes, Codex (GPT-5.5-class) or Gemini reviews the diff. A different training distribution catches bug classes self-review misses. Loop until the reviewer comes back clean — a human still owns the merge.

Practitioner reports are consistent: a cross-lab reviewer routinely finds real bugs (logic errors, injection vulns) the authoring model missed. Two models agreeing is strong signal, not proof.

How to wire non-Anthropic models into Claude Code. Claude Code speaks the Anthropic Messages API, so anything you route to must present an Anthropic-compatible endpoint. Three ways, easiest first:

OpenRouter (managed, hosted models). One key in front of every provider — automatic failover, one billing dashboard, per-key budget caps. Its "Anthropic Skin" is wire-compatible, so no proxy is needed. Swap the env vars:

# ~/.zshrc or ~/.bashrc
export OPENROUTER_API_KEY="sk-or-..."                 # from openrouter.ai/keys
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY"
export ANTHROPIC_API_KEY=""                            # MUST be explicitly empty
# pick models per role with OpenRouter slugs (copy exact slugs from openrouter.ai/models):
export ANTHROPIC_MODEL="openai/gpt-5.5"                     # main
export ANTHROPIC_SMALL_FAST_MODEL="anthropic/claude-haiku-4-5"  # background/fast
# suffixes: :floor = cheapest provider · :nitro = fastest · :free = free tier

Restart, then /status should show ANTHROPIC_AUTH_TOKEN and base URL https://openrouter.ai/api.

Gotchas: if you were logged in with an Anthropic account, run /logout and restart (else confusing model-not-found errors); a real ANTHROPIC_API_KEY left in your shell wins until you restart the terminal; for the GitHub Action, pass the key as anthropic_api_key and set ANTHROPIC_BASE_URL in the step env.

Claude Code Router / LiteLLM (rules-based, incl. open-weight & self-hosted). A local gateway that accepts Anthropic Messages and forwards to OpenAI-compatible or local backends, with rules per task type (background / think / long-context). This is also how you reach self-hosted / open-weight models — point the gateway at Ollama, vLLM, or LM Studio and keep code on your own hardware (on-prem or air-gapped). The gateway does the protocol translation Claude Code needs; you run the infra.

When NOT to (and the Level-2 tax). More models means more latency, cost, and aggregation seams.

Specifically for Level 2: non-Anthropic models routed natively may lose tool use, extended thinking (Anthropic-only), or prompt caching — test on a real task, and discount "replace Claude with a free model" hype for production.

Open-weight/self-hosted models still trail the frontier on agentic coding, and self-hosting is real ops. There's also a trust boundary: routing through a third party (or a non-Anthropic model) means that provider's safety characteristics and data handling, not Anthropic's — keep sensitive code on Anthropic or your own infra.

And before assembling any multi-model "council," try the cheaper trick first: run your best model a few times and pick the best answer — research (Self-MoA) finds that often beats mixing in weaker models, which can drag the average down.

Multi-model is a scalpel for decomposable, verifiable, high-stakes work — not a default.

The orchestration tips

45. Let it self-orchestrate big, parallel work — it fans out faster than you can by hand.

Instead of: manually spawning and merging a dozen subagents.

Prefer: /effort ultracode → "audit every endpoint for missing auth, fix it, prove each fix with a test." It fans out parallel subagents and self-verifies; you spot-check the report.

46. Use subagents to isolate context.

Instead of: a giant exploration that floods your main thread.

Prefer: "subagent: trace how auth is wired across the repo; return a summary plus the 3 files I should read." (Low effort for batch subagents to control cost.)

47. Race several agents on the same task; keep the winner.

Instead of: betting a high-value task on one agent's single attempt.

Prefer: run the same spec in several parallel git worktrees — and vary the model across them (Claude, Codex, Gemini) so their blind spots differ — then diff the results and merge the best. Budget ~2–4× the cost; reserve it for work where being right beats being cheap.

How:

git worktree add ../try-a -b try/a   # repeat for try/b, try/c
# run one agent per worktree (vary the model), then compare the diffs

Or assign the same issue to several agents in GitHub Agent HQ (Tier 8) and pick the winning PR.

48. Decompose complex builds into specialist roles — narrow context per role beats one big pass.

Instead of: one prompt that builds everything in one pass.

Prefer: architect → backend → frontend → tests → security-review, each a subagent with narrow context, reviewing each other.

flowchart LR
  A[Architect<br/>read-only] -->|approved plan| B[Backend]
  A --> F[Frontend]
  B --> T[Test-writer]
  F --> T
  T --> S[Security-reviewer<br/>read-only]
  S -->|findings| B

How: define each role as a .claude/agents/<role>.md (see Primitives) with a tight tool list — architect read-only (Read, Grep, Glob), backend gets + Write, Edit, Bash, security-reviewer stays read-only — then chain them in one ask:

Plan this with the architect subagent and STOP for my approval. Then hand the approved
plan to the backend subagent to implement, the test-writer subagent to cover it, and
the security-reviewer subagent to audit the diff. Each returns a summary.

49. Engineer the long-horizon hand-off — so the next session doesn't guess.

Instead of: "build the whole app" in one window (it runs out of context mid-feature and the next session guesses).

Prefer: an initializer session that writes a checklist; fresh sessions execute it one item at a time.

How:

# Session 1 — initializer (writes the plan, builds nothing)
"Scaffold the project and write PROGRESS.md: every feature as an unchecked checkbox,
 plus 'Current state' and 'Next step' sections. Do not implement yet."

# Session 2+ — workers (fresh context each time)
"Read PROGRESS.md. Run the smoke test first — assume the last session may have left
 the app broken, and fix that before new work. Then implement the next unchecked item,
 run its tests, check it off, update 'Next step', and commit. Then stop."

When compaction starts losing detail, /clear and let the next session rebuild from PROGRESS.md — a clean hand-off beats a degraded context. Keep the authoritative task ledger as JSON, not prose — Anthropic's long-running-agents write-up found models reliably "tidy" or overwrite Markdown but treat a features.json of {name, passes: false} (flip one boolean per session) as real data they won't casually rewrite. Prose hand-off in the doc; the status registry in JSON.

50. Steer long runs mid-flight instead of restarting them.

Instead of: killing a 40-minute run to change a permission, budget, or direction.

Prefer: inject a steering message while it runs.

How: interactively, just type the correction mid-run ("you're at ~60% context — stop exploring, implement the top finding only").

In an automated harness on the Agent SDK, push a system entry into the live messages array — Opus 4.8 accepts mid-conversation system messages, so you change course without a restart or a cache rebuild:

{ "role": "system", "content": "Permissions update: read-only from here. Summarize findings; make no further edits." }

51. Engineer the environment, not the wording.

Instead of: endlessly tuning the perfect prompt.

Prefer: invest in CLAUDE.md, Skills, MCP, git discipline, tests as the check, hooks, and CI. Every methodology converges on research → plan → execute → review → ship, human-gated — build the system that runs that loop. That system, not the prompt, is the product of professional agentic engineering.