
Tier 8 — Agent Execution Layer: Put agents into production, so they work without you

Reach for this tier when the team needs agents to pick up tickets and open PRs without anyone babysitting a terminal.

A local loop is perfect for individual spec-driven development, but enterprise agentic loops need a dedicated Agent Execution Layer. It sits one step beyond your own server (Tier 7): loops run hosted, isolated, and event-triggered.

Don't be scared — it's the same loop you already built. Everything from Tier 4 (act → test → fix against a Definition of Done) and Tier 5 (commit, gh, CI) is unchanged.

The execution layer only answers three new questions: where does it run, what triggers it, and how do many agents avoid colliding?

You adopt it gradually — start fully managed, self-host only when you need the control.

Start here — a real execution layer with zero infra

You don't build anything to begin. The managed on-ramps give you a production loop on Monday:

GitHub. Run /install-github-app, then @claude in any issue or PR to implement or fix; switch on automatic Code Review to get a review on every PR. On Agent HQ you can assign Claude (or Codex/Copilot) to an issue and get a draft PR back — reviewed like a teammate's, no new dashboard to learn.
Linear. Assign an issue to the Linear agent and it runs a cloud coding session (powered by Claude Code/Codex), grounded in the ticket, history, and linked context, and returns a diff for review. (Linear resolves ~30% of its own incoming bugs this way, mostly first-pass.) Want your own harness instead? Cyrus (open-source) makes Claude Code an assignable Linear agent via Linear's Agent API.
Anthropic-managed. Claude Code on the web runs sessions in isolated cloud VMs; Routines run scheduled jobs on Anthropic's infra with no local machine.

Wire one trigger, keep a human approval gate, and you have an execution layer. That's the whole "try it" step.

The stack — what it's made of when you build your own

When you outgrow managed (control, data residency, custom tools), the layer has four parts:

Layer	Job	Options
Runtime sandbox	Run untrusted agent code + tests in throwaway isolation, so a runaway loop burns a cheap container, not prod	E2B (Firecracker microVMs), Modal (serverless + GPU), Daytona / Northflank (compliance, BYOC); Claude Code's own `/sandbox` and web VMs
Connectivity (MCP)	Let the agent read Slack / Notion / Postgres / Sentry without hardcoded creds	MCP servers (Tier 3)
Orchestration / state	Track which phase each agent is in; coordinate many	LangGraph, Claude Agent SDK, Warp Oz, Claude Code dynamic workflows
System of record	The ground truth + state machine — the agent forgets, the board remembers	Linear / GitHub issues, whose workflow states are the state machine

A trick worth stealing: make the tracker the agent's memory. The conversation is disposable; the Linear/GitHub workflow state (Todo → In Progress → In Review) is the durable state machine the loop reads on every event.

The loop — from ticket to merged PR

This is just Tier 4's loop, hosted and gated:

flowchart TD
  Tk[Ticket → Agent Todo] --> Pl[Plan: spec sub-agent]
  Pl --> G1{Human approves spec?}
  G1 -->|yes| Ex[Isolated execution<br/>sandbox + own worktree]
  Ex --> Loop[Build & test loop<br/>until green · capped ~3–5 strikes]
  Loop --> PR[Commit & open PR via gh]
  PR --> Rev[Review agent posts findings]
  Rev --> Fix[Execution agent fixes]
  Fix --> G2{Human merges}

Trigger & context. A human moves a ticket to "Agent Todo"; a webhook fires. The agent reads the ticket (MCP) and pulls only the relevant files (code intelligence) — not the whole repo.
Plan (spec sub-agent). A planning subagent drafts a short spec and posts it back to the ticket; the loop pauses for a human "Approve." Keep this gate.
Isolated execution. The execution agent spins up a sandbox, clones, and works in its own git worktree/branch — so ten agents on ten tickets never touch each other's files.
Build & test loop. It writes code and runs the suite in the sandbox. On failure the orchestrator feeds the raw error back; it loops until green, capped at ~3–5 strikes so a stuck agent can't burn tokens forever. (Your Definition of Done is the exit condition — and the cap is the give-up condition.)
PR. It commits and opens a PR via gh, under your identity so git blame stays useful.
Review agent. A separate review agent triggers on the PR (below), posts findings, and the execution agent fixes them before a human looks.
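
The trigger and isolation steps above can be sketched as a single pure function: given a tracker event, decide whether it should start a run and, if so, which branch and worktree that run gets. This is a minimal sketch, not any tracker's real webhook schema. The event fields (`ticket_id`, `new_status`), the `agent/` branch prefix, and the `/tmp/agents` root are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

TRIGGER_STATUS = "Agent Todo"  # status name taken from the text above


@dataclass(frozen=True)
class RunPlan:
    ticket_id: str
    branch: str
    worktree: str
    commands: tuple  # git argv tuples to run, in order


def plan_run(event: dict, base: str = "main",
             root: str = "/tmp/agents") -> Optional[RunPlan]:
    """Return an isolated-run plan if the event moved a ticket into the
    trigger status, else None. The event shape is an assumption."""
    if event.get("new_status") != TRIGGER_STATUS:
        return None
    ticket_id = str(event["ticket_id"])
    # Keep only characters that are safe in both branch names and paths.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", ticket_id).strip("-")
    branch = f"agent/{slug}"
    worktree = f"{root}/{slug}"
    commands = (
        ("git", "fetch", "origin", base),
        # One branch and one directory per ticket, so agents never share files.
        ("git", "worktree", "add", "-b", branch, worktree, f"origin/{base}"),
    )
    return RunPlan(ticket_id, branch, worktree, commands)
```

Because each ticket maps to its own branch and directory, two events for different tickets produce plans that cannot overlap on disk. Events for any other status return `None` and are ignored.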

The review-agent pattern (most teams' first loop)

PR review is where to start: high value, low risk. The pattern:

Feed the diff, not the codebase. Pull only the changed lines + surrounding functions. Dumping the whole repo destroys the reviewer's context.
Load the rulebook. Inject your ARCHITECTURE.md or a code-review Skill ("always use the repository pattern for DB access") so it reviews against your standards.
Prompt adversarially. "Identify side effects this diff introduces that are not covered by the modified tests." (And per Tip 36: ask for all findings, severity-labeled — never tell it to be conservative.)
Use a different model than the author. A cross-lab reviewer catches blind spots a self-review shares — see multi-model use.
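
One way to read the first three points together is as a prompt builder: diff in, rulebook in, adversarial instruction appended. The sketch below assumes the diff comes from `git diff --function-context`, which widens each hunk to the enclosing function. The function names, the section headings inside the prompt, and the 60,000-character cap are illustrative choices, not part of any tool's API.

```python
def diff_command(base: str = "origin/main") -> list[str]:
    """argv for a diff of the PR branch against its merge base.
    --function-context includes the whole function around each change."""
    return ["git", "diff", "--function-context", f"{base}...HEAD"]


def build_review_prompt(diff: str, rulebook: str,
                        max_diff_chars: int = 60_000) -> str:
    """Assemble a review prompt from the diff and the team's rules only.
    The cap is an arbitrary guard against oversized diffs."""
    if len(diff) > max_diff_chars:
        diff = diff[:max_diff_chars] + "\n[diff truncated]"
    return "\n\n".join([
        "You are reviewing a pull request against the team rules below.",
        "## Rules\n" + rulebook.strip(),
        "## Diff\n" + diff.strip(),
        "Identify side effects this diff introduces that are not covered "
        "by the modified tests. Report all findings, each labeled with a "
        "severity.",
    ])
```

The fourth point (a different model than the author) sits outside this function: the same prompt string can be sent to whichever reviewer model you choose.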

Instead / Prefer — the decisions that matter

57. Sandbox the loop; never run autonomous agents on a dev box or prod.

Instead of: an unattended agent with write access on your laptop or a shared CI runner.

Prefer: an ephemeral sandbox (E2B/Modal/Daytona/Northflank, or Claude Code's web VMs) — a runaway loop burns a throwaway container, not your environment.
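
To make "throwaway container" concrete without guessing at any vendor's SDK, here is a rough stand-in using a plain local Docker container. This is a swapped-in technique: it is not how E2B, Modal, Daytona, or Northflank work, and a container gives weaker isolation than a microVM. The image name, resource limits, and mount path are assumptions you would tune.

```python
def sandbox_argv(repo_dir: str, test_cmd: list[str],
                 image: str = "python:3.12-slim") -> list[str]:
    """Build a `docker run` argv that executes test_cmd in a disposable
    container. All limits below are illustrative defaults."""
    return [
        "docker", "run",
        "--rm",                    # delete the container when it exits
        "--network", "none",       # no network access from inside the run
        "--memory", "2g",          # cap RAM so a runaway loop is cheap
        "--cpus", "2",
        "--pids-limit", "512",     # guard against fork bombs
        "-v", f"{repo_dir}:/work", # mount only the agent's own worktree
        "-w", "/work",
        image,
        *test_cmd,
    ]
```

Pass the result to `subprocess.run(argv, capture_output=True, text=True, timeout=...)` to run it. The point of the sketch is that the agent's working copy is the only thing mounted, and the container is gone once the command exits.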

58. Gate the plan, not every keystroke.

Instead of: full autopilot from ticket to merge, or approving every edit.

Prefer: auto-run the loop but pause for human approval at the spec and before merge. Gate the irreversible; automate the rest.
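
As a sketch of what "gate the irreversible" can look like in an orchestrator, the function below runs pipeline steps in order and stops at the first step whose approval is missing. The step names and approval flags are hypothetical. Only the two gates (spec and merge) come from the text.

```python
PIPELINE = ["plan", "execute", "open_pr", "review", "merge"]

# Steps that need a human approval flag before they may start.
GATED = {"execute": "spec_approved", "merge": "merge_approved"}


def runnable_steps(done: set[str], approvals: set[str]) -> list[str]:
    """Return the steps that may run now, in order, stopping at the
    first gate that has not been approved."""
    steps = []
    for step in PIPELINE:
        if step in done:
            continue
        gate = GATED.get(step)
        if gate and gate not in approvals:
            break  # pause here until a human approves
        steps.append(step)
    return steps
```

With no approvals, only `plan` runs. Once the spec is approved, everything up to and including review runs unattended, and the pipeline pauses again before `merge`.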

59. Cap the strikes — a stuck agent shouldn't burn tokens forever.

Instead of: "loop until the tests pass," unbounded.

Prefer: "loop until green, max 5 attempts, then stop and summarize what's blocking." An exit condition needs a give-up condition too.
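
A capped loop is small enough to show in full. In this sketch, `attempt_fix` and `run_tests` are hypothetical callables standing in for "ask the agent to edit code" and "run the suite in the sandbox". The loop feeds the raw error from the last failure into the next attempt and gives up after `max_attempts`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    green: bool
    attempts: int
    errors: list[str] = field(default_factory=list)


def run_capped_loop(attempt_fix: Callable[[str], None],
                    run_tests: Callable[[], tuple[bool, str]],
                    max_attempts: int = 5) -> LoopResult:
    """Loop until tests pass (exit condition) or the cap is hit
    (give-up condition)."""
    errors: list[str] = []
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        attempt_fix(last_error)   # agent sees the previous raw error
        ok, output = run_tests()
        if ok:
            return LoopResult(True, attempt, errors)
        last_error = output
        errors.append(output)
    return LoopResult(False, max_attempts, errors)


def summarize(result: LoopResult) -> str:
    """One-line status to post back to the ticket."""
    if result.green:
        return f"green after {result.attempts} attempt(s)"
    last = result.errors[-1][:200] if result.errors else "none"
    return f"gave up after {result.attempts} attempts; last error: {last}"
```

Keeping the error history in the result means the "summarize what's blocking" step has something concrete to report instead of just a failure flag.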

60. Make the tracker the state machine — the agent forgets; the board remembers.

Instead of: holding workflow state in the agent's conversation (it gets archived/compacted away).

Prefer: store state as ticket status/labels (Todo → Planning → Approved → In Review). It survives restarts, and a human can override by moving the card.
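
A stateless handler makes this concrete: on every event it reads the ticket's status and derives what to do, keeping nothing in memory between calls. The status names are the ones listed above. The action names, the ticket dict shape, and the split between agent-owned and human-owned moves are assumptions for illustration.

```python
# What the loop should do when it sees a ticket in each status.
NEXT_ACTION = {
    "Todo": "draft_spec",
    "Planning": "wait_for_approval",
    "Approved": "run_build_loop",
    "In Review": "wait_for_merge",
}

# Moves the agent may make itself. Planning -> Approved is deliberately
# absent: that transition belongs to a human.
AGENT_MOVES = {("Todo", "Planning"), ("Approved", "In Review")}


def next_action(ticket: dict) -> str:
    """Derive the next action purely from the ticket's current status."""
    return NEXT_ACTION.get(ticket["status"], "ignore")


def agent_can_move(src: str, dst: str) -> bool:
    """True if the agent, rather than a human, may make this transition."""
    return (src, dst) in AGENT_MOVES
```

If a human drags the card back to Todo, the next event simply yields `draft_spec` again. The override needs no special handling because the handler never trusted its own memory in the first place.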

The honest part

88% of agent pilots never reach production — and the blocker is almost never the model. It's isolation, governance, least-privilege permissions, audit, and data residency.

So treat the execution layer like any production system: scope tools per agent (Tier 6), keep secrets in a manager not prompts, log everything, and use BYOC/on-prem sandboxes if you're regulated.
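
Per-agent tool scoping and "log everything" can start as a few lines. The sketch below uses hypothetical role and tool names. The allowlist is deny-by-default, and every decision, allowed or not, is appended to an audit log as a JSON line.

```python
import json
import time

# Hypothetical roles and tools; anything not listed is denied.
ALLOWED_TOOLS = {
    "reviewer": {"read_diff", "post_comment"},
    "executor": {"read_file", "write_file", "run_tests", "open_pr"},
}


def authorize(role: str, tool: str, audit_log: list[str]) -> bool:
    """Check a tool call against the role's allowlist and record the
    decision. audit_log is any append-only sink; a list is used here."""
    allowed = tool in ALLOWED_TOOLS.get(role, set())
    audit_log.append(json.dumps({
        "ts": time.time(),
        "role": role,
        "tool": tool,
        "allowed": allowed,
    }))
    return allowed
```

In a real deployment the log would go to durable storage rather than a list, and credentials would be injected into the sandbox from a secrets manager at run time so they never appear in a prompt.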

Start with one managed loop (PR review), prove it, then expand. This is the top of the arc: the agent stops being a chatbot and becomes an asynchronous worker operating inside your org's existing system of record.