Tier 8 — Agent Execution Layer: Put agents into production, so they work without you
Reach for this tier when the team needs agents to pick up tickets and open PRs without anyone babysitting a terminal.
While running a local loop is perfect for individual spec-driven development, enterprise agentic loops require a dedicated Agent Execution Layer. It's the layer beyond your own server (Tier 7): loops run hosted, isolated, and event-triggered.
Don't be scared — it's the same loop you already built. Everything from Tier 4 (act → test → fix against a Definition of Done) and Tier 5 (commit, gh, CI) is unchanged.
The execution layer only answers three new questions: where does it run, what triggers it, and how do many agents avoid colliding?
You adopt it gradually — start fully managed, self-host only when you need the control.
Start here — a real execution layer with zero infra
You don't build anything to begin. The managed on-ramps give you a production loop on Monday:
- GitHub. Run `/install-github-app`, then `@claude` in any issue or PR to implement or fix; switch on automatic Code Review to get a review on every PR. On Agent HQ you can assign Claude (or Codex/Copilot) to an issue and get a draft PR back, reviewed like a teammate's, with no new dashboard to learn.
- Linear. Assign an issue to the Linear agent and it runs a cloud coding session (powered by Claude Code/Codex), grounded in the ticket, history, and linked context, and returns a diff for review. (Linear resolves ~30% of its own incoming bugs this way, mostly first-pass.) Want your own harness instead? Cyrus (open-source) makes Claude Code an assignable Linear agent via Linear's Agent API.
- Anthropic-managed. Claude Code on the web runs sessions in isolated cloud VMs; Routines run scheduled jobs on Anthropic's infra with no local machine.
Wire one trigger, keep a human approval gate, and you have an execution layer. That's the whole "try it" step.
The stack — what it's made of when you build your own
When you outgrow managed (control, data residency, custom tools), the layer has four parts:
| Layer | Job | Options |
|---|---|---|
| Runtime sandbox | Run untrusted agent code + tests in throwaway isolation, so a runaway loop burns a cheap container, not prod | E2B (Firecracker microVMs), Modal (serverless + GPU), Daytona / Northflank (compliance, BYOC); Claude Code's own /sandbox and web VMs |
| Connectivity (MCP) | Let the agent read Slack / Notion / Postgres / Sentry without hardcoded creds | MCP servers (Tier 3) |
| Orchestration / state | Track which phase each agent is in; coordinate many | LangGraph, Claude Agent SDK, Warp Oz, Claude Code dynamic workflows |
| System of record | The ground truth + state machine — the agent forgets, the board remembers | Linear / GitHub issues, whose workflow states are the state machine |
A trick worth stealing: make the tracker the agent's memory. The conversation is disposable; the Linear/GitHub workflow state (Todo → In Progress → In Review) is the durable state machine the loop reads on every event.
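The tracker-as-memory idea can be sketched as a small lookup: the loop decides what to do from the ticket's status alone, never from chat history. This is a minimal illustration, not any tracker's API. The `Done` state, the allowed transitions, and the action names are assumptions added for the example; mirror your own board's workflow instead.

```python
from enum import Enum


class State(Enum):
    TODO = "Todo"
    IN_PROGRESS = "In Progress"
    IN_REVIEW = "In Review"
    DONE = "Done"  # assumption: not named in the text, added to close the workflow


# Hypothetical transition table; adjust to match your board.
TRANSITIONS = {
    State.TODO: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.IN_REVIEW, State.TODO},
    State.IN_REVIEW: {State.DONE, State.IN_PROGRESS},
    State.DONE: set(),
}

# Hypothetical action names: what the loop does when an event arrives
# for a ticket sitting in each state.
ACTIONS = {
    State.TODO: "plan",
    State.IN_PROGRESS: "build_and_test",
    State.IN_REVIEW: "wait_for_human",
    State.DONE: "noop",
}


def next_action(ticket_status: str) -> str:
    """Pick the next step from the tracker's status, with no conversation state."""
    return ACTIONS[State(ticket_status)]


def can_move(current: str, target: str) -> bool:
    """Check a proposed status change against the allowed transitions."""
    return State(target) in TRANSITIONS[State(current)]
```

Because the status string is the only input, a restarted agent (or a human dragging the card) lands in exactly the same place as an uninterrupted run.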
The loop — from ticket to merged PR
This is just Tier 4's loop, hosted and gated:
flowchart TD
Tk[Ticket → Agent Todo] --> Pl[Plan: spec sub-agent]
Pl --> G1{Human approves spec?}
G1 -->|yes| Ex[Isolated execution<br/>sandbox + own worktree]
Ex --> Loop[Build & test loop<br/>until green · capped ~3–5 strikes]
Loop --> PR[Commit & open PR via gh]
PR --> Rev[Review agent posts findings]
Rev --> Fix[Execution agent fixes]
Fix --> G2{Human merges}

- Trigger & context. A human moves a ticket to "Agent Todo"; a webhook fires. The agent reads the ticket (MCP) and pulls only the relevant files (code intelligence), not the whole repo.
- Plan (spec sub-agent). A planning subagent drafts a short spec and posts it back to the ticket; the loop pauses for a human "Approve." Keep this gate.
- Isolated execution. The execution agent spins up a sandbox, clones, and works in its own git worktree/branch — so ten agents on ten tickets never touch each other's files.
- Build & test loop. It writes code and runs the suite in the sandbox. On failure the orchestrator feeds the raw error back; it loops until green, capped at ~3–5 strikes so a stuck agent can't burn tokens forever. (Your Definition of Done is the exit condition — and the cap is the give-up condition.)
- PR. It commits and opens a PR via `gh`, under your identity so `git blame` stays useful.
- Review agent. A separate review agent triggers on the PR (below), posts findings, and the execution agent fixes them before a human looks.
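The build & test step above can be sketched as a capped loop: run the suite, hand the raw error back on failure, and stop after a fixed number of strikes. This is a rough sketch under stated assumptions, not a definitive orchestrator. `apply_fix` is a hypothetical hook standing in for "give the error to the agent and let it edit the code", and the default command assumes the suite runs under pytest.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Assumption: the suite runs under pytest; swap in your own test command.
DEFAULT_TEST_CMD = (sys.executable, "-m", "pytest", "-q")


@dataclass
class LoopResult:
    green: bool
    attempts: int
    errors: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """One-line outcome, plus the last raw error when the loop gave up."""
        if self.green:
            return f"green after {self.attempts} attempt(s)"
        last = self.errors[-1].strip() if self.errors else "no output captured"
        return f"gave up after {self.attempts} attempt(s); last error:\n{last}"


def build_and_test(
    apply_fix: Callable[[str], None],
    test_cmd: Sequence[str] = DEFAULT_TEST_CMD,
    max_strikes: int = 5,
) -> LoopResult:
    """Run tests until green (exit condition) or max_strikes (give-up condition)."""
    errors: List[str] = []
    for attempt in range(1, max_strikes + 1):
        proc = subprocess.run(list(test_cmd), capture_output=True, text=True)
        if proc.returncode == 0:
            return LoopResult(True, attempt, errors)
        raw_error = proc.stdout + proc.stderr
        errors.append(raw_error)
        # Only ask for a fix if another test run will follow it.
        if attempt < max_strikes:
            apply_fix(raw_error)
    return LoopResult(False, max_strikes, errors)
```

When the loop gives up, `summary()` is the text to post back to the ticket so a human can see what is blocking.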
The review-agent pattern (most teams' first loop)
PR review is where to start: high value, low risk. The pattern:
- Feed the diff, not the codebase. Pull only the changed lines + surrounding functions. Dumping the whole repo destroys the reviewer's context.
- Load the rulebook. Inject your `ARCHITECTURE.md` or a `code-review` Skill ("always use the repository pattern for DB access") so it reviews against your standards.
- Prompt adversarially. "Identify side effects this diff introduces that are not covered by the modified tests." (And per Tip 36: ask for all findings, severity-labeled; never tell it to be conservative.)
- Use a different model than the author. A cross-lab reviewer catches blind spots a self-review shares — see multi-model use.
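The first three points of the pattern can be sketched as a prompt builder that takes only the diff and the rulebook text. This is an illustrative sketch, not a prescribed format. The section headings, the function names, and the assumption that the diff is in git's unified format with `b/` path prefixes are all choices made for the example.

```python
from typing import List

# Wording taken from the adversarial prompt above, plus the
# "all findings, severity-labeled" request.
ADVERSARIAL_ASK = (
    "Identify side effects this diff introduces that are not covered "
    "by the modified tests. List all findings and label each with a severity."
)


def changed_files(diff: str) -> List[str]:
    """Return paths touched by a unified diff (assumes git's '+++ b/' headers)."""
    prefix = "+++ b/"
    return [line[len(prefix):] for line in diff.splitlines() if line.startswith(prefix)]


def build_review_prompt(diff: str, rulebook: str) -> str:
    """Assemble a review prompt from the diff and the team rulebook only."""
    files = ", ".join(changed_files(diff)) or "(none detected)"
    return "\n\n".join([
        "You are reviewing a pull request against the team rulebook below.",
        "## Rulebook\n" + rulebook.strip(),
        "## Changed files\n" + files,
        "## Diff\n" + diff.strip(),
        "## Task\n" + ADVERSARIAL_ASK,
    ])
```

The rulebook argument would typically be the contents of `ARCHITECTURE.md`; the prompt deliberately contains nothing from the rest of the repository.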
Instead / Prefer — the decisions that matter
57. Sandbox the loop; never run autonomous agents on a dev box or prod.
Instead of: an unattended agent with write access on your laptop or a shared CI runner.
Prefer: an ephemeral sandbox (E2B/Modal/Daytona/Northflank, or Claude Code's web VMs) — a runaway loop burns a throwaway container, not your environment.
58. Gate the plan, not every keystroke.
Instead of: full autopilot from ticket to merge, or approving every edit.
Prefer: auto-run the loop but pause for human approval at the spec and before merge. Gate the irreversible; automate the rest.
59. Cap the strikes — a stuck agent shouldn't burn tokens forever.
Instead of: "loop until the tests pass," unbounded.
Prefer: "loop until green, max 5 attempts, then stop and summarize what's blocking." An exit condition needs a give-up condition too.
60. Make the tracker the state machine — the agent forgets; the board remembers.
Instead of: holding workflow state in the agent's conversation (it gets archived/compacted away).
Prefer: store state as ticket status/labels (Todo → Planning → Approved → In Review). It survives restarts, and a human can override by moving the card.
The honest part
88% of agent pilots never reach production — and the blocker is almost never the model. It's isolation, governance, least-privilege permissions, audit, and data residency.
So treat the execution layer like any production system: scope tools per agent (Tier 6), keep secrets in a manager not prompts, log everything, and use BYOC/on-prem sandboxes if you're regulated.
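As a rough illustration of "scope tools per agent, keep secrets out of prompts, log everything", the sketch below checks each tool call against a per-agent allowlist and writes an audit line for every decision. The agent roles, tool names, and logger name are hypothetical, and the environment lookup is only a stand-in for a real secrets manager.

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("agent.audit")  # hypothetical logger name

# Hypothetical roles and tool names; replace with your own.
ALLOWED_TOOLS = {
    "reviewer": {"read_diff", "post_comment"},
    "executor": {"read_diff", "run_tests", "git_commit", "open_pr"},
}


def authorize(agent: str, tool: str) -> bool:
    """Allow a tool call only if it is on the agent's allowlist; log every decision."""
    allowed = tool in ALLOWED_TOOLS.get(agent, set())
    audit.info("agent=%s tool=%s allowed=%s", agent, tool, allowed)
    return allowed


def get_secret(name: str) -> str:
    """Fetch a secret at call time so it never has to appear in a prompt.

    The environment is used here as a stand-in for a secrets manager.
    """
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"secret {name!r} is not configured")
    return value
```

Unknown agents get an empty allowlist, so the default is deny.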
Start with one managed loop (PR review), prove it, then expand. This is the top of the arc: the agent stops being a chatbot and becomes an asynchronous worker operating inside your org's existing system of record.