Professional Agentic Product Engineering

Tier 5

Tier 5 — Checkpointing & Hardening: Checkpoint in git and harden the harness, so you can always roll back

Reach for this tier when a long run goes wrong and you've lost good work with nothing to roll back to. With a checkpoint at every green step, a bad run costs minutes to undo, not a rewrite.

Git isn't bookkeeping here — it's the loop's memory and undo. And the harness around it — hooks that always run your tests, CI, the gh CLI — makes the whole loop deterministic and shareable with the team.

40. Commit every working step — checkpoints are what make a loop safe.

Instead of: one giant commit at the end (or none).

Prefer: commit after each green step; each commit is a state the loop (or you) can revert to.

How: "After each task step that passes tests, commit with a descriptive message." Combine with a Stop hook (below) that won't let it stop on red, and you get an agent whose every checkpoint is a working state — it can iterate hard and never lose ground.

Start from a clean tree; back up anything it can't regenerate before granting write access.

41. Let Claude drive gh — the collaboration layer is harness too.

Instead of: copy-pasting diffs into the GitHub web UI.

Prefer: have Claude use the gh CLI to open PRs, read issues, check CI, and answer review comments.

How:

gh issue view 123          # pull the task context into the session
gh pr create --fill        # open the PR from the working branch
gh pr checks               # watch CI; feed failures back into the loop
gh pr comment 45 --body "addressed in 3f9a1c2"

Claude has native git (stage/commit/branch/PR) and can run gh. In CI, claude -p + gh closes the loop from issue → branch → PR → review.

42. Use worktrees for parallel agents — separate dirs, no clobbering.

Instead of: two sessions stepping on each other in one working copy.

Prefer: git worktree add ../feat-x feat-x so each agent works on its own branch and directory.

43. Replace "remember to run tests" with a hook.

Instead of: "run the tests after each change" (a request it can forget).

Prefer: a hook that always runs them, and one that won't let it stop on red.

This is where Tier 4 becomes durable: the check you defined there gets wired into the harness, so it runs every time and can't be skipped or forgotten.

How: .claude/settings.json (commit it so the team shares the gate):

{
  "hooks": {
    "PostToolUse": [
      { "matcher": "Edit|Write",
        "hooks": [{ "type": "command",
          "command": "npm test -- --findRelatedTests \"$CLAUDE_TOOL_INPUT_FILE_PATH\"" }] }
    ],
    "Stop": [
      { "hooks": [{ "type": "command",
        "command": "INPUT=$(cat); [ \"$(echo \"$INPUT\" | jq -r .stop_hook_active)\" = true ] && exit 0; npm test || { echo 'tests still failing' >&2; exit 2; }" }] }
    ]
  }
}

exit 2 on a Stop hook forces the agent to keep working (guard with stop_hook_active so it can't loop forever — Claude overrides a Stop hook after 8 consecutive blocks). PreToolUse exit 2 blocks a tool outright — use it to protect .env, package-lock.json, .git/; a PreToolUse deny fires before the permission check, so it holds even under --dangerously-skip-permissions (hooks can tighten, never loosen). Hooks aren't only shell + exit codes: beyond command, a hook can be a prompt (a quick Haiku yes/no judgment) or an agent (a subagent that verifies before allowing stop) — an agent-based verify hook is a sturdier loop check than a brittle test-runner line.

44. Move repetitive engineering into CI / headless — run it without you in the seat.

Instead of: doing PR review, issue triage, and release notes by hand each time.

Prefer: run the agent non-interactively in your pipeline, scoped tightly.

How:

claude -p "Review this PR diff for correctness and security; comment only on real issues." \
  --allowedTools "Bash(git diff:*),Read,Grep"

Or the Claude Code GitHub Action on pull_request. Note: as of mid-2026, programmatic use (claude -p, Agent SDK, Actions) bills from a separate Agent-SDK pool at API rates — budget for it.