summaryrefslogtreecommitdiff
path: root/docs/adr/005-sandbox-execution-model.md
diff options
context:
space:
mode:
authorPeter Stone <thepeterstone@gmail.com>2026-03-10 23:57:52 +0000
committerPeter Stone <thepeterstone@gmail.com>2026-03-10 23:57:52 +0000
commitce185cd10839879e566d0dcf4a14466f0148634f (patch)
treed3df8ffb1e1fdb25fe321b7ad7ec8c820af81d40 /docs/adr/005-sandbox-execution-model.md
parent3226af322cb7c7f7a93a91e98651538b1bec54ee (diff)
docs: add development narrative and ADRs 004-005
RAW_NARRATIVE.md: comprehensive chronological engineering history reconstructed from the git log covering all 45 major milestones. ADR-004: multi-agent routing — explicit load balancing in code (pickAgent) plus Gemini-based model classification (Classifier), and why the two decisions are intentionally separated. ADR-005: git sandbox execution model — clone isolation, bare-repo push, uncommitted-change enforcement, BLOCKED preservation, and session ID propagation on second resume cycle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Diffstat (limited to 'docs/adr/005-sandbox-execution-model.md')
-rw-r--r--docs/adr/005-sandbox-execution-model.md130
1 files changed, 130 insertions, 0 deletions
diff --git a/docs/adr/005-sandbox-execution-model.md b/docs/adr/005-sandbox-execution-model.md
new file mode 100644
index 0000000..b374561
--- /dev/null
+++ b/docs/adr/005-sandbox-execution-model.md
@@ -0,0 +1,130 @@
+# ADR-005: Git Sandbox Execution Model
+
+## Status
+Accepted
+
+## Context
+
+Tasks that modify source code need a safe execution environment. Running agents
+directly in the canonical working copy creates several problems:
+
+1. **Concurrent corruption**: multiple agents running in the same directory stomp
+ on each other's changes.
+2. **Partial work leaks**: if a task is cancelled mid-run, half-written files
+ remain in the working tree, blocking other work.
+3. **No rollback**: a failed agent may leave the codebase in a broken state with
+ no clean way to recover without manual `git reset`.
+4. **Audit trail**: changes made by an agent should be visible as discrete,
+ attributable git commits — not as an anonymous blob of working-tree noise.
+
+## Decision
+
+When a task has `agent.project_dir` set, `ClaudeRunner.Run` executes the agent
+in an isolated git clone (a "sandbox") rather than in the project directory
+directly.
+
+### Sandbox lifecycle
+
+```
+project_dir (canonical working copy)
+ |
+ | git clone --no-hardlinks <clone-source> /tmp/claudomator-sandbox-*
+ |
+ v
+sandbox (temp clone)
+ |
+ | agent runs here; commits its changes
+ |
+ | git push (to bare repo "local" or "origin")
+ |
+teardown: verify no uncommitted changes, remove sandbox dir
+```
+
+### Clone source
+
+`sandboxCloneSource` prefers a remote named `"local"` (a local bare repo).
+If not present it falls back to the `"origin"` remote. Using a bare repo
+accepts pushes cleanly; pushing to a non-bare working copy fails when the
+receiving branch is checked out.
+
+### Uncommitted-change enforcement
+
+Before teardown, the runner runs `git status --porcelain` in the sandbox.
+If any uncommitted changes are detected, the task is failed with an error
+message listing the files. The sandbox is **preserved** so the operator
+can inspect or recover the partial work. The error message includes the
+sandbox path.
+
+### Concurrent push conflicts
+
+If two sandboxes try to push at the same time, the second push is rejected
+(`"fetch first"` or `"non-fast-forward"` in the error output). The runner
+detects these signals and performs a fetch → rebase → retry sequence, up to
+a fixed retry limit, before giving up.
+
+### BLOCKED state and sandbox preservation
+
+When an agent exits with a `question.json` file (entering the `BLOCKED`
+state), the sandbox is **not** torn down. The preserved sandbox allows the
+resumed execution to pick up the same working tree state, including any
+in-progress file changes made before the agent asked its question.
+
+Resume executions (`SubmitResume`) skip sandbox setup entirely and run
+directly in `project_dir`, passing `--resume <session-id>` to the agent
+so Claude can continue its previous conversation.
+
+### Session ID propagation on resume
+
+A subtle bug was found and fixed: when a resumed execution is itself blocked
+again (a second BLOCKED→answer→resume cycle), the new execution record must
+carry the **original** `ResumeSessionID`, not the new execution's own UUID.
+If the wrong ID is used, `claude --resume` fails with "No conversation found".
+The fix is in `ClaudeRunner.Run`: if `e.ResumeSessionID != ""`, use it as
+`e.SessionID` rather than `e.ID`.
+
+## Rationale
+
+- **`--no-hardlinks`**: git defaults to hardlinking objects between clone and
+ source when both are on the same filesystem. This causes permission errors
+ when the source is owned by a different user (e.g. `www-data` vs. `root`).
+ The flag forces a full copy.
+- **Bare repo for push target**: non-bare repos reject pushes to checked-out
+ branches. A bare repo (`git init --bare`) accepts all pushes safely.
+- **Preserve sandbox on failure**: partial agent work may be valuable for
+ debugging or resumption. Destroying it immediately on failure was considered
+ and rejected.
+- **Agent must commit**: requiring the agent to commit all changes before
+ exiting ensures git history is always the source of truth. The enforcement
+ check (uncommitted files → FAILED) makes this invariant observable.
+
+## Alternatives Considered
+
+- **Run directly in working copy**: rejected because of concurrent corruption
+ and partial-work leakage.
+- **Copy files instead of git clone**: rejected because the agent needs a
+ working git history (for `git log`, `git blame`, and to push commits back).
+- **Docker/container isolation**: considered for stronger isolation but
+ rejected due to operational complexity, dependency on container runtime,
+ and inability to use the host's claude/gemini credentials.
+
+## Consequences
+
+- Tasks without `project_dir` are unaffected; they run in whatever working
+ directory the server process inherited.
+- If a sandbox's push repeatedly fails (e.g. due to a bare repo that is
+ itself broken), the task is failed with the sandbox preserved.
+- If `/tmp` runs out of space (many large sandboxes), tasks will fail at
+ clone time. This is a known operational risk with no current mitigation.
+- The `project_dir` field in task YAML must point to a git repository with
+ a configured `"local"` or `"origin"` remote that accepts pushes.
+
+## Relevant Code Locations
+
+| Concern | File |
+|---|---|
+| Sandbox setup/teardown | `internal/executor/claude.go` |
+| `setupSandbox`, `teardownSandbox` | `internal/executor/claude.go` |
+| `sandboxCloneSource` | `internal/executor/claude.go` |
+| Resume skips sandbox | `internal/executor/claude.go` (Run) |
+| Session ID propagation fix | `internal/executor/claude.go` (Run) |
+| Sandbox tests | `internal/executor/claude_test.go` |