diff options
Diffstat (limited to 'docs/adr/005-sandbox-execution-model.md')
| -rw-r--r-- | docs/adr/005-sandbox-execution-model.md | 130 |
1 files changed, 130 insertions, 0 deletions
diff --git a/docs/adr/005-sandbox-execution-model.md b/docs/adr/005-sandbox-execution-model.md new file mode 100644 index 0000000..b374561 --- /dev/null +++ b/docs/adr/005-sandbox-execution-model.md @@ -0,0 +1,130 @@ +# ADR-005: Git Sandbox Execution Model + +## Status +Accepted + +## Context + +Tasks that modify source code need a safe execution environment. Running agents +directly in the canonical working copy creates several problems: + +1. **Concurrent corruption**: multiple agents running in the same directory stomp + on each other's changes. +2. **Partial work leaks**: if a task is cancelled mid-run, half-written files + remain in the working tree, blocking other work. +3. **No rollback**: a failed agent may leave the codebase in a broken state with + no clean way to recover without manual `git reset`. +4. **Audit trail**: changes made by an agent should be visible as discrete, + attributable git commits — not as an anonymous blob of working-tree noise. + +## Decision + +When a task has `agent.project_dir` set, `ClaudeRunner.Run` executes the agent +in an isolated git clone (a "sandbox") rather than in the project directory +directly. + +### Sandbox lifecycle + +``` +project_dir (canonical working copy) + | + | git clone --no-hardlinks <clone-source> /tmp/claudomator-sandbox-* + | + v +sandbox (temp clone) + | + | agent runs here; commits its changes + | + | git push (to bare repo "local" or "origin") + | +teardown: verify no uncommitted changes, remove sandbox dir +``` + +### Clone source + +`sandboxCloneSource` prefers a remote named `"local"` (a local bare repo). +If not present it falls back to the `"origin"` remote. Using a bare repo +accepts pushes cleanly; pushing to a non-bare working copy fails when the +receiving branch is checked out. + +### Uncommitted-change enforcement + +Before teardown, the runner runs `git status --porcelain` in the sandbox. +If any uncommitted changes are detected, the task is failed with an error +message listing the files. The sandbox is **preserved** so the operator +can inspect or recover the partial work. The error message includes the +sandbox path. + +### Concurrent push conflicts + +If two sandboxes try to push at the same time, the second push is rejected +(`"fetch first"` or `"non-fast-forward"` in the error output). The runner +detects these signals and performs a fetch → rebase → retry sequence, up to +a fixed retry limit, before giving up. + +### BLOCKED state and sandbox preservation + +When an agent exits with a `question.json` file (entering the `BLOCKED` +state), the sandbox is **not** torn down. The preserved sandbox allows the +resumed execution to pick up the same working tree state, including any +in-progress file changes made before the agent asked its question. + +Resume executions (`SubmitResume`) skip sandbox setup entirely and run +directly in `project_dir`, passing `--resume <session-id>` to the agent +so Claude can continue its previous conversation. + +### Session ID propagation on resume + +A subtle bug was found and fixed: when a resumed execution is itself blocked +again (a second BLOCKED→answer→resume cycle), the new execution record must +carry the **original** `ResumeSessionID`, not the new execution's own UUID. +If the wrong ID is used, `claude --resume` fails with "No conversation found". +The fix is in `ClaudeRunner.Run`: if `e.ResumeSessionID != ""`, use it as +`e.SessionID` rather than `e.ID`. + +## Rationale + +- **`--no-hardlinks`**: git defaults to hardlinking objects between clone and + source when both are on the same filesystem. This causes permission errors + when the source is owned by a different user (e.g. `www-data` vs. `root`). + The flag forces a full copy. +- **Bare repo for push target**: non-bare repos reject pushes to checked-out + branches. A bare repo (`git init --bare`) accepts all pushes safely. +- **Preserve sandbox on failure**: partial agent work may be valuable for + debugging or resumption. Destroying it immediately on failure was considered + and rejected. +- **Agent must commit**: requiring the agent to commit all changes before + exiting ensures git history is always the source of truth. The enforcement + check (uncommitted files → FAILED) makes this invariant observable. + +## Alternatives Considered + +- **Run directly in working copy**: rejected because of concurrent corruption + and partial-work leakage. +- **Copy files instead of git clone**: rejected because the agent needs a + working git history (for `git log`, `git blame`, and to push commits back). +- **Docker/container isolation**: considered for stronger isolation but + rejected due to operational complexity, dependency on container runtime, + and inability to use the host's claude/gemini credentials. + +## Consequences + +- Tasks without `project_dir` are unaffected; they run in whatever working + directory the server process inherited. +- If a sandbox's push repeatedly fails (e.g. due to a bare repo that is + itself broken), the task is failed with the sandbox preserved. +- If `/tmp` runs out of space (many large sandboxes), tasks will fail at + clone time. This is a known operational risk with no current mitigation. +- The `project_dir` field in task YAML must point to a git repository with + a configured `"local"` or `"origin"` remote that accepts pushes. + +## Relevant Code Locations + +| Concern | File | +|---|---| +| Sandbox setup/teardown | `internal/executor/claude.go` | +| `setupSandbox`, `teardownSandbox` | `internal/executor/claude.go` | +| `sandboxCloneSource` | `internal/executor/claude.go` | +| Resume skips sandbox | `internal/executor/claude.go` (Run) | +| Session ID propagation fix | `internal/executor/claude.go` (Run) | +| Sandbox tests | `internal/executor/claude_test.go` | |
