docs/adr/005-sandbox-execution-model.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130

# ADR-005: Git Sandbox Execution Model

## Status
Accepted

## Context

Tasks that modify source code need a safe execution environment. Running agents
directly in the canonical working copy creates several problems:

1. **Concurrent corruption**: multiple agents running in the same directory stomp
   on each other's changes.
2. **Partial work leaks**: if a task is cancelled mid-run, half-written files
   remain in the working tree, blocking other work.
3. **No rollback**: a failed agent may leave the codebase in a broken state with
   no clean way to recover without manual `git reset`.
4. **Audit trail**: changes made by an agent should be visible as discrete,
   attributable git commits — not as an anonymous blob of working-tree noise.

## Decision

When a task has `agent.project_dir` set, `ClaudeRunner.Run` executes the agent
in an isolated git clone (a "sandbox") rather than in the project directory
directly.

### Sandbox lifecycle

```
project_dir (canonical working copy)
    |
    | git clone --no-hardlinks <clone-source> /tmp/claudomator-sandbox-*
    |
    v
sandbox (temp clone)
    |
    | agent runs here; commits its changes
    |
    | git push (to bare repo "local" or "origin")
    |
teardown: verify no uncommitted changes, remove sandbox dir
```

### Clone source

`sandboxCloneSource` prefers a remote named `"local"` (a local bare repo).
If not present it falls back to the `"origin"` remote. Using a bare repo
accepts pushes cleanly; pushing to a non-bare working copy fails when the
receiving branch is checked out.

### Uncommitted-change enforcement

Before teardown, the runner runs `git status --porcelain` in the sandbox.
If any uncommitted changes are detected, the task is failed with an error
message listing the files. The sandbox is **preserved** so the operator
can inspect or recover the partial work. The error message includes the
sandbox path.

### Concurrent push conflicts

If two sandboxes try to push at the same time, the second push is rejected
(`"fetch first"` or `"non-fast-forward"` in the error output). The runner
detects these signals and performs a fetch → rebase → retry sequence, up to
a fixed retry limit, before giving up.

### BLOCKED state and sandbox preservation

When an agent exits with a `question.json` file (entering the `BLOCKED`
state), the sandbox is **not** torn down. The preserved sandbox allows the
resumed execution to pick up the same working tree state, including any
in-progress file changes made before the agent asked its question.

Resume executions (`SubmitResume`) skip sandbox setup entirely and run
directly in `project_dir`, passing `--resume <session-id>` to the agent
so Claude can continue its previous conversation.

### Session ID propagation on resume

A subtle bug was found and fixed: when a resumed execution is itself blocked
again (a second BLOCKED→answer→resume cycle), the new execution record must
carry the **original** `ResumeSessionID`, not the new execution's own UUID.
If the wrong ID is used, `claude --resume` fails with "No conversation found".
The fix is in `ClaudeRunner.Run`: if `e.ResumeSessionID != ""`, use it as
`e.SessionID` rather than `e.ID`.

## Rationale

- **`--no-hardlinks`**: git defaults to hardlinking objects between clone and
  source when both are on the same filesystem. This causes permission errors
  when the source is owned by a different user (e.g. `www-data` vs. `root`).
  The flag forces a full copy.
- **Bare repo for push target**: non-bare repos reject pushes to checked-out
  branches. A bare repo (`git init --bare`) accepts all pushes safely.
- **Preserve sandbox on failure**: partial agent work may be valuable for
  debugging or resumption. Destroying it immediately on failure was considered
  and rejected.
- **Agent must commit**: requiring the agent to commit all changes before
  exiting ensures git history is always the source of truth. The enforcement
  check (uncommitted files → FAILED) makes this invariant observable.

## Alternatives Considered

- **Run directly in working copy**: rejected because of concurrent corruption
  and partial-work leakage.
- **Copy files instead of git clone**: rejected because the agent needs a
  working git history (for `git log`, `git blame`, and to push commits back).
- **Docker/container isolation**: considered for stronger isolation but
  rejected due to operational complexity, dependency on container runtime,
  and inability to use the host's claude/gemini credentials.

## Consequences

- Tasks without `project_dir` are unaffected; they run in whatever working
  directory the server process inherited.
- If a sandbox's push repeatedly fails (e.g. due to a bare repo that is
  itself broken), the task is failed with the sandbox preserved.
- If `/tmp` runs out of space (many large sandboxes), tasks will fail at
  clone time. This is a known operational risk with no current mitigation.
- The `project_dir` field in task YAML must point to a git repository with
  a configured `"local"` or `"origin"` remote that accepts pushes.

## Relevant Code Locations

| Concern | File |
|---|---|
| Sandbox setup/teardown | `internal/executor/claude.go` |
| `setupSandbox`, `teardownSandbox` | `internal/executor/claude.go` |
| `sandboxCloneSource` | `internal/executor/claude.go` |
| Resume skips sandbox | `internal/executor/claude.go` (Run) |
| Session ID propagation fix | `internal/executor/claude.go` (Run) |
| Sandbox tests | `internal/executor/claude_test.go` |