diff options
Diffstat (limited to 'docs/reviews')
| -rw-r--r-- | docs/reviews/feat-container-execution.md | 119 |
1 files changed, 119 insertions, 0 deletions
diff --git a/docs/reviews/feat-container-execution.md b/docs/reviews/feat-container-execution.md new file mode 100644 index 0000000..348d582 --- /dev/null +++ b/docs/reviews/feat-container-execution.md @@ -0,0 +1,119 @@ +# Code Review: `feat/container-execution` + +**Branch:** `feat/container-execution` +**Commits reviewed:** +- `e68cc48` feat: implement containerized repository-based execution model +- `f68eb0c` fix: comprehensive addressing of container execution review feedback + +--- + +## Overview + +Replaces `ClaudeRunner`/`GeminiRunner` with a `ContainerRunner`, adds `RepositoryURL` to the task model, and ships ADR-006. The direction is sound. After two commits, several issues from the initial implementation were addressed, but blocking bugs remain. + +--- + +## Issues Fixed in Round 2 + +- ✅ Unconditional `defer os.RemoveAll` — replaced with `success`/`isBlocked` defer +- ✅ `--session-id` invalid flag — changed to `--resume` +- ✅ Instructions file dead code — now passed via file path to `-p` +- ✅ Resume/BLOCKED handling — `e.SandboxDir` check added +- ✅ API keys via `-e` — moved to `--env-file` +- ✅ Hardcoded image name — now configurable via `cfg.ClaudeImage`/`cfg.GeminiImage` +- ✅ `ClaudeRunner`/`GeminiRunner` deleted — no more dead code +- ✅ Tests added — `container_test.go` exists + +--- + +## Blocking Bugs + +### 1. `--env-file` uses the container path, not the host path + +**File:** `internal/executor/container.go` — `buildDockerArgs` + +```go +"--env-file", "/workspace/.claudomator-env", +``` + +`--env-file` is read by the Docker CLI on the **host**, before the container starts. `/workspace/.claudomator-env` is the in-container path. The correct value is the host-side path: `filepath.Join(workspace, ".claudomator-env")`. As written, this will fail in any real deployment unless `/workspace` on the host happens to exist and match the temp dir path. + +### 2. The test validates the wrong behavior + +**File:** `internal/executor/container_test.go` — `TestContainerRunner_BuildDockerArgs` + +The test asserts `"--env-file", "/workspace/.claudomator-env"`, locking in the bug above rather than catching it. + +### 3. `--resume execID` is passed on every run, including fresh executions + +**File:** `internal/executor/container.go` — `buildInnerCmd` + +```go +func (r *ContainerRunner) buildInnerCmd(t *task.Task, execID string) []string { + return []string{"claude", "-p", "...", "--resume", execID, ...} +``` + +For a fresh execution, `execID` is a new UUID with no corresponding Claude session. `--resume <nonexistent-session>` will cause Claude to error out immediately. This flag should only be passed when `e.ResumeSessionID != ""`, using the original session ID. + +### 4. `-p` passes a file path as the prompt, not the instructions text + +**File:** `internal/executor/container.go` — `buildInnerCmd` + +```go +"-p", "/workspace/.claudomator-instructions.txt", +``` + +Claude's `-p` flag takes **instructions text**, not a file path. The container agent would receive the literal string `/workspace/.claudomator-instructions.txt` as its task. Every task would run with the wrong instructions. The instructions content must be passed as text, or a mechanism like a `CLAUDE.md` in the workspace root must be used instead. + +### 5. `streamErr` is silently discarded + +**File:** `internal/executor/container.go` — `Run` + +```go +if waitErr == nil && streamErr == nil { + // push + success = true +} +if waitErr != nil { + return fmt.Errorf("container execution failed: %w", waitErr) +} +return nil // streamErr is never returned +``` + +If Claude exits 0 but the stream contained `is_error:true`, a rate-limit rejection, or a permission denial, `streamErr` is non-nil but `waitErr` is nil. The function returns `nil` and the task is marked COMPLETED with a bad result. `ClaudeRunner.execOnce` returned `streamErr` explicitly; that logic was not ported. + +--- + +## Non-Blocking Issues + +### 6. 882 lines of executor tests deleted with no replacement coverage + +`claude_test.go` covered: BLOCKED sandbox preservation, session ID propagation across resume cycles, goroutine leak detection, rate limit retry, autocommit-then-push, build failure blocking autocommit, and stale sandbox recovery. The new `container_test.go` has 65 lines testing only argument construction. None of the behavioral or integration coverage was ported. + +### 7. `RepositoryURL` duplicated across two structs + +Both `Task.RepositoryURL` and `AgentConfig.RepositoryURL` exist. The runner reads `t.RepositoryURL`, silently ignoring `t.Agent.RepositoryURL`. YAML task files would naturally put it under `agent:`, where it is ignored. + +### 8. `success` never set for tasks with nothing to push + +If the container agent runs and makes no commits (valid for read-only tasks), `waitErr == nil && streamErr == nil`, but `success` is only set after a successful `git push`. If the push returns non-zero for any reason (including "nothing to push" edge cases), `success` stays false and the workspace is incorrectly preserved. + +### 9. `app.js` indentation regression + +The event listener registrations lost their indentation relative to the `DOMContentLoaded` block (lines 2631–2632 in the diff). Appears to be functionally inside the block still, but is a consistency issue. + +### 10. ADR-006 claims "Supersedes ADR-005" but ADR-005 was not updated + +ADR-005 should have a "Superseded by ADR-006" note added to its Status section. + +--- + +## Verdict + +**Not mergeable.** Issues 1–5 are all functional failures that would cause every container-based task to fail in production: + +- Issue 1: `--env-file` path → Docker fails to start the container +- Issues 3 & 4: wrong `--resume` and wrong `-p` → every fresh task errors immediately +- Issue 5: `streamErr` discarded → rate limits and task errors reported as success + +Fix these before merging. Issue 6 (test deletion) should also be addressed — the behavioral coverage that was deleted is exactly what's needed to catch issues 3–5 in CI. |
