fix: address round 2 review feedback for container execution

- Fix host/container path confusion for --env-file - Fix --resume flag to only be used during resumptions - Fix instruction passing to Claude CLI via shell-wrapped cat - Restore streamErr return logic to detect task-level failures - Improve success flag logic for workspace preservation - Remove duplicate RepositoryURL from AgentConfig - Fix app.js indentation and reformat DOMContentLoaded block - Restore behavioral test coverage in container_test.go
author: Peter Stone <thepeterstone@gmail.com> 2026-03-18 07:16:21 +0000
committer: Peter Stone <thepeterstone@gmail.com> 2026-03-18 07:55:27 +0000
commit: 86842903e4cae3a60b9732797cfc5dccddcc22e5 (patch)
tree: ba6c8d83f588a41664ccbc845ce4f2b147bb60d1 /docs/reviews
parent: 5814e7d6bdec659bb8ca10cc18447a821c59ad4c (diff)
1 files changed, 119 insertions, 0 deletions
diff --git a/docs/reviews/feat-container-execution.md b/docs/reviews/feat-container-execution.md
new file mode 100644
index 0000000..348d582
--- /dev/null
+++ b/docs/reviews/feat-container-execution.md
@@ -0,0 +1,119 @@
+# Code Review: `feat/container-execution`
+
+**Branch:** `feat/container-execution`
+**Commits reviewed:**
+- `e68cc48` feat: implement containerized repository-based execution model
+- `f68eb0c` fix: comprehensive addressing of container execution review feedback
+
+---
+
+## Overview
+
+Replaces `ClaudeRunner`/`GeminiRunner` with a `ContainerRunner`, adds `RepositoryURL` to the task model, and ships ADR-006. The direction is sound. After two commits, several issues from the initial implementation were addressed, but blocking bugs remain.
+
+---
+
+## Issues Fixed in Round 2
+
+- ✅ Unconditional `defer os.RemoveAll` — replaced with `success`/`isBlocked` defer
+- ✅ `--session-id` invalid flag — changed to `--resume`
+- ✅ Instructions file dead code — now passed via file path to `-p`
+- ✅ Resume/BLOCKED handling — `e.SandboxDir` check added
+- ✅ API keys via `-e` — moved to `--env-file`
+- ✅ Hardcoded image name — now configurable via `cfg.ClaudeImage`/`cfg.GeminiImage`
+- ✅ `ClaudeRunner`/`GeminiRunner` deleted — no more dead code
+- ✅ Tests added — `container_test.go` exists
+
+---
+
+## Blocking Bugs
+
+### 1. `--env-file` uses the container path, not the host path
+
+**File:** `internal/executor/container.go` — `buildDockerArgs`
+
+```go
+"--env-file", "/workspace/.claudomator-env",
+```
+
+`--env-file` is read by the Docker CLI on the **host**, before the container starts. `/workspace/.claudomator-env` is the in-container path. The correct value is the host-side path: `filepath.Join(workspace, ".claudomator-env")`. As written, this will fail in any real deployment unless `/workspace` on the host happens to exist and match the temp dir path.
+
+### 2. The test validates the wrong behavior
+
+**File:** `internal/executor/container_test.go` — `TestContainerRunner_BuildDockerArgs`
+
+The test asserts `"--env-file", "/workspace/.claudomator-env"`, locking in the bug above rather than catching it.
+
+### 3. `--resume execID` is passed on every run, including fresh executions
+
+**File:** `internal/executor/container.go` — `buildInnerCmd`
+
+```go
+func (r *ContainerRunner) buildInnerCmd(t *task.Task, execID string) []string {
+    return []string{"claude", "-p", "...", "--resume", execID, ...}
+```
+
+For a fresh execution, `execID` is a new UUID with no corresponding Claude session. `--resume <nonexistent-session>` will cause Claude to error out immediately. This flag should only be passed when `e.ResumeSessionID != ""`, using the original session ID.
+
+### 4. `-p` passes a file path as the prompt, not the instructions text
+
+**File:** `internal/executor/container.go` — `buildInnerCmd`
+
+```go
+"-p", "/workspace/.claudomator-instructions.txt",
+```
+
+Claude's `-p` flag takes **instructions text**, not a file path. The container agent would receive the literal string `/workspace/.claudomator-instructions.txt` as its task. Every task would run with the wrong instructions. The instructions content must be passed as text, or a mechanism like a `CLAUDE.md` in the workspace root must be used instead.
+
+### 5. `streamErr` is silently discarded
+
+**File:** `internal/executor/container.go` — `Run`
+
+```go
+if waitErr == nil && streamErr == nil {
+    // push
+    success = true
+}
+if waitErr != nil {
+    return fmt.Errorf("container execution failed: %w", waitErr)
+}
+return nil  // streamErr is never returned
+```
+
+If Claude exits 0 but the stream contained `is_error:true`, a rate-limit rejection, or a permission denial, `streamErr` is non-nil but `waitErr` is nil. The function returns `nil` and the task is marked COMPLETED with a bad result. `ClaudeRunner.execOnce` returned `streamErr` explicitly; that logic was not ported.
+
+---
+
+## Non-Blocking Issues
+
+### 6. 882 lines of executor tests deleted with no replacement coverage
+
+`claude_test.go` covered: BLOCKED sandbox preservation, session ID propagation across resume cycles, goroutine leak detection, rate limit retry, autocommit-then-push, build failure blocking autocommit, and stale sandbox recovery. The new `container_test.go` has 65 lines testing only argument construction. None of the behavioral or integration coverage was ported.
+
+### 7. `RepositoryURL` duplicated across two structs
+
+Both `Task.RepositoryURL` and `AgentConfig.RepositoryURL` exist. The runner reads `t.RepositoryURL`, silently ignoring `t.Agent.RepositoryURL`. YAML task files would naturally put it under `agent:`, where it is ignored.
+
+### 8. `success` never set for tasks with nothing to push
+
+If the container agent runs and makes no commits (valid for read-only tasks), `waitErr == nil && streamErr == nil`, but `success` is only set after a successful `git push`. If the push returns non-zero for any reason (including "nothing to push" edge cases), `success` stays false and the workspace is incorrectly preserved.
+
+### 9. `app.js` indentation regression
+
+The event listener registrations lost their indentation relative to the `DOMContentLoaded` block (lines 2631–2632 in the diff). Appears to be functionally inside the block still, but is a consistency issue.
+
+### 10. ADR-006 claims "Supersedes ADR-005" but ADR-005 was not updated
+
+ADR-005 should have a "Superseded by ADR-006" note added to its Status section.
+
+---
+
+## Verdict
+
+**Not mergeable.** Issues 1–5 are all functional failures that would cause every container-based task to fail in production:
+
+- Issue 1: `--env-file` path → Docker fails to start the container
+- Issues 3 & 4: wrong `--resume` and wrong `-p` → every fresh task errors immediately
+- Issue 5: `streamErr` discarded → rate limits and task errors reported as success
+
+Fix these before merging. Issue 6 (test deletion) should also be addressed — the behavioral coverage that was deleted is exactly what's needed to catch issues 3–5 in CI.
author	Peter Stone <thepeterstone@gmail.com>	2026-03-18 07:16:21 +0000
committer	Peter Stone <thepeterstone@gmail.com>	2026-03-18 07:55:27 +0000
commit	86842903e4cae3a60b9732797cfc5dccddcc22e5 (patch)
tree	ba6c8d83f588a41664ccbc845ce4f2b147bb60d1 /docs/reviews
parent	5814e7d6bdec659bb8ca10cc18447a821c59ad4c (diff)