summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
Diffstat (limited to 'docs')
-rw-r--r--docs/IMPROVEMENT_PLAN.md278
-rw-r--r--docs/RAW_NARRATIVE.md399
-rw-r--r--docs/adr/005-sandbox-execution-model.md25
-rw-r--r--docs/adr/006-containerized-execution.md51
-rw-r--r--docs/adr/007-planning-layer-and-story-model.md265
-rw-r--r--docs/architecture.md392
-rw-r--r--docs/superpowers/plans/2026-04-03-task-project-fk.md837
-rw-r--r--docs/superpowers/plans/2026-04-04-task-checker-story-ship.md1226
-rw-r--r--docs/superpowers/specs/2026-04-04-task-checker-story-ship.md222
9 files changed, 2896 insertions, 799 deletions
diff --git a/docs/IMPROVEMENT_PLAN.md b/docs/IMPROVEMENT_PLAN.md
new file mode 100644
index 0000000..db5d6a9
--- /dev/null
+++ b/docs/IMPROVEMENT_PLAN.md
@@ -0,0 +1,278 @@
+# Improvement Plan
+
+Comprehensive assessment of stability, architectural coherence, simplicity and cleanliness issues, with prioritized fixes.
+
+---
+
+## Critical Bugs (fix immediately)
+
+### 1. `sandboxDir` variable shadowing — git clone failures
+
+**File:** `internal/executor/claude.go`, lines ~119–137
+
+Both `setupSandbox` calls inside `Run()` use short variable declarations (`:=`) inside nested blocks, creating inner variables that shadow the outer `var sandboxDir string`. The outer variable is always `""`.
+
+**Effects:**
+- `teardownSandbox` is never called → agent work discarded, changes never pushed to origin
+- Sandboxes accumulate in `/tmp/claudomator-sandbox-*` until disk is full
+- BLOCKED task resume always clones fresh → loses agent's partial work from first run
+- Error messages omit sandbox path, making debugging harder
+
+**Fix:**
+
+```go
+// In the resume path (inside `if projectDir != ""`):
+var err error
+sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger) // = not :=
+
+// In the non-resume path (inside `else if projectDir != ""`):
+var err error
+sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger) // = not :=
+```
+
+Both sites already declare `var err error` on the line above. Simply drop the `:` from `:=`.
+
+### 2. Hardcoded `master` branch in sandbox teardown
+
+**File:** `internal/executor/claude.go`, `teardownSandbox()`, line ~377
+
+```go
+git pull --rebase origin master // fails on repos using "main"
+```
+
+**Fix:** Detect the remote default branch before pushing:
+
+```go
+branch, _ := exec.Command("git", "-C", sandboxDir, "rev-parse", "--abbrev-ref", "HEAD").Output()
+defaultBranch := strings.TrimSpace(string(branch))
+if defaultBranch == "" || defaultBranch == "HEAD" {
+ defaultBranch = "main"
+}
+// then use defaultBranch instead of "master"
+```
+
+---
+
+## High Priority — Stability & Correctness
+
+### 3. GeminiRunner is a non-functional stub in production code
+
+**File:** `internal/executor/gemini.go`
+
+`execOnce()` spawns a goroutine writing hardcoded JSON to a pipe instead of running the `gemini` binary. Any task with `agent.type: "gemini"` silently returns fake output.
+
+**Options (choose one):**
+- **Implement properly:** Mirror ClaudeRunner structure — subprocess + process group + stream parsing + sandbox support. GeminiRunner must also implement `LogPather`.
+- **Remove until ready:** Delete `gemini.go`, remove `GeminiRunner` from `NewPool` in `cli/serve.go`, and gate the `Classifier` on Claude-only operation. Add `// TODO: GeminiRunner not implemented` clearly.
+
+The stub in production is a correctness hazard. Removing it is safer than leaving it in place.
+
+### 4. Atomic execution creation + state update
+
+**Files:** `internal/executor/executor.go` `execute()`, `internal/storage/db.go`
+
+`CreateExecution` and `UpdateTaskState(RUNNING)` are two separate DB writes with no transaction. A crash between them leaves the task PENDING with an orphaned RUNNING execution record. `RecoverStaleRunning` handles the RUNNING-task case but not the PENDING-task-with-running-execution case.
+
+**Fix:** Add `storage.DB.CreateExecutionAndSetRunning(exec, taskID)` that wraps both writes in a single transaction.
+
+### 5. `context.Background()` in resume submission
+
+**File:** `internal/api/server.go`, `handleAnswerQuestion()`
+
+```go
+p.SubmitResume(context.Background(), task, exec)
+```
+
+This detaches the resume from any server lifecycle context. If the server has a root context with a shutdown deadline, resumes submitted this way will not be cancelled.
+
+**Fix:** Pass the server's root context (or a derived context) instead of `Background()`.
+
+### 6. Pool `dispatch()` goroutine leaks on server shutdown
+
+**File:** `internal/executor/executor.go`
+
+`NewPool` starts `go p.dispatch()` but `workCh` is never closed. On server shutdown, `dispatch()` blocks forever on `<-p.workCh`. The pool has no `Shutdown()` method.
+
+**Fix:** Add `Pool.Shutdown(ctx context.Context)` that closes `workCh` and waits for active workers to drain.
+
+---
+
+## Medium Priority — Architectural Coherence
+
+### 7. Priority field is stored but never dispatched
+
+**Files:** `internal/task/task.go`, `internal/executor/executor.go`
+
+Priority is parsed, stored in SQLite, returned via API — but `dispatch()` uses a single FIFO channel, ignoring priority entirely.
+
+**Options:**
+- Use a priority queue (e.g., `container/heap`) keyed by `Priority` for `workCh`
+- Or remove `Priority` from `Task` and documentation until implemented — don't ship half-features
+
+### 8. RetryConfig is stored but retry is manual
+
+**Files:** `internal/task/task.go`, `internal/executor/executor.go`
+
+`RetryConfig.MaxAttempts` and `Backoff` are stored but never read. Every "retry" is a manual user action via `/resume`.
+
+**Options:**
+- Implement automatic retry in `handleRunResult` when `MaxAttempts > 1` and `currentAttempt < MaxAttempts`
+- Or remove `RetryConfig` from the schema/API until implemented
+
+### 9. Changestats extracted in two places
+
+**Files:** `internal/executor/executor.go` `handleRunResult()`, `internal/api/server.go` `processResult()`
+
+Both call `task.ParseChangestatFromFile()` on the same file and write to the same row. The second write is idempotent but the duplication is confusing.
+
+**Fix:** Remove the extraction from `api.Server.processResult()`. The pool already handles it reliably before broadcasting the result.
+
+### 10. `withFailureHistory` shallow copy is unsafe
+
+**File:** `internal/executor/executor.go`
+
+```go
+copy := *t // copies struct
+copy.Agent = t.Agent // copies AgentConfig struct, slices share backing array
+```
+
+If any downstream code appends to `copy.Agent.AllowedTools`, it will mutate the original. The current code only writes to `SystemPromptAppend` (a string), so this is safe today but fragile.
+
+**Fix:** Deep-copy slices in AgentConfig:
+```go
+agent := t.Agent
+agent.AllowedTools = append([]string(nil), t.Agent.AllowedTools...)
+agent.DisallowedTools = append([]string(nil), t.Agent.DisallowedTools...)
+agent.ContextFiles = append([]string(nil), t.Agent.ContextFiles...)
+agent.AdditionalArgs = append([]string(nil), t.Agent.AdditionalArgs...)
+copy.Agent = agent
+```
+
+### 11. Additive migration strategy has no version tracking
+
+**File:** `internal/storage/db.go` `migrate()`
+
+Migrations are applied in code order; the only idempotency guard is ignoring "column already exists" errors. There is no way to detect if a migration was partially applied, rolled back incorrectly, or was applied out of order by a different code version.
+
+**Fix (minimal):** Use a `schema_migrations` table with a version counter. Apply only migrations with version > current max. This prevents re-running migrations and makes the migration history explicit.
+
+### 12. Server test-seam fields in production struct
+
+**File:** `internal/api/server.go`
+
+```go
+elaborateCmdPath string // overrides claudeBinPath in tests
+validateCmdPath string // overrides claudeBinPath in tests
+```
+
+These exist solely to inject test paths. They are an internal testing seam leaked into the production struct.
+
+**Fix:** Extract an `Elaborator` interface (e.g., `Elaborate(ctx, request, workDir string) (string, error)`) and inject it. Tests inject a fake; production builds inject the real Claude-backed implementation.
+
+---
+
+## Lower Priority — Simplicity & Cleanliness
+
+### 13. Classifier output parsing is brittle string manipulation
+
+**File:** `internal/executor/classifier.go`
+
+The classifier strips `"Loaded cached credentials."` by line scanning, strips markdown fences by `strings.TrimPrefix/TrimSuffix`, and falls back through multiple parsing paths. This will silently break if Gemini CLI output format changes.
+
+**Fix:** Request `--output-format json` and parse the structured response. Treat any parse failure as a classification error (fall back to default model), and log the raw output for debugging.
+
+### 14. `gitSafe` is inconsistently applied in teardownSandbox
+
+**File:** `internal/executor/claude.go`
+
+`setupSandbox` wraps all git commands with `gitSafe(...)` (adds `-c safe.directory=*`). `teardownSandbox` calls `git` directly without `gitSafe` on most commands. This means teardown can fail with "dubious ownership" errors in mixed-owner environments.
+
+**Fix:** Use `gitSafe` consistently in `teardownSandbox`.
+
+### 15. `tailFile` loads entire file into memory
+
+**File:** `internal/executor/claude.go`
+
+```go
+func tailFile(path string, n int) string {
+ // scans all lines, keeps last n
+}
+```
+
+For large log files this reads everything before discarding early lines. Fine in practice (stderr is typically small) but fragile if anything large is written to stderr.
+
+**Fix:** Use a circular buffer of n lines to avoid O(filesize) memory.
+
+### 16. `execute()` and `executeResume()` duplicate context/cancel/defer setup
+
+**File:** `internal/executor/executor.go`
+
+Both methods contain identical blocks for: context timeout + cancel setup, `activePerAgent` increment/decrement, `doneCh` signal, cancel registration/deregistration. This is ~40 lines duplicated.
+
+**Fix:** Extract `withWorkerSlot(ctx, t, fn func(ctx context.Context) error) error` that handles the common setup and calls `fn`. Both paths become 5–10 lines.
+
+### 17. `QuestionRegistry` and `QuestionHandler` are unused
+
+**Files:** `internal/executor/question.go`, `internal/executor/executor.go`
+
+`QuestionRegistry` is created in `NewPool` and stored on `Pool.Questions` but nothing reads from it. `QuestionHandler` interface exists but is never instantiated or wired to anything. The intended design (pool detects `tool_use` and invokes a handler) is not implemented.
+
+**Fix:** Remove `QuestionRegistry`, `QuestionHandler`, and `Pool.Questions` until the feature is actually needed. The current question flow (agent writes file, pool reads BlockedError) already works without them.
+
+### 18. Script registry endpoint is undocumented and empty
+
+**File:** `internal/api/scripts.go`
+
+The `ScriptRegistry` type and `POST /api/scripts/{name}` endpoint are defined but the registry is never populated from config. There are no example scripts. The endpoint exists in production code but can't be used.
+
+**Fix:** Either populate scripts from config (`[[scripts]]` entries in config.toml) with documentation, or remove the endpoint until it's ready.
+
+### 19. `parseStream` ignores scanner errors
+
+**File:** `internal/executor/claude.go`
+
+`parseStream()` loops over `scanner.Scan()` but never checks `scanner.Err()` after the loop. If the scanner hits a read error mid-stream, it silently stops. The function returns `nil` error, making the task appear to complete successfully when it may have lost output.
+
+**Fix:**
+```go
+if err := scanner.Err(); err != nil {
+ if streamErr == nil {
+ streamErr = fmt.Errorf("reading claude stdout: %w", err)
+ }
+}
+```
+
+### 20. Log files have no size limit or rotation
+
+**Files:** `internal/executor/claude.go`, `internal/storage/db.go`
+
+Execution logs in `~/.claudomator/executions/` have no size limit, rotation, or TTL. Long-running or repeatedly-retried tasks can produce arbitrarily large log files. Old executions are never cleaned up.
+
+**Fix (minimal):** Add a `CleanupOldExecutions(olderThan time.Duration)` method to `storage.DB` and call it on server startup (or on a ticker). Delete execution log directories for executions older than the TTL.
+
+---
+
+## Summary by Effort
+
+| # | Issue | Effort | Impact |
+|---|-------|--------|--------|
+| 1 | `sandboxDir` shadowing | XS (2 chars: `=` not `:=`) | Critical — root cause of git clone failures |
+| 2 | Hardcoded `master` branch | S | High — breaks all `main`-branch repos |
+| 3 | GeminiRunner stub | M–L | High — silent wrong output in production |
+| 4 | Atomic exec creation | S | Medium — crash-recovery edge case |
+| 5 | context.Background() resume | XS | Low — shutdown correctness |
+| 6 | Pool shutdown | S | Medium — goroutine leak on shutdown |
+| 7 | Priority dispatch | M | Medium — misleading API |
+| 8 | RetryConfig unused | S | Low — misleading API |
+| 9 | Changestats duplication | XS | Low — confusing code |
+| 10 | Shallow copy in withFailureHistory | S | Low — latent bug |
+| 11 | Migration versioning | M | Medium — operational risk |
+| 12 | Test seams in Server struct | M | Low — architecture cleanliness |
+| 13 | Brittle classifier parsing | S | Medium — silent classification failures |
+| 14 | gitSafe inconsistency | XS | Medium — owner errors in teardown |
+| 15 | tailFile memory | XS | Low — theoretical |
+| 16 | execute/executeResume duplication | M | Low — maintainability |
+| 17 | QuestionRegistry dead code | XS | Low — confusion |
+| 18 | Script registry empty | S | Low — dead endpoint |
+| 19 | parseStream ignores scanner.Err | XS | Medium — silent data loss |
+| 20 | Log file cleanup | S | Medium — disk usage |
diff --git a/docs/RAW_NARRATIVE.md b/docs/RAW_NARRATIVE.md
deleted file mode 100644
index 834d812..0000000
--- a/docs/RAW_NARRATIVE.md
+++ /dev/null
@@ -1,399 +0,0 @@
-# Claudomator: Development Narrative
-
-This document is a chronological engineering history of the Claudomator project,
-reconstructed from the git log, ADRs, and source code.
-
----
-
-## 1. Initial commit — core scaffolding (2e2b218)
-
-The project started with a single commit that established the full skeleton:
-task model, executor, API server, CLI, storage layer, and reporter. The Go module
-was `github.com/thepeterstone/claudomator`. The initial `Task` struct had a
-`ClaudeConfig` field (later renamed to `AgentConfig`) holding the model,
-instructions, `working_dir`, budget, permission mode, and tool lists. SQLite was
-chosen as the storage backend (see ADR-001). The executor pool used a bounded
-goroutine model. The API server was plain `net/http` with no external framework.
-The CLI was Cobra.
-
-## 2. JSON tags, module rename, gitignore (8ee1fb5, 46ba3f5, 2bf317d)
-
-Early housekeeping: added JSON struct tags to all exported types, renamed the Go
-module to its final identifier, and set up the `.gitignore` to exclude the compiled
-binary and local Claude settings.
-
-## 3. Verbose flag, logs CLI command (0377c06, f27d4f7)
-
-Added `--verbose` to the Claude subprocess invocation and a `logs` CLI subcommand
-for tailing execution output.
-
-## 4. Embedded web UI and HTTP wiring (135d8eb)
-
-The first web UI was embedded into the binary using `go:embed`. This made the
-binary fully self-contained: no separate static file server was needed.
-
-## 5. CLAUDE.md, clickable fold, subtask support (bdcc33f, 3881f80, 704d007)
-
-Added the project-level `CLAUDE.md` guidance document. Added a clickable fold to
-the web UI to expand hidden completed/failed tasks. Added `parent_task_id` to the
-`Task` struct, `ListSubtasks` to storage, and `UpdateTask` — the foundational
-subtask plumbing.
-
-## 6. Dependency waiting and planning preamble (f527972)
-
-The executor gained dependency waiting: tasks with `depends_on` now block in a
-polling loop until all dependencies reach `COMPLETED`. Any dependency entering a
-terminal failure state (`FAILED`, `TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`)
-immediately fails the waiting task.
-
-The planning preamble was also introduced here — a system prompt prefix injected
-into every task's instructions that explains to the agent how to write question
-files, how to break tasks into subtasks via the `claudomator` CLI, and how to
-commit all changes in git sandboxes.
-
-## 7. Elaborate, logs-stream, templates, subtask-list endpoints (74cc740)
-
-The API gained several new endpoints:
-- `POST /api/elaborate` — calls Claude to expand a brief task description into
- structured YAML.
-- `GET /api/executions/{id}/stream` — live-streams the execution log.
-- `GET /api/templates` / `POST /api/templates` — task template CRUD (later removed).
-- `GET /api/tasks/{id}/subtasks` — lists subtasks for a parent task.
-
-## 8. Web UI: tabs, new task modal, templates panel (e8d1b80)
-
-The web UI got a tabbed layout (Running / Done / Templates), a modal for creating
-new tasks with AI-drafted instructions, and a templates panel. This was the first
-version of the UI that matched the current design.
-
-## 9. READY state for human-in-the-loop review (6511d6e)
-
-A critical design point: when a top-level task's runner exits successfully, the
-task does not immediately go to `COMPLETED`. Instead it transitions to `READY`,
-meaning it paused for the operator to review the agent's output and explicitly
-accept or reject it. `READY → COMPLETED` requires `POST /api/tasks/{id}/accept`.
-`READY → PENDING` (for re-running) requires `POST /api/tasks/{id}/reject`.
-
-This is specific to top-level tasks. Subtasks (`parent_task_id != ""`) bypass READY
-and go directly to `COMPLETED` — only the root task requires human sign-off.
-
-## 10. Fix working_dir failures, hardcoded /root removed (3962597)
-
-Early deployments hardcoded `/root` as the base path for `working_dir`. This was
-removed. `working_dir` is now validated to exist before the subprocess starts.
-
-## 11. Scripts, debug-execution, deploy (2bbae74, f7c6de4)
-
-Added the `scripts/` directory with `debug-execution` (inspects a specific
-execution's logs) and `deploy` (builds and deploys the binary to the production
-server). Added a CLI `start` command and the `version` package.
-
-## 12. Rescue from recovery branch — question/answer, rate limiting, start-next-task (cf83444)
-
-A batch of features rescued from a detached-work branch:
-- **Question/answer flow (`BLOCKED` state)**: agents can write a `question.json`
- file before exiting. The pool detects this and transitions the task to `BLOCKED`,
- storing the question for the user. `POST /api/tasks/{id}/answer` resumes the
- Claude session with the user's answer injected as the next message.
-- **Rate limiting**: the pool tracks which agents are rate-limited and when.
- `isRateLimitError` and `isQuotaExhausted` distinguish transient throttles from
- 5-hour quota exhaustion. The per-agent `rateLimited` map stores the deadline.
-- **Start-next-task script**: a shell script that picks the highest-priority pending
- task and starts it.
-
-## 13. Accept/Reject for READY tasks, Start Next button in UI (9e790e3)
-
-The web UI gained explicit Accept/Reject buttons for tasks in the `READY` state
-and a "Start Next" button in the header that triggers the `start-next-task` script.
-
-## 14. Stream-level failure detection when claude exits 0 (4c0ee5c)
-
-Claude can exit 0 even when the task actually failed — for example when the
-permission mode denies a tool_use and Claude exits politely. `parseStream` was
-updated to detect `is_error: true` in the result message and
-`tool_result.is_error: true` with permission-denial text, returning an error in
-both cases so the task goes to `FAILED` rather than silently succeeding.
-
-## 15. Persist log paths at CreateExecution time (f8b5f25)
-
-Previously, `StdoutPath`, `StderrPath`, and `ArtifactDir` were only written to the
-execution record at `UpdateExecution` time (after the subprocess finished). This
-prevented live log tailing. Introduced the `LogPather` interface: runners that
-implement `ExecLogDir(execID)` allow the pool to pre-populate paths before calling
-`CreateExecution`, making them available for streaming before the process ends.
-
-## 16. bypassPermissions as executor default (a33211d)
-
-`permission_mode` defaults to `bypassPermissions` when not set in the task YAML.
-This was a deliberate trade-off: unattended automation needs to proceed without
-tool-use confirmation prompts. Operators can override per-task via `permission_mode`.
-
-## 17. Cancel endpoint and pool cancel mechanism (3672981)
-
-`POST /api/tasks/{id}/cancel` was implemented. The pool maintains a `cancels` map
-from taskID to context cancel functions. Cancellation sends a SIGKILL to the
-entire process group (via `syscall.Kill(-pgid, SIGKILL)`) to reap MCP servers and
-bash children that the claude subprocess spawned.
-
-## 18. BLOCKED state, session resume, fix: persist session_id (7466b17, 40d9ace)
-
-The full BLOCKED cycle was wired end-to-end:
-1. Agent writes `question.json` to `$CLAUDOMATOR_QUESTION_FILE` and exits.
-2. Runner detects the file and returns `*BlockedError`.
-3. Pool transitions task to `BLOCKED` and stores the question JSON.
-4. User answers via `POST /api/tasks/{id}/answer`.
-5. Pool calls `SubmitResume` with a new `Execution` carrying `ResumeSessionID`
- and `ResumeAnswer`.
-6. Runner invokes `claude --resume <session-id> -p <answer>`.
-
-A bug was found and fixed: `session_id` was not persisted in `UpdateExecution`,
-causing the BLOCKED → answer → resume cycle to fail because `GetLatestExecution`
-returned no session ID.
-
-## 19. Context.Background for resume execution; CANCELLED→QUEUED restart (7d4890c)
-
-Resume executions now use `context.Background()` instead of inheriting a potentially
-stale context. `CANCELLED → QUEUED` was added as a valid transition so cancelled
-tasks can be manually restarted.
-
-## 20. git sandbox execution, project_dir rename (1f36e23)
-
-The `working_dir` field was renamed to `project_dir` across all layers (task YAML,
-storage, API, UI). When `project_dir` is set, the runner no longer executes
-directly in that directory. Instead it:
-
-1. Detects whether `project_dir` is a git repo (initialising one if not).
-2. Clones the repo into `/tmp/claudomator-sandbox-*` (using `--no-hardlinks`
- to avoid permission issues with mixed-owner `.git/objects`).
-3. Runs the agent in the sandbox clone.
-4. After the agent exits, verifies no uncommitted changes remain and pushes
- new commits to the canonical bare repo.
-5. Removes the sandbox.
-
-On BLOCKED, the sandbox is preserved so the agent can resume where it left off
-in the same working tree.
-
-Concurrent push conflicts (two sandboxes pushing at the same time) are handled
-by a fetch-rebase-retry sequence.
-
-## 21. Storage: enforce valid state transitions in UpdateTaskState (8777bf2)
-
-`storage.DB.UpdateTaskState` now calls `task.ValidTransition` before writing. If
-the transition is not allowed by the state machine, the function returns an error
-and no write occurs. This is the enforcement point for the state machine invariants.
-
-## 22. Executor internal dispatch queue; remove at-capacity rejection (2cf6d97)
-
-The previous pool rejected `Submit` when all slots were taken. This was replaced
-with an internal `workCh` channel and a `dispatch` goroutine: tasks submitted
-while the pool is at capacity are buffered in the channel and picked up as soon
-as a slot opens. `Submit` now only returns an error if the channel itself is full
-(which requires an enormous backlog).
-
-## 23. API hardening — WebSocket auth, per-IP rate limiter, script registry (363fc9e, 417034b, 181a376)
-
-Several API reliability improvements:
-- WebSocket connections now require an API token (if `SetAPIToken` was called) and
- are capped at a configurable maximum number of clients. A ping/pong keepalive
- prevents stale connections from accumulating.
-- A per-IP rate limiter was added to the `/api/elaborate` and `/api/validate`
- endpoints to prevent abuse.
-- The scripts endpoints were collapsed into a generic `ScriptRegistry`: instead of
- individual handlers per script, a single handler dispatches to registered scripts
- by name.
-
-## 24. API: extend executions and log streaming endpoints (7914153)
-
-`GET /api/executions` gained filtering and sorting. `GET /api/executions/{id}/logs`
-was added for fetching completed log files. Live streaming via SSE and the log
-tail endpoint were polished.
-
-## 25. CLI: newLogger, shared HTTP client, report command (1ce83b6)
-
-CLI utilities consolidated: a shared logger constructor (`newLogger`), a shared
-HTTP client, a default server URL (`http://localhost:8484`). Added the `report`
-CLI subcommand for fetching execution summaries from the server.
-
-## 26. Generic agent architecture — transition from Claude-only (306482d to f2d6822)
-
-This was a major refactor over several commits:
-1. `ClaudeConfig` was renamed to `AgentConfig` with a new `Type` field (`"claude"`,
- `"gemini"`, etc.).
-2. `Pool` was changed from holding a single `ClaudeRunner` to holding a
- `map[string]Runner` — one runner per agent type.
-3. `GeminiRunner` was implemented, mirroring `ClaudeRunner` but invoking the
- `gemini` CLI.
-4. The storage layer, API handlers, elaborate/validate endpoints, and all tests
- were updated to use `AgentConfig`.
-5. The web UI was updated to expose agent type selection.
-
-## 27. Gemini-based task classification and explicit load balancing (406247b)
-
-`Classifier` and `pickAgent` were introduced to automate agent and model selection:
-
-- **`pickAgent(SystemStatus)`** — explicit load balancing: picks the available
- (non-rate-limited) agent with the fewest active tasks. Falls back to fewest-active
- if all agents are rate-limited.
-- **`Classifier`** — calls the Gemini CLI with a meta-prompt asking it to pick
- the best model for the task. This is intentionally model-picks-model: use a fast,
- cheap classifier to avoid wasting expensive tokens.
-
-After this commit the flow is: `execute()` → pick agent → call classifier → set
-`t.Agent.Type` and `t.Agent.Model` → dispatch to runner.
-
-## 28. ADR-003: Security Model (93a4c85)
-
-The security model was documented formally: no auth, permissive CORS, `bypassPermissions`
-as default, and the known risk inventory (see `docs/adr/003-security-model.md`).
-
-## 29. Various web UI improvements (91fd904, 7b53b9e, 560f42b, cdfdc30)
-
-Running tasks became the default view. A "Running view" showing currently running
-tasks alongside the 24h execution history was added. Agent type and model were
-surfaced on running task cards. The Done/Interrupted tabs were filtered to 24h.
-
-## 30. Quota exhaustion detection from stream (076c0fa)
-
-Previously, quota exhaustion (the 5-hour usage limit) was treated identically to
-generic failures. `isQuotaExhausted` was introduced to distinguish it: quota
-exhaustion maps to `BUDGET_EXCEEDED` and sets a 5-hour rate-limit deadline on the
-agent, rather than failing the task with a generic error.
-
-## 31. Sandbox fixes — push via bare repo, fetch/rebase (cfbcc7b, f135ab8, 07061ac)
-
-The sandbox teardown strategy was revised: instead of pushing directly into the
-working copy (which fails for non-bare repos), the sandbox pushes to a bare repo
-(`remote "local"` or `remote "origin"`) and the working copy is pulled separately
-by the developer. This avoids permission errors from mixed-owner `.git/objects`.
-The `--no-hardlinks` clone flag was added to prevent object sharing.
-
-## 32. BLOCKED→READY for parent tasks with subtasks (441ed9e, c8e3b46)
-
-When a top-level task exits the runner successfully but has subtasks, it transitions
-to `BLOCKED` (waiting for subtasks to finish) rather than `READY`. A new
-`maybeUnblockParent` function is called every time a subtask completes: if all
-siblings are `COMPLETED`, the parent transitions `BLOCKED → READY` and is
-presented for operator review.
-
-## 33. Stale RUNNING task recovery on server startup (9159572)
-
-`Pool.RecoverStaleRunning()` was added and called from `cli.serve`. It queries for
-tasks still in `RUNNING` state (left over from a previous server crash) and marks
-them `FAILED`, closing their open execution records. This prevents stuck tasks
-after server restarts.
-
-## 34. API: configurable mockRunner, async error-path tests (b33566b)
-
-The `api` test suite was hardened with a configurable `mockRunner` that can be
-injected into the test server. Async error paths (runner returns an error, DB
-update fails mid-execution) were now exercised in tests.
-
-## 35. Storage: missing indexes, ListRecentExecutions tests, DeleteTask atomicity (8b6c97e, 3610409)
-
-Several storage correctness fixes:
-- `idx_tasks_state`, `idx_tasks_parent_task_id`, `idx_executions_status`,
- `idx_executions_task_id`, and `idx_executions_start_time` indexes were added.
-- `ListRecentExecutions` had an off-by-one that caused it to miss recent executions;
- tests were added to catch this.
-- `DeleteTask` was made atomic using a recursive CTE to delete the task and all
- its subtasks in a single transaction.
-
-## 36. API: validate ?state= param, standardize operation response shapes (933af81)
-
-`GET /api/tasks?state=XYZ` now validates the state value. All mutating operation
-responses (`/run`, `/cancel`, `/accept`, `/reject`, `/answer`) were standardised
-to return `{"status": "ok"}` on success.
-
-## 37. Re-classify on manual restart; handleRunResult extraction (0676f0f, 7d6943c)
-
-Tasks that are manually restarted (from `FAILED`, `CANCELLED`, etc.) now go through
-classification again so they pick up the latest agent/model selection logic. The
-post-run error classification block was extracted into `handleRunResult` — a shared
-helper called by both `execute` and `executeResume` — eliminating 60+ lines of
-duplication.
-
-## 38. Legacy Claude field removed (b4371d0, a782bbf)
-
-The last remnants of the original `ClaudeConfig` type and backward-compat `working_dir`
-shim were removed. The schema is now fully generic.
-
-## 39. Kill-goroutine safety documentation, goroutine-leak test (3b4c50e)
-
-A documented invariant was added to the `execOnce` goroutine that kills the
-subprocess process group: it cannot block indefinitely. Tests were added to verify
-no goroutine leak occurs when a task is cancelled.
-
-## 40. Rate-limit avoidance in classifier; model list updates (8ec366d, fc1459b)
-
-The classifier now skips calling itself if the selected agent is rate-limited,
-avoiding a redundant Gemini API call when the rate-limited agent is already known.
-The model list was updated to Claude 4.x (`claude-sonnet-4-6`, `claude-opus-4-6`,
-`claude-haiku-4-5-20251001`) and current Gemini models (`gemini-2.5-flash-lite`,
-`gemini-2.5-flash`, `gemini-2.5-pro`).
-
-## 41. Map leak fixes — activePerAgent and rateLimited (7c7dd2b)
-
-Two map leak bugs were fixed in the pool:
-- `activePerAgent[agentType]` was decremented but never deleted when the count hit
- zero, so inactive agents accumulated as dead entries.
-- Expired `rateLimited[agentType]` entries were not deleted, so the map grew
- unboundedly over long runs.
-
-## 42. Sandbox teardown: remove working-copy pull, retry push on concurrent rejection (5c85624)
-
-The sandbox teardown removed the `git pull` into the working copy (which was failing
-due to mixed-owner object dirs). The retry-push-on-rejection path was tightened to
-detect `"fetch first"` and `"non-fast-forward"` as the rejection signals.
-
-## 43. Explicit load balancing separated from classification (e033504)
-
-Previously the `Classifier` both picked the agent and selected the model. This was
-split: `pickAgent` is deterministic code that picks the agent from the registered
-runners using the load-balancing algorithm. The `Classifier` only picks the model
-for the already-selected agent. This makes load balancing reliable and fast even
-when the Gemini classifier is unavailable.
-
-## 44. Session ID fix on second block-and-resume cycle (65c7638)
-
-A bug was found where the second BLOCKED→answer→resume cycle passed the wrong
-`--resume` session ID to Claude. The fix ensures that resume executions propagate
-the original session ID rather than the new execution's UUID.
-
-## 45. validTransitions promoted to package-level var (3226af3)
-
-`validTransitions` was promoted to a package-level variable in `internal/task/task.go`
-for clarity and potential reuse outside the package. ADR-002 was updated to reflect
-the current state machine including the `BLOCKED→READY` transition for parent tasks.
-
----
-
-## Feature Summary (current state)
-
-| Feature | Status |
-|---|---|
-| Task YAML parsing, batch files | Done |
-| SQLite persistence | Done |
-| REST API (CRUD + lifecycle) | Done |
-| WebSocket real-time events | Done |
-| Claude subprocess execution | Done |
-| Gemini subprocess execution | Done |
-| Explicit load balancing (pickAgent) | Done |
-| Gemini-based model classification | Done |
-| BLOCKED / question-answer / resume | Done |
-| git sandbox isolation | Done |
-| Subtask creation and unblocking | Done |
-| READY state / human accept-reject | Done |
-| Rate-limit and quota tracking | Done |
-| Stale RUNNING recovery on startup | Done |
-| Per-IP rate limiter on elaborate | Done |
-| Web UI (PWA) | Done |
-| Push notifications (PWA) | Planned |
-
---- 2026-03-16T00:56:20Z ---
-Converter sudoku to rust
-
---- 2026-03-16T01:14:27Z ---
-For claudomator tasks that are ready, check the deployed server version against their fix commit
-
---- 2026-03-16T01:17:00Z ---
-For every claudomator task that is ready, display on the task whether the currently deployed server includes the commit which fixes that task
diff --git a/docs/adr/005-sandbox-execution-model.md b/docs/adr/005-sandbox-execution-model.md
index b374561..0c9ef14 100644
--- a/docs/adr/005-sandbox-execution-model.md
+++ b/docs/adr/005-sandbox-execution-model.md
@@ -1,7 +1,7 @@
# ADR-005: Git Sandbox Execution Model
## Status
-Accepted
+Superseded by [ADR-006](006-containerized-execution.md)
## Context
@@ -69,9 +69,13 @@ state), the sandbox is **not** torn down. The preserved sandbox allows the
resumed execution to pick up the same working tree state, including any
in-progress file changes made before the agent asked its question.
-Resume executions (`SubmitResume`) skip sandbox setup entirely and run
-directly in `project_dir`, passing `--resume <session-id>` to the agent
-so Claude can continue its previous conversation.
+**Known Risk: Resume skips sandbox.** Current implementation of
+Resume executions (`SubmitResume`) skips sandbox setup entirely and runs
+directly in `project_dir`. This is a significant behavioral divergence: if a
+resumed task makes further changes, they land directly in the canonical working
+copy, reintroducing the concurrent corruption and partial-work leak risks
+identified in the Context section. A future iteration should ensure resumed
+tasks pick up the preserved sandbox instead.
### Session ID propagation on resume
@@ -113,10 +117,15 @@ The fix is in `ClaudeRunner.Run`: if `e.ResumeSessionID != ""`, use it as
directory the server process inherited.
- If a sandbox's push repeatedly fails (e.g. due to a bare repo that is
itself broken), the task is failed with the sandbox preserved.
-- If `/tmp` runs out of space (many large sandboxes), tasks will fail at
- clone time. This is a known operational risk with no current mitigation.
-- The `project_dir` field in task YAML must point to a git repository with
- a configured `"local"` or `"origin"` remote that accepts pushes.
+- **If `/tmp` runs out of space** (many large sandboxes), tasks will fail at
+ clone time. This is a known operational risk. Mitigations such as periodic
+ cleanup of old sandboxes (cron) or pre-clone disk space checks are required
+ as follow-up items.
+- **The `project_dir` field in task YAML** must point to a git repository with
+ a configured `"local"` or `"origin"` remote that accepts pushes. If neither
+ remote exists or the push is rejected for other reasons, the task will be
+ marked as `FAILED` and the sandbox will be preserved for manual recovery.
+
## Relevant Code Locations
diff --git a/docs/adr/006-containerized-execution.md b/docs/adr/006-containerized-execution.md
new file mode 100644
index 0000000..cdd1cc2
--- /dev/null
+++ b/docs/adr/006-containerized-execution.md
@@ -0,0 +1,51 @@
+# ADR-006: Containerized Repository-Based Execution Model
+
+## Status
+Accepted (Supersedes ADR-005)
+
+## Context
+ADR-005 introduced a sandbox execution model based on local git clones and pushes back to a local project directory. While this provided isolation, it had several flaws identified during early adoption:
+1. **Host pollution**: Build dependencies (Node, Go, etc.) had to be installed on the host and were subject to permission issues (e.g., `/root/.nvm` access for `www-data`).
+2. **Fragile Pushes**: Pushing to a checked-out local branch is non-standard and requires risky git configs.
+3. **Resume Divergence**: Resumed tasks bypassed the sandbox, reintroducing corruption risks.
+4. **Scale**: Local directory-based "project selection" is a hack that doesn't scale to multiple repos or environments.
+
+## Decision
+We will move to a containerized execution model where projects are defined by canonical repository URLs and executed in isolated containers.
+
+### 1. Repository-Based Projects
+- The `Task` model now uses `RepositoryURL` as the source of truth for the codebase.
+- This replaces the fragile reliance on local `ProjectDir` paths.
+
+### 2. Containerized Sandboxes
+- Each task execution runs in a fresh container (Docker/Podman).
+- The runner clones the repository into a host-side temporary workspace and mounts it into the container.
+- The container provides a "bare system" with the full build stack (Node, Go, etc.) pre-installed, isolating the host from build dependencies.
+
+### 3. Unified Workspace Management (including RESUME)
+- Unlike ADR-005, the containerized model is designed to handle **Resume** by re-attaching to or re-mounting the same host-side workspace.
+- This ensures that resumed tasks **do not** bypass the sandbox and never land directly in a production directory.
+
+### 4. Push to Actual Remotes
+- Agents commit changes within the sandbox.
+- The runner pushes these commits directly to the `RepositoryURL` (actual remote).
+- If the remote is missing or the push fails, the task is marked `FAILED` and the host-side workspace is preserved for inspection.
+
+## Rationale
+- **Isolation**: Containers prevent host pollution and ensure a consistent build environment.
+- **Safety**: Repository URLs provide a standard way to manage codebases across environments.
+- **Consistency**: Unified workspace management for initial runs and resumes eliminates the behavioral divergence found in ADR-005.
+
+## Consequences
+- Requires a container runtime (Docker) on the host.
+- Requires pre-built agent images (e.g., `claudomator-agent:latest`).
+- **Disk Space Risk**: Host-side clones still consume `/tmp` space. Mitigation requires periodic cleanup of old workspaces or disk-space monitoring.
+- **Git Config**: Repositories no longer require `receive.denyCurrentBranch = updateInstead` because we push to the remote, not a local worktree.
+
+## Relevant Code Locations
+| Concern | File |
+|---|---|
+| Container Lifecycle | `internal/executor/container.go` |
+| Runner Registration | `internal/cli/serve.go` |
+| Task Model | `internal/task/task.go` |
+| API Integration | `internal/api/server.go` |
diff --git a/docs/adr/007-planning-layer-and-story-model.md b/docs/adr/007-planning-layer-and-story-model.md
new file mode 100644
index 0000000..7efb66d
--- /dev/null
+++ b/docs/adr/007-planning-layer-and-story-model.md
@@ -0,0 +1,265 @@
+# ADR-007: Planning Layer, Task Hierarchy, and Story-Gated Deployment
+
+**Status:** Draft
+**Date:** 2026-03-19
+**Context:** Design discussion exploring the integration of Claudomator with Doot and a richer task hierarchy model.
+
+---
+
+## Context
+
+Claudomator currently operates as a flat queue of tasks, each with optional subtasks (`parent_task_id`). There is no concept of grouping tasks into shippable units, no deploy automation, and no integration with personal planning tools. Separately, Doot is a personal dashboard that aggregates tasks, meals, calendar events, and bugs from third-party services (Todoist, Trello, PlanToEat, Google Calendar) into a unified `Atom` model.
+
+The goal of this ADR is to capture a design direction that:
+
+1. Integrates Claudomator into Doot as a first-class data source
+2. Introduces a four-level task hierarchy (Epic → Story → Task → Subtask)
+3. Defines a branching and execution model for stories
+4. Establishes stories as the unit that gates deployment
+
+---
+
+## Decision
+
+### 1. Claudomator as an Atom Source in Doot
+
+Doot already normalizes heterogeneous data sources into a unified `Atom` model (see `internal/models/atom.go`). Claudomator tasks are a natural peer to Todoist and Trello — they are their own source of truth (SQLite, full execution history) and should be surfaced in Doot's aggregation views without duplication elsewhere.
+
+**Design:**
+- Add `SourceClaudomator AtomSource = "claudomator"` to Doot's atom model
+- Implement a Claudomator API client in `internal/api/claudomator.go` (analogous to `todoist.go`, `trello.go`)
+- Map Claudomator tasks to `Atom` with appropriate priority, status, and source icon
+- Individual subtasks are **not** surfaced in the Doot timeline — they are execution-level details, not planning-level items
+
+**Rationale:** Claudomator is a peer to other task sources, not subordinate to them. Users should not need a Todoist card to track agent work — Claudomator is the source of truth for that domain.
+
+---
+
+### 2. Four-Level Task Hierarchy
+
+The current flat model (task + optional subtask) is insufficient for feature-scale work. The following hierarchy is adopted:
+
+| Level | Name | Description |
+|---|---|---|
+| 4 | **Epic** | Large design initiative requiring back-and-forth, resulting in a set of stories. Lives primarily in the planning layer (Doot). Not an execution unit. |
+| 3 | **Story** | A shippable slice of work. Independent and deployable on its own. Groups tasks that together constitute a releasable change. The unit that gates deployment. |
+| 2 | **Task** | A feature- or bug-level unit of work. Individually buildable, but may not make sense to ship alone. Belongs to a story. |
+| 1 | **Subtask** | A discrete, ordered agent action. The actual Claudomator execution unit. Belongs to a task. Performed in sequence. |
+
+**Key properties:**
+- Stories are independently shippable — deployment is gated at this level
+- Tasks are individually buildable but do not gate deployment alone
+- Subtasks are the agent execution primitive — what `ContainerRunner` actually runs
+- Epics are planning artifacts; they live in Doot or a future planning layer, not in Claudomator's execution model
+- Scheduling prefers picking up subtasks from **already-started stories** before beginning new ones (WIP limiting)
+
+**Claudomator data model changes required:**
+- Add `stories` table with deploy configuration and status
+- Add `story_id` to tasks (foreign key to stories)
+- `repository_url` moves from individual tasks to stories (all tasks in a story operate on the same repo)
+- Story status is derived: all tasks completed → story is shippable
+
+---
+
+### 3. Story-Level Branching Model
+
+Each story has a dedicated Git branch. Subtasks execute sequentially, each cloning the repository at the story branch's current HEAD, making commits, and pushing back before the next subtask begins.
+
+**Model:** One branch per story. Fresh clone + container per subtask. Subtasks commit to the story branch in sequence.
+
+**Properties:**
+
+- **Each subtask sees all prior subtask work** — it clones the story branch at HEAD, which includes all previous subtask commits
+- **Clean environment per subtask** — no filesystem state leaks between subtasks; the container is ephemeral
+- **Ordered execution enforced** — subtasks run strictly in order; each depends on the previous commit
+- **Reviewable history** — the story branch accumulates one commit per subtask, giving a clean, auditable record before merge
+- **Clear recovery points** — if subtask N fails, roll back to subtask N-1's commit, fix the subtask definition, rerun
+- **Resilient to transient API failures** — transient rate-limit errors (429) do not fail the story or subtask; the executor requeues the task and "pauses" the story until the agent is unblocked, preserving the sequential chain.
+
+**Tradeoffs accepted:**
+- Clone and container creation cost is paid per subtask (not amortized across the story). Acceptable at current usage scale.
+- No parallelism within a story — subtasks are strictly sequential by design
+- Concurrency lock required at the story level to prevent two subtasks running simultaneously (e.g., on retry races)
+
+**Rejected alternatives:**
+
+*Isolated commit model (fresh clone per subtask, independent branches):* Clean but subtasks cannot build on each other's work. Requires careful branch ordering and merging to assemble a story.
+
+*Persistent workspace per story (one container, one clone for the life of the story):* More efficient, natural continuity, but a bad subtask can corrupt the workspace for subsequent subtasks. Recovery is harder. Loses the discipline of enforced commit points.
+
+### Sequential Subtask Execution
+
+Subtasks within a story execute sequentially. This is enforced via `depends_on` links set automatically at task creation time — each subtask added to a story gets `depends_on: [previous_subtask_id]`, forming a linear chain. The existing pool dependency mechanism handles the rest.
+
+**Rejected alternative — pool-level story concurrency lock:** Would require the executor to become story-aware, lock state would be in-memory (fragile across restarts), and the ordering would be invisible in the data model. The `depends_on` approach is durable, inspectable, and reuses existing infrastructure. The 5-second polling delay between subtasks is an accepted tradeoff.
+
+---
+
+### 4. Story-Gated Deployment and Agent Validation
+
+Deployment is triggered at the story level, not the task or subtask level.
+
+#### State Machine
+
+```
+PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY
+ ↘ NEEDS_FIX → IN_PROGRESS (retry)
+```
+
+- **SHIPPABLE:** All tasks completed. Ready to merge and deploy.
+- **DEPLOYED:** Merged to main, deploy triggered.
+- **VALIDATING:** Validation agent is running.
+- **REVIEW_READY:** Validation passed. Awaiting human sign-off.
+- **NEEDS_FIX:** Validation failed. Story returns to `IN_PROGRESS` with the validation report attached.
+
+#### Merge Strategy
+
+Merge to main first, then validate against the live deployment. No branch review phase — tests are the confidence mechanism. If test coverage is insufficient for a given story, the implementor is responsible for adding tests before marking it shippable. Branch review may be introduced later if needed.
+
+#### Deploy Configuration
+
+Stored on the story. Two project types are handled:
+
+| Project | Deploy trigger | What "deployed" means |
+|---|---|---|
+| claudomator | `git push` to local bare repo → systemd pulls and restarts | Live at `doot.terst.org` |
+| nav (Android) | `git push` to GitHub → CI build action fires | APK distributed to testers via Play Store testing track |
+
+For nav, Claudomator does not interact with GitHub CI directly — it pushes the branch/commits; the CI action is an external trigger. "Deployed" is declared once the push succeeds; the CI result is not polled.
+
+#### Agent Validation
+
+After a story is deployed, a validation subtask is automatically created. The elaborator is responsible for specifying how validation should be performed — it has full context of what changed and can prescribe the appropriate check level.
+
+**Validation spec** (produced by elaborator, stored on the story):
+
+```yaml
+validation:
+ type: curl # curl | tests | playwright | gradle
+ steps:
+ - "GET /api/stats — expect 200, body contains throughput[]"
+ - "GET /api/agents/status — expect agents array non-empty"
+ success_criteria: "All steps return expected responses with correct structure"
+```
+
+**Validation types by project:**
+
+| Type | When to use | What the agent does |
+|---|---|---|
+| `curl` | API changes, data model additions, simple UI text | HTTP requests, check status codes and response shape |
+| `tests` | Logic changes with existing test coverage | Runs the project test suite against the live deployment or codebase |
+| `playwright` | Subtle UI changes, interactive flows, visual correctness | Browser automation against the deployed URL |
+| `gradle` | nav (Android) — any change | `./gradlew test`, `./gradlew lint`; optionally `./gradlew assembleDebug` |
+
+The elaborator selects `type` based on change scope. Curl is the default for small targeted changes; playwright is reserved for changes where visual or interactive correctness cannot be inferred from API responses alone.
+
+**Validation agent inputs:**
+- The validation spec (type, steps, success_criteria)
+- Deployed URL or project path
+- Summary of what changed (story name + task list)
+
+**Validation agent outputs:**
+- Structured pass/fail per step
+- Evidence (response bodies, test output excerpts, screenshots for playwright)
+- Overall verdict: pass → story moves to `REVIEW_READY`; fail → story moves to `NEEDS_FIX` with report attached
+
+Validation subtasks are governed by the same pool-level rate-limit resilience; a 429 during validation will requeue the subtask rather than failing the story.
+
+#### Failure Recovery
+
+If a subtask fails mid-story: pause the story and require human review before resuming. The options at that point are:
+- Roll back to the previous subtask's commit and retry
+- Amend the subtask definition and requeue
+
+Policy beyond this is deferred until failure patterns are observed in practice.
+
+---
+
+## Consequences
+
+**Claudomator changes:**
+- New `stories` table: `id, name, branch_name, project_id, deploy_config, validation_json, status`
+- New `projects` table: `id, name, remote_url, local_path, type, deploy_script`
+- `tasks.story_id` FK; `repository_url` removed from tasks (inherited from story → project)
+- Sequential subtask ordering via auto-wired `depends_on` at task creation time
+- Post-task-completion check: all story tasks COMPLETED → story transitions to SHIPPABLE → merge + deploy trigger
+- Post-deploy: auto-create validation subtask from story's `validation_json` spec
+- Validation subtask completes → story transitions to REVIEW_READY or NEEDS_FIX
+- Story state machine: PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY | NEEDS_FIX
+- `ContainerRunner`: clone at story branch HEAD; push back to story branch after each subtask
+- Deployment status check moves from task level to story level
+- Elaborator output extended: `validation` block (type, steps, success_criteria) stored as `validation_json` on story
+- Remove `Agent.RepositoryURL`, `Agent.ProjectDir` legacy fields, `skip_planning`, `fallbackGitInit()`
+- Remove duplicate changestats extraction (keep pool-side, remove API server-side)
+- Pool-level "requeue-and-skip" logic for rate-limited agents: tasks return to `QUEUED` and release worker slots if all candidate agents are blocked, allowing the system to "wait out" 429 errors without failing stories.
+- Background "Recovery Scheduler" goroutine: periodically (every 30m or as hinted by API) runs minimal "test tasks" to verify agent availability and unblock the pool.
+
+**Doot changes:**
+- New `SourceClaudomator` atom source
+- Claudomator API client (`internal/api/claudomator.go`)
+- Story → Atom mapper (title = story name, description = task progress e.g. "3/5 tasks done", priority from story config, deploy status)
+- Task → Atom mapper (optional, feature-level visibility)
+- Individual subtasks explicitly excluded from all views
+
+**Doot removals (dead code / superseded):**
+- `bugs` table, `BugToAtom`, `SourceBug`, `TypeBug` — bug reporting becomes a thin UI shim that submits to Claudomator; nothing stored in Doot's data model
+- `notes` table and all Obsidian skeleton code — never wired up
+- `AddMealToPlanner()` stub — never called
+- `UpdateCard()` stub — never called
+- All bug handlers, templates, and store methods
+
+**Planning layer (future):**
+- Epics live here, not in Claudomator
+- Story creation via elaboration + validation flow (see below)
+- WIP-limiting scheduler that prefers subtasks from started stories
+
+### Story Creation: Elaboration and Validation Flow
+
+Story creation is driven by a beefed-up version of Claudomator's existing elaboration and validation pipeline, not a YAML file or form.
+
+**Flow:**
+1. User describes the story goal (rough, high-level) in the UI
+2. Elaboration agent runs against a **local working copy** of the project (read-only mount, no clone) — reads the codebase, understands current state, produces story + task + subtask breakdown with `depends_on` chain wired
+3. Validation agent checks the structure: tasks are independently buildable, subtasks properly scoped, story has a clear shippable definition, no dependency cycles
+4. User reviews and approves in the UI
+5. On approval: story branch created (`git checkout -b story/xxx origin/main`, pushed to remote); subtasks queued
+
+**Responsiveness:**
+- Elaboration uses a local working copy — no clone cost, near-instant container start
+- A `git fetch` (not pull) at elaboration start updates remote refs without touching the working tree
+- Branch creation is deferred to approval — elaboration agent is purely read-only
+- Execution clones use `git clone --reference /local/path <remote>` — reuses local object store, fetches only the delta; significantly faster than cold clone
+- Rate-limit aware — if the elaboration agent is blocked, the UI surfaces the status and resumes automatically once unblocked via the Recovery Scheduler.
+
+### Project Registry
+
+The local working copy model requires a formal project concept. A `projects` table replaces the current ad-hoc `repository_url` + `working_dir` fields:
+
+| Field | Purpose |
+|---|---|
+| `id` | UUID |
+| `name` | Human-readable label |
+| `remote_url` | Git remote (clone target for execution) |
+| `local_path` | Local working copy path (read cache for elaboration, object store for `--reference` clones) |
+| `type` | `web` \| `android` — controls available validation types and deploy semantics |
+| `deploy_script` | Optional path to project-specific deploy script |
+
+`repository_url` on stories becomes a FK to `projects`. The existing `project` string field on tasks (currently just a label) is replaced by `project_id`. `Agent.RepositoryURL`, `Agent.ProjectDir`, and `Task.RepositoryURL` are all removed — project is the single source of truth for repo location.
+
+**Initial registered projects:**
+
+| Name | Local path | Remote | Type |
+|---|---|---|---|
+| claudomator | `/workspace/claudomator` | local bare repo | web |
+| nav | `/workspace/nav` | GitHub | android |
+
+---
+
+## Out of Scope (for now)
+
+- Voice interface (noted as a future exploration, not an architectural requirement)
+- Epic management tooling
+- Parallelism within stories
+- Branch review before merge — deferred; merge-first is the current strategy. May be revisited if confidence requires it.
+- Polling GitHub CI result for nav deploys — Claudomator declares "deployed" on push success; CI outcome is out of band
+- ADB / emulator-based UI validation for nav — `gradle` type covers unit and integration tests; device UI testing deferred
diff --git a/docs/architecture.md b/docs/architecture.md
deleted file mode 100644
index 27c7601..0000000
--- a/docs/architecture.md
+++ /dev/null
@@ -1,392 +0,0 @@
-# Claudomator — System Architecture
-
-## 1. System Purpose and Design Goals
-
-Claudomator is a local developer tool that captures tasks, dispatches them to AI agents (Claude,
-Gemini), and reports results. Its primary use case is unattended, automated execution of agent
-tasks with real-time status streaming to a mobile Progressive Web App.
-
-**Design goals:**
-
-- **Single binary, zero runtime deps** — Go for the backend; CGo only for SQLite.
-- **Bounded concurrency** — a pool of goroutines prevents unbounded subprocess spawning.
-- **Durability** — all task state survives server restarts (SQLite + WAL mode).
-- **Real-time feedback** — WebSocket pushes task completion events to connected clients.
-- **Multi-agent routing** — deterministic load balancing across Claude and Gemini; AI-driven
- model-tier selection via a cheap Gemini classifier.
-- **Git-isolated execution** — tasks that modify source code run in a temporary git clone
- (sandbox) to prevent concurrent corruption and partial-work leakage.
-- **Review gate** — top-level tasks wait in `READY` state for operator accept/reject before
- reaching `COMPLETED`.
-
-See [ADR-001](adr/001-language-and-architecture.md) for the language and architecture rationale.
-
----
-
-## 2. High-Level Architecture
-
-```mermaid
-flowchart TD
- CLI["CLI\ncmd/claudomator\n(cobra)"]
- API["HTTP API\ninternal/api\nREST + WebSocket"]
- Pool["Executor Pool\ninternal/executor.Pool\n(bounded goroutines)"]
- Classifier["Gemini Classifier\ninternal/executor.Classifier"]
- ClaudeRunner["ClaudeRunner\ninternal/executor.ClaudeRunner"]
- GeminiRunner["GeminiRunner\ninternal/executor.GeminiRunner"]
- Sandbox["Git Sandbox\n/tmp/claudomator-sandbox-*"]
- Subprocess["AI subprocess\nclaude -p / gemini"]
- SQLite["SQLite\ntasks.db"]
- LogFiles["Log Files\n~/.claudomator/executions/"]
- Hub["WebSocket Hub\ninternal/api.Hub"]
- Clients["Clients\nbrowser / mobile PWA"]
- Notifier["Notifier\ninternal/notify"]
-
- CLI -->|run / serve| API
- CLI -->|direct YAML run| Pool
- API -->|POST /api/tasks/run| Pool
- Pool -->|pickAgent + Classify| Classifier
- Pool -->|Submit / SubmitResume| ClaudeRunner
- Pool -->|Submit / SubmitResume| GeminiRunner
- ClaudeRunner -->|git clone| Sandbox
- Sandbox -->|claude -p| Subprocess
- GeminiRunner -->|gemini| Subprocess
- Subprocess -->|stream-json stdout| LogFiles
- Pool -->|UpdateTaskState| SQLite
- Pool -->|resultCh| API
- API -->|Broadcast| Hub
- Hub -->|fan-out| Clients
- API -->|Notify| Notifier
-```
-
----
-
-## 3. Component Table
-
-| Package | Role | Key Exported Types |
-|---|---|---|
-| `internal/task` | `Task` struct, YAML parsing, state machine, validation | `Task`, `AgentConfig`, `RetryConfig`, `State`, `Priority`, `ValidTransition` |
-| `internal/executor` | Bounded goroutine pool; subprocess manager; multi-agent routing; classification | `Pool`, `ClaudeRunner`, `GeminiRunner`, `Classifier`, `Runner`, `Result`, `BlockedError`, `QuestionRegistry` |
-| `internal/storage` | SQLite wrapper; auto-migrating schema; all task and execution CRUD | `DB`, `Execution`, `TaskFilter` |
-| `internal/api` | HTTP server (REST + WebSocket); result forwarding; elaboration; log streaming | `Server`, `Hub` |
-| `internal/reporter` | Formats and emits execution results (text, HTML) | `Reporter`, `TextReporter`, `HTMLReporter` |
-| `internal/config` | TOML config loading; data-directory layout | `Config` |
-| `internal/cli` | Cobra CLI commands (`run`, `serve`, `list`, `status`, `init`) | `RootCmd` |
-| `internal/notify` | Webhook notifier for task completion events | `Notifier`, `WebhookNotifier`, `Event` |
-| `web` | Embedded PWA static files (served by `internal/api`) | `Files` (embed.FS) |
-| `version` | Build-time version string | `Version` |
-
----
-
-## 4. Package Dependency Graph
-
-```mermaid
-graph LR
- cli["internal/cli"]
- api["internal/api"]
- executor["internal/executor"]
- storage["internal/storage"]
- task["internal/task"]
- config["internal/config"]
- reporter["internal/reporter"]
- notify["internal/notify"]
- web["web"]
- version["version"]
-
- cli --> api
- cli --> executor
- cli --> storage
- cli --> config
- cli --> version
- api --> executor
- api --> storage
- api --> task
- api --> notify
- api --> web
- executor --> storage
- executor --> task
- reporter --> task
- reporter --> storage
- storage --> task
-```
-
----
-
-## 5. Task Execution Pipeline
-
-The following numbered steps trace a task from API submission to final state, with file and line
-references to the key logic in each step.
-
-1. **Task creation** — `POST /api/tasks` calls `task.Validate` and `storage.DB.CreateTask`.
- Task is written to SQLite in `PENDING` state.
- (`internal/api/server.go:349`, `internal/task/parse.go`, `internal/storage/db.go`)
-
-2. **Run request** — `POST /api/tasks/{id}/run` calls `storage.DB.ResetTaskForRetry` (validates
- the `PENDING → QUEUED` transition) then `executor.Pool.Submit`.
- (`internal/api/server.go:460`, `internal/executor/executor.go:125`)
-
-3. **Pool dispatch** — The `dispatch` goroutine reads from `workCh`, waits for a free slot
- (blocks on `doneCh` if at capacity), then spawns `go execute(ctx, task)`.
- (`internal/executor/executor.go:102`)
-
-4. **Agent selection** — `pickAgent(SystemStatus)` selects the available agent with the fewest
- active tasks (deterministic, no I/O). If the pool has a `Classifier`, it invokes
- `Classifier.Classify` (one Gemini API call) to select the model tier; failures are non-fatal.
- (`internal/executor/executor.go:349`, `internal/executor/executor.go:396`)
-
-5. **Dependency wait** — If `t.DependsOn` is non-empty, `waitForDependencies` polls SQLite
- every 5 s until all dependencies reach `COMPLETED` (or a terminal failure state).
- (`internal/executor/executor.go:642`)
-
-6. **Execution record created** — A new `storage.Execution` row is inserted with `RUNNING`
- status. Log paths (`stdout.log`, `stderr.log`) are pre-populated via `LogPather` so they are
- immediately available for tailing.
- (`internal/executor/executor.go:483`)
-
-7. **Subprocess launch** — `ClaudeRunner.Run` (or `GeminiRunner.Run`) builds the CLI argument
- list and calls `exec.CommandContext`. If `project_dir` is set and this is not a resume
- execution, `setupSandbox` clones the project to `/tmp/claudomator-sandbox-*` first.
- (`internal/executor/claude.go:63`, `internal/executor/claude.go:setupSandbox`)
-
-8. **Output streaming** — stdout is written to `<LogDir>/<execID>/stdout.log`; the runner
- concurrently parses the `stream-json` lines for cost and session ID.
- (`internal/executor/claude.go`)
-
-9. **Execution outcome → state** — After the subprocess exits, `handleRunResult` maps the error
- type to a final task state and calls `storage.DB.UpdateTaskState`.
- (`internal/executor/executor.go:256`)
-
- | Outcome | Final state |
- |---|---|
- | `runner.Run` → `nil`, top-level, no subtasks | `READY` |
- | `runner.Run` → `nil`, top-level, has subtasks | `BLOCKED` |
- | `runner.Run` → `nil`, subtask | `COMPLETED` |
- | `runner.Run` → `*BlockedError` (question file) | `BLOCKED` |
- | `ctx.Err() == DeadlineExceeded` | `TIMED_OUT` |
- | `ctx.Err() == Canceled` | `CANCELLED` |
- | quota exhausted | `BUDGET_EXCEEDED` |
- | any other error | `FAILED` |
-
-10. **Result broadcast** — The pool emits a `*Result` to `resultCh`. `Server.forwardResults`
- reads it, marshals a `task_completed` JSON event, and calls `hub.Broadcast`.
- (`internal/api/server.go:123`, `internal/api/server.go:129`)
-
-11. **Sandbox teardown** — If a sandbox was used and no uncommitted changes remain,
- `teardownSandbox` removes the temp directory. If uncommitted changes are detected, the task
- fails and the sandbox is preserved for inspection.
- (`internal/executor/claude.go:teardownSandbox`)
-
-12. **Review gate** — Operator calls `POST /api/tasks/{id}/accept` (`READY → COMPLETED`) or
- `POST /api/tasks/{id}/reject` (`READY → PENDING`).
- (`internal/api/server.go:487`, `internal/api/server.go:507`)
-
----
-
-## 6. Task State Machine
-
-```mermaid
-stateDiagram-v2
- [*] --> PENDING : task created
-
- PENDING --> QUEUED : POST /run
- PENDING --> CANCELLED : POST /cancel
-
- QUEUED --> RUNNING : pool goroutine starts
- QUEUED --> CANCELLED : POST /cancel
-
- RUNNING --> READY : exit 0, top-level, no subtasks
- RUNNING --> BLOCKED : exit 0, top-level, has subtasks
- RUNNING --> BLOCKED : question.json written
- RUNNING --> COMPLETED : exit 0, subtask
- RUNNING --> FAILED : exit non-zero / stream error
- RUNNING --> TIMED_OUT : context deadline exceeded
- RUNNING --> CANCELLED : context cancelled
- RUNNING --> BUDGET_EXCEEDED : quota exhausted
-
- READY --> COMPLETED : POST /accept
- READY --> PENDING : POST /reject
-
- BLOCKED --> QUEUED : POST /answer
- BLOCKED --> READY : all subtasks COMPLETED
-
- FAILED --> QUEUED : POST /run (retry)
- TIMED_OUT --> QUEUED : POST /resume
- CANCELLED --> QUEUED : POST /run (restart)
- BUDGET_EXCEEDED --> QUEUED : POST /run (retry)
-
- COMPLETED --> [*]
-```
-
-**State definitions** (`internal/task/task.go:9`):
-
-| State | Meaning |
-|---|---|
-| `PENDING` | Created; not yet submitted for execution |
-| `QUEUED` | Submitted to pool; waiting for a goroutine slot |
-| `RUNNING` | Subprocess actively executing |
-| `READY` | Top-level task done; awaiting operator accept/reject |
-| `COMPLETED` | Fully done (only true terminal state) |
-| `FAILED` | Execution error; eligible for retry |
-| `TIMED_OUT` | Exceeded configured timeout; resumable |
-| `CANCELLED` | Explicitly cancelled by operator |
-| `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` |
-| `BLOCKED` | Agent wrote a `question.json`, or parent waiting for subtasks |
-
-`ValidTransition(from, to State) bool` enforces the allowed edges at runtime before every state
-write. (`internal/task/task.go:113`)
-
----
-
-## 7. WebSocket Broadcast Flow
-
-```mermaid
-sequenceDiagram
- participant Runner as ClaudeRunner / GeminiRunner
- participant Pool as executor.Pool
- participant API as api.Server (forwardResults)
- participant Hub as api.Hub
- participant Client1 as WebSocket client 1
- participant Client2 as WebSocket client 2
-
- Runner->>Pool: runner.Run() returns
- Pool->>Pool: handleRunResult() — sets exec.Status
- Pool->>SQLite: UpdateTaskState + UpdateExecution
- Pool->>Pool: resultCh <- &Result{...}
- API->>Pool: <-pool.Results()
- API->>API: marshal task_completed JSON event
- API->>Hub: hub.Broadcast(data)
- Hub->>Client1: ws.Write(data)
- Hub->>Client2: ws.Write(data)
- API->>Notifier: notifier.Notify(Event{...}) [if set]
-```
-
-**Event payload** (JSON):
-```json
-{
- "type": "task_completed",
- "task_id": "<uuid>",
- "status": "READY | COMPLETED | FAILED | ...",
- "exit_code": 0,
- "cost_usd": 0.042,
- "error": "",
- "timestamp": "2026-03-11T12:00:00Z"
-}
-```
-
-The `Hub` also emits `task_question` events via `Server.BroadcastQuestion` when an agent uses
-an interactive question tool (currently unused in the primary file-based `BLOCKED` flow).
-
-WebSocket endpoint: `GET /api/ws`. Supports optional bearer-token auth when `--api-token` is
-configured. Up to 1000 concurrent clients; periodic 30-second pings detect dead connections.
-(`internal/api/websocket.go`)
-
----
-
-## 8. Subtask / Parent-Task Dependency Resolution
-
-Claudomator supports two distinct dependency mechanisms:
-
-### 8a. `depends_on` — explicit task ordering
-
-Tasks declare `depends_on: [<task-id>, ...]` in their YAML or creation payload. When the
-executor goroutine starts, `waitForDependencies` polls SQLite every 5 seconds until all listed
-tasks reach `COMPLETED`. If any dependency reaches a terminal failure state (`FAILED`,
-`TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`), the waiting task transitions to `FAILED`
-immediately. (`internal/executor/executor.go:642`)
-
-### 8b. Parent / subtask blocking (`BLOCKED` state)
-
-A top-level task that creates subtasks (tasks with `parent_task_id` pointing back to it)
-transitions to `BLOCKED` — not `READY` — when its own runner exits successfully. This allows
-an agent to dispatch subtasks and then wait for them.
-
-```
-Parent task runner exits 0
- ├── no subtasks → READY (normal review gate)
- └── has subtasks → BLOCKED (waiting for subtasks)
- │
- Each subtask completes → COMPLETED
- │
- All subtasks COMPLETED?
- ├── yes → maybeUnblockParent() → parent READY
- └── no → parent stays BLOCKED
-```
-
-`maybeUnblockParent(parentID)` is called every time a subtask transitions to `COMPLETED`. It
-loads all subtasks and checks whether every one is `COMPLETED`. If so, it calls
-`UpdateTaskState(parentID, StateReady)`. (`internal/executor/executor.go:616`)
-
-The `BLOCKED` state also covers the interactive question flow: when `ClaudeRunner.Run` detects
-a `question.json` file in the execution log directory, it returns a `*BlockedError` containing
-the question JSON and session ID. The pool stores the question in the `tasks.question_json`
-column and the session ID in the `executions.session_id` column. `POST /api/tasks/{id}/answer`
-resumes the task by calling `pool.SubmitResume` with a new `Execution` carrying
-`ResumeSessionID` and `ResumeAnswer`. (`internal/executor/claude.go:103`,
-`internal/api/server.go:221`)
-
----
-
-## 9. External Go Dependencies
-
-| Module | Version | Purpose |
-|---|---|---|
-| `github.com/mattn/go-sqlite3` | v1.14.33 | CGo SQLite driver (requires C compiler) |
-| `github.com/google/uuid` | v1.6.0 | UUID generation for task and execution IDs |
-| `github.com/spf13/cobra` | v1.10.2 | CLI command framework |
-| `github.com/BurntSushi/toml` | v1.6.0 | TOML config file parsing |
-| `golang.org/x/net` | v0.49.0 | `golang.org/x/net/websocket` — WebSocket server |
-| `gopkg.in/yaml.v3` | v3.0.1 | YAML task definition parsing |
-
----
-
-## 10. Related Documentation
-
-### Architectural Decision Records
-
-| ADR | Title | Summary |
-|---|---|---|
-| [ADR-001](adr/001-language-and-architecture.md) | Go + SQLite + WebSocket Architecture | Language choice, pipeline design, storage and API rationale |
-| [ADR-002](adr/002-task-state-machine.md) | Task State Machine Design | All 10 states, transition table, side effects, known edge cases |
-| [ADR-003](adr/003-security-model.md) | Security Model | Trust boundary, no-auth posture, known risks, hardening checklist |
-| [ADR-004](adr/004-multi-agent-routing-and-classification.md) | Multi-Agent Routing | `pickAgent` load balancing, Gemini-based model classifier |
-| [ADR-005](adr/005-sandbox-execution-model.md) | Git Sandbox Execution Model | Isolated git clone per task, push-back flow, BLOCKED preservation |
-
-### Package-Level Docs
-
-Per-package design notes live in [`docs/packages/`](packages/) (in progress).
-
-### Task YAML Reference
-
-```yaml
-name: "My Task"
-agent:
- type: "claude" # "claude" | "gemini"; optional — load balancer may override
- model: "sonnet" # model tier hint; optional — classifier may override
- instructions: |
- Do something useful.
- project_dir: "/workspace/myproject" # if set, runs in a git sandbox
- max_budget_usd: 1.00
- permission_mode: "bypassPermissions" # default
- allowed_tools: ["Bash", "Read"]
- context_files: ["README.md"]
-timeout: "15m"
-priority: "normal" # high | normal | low
-tags: ["ci", "backend"]
-depends_on: ["<task-uuid>"] # explicit ordering
-parent_task_id: "<task-uuid>" # set by parent agent when creating subtasks
-```
-
-Batch files wrap multiple tasks under a `tasks:` key and are accepted by `claudomator run`.
-
-### Storage Schema
-
-Two tables auto-migrated on `storage.Open()`:
-
-- **`tasks`** — `id`, `name`, `description`, `config_json` (AgentConfig), `priority`,
- `timeout_ns`, `retry_json`, `tags_json`, `depends_on_json`, `parent_task_id`, `state`,
- `question_json`, `rejection_comment`, `created_at`, `updated_at`
-- **`executions`** — `id`, `task_id`, `start_time`, `end_time`, `exit_code`, `status`,
- `stdout_path`, `stderr_path`, `artifact_dir`, `cost_usd`, `error_msg`, `session_id`,
- `resume_session_id`, `resume_answer`
-
-Indexed columns: `tasks.state`, `tasks.parent_task_id`, `executions.task_id`,
-`executions.status`, `executions.start_time`.
diff --git a/docs/superpowers/plans/2026-04-03-task-project-fk.md b/docs/superpowers/plans/2026-04-03-task-project-fk.md
new file mode 100644
index 0000000..36e6a18
--- /dev/null
+++ b/docs/superpowers/plans/2026-04-03-task-project-fk.md
@@ -0,0 +1,837 @@
+# Task → Project FK Migration Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Replace the loose `project TEXT` and `repository_url TEXT` fields on Task with a proper `project_id TEXT` FK to the projects table, and remove the runtime-populated `BranchName` field (it belongs on Story).
+
+**Architecture:** Add `project_id` column to tasks table via additive migration; backfill from project name lookup via SQL; update `scanTask` to LEFT JOIN projects and resolve `repository_url` as `COALESCE(p.remote_url, t.repository_url)` so orphan (webhook) tasks still work; strip `Project`, `BranchName` from the Task struct and the three ADR-007 runtime patches that compensated for the old design.
+
+**Tech Stack:** Go, SQLite (database/sql + mattn/go-sqlite3), standard library only.
+
+---
+
+## File Map
+
+| File | Change |
+|------|--------|
+| `internal/task/task.go` | Remove `Project`, `RepositoryURL`, `BranchName` fields; add `ProjectID` |
+| `internal/task/validator.go` | Drop `repository_url` required; require at least one of `project_id` / `repository_url` |
+| `internal/task/validator_test.go` | Update `validTask()` helper |
+| `internal/task/task_test.go` | `Project` → `ProjectID` references |
+| `internal/storage/db.go` | Add migration + backfill; define `taskSelectSQL` helper with LEFT JOIN; update `scanTask`, `CreateTask`, `UpdateTask`; add `GetProjectByName` |
+| `internal/storage/db_test.go` | Update project field references in task tests |
+| `internal/executor/executor.go` | Remove four ADR-007 patches (RepositoryURL ×2, BranchName ×2) |
+| `internal/executor/container.go` | Remove `t.BranchName` fallback; always resolve from story |
+| `internal/executor/container_test.go` | Remove `BranchName` from direct task construction; test via story instead |
+| `internal/executor/executor_test.go` | Minor: `Project` → `ProjectID` in any task literals |
+| `internal/api/server.go` | `handleCreateTask`: accept `project_id`; drop `project`/`repository_url` input fields |
+| `internal/api/webhook.go` | `createCIFailureTask`: use `GetProjectByName` to set `project_id`; keep `repository_url` fallback when no DB project matches |
+| `internal/api/task_view.go` | No change — `RepositoryURL` still populated by JOIN |
+| `internal/api/server_test.go` | Replace `repository_url`/`project` in JSON payloads with `project_id` |
+| `internal/api/webhook_test.go` | Seed DB project before assertions; check `ProjectID` not `RepositoryURL` directly |
+| `internal/cli/list.go` | `t.Project` → `t.ProjectID` |
+| `internal/cli/status.go` | `t.Project` → `t.ProjectID` |
+
+---
+
+## Task 1: Update Task struct and validator
+
+**Files:**
+- Modify: `internal/task/task.go`
+- Modify: `internal/task/validator.go`
+
+- [ ] **Step 1: Write failing validator tests**
+
+In `internal/task/validator_test.go`, replace the existing `validTask()` helper and add a new failing test:
+
+```go
+func validTask() *Task {
+ return &Task{
+ ID: "test-id",
+ Name: "Valid Task",
+ ProjectID: "proj-1",
+ Agent: AgentConfig{
+ Type: "claude",
+ Instructions: "do something",
+ },
+ Retry: RetryConfig{MaxAttempts: 1, Backoff: "linear"},
+ Priority: PriorityNormal,
+ }
+}
+
+func TestValidate_MissingProjectIDAndRepositoryURL(t *testing.T) {
+ tk := validTask()
+ tk.ProjectID = ""
+ if err := Validate(tk); err == nil {
+ t.Error("expected error for missing project_id and repository_url")
+ }
+}
+
+func TestValidate_RepositoryURLAloneIsValid(t *testing.T) {
+ tk := validTask()
+ tk.ProjectID = ""
+ tk.RepositoryURL = "https://github.com/owner/repo.git"
+ if err := Validate(tk); err != nil {
+ t.Errorf("expected no error with repository_url set, got: %v", err)
+ }
+}
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```
+go test ./internal/task/... 2>&1 | grep -E "FAIL|PASS|error"
+```
+Expected: compile error — `ProjectID` undefined on Task.
+
+- [ ] **Step 3: Update `internal/task/task.go`**
+
+Replace lines 76-85:
+```go
+// Before:
+Project string `yaml:"project" json:"project"`
+RepositoryURL string `yaml:"repository_url" json:"repository_url"`
+// ...
+BranchName string `yaml:"-" json:"branch_name,omitempty"`
+
+// After:
+ProjectID string `yaml:"project_id" json:"project_id"`
+RepositoryURL string `yaml:"-" json:"repository_url,omitempty"` // derived at load time; not stored
+```
+
+Full replacement for lines 71-94:
+```go
+type Task struct {
+ ID string `yaml:"id" json:"id"`
+ ParentTaskID string `yaml:"parent_task_id" json:"parent_task_id"`
+ Name string `yaml:"name" json:"name"`
+ Description string `yaml:"description" json:"description"`
+ ProjectID string `yaml:"project_id" json:"project_id"`
+ RepositoryURL string `yaml:"-" json:"repository_url,omitempty"`
+ Agent AgentConfig `yaml:"agent" json:"agent"`
+ Timeout Duration `yaml:"timeout" json:"timeout"`
+ Retry RetryConfig `yaml:"retry" json:"retry"`
+ Priority Priority `yaml:"priority" json:"priority"`
+ Tags []string `yaml:"tags" json:"tags"`
+ DependsOn []string `yaml:"depends_on" json:"depends_on"`
+ StoryID string `yaml:"-" json:"story_id,omitempty"`
+ State State `yaml:"-" json:"state"`
+ RejectionComment string `yaml:"-" json:"rejection_comment,omitempty"`
+ QuestionJSON string `yaml:"-" json:"question,omitempty"`
+ ElaborationInput string `yaml:"-" json:"elaboration_input,omitempty"`
+ Summary string `yaml:"-" json:"summary,omitempty"`
+ Interactions []Interaction `yaml:"-" json:"interactions,omitempty"`
+ CreatedAt time.Time `yaml:"-" json:"created_at"`
+ UpdatedAt time.Time `yaml:"-" json:"updated_at"`
+}
+```
+
+- [ ] **Step 4: Update `internal/task/validator.go`**
+
+Replace the `repository_url` check (lines 32-34) with:
+```go
+if t.ProjectID == "" && t.RepositoryURL == "" {
+ ve.Add("project_id or repository_url is required")
+}
+```
+
+- [ ] **Step 5: Fix `internal/task/task_test.go`**
+
+Find lines that reference `task.Project` and replace with `task.ProjectID`:
+```go
+// Line ~106:
+task := Task{ProjectID: "my-project"}
+if task.ProjectID != "my-project" {
+ t.Errorf("expected ProjectID 'my-project', got %q", task.ProjectID)
+}
+// Line ~126:
+if tasks[0].ProjectID != "my-project" {
+ t.Errorf("expected ProjectID 'my-project', got %q", tasks[0].ProjectID)
+}
+```
+
+- [ ] **Step 6: Run tests**
+
+```
+go test ./internal/task/... 2>&1
+```
+Expected: all pass.
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add internal/task/task.go internal/task/validator.go internal/task/validator_test.go internal/task/task_test.go
+git commit -m "refactor: replace Task.Project+RepositoryURL+BranchName with ProjectID FK"
+```
+
+---
+
+## Task 2: Storage — migration, LEFT JOIN queries, scanTask
+
+**Files:**
+- Modify: `internal/storage/db.go`
+
+- [ ] **Step 1: Write failing storage test**
+
+In `internal/storage/db_test.go`, find the test around line 1017 (`TestTask_ProjectRoundTrip` or similar) and update it to use `ProjectID`:
+
+```go
+func TestTask_ProjectIDRoundTrip(t *testing.T) {
+ db := testDB(t)
+
+ // Seed a project so the FK resolves.
+ proj := &task.Project{
+ ID: "proj-rt", Name: "roundtrip-proj",
+ RemoteURL: "https://github.com/owner/rt.git",
+ Type: "web",
+ }
+ if err := db.UpsertProject(proj); err != nil {
+ t.Fatalf("UpsertProject: %v", err)
+ }
+
+ now := time.Now().UTC().Truncate(time.Second)
+ tk := &task.Task{
+ ID: "proj-task-1",
+ Name: "Task with project",
+ ProjectID: "proj-rt",
+ Agent: task.AgentConfig{Type: "claude", Instructions: "do x"},
+ Priority: task.PriorityNormal,
+ Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"},
+ Tags: []string{},
+ DependsOn: []string{},
+ State: task.StatePending,
+ CreatedAt: now,
+ UpdatedAt: now,
+ }
+ if err := db.CreateTask(tk); err != nil {
+ t.Fatalf("CreateTask: %v", err)
+ }
+
+ got, err := db.GetTask("proj-task-1")
+ if err != nil {
+ t.Fatalf("GetTask: %v", err)
+ }
+ if got.ProjectID != "proj-rt" {
+ t.Errorf("ProjectID: want %q, got %q", "proj-rt", got.ProjectID)
+ }
+ // RepositoryURL should be derived from the project via JOIN.
+ if got.RepositoryURL != "https://github.com/owner/rt.git" {
+ t.Errorf("RepositoryURL: want derived from project, got %q", got.RepositoryURL)
+ }
+}
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```
+go test ./internal/storage/... 2>&1 | grep -E "FAIL|error|undefined"
+```
+Expected: compile errors — `ProjectID` undefined on task struct (old DB code still references `Project`).
+
+- [ ] **Step 3: Add `project_id` migration and backfill to `db.go`**
+
+In `migrate()`, append to the `migrations` slice (after the existing `story_id` migration):
+```go
+`ALTER TABLE tasks ADD COLUMN project_id TEXT`,
+// Backfill project_id from the project name stored in the legacy 'project' column.
+`UPDATE tasks SET project_id = (SELECT id FROM projects WHERE name = tasks.project) WHERE project IS NOT NULL AND project != '' AND project_id IS NULL`,
+```
+
+- [ ] **Step 4: Add `taskSelectSQL` constant and update `scanTask`**
+
+Add a package-level constant just before `scanTask`:
+```go
+// taskSelectSQL is the base SELECT for all task queries.
+// It LEFT JOINs projects so that RepositoryURL is always resolved:
+// project-linked tasks use p.remote_url; orphan tasks fall back to t.repository_url.
+const taskSelectSQL = `
+SELECT t.id, t.name, t.description, t.elaboration_input, t.project_id,
+ COALESCE(p.remote_url, t.repository_url, '') AS resolved_url,
+ t.config_json, t.priority, t.timeout_ns, t.retry_json, t.tags_json,
+ t.depends_on_json, t.parent_task_id, t.state, t.created_at, t.updated_at,
+ t.rejection_comment, t.question_json, t.summary, t.interactions_json, t.story_id
+FROM tasks t
+LEFT JOIN projects p ON p.id = t.project_id`
+```
+
+Update `scanTask` to match the new column order (21 columns):
+```go
+func scanTask(row scanner) (*task.Task, error) {
+ var (
+ t task.Task
+ configJSON string
+ retryJSON string
+ tagsJSON string
+ depsJSON string
+ state string
+ priority string
+ timeoutNS int64
+ projectID sql.NullString
+ resolvedURL sql.NullString
+ parentTaskID sql.NullString
+ elaborationInput sql.NullString
+ rejectionComment sql.NullString
+ questionJSON sql.NullString
+ summary sql.NullString
+ interactionsJSON sql.NullString
+ storyID sql.NullString
+ )
+ err := row.Scan(
+ &t.ID, &t.Name, &t.Description, &elaborationInput, &projectID, &resolvedURL,
+ &configJSON, &priority, &timeoutNS, &retryJSON, &tagsJSON, &depsJSON,
+ &parentTaskID, &state, &t.CreatedAt, &t.UpdatedAt,
+ &rejectionComment, &questionJSON, &summary, &interactionsJSON, &storyID,
+ )
+ t.ProjectID = projectID.String
+ t.RepositoryURL = resolvedURL.String
+ t.ElaborationInput = elaborationInput.String
+ t.ParentTaskID = parentTaskID.String
+ t.RejectionComment = rejectionComment.String
+ t.QuestionJSON = questionJSON.String
+ t.Summary = summary.String
+ t.StoryID = storyID.String
+ if err != nil {
+ return nil, err
+ }
+ t.State = task.State(state)
+ t.Priority = task.Priority(priority)
+ t.Timeout.Duration = time.Duration(timeoutNS)
+ if err := json.Unmarshal([]byte(configJSON), &t.Agent); err != nil {
+ return nil, fmt.Errorf("unmarshaling agent config: %w", err)
+ }
+ if err := json.Unmarshal([]byte(retryJSON), &t.Retry); err != nil {
+ return nil, fmt.Errorf("unmarshaling retry: %w", err)
+ }
+ if err := json.Unmarshal([]byte(tagsJSON), &t.Tags); err != nil {
+ return nil, fmt.Errorf("unmarshaling tags: %w", err)
+ }
+ if err := json.Unmarshal([]byte(depsJSON), &t.DependsOn); err != nil {
+ return nil, fmt.Errorf("unmarshaling depends_on: %w", err)
+ }
+ raw := interactionsJSON.String
+ if raw == "" {
+ raw = "[]"
+ }
+ if err := json.Unmarshal([]byte(raw), &t.Interactions); err != nil {
+ return nil, fmt.Errorf("unmarshaling interactions: %w", err)
+ }
+ return &t, nil
+}
+```
+
+- [ ] **Step 5: Update all SELECT call sites to use `taskSelectSQL`**
+
+Replace every hard-coded SELECT string that queries tasks with `taskSelectSQL + " WHERE ..."`.
+
+**`GetTask`:**
+```go
+func (s *DB) GetTask(id string) (*task.Task, error) {
+ row := s.db.QueryRow(taskSelectSQL+` WHERE t.id = ?`, id)
+ return scanTask(row)
+}
+```
+
+**`ListTasks`:**
+```go
+func (s *DB) ListTasks(filter TaskFilter) ([]*task.Task, error) {
+ query := taskSelectSQL + ` WHERE 1=1`
+ var args []interface{}
+ if filter.State != "" {
+ query += " AND t.state = ?"
+ args = append(args, string(filter.State))
+ }
+ if !filter.Since.IsZero() {
+ query += " AND t.updated_at > ?"
+ args = append(args, filter.Since.UTC())
+ }
+ query += " ORDER BY t.created_at DESC"
+ if filter.Limit > 0 {
+ query += " LIMIT ?"
+ args = append(args, filter.Limit)
+ }
+ rows, err := s.db.Query(query, args...)
+ // ... rest unchanged
+```
+
+**`ListSubtasks`:**
+```go
+func (s *DB) ListSubtasks(parentID string) ([]*task.Task, error) {
+ rows, err := s.db.Query(taskSelectSQL+` WHERE t.parent_task_id = ? ORDER BY t.created_at ASC`, parentID)
+ // ... rest unchanged
+```
+
+**`ResetTaskForRetry`** (inner SELECT):
+```go
+t, err := scanTask(tx.QueryRow(taskSelectSQL+` WHERE t.id = ?`, id))
+```
+
+**`ListTasksByStory`:**
+```go
+rows, err := s.db.Query(taskSelectSQL+` WHERE t.story_id = ? ORDER BY t.created_at ASC`, storyID)
+```
+
+- [ ] **Step 6: Update `CreateTask` to write `project_id` instead of `project`/`repository_url`**
+
+Replace the INSERT:
+```go
+_, err = s.db.Exec(`
+ INSERT INTO tasks (id, name, description, elaboration_input, project_id, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, story_id)
+ VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`,
+ t.ID, t.Name, t.Description, t.ElaborationInput, t.ProjectID, t.RepositoryURL, string(configJSON), string(t.Priority),
+ t.Timeout.Duration.Nanoseconds(), string(retryJSON), string(tagsJSON), string(depsJSON),
+ t.ParentTaskID, string(t.State), t.CreatedAt.UTC(), t.UpdatedAt.UTC(), t.StoryID,
+)
+```
+
+Note: `repository_url` is kept as a fallback column for orphan (webhook) tasks with no project.
+
+- [ ] **Step 7: Update `TaskUpdate` struct and `UpdateTask`**
+
+Replace `RepositoryURL string` with `ProjectID string` in `TaskUpdate`:
+```go
+type TaskUpdate struct {
+ Name string
+ Description string
+ ProjectID string
+ Config task.AgentConfig
+ Priority task.Priority
+ TimeoutNS int64
+ Retry task.RetryConfig
+ Tags []string
+ DependsOn []string
+}
+```
+
+Update the UPDATE query in `UpdateTask`:
+```go
+result, err := s.db.Exec(`
+ UPDATE tasks
+ SET name = ?, description = ?, project_id = ?, config_json = ?, priority = ?, timeout_ns = ?,
+ retry_json = ?, tags_json = ?, depends_on_json = ?, state = ?, updated_at = ?
+ WHERE id = ?`,
+ u.Name, u.Description, u.ProjectID, configJSON, string(u.Priority), u.TimeoutNS,
+ retryJSON, tagsJSON, depsJSON, string(task.StatePending), now, id)
+```
+
+- [ ] **Step 8: Add `GetProjectByName`**
+
+```go
+// GetProjectByName retrieves a project by its name field.
+func (s *DB) GetProjectByName(name string) (*task.Project, error) {
+ row := s.db.QueryRow(`SELECT id, name, remote_url, local_path, type, deploy_script FROM projects WHERE name = ?`, name)
+ p := &task.Project{}
+ if err := row.Scan(&p.ID, &p.Name, &p.RemoteURL, &p.LocalPath, &p.Type, &p.DeployScript); err != nil {
+ return nil, err
+ }
+ return p, nil
+}
+```
+
+- [ ] **Step 9: Fix the old `TestTask_ProjectRoundTrip` test in `db_test.go`**
+
+Find and replace the test that asserts `got.Project != "my-project"` (around line 1017). Replace the entire test with the new `TestTask_ProjectIDRoundTrip` written in Step 1 above. Delete the old test.
+
+Also find and replace the `TestUpdateTask` test's `RepositoryURL` field with `ProjectID`:
+```go
+// In the update test, change:
+u := TaskUpdate{
+ Name: "Updated",
+ ProjectID: "proj-upd",
+ // ...
+}
+// And assert:
+if got.ProjectID != "proj-upd" { ... }
+```
+
+- [ ] **Step 10: Run storage tests**
+
+```
+go test -count=1 ./internal/storage/... 2>&1
+```
+Expected: all pass.
+
+- [ ] **Step 11: Commit**
+
+```bash
+git add internal/storage/db.go internal/storage/db_test.go
+git commit -m "feat: add project_id FK to tasks; LEFT JOIN resolves repository_url at read time"
+```
+
+---
+
+## Task 3: Remove ADR-007 patches from executor
+
+**Files:**
+- Modify: `internal/executor/executor.go`
+
+- [ ] **Step 1: Remove RepositoryURL patches**
+
+Delete these two blocks (they appear in both `execute` and `executeResume`):
+```go
+// DELETE this block (appears twice):
+// Populate RepositoryURL from Project registry if missing (ADR-007).
+if t.RepositoryURL == "" && t.Project != "" {
+ if proj, err := p.store.GetProject(t.Project); err == nil && proj.RemoteURL != "" {
+ t.RepositoryURL = proj.RemoteURL
+ }
+}
+```
+
+- [ ] **Step 2: Remove BranchName patches**
+
+Delete these two blocks (also appear twice):
+```go
+// DELETE this block (appears twice):
+// Populate BranchName from Story if missing (ADR-007).
+if t.BranchName == "" && t.StoryID != "" {
+ if story, err := p.store.GetStory(t.StoryID); err == nil && story.BranchName != "" {
+ t.BranchName = story.BranchName
+ }
+}
+```
+
+- [ ] **Step 3: Build check**
+
+```
+go build ./internal/executor/... 2>&1
+```
+Expected: clean build.
+
+- [ ] **Step 4: Run executor tests**
+
+```
+go test -count=1 ./internal/executor/... 2>&1 | tail -5
+```
+Expected: all pass.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add internal/executor/executor.go
+git commit -m "refactor: remove ADR-007 runtime patches for RepositoryURL and BranchName"
+```
+
+---
+
+## Task 4: Clean up container runner
+
+**Files:**
+- Modify: `internal/executor/container.go`
+- Modify: `internal/executor/container_test.go`
+
+- [ ] **Step 1: Remove `t.BranchName` fallback in `container.go`**
+
+Find lines 154-157:
+```go
+// Fall back to task-level BranchName (e.g. set explicitly by executor or tests).
+if storyBranch == "" {
+ storyBranch = t.BranchName
+}
+```
+Delete these four lines. The container already resolves the branch from the story via the store lookup at lines 144-153; the fallback only existed for the (now-removed) `BranchName` field.
+
+- [ ] **Step 2: Update container tests that set `BranchName` directly**
+
+In `container_test.go`, find `TestContainerRunner_ClonesStoryBranch` (around line 544). It currently sets `BranchName: "story/my-feature"` directly on the task. Replace with a story + store setup:
+
+```go
+func TestContainerRunner_ClonesStoryBranch(t *testing.T) {
+ // ... existing setup ...
+ store := testContainerStore(t) // use the real test store
+ story := &task.Story{
+ ID: "story-branch-1",
+ Name: "Branch Test",
+ BranchName: "story/my-feature",
+ ProjectID: "",
+ Status: task.StoryInProgress,
+ }
+ if err := store.CreateStory(story); err != nil {
+ t.Fatalf("CreateStory: %v", err)
+ }
+
+ tk := &task.Task{
+ ID: "story-branch-test",
+ RepositoryURL: "https://example.com/repo.git",
+ StoryID: "story-branch-1",
+ Agent: task.AgentConfig{Type: "claude"},
+ }
+ // ... rest of assertions unchanged
+```
+
+Note: check whether `container_test.go` has a `testContainerStore` helper or uses a different pattern; adapt accordingly. The store reference in `ContainerRunner` is `r.Store` — ensure the runner is initialized with the test store.
+
+- [ ] **Step 3: Run container tests**
+
+```
+go test -count=1 -run TestContainerRunner ./internal/executor/... 2>&1
+```
+Expected: all pass.
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add internal/executor/container.go internal/executor/container_test.go
+git commit -m "refactor: remove Task.BranchName fallback in container runner; always resolve from story"
+```
+
+---
+
+## Task 5: Update API handlers
+
+**Files:**
+- Modify: `internal/api/server.go`
+- Modify: `internal/api/server_test.go`
+
+- [ ] **Step 1: Update `handleCreateTask` in `server.go`**
+
+Replace the input struct and task construction (lines ~447-488):
+
+```go
+func (s *Server) handleCreateTask(w http.ResponseWriter, r *http.Request) {
+ var input struct {
+ Name string `json:"name"`
+ Description string `json:"description"`
+ ElaborationInput string `json:"elaboration_input"`
+ ProjectID string `json:"project_id"`
+ Agent task.AgentConfig `json:"agent"`
+ Claude task.AgentConfig `json:"claude"` // legacy alias
+ Timeout string `json:"timeout"`
+ Priority string `json:"priority"`
+ Tags []string `json:"tags"`
+ ParentTaskID string `json:"parent_task_id"`
+ }
+ // ... decode unchanged ...
+
+ t := &task.Task{
+ ID: uuid.New().String(),
+ Name: input.Name,
+ Description: input.Description,
+ ElaborationInput: input.ElaborationInput,
+ ProjectID: input.ProjectID,
+ Agent: input.Agent,
+ Priority: task.Priority(input.Priority),
+ Tags: input.Tags,
+ DependsOn: []string{},
+ Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "exponential"},
+ State: task.StatePending,
+ CreatedAt: now,
+ UpdatedAt: now,
+ ParentTaskID: input.ParentTaskID,
+ }
+ // ... rest unchanged
+```
+
+- [ ] **Step 2: Update `server_test.go` task payloads**
+
+Replace every `"repository_url": "https://github.com/user/repo"` in JSON payloads with `"project_id": "test-proj"`.
+
+First seed a project in `testServer` or `testServerWithRunner` so the FK is valid and `GetTask` returns `RepositoryURL` via JOIN. Add to `testServerWithRunner`:
+
+```go
+// Seed a default project for tests that don't need a specific one.
+defaultProj := &task.Project{
+ ID: "test-proj", Name: "test-project",
+ RemoteURL: "https://github.com/user/repo",
+ Type: "web",
+}
+if err := store.UpsertProject(defaultProj); err != nil {
+ t.Fatalf("seed test project: %v", err)
+}
+```
+
+Then in each test that previously sent `"repository_url": "..."`, use `"project_id": "test-proj"` instead.
+
+Update assertions that checked `created.Project != "test-project"` to check `created.ProjectID != "test-proj"`.
+
+- [ ] **Step 3: Run API tests**
+
+```
+go test -count=1 ./internal/api/... 2>&1 | grep -E "FAIL|ok"
+```
+Expected: all pass.
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add internal/api/server.go internal/api/server_test.go
+git commit -m "feat: handleCreateTask accepts project_id; drop project/repository_url input fields"
+```
+
+---
+
+## Task 6: Update webhook handler
+
+**Files:**
+- Modify: `internal/api/webhook.go`
+- Modify: `internal/api/webhook_test.go`
+
+- [ ] **Step 1: Add `GetProjectByName` to the `Store` interface in `server.go`**
+
+Find the `store` interface (or wherever `GetProject` is declared in the api package). Add:
+```go
+GetProjectByName(name string) (*task.Project, error)
+```
+
+- [ ] **Step 2: Update `createCIFailureTask` in `webhook.go`**
+
+Replace lines 203-224:
+```go
+now := time.Now().UTC()
+t := &task.Task{
+ ID: uuid.New().String(),
+ Name: fmt.Sprintf("Fix CI failure: %s on %s", checkName, branch),
+ Agent: task.AgentConfig{
+ Type: "claude",
+ Model: "sonnet",
+ Instructions: instructions,
+ MaxBudgetUSD: 3.0,
+ AllowedTools: []string{"Read", "Edit", "Bash", "Glob", "Grep"},
+ },
+ Priority: task.PriorityNormal,
+ Tags: []string{"ci", "auto"},
+ DependsOn: []string{},
+ Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "exponential"},
+ State: task.StatePending,
+ CreatedAt: now,
+ UpdatedAt: now,
+}
+
+// Resolve project_id from DB using the config project name.
+// If no DB project matches, fall back to storing the repo URL directly.
+if project != nil {
+ if dbProj, err := s.store.GetProjectByName(project.Name); err == nil {
+ t.ProjectID = dbProj.ID
+ } else {
+ // No DB project yet — store URL directly so the executor can clone.
+ t.RepositoryURL = fmt.Sprintf("https://github.com/%s.git", fullName)
+ }
+} else {
+ t.RepositoryURL = fmt.Sprintf("https://github.com/%s.git", fullName)
+}
+```
+
+- [ ] **Step 3: Update webhook tests**
+
+In `webhook_test.go`, seed a DB project before the assertions that check `tk.RepositoryURL`. After the migration, `RepositoryURL` is derived from the project, so seed the project:
+
+```go
+// In TestHandleCheckRun_CreatesTask (and similar tests):
+store.UpsertProject(&task.Project{
+ ID: "myrepo-proj", Name: "myrepo",
+ RemoteURL: "https://github.com/owner/myrepo.git",
+ Type: "web",
+})
+// ...
+// Check ProjectID instead of RepositoryURL directly:
+if tk.ProjectID != "myrepo-proj" {
+ t.Errorf("task project_id = %q, want myrepo-proj", tk.ProjectID)
+}
+// RepositoryURL is still populated via JOIN:
+if tk.RepositoryURL != "https://github.com/owner/myrepo.git" {
+ t.Errorf("task repository url = %q, want https://github.com/owner/myrepo.git", tk.RepositoryURL)
+}
+```
+
+For tests where `matchProject` returns nil (no matching project), verify `t.RepositoryURL` is set directly (fallback path).
+
+- [ ] **Step 4: Run webhook tests**
+
+```
+go test -count=1 -run TestHandle ./internal/api/... 2>&1
+```
+Expected: all pass.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add internal/api/webhook.go internal/api/webhook_test.go internal/api/server.go
+git commit -m "feat: webhook sets project_id via DB lookup; falls back to repository_url for unknown repos"
+```
+
+---
+
+## Task 7: Update CLI
+
+**Files:**
+- Modify: `internal/cli/list.go`
+- Modify: `internal/cli/status.go`
+
+- [ ] **Step 1: Update `list.go`**
+
+Find line ~55: `t.ID, t.Name, t.Project, ...`
+Replace with: `t.ID, t.Name, t.ProjectID, ...`
+
+- [ ] **Step 2: Update `status.go`**
+
+Find lines ~42-44:
+```go
+if t.Project != "" {
+ fmt.Printf("Project: %s\n", t.Project)
+}
+```
+Replace with:
+```go
+if t.ProjectID != "" {
+ fmt.Printf("Project: %s\n", t.ProjectID)
+}
+```
+
+- [ ] **Step 3: Build check**
+
+```
+go build ./... 2>&1
+```
+Expected: clean.
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add internal/cli/list.go internal/cli/status.go
+git commit -m "fix: CLI list/status use ProjectID"
+```
+
+---
+
+## Task 8: Full test suite + production migration verification
+
+- [ ] **Step 1: Run full test suite with race detector**
+
+```
+go test -race -count=1 ./... 2>&1 | grep -E "FAIL|ok|panic"
+```
+Expected: all packages pass.
+
+- [ ] **Step 2: Verify production DB migration**
+
+```bash
+sqlite3 /site/doot.terst.org/data/claudomator.db "
+SELECT t.id, t.name, t.project_id, COALESCE(p.remote_url, t.repository_url) as url
+FROM tasks t LEFT JOIN projects p ON p.id = t.project_id
+WHERE t.state NOT IN ('COMPLETED','CANCELLED')
+ORDER BY t.created_at DESC LIMIT 10;"
+```
+Expected: `project_id` populated for tasks that had a `project` name; URL resolves correctly.
+
+- [ ] **Step 3: Push and deploy**
+
+```bash
+git push local main
+sudo ./scripts/deploy
+```
+
+---
+
+## Self-Review
+
+**Spec coverage check:**
+- ✅ `Task.Project` removed → `ProjectID` added (Tasks 1, 2, 5, 6, 7)
+- ✅ `Task.RepositoryURL` no longer stored directly; derived via LEFT JOIN (Task 2)
+- ✅ `Task.BranchName` removed; container always resolves from story (Tasks 1, 4)
+- ✅ ADR-007 patches removed (Task 3)
+- ✅ `GetProjectByName` added to storage (Task 2 Step 8)
+- ✅ Webhook falls back gracefully for unknown repos (Task 6)
+- ✅ DB backfill migration in place (Task 2 Step 3)
+
+**Placeholder scan:** No TBDs or incomplete steps found.
+
+**Type consistency:** `ProjectID` used consistently throughout. `RepositoryURL` remains on Task struct (derived, not stored). `taskSelectSQL` constant referenced uniformly in all SELECT call sites.
diff --git a/docs/superpowers/plans/2026-04-04-task-checker-story-ship.md b/docs/superpowers/plans/2026-04-04-task-checker-story-ship.md
new file mode 100644
index 0000000..021405f
--- /dev/null
+++ b/docs/superpowers/plans/2026-04-04-task-checker-story-ship.md
@@ -0,0 +1,1226 @@
+# Task Checker Agent and Story Ship Gate — Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add an async per-task checker agent that auto-accepts passing tasks, and replace the auto-deploy story trigger with an explicit human "Ship" action.
+
+**Architecture:** Checker tasks are regular pool tasks with a new `checker_for_task_id` field; when they complete successfully the pool auto-accepts the linked task. `checkStoryCompletion` still transitions stories to SHIPPABLE but no longer fires the deploy — a new `POST /api/stories/{id}/ship` endpoint and "Ship" button do that instead. Story elaboration is extended to produce `acceptance_criteria` per task.
+
+**Tech Stack:** Go 1.25, SQLite (database/sql + go-sqlite3), vanilla JS (no framework)
+
+---
+
+## File Map
+
+| File | Change |
+|---|---|
+| `internal/task/task.go` | Add `AcceptanceCriteria`, `CheckerForTaskID`, `CheckerReport` fields to `Task` |
+| `internal/storage/db.go` | 3 migrations; extend `CreateTask`, `scanTask`, all SELECT queries; add `UpdateTaskCheckerReport`, `GetCheckerTask` |
+| `internal/executor/executor.go` | Add 2 methods to `Store` interface; add `spawnCheckerTask`; modify `handleRunResult`; guard `checkStoryCompletion`; remove auto-deploy; add `ShipStory` |
+| `internal/api/server.go` | Register `POST /api/stories/{id}/ship` |
+| `internal/api/stories.go` | Add `handleShipStory`; pass `AcceptanceCriteria` in `handleApproveStory` |
+| `internal/api/elaborate.go` | Add `AcceptanceCriteria` to `elaboratedStoryTask`; update `buildStoryElaboratePrompt` |
+| `web/app.js` | Ship button on SHIPPABLE story cards; checker report on READY task cards |
+
+---
+
+## Task 1: Task struct — three new fields
+
+**Files:**
+- Modify: `internal/task/task.go`
+
+- [ ] **Step 1: Add fields to Task struct**
+
+In `internal/task/task.go`, add three fields after `StoryID`:
+
+```go
+type Task struct {
+ ID string `yaml:"id" json:"id"`
+ ParentTaskID string `yaml:"parent_task_id" json:"parent_task_id"`
+ Name string `yaml:"name" json:"name"`
+ Description string `yaml:"description" json:"description"`
+ Project string `yaml:"project" json:"project"`
+ RepositoryURL string `yaml:"repository_url" json:"repository_url"`
+ Agent AgentConfig `yaml:"agent" json:"agent"`
+ Timeout Duration `yaml:"timeout" json:"timeout"`
+ Retry RetryConfig `yaml:"retry" json:"retry"`
+ Priority Priority `yaml:"priority" json:"priority"`
+ Tags []string `yaml:"tags" json:"tags"`
+ DependsOn []string `yaml:"depends_on" json:"depends_on"`
+ StoryID string `yaml:"-" json:"story_id,omitempty"`
+ BranchName string `yaml:"-" json:"branch_name,omitempty"`
+ AcceptanceCriteria string `yaml:"-" json:"acceptance_criteria,omitempty"`
+ CheckerForTaskID string `yaml:"-" json:"checker_for_task_id,omitempty"`
+ CheckerReport string `yaml:"-" json:"checker_report,omitempty"`
+ State State `yaml:"-" json:"state"`
+ RejectionComment string `yaml:"-" json:"rejection_comment,omitempty"`
+ QuestionJSON string `yaml:"-" json:"question,omitempty"`
+ ElaborationInput string `yaml:"-" json:"elaboration_input,omitempty"`
+ Summary string `yaml:"-" json:"summary,omitempty"`
+ Interactions []Interaction `yaml:"-" json:"interactions,omitempty"`
+ CreatedAt time.Time `yaml:"-" json:"created_at"`
+ UpdatedAt time.Time `yaml:"-" json:"updated_at"`
+}
+```
+
+- [ ] **Step 2: Build to verify no compilation errors**
+
+```bash
+cd /workspace/claudomator && go build ./...
+```
+
+Expected: no output (success).
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add internal/task/task.go
+git commit -m "feat: add AcceptanceCriteria, CheckerForTaskID, CheckerReport to Task struct"
+```
+
+---
+
+## Task 2: Storage — migrations, queries, two new methods
+
+**Files:**
+- Modify: `internal/storage/db.go`
+- Test: `internal/storage/db_test.go`
+
+- [ ] **Step 1: Write failing tests for the two new storage methods**
+
+Find the existing test file and add at the end:
+
+```go
+func TestUpdateTaskCheckerReport(t *testing.T) {
+ db := openTestDB(t)
+ tk := &task.Task{
+ ID: "cr-1", Name: "orig", RepositoryURL: "https://github.com/x/y",
+ Agent: task.AgentConfig{Type: "claude", Instructions: "x"},
+ Priority: task.PriorityNormal,
+ Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"},
+ Tags: []string{}, DependsOn: []string{},
+ State: task.StatePending, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(),
+ }
+ if err := db.CreateTask(tk); err != nil {
+ t.Fatalf("CreateTask: %v", err)
+ }
+ if err := db.UpdateTaskCheckerReport("cr-1", "Tests failed: missing endpoint"); err != nil {
+ t.Fatalf("UpdateTaskCheckerReport: %v", err)
+ }
+ got, err := db.GetTask("cr-1")
+ if err != nil {
+ t.Fatalf("GetTask: %v", err)
+ }
+ if got.CheckerReport != "Tests failed: missing endpoint" {
+ t.Errorf("expected checker report, got %q", got.CheckerReport)
+ }
+}
+
+func TestGetCheckerTask(t *testing.T) {
+ db := openTestDB(t)
+ checked := &task.Task{
+ ID: "chk-orig", Name: "orig", RepositoryURL: "https://github.com/x/y",
+ Agent: task.AgentConfig{Type: "claude", Instructions: "x"},
+ Priority: task.PriorityNormal,
+ Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"},
+ Tags: []string{}, DependsOn: []string{},
+ State: task.StatePending, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(),
+ }
+ if err := db.CreateTask(checked); err != nil {
+ t.Fatalf("CreateTask checked: %v", err)
+ }
+ checker := &task.Task{
+ ID: "chk-checker", Name: "Check: orig", CheckerForTaskID: "chk-orig",
+ RepositoryURL: "https://github.com/x/y",
+ Agent: task.AgentConfig{Type: "claude", Instructions: "validate"},
+ Priority: task.PriorityNormal,
+ Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"},
+ Tags: []string{}, DependsOn: []string{},
+ State: task.StatePending, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(),
+ }
+ if err := db.CreateTask(checker); err != nil {
+ t.Fatalf("CreateTask checker: %v", err)
+ }
+
+ // Should find the checker task.
+ got, err := db.GetCheckerTask("chk-orig")
+ if err != nil {
+ t.Fatalf("GetCheckerTask: %v", err)
+ }
+ if got == nil || got.ID != "chk-checker" {
+ t.Errorf("expected checker task ID chk-checker, got %v", got)
+ }
+
+ // Should return nil when no checker exists.
+ none, err := db.GetCheckerTask("nonexistent")
+ if err != nil {
+ t.Fatalf("GetCheckerTask nonexistent: %v", err)
+ }
+ if none != nil {
+ t.Errorf("expected nil for task with no checker, got %v", none)
+ }
+}
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+```bash
+cd /workspace/claudomator && go test ./internal/storage/... -run "TestUpdateTaskCheckerReport|TestGetCheckerTask" -v
+```
+
+Expected: FAIL — `db.UpdateTaskCheckerReport undefined`, `db.GetCheckerTask undefined`.
+
+- [ ] **Step 3: Add three migrations to `db.go`**
+
+In the `migrations` slice in `migrate()`, append after the `ALTER TABLE tasks ADD COLUMN story_id TEXT` entry:
+
+```go
+`ALTER TABLE tasks ADD COLUMN acceptance_criteria TEXT NOT NULL DEFAULT ''`,
+`ALTER TABLE tasks ADD COLUMN checker_for_task_id TEXT NOT NULL DEFAULT ''`,
+`ALTER TABLE tasks ADD COLUMN checker_report TEXT NOT NULL DEFAULT ''`,
+```
+
+- [ ] **Step 4: Update `CreateTask` INSERT to include the three new columns**
+
+Replace the `INSERT INTO tasks` statement in `CreateTask`:
+
+```go
+_, err = s.db.Exec(`
+ INSERT INTO tasks (id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, story_id, acceptance_criteria, checker_for_task_id, checker_report)
+ VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`,
+ t.ID, t.Name, t.Description, t.ElaborationInput, t.Project, t.RepositoryURL, string(configJSON), string(t.Priority),
+ t.Timeout.Duration.Nanoseconds(), string(retryJSON), string(tagsJSON), string(depsJSON),
+ t.ParentTaskID, string(t.State), t.CreatedAt.UTC(), t.UpdatedAt.UTC(), t.StoryID,
+ t.AcceptanceCriteria, t.CheckerForTaskID, t.CheckerReport,
+ )
+```
+
+- [ ] **Step 5: Update `scanTask` to scan the three new columns**
+
+`scanTask` currently declares local vars and calls `row.Scan(...)` with 21 positional arguments. Add three new vars and extend the scan. The new `var` block:
+
+```go
+func scanTask(row scanner) (*task.Task, error) {
+ var (
+ t task.Task
+ configJSON string
+ retryJSON string
+ tagsJSON string
+ depsJSON string
+ state string
+ priority string
+ timeoutNS int64
+ parentTaskID sql.NullString
+ elaborationInput sql.NullString
+ project sql.NullString
+ repositoryURL sql.NullString
+ rejectionComment sql.NullString
+ questionJSON sql.NullString
+ summary sql.NullString
+ interactionsJSON sql.NullString
+ storyID sql.NullString
+ acceptanceCriteria sql.NullString
+ checkerForTaskID sql.NullString
+ checkerReport sql.NullString
+ )
+ err := row.Scan(
+ &t.ID, &t.Name, &t.Description, &elaborationInput, &project, &repositoryURL,
+ &configJSON, &priority, &timeoutNS, &retryJSON, &tagsJSON, &depsJSON,
+ &parentTaskID, &state, &t.CreatedAt, &t.UpdatedAt,
+ &rejectionComment, &questionJSON, &summary, &interactionsJSON, &storyID,
+ &acceptanceCriteria, &checkerForTaskID, &checkerReport,
+ )
+ t.ParentTaskID = parentTaskID.String
+ t.ElaborationInput = elaborationInput.String
+ t.Project = project.String
+ t.RepositoryURL = repositoryURL.String
+ t.RejectionComment = rejectionComment.String
+ t.QuestionJSON = questionJSON.String
+ t.Summary = summary.String
+ t.StoryID = storyID.String
+ t.AcceptanceCriteria = acceptanceCriteria.String
+ t.CheckerForTaskID = checkerForTaskID.String
+ t.CheckerReport = checkerReport.String
+ // ... rest of function unchanged
+```
+
+- [ ] **Step 6: Update all SELECT queries to include the three new columns**
+
+There are five SELECT statements that need `acceptance_criteria, checker_for_task_id, checker_report` appended to the column list. The pattern to find: every query with `story_id FROM tasks`. Update each one:
+
+In `GetTask` (line ~185):
+```go
+row := s.db.QueryRow(`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE id = ?`, id)
+```
+
+In `ListTasks` (line ~191):
+```go
+query := `SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE 1=1`
+```
+
+In `ListSubtasks` (line ~227):
+```go
+rows, err := s.db.Query(`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE parent_task_id = ? ORDER BY created_at ASC`, parentID)
+```
+
+In `ResetTaskForRetry` (line ~280):
+```go
+t, err := scanTask(tx.QueryRow(`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE id = ?`, id))
+```
+
+In `ListTasksByStory` (line ~1202):
+```go
+`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE story_id = ? ORDER BY created_at ASC`,
+```
+
+- [ ] **Step 7: Add `UpdateTaskCheckerReport`**
+
+Add after `UpdateTaskSummary`:
+
+```go
+// UpdateTaskCheckerReport sets the checker_report field on a task.
+func (s *DB) UpdateTaskCheckerReport(id, report string) error {
+ now := time.Now().UTC()
+ _, err := s.db.Exec(`UPDATE tasks SET checker_report = ?, updated_at = ? WHERE id = ?`, report, now, id)
+ return err
+}
+```
+
+- [ ] **Step 8: Add `GetCheckerTask`**
+
+Add after `UpdateTaskCheckerReport`:
+
+```go
+// GetCheckerTask returns the checker task for the given checked task ID,
+// or nil if no checker task exists.
+func (s *DB) GetCheckerTask(checkedTaskID string) (*task.Task, error) {
+ row := s.db.QueryRow(`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE checker_for_task_id = ? LIMIT 1`, checkedTaskID)
+ t, err := scanTask(row)
+ if err == sql.ErrNoRows {
+ return nil, nil
+ }
+ return t, err
+}
+```
+
+- [ ] **Step 9: Run the failing tests to verify they pass**
+
+```bash
+cd /workspace/claudomator && go test ./internal/storage/... -run "TestUpdateTaskCheckerReport|TestGetCheckerTask" -v
+```
+
+Expected: PASS.
+
+- [ ] **Step 10: Run full storage tests**
+
+```bash
+cd /workspace/claudomator && go test ./internal/storage/... -v
+```
+
+Expected: all PASS.
+
+- [ ] **Step 11: Commit**
+
+```bash
+git add internal/storage/db.go internal/storage/db_test.go
+git commit -m "feat: add checker task columns, UpdateTaskCheckerReport, GetCheckerTask"
+```
+
+---
+
+## Task 3: Executor — checker task spawn and completion handling
+
+**Files:**
+- Modify: `internal/executor/executor.go`
+- Modify: `internal/executor/executor_test.go`
+
+- [ ] **Step 1: Add two methods to executor's `Store` interface**
+
+In `executor.go`, the `Store` interface (around line 22). Add after `CreateTask`:
+
+```go
+UpdateTaskCheckerReport(id, report string) error
+GetCheckerTask(checkedTaskID string) (*task.Task, error)
+```
+
+- [ ] **Step 2: Write failing tests**
+
+In `executor_test.go`, add:
+
+```go
+func TestPool_CheckerSpawned_OnReady(t *testing.T) {
+ store := testStore(t)
+ runner := &mockRunner{} // succeeds instantly
+ pool := NewPool(2, map[string]Runner{"claude": runner}, store, slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError})))
+
+ tk := makeTask("checker-spawn-1")
+ tk.RepositoryURL = "https://github.com/x/y"
+ store.CreateTask(tk)
+ pool.Submit(context.Background(), tk)
+ <-pool.Results() // wait for original task to finish
+
+ // Give the async spawnCheckerTask goroutine a moment to run.
+ time.Sleep(200 * time.Millisecond)
+
+ checker, err := store.GetCheckerTask("checker-spawn-1")
+ if err != nil {
+ t.Fatalf("GetCheckerTask: %v", err)
+ }
+ if checker == nil {
+ t.Fatal("expected a checker task to be created, got nil")
+ }
+ if checker.CheckerForTaskID != "checker-spawn-1" {
+ t.Errorf("expected CheckerForTaskID=checker-spawn-1, got %q", checker.CheckerForTaskID)
+ }
+}
+
+func TestPool_CheckerNotSpawned_ForSubtask(t *testing.T) {
+ store := testStore(t)
+ runner := &mockRunner{}
+ pool := NewPool(2, map[string]Runner{"claude": runner}, store, slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError})))
+
+ parent := makeTask("no-checker-parent")
+ parent.RepositoryURL = "https://github.com/x/y"
+ store.CreateTask(parent)
+
+ sub := makeTask("no-checker-sub")
+ sub.ParentTaskID = "no-checker-parent"
+ sub.RepositoryURL = "https://github.com/x/y"
+ store.CreateTask(sub)
+
+ pool.Submit(context.Background(), sub)
+ <-pool.Results()
+
+ time.Sleep(100 * time.Millisecond)
+
+ checker, err := store.GetCheckerTask("no-checker-sub")
+ if err != nil {
+ t.Fatalf("GetCheckerTask: %v", err)
+ }
+ if checker != nil {
+ t.Error("expected no checker for subtask, but one was created")
+ }
+}
+
+func TestPool_CheckerPass_AutoAcceptsTask(t *testing.T) {
+ store := testStore(t)
+ // Two-phase: first runner succeeds (original task), second also succeeds (checker).
+ callCount := 0
+ runner := &mockRunner{
+ onRun: func(t *task.Task, e *storage.Execution) error {
+ callCount++
+ return nil // both original and checker succeed
+ },
+ }
+ pool := NewPool(2, map[string]Runner{"claude": runner}, store, slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError})))
+
+ tk := makeTask("autoaccept-1")
+ tk.RepositoryURL = "https://github.com/x/y"
+ store.CreateTask(tk)
+ pool.Submit(context.Background(), tk)
+ <-pool.Results() // original finishes → READY + checker spawned
+
+ // Wait for checker to run and complete.
+ deadline := time.Now().Add(5 * time.Second)
+ for time.Now().Before(deadline) {
+ got, _ := store.GetTask("autoaccept-1")
+ if got != nil && got.State == task.StateCompleted {
+ break
+ }
+ <-pool.Results()
+ }
+
+ got, err := store.GetTask("autoaccept-1")
+ if err != nil {
+ t.Fatalf("GetTask: %v", err)
+ }
+ if got.State != task.StateCompleted {
+ t.Errorf("expected COMPLETED after checker pass, got %s", got.State)
+ }
+}
+
+func TestPool_CheckerFail_AttachesReport(t *testing.T) {
+ store := testStore(t)
+ callCount := 0
+ runner := &mockRunner{
+ onRun: func(t *task.Task, e *storage.Execution) error {
+ callCount++
+ if t.CheckerForTaskID != "" {
+ return fmt.Errorf("test suite failed: 3 failures")
+ }
+ return nil // original task succeeds
+ },
+ }
+ pool := NewPool(2, map[string]Runner{"claude": runner}, store, slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError})))
+
+ tk := makeTask("fail-checker-1")
+ tk.RepositoryURL = "https://github.com/x/y"
+ store.CreateTask(tk)
+ pool.Submit(context.Background(), tk)
+ <-pool.Results() // original → READY
+
+ // Wait for checker to fail.
+ deadline := time.Now().Add(5 * time.Second)
+ for time.Now().Before(deadline) {
+ got, _ := store.GetTask("fail-checker-1")
+ if got != nil && got.CheckerReport != "" {
+ break
+ }
+ select {
+ case <-pool.Results():
+ case <-time.After(100 * time.Millisecond):
+ }
+ }
+
+ got, err := store.GetTask("fail-checker-1")
+ if err != nil {
+ t.Fatalf("GetTask: %v", err)
+ }
+ if got.State != task.StateReady {
+ t.Errorf("expected task to stay READY after checker fail, got %s", got.State)
+ }
+ if got.CheckerReport == "" {
+ t.Error("expected checker_report to be set after checker failure")
+ }
+}
+```
+
+- [ ] **Step 3: Run tests to verify they fail**
+
+```bash
+cd /workspace/claudomator && go test ./internal/executor/... -run "TestPool_Checker" -v 2>&1 | head -30
+```
+
+Expected: FAIL — `store.UpdateTaskCheckerReport undefined`, `store.GetCheckerTask undefined`, `spawnCheckerTask undefined`.
+
+- [ ] **Step 4: Add `spawnCheckerTask` to `executor.go`**
+
+Add this function after `checkStoryCompletion`:
+
+```go
+// spawnCheckerTask creates and submits a checker task for the given completed task.
+// Guards: not called for subtasks, checker tasks, or tasks that already have a checker.
+func (p *Pool) spawnCheckerTask(ctx context.Context, checked *task.Task) {
+ // Never spawn a checker for subtasks or checker tasks themselves.
+ if checked.ParentTaskID != "" || checked.CheckerForTaskID != "" {
+ return
+ }
+ // Idempotent: don't create a second checker if one already exists.
+ existing, err := p.store.GetCheckerTask(checked.ID)
+ if err != nil {
+ p.logger.Error("spawnCheckerTask: GetCheckerTask failed", "taskID", checked.ID, "error", err)
+ return
+ }
+ if existing != nil {
+ return
+ }
+
+ criteria := checked.AcceptanceCriteria
+ if criteria == "" {
+ criteria = checked.Agent.Instructions
+ }
+
+ instructions := fmt.Sprintf(`You are validating a completed task. Do not make any changes to the code or repository.
+
+Task: %s
+Instructions given to the implementor:
+%s
+
+Acceptance criteria:
+%s
+
+Steps:
+1. Clone the repository and review the changes made.
+2. Verify each acceptance criterion is met. Run tests or make HTTP requests as needed.
+3. If all criteria are satisfied, exit normally (success).
+4. If any criterion is not met, use the Bash tool to exit with a non-zero code:
+ bash -c "exit 1"
+ Before exiting, write a brief summary of what failed.`, checked.Name, checked.Agent.Instructions, criteria)
+
+ now := time.Now().UTC()
+ checker := &task.Task{
+ ID: uuid.New().String(),
+ Name: "Check: " + checked.Name,
+ CheckerForTaskID: checked.ID,
+ RepositoryURL: checked.RepositoryURL,
+ Agent: task.AgentConfig{
+ Type: "claude",
+ Instructions: instructions,
+ MaxBudgetUSD: 0.50,
+ AllowedTools: []string{"Bash", "Read", "Glob", "Grep"},
+ },
+ Timeout: task.Duration{Duration: 10 * time.Minute},
+ Priority: task.PriorityNormal,
+ Tags: []string{},
+ DependsOn: []string{},
+ Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"},
+ State: task.StatePending,
+ CreatedAt: now,
+ UpdatedAt: now,
+ }
+
+ if err := p.store.CreateTask(checker); err != nil {
+ p.logger.Error("spawnCheckerTask: CreateTask failed", "error", err)
+ return
+ }
+ checker.State = task.StateQueued
+ if err := p.store.UpdateTaskState(checker.ID, task.StateQueued); err != nil {
+ p.logger.Error("spawnCheckerTask: UpdateTaskState failed", "error", err)
+ return
+ }
+ if err := p.Submit(ctx, checker); err != nil {
+ p.logger.Error("spawnCheckerTask: Submit failed", "error", err)
+ }
+}
+```
+
+- [ ] **Step 5: Modify `handleRunResult` — success path**
+
+Find the success branch in `handleRunResult` (the `} else {` block after all the error handling). Currently it looks like:
+
+```go
+} else {
+ p.mu.Lock()
+ p.consecutiveFailures[agentType] = 0
+ p.mu.Unlock()
+ if t.ParentTaskID == "" {
+ subtasks, subErr := p.store.ListSubtasks(t.ID)
+ // ...
+ if subErr == nil && len(subtasks) > 0 {
+ exec.Status = "BLOCKED"
+ if err := p.store.UpdateTaskState(t.ID, task.StateBlocked); err != nil { ... }
+ } else {
+ exec.Status = "READY"
+ if err := p.store.UpdateTaskState(t.ID, task.StateReady); err != nil { ... }
+ }
+ } else {
+ exec.Status = "COMPLETED"
+ if err := p.store.UpdateTaskState(t.ID, task.StateCompleted); err != nil { ... }
+ p.maybeUnblockParent(t.ParentTaskID)
+ }
+ if t.StoryID != "" {
+ // ...checkStoryCompletion / checkValidationResult
+ }
+}
+```
+
+Replace it with:
+
+```go
+} else {
+ p.mu.Lock()
+ p.consecutiveFailures[agentType] = 0
+ p.mu.Unlock()
+ if t.CheckerForTaskID != "" {
+ // Checker task succeeded — auto-accept the checked task.
+ exec.Status = "COMPLETED"
+ if err := p.store.UpdateTaskState(t.ID, task.StateCompleted); err != nil {
+ p.logger.Error("handleRunResult: failed to complete checker task", "taskID", t.ID, "error", err)
+ }
+ checkedTask, getErr := p.store.GetTask(t.CheckerForTaskID)
+ if getErr == nil {
+ if acceptErr := p.store.UpdateTaskState(t.CheckerForTaskID, task.StateCompleted); acceptErr != nil {
+ p.logger.Error("handleRunResult: failed to auto-accept checked task", "taskID", t.CheckerForTaskID, "error", acceptErr)
+ } else if checkedTask.StoryID != "" {
+ go p.checkStoryCompletion(ctx, checkedTask.StoryID)
+ }
+ } else {
+ p.logger.Error("handleRunResult: failed to get checked task", "taskID", t.CheckerForTaskID, "error", getErr)
+ }
+ } else if t.ParentTaskID == "" {
+ subtasks, subErr := p.store.ListSubtasks(t.ID)
+ if subErr != nil {
+ p.logger.Error("failed to list subtasks", "taskID", t.ID, "error", subErr)
+ }
+ if subErr == nil && len(subtasks) > 0 {
+ exec.Status = "BLOCKED"
+ if err := p.store.UpdateTaskState(t.ID, task.StateBlocked); err != nil {
+ p.logger.Error("failed to update task state", "taskID", t.ID, "state", task.StateBlocked, "error", err)
+ }
+ } else {
+ exec.Status = "READY"
+ if err := p.store.UpdateTaskState(t.ID, task.StateReady); err != nil {
+ p.logger.Error("failed to update task state", "taskID", t.ID, "state", task.StateReady, "error", err)
+ }
+ go p.spawnCheckerTask(ctx, t)
+ }
+ } else {
+ exec.Status = "COMPLETED"
+ if err := p.store.UpdateTaskState(t.ID, task.StateCompleted); err != nil {
+ p.logger.Error("failed to update task state", "taskID", t.ID, "state", task.StateCompleted, "error", err)
+ }
+ p.maybeUnblockParent(t.ParentTaskID)
+ }
+ if t.StoryID != "" {
+ storyID := t.StoryID
+ go func() {
+ story, getErr := p.store.GetStory(storyID)
+ if getErr != nil {
+ p.logger.Error("handleRunResult: failed to get story", "storyID", storyID, "error", getErr)
+ return
+ }
+ if story.Status == task.StoryValidating {
+ p.checkValidationResult(ctx, storyID, task.StateCompleted, "")
+ } else {
+ p.checkStoryCompletion(ctx, storyID)
+ }
+ }()
+ }
+}
+```
+
+- [ ] **Step 6: Modify `handleRunResult` — failure path, attach checker report**
+
+In the generic failure `else` case (where `exec.Status = "FAILED"` is set and `consecutiveFailures` is incremented), add after the failures increment:
+
+```go
+p.mu.Lock()
+p.consecutiveFailures[agentType]++
+p.mu.Unlock()
+// If this is a checker task, attach the failure report to the checked task.
+if t.CheckerForTaskID != "" {
+ report := exec.ErrorMsg
+ if reportErr := p.store.UpdateTaskCheckerReport(t.CheckerForTaskID, report); reportErr != nil {
+ p.logger.Error("handleRunResult: failed to set checker report", "taskID", t.CheckerForTaskID, "error", reportErr)
+ }
+}
+```
+
+Also update the checker report after summary extraction (around the `summary := exec.Summary` block), to prefer summary over error message when available. After the summary is resolved, add:
+
+```go
+if t.CheckerForTaskID != "" && exec.Status == "FAILED" && summary != "" {
+ // Overwrite the initial error-message report with the richer summary.
+ if reportErr := p.store.UpdateTaskCheckerReport(t.CheckerForTaskID, summary); reportErr != nil {
+ p.logger.Error("handleRunResult: failed to update checker report with summary", "taskID", t.CheckerForTaskID, "error", reportErr)
+ }
+}
+```
+
+- [ ] **Step 7: Run the checker tests**
+
+```bash
+cd /workspace/claudomator && go test ./internal/executor/... -run "TestPool_Checker" -v -timeout 30s
+```
+
+Expected: all PASS.
+
+- [ ] **Step 8: Run full executor tests**
+
+```bash
+cd /workspace/claudomator && go test ./internal/executor/... -race -timeout 120s
+```
+
+Expected: all PASS.
+
+- [ ] **Step 9: Commit**
+
+```bash
+git add internal/executor/executor.go internal/executor/executor_test.go
+git commit -m "feat: spawn checker task on READY; auto-accept on pass; attach report on fail"
+```
+
+---
+
+## Task 4: Story ship gate — remove auto-deploy, add explicit ship endpoint
+
+**Files:**
+- Modify: `internal/executor/executor.go`
+- Modify: `internal/api/server.go`
+- Modify: `internal/api/stories.go`
+- Test: `internal/api/server_test.go`
+
+- [ ] **Step 1: Write failing test for ship endpoint**
+
+In `internal/api/server_test.go`, add:
+
+```go
+func TestShipStory_ShippableStory_Returns202(t *testing.T) {
+ srv, store := testServer(t)
+
+ // Create a project with a deploy script (empty path — deploy will fail but that's OK for this test).
+ proj := &task.Project{
+ ID: "ship-proj-1", Name: "test", RemoteURL: "https://github.com/x/y",
+ Type: "web", DeployScript: "",
+ CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(),
+ }
+ if err := store.CreateProject(proj); err != nil {
+ t.Fatalf("CreateProject: %v", err)
+ }
+
+ story := &task.Story{
+ ID: "ship-story-1", Name: "Ship Test", ProjectID: "ship-proj-1",
+ Status: task.StoryShippable, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(),
+ }
+ if err := store.CreateStory(story); err != nil {
+ t.Fatalf("CreateStory: %v", err)
+ }
+
+ req := httptest.NewRequest("POST", "/api/stories/ship-story-1/ship", nil)
+ w := httptest.NewRecorder()
+ srv.Handler().ServeHTTP(w, req)
+
+ if w.Code != http.StatusAccepted {
+ t.Errorf("expected 202, got %d: %s", w.Code, w.Body.String())
+ }
+}
+
+func TestShipStory_NonShippable_Returns409(t *testing.T) {
+ srv, store := testServer(t)
+
+ story := &task.Story{
+ ID: "nonship-1", Name: "Not Ready", ProjectID: "",
+ Status: task.StoryInProgress, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(),
+ }
+ if err := store.CreateStory(story); err != nil {
+ t.Fatalf("CreateStory: %v", err)
+ }
+
+ req := httptest.NewRequest("POST", "/api/stories/nonship-1/ship", nil)
+ w := httptest.NewRecorder()
+ srv.Handler().ServeHTTP(w, req)
+
+ if w.Code != http.StatusConflict {
+ t.Errorf("expected 409, got %d", w.Code)
+ }
+}
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+```bash
+cd /workspace/claudomator && go test ./internal/api/... -run "TestShipStory" -v
+```
+
+Expected: FAIL — `404 page not found` (route doesn't exist yet).
+
+- [ ] **Step 3: Remove auto-deploy from `checkStoryCompletion` and add status guard**
+
+In `executor.go`, replace `checkStoryCompletion`:
+
+```go
+func (p *Pool) checkStoryCompletion(ctx context.Context, storyID string) {
+ story, err := p.store.GetStory(storyID)
+ if err != nil {
+ p.logger.Error("checkStoryCompletion: failed to get story", "storyID", storyID, "error", err)
+ return
+ }
+ if story.Status != task.StoryInProgress {
+ return // already SHIPPABLE or beyond — nothing to do
+ }
+ tasks, err := p.store.ListTasksByStory(storyID)
+ if err != nil {
+ p.logger.Error("checkStoryCompletion: failed to list tasks", "storyID", storyID, "error", err)
+ return
+ }
+ if len(tasks) == 0 {
+ return
+ }
+ topLevelCount := 0
+ for _, t := range tasks {
+ if t.ParentTaskID != "" {
+ continue // subtasks are covered by their parent
+ }
+ topLevelCount++
+ if t.State != task.StateCompleted && t.State != task.StateReady {
+ return // not all top-level tasks done
+ }
+ }
+ if topLevelCount == 0 {
+ return
+ }
+ if err := p.store.UpdateStoryStatus(storyID, task.StoryShippable); err != nil {
+ p.logger.Error("checkStoryCompletion: failed to update story status", "storyID", storyID, "error", err)
+ return
+ }
+ p.logger.Info("story transitioned to SHIPPABLE", "storyID", storyID)
+ // Deploy is now triggered explicitly by the human via POST /api/stories/{id}/ship.
+}
+```
+
+- [ ] **Step 4: Add `ShipStory` to Pool**
+
+Add after `checkStoryCompletion`:
+
+```go
+// ShipStory merges the story branch and runs the deploy script.
+// Returns an error if the story is not in SHIPPABLE state.
+func (p *Pool) ShipStory(ctx context.Context, storyID string) error {
+ story, err := p.store.GetStory(storyID)
+ if err != nil {
+ return fmt.Errorf("story not found: %w", err)
+ }
+ if story.Status != task.StoryShippable {
+ return fmt.Errorf("story is not SHIPPABLE (current status: %s)", story.Status)
+ }
+ go p.triggerStoryDeploy(ctx, storyID)
+ return nil
+}
+```
+
+- [ ] **Step 5: Register the route in `server.go`**
+
+In the `routes()` method, after the existing story routes, add:
+
+```go
+s.mux.HandleFunc("POST /api/stories/{id}/ship", s.handleShipStory)
+```
+
+- [ ] **Step 6: Add `handleShipStory` to `stories.go`**
+
+Add at the end of `stories.go`:
+
+```go
+// handleShipStory triggers the merge + deploy for a SHIPPABLE story.
+// POST /api/stories/{id}/ship
+func (s *Server) handleShipStory(w http.ResponseWriter, r *http.Request) {
+ id := r.PathValue("id")
+ if err := s.pool.ShipStory(r.Context(), id); err != nil {
+ writeJSON(w, http.StatusConflict, map[string]string{"error": err.Error()})
+ return
+ }
+ writeJSON(w, http.StatusAccepted, map[string]string{"message": "story shipping initiated", "story_id": id})
+}
+```
+
+- [ ] **Step 7: Run the ship tests**
+
+```bash
+cd /workspace/claudomator && go test ./internal/api/... -run "TestShipStory" -v
+```
+
+Expected: both PASS.
+
+- [ ] **Step 8: Run full test suite**
+
+```bash
+cd /workspace/claudomator && go test ./... -race -timeout 120s
+```
+
+Expected: all PASS.
+
+- [ ] **Step 9: Commit**
+
+```bash
+git add internal/executor/executor.go internal/api/server.go internal/api/stories.go internal/api/server_test.go
+git commit -m "feat: story ship gate — explicit POST /api/stories/{id}/ship; remove auto-deploy"
+```
+
+---
+
+## Task 5: Elaborator — acceptance criteria per story task
+
+**Files:**
+- Modify: `internal/api/elaborate.go`
+- Modify: `internal/api/stories.go`
+- Test: `internal/api/stories_test.go`
+
+- [ ] **Step 1: Write failing test**
+
+In `internal/api/stories_test.go`, find (or add) a test for story approval and verify acceptance criteria flows through:
+
+```go
+func TestApproveStory_AcceptanceCriteriaStored(t *testing.T) {
+ srv, store := testServer(t)
+
+ proj := &task.Project{
+ ID: "ac-proj", Name: "test", RemoteURL: "https://github.com/x/y",
+ Type: "web", DeployScript: "",
+ CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(),
+ }
+ store.CreateProject(proj)
+
+ body := `{
+ "name": "AC Story",
+ "branch_name": "story/ac-test",
+ "project_id": "ac-proj",
+ "tasks": [
+ {
+ "name": "Add feature",
+ "instructions": "implement the thing",
+ "acceptance_criteria": "run go test ./... and verify all pass",
+ "subtasks": []
+ }
+ ],
+ "validation": {"type": "test", "steps": [], "success_criteria": "tests pass"}
+ }`
+ req := httptest.NewRequest("POST", "/api/stories/approve", strings.NewReader(body))
+ req.Header.Set("Content-Type", "application/json")
+ w := httptest.NewRecorder()
+ srv.Handler().ServeHTTP(w, req)
+
+ if w.Code != http.StatusCreated {
+ t.Fatalf("expected 201, got %d: %s", w.Code, w.Body.String())
+ }
+
+ var resp struct {
+ TaskIDs []string `json:"task_ids"`
+ }
+ json.NewDecoder(w.Body).Decode(&resp)
+ if len(resp.TaskIDs) == 0 {
+ t.Fatal("expected task_ids in response")
+ }
+
+ tk, err := store.GetTask(resp.TaskIDs[0])
+ if err != nil {
+ t.Fatalf("GetTask: %v", err)
+ }
+ if tk.AcceptanceCriteria != "run go test ./... and verify all pass" {
+ t.Errorf("expected acceptance criteria stored on task, got %q", tk.AcceptanceCriteria)
+ }
+}
+```
+
+- [ ] **Step 2: Run test to verify it fails**
+
+```bash
+cd /workspace/claudomator && go test ./internal/api/... -run "TestApproveStory_AcceptanceCriteriaStored" -v
+```
+
+Expected: FAIL — acceptance criteria is empty on the created task.
+
+- [ ] **Step 3: Add `AcceptanceCriteria` to `elaboratedStoryTask` in `elaborate.go`**
+
+```go
+type elaboratedStoryTask struct {
+ Name string `json:"name"`
+ Instructions string `json:"instructions"`
+ AcceptanceCriteria string `json:"acceptance_criteria"`
+ Subtasks []elaboratedStorySubtask `json:"subtasks"`
+}
+```
+
+- [ ] **Step 4: Update `buildStoryElaboratePrompt` to request acceptance criteria**
+
+In `buildStoryElaboratePrompt()`, update the JSON schema in the returned string. Replace the tasks section:
+
+```go
+func buildStoryElaboratePrompt() string {
+ return `You are a software architect. Given a goal, analyze the codebase at /workspace and produce a structured implementation plan as JSON.
+
+Output ONLY valid JSON matching this schema:
+{
+ "name": "story name",
+ "branch_name": "story/kebab-case-name",
+ "tasks": [
+ {
+ "name": "task name",
+ "instructions": "detailed instructions including file paths and what to change",
+ "acceptance_criteria": "specific, verifiable conditions a separate reviewer can check — e.g. 'run go test ./... and verify all pass; confirm GET /api/foo returns 200 with expected JSON shape'",
+ "subtasks": [
+ { "name": "subtask name", "instructions": "..." }
+ ]
+ }
+ ],
+ "validation": {
+ "type": "build|test|smoke",
+ "steps": ["step1", "step2"],
+ "success_criteria": "what success looks like"
+ }
+}
+
+Rules:
+- Tasks must be independently buildable (each can be deployed alone)
+- Subtasks within a task are order-dependent and run sequentially
+- Instructions must include specific file paths, function names, and exact changes
+- Instructions must end with: git add -A && git commit -m "..." && git push origin <branch>
+- acceptance_criteria must be concrete and verifiable by a separate agent — no vague assertions like "code looks good"
+- Validation should match the scope: small change = build check; new feature = smoke test`
+}
+```
+
+- [ ] **Step 5: Pass `AcceptanceCriteria` through in `handleApproveStory`**
+
+In `stories.go`, inside `handleApproveStory`, find the task creation block (the `for _, tp := range input.Tasks` loop). Add `AcceptanceCriteria` to the `task.Task` literal:
+
+```go
+t := &task.Task{
+ ID: uuid.New().String(),
+ Name: tp.Name,
+ Project: input.ProjectID,
+ RepositoryURL: repoURL,
+ StoryID: story.ID,
+ AcceptanceCriteria: tp.AcceptanceCriteria,
+ Agent: task.AgentConfig{Type: "claude", Instructions: tp.Instructions},
+ Priority: task.PriorityNormal,
+ Tags: []string{},
+ DependsOn: []string{},
+ Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "exponential"},
+ State: task.StatePending,
+ CreatedAt: time.Now().UTC(),
+ UpdatedAt: time.Now().UTC(),
+}
+```
+
+- [ ] **Step 6: Run the test**
+
+```bash
+cd /workspace/claudomator && go test ./internal/api/... -run "TestApproveStory_AcceptanceCriteriaStored" -v
+```
+
+Expected: PASS.
+
+- [ ] **Step 7: Run full API tests**
+
+```bash
+cd /workspace/claudomator && go test ./internal/api/... -race -timeout 120s
+```
+
+Expected: all PASS.
+
+- [ ] **Step 8: Commit**
+
+```bash
+git add internal/api/elaborate.go internal/api/stories.go internal/api/stories_test.go
+git commit -m "feat: acceptance_criteria per story task in elaboration and approval"
+```
+
+---
+
+## Task 6: UI — Ship button and checker report
+
+**Files:**
+- Modify: `web/app.js`
+
+- [ ] **Step 1: Add "Ship" button to SHIPPABLE story cards**
+
+In `renderStoryCard`, after the `meta` element is appended to `card`, add:
+
+```js
+export function renderStoryCard(story, doc = document) {
+ // ... existing code building header, badge, meta ...
+
+ card.appendChild(header);
+ if (meta.children.length) card.appendChild(meta);
+
+ // Ship button for SHIPPABLE stories.
+ if (story.status === 'SHIPPABLE') {
+ const shipBtn = doc.createElement('button');
+ shipBtn.className = 'btn-primary story-ship-btn';
+ shipBtn.textContent = 'Ship';
+ shipBtn.addEventListener('click', async (e) => {
+ e.stopPropagation();
+ shipBtn.disabled = true;
+ shipBtn.textContent = 'Shipping…';
+ try {
+ const res = await fetch(`${API_BASE}/api/stories/${story.id}/ship`, { method: 'POST' });
+ if (!res.ok) {
+ const body = await res.json().catch(() => ({}));
+ alert(body.error || `Ship failed (${res.status})`);
+ shipBtn.disabled = false;
+ shipBtn.textContent = 'Ship';
+ } else {
+ renderStoriesPanel();
+ }
+ } catch {
+ shipBtn.disabled = false;
+ shipBtn.textContent = 'Ship';
+ }
+ });
+ card.appendChild(shipBtn);
+ }
+
+ return card;
+}
+```
+
+> Note: `API_BASE` is a module-level constant already defined in `app.js`. Verify it's accessible in this scope; if not, use `BASE_PATH` (also defined at module level) instead.
+
+- [ ] **Step 2: Add checker report to READY task cards**
+
+In `createTaskCard`, after the `// Error message for failed tasks` block, add:
+
+```js
+ // Checker report for READY tasks where the checker flagged a problem.
+ if (task.state === 'READY' && task.checker_report) {
+ const reportEl = document.createElement('div');
+ reportEl.className = 'task-checker-report';
+ const label = document.createElement('span');
+ label.className = 'task-checker-report-label';
+ label.textContent = '⚠ Checker flagged:';
+ const text = document.createElement('span');
+ text.textContent = task.checker_report;
+ reportEl.appendChild(label);
+ reportEl.appendChild(text);
+ card.appendChild(reportEl);
+ }
+```
+
+- [ ] **Step 3: Add CSS for checker report**
+
+In `web/style.css`, add after the `.ready-completed-label` block:
+
+```css
+.task-checker-report {
+ margin: 0.5rem 0;
+ padding: 0.5rem 0.75rem;
+ background: var(--warning-bg, rgba(255, 180, 0, 0.12));
+ border-left: 3px solid var(--warning, #f0a500);
+ border-radius: 4px;
+ font-size: 0.8rem;
+ color: var(--text);
+}
+
+.task-checker-report-label {
+ font-weight: 600;
+ margin-right: 0.4rem;
+}
+```
+
+- [ ] **Step 4: Build and verify**
+
+```bash
+cd /workspace/claudomator && go build ./...
+```
+
+Expected: no errors.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add web/app.js web/style.css
+git commit -m "feat: Ship button on SHIPPABLE stories; checker report on READY task cards"
+```
+
+---
+
+## Task 7: Full test run and deploy
+
+**Files:** none
+
+- [ ] **Step 1: Run full test suite with race detector**
+
+```bash
+cd /workspace/claudomator && go test ./... -race -timeout 120s
+```
+
+Expected: all PASS.
+
+- [ ] **Step 2: Push and deploy**
+
+```bash
+git push && sudo scripts/deploy
+```
+
+Expected: build passes, tests pass, binary installs, service restarts.
+
+---
+
+## Self-Review
+
+**Spec coverage:**
+- ✅ Checker spawned after task → READY (Task 3)
+- ✅ Checker uses acceptance_criteria or falls back to task instructions (Task 3)
+- ✅ Pass → auto-accept (READY → COMPLETED) (Task 3)
+- ✅ Fail → task stays READY + checker_report attached (Task 3)
+- ✅ No checker for subtasks or checker tasks (Task 3, guards in spawnCheckerTask)
+- ✅ Story elaborator generates acceptance_criteria per task (Task 5)
+- ✅ `checkStoryCompletion` no longer auto-deploys (Task 4)
+- ✅ `POST /api/stories/{id}/ship` endpoint (Task 4)
+- ✅ Ship button in UI (Task 6)
+- ✅ Checker report shown on READY task cards (Task 6)
+- ✅ New DB columns + migrations (Task 2)
+
+**Placeholder scan:** none found.
+
+**Type consistency:** `UpdateTaskCheckerReport(id, report string)`, `GetCheckerTask(checkedTaskID string) (*task.Task, error)`, `ShipStory(ctx context.Context, storyID string) error` — all consistent across Tasks 2, 3, 4.
diff --git a/docs/superpowers/specs/2026-04-04-task-checker-story-ship.md b/docs/superpowers/specs/2026-04-04-task-checker-story-ship.md
new file mode 100644
index 0000000..1be2f3c
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-04-task-checker-story-ship.md
@@ -0,0 +1,222 @@
+# Task Checker Agent and Story Ship Gate — Design Spec
+
+**Date:** 2026-04-04
+**Status:** Approved
+
+---
+
+## Goal
+
+Reduce per-task human review burden by running an independent checker agent after every top-level task completes. If the checker passes, the task auto-accepts. Human attention is required only when the checker flags a problem. Stories accumulate completed tasks until all are done, then surface a single "Ship" action for human approval before deploy.
+
+---
+
+## Context
+
+The current flow requires a human to manually accept every READY task. For stories with many tasks this is friction — the human has to review each one before `checkStoryCompletion` can fire and ship the story. Additionally, `checkStoryCompletion` currently auto-triggers deploy with no human gate at the story level.
+
+ADR-007 describes a post-deploy validation agent (runs after merge, verifies the live deployment). This spec adds a pre-ship, per-task checker that is independent of the implementor and runs asynchronously.
+
+---
+
+## Design
+
+### Two-tier validation
+
+| Tier | When | Author | Gate |
+|---|---|---|---|
+| **Checker** (this spec) | After task → READY | Different agent from implementor | Auto-accepts on pass; leaves READY + attaches report on fail |
+| **Post-deploy validation** (existing) | After story → DEPLOYED | Separate validation task | Story → REVIEW_READY or NEEDS_FIX |
+
+### Full story flow after this change
+
+```
+Task runs → READY
+ → checker task spawned (async, independent)
+ → checker passes: task → COMPLETED (silent)
+ → checker fails: task stays READY, checker_report attached
+
+All story top-level tasks COMPLETED
+ → story → SHIPPABLE (human gate — no auto-deploy)
+
+Human clicks "Ship" on SHIPPABLE story
+ → merge branch to main + run deploy script → DEPLOYED
+ → post-deploy validation task created → VALIDATING
+ → REVIEW_READY | NEEDS_FIX
+```
+
+---
+
+## Data Model
+
+### `tasks` table — three new columns
+
+| Column | Type | Purpose |
+|---|---|---|
+| `acceptance_criteria` | `TEXT` | Criteria for the checker. Empty = use task instructions as spec. |
+| `checker_for_task_id` | `TEXT` | Set on checker tasks. Points to the task being checked. |
+| `checker_report` | `TEXT` | Populated when checker fails. Shown in the UI on the READY task. |
+
+Checker tasks have `checker_for_task_id` set and no `story_id`. They do not appear in story task lists and do not affect `checkStoryCompletion`.
+
+---
+
+## Acceptance Criteria Source
+
+**Story tasks:** The elaborator generates `acceptance_criteria` alongside each task's instructions. Example:
+
+```
+Run the full test suite and verify all tests pass.
+Confirm the /api/tasks endpoint returns repository_url in the response body.
+```
+
+**Standalone tasks (no story):** `acceptance_criteria` is empty. The checker uses the task's own `instructions` field as the specification.
+
+---
+
+## Checker Task
+
+### Creation
+
+Spawned by the executor pool when a top-level task (no `parent_task_id`, no `checker_for_task_id`) transitions to READY. The pool constructs and submits a checker task immediately:
+
+```
+Name: "Check: <original task name>"
+checker_for_task_id: <original task ID>
+story_id: (empty — checker is not a story task)
+repository_url: same as the checked task
+agent.type: claude
+agent.instructions: (see below)
+agent.max_budget_usd: 0.50
+timeout: 10m
+retry.max_attempts: 1
+```
+
+**Do not spawn a checker if:**
+- `t.ParentTaskID != ""` (subtasks go directly to COMPLETED, never READY)
+- `t.CheckerForTaskID != ""` (never check a checker)
+- A checker task already exists for this task (query by `checker_for_task_id` before spawning)
+
+### Instructions template
+
+```
+You are validating a completed task. Do not make any changes to the code or repository.
+
+Task: <name>
+Instructions given to the implementor:
+<task instructions>
+
+Acceptance criteria:
+<acceptance_criteria, or task instructions if acceptance_criteria is empty>
+
+Steps:
+1. Clone the repository and review the changes made.
+2. Verify each acceptance criterion is met. Run tests or make HTTP requests as needed.
+3. If all criteria are satisfied, exit normally (success).
+4. If any criterion is not met, use the Bash tool to exit with a non-zero code:
+ bash -c "exit 1"
+ Before exiting, write a brief summary of what failed.
+```
+
+### Completion handling
+
+In `executor.Pool.handleRunResult`, after determining the task outcome, check `t.CheckerForTaskID`:
+
+- **Checker succeeded (exit 0):** Call `store.UpdateTaskState(t.CheckerForTaskID, task.StateCompleted)`. Then call `checkStoryCompletion` for the checked task's story (if any).
+- **Checker failed (exit non-0 or error):** Extract the execution summary and call `store.UpdateTaskCheckerReport(t.CheckerForTaskID, summary)`. The checked task stays READY; the report surfaces in the UI.
+
+The checker task itself always resolves to COMPLETED or FAILED through the normal state machine — no special states needed.
+
+---
+
+## Story Ship Gate
+
+### Remove auto-deploy from `checkStoryCompletion`
+
+Current code at the end of `checkStoryCompletion`:
+```go
+go p.triggerStoryDeploy(ctx, storyID) // REMOVE THIS
+```
+
+After this change, `checkStoryCompletion` only transitions the story to SHIPPABLE. Deploy is triggered explicitly by the human.
+
+### New endpoint
+
+`POST /api/stories/{id}/ship`
+
+- Verifies story state is SHIPPABLE. Returns 409 otherwise.
+- Calls `p.triggerStoryDeploy(ctx, storyID)` (existing function, no changes needed).
+- Returns 202 Accepted immediately; deploy runs async.
+
+### UI
+
+The stories panel shows a **"Ship"** button on any story in SHIPPABLE state. No other UI changes required for the story panel. The button calls `POST /api/stories/{id}/ship`.
+
+---
+
+## Task Card UI
+
+When `checker_report` is non-empty on a READY task:
+
+- Show a warning badge on the task card (e.g. "⚠ Checker failed")
+- Expand the card or side panel to show the report text
+- Human can still accept or reject the task manually regardless of checker result
+
+When a checker is pending/running for a READY task:
+
+- Show a subtle indicator (e.g. "Checking…") — optional, nice to have
+
+---
+
+## Elaborator Changes
+
+The story elaboration endpoint currently returns a list of tasks with `name` and `instructions`. Add `acceptance_criteria` to each task in the elaborated output:
+
+```json
+{
+ "tasks": [
+ {
+ "name": "Add repository_url to task struct",
+ "instructions": "...",
+ "acceptance_criteria": "Run go test ./... and verify all tests pass. Confirm Task struct has RepositoryURL field with correct json tag."
+ }
+ ]
+}
+```
+
+The elaborator prompt should instruct the LLM to write acceptance criteria that are concrete and verifiable by a separate agent: specific commands to run, specific API responses to check, specific file contents to verify. Vague criteria like "code looks good" are not acceptable.
+
+---
+
+## Storage
+
+New methods on `storage.DB`:
+
+```go
+// UpdateTaskCheckerReport sets the checker_report field on a task.
+func (s *DB) UpdateTaskCheckerReport(id, report string) error
+
+// GetCheckerTask returns the checker task for a given task ID, or nil if none exists.
+func (s *DB) GetCheckerTask(checkedTaskID string) (*task.Task, error)
+```
+
+Migration: `ALTER TABLE tasks ADD COLUMN acceptance_criteria TEXT NOT NULL DEFAULT ''`, same for `checker_for_task_id` and `checker_report`.
+
+---
+
+## What This Does Not Change
+
+- Post-deploy validation flow (DEPLOYED → VALIDATING → REVIEW_READY/NEEDS_FIX) is unchanged.
+- Subtask handling is unchanged — subtasks never go READY, so they never get checkers.
+- The `handleAcceptTask` endpoint remains — humans can still manually accept READY tasks.
+- The `handleRejectTask` endpoint remains — humans can still manually reject.
+- Checker tasks are subject to normal rate-limiting, retry, and budget enforcement.
+
+---
+
+## Out of Scope
+
+- Checker result overriding the human (human can always accept/reject regardless)
+- Parallel checker runs (one checker per task, no re-run unless task is re-run)
+- Configurable checker agent type per project
+- Checker budget/timeout configuration beyond the defaults above