diff options
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/IMPROVEMENT_PLAN.md | 278 | ||||
| -rw-r--r-- | docs/RAW_NARRATIVE.md | 399 | ||||
| -rw-r--r-- | docs/adr/005-sandbox-execution-model.md | 25 | ||||
| -rw-r--r-- | docs/adr/006-containerized-execution.md | 51 | ||||
| -rw-r--r-- | docs/adr/007-planning-layer-and-story-model.md | 265 | ||||
| -rw-r--r-- | docs/architecture.md | 392 | ||||
| -rw-r--r-- | docs/superpowers/plans/2026-04-03-task-project-fk.md | 837 | ||||
| -rw-r--r-- | docs/superpowers/plans/2026-04-04-task-checker-story-ship.md | 1226 | ||||
| -rw-r--r-- | docs/superpowers/specs/2026-04-04-task-checker-story-ship.md | 222 |
9 files changed, 2896 insertions, 799 deletions
diff --git a/docs/IMPROVEMENT_PLAN.md b/docs/IMPROVEMENT_PLAN.md new file mode 100644 index 0000000..db5d6a9 --- /dev/null +++ b/docs/IMPROVEMENT_PLAN.md @@ -0,0 +1,278 @@ +# Improvement Plan + +Comprehensive assessment of stability, architectural coherence, simplicity and cleanliness issues, with prioritized fixes. + +--- + +## Critical Bugs (fix immediately) + +### 1. `sandboxDir` variable shadowing — git clone failures + +**File:** `internal/executor/claude.go`, lines ~119–137 + +Both `setupSandbox` calls inside `Run()` use short variable declarations (`:=`) inside nested blocks, creating inner variables that shadow the outer `var sandboxDir string`. The outer variable is always `""`. + +**Effects:** +- `teardownSandbox` is never called → agent work discarded, changes never pushed to origin +- Sandboxes accumulate in `/tmp/claudomator-sandbox-*` until disk is full +- BLOCKED task resume always clones fresh → loses agent's partial work from first run +- Error messages omit sandbox path, making debugging harder + +**Fix:** + +```go +// In the resume path (inside `if projectDir != ""`): +var err error +sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger) // = not := + +// In the non-resume path (inside `else if projectDir != ""`): +var err error +sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger) // = not := +``` + +Both sites already declare `var err error` on the line above. Simply drop the `:` from `:=`. + +### 2. Hardcoded `master` branch in sandbox teardown + +**File:** `internal/executor/claude.go`, `teardownSandbox()`, line ~377 + +```go +git pull --rebase origin master // fails on repos using "main" +``` + +**Fix:** Detect the remote default branch before pushing: + +```go +branch, _ := exec.Command("git", "-C", sandboxDir, "rev-parse", "--abbrev-ref", "HEAD").Output() +defaultBranch := strings.TrimSpace(string(branch)) +if defaultBranch == "" || defaultBranch == "HEAD" { + defaultBranch = "main" +} +// then use defaultBranch instead of "master" +``` + +--- + +## High Priority — Stability & Correctness + +### 3. GeminiRunner is a non-functional stub in production code + +**File:** `internal/executor/gemini.go` + +`execOnce()` spawns a goroutine writing hardcoded JSON to a pipe instead of running the `gemini` binary. Any task with `agent.type: "gemini"` silently returns fake output. + +**Options (choose one):** +- **Implement properly:** Mirror ClaudeRunner structure — subprocess + process group + stream parsing + sandbox support. GeminiRunner must also implement `LogPather`. +- **Remove until ready:** Delete `gemini.go`, remove `GeminiRunner` from `NewPool` in `cli/serve.go`, and gate the `Classifier` on Claude-only operation. Add `// TODO: GeminiRunner not implemented` clearly. + +The stub in production is a correctness hazard. Removing it is safer than leaving it in place. + +### 4. Atomic execution creation + state update + +**Files:** `internal/executor/executor.go` `execute()`, `internal/storage/db.go` + +`CreateExecution` and `UpdateTaskState(RUNNING)` are two separate DB writes with no transaction. A crash between them leaves the task PENDING with an orphaned RUNNING execution record. `RecoverStaleRunning` handles the RUNNING-task case but not the PENDING-task-with-running-execution case. + +**Fix:** Add `storage.DB.CreateExecutionAndSetRunning(exec, taskID)` that wraps both writes in a single transaction. + +### 5. `context.Background()` in resume submission + +**File:** `internal/api/server.go`, `handleAnswerQuestion()` + +```go +p.SubmitResume(context.Background(), task, exec) +``` + +This detaches the resume from any server lifecycle context. If the server has a root context with a shutdown deadline, resumes submitted this way will not be cancelled. + +**Fix:** Pass the server's root context (or a derived context) instead of `Background()`. + +### 6. Pool `dispatch()` goroutine leaks on server shutdown + +**File:** `internal/executor/executor.go` + +`NewPool` starts `go p.dispatch()` but `workCh` is never closed. On server shutdown, `dispatch()` blocks forever on `<-p.workCh`. The pool has no `Shutdown()` method. + +**Fix:** Add `Pool.Shutdown(ctx context.Context)` that closes `workCh` and waits for active workers to drain. + +--- + +## Medium Priority — Architectural Coherence + +### 7. Priority field is stored but never dispatched + +**Files:** `internal/task/task.go`, `internal/executor/executor.go` + +Priority is parsed, stored in SQLite, returned via API — but `dispatch()` uses a single FIFO channel, ignoring priority entirely. + +**Options:** +- Use a priority queue (e.g., `container/heap`) keyed by `Priority` for `workCh` +- Or remove `Priority` from `Task` and documentation until implemented — don't ship half-features + +### 8. RetryConfig is stored but retry is manual + +**Files:** `internal/task/task.go`, `internal/executor/executor.go` + +`RetryConfig.MaxAttempts` and `Backoff` are stored but never read. Every "retry" is a manual user action via `/resume`. + +**Options:** +- Implement automatic retry in `handleRunResult` when `MaxAttempts > 1` and `currentAttempt < MaxAttempts` +- Or remove `RetryConfig` from the schema/API until implemented + +### 9. Changestats extracted in two places + +**Files:** `internal/executor/executor.go` `handleRunResult()`, `internal/api/server.go` `processResult()` + +Both call `task.ParseChangestatFromFile()` on the same file and write to the same row. The second write is idempotent but the duplication is confusing. + +**Fix:** Remove the extraction from `api.Server.processResult()`. The pool already handles it reliably before broadcasting the result. + +### 10. `withFailureHistory` shallow copy is unsafe + +**File:** `internal/executor/executor.go` + +```go +copy := *t // copies struct +copy.Agent = t.Agent // copies AgentConfig struct, slices share backing array +``` + +If any downstream code appends to `copy.Agent.AllowedTools`, it will mutate the original. The current code only writes to `SystemPromptAppend` (a string), so this is safe today but fragile. + +**Fix:** Deep-copy slices in AgentConfig: +```go +agent := t.Agent +agent.AllowedTools = append([]string(nil), t.Agent.AllowedTools...) +agent.DisallowedTools = append([]string(nil), t.Agent.DisallowedTools...) +agent.ContextFiles = append([]string(nil), t.Agent.ContextFiles...) +agent.AdditionalArgs = append([]string(nil), t.Agent.AdditionalArgs...) +copy.Agent = agent +``` + +### 11. Additive migration strategy has no version tracking + +**File:** `internal/storage/db.go` `migrate()` + +Migrations are applied in code order; the only idempotency guard is ignoring "column already exists" errors. There is no way to detect if a migration was partially applied, rolled back incorrectly, or was applied out of order by a different code version. + +**Fix (minimal):** Use a `schema_migrations` table with a version counter. Apply only migrations with version > current max. This prevents re-running migrations and makes the migration history explicit. + +### 12. Server test-seam fields in production struct + +**File:** `internal/api/server.go` + +```go +elaborateCmdPath string // overrides claudeBinPath in tests +validateCmdPath string // overrides claudeBinPath in tests +``` + +These exist solely to inject test paths. They are an internal testing seam leaked into the production struct. + +**Fix:** Extract an `Elaborator` interface (e.g., `Elaborate(ctx, request, workDir string) (string, error)`) and inject it. Tests inject a fake; production builds inject the real Claude-backed implementation. + +--- + +## Lower Priority — Simplicity & Cleanliness + +### 13. Classifier output parsing is brittle string manipulation + +**File:** `internal/executor/classifier.go` + +The classifier strips `"Loaded cached credentials."` by line scanning, strips markdown fences by `strings.TrimPrefix/TrimSuffix`, and falls back through multiple parsing paths. This will silently break if Gemini CLI output format changes. + +**Fix:** Request `--output-format json` and parse the structured response. Treat any parse failure as a classification error (fall back to default model), and log the raw output for debugging. + +### 14. `gitSafe` is inconsistently applied in teardownSandbox + +**File:** `internal/executor/claude.go` + +`setupSandbox` wraps all git commands with `gitSafe(...)` (adds `-c safe.directory=*`). `teardownSandbox` calls `git` directly without `gitSafe` on most commands. This means teardown can fail with "dubious ownership" errors in mixed-owner environments. + +**Fix:** Use `gitSafe` consistently in `teardownSandbox`. + +### 15. `tailFile` loads entire file into memory + +**File:** `internal/executor/claude.go` + +```go +func tailFile(path string, n int) string { + // scans all lines, keeps last n +} +``` + +For large log files this reads everything before discarding early lines. Fine in practice (stderr is typically small) but fragile if anything large is written to stderr. + +**Fix:** Use a circular buffer of n lines to avoid O(filesize) memory. + +### 16. `execute()` and `executeResume()` duplicate context/cancel/defer setup + +**File:** `internal/executor/executor.go` + +Both methods contain identical blocks for: context timeout + cancel setup, `activePerAgent` increment/decrement, `doneCh` signal, cancel registration/deregistration. This is ~40 lines duplicated. + +**Fix:** Extract `withWorkerSlot(ctx, t, fn func(ctx context.Context) error) error` that handles the common setup and calls `fn`. Both paths become 5–10 lines. + +### 17. `QuestionRegistry` and `QuestionHandler` are unused + +**Files:** `internal/executor/question.go`, `internal/executor/executor.go` + +`QuestionRegistry` is created in `NewPool` and stored on `Pool.Questions` but nothing reads from it. `QuestionHandler` interface exists but is never instantiated or wired to anything. The intended design (pool detects `tool_use` and invokes a handler) is not implemented. + +**Fix:** Remove `QuestionRegistry`, `QuestionHandler`, and `Pool.Questions` until the feature is actually needed. The current question flow (agent writes file, pool reads BlockedError) already works without them. + +### 18. Script registry endpoint is undocumented and empty + +**File:** `internal/api/scripts.go` + +The `ScriptRegistry` type and `POST /api/scripts/{name}` endpoint are defined but the registry is never populated from config. There are no example scripts. The endpoint exists in production code but can't be used. + +**Fix:** Either populate scripts from config (`[[scripts]]` entries in config.toml) with documentation, or remove the endpoint until it's ready. + +### 19. `parseStream` ignores scanner errors + +**File:** `internal/executor/claude.go` + +`parseStream()` loops over `scanner.Scan()` but never checks `scanner.Err()` after the loop. If the scanner hits a read error mid-stream, it silently stops. The function returns `nil` error, making the task appear to complete successfully when it may have lost output. + +**Fix:** +```go +if err := scanner.Err(); err != nil { + if streamErr == nil { + streamErr = fmt.Errorf("reading claude stdout: %w", err) + } +} +``` + +### 20. Log files have no size limit or rotation + +**Files:** `internal/executor/claude.go`, `internal/storage/db.go` + +Execution logs in `~/.claudomator/executions/` have no size limit, rotation, or TTL. Long-running or repeatedly-retried tasks can produce arbitrarily large log files. Old executions are never cleaned up. + +**Fix (minimal):** Add a `CleanupOldExecutions(olderThan time.Duration)` method to `storage.DB` and call it on server startup (or on a ticker). Delete execution log directories for executions older than the TTL. + +--- + +## Summary by Effort + +| # | Issue | Effort | Impact | +|---|-------|--------|--------| +| 1 | `sandboxDir` shadowing | XS (2 chars: `=` not `:=`) | Critical — root cause of git clone failures | +| 2 | Hardcoded `master` branch | S | High — breaks all `main`-branch repos | +| 3 | GeminiRunner stub | M–L | High — silent wrong output in production | +| 4 | Atomic exec creation | S | Medium — crash-recovery edge case | +| 5 | context.Background() resume | XS | Low — shutdown correctness | +| 6 | Pool shutdown | S | Medium — goroutine leak on shutdown | +| 7 | Priority dispatch | M | Medium — misleading API | +| 8 | RetryConfig unused | S | Low — misleading API | +| 9 | Changestats duplication | XS | Low — confusing code | +| 10 | Shallow copy in withFailureHistory | S | Low — latent bug | +| 11 | Migration versioning | M | Medium — operational risk | +| 12 | Test seams in Server struct | M | Low — architecture cleanliness | +| 13 | Brittle classifier parsing | S | Medium — silent classification failures | +| 14 | gitSafe inconsistency | XS | Medium — owner errors in teardown | +| 15 | tailFile memory | XS | Low — theoretical | +| 16 | execute/executeResume duplication | M | Low — maintainability | +| 17 | QuestionRegistry dead code | XS | Low — confusion | +| 18 | Script registry empty | S | Low — dead endpoint | +| 19 | parseStream ignores scanner.Err | XS | Medium — silent data loss | +| 20 | Log file cleanup | S | Medium — disk usage | diff --git a/docs/RAW_NARRATIVE.md b/docs/RAW_NARRATIVE.md deleted file mode 100644 index 834d812..0000000 --- a/docs/RAW_NARRATIVE.md +++ /dev/null @@ -1,399 +0,0 @@ -# Claudomator: Development Narrative - -This document is a chronological engineering history of the Claudomator project, -reconstructed from the git log, ADRs, and source code. - ---- - -## 1. Initial commit — core scaffolding (2e2b218) - -The project started with a single commit that established the full skeleton: -task model, executor, API server, CLI, storage layer, and reporter. The Go module -was `github.com/thepeterstone/claudomator`. The initial `Task` struct had a -`ClaudeConfig` field (later renamed to `AgentConfig`) holding the model, -instructions, `working_dir`, budget, permission mode, and tool lists. SQLite was -chosen as the storage backend (see ADR-001). The executor pool used a bounded -goroutine model. The API server was plain `net/http` with no external framework. -The CLI was Cobra. - -## 2. JSON tags, module rename, gitignore (8ee1fb5, 46ba3f5, 2bf317d) - -Early housekeeping: added JSON struct tags to all exported types, renamed the Go -module to its final identifier, and set up the `.gitignore` to exclude the compiled -binary and local Claude settings. - -## 3. Verbose flag, logs CLI command (0377c06, f27d4f7) - -Added `--verbose` to the Claude subprocess invocation and a `logs` CLI subcommand -for tailing execution output. - -## 4. Embedded web UI and HTTP wiring (135d8eb) - -The first web UI was embedded into the binary using `go:embed`. This made the -binary fully self-contained: no separate static file server was needed. - -## 5. CLAUDE.md, clickable fold, subtask support (bdcc33f, 3881f80, 704d007) - -Added the project-level `CLAUDE.md` guidance document. Added a clickable fold to -the web UI to expand hidden completed/failed tasks. Added `parent_task_id` to the -`Task` struct, `ListSubtasks` to storage, and `UpdateTask` — the foundational -subtask plumbing. - -## 6. Dependency waiting and planning preamble (f527972) - -The executor gained dependency waiting: tasks with `depends_on` now block in a -polling loop until all dependencies reach `COMPLETED`. Any dependency entering a -terminal failure state (`FAILED`, `TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`) -immediately fails the waiting task. - -The planning preamble was also introduced here — a system prompt prefix injected -into every task's instructions that explains to the agent how to write question -files, how to break tasks into subtasks via the `claudomator` CLI, and how to -commit all changes in git sandboxes. - -## 7. Elaborate, logs-stream, templates, subtask-list endpoints (74cc740) - -The API gained several new endpoints: -- `POST /api/elaborate` — calls Claude to expand a brief task description into - structured YAML. -- `GET /api/executions/{id}/stream` — live-streams the execution log. -- `GET /api/templates` / `POST /api/templates` — task template CRUD (later removed). -- `GET /api/tasks/{id}/subtasks` — lists subtasks for a parent task. - -## 8. Web UI: tabs, new task modal, templates panel (e8d1b80) - -The web UI got a tabbed layout (Running / Done / Templates), a modal for creating -new tasks with AI-drafted instructions, and a templates panel. This was the first -version of the UI that matched the current design. - -## 9. READY state for human-in-the-loop review (6511d6e) - -A critical design point: when a top-level task's runner exits successfully, the -task does not immediately go to `COMPLETED`. Instead it transitions to `READY`, -meaning it paused for the operator to review the agent's output and explicitly -accept or reject it. `READY → COMPLETED` requires `POST /api/tasks/{id}/accept`. -`READY → PENDING` (for re-running) requires `POST /api/tasks/{id}/reject`. - -This is specific to top-level tasks. Subtasks (`parent_task_id != ""`) bypass READY -and go directly to `COMPLETED` — only the root task requires human sign-off. - -## 10. Fix working_dir failures, hardcoded /root removed (3962597) - -Early deployments hardcoded `/root` as the base path for `working_dir`. This was -removed. `working_dir` is now validated to exist before the subprocess starts. - -## 11. Scripts, debug-execution, deploy (2bbae74, f7c6de4) - -Added the `scripts/` directory with `debug-execution` (inspects a specific -execution's logs) and `deploy` (builds and deploys the binary to the production -server). Added a CLI `start` command and the `version` package. - -## 12. Rescue from recovery branch — question/answer, rate limiting, start-next-task (cf83444) - -A batch of features rescued from a detached-work branch: -- **Question/answer flow (`BLOCKED` state)**: agents can write a `question.json` - file before exiting. The pool detects this and transitions the task to `BLOCKED`, - storing the question for the user. `POST /api/tasks/{id}/answer` resumes the - Claude session with the user's answer injected as the next message. -- **Rate limiting**: the pool tracks which agents are rate-limited and when. - `isRateLimitError` and `isQuotaExhausted` distinguish transient throttles from - 5-hour quota exhaustion. The per-agent `rateLimited` map stores the deadline. -- **Start-next-task script**: a shell script that picks the highest-priority pending - task and starts it. - -## 13. Accept/Reject for READY tasks, Start Next button in UI (9e790e3) - -The web UI gained explicit Accept/Reject buttons for tasks in the `READY` state -and a "Start Next" button in the header that triggers the `start-next-task` script. - -## 14. Stream-level failure detection when claude exits 0 (4c0ee5c) - -Claude can exit 0 even when the task actually failed — for example when the -permission mode denies a tool_use and Claude exits politely. `parseStream` was -updated to detect `is_error: true` in the result message and -`tool_result.is_error: true` with permission-denial text, returning an error in -both cases so the task goes to `FAILED` rather than silently succeeding. - -## 15. Persist log paths at CreateExecution time (f8b5f25) - -Previously, `StdoutPath`, `StderrPath`, and `ArtifactDir` were only written to the -execution record at `UpdateExecution` time (after the subprocess finished). This -prevented live log tailing. Introduced the `LogPather` interface: runners that -implement `ExecLogDir(execID)` allow the pool to pre-populate paths before calling -`CreateExecution`, making them available for streaming before the process ends. - -## 16. bypassPermissions as executor default (a33211d) - -`permission_mode` defaults to `bypassPermissions` when not set in the task YAML. -This was a deliberate trade-off: unattended automation needs to proceed without -tool-use confirmation prompts. Operators can override per-task via `permission_mode`. - -## 17. Cancel endpoint and pool cancel mechanism (3672981) - -`POST /api/tasks/{id}/cancel` was implemented. The pool maintains a `cancels` map -from taskID to context cancel functions. Cancellation sends a SIGKILL to the -entire process group (via `syscall.Kill(-pgid, SIGKILL)`) to reap MCP servers and -bash children that the claude subprocess spawned. - -## 18. BLOCKED state, session resume, fix: persist session_id (7466b17, 40d9ace) - -The full BLOCKED cycle was wired end-to-end: -1. Agent writes `question.json` to `$CLAUDOMATOR_QUESTION_FILE` and exits. -2. Runner detects the file and returns `*BlockedError`. -3. Pool transitions task to `BLOCKED` and stores the question JSON. -4. User answers via `POST /api/tasks/{id}/answer`. -5. Pool calls `SubmitResume` with a new `Execution` carrying `ResumeSessionID` - and `ResumeAnswer`. -6. Runner invokes `claude --resume <session-id> -p <answer>`. - -A bug was found and fixed: `session_id` was not persisted in `UpdateExecution`, -causing the BLOCKED → answer → resume cycle to fail because `GetLatestExecution` -returned no session ID. - -## 19. Context.Background for resume execution; CANCELLED→QUEUED restart (7d4890c) - -Resume executions now use `context.Background()` instead of inheriting a potentially -stale context. `CANCELLED → QUEUED` was added as a valid transition so cancelled -tasks can be manually restarted. - -## 20. git sandbox execution, project_dir rename (1f36e23) - -The `working_dir` field was renamed to `project_dir` across all layers (task YAML, -storage, API, UI). When `project_dir` is set, the runner no longer executes -directly in that directory. Instead it: - -1. Detects whether `project_dir` is a git repo (initialising one if not). -2. Clones the repo into `/tmp/claudomator-sandbox-*` (using `--no-hardlinks` - to avoid permission issues with mixed-owner `.git/objects`). -3. Runs the agent in the sandbox clone. -4. After the agent exits, verifies no uncommitted changes remain and pushes - new commits to the canonical bare repo. -5. Removes the sandbox. - -On BLOCKED, the sandbox is preserved so the agent can resume where it left off -in the same working tree. - -Concurrent push conflicts (two sandboxes pushing at the same time) are handled -by a fetch-rebase-retry sequence. - -## 21. Storage: enforce valid state transitions in UpdateTaskState (8777bf2) - -`storage.DB.UpdateTaskState` now calls `task.ValidTransition` before writing. If -the transition is not allowed by the state machine, the function returns an error -and no write occurs. This is the enforcement point for the state machine invariants. - -## 22. Executor internal dispatch queue; remove at-capacity rejection (2cf6d97) - -The previous pool rejected `Submit` when all slots were taken. This was replaced -with an internal `workCh` channel and a `dispatch` goroutine: tasks submitted -while the pool is at capacity are buffered in the channel and picked up as soon -as a slot opens. `Submit` now only returns an error if the channel itself is full -(which requires an enormous backlog). - -## 23. API hardening — WebSocket auth, per-IP rate limiter, script registry (363fc9e, 417034b, 181a376) - -Several API reliability improvements: -- WebSocket connections now require an API token (if `SetAPIToken` was called) and - are capped at a configurable maximum number of clients. A ping/pong keepalive - prevents stale connections from accumulating. -- A per-IP rate limiter was added to the `/api/elaborate` and `/api/validate` - endpoints to prevent abuse. -- The scripts endpoints were collapsed into a generic `ScriptRegistry`: instead of - individual handlers per script, a single handler dispatches to registered scripts - by name. - -## 24. API: extend executions and log streaming endpoints (7914153) - -`GET /api/executions` gained filtering and sorting. `GET /api/executions/{id}/logs` -was added for fetching completed log files. Live streaming via SSE and the log -tail endpoint were polished. - -## 25. CLI: newLogger, shared HTTP client, report command (1ce83b6) - -CLI utilities consolidated: a shared logger constructor (`newLogger`), a shared -HTTP client, a default server URL (`http://localhost:8484`). Added the `report` -CLI subcommand for fetching execution summaries from the server. - -## 26. Generic agent architecture — transition from Claude-only (306482d to f2d6822) - -This was a major refactor over several commits: -1. `ClaudeConfig` was renamed to `AgentConfig` with a new `Type` field (`"claude"`, - `"gemini"`, etc.). -2. `Pool` was changed from holding a single `ClaudeRunner` to holding a - `map[string]Runner` — one runner per agent type. -3. `GeminiRunner` was implemented, mirroring `ClaudeRunner` but invoking the - `gemini` CLI. -4. The storage layer, API handlers, elaborate/validate endpoints, and all tests - were updated to use `AgentConfig`. -5. The web UI was updated to expose agent type selection. - -## 27. Gemini-based task classification and explicit load balancing (406247b) - -`Classifier` and `pickAgent` were introduced to automate agent and model selection: - -- **`pickAgent(SystemStatus)`** — explicit load balancing: picks the available - (non-rate-limited) agent with the fewest active tasks. Falls back to fewest-active - if all agents are rate-limited. -- **`Classifier`** — calls the Gemini CLI with a meta-prompt asking it to pick - the best model for the task. This is intentionally model-picks-model: use a fast, - cheap classifier to avoid wasting expensive tokens. - -After this commit the flow is: `execute()` → pick agent → call classifier → set -`t.Agent.Type` and `t.Agent.Model` → dispatch to runner. - -## 28. ADR-003: Security Model (93a4c85) - -The security model was documented formally: no auth, permissive CORS, `bypassPermissions` -as default, and the known risk inventory (see `docs/adr/003-security-model.md`). - -## 29. Various web UI improvements (91fd904, 7b53b9e, 560f42b, cdfdc30) - -Running tasks became the default view. A "Running view" showing currently running -tasks alongside the 24h execution history was added. Agent type and model were -surfaced on running task cards. The Done/Interrupted tabs were filtered to 24h. - -## 30. Quota exhaustion detection from stream (076c0fa) - -Previously, quota exhaustion (the 5-hour usage limit) was treated identically to -generic failures. `isQuotaExhausted` was introduced to distinguish it: quota -exhaustion maps to `BUDGET_EXCEEDED` and sets a 5-hour rate-limit deadline on the -agent, rather than failing the task with a generic error. - -## 31. Sandbox fixes — push via bare repo, fetch/rebase (cfbcc7b, f135ab8, 07061ac) - -The sandbox teardown strategy was revised: instead of pushing directly into the -working copy (which fails for non-bare repos), the sandbox pushes to a bare repo -(`remote "local"` or `remote "origin"`) and the working copy is pulled separately -by the developer. This avoids permission errors from mixed-owner `.git/objects`. -The `--no-hardlinks` clone flag was added to prevent object sharing. - -## 32. BLOCKED→READY for parent tasks with subtasks (441ed9e, c8e3b46) - -When a top-level task exits the runner successfully but has subtasks, it transitions -to `BLOCKED` (waiting for subtasks to finish) rather than `READY`. A new -`maybeUnblockParent` function is called every time a subtask completes: if all -siblings are `COMPLETED`, the parent transitions `BLOCKED → READY` and is -presented for operator review. - -## 33. Stale RUNNING task recovery on server startup (9159572) - -`Pool.RecoverStaleRunning()` was added and called from `cli.serve`. It queries for -tasks still in `RUNNING` state (left over from a previous server crash) and marks -them `FAILED`, closing their open execution records. This prevents stuck tasks -after server restarts. - -## 34. API: configurable mockRunner, async error-path tests (b33566b) - -The `api` test suite was hardened with a configurable `mockRunner` that can be -injected into the test server. Async error paths (runner returns an error, DB -update fails mid-execution) were now exercised in tests. - -## 35. Storage: missing indexes, ListRecentExecutions tests, DeleteTask atomicity (8b6c97e, 3610409) - -Several storage correctness fixes: -- `idx_tasks_state`, `idx_tasks_parent_task_id`, `idx_executions_status`, - `idx_executions_task_id`, and `idx_executions_start_time` indexes were added. -- `ListRecentExecutions` had an off-by-one that caused it to miss recent executions; - tests were added to catch this. -- `DeleteTask` was made atomic using a recursive CTE to delete the task and all - its subtasks in a single transaction. - -## 36. API: validate ?state= param, standardize operation response shapes (933af81) - -`GET /api/tasks?state=XYZ` now validates the state value. All mutating operation -responses (`/run`, `/cancel`, `/accept`, `/reject`, `/answer`) were standardised -to return `{"status": "ok"}` on success. - -## 37. Re-classify on manual restart; handleRunResult extraction (0676f0f, 7d6943c) - -Tasks that are manually restarted (from `FAILED`, `CANCELLED`, etc.) now go through -classification again so they pick up the latest agent/model selection logic. The -post-run error classification block was extracted into `handleRunResult` — a shared -helper called by both `execute` and `executeResume` — eliminating 60+ lines of -duplication. - -## 38. Legacy Claude field removed (b4371d0, a782bbf) - -The last remnants of the original `ClaudeConfig` type and backward-compat `working_dir` -shim were removed. The schema is now fully generic. - -## 39. Kill-goroutine safety documentation, goroutine-leak test (3b4c50e) - -A documented invariant was added to the `execOnce` goroutine that kills the -subprocess process group: it cannot block indefinitely. Tests were added to verify -no goroutine leak occurs when a task is cancelled. - -## 40. Rate-limit avoidance in classifier; model list updates (8ec366d, fc1459b) - -The classifier now skips calling itself if the selected agent is rate-limited, -avoiding a redundant Gemini API call when the rate-limited agent is already known. -The model list was updated to Claude 4.x (`claude-sonnet-4-6`, `claude-opus-4-6`, -`claude-haiku-4-5-20251001`) and current Gemini models (`gemini-2.5-flash-lite`, -`gemini-2.5-flash`, `gemini-2.5-pro`). - -## 41. Map leak fixes — activePerAgent and rateLimited (7c7dd2b) - -Two map leak bugs were fixed in the pool: -- `activePerAgent[agentType]` was decremented but never deleted when the count hit - zero, so inactive agents accumulated as dead entries. -- Expired `rateLimited[agentType]` entries were not deleted, so the map grew - unboundedly over long runs. - -## 42. Sandbox teardown: remove working-copy pull, retry push on concurrent rejection (5c85624) - -The sandbox teardown removed the `git pull` into the working copy (which was failing -due to mixed-owner object dirs). The retry-push-on-rejection path was tightened to -detect `"fetch first"` and `"non-fast-forward"` as the rejection signals. - -## 43. Explicit load balancing separated from classification (e033504) - -Previously the `Classifier` both picked the agent and selected the model. This was -split: `pickAgent` is deterministic code that picks the agent from the registered -runners using the load-balancing algorithm. The `Classifier` only picks the model -for the already-selected agent. This makes load balancing reliable and fast even -when the Gemini classifier is unavailable. - -## 44. Session ID fix on second block-and-resume cycle (65c7638) - -A bug was found where the second BLOCKED→answer→resume cycle passed the wrong -`--resume` session ID to Claude. The fix ensures that resume executions propagate -the original session ID rather than the new execution's UUID. - -## 45. validTransitions promoted to package-level var (3226af3) - -`validTransitions` was promoted to a package-level variable in `internal/task/task.go` -for clarity and potential reuse outside the package. ADR-002 was updated to reflect -the current state machine including the `BLOCKED→READY` transition for parent tasks. - ---- - -## Feature Summary (current state) - -| Feature | Status | -|---|---| -| Task YAML parsing, batch files | Done | -| SQLite persistence | Done | -| REST API (CRUD + lifecycle) | Done | -| WebSocket real-time events | Done | -| Claude subprocess execution | Done | -| Gemini subprocess execution | Done | -| Explicit load balancing (pickAgent) | Done | -| Gemini-based model classification | Done | -| BLOCKED / question-answer / resume | Done | -| git sandbox isolation | Done | -| Subtask creation and unblocking | Done | -| READY state / human accept-reject | Done | -| Rate-limit and quota tracking | Done | -| Stale RUNNING recovery on startup | Done | -| Per-IP rate limiter on elaborate | Done | -| Web UI (PWA) | Done | -| Push notifications (PWA) | Planned | - ---- 2026-03-16T00:56:20Z --- -Converter sudoku to rust - ---- 2026-03-16T01:14:27Z --- -For claudomator tasks that are ready, check the deployed server version against their fix commit - ---- 2026-03-16T01:17:00Z --- -For every claudomator task that is ready, display on the task whether the currently deployed server includes the commit which fixes that task diff --git a/docs/adr/005-sandbox-execution-model.md b/docs/adr/005-sandbox-execution-model.md index b374561..0c9ef14 100644 --- a/docs/adr/005-sandbox-execution-model.md +++ b/docs/adr/005-sandbox-execution-model.md @@ -1,7 +1,7 @@ # ADR-005: Git Sandbox Execution Model ## Status -Accepted +Superseded by [ADR-006](006-containerized-execution.md) ## Context @@ -69,9 +69,13 @@ state), the sandbox is **not** torn down. The preserved sandbox allows the resumed execution to pick up the same working tree state, including any in-progress file changes made before the agent asked its question. -Resume executions (`SubmitResume`) skip sandbox setup entirely and run -directly in `project_dir`, passing `--resume <session-id>` to the agent -so Claude can continue its previous conversation. +**Known Risk: Resume skips sandbox.** Current implementation of +Resume executions (`SubmitResume`) skips sandbox setup entirely and runs +directly in `project_dir`. This is a significant behavioral divergence: if a +resumed task makes further changes, they land directly in the canonical working +copy, reintroducing the concurrent corruption and partial-work leak risks +identified in the Context section. A future iteration should ensure resumed +tasks pick up the preserved sandbox instead. ### Session ID propagation on resume @@ -113,10 +117,15 @@ The fix is in `ClaudeRunner.Run`: if `e.ResumeSessionID != ""`, use it as directory the server process inherited. - If a sandbox's push repeatedly fails (e.g. due to a bare repo that is itself broken), the task is failed with the sandbox preserved. -- If `/tmp` runs out of space (many large sandboxes), tasks will fail at - clone time. This is a known operational risk with no current mitigation. -- The `project_dir` field in task YAML must point to a git repository with - a configured `"local"` or `"origin"` remote that accepts pushes. +- **If `/tmp` runs out of space** (many large sandboxes), tasks will fail at + clone time. This is a known operational risk. Mitigations such as periodic + cleanup of old sandboxes (cron) or pre-clone disk space checks are required + as follow-up items. +- **The `project_dir` field in task YAML** must point to a git repository with + a configured `"local"` or `"origin"` remote that accepts pushes. If neither + remote exists or the push is rejected for other reasons, the task will be + marked as `FAILED` and the sandbox will be preserved for manual recovery. + ## Relevant Code Locations diff --git a/docs/adr/006-containerized-execution.md b/docs/adr/006-containerized-execution.md new file mode 100644 index 0000000..cdd1cc2 --- /dev/null +++ b/docs/adr/006-containerized-execution.md @@ -0,0 +1,51 @@ +# ADR-006: Containerized Repository-Based Execution Model + +## Status +Accepted (Supersedes ADR-005) + +## Context +ADR-005 introduced a sandbox execution model based on local git clones and pushes back to a local project directory. While this provided isolation, it had several flaws identified during early adoption: +1. **Host pollution**: Build dependencies (Node, Go, etc.) had to be installed on the host and were subject to permission issues (e.g., `/root/.nvm` access for `www-data`). +2. **Fragile Pushes**: Pushing to a checked-out local branch is non-standard and requires risky git configs. +3. **Resume Divergence**: Resumed tasks bypassed the sandbox, reintroducing corruption risks. +4. **Scale**: Local directory-based "project selection" is a hack that doesn't scale to multiple repos or environments. + +## Decision +We will move to a containerized execution model where projects are defined by canonical repository URLs and executed in isolated containers. + +### 1. Repository-Based Projects +- The `Task` model now uses `RepositoryURL` as the source of truth for the codebase. +- This replaces the fragile reliance on local `ProjectDir` paths. + +### 2. Containerized Sandboxes +- Each task execution runs in a fresh container (Docker/Podman). +- The runner clones the repository into a host-side temporary workspace and mounts it into the container. +- The container provides a "bare system" with the full build stack (Node, Go, etc.) pre-installed, isolating the host from build dependencies. + +### 3. Unified Workspace Management (including RESUME) +- Unlike ADR-005, the containerized model is designed to handle **Resume** by re-attaching to or re-mounting the same host-side workspace. +- This ensures that resumed tasks **do not** bypass the sandbox and never land directly in a production directory. + +### 4. Push to Actual Remotes +- Agents commit changes within the sandbox. +- The runner pushes these commits directly to the `RepositoryURL` (actual remote). +- If the remote is missing or the push fails, the task is marked `FAILED` and the host-side workspace is preserved for inspection. + +## Rationale +- **Isolation**: Containers prevent host pollution and ensure a consistent build environment. +- **Safety**: Repository URLs provide a standard way to manage codebases across environments. +- **Consistency**: Unified workspace management for initial runs and resumes eliminates the behavioral divergence found in ADR-005. + +## Consequences +- Requires a container runtime (Docker) on the host. +- Requires pre-built agent images (e.g., `claudomator-agent:latest`). +- **Disk Space Risk**: Host-side clones still consume `/tmp` space. Mitigation requires periodic cleanup of old workspaces or disk-space monitoring. +- **Git Config**: Repositories no longer require `receive.denyCurrentBranch = updateInstead` because we push to the remote, not a local worktree. + +## Relevant Code Locations +| Concern | File | +|---|---| +| Container Lifecycle | `internal/executor/container.go` | +| Runner Registration | `internal/cli/serve.go` | +| Task Model | `internal/task/task.go` | +| API Integration | `internal/api/server.go` | diff --git a/docs/adr/007-planning-layer-and-story-model.md b/docs/adr/007-planning-layer-and-story-model.md new file mode 100644 index 0000000..7efb66d --- /dev/null +++ b/docs/adr/007-planning-layer-and-story-model.md @@ -0,0 +1,265 @@ +# ADR-007: Planning Layer, Task Hierarchy, and Story-Gated Deployment + +**Status:** Draft +**Date:** 2026-03-19 +**Context:** Design discussion exploring the integration of Claudomator with Doot and a richer task hierarchy model. + +--- + +## Context + +Claudomator currently operates as a flat queue of tasks, each with optional subtasks (`parent_task_id`). There is no concept of grouping tasks into shippable units, no deploy automation, and no integration with personal planning tools. Separately, Doot is a personal dashboard that aggregates tasks, meals, calendar events, and bugs from third-party services (Todoist, Trello, PlanToEat, Google Calendar) into a unified `Atom` model. + +The goal of this ADR is to capture a design direction that: + +1. Integrates Claudomator into Doot as a first-class data source +2. Introduces a four-level task hierarchy (Epic → Story → Task → Subtask) +3. Defines a branching and execution model for stories +4. Establishes stories as the unit that gates deployment + +--- + +## Decision + +### 1. Claudomator as an Atom Source in Doot + +Doot already normalizes heterogeneous data sources into a unified `Atom` model (see `internal/models/atom.go`). Claudomator tasks are a natural peer to Todoist and Trello — they are their own source of truth (SQLite, full execution history) and should be surfaced in Doot's aggregation views without duplication elsewhere. + +**Design:** +- Add `SourceClaudomator AtomSource = "claudomator"` to Doot's atom model +- Implement a Claudomator API client in `internal/api/claudomator.go` (analogous to `todoist.go`, `trello.go`) +- Map Claudomator tasks to `Atom` with appropriate priority, status, and source icon +- Individual subtasks are **not** surfaced in the Doot timeline — they are execution-level details, not planning-level items + +**Rationale:** Claudomator is a peer to other task sources, not subordinate to them. Users should not need a Todoist card to track agent work — Claudomator is the source of truth for that domain. + +--- + +### 2. Four-Level Task Hierarchy + +The current flat model (task + optional subtask) is insufficient for feature-scale work. The following hierarchy is adopted: + +| Level | Name | Description | +|---|---|---| +| 4 | **Epic** | Large design initiative requiring back-and-forth, resulting in a set of stories. Lives primarily in the planning layer (Doot). Not an execution unit. | +| 3 | **Story** | A shippable slice of work. Independent and deployable on its own. Groups tasks that together constitute a releasable change. The unit that gates deployment. | +| 2 | **Task** | A feature- or bug-level unit of work. Individually buildable, but may not make sense to ship alone. Belongs to a story. | +| 1 | **Subtask** | A discrete, ordered agent action. The actual Claudomator execution unit. Belongs to a task. Performed in sequence. | + +**Key properties:** +- Stories are independently shippable — deployment is gated at this level +- Tasks are individually buildable but do not gate deployment alone +- Subtasks are the agent execution primitive — what `ContainerRunner` actually runs +- Epics are planning artifacts; they live in Doot or a future planning layer, not in Claudomator's execution model +- Scheduling prefers picking up subtasks from **already-started stories** before beginning new ones (WIP limiting) + +**Claudomator data model changes required:** +- Add `stories` table with deploy configuration and status +- Add `story_id` to tasks (foreign key to stories) +- `repository_url` moves from individual tasks to stories (all tasks in a story operate on the same repo) +- Story status is derived: all tasks completed → story is shippable + +--- + +### 3. Story-Level Branching Model + +Each story has a dedicated Git branch. Subtasks execute sequentially, each cloning the repository at the story branch's current HEAD, making commits, and pushing back before the next subtask begins. + +**Model:** One branch per story. Fresh clone + container per subtask. Subtasks commit to the story branch in sequence. + +**Properties:** + +- **Each subtask sees all prior subtask work** — it clones the story branch at HEAD, which includes all previous subtask commits +- **Clean environment per subtask** — no filesystem state leaks between subtasks; the container is ephemeral +- **Ordered execution enforced** — subtasks run strictly in order; each depends on the previous commit +- **Reviewable history** — the story branch accumulates one commit per subtask, giving a clean, auditable record before merge +- **Clear recovery points** — if subtask N fails, roll back to subtask N-1's commit, fix the subtask definition, rerun +- **Resilient to transient API failures** — transient rate-limit errors (429) do not fail the story or subtask; the executor requeues the task and "pauses" the story until the agent is unblocked, preserving the sequential chain. + +**Tradeoffs accepted:** +- Clone and container creation cost is paid per subtask (not amortized across the story). Acceptable at current usage scale. +- No parallelism within a story — subtasks are strictly sequential by design +- Concurrency lock required at the story level to prevent two subtasks running simultaneously (e.g., on retry races) + +**Rejected alternatives:** + +*Isolated commit model (fresh clone per subtask, independent branches):* Clean but subtasks cannot build on each other's work. Requires careful branch ordering and merging to assemble a story. + +*Persistent workspace per story (one container, one clone for the life of the story):* More efficient, natural continuity, but a bad subtask can corrupt the workspace for subsequent subtasks. Recovery is harder. Loses the discipline of enforced commit points. + +### Sequential Subtask Execution + +Subtasks within a story execute sequentially. This is enforced via `depends_on` links set automatically at task creation time — each subtask added to a story gets `depends_on: [previous_subtask_id]`, forming a linear chain. The existing pool dependency mechanism handles the rest. + +**Rejected alternative — pool-level story concurrency lock:** Would require the executor to become story-aware, lock state would be in-memory (fragile across restarts), and the ordering would be invisible in the data model. The `depends_on` approach is durable, inspectable, and reuses existing infrastructure. The 5-second polling delay between subtasks is an accepted tradeoff. + +--- + +### 4. Story-Gated Deployment and Agent Validation + +Deployment is triggered at the story level, not the task or subtask level. + +#### State Machine + +``` +PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY + ↘ NEEDS_FIX → IN_PROGRESS (retry) +``` + +- **SHIPPABLE:** All tasks completed. Ready to merge and deploy. +- **DEPLOYED:** Merged to main, deploy triggered. +- **VALIDATING:** Validation agent is running. +- **REVIEW_READY:** Validation passed. Awaiting human sign-off. +- **NEEDS_FIX:** Validation failed. Story returns to `IN_PROGRESS` with the validation report attached. + +#### Merge Strategy + +Merge to main first, then validate against the live deployment. No branch review phase — tests are the confidence mechanism. If test coverage is insufficient for a given story, the implementor is responsible for adding tests before marking it shippable. Branch review may be introduced later if needed. + +#### Deploy Configuration + +Stored on the story. Two project types are handled: + +| Project | Deploy trigger | What "deployed" means | +|---|---|---| +| claudomator | `git push` to local bare repo → systemd pulls and restarts | Live at `doot.terst.org` | +| nav (Android) | `git push` to GitHub → CI build action fires | APK distributed to testers via Play Store testing track | + +For nav, Claudomator does not interact with GitHub CI directly — it pushes the branch/commits; the CI action is an external trigger. "Deployed" is declared once the push succeeds; the CI result is not polled. + +#### Agent Validation + +After a story is deployed, a validation subtask is automatically created. The elaborator is responsible for specifying how validation should be performed — it has full context of what changed and can prescribe the appropriate check level. + +**Validation spec** (produced by elaborator, stored on the story): + +```yaml +validation: + type: curl # curl | tests | playwright | gradle + steps: + - "GET /api/stats — expect 200, body contains throughput[]" + - "GET /api/agents/status — expect agents array non-empty" + success_criteria: "All steps return expected responses with correct structure" +``` + +**Validation types by project:** + +| Type | When to use | What the agent does | +|---|---|---| +| `curl` | API changes, data model additions, simple UI text | HTTP requests, check status codes and response shape | +| `tests` | Logic changes with existing test coverage | Runs the project test suite against the live deployment or codebase | +| `playwright` | Subtle UI changes, interactive flows, visual correctness | Browser automation against the deployed URL | +| `gradle` | nav (Android) — any change | `./gradlew test`, `./gradlew lint`; optionally `./gradlew assembleDebug` | + +The elaborator selects `type` based on change scope. Curl is the default for small targeted changes; playwright is reserved for changes where visual or interactive correctness cannot be inferred from API responses alone. + +**Validation agent inputs:** +- The validation spec (type, steps, success_criteria) +- Deployed URL or project path +- Summary of what changed (story name + task list) + +**Validation agent outputs:** +- Structured pass/fail per step +- Evidence (response bodies, test output excerpts, screenshots for playwright) +- Overall verdict: pass → story moves to `REVIEW_READY`; fail → story moves to `NEEDS_FIX` with report attached + +Validation subtasks are governed by the same pool-level rate-limit resilience; a 429 during validation will requeue the subtask rather than failing the story. + +#### Failure Recovery + +If a subtask fails mid-story: pause the story and require human review before resuming. The options at that point are: +- Roll back to the previous subtask's commit and retry +- Amend the subtask definition and requeue + +Policy beyond this is deferred until failure patterns are observed in practice. + +--- + +## Consequences + +**Claudomator changes:** +- New `stories` table: `id, name, branch_name, project_id, deploy_config, validation_json, status` +- New `projects` table: `id, name, remote_url, local_path, type, deploy_script` +- `tasks.story_id` FK; `repository_url` removed from tasks (inherited from story → project) +- Sequential subtask ordering via auto-wired `depends_on` at task creation time +- Post-task-completion check: all story tasks COMPLETED → story transitions to SHIPPABLE → merge + deploy trigger +- Post-deploy: auto-create validation subtask from story's `validation_json` spec +- Validation subtask completes → story transitions to REVIEW_READY or NEEDS_FIX +- Story state machine: PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY | NEEDS_FIX +- `ContainerRunner`: clone at story branch HEAD; push back to story branch after each subtask +- Deployment status check moves from task level to story level +- Elaborator output extended: `validation` block (type, steps, success_criteria) stored as `validation_json` on story +- Remove `Agent.RepositoryURL`, `Agent.ProjectDir` legacy fields, `skip_planning`, `fallbackGitInit()` +- Remove duplicate changestats extraction (keep pool-side, remove API server-side) +- Pool-level "requeue-and-skip" logic for rate-limited agents: tasks return to `QUEUED` and release worker slots if all candidate agents are blocked, allowing the system to "wait out" 429 errors without failing stories. +- Background "Recovery Scheduler" goroutine: periodically (every 30m or as hinted by API) runs minimal "test tasks" to verify agent availability and unblock the pool. + +**Doot changes:** +- New `SourceClaudomator` atom source +- Claudomator API client (`internal/api/claudomator.go`) +- Story → Atom mapper (title = story name, description = task progress e.g. "3/5 tasks done", priority from story config, deploy status) +- Task → Atom mapper (optional, feature-level visibility) +- Individual subtasks explicitly excluded from all views + +**Doot removals (dead code / superseded):** +- `bugs` table, `BugToAtom`, `SourceBug`, `TypeBug` — bug reporting becomes a thin UI shim that submits to Claudomator; nothing stored in Doot's data model +- `notes` table and all Obsidian skeleton code — never wired up +- `AddMealToPlanner()` stub — never called +- `UpdateCard()` stub — never called +- All bug handlers, templates, and store methods + +**Planning layer (future):** +- Epics live here, not in Claudomator +- Story creation via elaboration + validation flow (see below) +- WIP-limiting scheduler that prefers subtasks from started stories + +### Story Creation: Elaboration and Validation Flow + +Story creation is driven by a beefed-up version of Claudomator's existing elaboration and validation pipeline, not a YAML file or form. + +**Flow:** +1. User describes the story goal (rough, high-level) in the UI +2. Elaboration agent runs against a **local working copy** of the project (read-only mount, no clone) — reads the codebase, understands current state, produces story + task + subtask breakdown with `depends_on` chain wired +3. Validation agent checks the structure: tasks are independently buildable, subtasks properly scoped, story has a clear shippable definition, no dependency cycles +4. User reviews and approves in the UI +5. On approval: story branch created (`git checkout -b story/xxx origin/main`, pushed to remote); subtasks queued + +**Responsiveness:** +- Elaboration uses a local working copy — no clone cost, near-instant container start +- A `git fetch` (not pull) at elaboration start updates remote refs without touching the working tree +- Branch creation is deferred to approval — elaboration agent is purely read-only +- Execution clones use `git clone --reference /local/path <remote>` — reuses local object store, fetches only the delta; significantly faster than cold clone +- Rate-limit aware — if the elaboration agent is blocked, the UI surfaces the status and resumes automatically once unblocked via the Recovery Scheduler. + +### Project Registry + +The local working copy model requires a formal project concept. A `projects` table replaces the current ad-hoc `repository_url` + `working_dir` fields: + +| Field | Purpose | +|---|---| +| `id` | UUID | +| `name` | Human-readable label | +| `remote_url` | Git remote (clone target for execution) | +| `local_path` | Local working copy path (read cache for elaboration, object store for `--reference` clones) | +| `type` | `web` \| `android` — controls available validation types and deploy semantics | +| `deploy_script` | Optional path to project-specific deploy script | + +`repository_url` on stories becomes a FK to `projects`. The existing `project` string field on tasks (currently just a label) is replaced by `project_id`. `Agent.RepositoryURL`, `Agent.ProjectDir`, and `Task.RepositoryURL` are all removed — project is the single source of truth for repo location. + +**Initial registered projects:** + +| Name | Local path | Remote | Type | +|---|---|---|---| +| claudomator | `/workspace/claudomator` | local bare repo | web | +| nav | `/workspace/nav` | GitHub | android | + +--- + +## Out of Scope (for now) + +- Voice interface (noted as a future exploration, not an architectural requirement) +- Epic management tooling +- Parallelism within stories +- Branch review before merge — deferred; merge-first is the current strategy. May be revisited if confidence requires it. +- Polling GitHub CI result for nav deploys — Claudomator declares "deployed" on push success; CI outcome is out of band +- ADB / emulator-based UI validation for nav — `gradle` type covers unit and integration tests; device UI testing deferred diff --git a/docs/architecture.md b/docs/architecture.md deleted file mode 100644 index 27c7601..0000000 --- a/docs/architecture.md +++ /dev/null @@ -1,392 +0,0 @@ -# Claudomator — System Architecture - -## 1. System Purpose and Design Goals - -Claudomator is a local developer tool that captures tasks, dispatches them to AI agents (Claude, -Gemini), and reports results. Its primary use case is unattended, automated execution of agent -tasks with real-time status streaming to a mobile Progressive Web App. - -**Design goals:** - -- **Single binary, zero runtime deps** — Go for the backend; CGo only for SQLite. -- **Bounded concurrency** — a pool of goroutines prevents unbounded subprocess spawning. -- **Durability** — all task state survives server restarts (SQLite + WAL mode). -- **Real-time feedback** — WebSocket pushes task completion events to connected clients. -- **Multi-agent routing** — deterministic load balancing across Claude and Gemini; AI-driven - model-tier selection via a cheap Gemini classifier. -- **Git-isolated execution** — tasks that modify source code run in a temporary git clone - (sandbox) to prevent concurrent corruption and partial-work leakage. -- **Review gate** — top-level tasks wait in `READY` state for operator accept/reject before - reaching `COMPLETED`. - -See [ADR-001](adr/001-language-and-architecture.md) for the language and architecture rationale. - ---- - -## 2. High-Level Architecture - -```mermaid -flowchart TD - CLI["CLI\ncmd/claudomator\n(cobra)"] - API["HTTP API\ninternal/api\nREST + WebSocket"] - Pool["Executor Pool\ninternal/executor.Pool\n(bounded goroutines)"] - Classifier["Gemini Classifier\ninternal/executor.Classifier"] - ClaudeRunner["ClaudeRunner\ninternal/executor.ClaudeRunner"] - GeminiRunner["GeminiRunner\ninternal/executor.GeminiRunner"] - Sandbox["Git Sandbox\n/tmp/claudomator-sandbox-*"] - Subprocess["AI subprocess\nclaude -p / gemini"] - SQLite["SQLite\ntasks.db"] - LogFiles["Log Files\n~/.claudomator/executions/"] - Hub["WebSocket Hub\ninternal/api.Hub"] - Clients["Clients\nbrowser / mobile PWA"] - Notifier["Notifier\ninternal/notify"] - - CLI -->|run / serve| API - CLI -->|direct YAML run| Pool - API -->|POST /api/tasks/run| Pool - Pool -->|pickAgent + Classify| Classifier - Pool -->|Submit / SubmitResume| ClaudeRunner - Pool -->|Submit / SubmitResume| GeminiRunner - ClaudeRunner -->|git clone| Sandbox - Sandbox -->|claude -p| Subprocess - GeminiRunner -->|gemini| Subprocess - Subprocess -->|stream-json stdout| LogFiles - Pool -->|UpdateTaskState| SQLite - Pool -->|resultCh| API - API -->|Broadcast| Hub - Hub -->|fan-out| Clients - API -->|Notify| Notifier -``` - ---- - -## 3. Component Table - -| Package | Role | Key Exported Types | -|---|---|---| -| `internal/task` | `Task` struct, YAML parsing, state machine, validation | `Task`, `AgentConfig`, `RetryConfig`, `State`, `Priority`, `ValidTransition` | -| `internal/executor` | Bounded goroutine pool; subprocess manager; multi-agent routing; classification | `Pool`, `ClaudeRunner`, `GeminiRunner`, `Classifier`, `Runner`, `Result`, `BlockedError`, `QuestionRegistry` | -| `internal/storage` | SQLite wrapper; auto-migrating schema; all task and execution CRUD | `DB`, `Execution`, `TaskFilter` | -| `internal/api` | HTTP server (REST + WebSocket); result forwarding; elaboration; log streaming | `Server`, `Hub` | -| `internal/reporter` | Formats and emits execution results (text, HTML) | `Reporter`, `TextReporter`, `HTMLReporter` | -| `internal/config` | TOML config loading; data-directory layout | `Config` | -| `internal/cli` | Cobra CLI commands (`run`, `serve`, `list`, `status`, `init`) | `RootCmd` | -| `internal/notify` | Webhook notifier for task completion events | `Notifier`, `WebhookNotifier`, `Event` | -| `web` | Embedded PWA static files (served by `internal/api`) | `Files` (embed.FS) | -| `version` | Build-time version string | `Version` | - ---- - -## 4. Package Dependency Graph - -```mermaid -graph LR - cli["internal/cli"] - api["internal/api"] - executor["internal/executor"] - storage["internal/storage"] - task["internal/task"] - config["internal/config"] - reporter["internal/reporter"] - notify["internal/notify"] - web["web"] - version["version"] - - cli --> api - cli --> executor - cli --> storage - cli --> config - cli --> version - api --> executor - api --> storage - api --> task - api --> notify - api --> web - executor --> storage - executor --> task - reporter --> task - reporter --> storage - storage --> task -``` - ---- - -## 5. Task Execution Pipeline - -The following numbered steps trace a task from API submission to final state, with file and line -references to the key logic in each step. - -1. **Task creation** — `POST /api/tasks` calls `task.Validate` and `storage.DB.CreateTask`. - Task is written to SQLite in `PENDING` state. - (`internal/api/server.go:349`, `internal/task/parse.go`, `internal/storage/db.go`) - -2. **Run request** — `POST /api/tasks/{id}/run` calls `storage.DB.ResetTaskForRetry` (validates - the `PENDING → QUEUED` transition) then `executor.Pool.Submit`. - (`internal/api/server.go:460`, `internal/executor/executor.go:125`) - -3. **Pool dispatch** — The `dispatch` goroutine reads from `workCh`, waits for a free slot - (blocks on `doneCh` if at capacity), then spawns `go execute(ctx, task)`. - (`internal/executor/executor.go:102`) - -4. **Agent selection** — `pickAgent(SystemStatus)` selects the available agent with the fewest - active tasks (deterministic, no I/O). If the pool has a `Classifier`, it invokes - `Classifier.Classify` (one Gemini API call) to select the model tier; failures are non-fatal. - (`internal/executor/executor.go:349`, `internal/executor/executor.go:396`) - -5. **Dependency wait** — If `t.DependsOn` is non-empty, `waitForDependencies` polls SQLite - every 5 s until all dependencies reach `COMPLETED` (or a terminal failure state). - (`internal/executor/executor.go:642`) - -6. **Execution record created** — A new `storage.Execution` row is inserted with `RUNNING` - status. Log paths (`stdout.log`, `stderr.log`) are pre-populated via `LogPather` so they are - immediately available for tailing. - (`internal/executor/executor.go:483`) - -7. **Subprocess launch** — `ClaudeRunner.Run` (or `GeminiRunner.Run`) builds the CLI argument - list and calls `exec.CommandContext`. If `project_dir` is set and this is not a resume - execution, `setupSandbox` clones the project to `/tmp/claudomator-sandbox-*` first. - (`internal/executor/claude.go:63`, `internal/executor/claude.go:setupSandbox`) - -8. **Output streaming** — stdout is written to `<LogDir>/<execID>/stdout.log`; the runner - concurrently parses the `stream-json` lines for cost and session ID. - (`internal/executor/claude.go`) - -9. **Execution outcome → state** — After the subprocess exits, `handleRunResult` maps the error - type to a final task state and calls `storage.DB.UpdateTaskState`. - (`internal/executor/executor.go:256`) - - | Outcome | Final state | - |---|---| - | `runner.Run` → `nil`, top-level, no subtasks | `READY` | - | `runner.Run` → `nil`, top-level, has subtasks | `BLOCKED` | - | `runner.Run` → `nil`, subtask | `COMPLETED` | - | `runner.Run` → `*BlockedError` (question file) | `BLOCKED` | - | `ctx.Err() == DeadlineExceeded` | `TIMED_OUT` | - | `ctx.Err() == Canceled` | `CANCELLED` | - | quota exhausted | `BUDGET_EXCEEDED` | - | any other error | `FAILED` | - -10. **Result broadcast** — The pool emits a `*Result` to `resultCh`. `Server.forwardResults` - reads it, marshals a `task_completed` JSON event, and calls `hub.Broadcast`. - (`internal/api/server.go:123`, `internal/api/server.go:129`) - -11. **Sandbox teardown** — If a sandbox was used and no uncommitted changes remain, - `teardownSandbox` removes the temp directory. If uncommitted changes are detected, the task - fails and the sandbox is preserved for inspection. - (`internal/executor/claude.go:teardownSandbox`) - -12. **Review gate** — Operator calls `POST /api/tasks/{id}/accept` (`READY → COMPLETED`) or - `POST /api/tasks/{id}/reject` (`READY → PENDING`). - (`internal/api/server.go:487`, `internal/api/server.go:507`) - ---- - -## 6. Task State Machine - -```mermaid -stateDiagram-v2 - [*] --> PENDING : task created - - PENDING --> QUEUED : POST /run - PENDING --> CANCELLED : POST /cancel - - QUEUED --> RUNNING : pool goroutine starts - QUEUED --> CANCELLED : POST /cancel - - RUNNING --> READY : exit 0, top-level, no subtasks - RUNNING --> BLOCKED : exit 0, top-level, has subtasks - RUNNING --> BLOCKED : question.json written - RUNNING --> COMPLETED : exit 0, subtask - RUNNING --> FAILED : exit non-zero / stream error - RUNNING --> TIMED_OUT : context deadline exceeded - RUNNING --> CANCELLED : context cancelled - RUNNING --> BUDGET_EXCEEDED : quota exhausted - - READY --> COMPLETED : POST /accept - READY --> PENDING : POST /reject - - BLOCKED --> QUEUED : POST /answer - BLOCKED --> READY : all subtasks COMPLETED - - FAILED --> QUEUED : POST /run (retry) - TIMED_OUT --> QUEUED : POST /resume - CANCELLED --> QUEUED : POST /run (restart) - BUDGET_EXCEEDED --> QUEUED : POST /run (retry) - - COMPLETED --> [*] -``` - -**State definitions** (`internal/task/task.go:9`): - -| State | Meaning | -|---|---| -| `PENDING` | Created; not yet submitted for execution | -| `QUEUED` | Submitted to pool; waiting for a goroutine slot | -| `RUNNING` | Subprocess actively executing | -| `READY` | Top-level task done; awaiting operator accept/reject | -| `COMPLETED` | Fully done (only true terminal state) | -| `FAILED` | Execution error; eligible for retry | -| `TIMED_OUT` | Exceeded configured timeout; resumable | -| `CANCELLED` | Explicitly cancelled by operator | -| `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` | -| `BLOCKED` | Agent wrote a `question.json`, or parent waiting for subtasks | - -`ValidTransition(from, to State) bool` enforces the allowed edges at runtime before every state -write. (`internal/task/task.go:113`) - ---- - -## 7. WebSocket Broadcast Flow - -```mermaid -sequenceDiagram - participant Runner as ClaudeRunner / GeminiRunner - participant Pool as executor.Pool - participant API as api.Server (forwardResults) - participant Hub as api.Hub - participant Client1 as WebSocket client 1 - participant Client2 as WebSocket client 2 - - Runner->>Pool: runner.Run() returns - Pool->>Pool: handleRunResult() — sets exec.Status - Pool->>SQLite: UpdateTaskState + UpdateExecution - Pool->>Pool: resultCh <- &Result{...} - API->>Pool: <-pool.Results() - API->>API: marshal task_completed JSON event - API->>Hub: hub.Broadcast(data) - Hub->>Client1: ws.Write(data) - Hub->>Client2: ws.Write(data) - API->>Notifier: notifier.Notify(Event{...}) [if set] -``` - -**Event payload** (JSON): -```json -{ - "type": "task_completed", - "task_id": "<uuid>", - "status": "READY | COMPLETED | FAILED | ...", - "exit_code": 0, - "cost_usd": 0.042, - "error": "", - "timestamp": "2026-03-11T12:00:00Z" -} -``` - -The `Hub` also emits `task_question` events via `Server.BroadcastQuestion` when an agent uses -an interactive question tool (currently unused in the primary file-based `BLOCKED` flow). - -WebSocket endpoint: `GET /api/ws`. Supports optional bearer-token auth when `--api-token` is -configured. Up to 1000 concurrent clients; periodic 30-second pings detect dead connections. -(`internal/api/websocket.go`) - ---- - -## 8. Subtask / Parent-Task Dependency Resolution - -Claudomator supports two distinct dependency mechanisms: - -### 8a. `depends_on` — explicit task ordering - -Tasks declare `depends_on: [<task-id>, ...]` in their YAML or creation payload. When the -executor goroutine starts, `waitForDependencies` polls SQLite every 5 seconds until all listed -tasks reach `COMPLETED`. If any dependency reaches a terminal failure state (`FAILED`, -`TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`), the waiting task transitions to `FAILED` -immediately. (`internal/executor/executor.go:642`) - -### 8b. Parent / subtask blocking (`BLOCKED` state) - -A top-level task that creates subtasks (tasks with `parent_task_id` pointing back to it) -transitions to `BLOCKED` — not `READY` — when its own runner exits successfully. This allows -an agent to dispatch subtasks and then wait for them. - -``` -Parent task runner exits 0 - ├── no subtasks → READY (normal review gate) - └── has subtasks → BLOCKED (waiting for subtasks) - │ - Each subtask completes → COMPLETED - │ - All subtasks COMPLETED? - ├── yes → maybeUnblockParent() → parent READY - └── no → parent stays BLOCKED -``` - -`maybeUnblockParent(parentID)` is called every time a subtask transitions to `COMPLETED`. It -loads all subtasks and checks whether every one is `COMPLETED`. If so, it calls -`UpdateTaskState(parentID, StateReady)`. (`internal/executor/executor.go:616`) - -The `BLOCKED` state also covers the interactive question flow: when `ClaudeRunner.Run` detects -a `question.json` file in the execution log directory, it returns a `*BlockedError` containing -the question JSON and session ID. The pool stores the question in the `tasks.question_json` -column and the session ID in the `executions.session_id` column. `POST /api/tasks/{id}/answer` -resumes the task by calling `pool.SubmitResume` with a new `Execution` carrying -`ResumeSessionID` and `ResumeAnswer`. (`internal/executor/claude.go:103`, -`internal/api/server.go:221`) - ---- - -## 9. External Go Dependencies - -| Module | Version | Purpose | -|---|---|---| -| `github.com/mattn/go-sqlite3` | v1.14.33 | CGo SQLite driver (requires C compiler) | -| `github.com/google/uuid` | v1.6.0 | UUID generation for task and execution IDs | -| `github.com/spf13/cobra` | v1.10.2 | CLI command framework | -| `github.com/BurntSushi/toml` | v1.6.0 | TOML config file parsing | -| `golang.org/x/net` | v0.49.0 | `golang.org/x/net/websocket` — WebSocket server | -| `gopkg.in/yaml.v3` | v3.0.1 | YAML task definition parsing | - ---- - -## 10. Related Documentation - -### Architectural Decision Records - -| ADR | Title | Summary | -|---|---|---| -| [ADR-001](adr/001-language-and-architecture.md) | Go + SQLite + WebSocket Architecture | Language choice, pipeline design, storage and API rationale | -| [ADR-002](adr/002-task-state-machine.md) | Task State Machine Design | All 10 states, transition table, side effects, known edge cases | -| [ADR-003](adr/003-security-model.md) | Security Model | Trust boundary, no-auth posture, known risks, hardening checklist | -| [ADR-004](adr/004-multi-agent-routing-and-classification.md) | Multi-Agent Routing | `pickAgent` load balancing, Gemini-based model classifier | -| [ADR-005](adr/005-sandbox-execution-model.md) | Git Sandbox Execution Model | Isolated git clone per task, push-back flow, BLOCKED preservation | - -### Package-Level Docs - -Per-package design notes live in [`docs/packages/`](packages/) (in progress). - -### Task YAML Reference - -```yaml -name: "My Task" -agent: - type: "claude" # "claude" | "gemini"; optional — load balancer may override - model: "sonnet" # model tier hint; optional — classifier may override - instructions: | - Do something useful. - project_dir: "/workspace/myproject" # if set, runs in a git sandbox - max_budget_usd: 1.00 - permission_mode: "bypassPermissions" # default - allowed_tools: ["Bash", "Read"] - context_files: ["README.md"] -timeout: "15m" -priority: "normal" # high | normal | low -tags: ["ci", "backend"] -depends_on: ["<task-uuid>"] # explicit ordering -parent_task_id: "<task-uuid>" # set by parent agent when creating subtasks -``` - -Batch files wrap multiple tasks under a `tasks:` key and are accepted by `claudomator run`. - -### Storage Schema - -Two tables auto-migrated on `storage.Open()`: - -- **`tasks`** — `id`, `name`, `description`, `config_json` (AgentConfig), `priority`, - `timeout_ns`, `retry_json`, `tags_json`, `depends_on_json`, `parent_task_id`, `state`, - `question_json`, `rejection_comment`, `created_at`, `updated_at` -- **`executions`** — `id`, `task_id`, `start_time`, `end_time`, `exit_code`, `status`, - `stdout_path`, `stderr_path`, `artifact_dir`, `cost_usd`, `error_msg`, `session_id`, - `resume_session_id`, `resume_answer` - -Indexed columns: `tasks.state`, `tasks.parent_task_id`, `executions.task_id`, -`executions.status`, `executions.start_time`. diff --git a/docs/superpowers/plans/2026-04-03-task-project-fk.md b/docs/superpowers/plans/2026-04-03-task-project-fk.md new file mode 100644 index 0000000..36e6a18 --- /dev/null +++ b/docs/superpowers/plans/2026-04-03-task-project-fk.md @@ -0,0 +1,837 @@ +# Task → Project FK Migration Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Replace the loose `project TEXT` and `repository_url TEXT` fields on Task with a proper `project_id TEXT` FK to the projects table, and remove the runtime-populated `BranchName` field (it belongs on Story). + +**Architecture:** Add `project_id` column to tasks table via additive migration; backfill from project name lookup via SQL; update `scanTask` to LEFT JOIN projects and resolve `repository_url` as `COALESCE(p.remote_url, t.repository_url)` so orphan (webhook) tasks still work; strip `Project`, `BranchName` from the Task struct and the three ADR-007 runtime patches that compensated for the old design. + +**Tech Stack:** Go, SQLite (database/sql + mattn/go-sqlite3), standard library only. + +--- + +## File Map + +| File | Change | +|------|--------| +| `internal/task/task.go` | Remove `Project`, `RepositoryURL`, `BranchName` fields; add `ProjectID` | +| `internal/task/validator.go` | Drop `repository_url` required; require at least one of `project_id` / `repository_url` | +| `internal/task/validator_test.go` | Update `validTask()` helper | +| `internal/task/task_test.go` | `Project` → `ProjectID` references | +| `internal/storage/db.go` | Add migration + backfill; define `taskSelectSQL` helper with LEFT JOIN; update `scanTask`, `CreateTask`, `UpdateTask`; add `GetProjectByName` | +| `internal/storage/db_test.go` | Update project field references in task tests | +| `internal/executor/executor.go` | Remove four ADR-007 patches (RepositoryURL ×2, BranchName ×2) | +| `internal/executor/container.go` | Remove `t.BranchName` fallback; always resolve from story | +| `internal/executor/container_test.go` | Remove `BranchName` from direct task construction; test via story instead | +| `internal/executor/executor_test.go` | Minor: `Project` → `ProjectID` in any task literals | +| `internal/api/server.go` | `handleCreateTask`: accept `project_id`; drop `project`/`repository_url` input fields | +| `internal/api/webhook.go` | `createCIFailureTask`: use `GetProjectByName` to set `project_id`; keep `repository_url` fallback when no DB project matches | +| `internal/api/task_view.go` | No change — `RepositoryURL` still populated by JOIN | +| `internal/api/server_test.go` | Replace `repository_url`/`project` in JSON payloads with `project_id` | +| `internal/api/webhook_test.go` | Seed DB project before assertions; check `ProjectID` not `RepositoryURL` directly | +| `internal/cli/list.go` | `t.Project` → `t.ProjectID` | +| `internal/cli/status.go` | `t.Project` → `t.ProjectID` | + +--- + +## Task 1: Update Task struct and validator + +**Files:** +- Modify: `internal/task/task.go` +- Modify: `internal/task/validator.go` + +- [ ] **Step 1: Write failing validator tests** + +In `internal/task/validator_test.go`, replace the existing `validTask()` helper and add a new failing test: + +```go +func validTask() *Task { + return &Task{ + ID: "test-id", + Name: "Valid Task", + ProjectID: "proj-1", + Agent: AgentConfig{ + Type: "claude", + Instructions: "do something", + }, + Retry: RetryConfig{MaxAttempts: 1, Backoff: "linear"}, + Priority: PriorityNormal, + } +} + +func TestValidate_MissingProjectIDAndRepositoryURL(t *testing.T) { + tk := validTask() + tk.ProjectID = "" + if err := Validate(tk); err == nil { + t.Error("expected error for missing project_id and repository_url") + } +} + +func TestValidate_RepositoryURLAloneIsValid(t *testing.T) { + tk := validTask() + tk.ProjectID = "" + tk.RepositoryURL = "https://github.com/owner/repo.git" + if err := Validate(tk); err != nil { + t.Errorf("expected no error with repository_url set, got: %v", err) + } +} +``` + +- [ ] **Step 2: Run to confirm failure** + +``` +go test ./internal/task/... 2>&1 | grep -E "FAIL|PASS|error" +``` +Expected: compile error — `ProjectID` undefined on Task. + +- [ ] **Step 3: Update `internal/task/task.go`** + +Replace lines 76-85: +```go +// Before: +Project string `yaml:"project" json:"project"` +RepositoryURL string `yaml:"repository_url" json:"repository_url"` +// ... +BranchName string `yaml:"-" json:"branch_name,omitempty"` + +// After: +ProjectID string `yaml:"project_id" json:"project_id"` +RepositoryURL string `yaml:"-" json:"repository_url,omitempty"` // derived at load time; not stored +``` + +Full replacement for lines 71-94: +```go +type Task struct { + ID string `yaml:"id" json:"id"` + ParentTaskID string `yaml:"parent_task_id" json:"parent_task_id"` + Name string `yaml:"name" json:"name"` + Description string `yaml:"description" json:"description"` + ProjectID string `yaml:"project_id" json:"project_id"` + RepositoryURL string `yaml:"-" json:"repository_url,omitempty"` + Agent AgentConfig `yaml:"agent" json:"agent"` + Timeout Duration `yaml:"timeout" json:"timeout"` + Retry RetryConfig `yaml:"retry" json:"retry"` + Priority Priority `yaml:"priority" json:"priority"` + Tags []string `yaml:"tags" json:"tags"` + DependsOn []string `yaml:"depends_on" json:"depends_on"` + StoryID string `yaml:"-" json:"story_id,omitempty"` + State State `yaml:"-" json:"state"` + RejectionComment string `yaml:"-" json:"rejection_comment,omitempty"` + QuestionJSON string `yaml:"-" json:"question,omitempty"` + ElaborationInput string `yaml:"-" json:"elaboration_input,omitempty"` + Summary string `yaml:"-" json:"summary,omitempty"` + Interactions []Interaction `yaml:"-" json:"interactions,omitempty"` + CreatedAt time.Time `yaml:"-" json:"created_at"` + UpdatedAt time.Time `yaml:"-" json:"updated_at"` +} +``` + +- [ ] **Step 4: Update `internal/task/validator.go`** + +Replace the `repository_url` check (lines 32-34) with: +```go +if t.ProjectID == "" && t.RepositoryURL == "" { + ve.Add("project_id or repository_url is required") +} +``` + +- [ ] **Step 5: Fix `internal/task/task_test.go`** + +Find lines that reference `task.Project` and replace with `task.ProjectID`: +```go +// Line ~106: +task := Task{ProjectID: "my-project"} +if task.ProjectID != "my-project" { + t.Errorf("expected ProjectID 'my-project', got %q", task.ProjectID) +} +// Line ~126: +if tasks[0].ProjectID != "my-project" { + t.Errorf("expected ProjectID 'my-project', got %q", tasks[0].ProjectID) +} +``` + +- [ ] **Step 6: Run tests** + +``` +go test ./internal/task/... 2>&1 +``` +Expected: all pass. + +- [ ] **Step 7: Commit** + +```bash +git add internal/task/task.go internal/task/validator.go internal/task/validator_test.go internal/task/task_test.go +git commit -m "refactor: replace Task.Project+RepositoryURL+BranchName with ProjectID FK" +``` + +--- + +## Task 2: Storage — migration, LEFT JOIN queries, scanTask + +**Files:** +- Modify: `internal/storage/db.go` + +- [ ] **Step 1: Write failing storage test** + +In `internal/storage/db_test.go`, find the test around line 1017 (`TestTask_ProjectRoundTrip` or similar) and update it to use `ProjectID`: + +```go +func TestTask_ProjectIDRoundTrip(t *testing.T) { + db := testDB(t) + + // Seed a project so the FK resolves. + proj := &task.Project{ + ID: "proj-rt", Name: "roundtrip-proj", + RemoteURL: "https://github.com/owner/rt.git", + Type: "web", + } + if err := db.UpsertProject(proj); err != nil { + t.Fatalf("UpsertProject: %v", err) + } + + now := time.Now().UTC().Truncate(time.Second) + tk := &task.Task{ + ID: "proj-task-1", + Name: "Task with project", + ProjectID: "proj-rt", + Agent: task.AgentConfig{Type: "claude", Instructions: "do x"}, + Priority: task.PriorityNormal, + Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"}, + Tags: []string{}, + DependsOn: []string{}, + State: task.StatePending, + CreatedAt: now, + UpdatedAt: now, + } + if err := db.CreateTask(tk); err != nil { + t.Fatalf("CreateTask: %v", err) + } + + got, err := db.GetTask("proj-task-1") + if err != nil { + t.Fatalf("GetTask: %v", err) + } + if got.ProjectID != "proj-rt" { + t.Errorf("ProjectID: want %q, got %q", "proj-rt", got.ProjectID) + } + // RepositoryURL should be derived from the project via JOIN. + if got.RepositoryURL != "https://github.com/owner/rt.git" { + t.Errorf("RepositoryURL: want derived from project, got %q", got.RepositoryURL) + } +} +``` + +- [ ] **Step 2: Run to confirm failure** + +``` +go test ./internal/storage/... 2>&1 | grep -E "FAIL|error|undefined" +``` +Expected: compile errors — `ProjectID` undefined on task struct (old DB code still references `Project`). + +- [ ] **Step 3: Add `project_id` migration and backfill to `db.go`** + +In `migrate()`, append to the `migrations` slice (after the existing `story_id` migration): +```go +`ALTER TABLE tasks ADD COLUMN project_id TEXT`, +// Backfill project_id from the project name stored in the legacy 'project' column. +`UPDATE tasks SET project_id = (SELECT id FROM projects WHERE name = tasks.project) WHERE project IS NOT NULL AND project != '' AND project_id IS NULL`, +``` + +- [ ] **Step 4: Add `taskSelectSQL` constant and update `scanTask`** + +Add a package-level constant just before `scanTask`: +```go +// taskSelectSQL is the base SELECT for all task queries. +// It LEFT JOINs projects so that RepositoryURL is always resolved: +// project-linked tasks use p.remote_url; orphan tasks fall back to t.repository_url. +const taskSelectSQL = ` +SELECT t.id, t.name, t.description, t.elaboration_input, t.project_id, + COALESCE(p.remote_url, t.repository_url, '') AS resolved_url, + t.config_json, t.priority, t.timeout_ns, t.retry_json, t.tags_json, + t.depends_on_json, t.parent_task_id, t.state, t.created_at, t.updated_at, + t.rejection_comment, t.question_json, t.summary, t.interactions_json, t.story_id +FROM tasks t +LEFT JOIN projects p ON p.id = t.project_id` +``` + +Update `scanTask` to match the new column order (21 columns): +```go +func scanTask(row scanner) (*task.Task, error) { + var ( + t task.Task + configJSON string + retryJSON string + tagsJSON string + depsJSON string + state string + priority string + timeoutNS int64 + projectID sql.NullString + resolvedURL sql.NullString + parentTaskID sql.NullString + elaborationInput sql.NullString + rejectionComment sql.NullString + questionJSON sql.NullString + summary sql.NullString + interactionsJSON sql.NullString + storyID sql.NullString + ) + err := row.Scan( + &t.ID, &t.Name, &t.Description, &elaborationInput, &projectID, &resolvedURL, + &configJSON, &priority, &timeoutNS, &retryJSON, &tagsJSON, &depsJSON, + &parentTaskID, &state, &t.CreatedAt, &t.UpdatedAt, + &rejectionComment, &questionJSON, &summary, &interactionsJSON, &storyID, + ) + t.ProjectID = projectID.String + t.RepositoryURL = resolvedURL.String + t.ElaborationInput = elaborationInput.String + t.ParentTaskID = parentTaskID.String + t.RejectionComment = rejectionComment.String + t.QuestionJSON = questionJSON.String + t.Summary = summary.String + t.StoryID = storyID.String + if err != nil { + return nil, err + } + t.State = task.State(state) + t.Priority = task.Priority(priority) + t.Timeout.Duration = time.Duration(timeoutNS) + if err := json.Unmarshal([]byte(configJSON), &t.Agent); err != nil { + return nil, fmt.Errorf("unmarshaling agent config: %w", err) + } + if err := json.Unmarshal([]byte(retryJSON), &t.Retry); err != nil { + return nil, fmt.Errorf("unmarshaling retry: %w", err) + } + if err := json.Unmarshal([]byte(tagsJSON), &t.Tags); err != nil { + return nil, fmt.Errorf("unmarshaling tags: %w", err) + } + if err := json.Unmarshal([]byte(depsJSON), &t.DependsOn); err != nil { + return nil, fmt.Errorf("unmarshaling depends_on: %w", err) + } + raw := interactionsJSON.String + if raw == "" { + raw = "[]" + } + if err := json.Unmarshal([]byte(raw), &t.Interactions); err != nil { + return nil, fmt.Errorf("unmarshaling interactions: %w", err) + } + return &t, nil +} +``` + +- [ ] **Step 5: Update all SELECT call sites to use `taskSelectSQL`** + +Replace every hard-coded SELECT string that queries tasks with `taskSelectSQL + " WHERE ..."`. + +**`GetTask`:** +```go +func (s *DB) GetTask(id string) (*task.Task, error) { + row := s.db.QueryRow(taskSelectSQL+` WHERE t.id = ?`, id) + return scanTask(row) +} +``` + +**`ListTasks`:** +```go +func (s *DB) ListTasks(filter TaskFilter) ([]*task.Task, error) { + query := taskSelectSQL + ` WHERE 1=1` + var args []interface{} + if filter.State != "" { + query += " AND t.state = ?" + args = append(args, string(filter.State)) + } + if !filter.Since.IsZero() { + query += " AND t.updated_at > ?" + args = append(args, filter.Since.UTC()) + } + query += " ORDER BY t.created_at DESC" + if filter.Limit > 0 { + query += " LIMIT ?" + args = append(args, filter.Limit) + } + rows, err := s.db.Query(query, args...) + // ... rest unchanged +``` + +**`ListSubtasks`:** +```go +func (s *DB) ListSubtasks(parentID string) ([]*task.Task, error) { + rows, err := s.db.Query(taskSelectSQL+` WHERE t.parent_task_id = ? ORDER BY t.created_at ASC`, parentID) + // ... rest unchanged +``` + +**`ResetTaskForRetry`** (inner SELECT): +```go +t, err := scanTask(tx.QueryRow(taskSelectSQL+` WHERE t.id = ?`, id)) +``` + +**`ListTasksByStory`:** +```go +rows, err := s.db.Query(taskSelectSQL+` WHERE t.story_id = ? ORDER BY t.created_at ASC`, storyID) +``` + +- [ ] **Step 6: Update `CreateTask` to write `project_id` instead of `project`/`repository_url`** + +Replace the INSERT: +```go +_, err = s.db.Exec(` + INSERT INTO tasks (id, name, description, elaboration_input, project_id, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, story_id) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`, + t.ID, t.Name, t.Description, t.ElaborationInput, t.ProjectID, t.RepositoryURL, string(configJSON), string(t.Priority), + t.Timeout.Duration.Nanoseconds(), string(retryJSON), string(tagsJSON), string(depsJSON), + t.ParentTaskID, string(t.State), t.CreatedAt.UTC(), t.UpdatedAt.UTC(), t.StoryID, +) +``` + +Note: `repository_url` is kept as a fallback column for orphan (webhook) tasks with no project. + +- [ ] **Step 7: Update `TaskUpdate` struct and `UpdateTask`** + +Replace `RepositoryURL string` with `ProjectID string` in `TaskUpdate`: +```go +type TaskUpdate struct { + Name string + Description string + ProjectID string + Config task.AgentConfig + Priority task.Priority + TimeoutNS int64 + Retry task.RetryConfig + Tags []string + DependsOn []string +} +``` + +Update the UPDATE query in `UpdateTask`: +```go +result, err := s.db.Exec(` + UPDATE tasks + SET name = ?, description = ?, project_id = ?, config_json = ?, priority = ?, timeout_ns = ?, + retry_json = ?, tags_json = ?, depends_on_json = ?, state = ?, updated_at = ? + WHERE id = ?`, + u.Name, u.Description, u.ProjectID, configJSON, string(u.Priority), u.TimeoutNS, + retryJSON, tagsJSON, depsJSON, string(task.StatePending), now, id) +``` + +- [ ] **Step 8: Add `GetProjectByName`** + +```go +// GetProjectByName retrieves a project by its name field. +func (s *DB) GetProjectByName(name string) (*task.Project, error) { + row := s.db.QueryRow(`SELECT id, name, remote_url, local_path, type, deploy_script FROM projects WHERE name = ?`, name) + p := &task.Project{} + if err := row.Scan(&p.ID, &p.Name, &p.RemoteURL, &p.LocalPath, &p.Type, &p.DeployScript); err != nil { + return nil, err + } + return p, nil +} +``` + +- [ ] **Step 9: Fix the old `TestTask_ProjectRoundTrip` test in `db_test.go`** + +Find and replace the test that asserts `got.Project != "my-project"` (around line 1017). Replace the entire test with the new `TestTask_ProjectIDRoundTrip` written in Step 1 above. Delete the old test. + +Also find and replace the `TestUpdateTask` test's `RepositoryURL` field with `ProjectID`: +```go +// In the update test, change: +u := TaskUpdate{ + Name: "Updated", + ProjectID: "proj-upd", + // ... +} +// And assert: +if got.ProjectID != "proj-upd" { ... } +``` + +- [ ] **Step 10: Run storage tests** + +``` +go test -count=1 ./internal/storage/... 2>&1 +``` +Expected: all pass. + +- [ ] **Step 11: Commit** + +```bash +git add internal/storage/db.go internal/storage/db_test.go +git commit -m "feat: add project_id FK to tasks; LEFT JOIN resolves repository_url at read time" +``` + +--- + +## Task 3: Remove ADR-007 patches from executor + +**Files:** +- Modify: `internal/executor/executor.go` + +- [ ] **Step 1: Remove RepositoryURL patches** + +Delete these two blocks (they appear in both `execute` and `executeResume`): +```go +// DELETE this block (appears twice): +// Populate RepositoryURL from Project registry if missing (ADR-007). +if t.RepositoryURL == "" && t.Project != "" { + if proj, err := p.store.GetProject(t.Project); err == nil && proj.RemoteURL != "" { + t.RepositoryURL = proj.RemoteURL + } +} +``` + +- [ ] **Step 2: Remove BranchName patches** + +Delete these two blocks (also appear twice): +```go +// DELETE this block (appears twice): +// Populate BranchName from Story if missing (ADR-007). +if t.BranchName == "" && t.StoryID != "" { + if story, err := p.store.GetStory(t.StoryID); err == nil && story.BranchName != "" { + t.BranchName = story.BranchName + } +} +``` + +- [ ] **Step 3: Build check** + +``` +go build ./internal/executor/... 2>&1 +``` +Expected: clean build. + +- [ ] **Step 4: Run executor tests** + +``` +go test -count=1 ./internal/executor/... 2>&1 | tail -5 +``` +Expected: all pass. + +- [ ] **Step 5: Commit** + +```bash +git add internal/executor/executor.go +git commit -m "refactor: remove ADR-007 runtime patches for RepositoryURL and BranchName" +``` + +--- + +## Task 4: Clean up container runner + +**Files:** +- Modify: `internal/executor/container.go` +- Modify: `internal/executor/container_test.go` + +- [ ] **Step 1: Remove `t.BranchName` fallback in `container.go`** + +Find lines 154-157: +```go +// Fall back to task-level BranchName (e.g. set explicitly by executor or tests). +if storyBranch == "" { + storyBranch = t.BranchName +} +``` +Delete these four lines. The container already resolves the branch from the story via the store lookup at lines 144-153; the fallback only existed for the (now-removed) `BranchName` field. + +- [ ] **Step 2: Update container tests that set `BranchName` directly** + +In `container_test.go`, find `TestContainerRunner_ClonesStoryBranch` (around line 544). It currently sets `BranchName: "story/my-feature"` directly on the task. Replace with a story + store setup: + +```go +func TestContainerRunner_ClonesStoryBranch(t *testing.T) { + // ... existing setup ... + store := testContainerStore(t) // use the real test store + story := &task.Story{ + ID: "story-branch-1", + Name: "Branch Test", + BranchName: "story/my-feature", + ProjectID: "", + Status: task.StoryInProgress, + } + if err := store.CreateStory(story); err != nil { + t.Fatalf("CreateStory: %v", err) + } + + tk := &task.Task{ + ID: "story-branch-test", + RepositoryURL: "https://example.com/repo.git", + StoryID: "story-branch-1", + Agent: task.AgentConfig{Type: "claude"}, + } + // ... rest of assertions unchanged +``` + +Note: check whether `container_test.go` has a `testContainerStore` helper or uses a different pattern; adapt accordingly. The store reference in `ContainerRunner` is `r.Store` — ensure the runner is initialized with the test store. + +- [ ] **Step 3: Run container tests** + +``` +go test -count=1 -run TestContainerRunner ./internal/executor/... 2>&1 +``` +Expected: all pass. + +- [ ] **Step 4: Commit** + +```bash +git add internal/executor/container.go internal/executor/container_test.go +git commit -m "refactor: remove Task.BranchName fallback in container runner; always resolve from story" +``` + +--- + +## Task 5: Update API handlers + +**Files:** +- Modify: `internal/api/server.go` +- Modify: `internal/api/server_test.go` + +- [ ] **Step 1: Update `handleCreateTask` in `server.go`** + +Replace the input struct and task construction (lines ~447-488): + +```go +func (s *Server) handleCreateTask(w http.ResponseWriter, r *http.Request) { + var input struct { + Name string `json:"name"` + Description string `json:"description"` + ElaborationInput string `json:"elaboration_input"` + ProjectID string `json:"project_id"` + Agent task.AgentConfig `json:"agent"` + Claude task.AgentConfig `json:"claude"` // legacy alias + Timeout string `json:"timeout"` + Priority string `json:"priority"` + Tags []string `json:"tags"` + ParentTaskID string `json:"parent_task_id"` + } + // ... decode unchanged ... + + t := &task.Task{ + ID: uuid.New().String(), + Name: input.Name, + Description: input.Description, + ElaborationInput: input.ElaborationInput, + ProjectID: input.ProjectID, + Agent: input.Agent, + Priority: task.Priority(input.Priority), + Tags: input.Tags, + DependsOn: []string{}, + Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "exponential"}, + State: task.StatePending, + CreatedAt: now, + UpdatedAt: now, + ParentTaskID: input.ParentTaskID, + } + // ... rest unchanged +``` + +- [ ] **Step 2: Update `server_test.go` task payloads** + +Replace every `"repository_url": "https://github.com/user/repo"` in JSON payloads with `"project_id": "test-proj"`. + +First seed a project in `testServer` or `testServerWithRunner` so the FK is valid and `GetTask` returns `RepositoryURL` via JOIN. Add to `testServerWithRunner`: + +```go +// Seed a default project for tests that don't need a specific one. +defaultProj := &task.Project{ + ID: "test-proj", Name: "test-project", + RemoteURL: "https://github.com/user/repo", + Type: "web", +} +if err := store.UpsertProject(defaultProj); err != nil { + t.Fatalf("seed test project: %v", err) +} +``` + +Then in each test that previously sent `"repository_url": "..."`, use `"project_id": "test-proj"` instead. + +Update assertions that checked `created.Project != "test-project"` to check `created.ProjectID != "test-proj"`. + +- [ ] **Step 3: Run API tests** + +``` +go test -count=1 ./internal/api/... 2>&1 | grep -E "FAIL|ok" +``` +Expected: all pass. + +- [ ] **Step 4: Commit** + +```bash +git add internal/api/server.go internal/api/server_test.go +git commit -m "feat: handleCreateTask accepts project_id; drop project/repository_url input fields" +``` + +--- + +## Task 6: Update webhook handler + +**Files:** +- Modify: `internal/api/webhook.go` +- Modify: `internal/api/webhook_test.go` + +- [ ] **Step 1: Add `GetProjectByName` to the `Store` interface in `server.go`** + +Find the `store` interface (or wherever `GetProject` is declared in the api package). Add: +```go +GetProjectByName(name string) (*task.Project, error) +``` + +- [ ] **Step 2: Update `createCIFailureTask` in `webhook.go`** + +Replace lines 203-224: +```go +now := time.Now().UTC() +t := &task.Task{ + ID: uuid.New().String(), + Name: fmt.Sprintf("Fix CI failure: %s on %s", checkName, branch), + Agent: task.AgentConfig{ + Type: "claude", + Model: "sonnet", + Instructions: instructions, + MaxBudgetUSD: 3.0, + AllowedTools: []string{"Read", "Edit", "Bash", "Glob", "Grep"}, + }, + Priority: task.PriorityNormal, + Tags: []string{"ci", "auto"}, + DependsOn: []string{}, + Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "exponential"}, + State: task.StatePending, + CreatedAt: now, + UpdatedAt: now, +} + +// Resolve project_id from DB using the config project name. +// If no DB project matches, fall back to storing the repo URL directly. +if project != nil { + if dbProj, err := s.store.GetProjectByName(project.Name); err == nil { + t.ProjectID = dbProj.ID + } else { + // No DB project yet — store URL directly so the executor can clone. + t.RepositoryURL = fmt.Sprintf("https://github.com/%s.git", fullName) + } +} else { + t.RepositoryURL = fmt.Sprintf("https://github.com/%s.git", fullName) +} +``` + +- [ ] **Step 3: Update webhook tests** + +In `webhook_test.go`, seed a DB project before the assertions that check `tk.RepositoryURL`. After the migration, `RepositoryURL` is derived from the project, so seed the project: + +```go +// In TestHandleCheckRun_CreatesTask (and similar tests): +store.UpsertProject(&task.Project{ + ID: "myrepo-proj", Name: "myrepo", + RemoteURL: "https://github.com/owner/myrepo.git", + Type: "web", +}) +// ... +// Check ProjectID instead of RepositoryURL directly: +if tk.ProjectID != "myrepo-proj" { + t.Errorf("task project_id = %q, want myrepo-proj", tk.ProjectID) +} +// RepositoryURL is still populated via JOIN: +if tk.RepositoryURL != "https://github.com/owner/myrepo.git" { + t.Errorf("task repository url = %q, want https://github.com/owner/myrepo.git", tk.RepositoryURL) +} +``` + +For tests where `matchProject` returns nil (no matching project), verify `t.RepositoryURL` is set directly (fallback path). + +- [ ] **Step 4: Run webhook tests** + +``` +go test -count=1 -run TestHandle ./internal/api/... 2>&1 +``` +Expected: all pass. + +- [ ] **Step 5: Commit** + +```bash +git add internal/api/webhook.go internal/api/webhook_test.go internal/api/server.go +git commit -m "feat: webhook sets project_id via DB lookup; falls back to repository_url for unknown repos" +``` + +--- + +## Task 7: Update CLI + +**Files:** +- Modify: `internal/cli/list.go` +- Modify: `internal/cli/status.go` + +- [ ] **Step 1: Update `list.go`** + +Find line ~55: `t.ID, t.Name, t.Project, ...` +Replace with: `t.ID, t.Name, t.ProjectID, ...` + +- [ ] **Step 2: Update `status.go`** + +Find lines ~42-44: +```go +if t.Project != "" { + fmt.Printf("Project: %s\n", t.Project) +} +``` +Replace with: +```go +if t.ProjectID != "" { + fmt.Printf("Project: %s\n", t.ProjectID) +} +``` + +- [ ] **Step 3: Build check** + +``` +go build ./... 2>&1 +``` +Expected: clean. + +- [ ] **Step 4: Commit** + +```bash +git add internal/cli/list.go internal/cli/status.go +git commit -m "fix: CLI list/status use ProjectID" +``` + +--- + +## Task 8: Full test suite + production migration verification + +- [ ] **Step 1: Run full test suite with race detector** + +``` +go test -race -count=1 ./... 2>&1 | grep -E "FAIL|ok|panic" +``` +Expected: all packages pass. + +- [ ] **Step 2: Verify production DB migration** + +```bash +sqlite3 /site/doot.terst.org/data/claudomator.db " +SELECT t.id, t.name, t.project_id, COALESCE(p.remote_url, t.repository_url) as url +FROM tasks t LEFT JOIN projects p ON p.id = t.project_id +WHERE t.state NOT IN ('COMPLETED','CANCELLED') +ORDER BY t.created_at DESC LIMIT 10;" +``` +Expected: `project_id` populated for tasks that had a `project` name; URL resolves correctly. + +- [ ] **Step 3: Push and deploy** + +```bash +git push local main +sudo ./scripts/deploy +``` + +--- + +## Self-Review + +**Spec coverage check:** +- ✅ `Task.Project` removed → `ProjectID` added (Tasks 1, 2, 5, 6, 7) +- ✅ `Task.RepositoryURL` no longer stored directly; derived via LEFT JOIN (Task 2) +- ✅ `Task.BranchName` removed; container always resolves from story (Tasks 1, 4) +- ✅ ADR-007 patches removed (Task 3) +- ✅ `GetProjectByName` added to storage (Task 2 Step 8) +- ✅ Webhook falls back gracefully for unknown repos (Task 6) +- ✅ DB backfill migration in place (Task 2 Step 3) + +**Placeholder scan:** No TBDs or incomplete steps found. + +**Type consistency:** `ProjectID` used consistently throughout. `RepositoryURL` remains on Task struct (derived, not stored). `taskSelectSQL` constant referenced uniformly in all SELECT call sites. diff --git a/docs/superpowers/plans/2026-04-04-task-checker-story-ship.md b/docs/superpowers/plans/2026-04-04-task-checker-story-ship.md new file mode 100644 index 0000000..021405f --- /dev/null +++ b/docs/superpowers/plans/2026-04-04-task-checker-story-ship.md @@ -0,0 +1,1226 @@ +# Task Checker Agent and Story Ship Gate — Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add an async per-task checker agent that auto-accepts passing tasks, and replace the auto-deploy story trigger with an explicit human "Ship" action. + +**Architecture:** Checker tasks are regular pool tasks with a new `checker_for_task_id` field; when they complete successfully the pool auto-accepts the linked task. `checkStoryCompletion` still transitions stories to SHIPPABLE but no longer fires the deploy — a new `POST /api/stories/{id}/ship` endpoint and "Ship" button do that instead. Story elaboration is extended to produce `acceptance_criteria` per task. + +**Tech Stack:** Go 1.25, SQLite (database/sql + go-sqlite3), vanilla JS (no framework) + +--- + +## File Map + +| File | Change | +|---|---| +| `internal/task/task.go` | Add `AcceptanceCriteria`, `CheckerForTaskID`, `CheckerReport` fields to `Task` | +| `internal/storage/db.go` | 3 migrations; extend `CreateTask`, `scanTask`, all SELECT queries; add `UpdateTaskCheckerReport`, `GetCheckerTask` | +| `internal/executor/executor.go` | Add 2 methods to `Store` interface; add `spawnCheckerTask`; modify `handleRunResult`; guard `checkStoryCompletion`; remove auto-deploy; add `ShipStory` | +| `internal/api/server.go` | Register `POST /api/stories/{id}/ship` | +| `internal/api/stories.go` | Add `handleShipStory`; pass `AcceptanceCriteria` in `handleApproveStory` | +| `internal/api/elaborate.go` | Add `AcceptanceCriteria` to `elaboratedStoryTask`; update `buildStoryElaboratePrompt` | +| `web/app.js` | Ship button on SHIPPABLE story cards; checker report on READY task cards | + +--- + +## Task 1: Task struct — three new fields + +**Files:** +- Modify: `internal/task/task.go` + +- [ ] **Step 1: Add fields to Task struct** + +In `internal/task/task.go`, add three fields after `StoryID`: + +```go +type Task struct { + ID string `yaml:"id" json:"id"` + ParentTaskID string `yaml:"parent_task_id" json:"parent_task_id"` + Name string `yaml:"name" json:"name"` + Description string `yaml:"description" json:"description"` + Project string `yaml:"project" json:"project"` + RepositoryURL string `yaml:"repository_url" json:"repository_url"` + Agent AgentConfig `yaml:"agent" json:"agent"` + Timeout Duration `yaml:"timeout" json:"timeout"` + Retry RetryConfig `yaml:"retry" json:"retry"` + Priority Priority `yaml:"priority" json:"priority"` + Tags []string `yaml:"tags" json:"tags"` + DependsOn []string `yaml:"depends_on" json:"depends_on"` + StoryID string `yaml:"-" json:"story_id,omitempty"` + BranchName string `yaml:"-" json:"branch_name,omitempty"` + AcceptanceCriteria string `yaml:"-" json:"acceptance_criteria,omitempty"` + CheckerForTaskID string `yaml:"-" json:"checker_for_task_id,omitempty"` + CheckerReport string `yaml:"-" json:"checker_report,omitempty"` + State State `yaml:"-" json:"state"` + RejectionComment string `yaml:"-" json:"rejection_comment,omitempty"` + QuestionJSON string `yaml:"-" json:"question,omitempty"` + ElaborationInput string `yaml:"-" json:"elaboration_input,omitempty"` + Summary string `yaml:"-" json:"summary,omitempty"` + Interactions []Interaction `yaml:"-" json:"interactions,omitempty"` + CreatedAt time.Time `yaml:"-" json:"created_at"` + UpdatedAt time.Time `yaml:"-" json:"updated_at"` +} +``` + +- [ ] **Step 2: Build to verify no compilation errors** + +```bash +cd /workspace/claudomator && go build ./... +``` + +Expected: no output (success). + +- [ ] **Step 3: Commit** + +```bash +git add internal/task/task.go +git commit -m "feat: add AcceptanceCriteria, CheckerForTaskID, CheckerReport to Task struct" +``` + +--- + +## Task 2: Storage — migrations, queries, two new methods + +**Files:** +- Modify: `internal/storage/db.go` +- Test: `internal/storage/db_test.go` + +- [ ] **Step 1: Write failing tests for the two new storage methods** + +Find the existing test file and add at the end: + +```go +func TestUpdateTaskCheckerReport(t *testing.T) { + db := openTestDB(t) + tk := &task.Task{ + ID: "cr-1", Name: "orig", RepositoryURL: "https://github.com/x/y", + Agent: task.AgentConfig{Type: "claude", Instructions: "x"}, + Priority: task.PriorityNormal, + Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"}, + Tags: []string{}, DependsOn: []string{}, + State: task.StatePending, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(), + } + if err := db.CreateTask(tk); err != nil { + t.Fatalf("CreateTask: %v", err) + } + if err := db.UpdateTaskCheckerReport("cr-1", "Tests failed: missing endpoint"); err != nil { + t.Fatalf("UpdateTaskCheckerReport: %v", err) + } + got, err := db.GetTask("cr-1") + if err != nil { + t.Fatalf("GetTask: %v", err) + } + if got.CheckerReport != "Tests failed: missing endpoint" { + t.Errorf("expected checker report, got %q", got.CheckerReport) + } +} + +func TestGetCheckerTask(t *testing.T) { + db := openTestDB(t) + checked := &task.Task{ + ID: "chk-orig", Name: "orig", RepositoryURL: "https://github.com/x/y", + Agent: task.AgentConfig{Type: "claude", Instructions: "x"}, + Priority: task.PriorityNormal, + Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"}, + Tags: []string{}, DependsOn: []string{}, + State: task.StatePending, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(), + } + if err := db.CreateTask(checked); err != nil { + t.Fatalf("CreateTask checked: %v", err) + } + checker := &task.Task{ + ID: "chk-checker", Name: "Check: orig", CheckerForTaskID: "chk-orig", + RepositoryURL: "https://github.com/x/y", + Agent: task.AgentConfig{Type: "claude", Instructions: "validate"}, + Priority: task.PriorityNormal, + Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"}, + Tags: []string{}, DependsOn: []string{}, + State: task.StatePending, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(), + } + if err := db.CreateTask(checker); err != nil { + t.Fatalf("CreateTask checker: %v", err) + } + + // Should find the checker task. + got, err := db.GetCheckerTask("chk-orig") + if err != nil { + t.Fatalf("GetCheckerTask: %v", err) + } + if got == nil || got.ID != "chk-checker" { + t.Errorf("expected checker task ID chk-checker, got %v", got) + } + + // Should return nil when no checker exists. + none, err := db.GetCheckerTask("nonexistent") + if err != nil { + t.Fatalf("GetCheckerTask nonexistent: %v", err) + } + if none != nil { + t.Errorf("expected nil for task with no checker, got %v", none) + } +} +``` + +- [ ] **Step 2: Run tests to verify they fail** + +```bash +cd /workspace/claudomator && go test ./internal/storage/... -run "TestUpdateTaskCheckerReport|TestGetCheckerTask" -v +``` + +Expected: FAIL — `db.UpdateTaskCheckerReport undefined`, `db.GetCheckerTask undefined`. + +- [ ] **Step 3: Add three migrations to `db.go`** + +In the `migrations` slice in `migrate()`, append after the `ALTER TABLE tasks ADD COLUMN story_id TEXT` entry: + +```go +`ALTER TABLE tasks ADD COLUMN acceptance_criteria TEXT NOT NULL DEFAULT ''`, +`ALTER TABLE tasks ADD COLUMN checker_for_task_id TEXT NOT NULL DEFAULT ''`, +`ALTER TABLE tasks ADD COLUMN checker_report TEXT NOT NULL DEFAULT ''`, +``` + +- [ ] **Step 4: Update `CreateTask` INSERT to include the three new columns** + +Replace the `INSERT INTO tasks` statement in `CreateTask`: + +```go +_, err = s.db.Exec(` + INSERT INTO tasks (id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, story_id, acceptance_criteria, checker_for_task_id, checker_report) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`, + t.ID, t.Name, t.Description, t.ElaborationInput, t.Project, t.RepositoryURL, string(configJSON), string(t.Priority), + t.Timeout.Duration.Nanoseconds(), string(retryJSON), string(tagsJSON), string(depsJSON), + t.ParentTaskID, string(t.State), t.CreatedAt.UTC(), t.UpdatedAt.UTC(), t.StoryID, + t.AcceptanceCriteria, t.CheckerForTaskID, t.CheckerReport, + ) +``` + +- [ ] **Step 5: Update `scanTask` to scan the three new columns** + +`scanTask` currently declares local vars and calls `row.Scan(...)` with 21 positional arguments. Add three new vars and extend the scan. The new `var` block: + +```go +func scanTask(row scanner) (*task.Task, error) { + var ( + t task.Task + configJSON string + retryJSON string + tagsJSON string + depsJSON string + state string + priority string + timeoutNS int64 + parentTaskID sql.NullString + elaborationInput sql.NullString + project sql.NullString + repositoryURL sql.NullString + rejectionComment sql.NullString + questionJSON sql.NullString + summary sql.NullString + interactionsJSON sql.NullString + storyID sql.NullString + acceptanceCriteria sql.NullString + checkerForTaskID sql.NullString + checkerReport sql.NullString + ) + err := row.Scan( + &t.ID, &t.Name, &t.Description, &elaborationInput, &project, &repositoryURL, + &configJSON, &priority, &timeoutNS, &retryJSON, &tagsJSON, &depsJSON, + &parentTaskID, &state, &t.CreatedAt, &t.UpdatedAt, + &rejectionComment, &questionJSON, &summary, &interactionsJSON, &storyID, + &acceptanceCriteria, &checkerForTaskID, &checkerReport, + ) + t.ParentTaskID = parentTaskID.String + t.ElaborationInput = elaborationInput.String + t.Project = project.String + t.RepositoryURL = repositoryURL.String + t.RejectionComment = rejectionComment.String + t.QuestionJSON = questionJSON.String + t.Summary = summary.String + t.StoryID = storyID.String + t.AcceptanceCriteria = acceptanceCriteria.String + t.CheckerForTaskID = checkerForTaskID.String + t.CheckerReport = checkerReport.String + // ... rest of function unchanged +``` + +- [ ] **Step 6: Update all SELECT queries to include the three new columns** + +There are five SELECT statements that need `acceptance_criteria, checker_for_task_id, checker_report` appended to the column list. The pattern to find: every query with `story_id FROM tasks`. Update each one: + +In `GetTask` (line ~185): +```go +row := s.db.QueryRow(`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE id = ?`, id) +``` + +In `ListTasks` (line ~191): +```go +query := `SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE 1=1` +``` + +In `ListSubtasks` (line ~227): +```go +rows, err := s.db.Query(`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE parent_task_id = ? ORDER BY created_at ASC`, parentID) +``` + +In `ResetTaskForRetry` (line ~280): +```go +t, err := scanTask(tx.QueryRow(`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE id = ?`, id)) +``` + +In `ListTasksByStory` (line ~1202): +```go +`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE story_id = ? ORDER BY created_at ASC`, +``` + +- [ ] **Step 7: Add `UpdateTaskCheckerReport`** + +Add after `UpdateTaskSummary`: + +```go +// UpdateTaskCheckerReport sets the checker_report field on a task. +func (s *DB) UpdateTaskCheckerReport(id, report string) error { + now := time.Now().UTC() + _, err := s.db.Exec(`UPDATE tasks SET checker_report = ?, updated_at = ? WHERE id = ?`, report, now, id) + return err +} +``` + +- [ ] **Step 8: Add `GetCheckerTask`** + +Add after `UpdateTaskCheckerReport`: + +```go +// GetCheckerTask returns the checker task for the given checked task ID, +// or nil if no checker task exists. +func (s *DB) GetCheckerTask(checkedTaskID string) (*task.Task, error) { + row := s.db.QueryRow(`SELECT id, name, description, elaboration_input, project, repository_url, config_json, priority, timeout_ns, retry_json, tags_json, depends_on_json, parent_task_id, state, created_at, updated_at, rejection_comment, question_json, summary, interactions_json, story_id, acceptance_criteria, checker_for_task_id, checker_report FROM tasks WHERE checker_for_task_id = ? LIMIT 1`, checkedTaskID) + t, err := scanTask(row) + if err == sql.ErrNoRows { + return nil, nil + } + return t, err +} +``` + +- [ ] **Step 9: Run the failing tests to verify they pass** + +```bash +cd /workspace/claudomator && go test ./internal/storage/... -run "TestUpdateTaskCheckerReport|TestGetCheckerTask" -v +``` + +Expected: PASS. + +- [ ] **Step 10: Run full storage tests** + +```bash +cd /workspace/claudomator && go test ./internal/storage/... -v +``` + +Expected: all PASS. + +- [ ] **Step 11: Commit** + +```bash +git add internal/storage/db.go internal/storage/db_test.go +git commit -m "feat: add checker task columns, UpdateTaskCheckerReport, GetCheckerTask" +``` + +--- + +## Task 3: Executor — checker task spawn and completion handling + +**Files:** +- Modify: `internal/executor/executor.go` +- Modify: `internal/executor/executor_test.go` + +- [ ] **Step 1: Add two methods to executor's `Store` interface** + +In `executor.go`, the `Store` interface (around line 22). Add after `CreateTask`: + +```go +UpdateTaskCheckerReport(id, report string) error +GetCheckerTask(checkedTaskID string) (*task.Task, error) +``` + +- [ ] **Step 2: Write failing tests** + +In `executor_test.go`, add: + +```go +func TestPool_CheckerSpawned_OnReady(t *testing.T) { + store := testStore(t) + runner := &mockRunner{} // succeeds instantly + pool := NewPool(2, map[string]Runner{"claude": runner}, store, slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError}))) + + tk := makeTask("checker-spawn-1") + tk.RepositoryURL = "https://github.com/x/y" + store.CreateTask(tk) + pool.Submit(context.Background(), tk) + <-pool.Results() // wait for original task to finish + + // Give the async spawnCheckerTask goroutine a moment to run. + time.Sleep(200 * time.Millisecond) + + checker, err := store.GetCheckerTask("checker-spawn-1") + if err != nil { + t.Fatalf("GetCheckerTask: %v", err) + } + if checker == nil { + t.Fatal("expected a checker task to be created, got nil") + } + if checker.CheckerForTaskID != "checker-spawn-1" { + t.Errorf("expected CheckerForTaskID=checker-spawn-1, got %q", checker.CheckerForTaskID) + } +} + +func TestPool_CheckerNotSpawned_ForSubtask(t *testing.T) { + store := testStore(t) + runner := &mockRunner{} + pool := NewPool(2, map[string]Runner{"claude": runner}, store, slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError}))) + + parent := makeTask("no-checker-parent") + parent.RepositoryURL = "https://github.com/x/y" + store.CreateTask(parent) + + sub := makeTask("no-checker-sub") + sub.ParentTaskID = "no-checker-parent" + sub.RepositoryURL = "https://github.com/x/y" + store.CreateTask(sub) + + pool.Submit(context.Background(), sub) + <-pool.Results() + + time.Sleep(100 * time.Millisecond) + + checker, err := store.GetCheckerTask("no-checker-sub") + if err != nil { + t.Fatalf("GetCheckerTask: %v", err) + } + if checker != nil { + t.Error("expected no checker for subtask, but one was created") + } +} + +func TestPool_CheckerPass_AutoAcceptsTask(t *testing.T) { + store := testStore(t) + // Two-phase: first runner succeeds (original task), second also succeeds (checker). + callCount := 0 + runner := &mockRunner{ + onRun: func(t *task.Task, e *storage.Execution) error { + callCount++ + return nil // both original and checker succeed + }, + } + pool := NewPool(2, map[string]Runner{"claude": runner}, store, slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError}))) + + tk := makeTask("autoaccept-1") + tk.RepositoryURL = "https://github.com/x/y" + store.CreateTask(tk) + pool.Submit(context.Background(), tk) + <-pool.Results() // original finishes → READY + checker spawned + + // Wait for checker to run and complete. + deadline := time.Now().Add(5 * time.Second) + for time.Now().Before(deadline) { + got, _ := store.GetTask("autoaccept-1") + if got != nil && got.State == task.StateCompleted { + break + } + <-pool.Results() + } + + got, err := store.GetTask("autoaccept-1") + if err != nil { + t.Fatalf("GetTask: %v", err) + } + if got.State != task.StateCompleted { + t.Errorf("expected COMPLETED after checker pass, got %s", got.State) + } +} + +func TestPool_CheckerFail_AttachesReport(t *testing.T) { + store := testStore(t) + callCount := 0 + runner := &mockRunner{ + onRun: func(t *task.Task, e *storage.Execution) error { + callCount++ + if t.CheckerForTaskID != "" { + return fmt.Errorf("test suite failed: 3 failures") + } + return nil // original task succeeds + }, + } + pool := NewPool(2, map[string]Runner{"claude": runner}, store, slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError}))) + + tk := makeTask("fail-checker-1") + tk.RepositoryURL = "https://github.com/x/y" + store.CreateTask(tk) + pool.Submit(context.Background(), tk) + <-pool.Results() // original → READY + + // Wait for checker to fail. + deadline := time.Now().Add(5 * time.Second) + for time.Now().Before(deadline) { + got, _ := store.GetTask("fail-checker-1") + if got != nil && got.CheckerReport != "" { + break + } + select { + case <-pool.Results(): + case <-time.After(100 * time.Millisecond): + } + } + + got, err := store.GetTask("fail-checker-1") + if err != nil { + t.Fatalf("GetTask: %v", err) + } + if got.State != task.StateReady { + t.Errorf("expected task to stay READY after checker fail, got %s", got.State) + } + if got.CheckerReport == "" { + t.Error("expected checker_report to be set after checker failure") + } +} +``` + +- [ ] **Step 3: Run tests to verify they fail** + +```bash +cd /workspace/claudomator && go test ./internal/executor/... -run "TestPool_Checker" -v 2>&1 | head -30 +``` + +Expected: FAIL — `store.UpdateTaskCheckerReport undefined`, `store.GetCheckerTask undefined`, `spawnCheckerTask undefined`. + +- [ ] **Step 4: Add `spawnCheckerTask` to `executor.go`** + +Add this function after `checkStoryCompletion`: + +```go +// spawnCheckerTask creates and submits a checker task for the given completed task. +// Guards: not called for subtasks, checker tasks, or tasks that already have a checker. +func (p *Pool) spawnCheckerTask(ctx context.Context, checked *task.Task) { + // Never spawn a checker for subtasks or checker tasks themselves. + if checked.ParentTaskID != "" || checked.CheckerForTaskID != "" { + return + } + // Idempotent: don't create a second checker if one already exists. + existing, err := p.store.GetCheckerTask(checked.ID) + if err != nil { + p.logger.Error("spawnCheckerTask: GetCheckerTask failed", "taskID", checked.ID, "error", err) + return + } + if existing != nil { + return + } + + criteria := checked.AcceptanceCriteria + if criteria == "" { + criteria = checked.Agent.Instructions + } + + instructions := fmt.Sprintf(`You are validating a completed task. Do not make any changes to the code or repository. + +Task: %s +Instructions given to the implementor: +%s + +Acceptance criteria: +%s + +Steps: +1. Clone the repository and review the changes made. +2. Verify each acceptance criterion is met. Run tests or make HTTP requests as needed. +3. If all criteria are satisfied, exit normally (success). +4. If any criterion is not met, use the Bash tool to exit with a non-zero code: + bash -c "exit 1" + Before exiting, write a brief summary of what failed.`, checked.Name, checked.Agent.Instructions, criteria) + + now := time.Now().UTC() + checker := &task.Task{ + ID: uuid.New().String(), + Name: "Check: " + checked.Name, + CheckerForTaskID: checked.ID, + RepositoryURL: checked.RepositoryURL, + Agent: task.AgentConfig{ + Type: "claude", + Instructions: instructions, + MaxBudgetUSD: 0.50, + AllowedTools: []string{"Bash", "Read", "Glob", "Grep"}, + }, + Timeout: task.Duration{Duration: 10 * time.Minute}, + Priority: task.PriorityNormal, + Tags: []string{}, + DependsOn: []string{}, + Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "linear"}, + State: task.StatePending, + CreatedAt: now, + UpdatedAt: now, + } + + if err := p.store.CreateTask(checker); err != nil { + p.logger.Error("spawnCheckerTask: CreateTask failed", "error", err) + return + } + checker.State = task.StateQueued + if err := p.store.UpdateTaskState(checker.ID, task.StateQueued); err != nil { + p.logger.Error("spawnCheckerTask: UpdateTaskState failed", "error", err) + return + } + if err := p.Submit(ctx, checker); err != nil { + p.logger.Error("spawnCheckerTask: Submit failed", "error", err) + } +} +``` + +- [ ] **Step 5: Modify `handleRunResult` — success path** + +Find the success branch in `handleRunResult` (the `} else {` block after all the error handling). Currently it looks like: + +```go +} else { + p.mu.Lock() + p.consecutiveFailures[agentType] = 0 + p.mu.Unlock() + if t.ParentTaskID == "" { + subtasks, subErr := p.store.ListSubtasks(t.ID) + // ... + if subErr == nil && len(subtasks) > 0 { + exec.Status = "BLOCKED" + if err := p.store.UpdateTaskState(t.ID, task.StateBlocked); err != nil { ... } + } else { + exec.Status = "READY" + if err := p.store.UpdateTaskState(t.ID, task.StateReady); err != nil { ... } + } + } else { + exec.Status = "COMPLETED" + if err := p.store.UpdateTaskState(t.ID, task.StateCompleted); err != nil { ... } + p.maybeUnblockParent(t.ParentTaskID) + } + if t.StoryID != "" { + // ...checkStoryCompletion / checkValidationResult + } +} +``` + +Replace it with: + +```go +} else { + p.mu.Lock() + p.consecutiveFailures[agentType] = 0 + p.mu.Unlock() + if t.CheckerForTaskID != "" { + // Checker task succeeded — auto-accept the checked task. + exec.Status = "COMPLETED" + if err := p.store.UpdateTaskState(t.ID, task.StateCompleted); err != nil { + p.logger.Error("handleRunResult: failed to complete checker task", "taskID", t.ID, "error", err) + } + checkedTask, getErr := p.store.GetTask(t.CheckerForTaskID) + if getErr == nil { + if acceptErr := p.store.UpdateTaskState(t.CheckerForTaskID, task.StateCompleted); acceptErr != nil { + p.logger.Error("handleRunResult: failed to auto-accept checked task", "taskID", t.CheckerForTaskID, "error", acceptErr) + } else if checkedTask.StoryID != "" { + go p.checkStoryCompletion(ctx, checkedTask.StoryID) + } + } else { + p.logger.Error("handleRunResult: failed to get checked task", "taskID", t.CheckerForTaskID, "error", getErr) + } + } else if t.ParentTaskID == "" { + subtasks, subErr := p.store.ListSubtasks(t.ID) + if subErr != nil { + p.logger.Error("failed to list subtasks", "taskID", t.ID, "error", subErr) + } + if subErr == nil && len(subtasks) > 0 { + exec.Status = "BLOCKED" + if err := p.store.UpdateTaskState(t.ID, task.StateBlocked); err != nil { + p.logger.Error("failed to update task state", "taskID", t.ID, "state", task.StateBlocked, "error", err) + } + } else { + exec.Status = "READY" + if err := p.store.UpdateTaskState(t.ID, task.StateReady); err != nil { + p.logger.Error("failed to update task state", "taskID", t.ID, "state", task.StateReady, "error", err) + } + go p.spawnCheckerTask(ctx, t) + } + } else { + exec.Status = "COMPLETED" + if err := p.store.UpdateTaskState(t.ID, task.StateCompleted); err != nil { + p.logger.Error("failed to update task state", "taskID", t.ID, "state", task.StateCompleted, "error", err) + } + p.maybeUnblockParent(t.ParentTaskID) + } + if t.StoryID != "" { + storyID := t.StoryID + go func() { + story, getErr := p.store.GetStory(storyID) + if getErr != nil { + p.logger.Error("handleRunResult: failed to get story", "storyID", storyID, "error", getErr) + return + } + if story.Status == task.StoryValidating { + p.checkValidationResult(ctx, storyID, task.StateCompleted, "") + } else { + p.checkStoryCompletion(ctx, storyID) + } + }() + } +} +``` + +- [ ] **Step 6: Modify `handleRunResult` — failure path, attach checker report** + +In the generic failure `else` case (where `exec.Status = "FAILED"` is set and `consecutiveFailures` is incremented), add after the failures increment: + +```go +p.mu.Lock() +p.consecutiveFailures[agentType]++ +p.mu.Unlock() +// If this is a checker task, attach the failure report to the checked task. +if t.CheckerForTaskID != "" { + report := exec.ErrorMsg + if reportErr := p.store.UpdateTaskCheckerReport(t.CheckerForTaskID, report); reportErr != nil { + p.logger.Error("handleRunResult: failed to set checker report", "taskID", t.CheckerForTaskID, "error", reportErr) + } +} +``` + +Also update the checker report after summary extraction (around the `summary := exec.Summary` block), to prefer summary over error message when available. After the summary is resolved, add: + +```go +if t.CheckerForTaskID != "" && exec.Status == "FAILED" && summary != "" { + // Overwrite the initial error-message report with the richer summary. + if reportErr := p.store.UpdateTaskCheckerReport(t.CheckerForTaskID, summary); reportErr != nil { + p.logger.Error("handleRunResult: failed to update checker report with summary", "taskID", t.CheckerForTaskID, "error", reportErr) + } +} +``` + +- [ ] **Step 7: Run the checker tests** + +```bash +cd /workspace/claudomator && go test ./internal/executor/... -run "TestPool_Checker" -v -timeout 30s +``` + +Expected: all PASS. + +- [ ] **Step 8: Run full executor tests** + +```bash +cd /workspace/claudomator && go test ./internal/executor/... -race -timeout 120s +``` + +Expected: all PASS. + +- [ ] **Step 9: Commit** + +```bash +git add internal/executor/executor.go internal/executor/executor_test.go +git commit -m "feat: spawn checker task on READY; auto-accept on pass; attach report on fail" +``` + +--- + +## Task 4: Story ship gate — remove auto-deploy, add explicit ship endpoint + +**Files:** +- Modify: `internal/executor/executor.go` +- Modify: `internal/api/server.go` +- Modify: `internal/api/stories.go` +- Test: `internal/api/server_test.go` + +- [ ] **Step 1: Write failing test for ship endpoint** + +In `internal/api/server_test.go`, add: + +```go +func TestShipStory_ShippableStory_Returns202(t *testing.T) { + srv, store := testServer(t) + + // Create a project with a deploy script (empty path — deploy will fail but that's OK for this test). + proj := &task.Project{ + ID: "ship-proj-1", Name: "test", RemoteURL: "https://github.com/x/y", + Type: "web", DeployScript: "", + CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(), + } + if err := store.CreateProject(proj); err != nil { + t.Fatalf("CreateProject: %v", err) + } + + story := &task.Story{ + ID: "ship-story-1", Name: "Ship Test", ProjectID: "ship-proj-1", + Status: task.StoryShippable, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(), + } + if err := store.CreateStory(story); err != nil { + t.Fatalf("CreateStory: %v", err) + } + + req := httptest.NewRequest("POST", "/api/stories/ship-story-1/ship", nil) + w := httptest.NewRecorder() + srv.Handler().ServeHTTP(w, req) + + if w.Code != http.StatusAccepted { + t.Errorf("expected 202, got %d: %s", w.Code, w.Body.String()) + } +} + +func TestShipStory_NonShippable_Returns409(t *testing.T) { + srv, store := testServer(t) + + story := &task.Story{ + ID: "nonship-1", Name: "Not Ready", ProjectID: "", + Status: task.StoryInProgress, CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(), + } + if err := store.CreateStory(story); err != nil { + t.Fatalf("CreateStory: %v", err) + } + + req := httptest.NewRequest("POST", "/api/stories/nonship-1/ship", nil) + w := httptest.NewRecorder() + srv.Handler().ServeHTTP(w, req) + + if w.Code != http.StatusConflict { + t.Errorf("expected 409, got %d", w.Code) + } +} +``` + +- [ ] **Step 2: Run tests to verify they fail** + +```bash +cd /workspace/claudomator && go test ./internal/api/... -run "TestShipStory" -v +``` + +Expected: FAIL — `404 page not found` (route doesn't exist yet). + +- [ ] **Step 3: Remove auto-deploy from `checkStoryCompletion` and add status guard** + +In `executor.go`, replace `checkStoryCompletion`: + +```go +func (p *Pool) checkStoryCompletion(ctx context.Context, storyID string) { + story, err := p.store.GetStory(storyID) + if err != nil { + p.logger.Error("checkStoryCompletion: failed to get story", "storyID", storyID, "error", err) + return + } + if story.Status != task.StoryInProgress { + return // already SHIPPABLE or beyond — nothing to do + } + tasks, err := p.store.ListTasksByStory(storyID) + if err != nil { + p.logger.Error("checkStoryCompletion: failed to list tasks", "storyID", storyID, "error", err) + return + } + if len(tasks) == 0 { + return + } + topLevelCount := 0 + for _, t := range tasks { + if t.ParentTaskID != "" { + continue // subtasks are covered by their parent + } + topLevelCount++ + if t.State != task.StateCompleted && t.State != task.StateReady { + return // not all top-level tasks done + } + } + if topLevelCount == 0 { + return + } + if err := p.store.UpdateStoryStatus(storyID, task.StoryShippable); err != nil { + p.logger.Error("checkStoryCompletion: failed to update story status", "storyID", storyID, "error", err) + return + } + p.logger.Info("story transitioned to SHIPPABLE", "storyID", storyID) + // Deploy is now triggered explicitly by the human via POST /api/stories/{id}/ship. +} +``` + +- [ ] **Step 4: Add `ShipStory` to Pool** + +Add after `checkStoryCompletion`: + +```go +// ShipStory merges the story branch and runs the deploy script. +// Returns an error if the story is not in SHIPPABLE state. +func (p *Pool) ShipStory(ctx context.Context, storyID string) error { + story, err := p.store.GetStory(storyID) + if err != nil { + return fmt.Errorf("story not found: %w", err) + } + if story.Status != task.StoryShippable { + return fmt.Errorf("story is not SHIPPABLE (current status: %s)", story.Status) + } + go p.triggerStoryDeploy(ctx, storyID) + return nil +} +``` + +- [ ] **Step 5: Register the route in `server.go`** + +In the `routes()` method, after the existing story routes, add: + +```go +s.mux.HandleFunc("POST /api/stories/{id}/ship", s.handleShipStory) +``` + +- [ ] **Step 6: Add `handleShipStory` to `stories.go`** + +Add at the end of `stories.go`: + +```go +// handleShipStory triggers the merge + deploy for a SHIPPABLE story. +// POST /api/stories/{id}/ship +func (s *Server) handleShipStory(w http.ResponseWriter, r *http.Request) { + id := r.PathValue("id") + if err := s.pool.ShipStory(r.Context(), id); err != nil { + writeJSON(w, http.StatusConflict, map[string]string{"error": err.Error()}) + return + } + writeJSON(w, http.StatusAccepted, map[string]string{"message": "story shipping initiated", "story_id": id}) +} +``` + +- [ ] **Step 7: Run the ship tests** + +```bash +cd /workspace/claudomator && go test ./internal/api/... -run "TestShipStory" -v +``` + +Expected: both PASS. + +- [ ] **Step 8: Run full test suite** + +```bash +cd /workspace/claudomator && go test ./... -race -timeout 120s +``` + +Expected: all PASS. + +- [ ] **Step 9: Commit** + +```bash +git add internal/executor/executor.go internal/api/server.go internal/api/stories.go internal/api/server_test.go +git commit -m "feat: story ship gate — explicit POST /api/stories/{id}/ship; remove auto-deploy" +``` + +--- + +## Task 5: Elaborator — acceptance criteria per story task + +**Files:** +- Modify: `internal/api/elaborate.go` +- Modify: `internal/api/stories.go` +- Test: `internal/api/stories_test.go` + +- [ ] **Step 1: Write failing test** + +In `internal/api/stories_test.go`, find (or add) a test for story approval and verify acceptance criteria flows through: + +```go +func TestApproveStory_AcceptanceCriteriaStored(t *testing.T) { + srv, store := testServer(t) + + proj := &task.Project{ + ID: "ac-proj", Name: "test", RemoteURL: "https://github.com/x/y", + Type: "web", DeployScript: "", + CreatedAt: time.Now().UTC(), UpdatedAt: time.Now().UTC(), + } + store.CreateProject(proj) + + body := `{ + "name": "AC Story", + "branch_name": "story/ac-test", + "project_id": "ac-proj", + "tasks": [ + { + "name": "Add feature", + "instructions": "implement the thing", + "acceptance_criteria": "run go test ./... and verify all pass", + "subtasks": [] + } + ], + "validation": {"type": "test", "steps": [], "success_criteria": "tests pass"} + }` + req := httptest.NewRequest("POST", "/api/stories/approve", strings.NewReader(body)) + req.Header.Set("Content-Type", "application/json") + w := httptest.NewRecorder() + srv.Handler().ServeHTTP(w, req) + + if w.Code != http.StatusCreated { + t.Fatalf("expected 201, got %d: %s", w.Code, w.Body.String()) + } + + var resp struct { + TaskIDs []string `json:"task_ids"` + } + json.NewDecoder(w.Body).Decode(&resp) + if len(resp.TaskIDs) == 0 { + t.Fatal("expected task_ids in response") + } + + tk, err := store.GetTask(resp.TaskIDs[0]) + if err != nil { + t.Fatalf("GetTask: %v", err) + } + if tk.AcceptanceCriteria != "run go test ./... and verify all pass" { + t.Errorf("expected acceptance criteria stored on task, got %q", tk.AcceptanceCriteria) + } +} +``` + +- [ ] **Step 2: Run test to verify it fails** + +```bash +cd /workspace/claudomator && go test ./internal/api/... -run "TestApproveStory_AcceptanceCriteriaStored" -v +``` + +Expected: FAIL — acceptance criteria is empty on the created task. + +- [ ] **Step 3: Add `AcceptanceCriteria` to `elaboratedStoryTask` in `elaborate.go`** + +```go +type elaboratedStoryTask struct { + Name string `json:"name"` + Instructions string `json:"instructions"` + AcceptanceCriteria string `json:"acceptance_criteria"` + Subtasks []elaboratedStorySubtask `json:"subtasks"` +} +``` + +- [ ] **Step 4: Update `buildStoryElaboratePrompt` to request acceptance criteria** + +In `buildStoryElaboratePrompt()`, update the JSON schema in the returned string. Replace the tasks section: + +```go +func buildStoryElaboratePrompt() string { + return `You are a software architect. Given a goal, analyze the codebase at /workspace and produce a structured implementation plan as JSON. + +Output ONLY valid JSON matching this schema: +{ + "name": "story name", + "branch_name": "story/kebab-case-name", + "tasks": [ + { + "name": "task name", + "instructions": "detailed instructions including file paths and what to change", + "acceptance_criteria": "specific, verifiable conditions a separate reviewer can check — e.g. 'run go test ./... and verify all pass; confirm GET /api/foo returns 200 with expected JSON shape'", + "subtasks": [ + { "name": "subtask name", "instructions": "..." } + ] + } + ], + "validation": { + "type": "build|test|smoke", + "steps": ["step1", "step2"], + "success_criteria": "what success looks like" + } +} + +Rules: +- Tasks must be independently buildable (each can be deployed alone) +- Subtasks within a task are order-dependent and run sequentially +- Instructions must include specific file paths, function names, and exact changes +- Instructions must end with: git add -A && git commit -m "..." && git push origin <branch> +- acceptance_criteria must be concrete and verifiable by a separate agent — no vague assertions like "code looks good" +- Validation should match the scope: small change = build check; new feature = smoke test` +} +``` + +- [ ] **Step 5: Pass `AcceptanceCriteria` through in `handleApproveStory`** + +In `stories.go`, inside `handleApproveStory`, find the task creation block (the `for _, tp := range input.Tasks` loop). Add `AcceptanceCriteria` to the `task.Task` literal: + +```go +t := &task.Task{ + ID: uuid.New().String(), + Name: tp.Name, + Project: input.ProjectID, + RepositoryURL: repoURL, + StoryID: story.ID, + AcceptanceCriteria: tp.AcceptanceCriteria, + Agent: task.AgentConfig{Type: "claude", Instructions: tp.Instructions}, + Priority: task.PriorityNormal, + Tags: []string{}, + DependsOn: []string{}, + Retry: task.RetryConfig{MaxAttempts: 1, Backoff: "exponential"}, + State: task.StatePending, + CreatedAt: time.Now().UTC(), + UpdatedAt: time.Now().UTC(), +} +``` + +- [ ] **Step 6: Run the test** + +```bash +cd /workspace/claudomator && go test ./internal/api/... -run "TestApproveStory_AcceptanceCriteriaStored" -v +``` + +Expected: PASS. + +- [ ] **Step 7: Run full API tests** + +```bash +cd /workspace/claudomator && go test ./internal/api/... -race -timeout 120s +``` + +Expected: all PASS. + +- [ ] **Step 8: Commit** + +```bash +git add internal/api/elaborate.go internal/api/stories.go internal/api/stories_test.go +git commit -m "feat: acceptance_criteria per story task in elaboration and approval" +``` + +--- + +## Task 6: UI — Ship button and checker report + +**Files:** +- Modify: `web/app.js` + +- [ ] **Step 1: Add "Ship" button to SHIPPABLE story cards** + +In `renderStoryCard`, after the `meta` element is appended to `card`, add: + +```js +export function renderStoryCard(story, doc = document) { + // ... existing code building header, badge, meta ... + + card.appendChild(header); + if (meta.children.length) card.appendChild(meta); + + // Ship button for SHIPPABLE stories. + if (story.status === 'SHIPPABLE') { + const shipBtn = doc.createElement('button'); + shipBtn.className = 'btn-primary story-ship-btn'; + shipBtn.textContent = 'Ship'; + shipBtn.addEventListener('click', async (e) => { + e.stopPropagation(); + shipBtn.disabled = true; + shipBtn.textContent = 'Shipping…'; + try { + const res = await fetch(`${API_BASE}/api/stories/${story.id}/ship`, { method: 'POST' }); + if (!res.ok) { + const body = await res.json().catch(() => ({})); + alert(body.error || `Ship failed (${res.status})`); + shipBtn.disabled = false; + shipBtn.textContent = 'Ship'; + } else { + renderStoriesPanel(); + } + } catch { + shipBtn.disabled = false; + shipBtn.textContent = 'Ship'; + } + }); + card.appendChild(shipBtn); + } + + return card; +} +``` + +> Note: `API_BASE` is a module-level constant already defined in `app.js`. Verify it's accessible in this scope; if not, use `BASE_PATH` (also defined at module level) instead. + +- [ ] **Step 2: Add checker report to READY task cards** + +In `createTaskCard`, after the `// Error message for failed tasks` block, add: + +```js + // Checker report for READY tasks where the checker flagged a problem. + if (task.state === 'READY' && task.checker_report) { + const reportEl = document.createElement('div'); + reportEl.className = 'task-checker-report'; + const label = document.createElement('span'); + label.className = 'task-checker-report-label'; + label.textContent = '⚠ Checker flagged:'; + const text = document.createElement('span'); + text.textContent = task.checker_report; + reportEl.appendChild(label); + reportEl.appendChild(text); + card.appendChild(reportEl); + } +``` + +- [ ] **Step 3: Add CSS for checker report** + +In `web/style.css`, add after the `.ready-completed-label` block: + +```css +.task-checker-report { + margin: 0.5rem 0; + padding: 0.5rem 0.75rem; + background: var(--warning-bg, rgba(255, 180, 0, 0.12)); + border-left: 3px solid var(--warning, #f0a500); + border-radius: 4px; + font-size: 0.8rem; + color: var(--text); +} + +.task-checker-report-label { + font-weight: 600; + margin-right: 0.4rem; +} +``` + +- [ ] **Step 4: Build and verify** + +```bash +cd /workspace/claudomator && go build ./... +``` + +Expected: no errors. + +- [ ] **Step 5: Commit** + +```bash +git add web/app.js web/style.css +git commit -m "feat: Ship button on SHIPPABLE stories; checker report on READY task cards" +``` + +--- + +## Task 7: Full test run and deploy + +**Files:** none + +- [ ] **Step 1: Run full test suite with race detector** + +```bash +cd /workspace/claudomator && go test ./... -race -timeout 120s +``` + +Expected: all PASS. + +- [ ] **Step 2: Push and deploy** + +```bash +git push && sudo scripts/deploy +``` + +Expected: build passes, tests pass, binary installs, service restarts. + +--- + +## Self-Review + +**Spec coverage:** +- ✅ Checker spawned after task → READY (Task 3) +- ✅ Checker uses acceptance_criteria or falls back to task instructions (Task 3) +- ✅ Pass → auto-accept (READY → COMPLETED) (Task 3) +- ✅ Fail → task stays READY + checker_report attached (Task 3) +- ✅ No checker for subtasks or checker tasks (Task 3, guards in spawnCheckerTask) +- ✅ Story elaborator generates acceptance_criteria per task (Task 5) +- ✅ `checkStoryCompletion` no longer auto-deploys (Task 4) +- ✅ `POST /api/stories/{id}/ship` endpoint (Task 4) +- ✅ Ship button in UI (Task 6) +- ✅ Checker report shown on READY task cards (Task 6) +- ✅ New DB columns + migrations (Task 2) + +**Placeholder scan:** none found. + +**Type consistency:** `UpdateTaskCheckerReport(id, report string)`, `GetCheckerTask(checkedTaskID string) (*task.Task, error)`, `ShipStory(ctx context.Context, storyID string) error` — all consistent across Tasks 2, 3, 4. diff --git a/docs/superpowers/specs/2026-04-04-task-checker-story-ship.md b/docs/superpowers/specs/2026-04-04-task-checker-story-ship.md new file mode 100644 index 0000000..1be2f3c --- /dev/null +++ b/docs/superpowers/specs/2026-04-04-task-checker-story-ship.md @@ -0,0 +1,222 @@ +# Task Checker Agent and Story Ship Gate — Design Spec + +**Date:** 2026-04-04 +**Status:** Approved + +--- + +## Goal + +Reduce per-task human review burden by running an independent checker agent after every top-level task completes. If the checker passes, the task auto-accepts. Human attention is required only when the checker flags a problem. Stories accumulate completed tasks until all are done, then surface a single "Ship" action for human approval before deploy. + +--- + +## Context + +The current flow requires a human to manually accept every READY task. For stories with many tasks this is friction — the human has to review each one before `checkStoryCompletion` can fire and ship the story. Additionally, `checkStoryCompletion` currently auto-triggers deploy with no human gate at the story level. + +ADR-007 describes a post-deploy validation agent (runs after merge, verifies the live deployment). This spec adds a pre-ship, per-task checker that is independent of the implementor and runs asynchronously. + +--- + +## Design + +### Two-tier validation + +| Tier | When | Author | Gate | +|---|---|---|---| +| **Checker** (this spec) | After task → READY | Different agent from implementor | Auto-accepts on pass; leaves READY + attaches report on fail | +| **Post-deploy validation** (existing) | After story → DEPLOYED | Separate validation task | Story → REVIEW_READY or NEEDS_FIX | + +### Full story flow after this change + +``` +Task runs → READY + → checker task spawned (async, independent) + → checker passes: task → COMPLETED (silent) + → checker fails: task stays READY, checker_report attached + +All story top-level tasks COMPLETED + → story → SHIPPABLE (human gate — no auto-deploy) + +Human clicks "Ship" on SHIPPABLE story + → merge branch to main + run deploy script → DEPLOYED + → post-deploy validation task created → VALIDATING + → REVIEW_READY | NEEDS_FIX +``` + +--- + +## Data Model + +### `tasks` table — three new columns + +| Column | Type | Purpose | +|---|---|---| +| `acceptance_criteria` | `TEXT` | Criteria for the checker. Empty = use task instructions as spec. | +| `checker_for_task_id` | `TEXT` | Set on checker tasks. Points to the task being checked. | +| `checker_report` | `TEXT` | Populated when checker fails. Shown in the UI on the READY task. | + +Checker tasks have `checker_for_task_id` set and no `story_id`. They do not appear in story task lists and do not affect `checkStoryCompletion`. + +--- + +## Acceptance Criteria Source + +**Story tasks:** The elaborator generates `acceptance_criteria` alongside each task's instructions. Example: + +``` +Run the full test suite and verify all tests pass. +Confirm the /api/tasks endpoint returns repository_url in the response body. +``` + +**Standalone tasks (no story):** `acceptance_criteria` is empty. The checker uses the task's own `instructions` field as the specification. + +--- + +## Checker Task + +### Creation + +Spawned by the executor pool when a top-level task (no `parent_task_id`, no `checker_for_task_id`) transitions to READY. The pool constructs and submits a checker task immediately: + +``` +Name: "Check: <original task name>" +checker_for_task_id: <original task ID> +story_id: (empty — checker is not a story task) +repository_url: same as the checked task +agent.type: claude +agent.instructions: (see below) +agent.max_budget_usd: 0.50 +timeout: 10m +retry.max_attempts: 1 +``` + +**Do not spawn a checker if:** +- `t.ParentTaskID != ""` (subtasks go directly to COMPLETED, never READY) +- `t.CheckerForTaskID != ""` (never check a checker) +- A checker task already exists for this task (query by `checker_for_task_id` before spawning) + +### Instructions template + +``` +You are validating a completed task. Do not make any changes to the code or repository. + +Task: <name> +Instructions given to the implementor: +<task instructions> + +Acceptance criteria: +<acceptance_criteria, or task instructions if acceptance_criteria is empty> + +Steps: +1. Clone the repository and review the changes made. +2. Verify each acceptance criterion is met. Run tests or make HTTP requests as needed. +3. If all criteria are satisfied, exit normally (success). +4. If any criterion is not met, use the Bash tool to exit with a non-zero code: + bash -c "exit 1" + Before exiting, write a brief summary of what failed. +``` + +### Completion handling + +In `executor.Pool.handleRunResult`, after determining the task outcome, check `t.CheckerForTaskID`: + +- **Checker succeeded (exit 0):** Call `store.UpdateTaskState(t.CheckerForTaskID, task.StateCompleted)`. Then call `checkStoryCompletion` for the checked task's story (if any). +- **Checker failed (exit non-0 or error):** Extract the execution summary and call `store.UpdateTaskCheckerReport(t.CheckerForTaskID, summary)`. The checked task stays READY; the report surfaces in the UI. + +The checker task itself always resolves to COMPLETED or FAILED through the normal state machine — no special states needed. + +--- + +## Story Ship Gate + +### Remove auto-deploy from `checkStoryCompletion` + +Current code at the end of `checkStoryCompletion`: +```go +go p.triggerStoryDeploy(ctx, storyID) // REMOVE THIS +``` + +After this change, `checkStoryCompletion` only transitions the story to SHIPPABLE. Deploy is triggered explicitly by the human. + +### New endpoint + +`POST /api/stories/{id}/ship` + +- Verifies story state is SHIPPABLE. Returns 409 otherwise. +- Calls `p.triggerStoryDeploy(ctx, storyID)` (existing function, no changes needed). +- Returns 202 Accepted immediately; deploy runs async. + +### UI + +The stories panel shows a **"Ship"** button on any story in SHIPPABLE state. No other UI changes required for the story panel. The button calls `POST /api/stories/{id}/ship`. + +--- + +## Task Card UI + +When `checker_report` is non-empty on a READY task: + +- Show a warning badge on the task card (e.g. "⚠ Checker failed") +- Expand the card or side panel to show the report text +- Human can still accept or reject the task manually regardless of checker result + +When a checker is pending/running for a READY task: + +- Show a subtle indicator (e.g. "Checking…") — optional, nice to have + +--- + +## Elaborator Changes + +The story elaboration endpoint currently returns a list of tasks with `name` and `instructions`. Add `acceptance_criteria` to each task in the elaborated output: + +```json +{ + "tasks": [ + { + "name": "Add repository_url to task struct", + "instructions": "...", + "acceptance_criteria": "Run go test ./... and verify all tests pass. Confirm Task struct has RepositoryURL field with correct json tag." + } + ] +} +``` + +The elaborator prompt should instruct the LLM to write acceptance criteria that are concrete and verifiable by a separate agent: specific commands to run, specific API responses to check, specific file contents to verify. Vague criteria like "code looks good" are not acceptable. + +--- + +## Storage + +New methods on `storage.DB`: + +```go +// UpdateTaskCheckerReport sets the checker_report field on a task. +func (s *DB) UpdateTaskCheckerReport(id, report string) error + +// GetCheckerTask returns the checker task for a given task ID, or nil if none exists. +func (s *DB) GetCheckerTask(checkedTaskID string) (*task.Task, error) +``` + +Migration: `ALTER TABLE tasks ADD COLUMN acceptance_criteria TEXT NOT NULL DEFAULT ''`, same for `checker_for_task_id` and `checker_report`. + +--- + +## What This Does Not Change + +- Post-deploy validation flow (DEPLOYED → VALIDATING → REVIEW_READY/NEEDS_FIX) is unchanged. +- Subtask handling is unchanged — subtasks never go READY, so they never get checkers. +- The `handleAcceptTask` endpoint remains — humans can still manually accept READY tasks. +- The `handleRejectTask` endpoint remains — humans can still manually reject. +- Checker tasks are subject to normal rate-limiting, retry, and budget enforcement. + +--- + +## Out of Scope + +- Checker result overriding the human (human can always accept/reject regardless) +- Parallel checker runs (one checker per task, no re-run unless task is re-run) +- Configurable checker agent type per project +- Checker budget/timeout configuration beyond the defaults above |
