2 files changed, 589 insertions, 8 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
index 5b01d7b..c895c3a 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,12 +1,315 @@
-# Claudomator — Agent Instructions
+# CLAUDE.md
 
-This repository uses a centralized agent configuration.
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
-**Primary Source of Truth:** ".agent/config.md"
+Also check `~/.claude/CLAUDE.md` for user-level development standards (TDD workflow, git practices, session state management, etc.) that apply globally across all projects.
 
-## Quick Reference
-- **Worklog:** ".agent/worklog.md"
-- **Design:** ".agent/design.md"
-- **Coding Standards:** ".agent/coding_standards.md"
+## Canonical Repository
 
-Refer to ".agent/config.md" before performing any tasks.
+**The canonical source of truth is `/workspace/claudomator`.** All development must happen here.
+Do not work in any other directory unless explicitly instructed. Do not explore `/site/doot.terst.org/` for source files.
+
+## Build & Test Commands
+
+```bash
+# Build
+go build ./...
+
+# Run all tests
+go test ./...
+
+# Run a single package's tests
+go test ./internal/executor/...
+
+# Run a single test by name
+go test ./internal/api/ -run TestServer_CreateTask_MissingName
+
+# Run with race detector (important for executor/pool tests)
+go test -race ./...
+
+# Build the binary
+go build -o claudomator ./cmd/claudomator/
+```
+
+> **Note:** `go-sqlite3` uses CGo. A C compiler (`gcc`) must be present for builds and tests.
+
+## Running the Server
+
+```bash
+# Initialize data directory
+./claudomator init
+
+# Start API server (default :8484)
+./claudomator serve
+
+# Run a task file directly (bypasses server)
+./claudomator run ./test/fixtures/tasks/simple-task.yaml
+
+# List tasks via CLI
+./claudomator list
+```
+
+Config defaults to `~/.claudomator/config.toml`. Data is stored in `~/.claudomator/` (SQLite DB + execution logs).
+
+---
+
+## Architecture
+
+**Pipeline:** CLI/API → `executor.Pool` → `executor.ContainerRunner` → `claude -p` subprocess → SQLite + log files
+
+### Package Overview
+
+| Package | Role |
+|---|---|
+| `internal/task` | `Task` struct, YAML/JSON parsing, state machine constants, validation |
+| `internal/executor` | `Pool` (bounded goroutine dispatcher) + `ClaudeRunner` (subprocess + sandbox) + `GeminiRunner` (stub) + `Classifier` + preamble + question/summary helpers |
+| `internal/storage` | SQLite wrapper; additive migrations; tasks + executions tables |
+| `internal/api` | HTTP/WebSocket server — REST endpoints, webhook handler, elaborate/validate, script runner |
+| `internal/notify` | `Notifier` interface; webhook, multi, log implementations |
+| `internal/reporter` | Console/JSON/HTML report generation |
+| `internal/deployment` | Deployment-status checking (polls URL for expected version) |
+| `internal/config` | TOML config loading + data-dir layout helpers |
+| `internal/cli` | Cobra commands: `run`, `serve`, `list`, `status`, `start`, `logs`, `create`, `report`, `init` |
+| `internal/version` | VCS version detection (`debug.ReadBuildInfo`) |
+| `web` | Embedded static UI (`embed.go`) |
+
+### Key Data Flows
+
+**Task execution:**
+1. Task created via `POST /api/tasks` or YAML file (`task.ParseFile`)
+2. `POST /api/tasks/{id}/run` → `executor.Pool.Submit()` → buffered work queue
+3. `dispatch()` goroutine picks from queue, waits for slot, launches `execute()`
+4. `execute()` calls `ContainerRunner.Run()` → `claude -p <instructions> --output-format stream-json`
+5. stdout piped through `parseStream()` to `~/.claudomator/executions/<exec-id>/stdout.log`
+6. Execution result written to SQLite, broadcast via WebSocket to connected clients
+
+**Task state machine** (enforced in `storage.UpdateTaskState` via `task.ValidTransition`):
+
+```
+PENDING ──→ QUEUED ──→ RUNNING ──→ READY ──→ COMPLETED
+               ↑              │     └──→ PENDING (rejected)
+               │              │
+               │              ├──→ BLOCKED ──→ READY (all subtasks done)
+               │              │         └──→ QUEUED (question answered)
+               │              │
+               └──────────────├──→ FAILED
+                              ├──→ TIMED_OUT
+                              ├──→ CANCELLED
+                              └──→ BUDGET_EXCEEDED
+```
+
+- **BLOCKED**: Parent task completed but has subtasks that are not yet COMPLETED, OR agent wrote a question file. Unblocked by `maybeUnblockParent()` or user answer via `/api/tasks/{id}/answer`.
+- **READY**: Execution succeeded; awaits manual accept/reject via `/api/tasks/{id}/accept` or `/api/tasks/{id}/reject`.
+- **COMPLETED**: Terminal — entered only via user accept (top-level) or automatic subtask completion.
+- `FAILED/TIMED_OUT/CANCELLED/BUDGET_EXCEEDED` all re-enter at `QUEUED` for retry/resume.
+
+**WebSocket:** `Hub` fans out task completion events to all connected clients. `Server.StartHub()` must be called before `ListenAndServe`.
+
+### Sandbox Lifecycle (ContainerRunner (Docker-based))
+
+When `agent.project_dir` is set:
+1. `setupSandbox()` clones the project into `/tmp/claudomator-sandbox-*` via the "local" remote (bare repo), then falls back to "origin", then the working copy path.
+2. The claude subprocess runs inside the sandbox.
+3. After successful execution, `teardownSandbox()` auto-commits any uncommitted changes (after running a build if `Makefile`/`go.mod`/`gradlew` is present), then pushes new commits to the bare repo (`origin` from the sandbox's perspective). The sandbox is then removed.
+4. On failure the sandbox is preserved and its path is returned in the error.
+5. On BLOCKED (question written), the sandbox path is stored in `executions.sandbox_dir` so the resume execution can reuse it.
+
+> **Known bug:** Variable shadowing in `claude.go` `Run()` means the outer `sandboxDir` is never assigned (both `setupSandbox` calls use `:=` inside nested blocks). This causes: (a) `teardownSandbox` is never called — work is discarded, sandboxes accumulate in `/tmp`; (b) `BlockedError.SandboxDir` is always `""`, so resume clones a fresh sandbox and loses the agent's partial work. See [Known Bugs](#known-bugs).
+
+> **Known bug:** `teardownSandbox` hardcodes `origin/master` when rebasing on conflict. Repos using `main` will fail on concurrent push. See [Known Bugs](#known-bugs).
+
+### Task YAML Format
+
+```yaml
+name: "My Task"
+description: "Optional longer description"
+agent:
+  type: "claude"              # "claude" (default) or "gemini" (stub, not production-ready)
+  model: "sonnet"             # optional; auto-classified by Classifier if omitted
+  instructions: |
+    Do something useful.
+  project_dir: "/path/to/project"  # optional; triggers sandbox isolation
+  max_budget_usd: 1.00
+  permission_mode: "bypassPermissions"  # default; or "default", "acceptEdits"
+  allowed_tools: ["Bash", "Read", "Edit"]
+  disallowed_tools: []
+  context_files: ["/extra/context/path"]
+  system_prompt_append: "Extra instructions appended to system prompt."
+  skip_planning: false        # if false, prepends planning/orchestration preamble
+  additional_args: []         # extra flags forwarded verbatim to claude CLI
+timeout: "15m"
+priority: "normal"            # "high" | "normal" | "low" (stored but not yet used for scheduling)
+tags: ["ci"]
+depends_on: ["other-task-id"]
+retry:
+  max_attempts: 1             # stored but retry is currently manual via /resume
+  backoff: "exponential"
+```
+
+> **Note:** The YAML key is `agent:`, not `claude:`. Earlier docs showed `claude:` which was wrong.
+
+Batch files wrap multiple tasks under a `tasks:` key.
+
+### Storage Schema
+
+Two tables. Schema is auto-migrated additively on `storage.Open()` — new columns are `ALTER TABLE ... ADD COLUMN` statements that silently succeed if the column already exists.
+
+```
+tasks:      id, name, description, config_json, priority, timeout_ns, retry_json,
+            tags_json, depends_on_json, parent_task_id, state, rejection_comment,
+            question_json, summary, elaboration_input, interactions_json,
+            created_at, updated_at
+
+executions: id, task_id, start_time, end_time, exit_code, status, stdout_path,
+            stderr_path, artifact_dir, cost_usd, error_msg, session_id,
+            sandbox_dir, changestats_json, commits_json
+```
+
+JSON blobs: `config_json` (AgentConfig), `retry_json`, `tags_json`, `depends_on_json`, `interactions_json`, `changestats_json`, `commits_json`.
+
+---
+
+## Features
+
+### Planning Preamble & Orchestration
+
+When `agent.skip_planning` is false (the default), `withPlanningPreamble()` prepends a system-level prompt to the agent's instructions that:
+- Instructs the agent to POST subtasks to `$CLAUDOMATOR_API_URL/api/tasks` and stop if the task will take more than ~3 minutes
+- Instructs the agent to write a JSON question to `$CLAUDOMATOR_QUESTION_FILE` and exit if it needs user input
+- Requires all changes to be committed before exit
+- Requires a summary written to `$CLAUDOMATOR_SUMMARY_FILE`
+
+Env vars injected into every execution: `CLAUDOMATOR_API_URL`, `CLAUDOMATOR_TASK_ID`, `CLAUDOMATOR_PROJECT_DIR`, `CLAUDOMATOR_QUESTION_FILE`, `CLAUDOMATOR_SUMMARY_FILE`.
+
+### Changestats
+
+After each execution, changestats (files changed, lines added/removed) are parsed from git `diff --stat` output in `stdout.log` and stored in `executions.changestats_json`.
+
+> **Duplication debt:** Changestats are extracted in two places: `executor.Pool.handleRunResult()` and `api.Server.processResult()`. Both write the same value to the same row (idempotent), but the double-extraction is confusing and should be consolidated. See [Design Debt](#design-debt).
+
+**Parser:** `internal/task/changestats.go` — `ParseChangestatFromOutput`, `ParseChangestatFromFile`.
+
+**Frontend:** `web/app.js` renders a `.changestats-badge` on COMPLETED/READY task cards.
+
+### GitHub Webhook Integration
+
+`POST /api/webhooks/github` accepts `check_run` and `workflow_run` events. Returns `{"task_id": "..."}` (200) on task creation or 204 if ignored.
+
+#### Config (`~/.claudomator/config.toml`)
+
+```toml
+webhook_secret = "your-github-webhook-secret"   # HMAC-SHA256; skip validation if omitted
+
+[[projects]]
+name = "myrepo"
+dir  = "/workspace/myrepo"
+```
+
+#### Matching logic
+
+Repository name matched case-insensitively against each project's `name` and the basename of its `dir`. Falls back to the only configured project if no match found.
+
+#### Task creation
+
+Tasks created for:
+- `check_run` with `action: completed` and `conclusion: failure`
+- `workflow_run` with `action: completed` and `conclusion: failure` or `timed_out`
+
+Tagged `["ci", "auto"]`, capped at $3 USD, allowed tools: Read, Edit, Bash, Glob, Grep.
+
+### Elaborate Endpoint
+
+`POST /api/tasks/elaborate` converts natural language → task JSON via a `claude --prompt` invocation. Optionally reads `CLAUDE.md` / `SESSION_STATE.md` from a configured working directory for context. Per-IP rate-limited.
+
+> **Implementation gap:** The elaborate endpoint is not tested against real Claude invocations. `sanitizeElaboratedTask()` uses keyword heuristics to infer missing tools (fragile). No caching.
+
+### Model Classifier
+
+`executor.Classifier` calls the Gemini CLI (`gemini-2.5-flash-lite`) to pick the best Claude or Gemini model for a task. Falls back to the default model (`sonnet`) if Gemini fails. Agent type is selected first by load balancer; classifier only picks the model within that agent.
+
+> **Implementation gap:** Output parsing is brittle — strips `"Loaded cached credentials."` lines and markdown fences by string matching. No fallback if Gemini CLI isn't installed. Classification results are not cached or logged for learning.
+
+---
+
+
+---
+
+## Design Debt
+
+### GeminiRunner is a non-functional stub
+
+`internal/executor/gemini.go` `execOnce()` does not run the `gemini` binary. It starts a goroutine that writes hardcoded fake JSON to a pipe. `parseGeminiStream()` strips markdown fences but does no semantic parsing. There is no session/resume support.
+
+Any task with `agent.type: "gemini"` will silently return canned output. This is dangerous in production.
+
+**Decision needed:** Either implement GeminiRunner properly (subprocess + stream parsing + sandbox integration mirroring ClaudeRunner) or remove it and the `Classifier` from the codebase until it's ready.
+
+### Priority field is stored but never used
+
+`task.Priority` (`high`, `normal`, `low`) is persisted in SQLite and surfaced in the API. The executor `dispatch()` goroutine uses a simple FIFO channel (`workCh`) with no priority ordering.
+
+### RetryConfig is stored but retry is manual
+
+`task.RetryConfig.MaxAttempts` and `Backoff` are parsed and stored. No code reads them during execution. Retries must be triggered manually via `POST /api/tasks/{id}/resume`.
+
+### Changestats extracted in two places
+
+`executor.Pool.handleRunResult()` and `api.Server.processResult()` both call `task.ParseChangestatFromFile()` and write to `executions.changestats_json`. The second write is idempotent but wasteful and confusing. One of the two should be removed.
+
+### context.Background() in resume path
+
+`api.Server.handleAnswerQuestion()` calls `p.SubmitResume(context.Background(), ...)`. If the HTTP request context is cancelled, the resume still runs. Inversely, if the server shuts down, in-flight resumes using the server's root context would be cancelled while this one would not. Should use a long-lived server-level context, not `Background()`.
+
+### Non-transactional execution creation
+
+`pool.execute()` calls `store.CreateExecution(exec)` followed by `store.UpdateTaskState(t.ID, task.StateRunning)` as separate statements. If the server crashes between them, the task stays PENDING while an execution record exists with status RUNNING. Recovery (`RecoverStaleRunning`) partially handles this but the root cause is the missing transaction.
+
+### Elaborate/validate cmd path indirection
+
+`Server` has two separate fields `elaborateCmdPath` and `validateCmdPath` that override `claudeBinPath` only for tests. This is a testing-time seam that leaks into the production struct. A cleaner approach would be to inject an `Elaborator` interface.
+
+### `withFailureHistory` mutates a shallow copy
+
+In `executor.go`, `withFailureHistory` creates a copy of the task struct (`copy := *t`) but `copy.Agent = t.Agent` copies the struct value — slices inside AgentConfig (`AllowedTools`, `DisallowedTools`, etc.) share the backing array. Appending to `SystemPromptAppend` is safe but any mutation of slices would affect the original.
+
+### Additive migration strategy is fragile
+
+`storage.migrate()` lists every `ALTER TABLE ADD COLUMN` statement in code order. The only idempotency guard is catching "column already exists" errors. There is no migration version tracking. Columns dropped in `CREATE TABLE IF NOT EXISTS` and added back via ALTER are indistinguishable from new columns. Concurrent server instances running migrations simultaneously have no protection.
+
+---
+
+## REST API Reference
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/tasks` | List tasks; `?state=RUNNING&since=<RFC3339>&limit=50` |
+| POST | `/api/tasks` | Create task (JSON body) |
+| GET | `/api/tasks/{id}` | Get task |
+| DELETE | `/api/tasks/{id}` | Delete task + subtasks + executions |
+| POST | `/api/tasks/{id}/run` | Submit PENDING task to executor |
+| POST | `/api/tasks/{id}/cancel` | Cancel RUNNING/QUEUED task |
+| POST | `/api/tasks/{id}/accept` | Accept READY task → COMPLETED |
+| POST | `/api/tasks/{id}/reject` | Reject READY task → PENDING |
+| POST | `/api/tasks/{id}/answer` | Answer BLOCKED task question → QUEUED |
+| POST | `/api/tasks/{id}/resume` | Resume FAILED/TIMED_OUT/CANCELLED task |
+| GET | `/api/tasks/{id}/subtasks` | List subtasks |
+| GET | `/api/tasks/{id}/executions` | List execution history |
+| GET | `/api/executions/{id}` | Get execution |
+| GET | `/api/executions/{id}/log` | Get execution log (`?tail=100`) |
+| GET | `/api/executions/{id}/logs/stream` | Stream logs as SSE |
+| GET | `/api/tasks/{id}/logs/stream` | Stream latest execution logs |
+| GET | `/api/executions` | List recent executions across all tasks |
+| GET | `/api/tasks/{id}/deployment-status` | Poll deployment readiness |
+| POST | `/api/tasks/elaborate` | Convert natural language → task JSON |
+| POST | `/api/tasks/validate` | Validate task JSON |
+| POST | `/api/scripts/{name}` | Run named script with task context |
+| GET | `/api/ws` | WebSocket upgrade (live task updates) |
+| GET | `/api/workspaces` | List directories under `workspace_root` |
+| GET | `/api/health` | Server health |
+| POST | `/api/webhooks/github` | GitHub CI webhook |
+
+---
+
+## ADRs
+
+See `docs/adr/001-language-and-architecture.md` for the Go + SQLite + WebSocket rationale.
diff --git a/docs/IMPROVEMENT_PLAN.md b/docs/IMPROVEMENT_PLAN.md
new file mode 100644
index 0000000..db5d6a9
--- /dev/null
+++ b/docs/IMPROVEMENT_PLAN.md
@@ -0,0 +1,278 @@
+# Improvement Plan
+
+Comprehensive assessment of stability, architectural coherence, simplicity and cleanliness issues, with prioritized fixes.
+
+---
+
+## Critical Bugs (fix immediately)
+
+### 1. `sandboxDir` variable shadowing — git clone failures
+
+**File:** `internal/executor/claude.go`, lines ~119–137
+
+Both `setupSandbox` calls inside `Run()` use short variable declarations (`:=`) inside nested blocks, creating inner variables that shadow the outer `var sandboxDir string`. The outer variable is always `""`.
+
+**Effects:**
+- `teardownSandbox` is never called → agent work discarded, changes never pushed to origin
+- Sandboxes accumulate in `/tmp/claudomator-sandbox-*` until disk is full
+- BLOCKED task resume always clones fresh → loses agent's partial work from first run
+- Error messages omit sandbox path, making debugging harder
+
+**Fix:**
+
+```go
+// In the resume path (inside `if projectDir != ""`):
+var err error
+sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger)  // = not :=
+
+// In the non-resume path (inside `else if projectDir != ""`):
+var err error
+sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger)  // = not :=
+```
+
+Both sites already declare `var err error` on the line above. Simply drop the `:` from `:=`.
+
+### 2. Hardcoded `master` branch in sandbox teardown
+
+**File:** `internal/executor/claude.go`, `teardownSandbox()`, line ~377
+
+```go
+git pull --rebase origin master   // fails on repos using "main"
+```
+
+**Fix:** Detect the remote default branch before pushing:
+
+```go
+branch, _ := exec.Command("git", "-C", sandboxDir, "rev-parse", "--abbrev-ref", "HEAD").Output()
+defaultBranch := strings.TrimSpace(string(branch))
+if defaultBranch == "" || defaultBranch == "HEAD" {
+    defaultBranch = "main"
+}
+// then use defaultBranch instead of "master"
+```
+
+---
+
+## High Priority — Stability & Correctness
+
+### 3. GeminiRunner is a non-functional stub in production code
+
+**File:** `internal/executor/gemini.go`
+
+`execOnce()` spawns a goroutine writing hardcoded JSON to a pipe instead of running the `gemini` binary. Any task with `agent.type: "gemini"` silently returns fake output.
+
+**Options (choose one):**
+- **Implement properly:** Mirror ClaudeRunner structure — subprocess + process group + stream parsing + sandbox support. GeminiRunner must also implement `LogPather`.
+- **Remove until ready:** Delete `gemini.go`, remove `GeminiRunner` from `NewPool` in `cli/serve.go`, and gate the `Classifier` on Claude-only operation. Add `// TODO: GeminiRunner not implemented` clearly.
+
+The stub in production is a correctness hazard. Removing it is safer than leaving it in place.
+
+### 4. Atomic execution creation + state update
+
+**Files:** `internal/executor/executor.go` `execute()`, `internal/storage/db.go`
+
+`CreateExecution` and `UpdateTaskState(RUNNING)` are two separate DB writes with no transaction. A crash between them leaves the task PENDING with an orphaned RUNNING execution record. `RecoverStaleRunning` handles the RUNNING-task case but not the PENDING-task-with-running-execution case.
+
+**Fix:** Add `storage.DB.CreateExecutionAndSetRunning(exec, taskID)` that wraps both writes in a single transaction.
+
+### 5. `context.Background()` in resume submission
+
+**File:** `internal/api/server.go`, `handleAnswerQuestion()`
+
+```go
+p.SubmitResume(context.Background(), task, exec)
+```
+
+This detaches the resume from any server lifecycle context. If the server has a root context with a shutdown deadline, resumes submitted this way will not be cancelled.
+
+**Fix:** Pass the server's root context (or a derived context) instead of `Background()`.
+
+### 6. Pool `dispatch()` goroutine leaks on server shutdown
+
+**File:** `internal/executor/executor.go`
+
+`NewPool` starts `go p.dispatch()` but `workCh` is never closed. On server shutdown, `dispatch()` blocks forever on `<-p.workCh`. The pool has no `Shutdown()` method.
+
+**Fix:** Add `Pool.Shutdown(ctx context.Context)` that closes `workCh` and waits for active workers to drain.
+
+---
+
+## Medium Priority — Architectural Coherence
+
+### 7. Priority field is stored but never dispatched
+
+**Files:** `internal/task/task.go`, `internal/executor/executor.go`
+
+Priority is parsed, stored in SQLite, returned via API — but `dispatch()` uses a single FIFO channel, ignoring priority entirely.
+
+**Options:**
+- Use a priority queue (e.g., `container/heap`) keyed by `Priority` for `workCh`
+- Or remove `Priority` from `Task` and documentation until implemented — don't ship half-features
+
+### 8. RetryConfig is stored but retry is manual
+
+**Files:** `internal/task/task.go`, `internal/executor/executor.go`
+
+`RetryConfig.MaxAttempts` and `Backoff` are stored but never read. Every "retry" is a manual user action via `/resume`.
+
+**Options:**
+- Implement automatic retry in `handleRunResult` when `MaxAttempts > 1` and `currentAttempt < MaxAttempts`
+- Or remove `RetryConfig` from the schema/API until implemented
+
+### 9. Changestats extracted in two places
+
+**Files:** `internal/executor/executor.go` `handleRunResult()`, `internal/api/server.go` `processResult()`
+
+Both call `task.ParseChangestatFromFile()` on the same file and write to the same row. The second write is idempotent but the duplication is confusing.
+
+**Fix:** Remove the extraction from `api.Server.processResult()`. The pool already handles it reliably before broadcasting the result.
+
+### 10. `withFailureHistory` shallow copy is unsafe
+
+**File:** `internal/executor/executor.go`
+
+```go
+copy := *t      // copies struct
+copy.Agent = t.Agent  // copies AgentConfig struct, slices share backing array
+```
+
+If any downstream code appends to `copy.Agent.AllowedTools`, it will mutate the original. The current code only writes to `SystemPromptAppend` (a string), so this is safe today but fragile.
+
+**Fix:** Deep-copy slices in AgentConfig:
+```go
+agent := t.Agent
+agent.AllowedTools = append([]string(nil), t.Agent.AllowedTools...)
+agent.DisallowedTools = append([]string(nil), t.Agent.DisallowedTools...)
+agent.ContextFiles = append([]string(nil), t.Agent.ContextFiles...)
+agent.AdditionalArgs = append([]string(nil), t.Agent.AdditionalArgs...)
+copy.Agent = agent
+```
+
+### 11. Additive migration strategy has no version tracking
+
+**File:** `internal/storage/db.go` `migrate()`
+
+Migrations are applied in code order; the only idempotency guard is ignoring "column already exists" errors. There is no way to detect if a migration was partially applied, rolled back incorrectly, or was applied out of order by a different code version.
+
+**Fix (minimal):** Use a `schema_migrations` table with a version counter. Apply only migrations with version > current max. This prevents re-running migrations and makes the migration history explicit.
+
+### 12. Server test-seam fields in production struct
+
+**File:** `internal/api/server.go`
+
+```go
+elaborateCmdPath string   // overrides claudeBinPath in tests
+validateCmdPath  string   // overrides claudeBinPath in tests
+```
+
+These exist solely to inject test paths. They are an internal testing seam leaked into the production struct.
+
+**Fix:** Extract an `Elaborator` interface (e.g., `Elaborate(ctx, request, workDir string) (string, error)`) and inject it. Tests inject a fake; production builds inject the real Claude-backed implementation.
+
+---
+
+## Lower Priority — Simplicity & Cleanliness
+
+### 13. Classifier output parsing is brittle string manipulation
+
+**File:** `internal/executor/classifier.go`
+
+The classifier strips `"Loaded cached credentials."` by line scanning, strips markdown fences by `strings.TrimPrefix/TrimSuffix`, and falls back through multiple parsing paths. This will silently break if Gemini CLI output format changes.
+
+**Fix:** Request `--output-format json` and parse the structured response. Treat any parse failure as a classification error (fall back to default model), and log the raw output for debugging.
+
+### 14. `gitSafe` is inconsistently applied in teardownSandbox
+
+**File:** `internal/executor/claude.go`
+
+`setupSandbox` wraps all git commands with `gitSafe(...)` (adds `-c safe.directory=*`). `teardownSandbox` calls `git` directly without `gitSafe` on most commands. This means teardown can fail with "dubious ownership" errors in mixed-owner environments.
+
+**Fix:** Use `gitSafe` consistently in `teardownSandbox`.
+
+### 15. `tailFile` loads entire file into memory
+
+**File:** `internal/executor/claude.go`
+
+```go
+func tailFile(path string, n int) string {
+    // scans all lines, keeps last n
+}
+```
+
+For large log files this reads everything before discarding early lines. Fine in practice (stderr is typically small) but fragile if anything large is written to stderr.
+
+**Fix:** Use a circular buffer of n lines to avoid O(filesize) memory.
+
+### 16. `execute()` and `executeResume()` duplicate context/cancel/defer setup
+
+**File:** `internal/executor/executor.go`
+
+Both methods contain identical blocks for: context timeout + cancel setup, `activePerAgent` increment/decrement, `doneCh` signal, cancel registration/deregistration. This is ~40 lines duplicated.
+
+**Fix:** Extract `withWorkerSlot(ctx, t, fn func(ctx context.Context) error) error` that handles the common setup and calls `fn`. Both paths become 5–10 lines.
+
+### 17. `QuestionRegistry` and `QuestionHandler` are unused
+
+**Files:** `internal/executor/question.go`, `internal/executor/executor.go`
+
+`QuestionRegistry` is created in `NewPool` and stored on `Pool.Questions` but nothing reads from it. `QuestionHandler` interface exists but is never instantiated or wired to anything. The intended design (pool detects `tool_use` and invokes a handler) is not implemented.
+
+**Fix:** Remove `QuestionRegistry`, `QuestionHandler`, and `Pool.Questions` until the feature is actually needed. The current question flow (agent writes file, pool reads BlockedError) already works without them.
+
+### 18. Script registry endpoint is undocumented and empty
+
+**File:** `internal/api/scripts.go`
+
+The `ScriptRegistry` type and `POST /api/scripts/{name}` endpoint are defined but the registry is never populated from config. There are no example scripts. The endpoint exists in production code but can't be used.
+
+**Fix:** Either populate scripts from config (`[[scripts]]` entries in config.toml) with documentation, or remove the endpoint until it's ready.
+
+### 19. `parseStream` ignores scanner errors
+
+**File:** `internal/executor/claude.go`
+
+`parseStream()` loops over `scanner.Scan()` but never checks `scanner.Err()` after the loop. If the scanner hits a read error mid-stream, it silently stops. The function returns `nil` error, making the task appear to complete successfully when it may have lost output.
+
+**Fix:**
+```go
+if err := scanner.Err(); err != nil {
+    if streamErr == nil {
+        streamErr = fmt.Errorf("reading claude stdout: %w", err)
+    }
+}
+```
+
+### 20. Log files have no size limit or rotation
+
+**Files:** `internal/executor/claude.go`, `internal/storage/db.go`
+
+Execution logs in `~/.claudomator/executions/` have no size limit, rotation, or TTL. Long-running or repeatedly-retried tasks can produce arbitrarily large log files. Old executions are never cleaned up.
+
+**Fix (minimal):** Add a `CleanupOldExecutions(olderThan time.Duration)` method to `storage.DB` and call it on server startup (or on a ticker). Delete execution log directories for executions older than the TTL.
+
+---
+
+## Summary by Effort
+
+| # | Issue | Effort | Impact |
+|---|-------|--------|--------|
+| 1 | `sandboxDir` shadowing | XS (2 chars: `=` not `:=`) | Critical — root cause of git clone failures |
+| 2 | Hardcoded `master` branch | S | High — breaks all `main`-branch repos |
+| 3 | GeminiRunner stub | M–L | High — silent wrong output in production |
+| 4 | Atomic exec creation | S | Medium — crash-recovery edge case |
+| 5 | context.Background() resume | XS | Low — shutdown correctness |
+| 6 | Pool shutdown | S | Medium — goroutine leak on shutdown |
+| 7 | Priority dispatch | M | Medium — misleading API |
+| 8 | RetryConfig unused | S | Low — misleading API |
+| 9 | Changestats duplication | XS | Low — confusing code |
+| 10 | Shallow copy in withFailureHistory | S | Low — latent bug |
+| 11 | Migration versioning | M | Medium — operational risk |
+| 12 | Test seams in Server struct | M | Low — architecture cleanliness |
+| 13 | Brittle classifier parsing | S | Medium — silent classification failures |
+| 14 | gitSafe inconsistency | XS | Medium — owner errors in teardown |
+| 15 | tailFile memory | XS | Low — theoretical |
+| 16 | execute/executeResume duplication | M | Low — maintainability |
+| 17 | QuestionRegistry dead code | XS | Low — confusion |
+| 18 | Script registry empty | S | Low — dead endpoint |
+| 19 | parseStream ignores scanner.Err | XS | Medium — silent data loss |
+| 20 | Log file cleanup | S | Medium — disk usage |