8 files changed, 2010 insertions, 1 deletions
diff --git a/docs/RAW_NARRATIVE.md b/docs/RAW_NARRATIVE.md
new file mode 100644
index 0000000..a9c1492
--- /dev/null
+++ b/docs/RAW_NARRATIVE.md
@@ -0,0 +1,390 @@
+# Claudomator: Development Narrative
+
+This document is a chronological engineering history of the Claudomator project,
+reconstructed from the git log, ADRs, and source code.
+
+---
+
+## 1. Initial commit — core scaffolding (2e2b218)
+
+The project started with a single commit that established the full skeleton:
+task model, executor, API server, CLI, storage layer, and reporter. The Go module
+was `github.com/thepeterstone/claudomator`. The initial `Task` struct had a
+`ClaudeConfig` field (later renamed to `AgentConfig`) holding the model,
+instructions, `working_dir`, budget, permission mode, and tool lists. SQLite was
+chosen as the storage backend (see ADR-001). The executor pool used a bounded
+goroutine model. The API server was plain `net/http` with no external framework.
+The CLI was Cobra.
+
+## 2. JSON tags, module rename, gitignore (8ee1fb5, 46ba3f5, 2bf317d)
+
+Early housekeeping: added JSON struct tags to all exported types, renamed the Go
+module to its final identifier, and set up the `.gitignore` to exclude the compiled
+binary and local Claude settings.
+
+## 3. Verbose flag, logs CLI command (0377c06, f27d4f7)
+
+Added `--verbose` to the Claude subprocess invocation and a `logs` CLI subcommand
+for tailing execution output.
+
+## 4. Embedded web UI and HTTP wiring (135d8eb)
+
+The first web UI was embedded into the binary using `go:embed`. This made the
+binary fully self-contained: no separate static file server was needed.
+
+## 5. CLAUDE.md, clickable fold, subtask support (bdcc33f, 3881f80, 704d007)
+
+Added the project-level `CLAUDE.md` guidance document. Added a clickable fold to
+the web UI to expand hidden completed/failed tasks. Added `parent_task_id` to the
+`Task` struct, `ListSubtasks` to storage, and `UpdateTask` — the foundational
+subtask plumbing.
+
+## 6. Dependency waiting and planning preamble (f527972)
+
+The executor gained dependency waiting: tasks with `depends_on` now block in a
+polling loop until all dependencies reach `COMPLETED`. Any dependency entering a
+terminal failure state (`FAILED`, `TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`)
+immediately fails the waiting task.
+
+The planning preamble was also introduced here — a system prompt prefix injected
+into every task's instructions that explains to the agent how to write question
+files, how to break tasks into subtasks via the `claudomator` CLI, and how to
+commit all changes in git sandboxes.
+
+## 7. Elaborate, logs-stream, templates, subtask-list endpoints (74cc740)
+
+The API gained several new endpoints:
+- `POST /api/elaborate` — calls Claude to expand a brief task description into
+  structured YAML.
+- `GET /api/executions/{id}/stream` — live-streams the execution log.
+- `GET /api/templates` / `POST /api/templates` — task template CRUD (later removed).
+- `GET /api/tasks/{id}/subtasks` — lists subtasks for a parent task.
+
+## 8. Web UI: tabs, new task modal, templates panel (e8d1b80)
+
+The web UI got a tabbed layout (Running / Done / Templates), a modal for creating
+new tasks with AI-drafted instructions, and a templates panel. This was the first
+version of the UI that matched the current design.
+
+## 9. READY state for human-in-the-loop review (6511d6e)
+
+A critical design point: when a top-level task's runner exits successfully, the
+task does not immediately go to `COMPLETED`. Instead it transitions to `READY`,
+meaning it paused for the operator to review the agent's output and explicitly
+accept or reject it. `READY → COMPLETED` requires `POST /api/tasks/{id}/accept`.
+`READY → PENDING` (for re-running) requires `POST /api/tasks/{id}/reject`.
+
+This is specific to top-level tasks. Subtasks (`parent_task_id != ""`) bypass READY
+and go directly to `COMPLETED` — only the root task requires human sign-off.
+
+## 10. Fix working_dir failures, hardcoded /root removed (3962597)
+
+Early deployments hardcoded `/root` as the base path for `working_dir`. This was
+removed. `working_dir` is now validated to exist before the subprocess starts.
+
+## 11. Scripts, debug-execution, deploy (2bbae74, f7c6de4)
+
+Added the `scripts/` directory with `debug-execution` (inspects a specific
+execution's logs) and `deploy` (builds and deploys the binary to the production
+server). Added a CLI `start` command and the `version` package.
+
+## 12. Rescue from recovery branch — question/answer, rate limiting, start-next-task (cf83444)
+
+A batch of features rescued from a detached-work branch:
+- **Question/answer flow (`BLOCKED` state)**: agents can write a `question.json`
+  file before exiting. The pool detects this and transitions the task to `BLOCKED`,
+  storing the question for the user. `POST /api/tasks/{id}/answer` resumes the
+  Claude session with the user's answer injected as the next message.
+- **Rate limiting**: the pool tracks which agents are rate-limited and when.
+  `isRateLimitError` and `isQuotaExhausted` distinguish transient throttles from
+  5-hour quota exhaustion. The per-agent `rateLimited` map stores the deadline.
+- **Start-next-task script**: a shell script that picks the highest-priority pending
+  task and starts it.
+
+## 13. Accept/Reject for READY tasks, Start Next button in UI (9e790e3)
+
+The web UI gained explicit Accept/Reject buttons for tasks in the `READY` state
+and a "Start Next" button in the header that triggers the `start-next-task` script.
+
+## 14. Stream-level failure detection when claude exits 0 (4c0ee5c)
+
+Claude can exit 0 even when the task actually failed — for example when the
+permission mode denies a tool_use and Claude exits politely. `parseStream` was
+updated to detect `is_error: true` in the result message and
+`tool_result.is_error: true` with permission-denial text, returning an error in
+both cases so the task goes to `FAILED` rather than silently succeeding.
+
+## 15. Persist log paths at CreateExecution time (f8b5f25)
+
+Previously, `StdoutPath`, `StderrPath`, and `ArtifactDir` were only written to the
+execution record at `UpdateExecution` time (after the subprocess finished). This
+prevented live log tailing. Introduced the `LogPather` interface: runners that
+implement `ExecLogDir(execID)` allow the pool to pre-populate paths before calling
+`CreateExecution`, making them available for streaming before the process ends.
+
+## 16. bypassPermissions as executor default (a33211d)
+
+`permission_mode` defaults to `bypassPermissions` when not set in the task YAML.
+This was a deliberate trade-off: unattended automation needs to proceed without
+tool-use confirmation prompts. Operators can override per-task via `permission_mode`.
+
+## 17. Cancel endpoint and pool cancel mechanism (3672981)
+
+`POST /api/tasks/{id}/cancel` was implemented. The pool maintains a `cancels` map
+from taskID to context cancel functions. Cancellation sends a SIGKILL to the
+entire process group (via `syscall.Kill(-pgid, SIGKILL)`) to reap MCP servers and
+bash children that the claude subprocess spawned.
+
+## 18. BLOCKED state, session resume, fix: persist session_id (7466b17, 40d9ace)
+
+The full BLOCKED cycle was wired end-to-end:
+1. Agent writes `question.json` to `$CLAUDOMATOR_QUESTION_FILE` and exits.
+2. Runner detects the file and returns `*BlockedError`.
+3. Pool transitions task to `BLOCKED` and stores the question JSON.
+4. User answers via `POST /api/tasks/{id}/answer`.
+5. Pool calls `SubmitResume` with a new `Execution` carrying `ResumeSessionID`
+   and `ResumeAnswer`.
+6. Runner invokes `claude --resume <session-id> -p <answer>`.
+
+A bug was found and fixed: `session_id` was not persisted in `UpdateExecution`,
+causing the BLOCKED → answer → resume cycle to fail because `GetLatestExecution`
+returned no session ID.
+
+## 19. Context.Background for resume execution; CANCELLED→QUEUED restart (7d4890c)
+
+Resume executions now use `context.Background()` instead of inheriting a potentially
+stale context. `CANCELLED → QUEUED` was added as a valid transition so cancelled
+tasks can be manually restarted.
+
+## 20. git sandbox execution, project_dir rename (1f36e23)
+
+The `working_dir` field was renamed to `project_dir` across all layers (task YAML,
+storage, API, UI). When `project_dir` is set, the runner no longer executes
+directly in that directory. Instead it:
+
+1. Detects whether `project_dir` is a git repo (initialising one if not).
+2. Clones the repo into `/tmp/claudomator-sandbox-*` (using `--no-hardlinks`
+   to avoid permission issues with mixed-owner `.git/objects`).
+3. Runs the agent in the sandbox clone.
+4. After the agent exits, verifies no uncommitted changes remain and pushes
+   new commits to the canonical bare repo.
+5. Removes the sandbox.
+
+On BLOCKED, the sandbox is preserved so the agent can resume where it left off
+in the same working tree.
+
+Concurrent push conflicts (two sandboxes pushing at the same time) are handled
+by a fetch-rebase-retry sequence.
+
+## 21. Storage: enforce valid state transitions in UpdateTaskState (8777bf2)
+
+`storage.DB.UpdateTaskState` now calls `task.ValidTransition` before writing. If
+the transition is not allowed by the state machine, the function returns an error
+and no write occurs. This is the enforcement point for the state machine invariants.
+
+## 22. Executor internal dispatch queue; remove at-capacity rejection (2cf6d97)
+
+The previous pool rejected `Submit` when all slots were taken. This was replaced
+with an internal `workCh` channel and a `dispatch` goroutine: tasks submitted
+while the pool is at capacity are buffered in the channel and picked up as soon
+as a slot opens. `Submit` now only returns an error if the channel itself is full
+(which requires an enormous backlog).
+
+## 23. API hardening — WebSocket auth, per-IP rate limiter, script registry (363fc9e, 417034b, 181a376)
+
+Several API reliability improvements:
+- WebSocket connections now require an API token (if `SetAPIToken` was called) and
+  are capped at a configurable maximum number of clients. A ping/pong keepalive
+  prevents stale connections from accumulating.
+- A per-IP rate limiter was added to the `/api/elaborate` and `/api/validate`
+  endpoints to prevent abuse.
+- The scripts endpoints were collapsed into a generic `ScriptRegistry`: instead of
+  individual handlers per script, a single handler dispatches to registered scripts
+  by name.
+
+## 24. API: extend executions and log streaming endpoints (7914153)
+
+`GET /api/executions` gained filtering and sorting. `GET /api/executions/{id}/logs`
+was added for fetching completed log files. Live streaming via SSE and the log
+tail endpoint were polished.
+
+## 25. CLI: newLogger, shared HTTP client, report command (1ce83b6)
+
+CLI utilities consolidated: a shared logger constructor (`newLogger`), a shared
+HTTP client, a default server URL (`http://localhost:8484`). Added the `report`
+CLI subcommand for fetching execution summaries from the server.
+
+## 26. Generic agent architecture — transition from Claude-only (306482d to f2d6822)
+
+This was a major refactor over several commits:
+1. `ClaudeConfig` was renamed to `AgentConfig` with a new `Type` field (`"claude"`,
+   `"gemini"`, etc.).
+2. `Pool` was changed from holding a single `ClaudeRunner` to holding a
+   `map[string]Runner` — one runner per agent type.
+3. `GeminiRunner` was implemented, mirroring `ClaudeRunner` but invoking the
+   `gemini` CLI.
+4. The storage layer, API handlers, elaborate/validate endpoints, and all tests
+   were updated to use `AgentConfig`.
+5. The web UI was updated to expose agent type selection.
+
+## 27. Gemini-based task classification and explicit load balancing (406247b)
+
+`Classifier` and `pickAgent` were introduced to automate agent and model selection:
+
+- **`pickAgent(SystemStatus)`** — explicit load balancing: picks the available
+  (non-rate-limited) agent with the fewest active tasks. Falls back to fewest-active
+  if all agents are rate-limited.
+- **`Classifier`** — calls the Gemini CLI with a meta-prompt asking it to pick
+  the best model for the task. This is intentionally model-picks-model: use a fast,
+  cheap classifier to avoid wasting expensive tokens.
+
+After this commit the flow is: `execute()` → pick agent → call classifier → set
+`t.Agent.Type` and `t.Agent.Model` → dispatch to runner.
+
+## 28. ADR-003: Security Model (93a4c85)
+
+The security model was documented formally: no auth, permissive CORS, `bypassPermissions`
+as default, and the known risk inventory (see `docs/adr/003-security-model.md`).
+
+## 29. Various web UI improvements (91fd904, 7b53b9e, 560f42b, cdfdc30)
+
+Running tasks became the default view. A "Running view" showing currently running
+tasks alongside the 24h execution history was added. Agent type and model were
+surfaced on running task cards. The Done/Interrupted tabs were filtered to 24h.
+
+## 30. Quota exhaustion detection from stream (076c0fa)
+
+Previously, quota exhaustion (the 5-hour usage limit) was treated identically to
+generic failures. `isQuotaExhausted` was introduced to distinguish it: quota
+exhaustion maps to `BUDGET_EXCEEDED` and sets a 5-hour rate-limit deadline on the
+agent, rather than failing the task with a generic error.
+
+## 31. Sandbox fixes — push via bare repo, fetch/rebase (cfbcc7b, f135ab8, 07061ac)
+
+The sandbox teardown strategy was revised: instead of pushing directly into the
+working copy (which fails for non-bare repos), the sandbox pushes to a bare repo
+(`remote "local"` or `remote "origin"`) and the working copy is pulled separately
+by the developer. This avoids permission errors from mixed-owner `.git/objects`.
+The `--no-hardlinks` clone flag was added to prevent object sharing.
+
+## 32. BLOCKED→READY for parent tasks with subtasks (441ed9e, c8e3b46)
+
+When a top-level task exits the runner successfully but has subtasks, it transitions
+to `BLOCKED` (waiting for subtasks to finish) rather than `READY`. A new
+`maybeUnblockParent` function is called every time a subtask completes: if all
+siblings are `COMPLETED`, the parent transitions `BLOCKED → READY` and is
+presented for operator review.
+
+## 33. Stale RUNNING task recovery on server startup (9159572)
+
+`Pool.RecoverStaleRunning()` was added and called from `cli.serve`. It queries for
+tasks still in `RUNNING` state (left over from a previous server crash) and marks
+them `FAILED`, closing their open execution records. This prevents stuck tasks
+after server restarts.
+
+## 34. API: configurable mockRunner, async error-path tests (b33566b)
+
+The `api` test suite was hardened with a configurable `mockRunner` that can be
+injected into the test server. Async error paths (runner returns an error, DB
+update fails mid-execution) were now exercised in tests.
+
+## 35. Storage: missing indexes, ListRecentExecutions tests, DeleteTask atomicity (8b6c97e, 3610409)
+
+Several storage correctness fixes:
+- `idx_tasks_state`, `idx_tasks_parent_task_id`, `idx_executions_status`,
+  `idx_executions_task_id`, and `idx_executions_start_time` indexes were added.
+- `ListRecentExecutions` had an off-by-one that caused it to miss recent executions;
+  tests were added to catch this.
+- `DeleteTask` was made atomic using a recursive CTE to delete the task and all
+  its subtasks in a single transaction.
+
+## 36. API: validate ?state= param, standardize operation response shapes (933af81)
+
+`GET /api/tasks?state=XYZ` now validates the state value. All mutating operation
+responses (`/run`, `/cancel`, `/accept`, `/reject`, `/answer`) were standardised
+to return `{"status": "ok"}` on success.
+
+## 37. Re-classify on manual restart; handleRunResult extraction (0676f0f, 7d6943c)
+
+Tasks that are manually restarted (from `FAILED`, `CANCELLED`, etc.) now go through
+classification again so they pick up the latest agent/model selection logic. The
+post-run error classification block was extracted into `handleRunResult` — a shared
+helper called by both `execute` and `executeResume` — eliminating 60+ lines of
+duplication.
+
+## 38. Legacy Claude field removed (b4371d0, a782bbf)
+
+The last remnants of the original `ClaudeConfig` type and backward-compat `working_dir`
+shim were removed. The schema is now fully generic.
+
+## 39. Kill-goroutine safety documentation, goroutine-leak test (3b4c50e)
+
+A documented invariant was added to the `execOnce` goroutine that kills the
+subprocess process group: it cannot block indefinitely. Tests were added to verify
+no goroutine leak occurs when a task is cancelled.
+
+## 40. Rate-limit avoidance in classifier; model list updates (8ec366d, fc1459b)
+
+The classifier now skips calling itself if the selected agent is rate-limited,
+avoiding a redundant Gemini API call when the rate-limited agent is already known.
+The model list was updated to Claude 4.x (`claude-sonnet-4-6`, `claude-opus-4-6`,
+`claude-haiku-4-5-20251001`) and current Gemini models (`gemini-2.5-flash-lite`,
+`gemini-2.5-flash`, `gemini-2.5-pro`).
+
+## 41. Map leak fixes — activePerAgent and rateLimited (7c7dd2b)
+
+Two map leak bugs were fixed in the pool:
+- `activePerAgent[agentType]` was decremented but never deleted when the count hit
+  zero, so inactive agents accumulated as dead entries.
+- Expired `rateLimited[agentType]` entries were not deleted, so the map grew
+  unboundedly over long runs.
+
+## 42. Sandbox teardown: remove working-copy pull, retry push on concurrent rejection (5c85624)
+
+The sandbox teardown removed the `git pull` into the working copy (which was failing
+due to mixed-owner object dirs). The retry-push-on-rejection path was tightened to
+detect `"fetch first"` and `"non-fast-forward"` as the rejection signals.
+
+## 43. Explicit load balancing separated from classification (e033504)
+
+Previously the `Classifier` both picked the agent and selected the model. This was
+split: `pickAgent` is deterministic code that picks the agent from the registered
+runners using the load-balancing algorithm. The `Classifier` only picks the model
+for the already-selected agent. This makes load balancing reliable and fast even
+when the Gemini classifier is unavailable.
+
+## 44. Session ID fix on second block-and-resume cycle (65c7638)
+
+A bug was found where the second BLOCKED→answer→resume cycle passed the wrong
+`--resume` session ID to Claude. The fix ensures that resume executions propagate
+the original session ID rather than the new execution's UUID.
+
+## 45. validTransitions promoted to package-level var (3226af3)
+
+`validTransitions` was promoted to a package-level variable in `internal/task/task.go`
+for clarity and potential reuse outside the package. ADR-002 was updated to reflect
+the current state machine including the `BLOCKED→READY` transition for parent tasks.
+
+---
+
+## Feature Summary (current state)
+
+| Feature | Status |
+|---|---|
+| Task YAML parsing, batch files | Done |
+| SQLite persistence | Done |
+| REST API (CRUD + lifecycle) | Done |
+| WebSocket real-time events | Done |
+| Claude subprocess execution | Done |
+| Gemini subprocess execution | Done |
+| Explicit load balancing (pickAgent) | Done |
+| Gemini-based model classification | Done |
+| BLOCKED / question-answer / resume | Done |
+| git sandbox isolation | Done |
+| Subtask creation and unblocking | Done |
+| READY state / human accept-reject | Done |
+| Rate-limit and quota tracking | Done |
+| Stale RUNNING recovery on startup | Done |
+| Per-IP rate limiter on elaborate | Done |
+| Web UI (PWA) | Done |
+| Push notifications (PWA) | Planned |
diff --git a/docs/adr/002-task-state-machine.md b/docs/adr/002-task-state-machine.md
index 1d41619..310c337 100644
--- a/docs/adr/002-task-state-machine.md
+++ b/docs/adr/002-task-state-machine.md
@@ -23,7 +23,7 @@ execution (subprocess), user interaction (review, Q&A), retries, and cancellatio
 | `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` (terminal) |
 | `BLOCKED` | Agent paused and wrote a question file; awaiting user answer |
 
-Terminal states with no outgoing transitions: `COMPLETED`, `CANCELLED`, `BUDGET_EXCEEDED`.
+True terminal state (no outgoing transitions): `COMPLETED`. All other non-success states (`CANCELLED`, `FAILED`, `TIMED_OUT`, `BUDGET_EXCEEDED`) may transition back to `QUEUED` to restart or retry.
 
 ## State Transition Diagram
 
@@ -77,6 +77,8 @@ Terminal states with no outgoing transitions: `COMPLETED`, `CANCELLED`, `BUDGET_
 | `READY` | `PENDING` | `POST /api/tasks/{id}/reject` (with optional comment) |
 | `FAILED` | `QUEUED` | Retry (manual re-run via `POST /api/tasks/{id}/run`) |
 | `TIMED_OUT` | `QUEUED` | `POST /api/tasks/{id}/resume` (resumes with session ID) |
+| `CANCELLED` | `QUEUED` | Restart (manual re-run via `POST /api/tasks/{id}/run`) |
+| `BUDGET_EXCEEDED` | `QUEUED` | Retry (manual re-run via `POST /api/tasks/{id}/run`) |
 | `BLOCKED` | `QUEUED` | `POST /api/tasks/{id}/answer` (resumes with user answer) |
 | `BLOCKED` | `READY` | All subtasks reached `COMPLETED` (parent task unblocked by subtask completion watcher) |
 
diff --git a/docs/adr/004-multi-agent-routing-and-classification.md b/docs/adr/004-multi-agent-routing-and-classification.md
new file mode 100644
index 0000000..7afb10d
--- /dev/null
+++ b/docs/adr/004-multi-agent-routing-and-classification.md
@@ -0,0 +1,107 @@
+# ADR-004: Multi-Agent Routing and Gemini-Based Classification
+
+## Status
+Accepted
+
+## Context
+
+Claudomator started as a Claude-only system. As Gemini became a viable coding
+agent, the architecture needed to support multiple agent backends without requiring
+operators to manually select an agent or model for each task.
+
+Two distinct problems needed solving:
+
+1. **Which agent should run this task?** — Claude and Gemini have different API
+   quotas and rate limits. When Claude is rate-limited, tasks should flow to
+   Gemini automatically.
+2. **Which model tier should the agent use?** — Both agents offer a spectrum from
+   fast/cheap to slow/powerful models. Using the wrong tier wastes money or
+   produces inferior results.
+
+## Decision
+
+The two problems are solved independently:
+
+### Agent selection: explicit load balancing in code (`pickAgent`)
+
+`pickAgent(SystemStatus)` selects the agent with the fewest active tasks,
+preferring non-rate-limited agents. The algorithm is:
+
+1. First pass: consider only non-rate-limited agents; pick the one with the
+   fewest active tasks (alphabetical tie-break for determinism).
+2. Fallback: if all agents are rate-limited, pick the least-active regardless
+   of rate-limit status.
+
+This is deterministic code, not an AI call. It runs in-process with no I/O and
+cannot fail in ways that would block task execution.
+
+### Model selection: Gemini-based classifier (`Classifier`)
+
+Once the agent is selected, `Classifier.Classify` invokes the Gemini CLI with
+`gemini-2.5-flash-lite` to select the best model tier for the task. The classifier
+receives the task name, instructions, and the required agent type, and returns
+a `Classification` with `agent_type`, `model`, and `reason`.
+
+The classifier uses a cheap, fast model for classification to minimise the cost
+overhead. The response is parsed from JSON, with fallback handling for markdown
+code blocks and credential noise in the output.
+
+### Separation of concerns
+
+These two decisions were initially merged (the classifier picked both agent and
+model). They were separated in commit `e033504` because:
+
+- Load balancing must be reliable even when the Gemini API is unavailable.
+- Classifier failures are non-fatal: if classification fails, the pool logs the
+  error and proceeds with the agent's default model.
+
+### Re-classification on manual restart
+
+When an operator manually restarts a task from a non-`QUEUED` state (e.g. `FAILED`,
+`CANCELLED`), the task goes through `execute()` again and is re-classified. This
+ensures restarts pick up any changes to agent availability or rate-limit status.
+
+## Rationale
+
+- **AI-picks-model**: the model selection decision is genuinely complex and
+  subjective. Using an AI classifier avoids hardcoding heuristics that would need
+  constant tuning.
+- **Code-picks-agent**: load balancing is a scheduling problem with measurable
+  inputs (active task counts, rate-limit deadlines). Delegating this to an AI
+  would introduce unnecessary non-determinism and latency.
+- **Gemini for classification**: using Gemini to classify Claude tasks (and vice
+  versa) prevents circular dependencies. Using the cheapest available Gemini model
+  keeps classification cost negligible.
+
+## Alternatives Considered
+
+- **Operator always picks agent and model**: too much manual overhead. Operators
+  should be able to submit tasks without knowing which agent is currently
+  rate-limited.
+- **Single classifier picks both agent and model**: rejected after operational
+  experience showed that load balancing needs to work even when the Gemini API
+  is unavailable or returning errors.
+- **Round-robin agent selection**: simpler but does not account for rate limits
+  or imbalanced task durations.
+
+## Consequences
+
+- Agent selection is deterministic and testable without mocking AI APIs.
+- Classification failures are logged but non-fatal; the task runs with the
+  agent's default model.
+- The classifier adds ~1–2 seconds of latency to task start (one Gemini API call).
+- Tasks with `agent.type` pre-set in YAML still go through load balancing;
+  `pickAgent` may override the requested type if the requested type is not a
+  registered runner. This is by design: the operator's type hint is overridden
+  by the load balancer to ensure tasks are always routable.
+
+## Relevant Code Locations
+
+| Concern | File |
+|---|---|
+| `pickAgent` | `internal/executor/executor.go` |
+| `Classifier` | `internal/executor/classifier.go` |
+| Load balancing in `execute()` | `internal/executor/executor.go` |
+| Re-classification gate | `internal/api/server.go` (handleRunTask) |
+| `pickAgent` tests | `internal/executor/executor_test.go` |
+| `Classifier` mock test | `internal/executor/classifier_test.go` |
diff --git a/docs/adr/005-sandbox-execution-model.md b/docs/adr/005-sandbox-execution-model.md
new file mode 100644
index 0000000..b374561
--- /dev/null
+++ b/docs/adr/005-sandbox-execution-model.md
@@ -0,0 +1,130 @@
+# ADR-005: Git Sandbox Execution Model
+
+## Status
+Accepted
+
+## Context
+
+Tasks that modify source code need a safe execution environment. Running agents
+directly in the canonical working copy creates several problems:
+
+1. **Concurrent corruption**: multiple agents running in the same directory stomp
+   on each other's changes.
+2. **Partial work leaks**: if a task is cancelled mid-run, half-written files
+   remain in the working tree, blocking other work.
+3. **No rollback**: a failed agent may leave the codebase in a broken state with
+   no clean way to recover without manual `git reset`.
+4. **Audit trail**: changes made by an agent should be visible as discrete,
+   attributable git commits — not as an anonymous blob of working-tree noise.
+
+## Decision
+
+When a task has `agent.project_dir` set, `ClaudeRunner.Run` executes the agent
+in an isolated git clone (a "sandbox") rather than in the project directory
+directly.
+
+### Sandbox lifecycle
+
+```
+project_dir (canonical working copy)
+    |
+    | git clone --no-hardlinks <clone-source> /tmp/claudomator-sandbox-*
+    |
+    v
+sandbox (temp clone)
+    |
+    | agent runs here; commits its changes
+    |
+    | git push (to bare repo "local" or "origin")
+    |
+teardown: verify no uncommitted changes, remove sandbox dir
+```
+
+### Clone source
+
+`sandboxCloneSource` prefers a remote named `"local"` (a local bare repo).
+If not present it falls back to the `"origin"` remote. Using a bare repo
+accepts pushes cleanly; pushing to a non-bare working copy fails when the
+receiving branch is checked out.
+
+### Uncommitted-change enforcement
+
+Before teardown, the runner runs `git status --porcelain` in the sandbox.
+If any uncommitted changes are detected, the task is failed with an error
+message listing the files. The sandbox is **preserved** so the operator
+can inspect or recover the partial work. The error message includes the
+sandbox path.
+
+### Concurrent push conflicts
+
+If two sandboxes try to push at the same time, the second push is rejected
+(`"fetch first"` or `"non-fast-forward"` in the error output). The runner
+detects these signals and performs a fetch → rebase → retry sequence, up to
+a fixed retry limit, before giving up.
+
+### BLOCKED state and sandbox preservation
+
+When an agent exits with a `question.json` file (entering the `BLOCKED`
+state), the sandbox is **not** torn down. The preserved sandbox allows the
+resumed execution to pick up the same working tree state, including any
+in-progress file changes made before the agent asked its question.
+
+Resume executions (`SubmitResume`) skip sandbox setup entirely and run
+directly in `project_dir`, passing `--resume <session-id>` to the agent
+so Claude can continue its previous conversation.
+
+### Session ID propagation on resume
+
+A subtle bug was found and fixed: when a resumed execution is itself blocked
+again (a second BLOCKED→answer→resume cycle), the new execution record must
+carry the **original** `ResumeSessionID`, not the new execution's own UUID.
+If the wrong ID is used, `claude --resume` fails with "No conversation found".
+The fix is in `ClaudeRunner.Run`: if `e.ResumeSessionID != ""`, use it as
+`e.SessionID` rather than `e.ID`.
+
+## Rationale
+
+- **`--no-hardlinks`**: git defaults to hardlinking objects between clone and
+  source when both are on the same filesystem. This causes permission errors
+  when the source is owned by a different user (e.g. `www-data` vs. `root`).
+  The flag forces a full copy.
+- **Bare repo for push target**: non-bare repos reject pushes to checked-out
+  branches. A bare repo (`git init --bare`) accepts all pushes safely.
+- **Preserve sandbox on failure**: partial agent work may be valuable for
+  debugging or resumption. Destroying it immediately on failure was considered
+  and rejected.
+- **Agent must commit**: requiring the agent to commit all changes before
+  exiting ensures git history is always the source of truth. The enforcement
+  check (uncommitted files → FAILED) makes this invariant observable.
+
+## Alternatives Considered
+
+- **Run directly in working copy**: rejected because of concurrent corruption
+  and partial-work leakage.
+- **Copy files instead of git clone**: rejected because the agent needs a
+  working git history (for `git log`, `git blame`, and to push commits back).
+- **Docker/container isolation**: considered for stronger isolation but
+  rejected due to operational complexity, dependency on container runtime,
+  and inability to use the host's claude/gemini credentials.
+
+## Consequences
+
+- Tasks without `project_dir` are unaffected; they run in whatever working
+  directory the server process inherited.
+- If a sandbox's push repeatedly fails (e.g. due to a bare repo that is
+  itself broken), the task is failed with the sandbox preserved.
+- If `/tmp` runs out of space (many large sandboxes), tasks will fail at
+  clone time. This is a known operational risk with no current mitigation.
+- The `project_dir` field in task YAML must point to a git repository with
+  a configured `"local"` or `"origin"` remote that accepts pushes.
+
+## Relevant Code Locations
+
+| Concern | File |
+|---|---|
+| Sandbox setup/teardown | `internal/executor/claude.go` |
+| `setupSandbox`, `teardownSandbox` | `internal/executor/claude.go` |
+| `sandboxCloneSource` | `internal/executor/claude.go` |
+| Resume skips sandbox | `internal/executor/claude.go` (Run) |
+| Session ID propagation fix | `internal/executor/claude.go` (Run) |
+| Sandbox tests | `internal/executor/claude_test.go` |
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..27c7601
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,392 @@
+# Claudomator — System Architecture
+
+## 1. System Purpose and Design Goals
+
+Claudomator is a local developer tool that captures tasks, dispatches them to AI agents (Claude,
+Gemini), and reports results. Its primary use case is unattended, automated execution of agent
+tasks with real-time status streaming to a mobile Progressive Web App.
+
+**Design goals:**
+
+- **Single binary, zero runtime deps** — Go for the backend; CGo only for SQLite.
+- **Bounded concurrency** — a pool of goroutines prevents unbounded subprocess spawning.
+- **Durability** — all task state survives server restarts (SQLite + WAL mode).
+- **Real-time feedback** — WebSocket pushes task completion events to connected clients.
+- **Multi-agent routing** — deterministic load balancing across Claude and Gemini; AI-driven
+  model-tier selection via a cheap Gemini classifier.
+- **Git-isolated execution** — tasks that modify source code run in a temporary git clone
+  (sandbox) to prevent concurrent corruption and partial-work leakage.
+- **Review gate** — top-level tasks wait in `READY` state for operator accept/reject before
+  reaching `COMPLETED`.
+
+See [ADR-001](adr/001-language-and-architecture.md) for the language and architecture rationale.
+
+---
+
+## 2. High-Level Architecture
+
+```mermaid
+flowchart TD
+    CLI["CLI\ncmd/claudomator\n(cobra)"]
+    API["HTTP API\ninternal/api\nREST + WebSocket"]
+    Pool["Executor Pool\ninternal/executor.Pool\n(bounded goroutines)"]
+    Classifier["Gemini Classifier\ninternal/executor.Classifier"]
+    ClaudeRunner["ClaudeRunner\ninternal/executor.ClaudeRunner"]
+    GeminiRunner["GeminiRunner\ninternal/executor.GeminiRunner"]
+    Sandbox["Git Sandbox\n/tmp/claudomator-sandbox-*"]
+    Subprocess["AI subprocess\nclaude -p / gemini"]
+    SQLite["SQLite\ntasks.db"]
+    LogFiles["Log Files\n~/.claudomator/executions/"]
+    Hub["WebSocket Hub\ninternal/api.Hub"]
+    Clients["Clients\nbrowser / mobile PWA"]
+    Notifier["Notifier\ninternal/notify"]
+
+    CLI -->|run / serve| API
+    CLI -->|direct YAML run| Pool
+    API -->|POST /api/tasks/run| Pool
+    Pool -->|pickAgent + Classify| Classifier
+    Pool -->|Submit / SubmitResume| ClaudeRunner
+    Pool -->|Submit / SubmitResume| GeminiRunner
+    ClaudeRunner -->|git clone| Sandbox
+    Sandbox -->|claude -p| Subprocess
+    GeminiRunner -->|gemini| Subprocess
+    Subprocess -->|stream-json stdout| LogFiles
+    Pool -->|UpdateTaskState| SQLite
+    Pool -->|resultCh| API
+    API -->|Broadcast| Hub
+    Hub -->|fan-out| Clients
+    API -->|Notify| Notifier
+```
+
+---
+
+## 3. Component Table
+
+| Package | Role | Key Exported Types |
+|---|---|---|
+| `internal/task` | `Task` struct, YAML parsing, state machine, validation | `Task`, `AgentConfig`, `RetryConfig`, `State`, `Priority`, `ValidTransition` |
+| `internal/executor` | Bounded goroutine pool; subprocess manager; multi-agent routing; classification | `Pool`, `ClaudeRunner`, `GeminiRunner`, `Classifier`, `Runner`, `Result`, `BlockedError`, `QuestionRegistry` |
+| `internal/storage` | SQLite wrapper; auto-migrating schema; all task and execution CRUD | `DB`, `Execution`, `TaskFilter` |
+| `internal/api` | HTTP server (REST + WebSocket); result forwarding; elaboration; log streaming | `Server`, `Hub` |
+| `internal/reporter` | Formats and emits execution results (text, HTML) | `Reporter`, `TextReporter`, `HTMLReporter` |
+| `internal/config` | TOML config loading; data-directory layout | `Config` |
+| `internal/cli` | Cobra CLI commands (`run`, `serve`, `list`, `status`, `init`) | `RootCmd` |
+| `internal/notify` | Webhook notifier for task completion events | `Notifier`, `WebhookNotifier`, `Event` |
+| `web` | Embedded PWA static files (served by `internal/api`) | `Files` (embed.FS) |
+| `version` | Build-time version string | `Version` |
+
+---
+
+## 4. Package Dependency Graph
+
+```mermaid
+graph LR
+    cli["internal/cli"]
+    api["internal/api"]
+    executor["internal/executor"]
+    storage["internal/storage"]
+    task["internal/task"]
+    config["internal/config"]
+    reporter["internal/reporter"]
+    notify["internal/notify"]
+    web["web"]
+    version["version"]
+
+    cli --> api
+    cli --> executor
+    cli --> storage
+    cli --> config
+    cli --> version
+    api --> executor
+    api --> storage
+    api --> task
+    api --> notify
+    api --> web
+    executor --> storage
+    executor --> task
+    reporter --> task
+    reporter --> storage
+    storage --> task
+```
+
+---
+
+## 5. Task Execution Pipeline
+
+The following numbered steps trace a task from API submission to final state, with file and line
+references to the key logic in each step.
+
+1. **Task creation** — `POST /api/tasks` calls `task.Validate` and `storage.DB.CreateTask`.
+   Task is written to SQLite in `PENDING` state.
+   (`internal/api/server.go:349`, `internal/task/parse.go`, `internal/storage/db.go`)
+
+2. **Run request** — `POST /api/tasks/{id}/run` calls `storage.DB.ResetTaskForRetry` (validates
+   the `PENDING → QUEUED` transition) then `executor.Pool.Submit`.
+   (`internal/api/server.go:460`, `internal/executor/executor.go:125`)
+
+3. **Pool dispatch** — The `dispatch` goroutine reads from `workCh`, waits for a free slot
+   (blocks on `doneCh` if at capacity), then spawns `go execute(ctx, task)`.
+   (`internal/executor/executor.go:102`)
+
+4. **Agent selection** — `pickAgent(SystemStatus)` selects the available agent with the fewest
+   active tasks (deterministic, no I/O). If the pool has a `Classifier`, it invokes
+   `Classifier.Classify` (one Gemini API call) to select the model tier; failures are non-fatal.
+   (`internal/executor/executor.go:349`, `internal/executor/executor.go:396`)
+
+5. **Dependency wait** — If `t.DependsOn` is non-empty, `waitForDependencies` polls SQLite
+   every 5 s until all dependencies reach `COMPLETED` (or a terminal failure state).
+   (`internal/executor/executor.go:642`)
+
+6. **Execution record created** — A new `storage.Execution` row is inserted with `RUNNING`
+   status. Log paths (`stdout.log`, `stderr.log`) are pre-populated via `LogPather` so they are
+   immediately available for tailing.
+   (`internal/executor/executor.go:483`)
+
+7. **Subprocess launch** — `ClaudeRunner.Run` (or `GeminiRunner.Run`) builds the CLI argument
+   list and calls `exec.CommandContext`. If `project_dir` is set and this is not a resume
+   execution, `setupSandbox` clones the project to `/tmp/claudomator-sandbox-*` first.
+   (`internal/executor/claude.go:63`, `internal/executor/claude.go:setupSandbox`)
+
+8. **Output streaming** — stdout is written to `<LogDir>/<execID>/stdout.log`; the runner
+   concurrently parses the `stream-json` lines for cost and session ID.
+   (`internal/executor/claude.go`)
+
+9. **Execution outcome → state** — After the subprocess exits, `handleRunResult` maps the error
+   type to a final task state and calls `storage.DB.UpdateTaskState`.
+   (`internal/executor/executor.go:256`)
+
+   | Outcome | Final state |
+   |---|---|
+   | `runner.Run` → `nil`, top-level, no subtasks | `READY` |
+   | `runner.Run` → `nil`, top-level, has subtasks | `BLOCKED` |
+   | `runner.Run` → `nil`, subtask | `COMPLETED` |
+   | `runner.Run` → `*BlockedError` (question file) | `BLOCKED` |
+   | `ctx.Err() == DeadlineExceeded` | `TIMED_OUT` |
+   | `ctx.Err() == Canceled` | `CANCELLED` |
+   | quota exhausted | `BUDGET_EXCEEDED` |
+   | any other error | `FAILED` |
+
+10. **Result broadcast** — The pool emits a `*Result` to `resultCh`. `Server.forwardResults`
+    reads it, marshals a `task_completed` JSON event, and calls `hub.Broadcast`.
+    (`internal/api/server.go:123`, `internal/api/server.go:129`)
+
+11. **Sandbox teardown** — If a sandbox was used and no uncommitted changes remain,
+    `teardownSandbox` removes the temp directory. If uncommitted changes are detected, the task
+    fails and the sandbox is preserved for inspection.
+    (`internal/executor/claude.go:teardownSandbox`)
+
+12. **Review gate** — Operator calls `POST /api/tasks/{id}/accept` (`READY → COMPLETED`) or
+    `POST /api/tasks/{id}/reject` (`READY → PENDING`).
+    (`internal/api/server.go:487`, `internal/api/server.go:507`)
+
+---
+
+## 6. Task State Machine
+
+```mermaid
+stateDiagram-v2
+    [*] --> PENDING : task created
+
+    PENDING --> QUEUED : POST /run
+    PENDING --> CANCELLED : POST /cancel
+
+    QUEUED --> RUNNING : pool goroutine starts
+    QUEUED --> CANCELLED : POST /cancel
+
+    RUNNING --> READY : exit 0, top-level, no subtasks
+    RUNNING --> BLOCKED : exit 0, top-level, has subtasks
+    RUNNING --> BLOCKED : question.json written
+    RUNNING --> COMPLETED : exit 0, subtask
+    RUNNING --> FAILED : exit non-zero / stream error
+    RUNNING --> TIMED_OUT : context deadline exceeded
+    RUNNING --> CANCELLED : context cancelled
+    RUNNING --> BUDGET_EXCEEDED : quota exhausted
+
+    READY --> COMPLETED : POST /accept
+    READY --> PENDING : POST /reject
+
+    BLOCKED --> QUEUED : POST /answer
+    BLOCKED --> READY : all subtasks COMPLETED
+
+    FAILED --> QUEUED : POST /run (retry)
+    TIMED_OUT --> QUEUED : POST /resume
+    CANCELLED --> QUEUED : POST /run (restart)
+    BUDGET_EXCEEDED --> QUEUED : POST /run (retry)
+
+    COMPLETED --> [*]
+```
+
+**State definitions** (`internal/task/task.go:9`):
+
+| State | Meaning |
+|---|---|
+| `PENDING` | Created; not yet submitted for execution |
+| `QUEUED` | Submitted to pool; waiting for a goroutine slot |
+| `RUNNING` | Subprocess actively executing |
+| `READY` | Top-level task done; awaiting operator accept/reject |
+| `COMPLETED` | Fully done (only true terminal state) |
+| `FAILED` | Execution error; eligible for retry |
+| `TIMED_OUT` | Exceeded configured timeout; resumable |
+| `CANCELLED` | Explicitly cancelled by operator |
+| `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` |
+| `BLOCKED` | Agent wrote a `question.json`, or parent waiting for subtasks |
+
+`ValidTransition(from, to State) bool` enforces the allowed edges at runtime before every state
+write. (`internal/task/task.go:113`)
+
+---
+
+## 7. WebSocket Broadcast Flow
+
+```mermaid
+sequenceDiagram
+    participant Runner as ClaudeRunner / GeminiRunner
+    participant Pool as executor.Pool
+    participant API as api.Server (forwardResults)
+    participant Hub as api.Hub
+    participant Client1 as WebSocket client 1
+    participant Client2 as WebSocket client 2
+
+    Runner->>Pool: runner.Run() returns
+    Pool->>Pool: handleRunResult() — sets exec.Status
+    Pool->>SQLite: UpdateTaskState + UpdateExecution
+    Pool->>Pool: resultCh <- &Result{...}
+    API->>Pool: <-pool.Results()
+    API->>API: marshal task_completed JSON event
+    API->>Hub: hub.Broadcast(data)
+    Hub->>Client1: ws.Write(data)
+    Hub->>Client2: ws.Write(data)
+    API->>Notifier: notifier.Notify(Event{...}) [if set]
+```
+
+**Event payload** (JSON):
+```json
+{
+  "type":      "task_completed",
+  "task_id":   "<uuid>",
+  "status":    "READY | COMPLETED | FAILED | ...",
+  "exit_code": 0,
+  "cost_usd":  0.042,
+  "error":     "",
+  "timestamp": "2026-03-11T12:00:00Z"
+}
+```
+
+The `Hub` also emits `task_question` events via `Server.BroadcastQuestion` when an agent uses
+an interactive question tool (currently unused in the primary file-based `BLOCKED` flow).
+
+WebSocket endpoint: `GET /api/ws`. Supports optional bearer-token auth when `--api-token` is
+configured. Up to 1000 concurrent clients; periodic 30-second pings detect dead connections.
+(`internal/api/websocket.go`)
+
+---
+
+## 8. Subtask / Parent-Task Dependency Resolution
+
+Claudomator supports two distinct dependency mechanisms:
+
+### 8a. `depends_on` — explicit task ordering
+
+Tasks declare `depends_on: [<task-id>, ...]` in their YAML or creation payload. When the
+executor goroutine starts, `waitForDependencies` polls SQLite every 5 seconds until all listed
+tasks reach `COMPLETED`. If any dependency reaches a terminal failure state (`FAILED`,
+`TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`), the waiting task transitions to `FAILED`
+immediately. (`internal/executor/executor.go:642`)
+
+### 8b. Parent / subtask blocking (`BLOCKED` state)
+
+A top-level task that creates subtasks (tasks with `parent_task_id` pointing back to it)
+transitions to `BLOCKED` — not `READY` — when its own runner exits successfully. This allows
+an agent to dispatch subtasks and then wait for them.
+
+```
+Parent task runner exits 0
+    ├── no subtasks → READY (normal review gate)
+    └── has subtasks → BLOCKED (waiting for subtasks)
+                          │
+            Each subtask completes → COMPLETED
+                          │
+            All subtasks COMPLETED?
+                  ├── yes → maybeUnblockParent() → parent READY
+                  └── no  → parent stays BLOCKED
+```
+
+`maybeUnblockParent(parentID)` is called every time a subtask transitions to `COMPLETED`. It
+loads all subtasks and checks whether every one is `COMPLETED`. If so, it calls
+`UpdateTaskState(parentID, StateReady)`. (`internal/executor/executor.go:616`)
+
+The `BLOCKED` state also covers the interactive question flow: when `ClaudeRunner.Run` detects
+a `question.json` file in the execution log directory, it returns a `*BlockedError` containing
+the question JSON and session ID. The pool stores the question in the `tasks.question_json`
+column and the session ID in the `executions.session_id` column. `POST /api/tasks/{id}/answer`
+resumes the task by calling `pool.SubmitResume` with a new `Execution` carrying
+`ResumeSessionID` and `ResumeAnswer`. (`internal/executor/claude.go:103`,
+`internal/api/server.go:221`)
+
+---
+
+## 9. External Go Dependencies
+
+| Module | Version | Purpose |
+|---|---|---|
+| `github.com/mattn/go-sqlite3` | v1.14.33 | CGo SQLite driver (requires C compiler) |
+| `github.com/google/uuid` | v1.6.0 | UUID generation for task and execution IDs |
+| `github.com/spf13/cobra` | v1.10.2 | CLI command framework |
+| `github.com/BurntSushi/toml` | v1.6.0 | TOML config file parsing |
+| `golang.org/x/net` | v0.49.0 | `golang.org/x/net/websocket` — WebSocket server |
+| `gopkg.in/yaml.v3` | v3.0.1 | YAML task definition parsing |
+
+---
+
+## 10. Related Documentation
+
+### Architectural Decision Records
+
+| ADR | Title | Summary |
+|---|---|---|
+| [ADR-001](adr/001-language-and-architecture.md) | Go + SQLite + WebSocket Architecture | Language choice, pipeline design, storage and API rationale |
+| [ADR-002](adr/002-task-state-machine.md) | Task State Machine Design | All 10 states, transition table, side effects, known edge cases |
+| [ADR-003](adr/003-security-model.md) | Security Model | Trust boundary, no-auth posture, known risks, hardening checklist |
+| [ADR-004](adr/004-multi-agent-routing-and-classification.md) | Multi-Agent Routing | `pickAgent` load balancing, Gemini-based model classifier |
+| [ADR-005](adr/005-sandbox-execution-model.md) | Git Sandbox Execution Model | Isolated git clone per task, push-back flow, BLOCKED preservation |
+
+### Package-Level Docs
+
+Per-package design notes live in [`docs/packages/`](packages/) (in progress).
+
+### Task YAML Reference
+
+```yaml
+name: "My Task"
+agent:
+  type: "claude"               # "claude" | "gemini"; optional — load balancer may override
+  model: "sonnet"              # model tier hint; optional — classifier may override
+  instructions: |
+    Do something useful.
+  project_dir: "/workspace/myproject"   # if set, runs in a git sandbox
+  max_budget_usd: 1.00
+  permission_mode: "bypassPermissions"  # default
+  allowed_tools: ["Bash", "Read"]
+  context_files: ["README.md"]
+timeout: "15m"
+priority: "normal"             # high | normal | low
+tags: ["ci", "backend"]
+depends_on: ["<task-uuid>"]    # explicit ordering
+parent_task_id: "<task-uuid>"  # set by parent agent when creating subtasks
+```
+
+Batch files wrap multiple tasks under a `tasks:` key and are accepted by `claudomator run`.
+
+### Storage Schema
+
+Two tables auto-migrated on `storage.Open()`:
+
+- **`tasks`** — `id`, `name`, `description`, `config_json` (AgentConfig), `priority`,
+  `timeout_ns`, `retry_json`, `tags_json`, `depends_on_json`, `parent_task_id`, `state`,
+  `question_json`, `rejection_comment`, `created_at`, `updated_at`
+- **`executions`** — `id`, `task_id`, `start_time`, `end_time`, `exit_code`, `status`,
+  `stdout_path`, `stderr_path`, `artifact_dir`, `cost_usd`, `error_msg`, `session_id`,
+  `resume_session_id`, `resume_answer`
+
+Indexed columns: `tasks.state`, `tasks.parent_task_id`, `executions.task_id`,
+`executions.status`, `executions.start_time`.
diff --git a/docs/packages/executor.md b/docs/packages/executor.md
new file mode 100644
index 0000000..29d6a0c
--- /dev/null
+++ b/docs/packages/executor.md
@@ -0,0 +1,447 @@
+# Package: executor
+
+`internal/executor` — Bounded goroutine pool that drives agent subprocesses.
+
+---
+
+## 1. Overview
+
+The `executor` package is Claudomator's task-running engine. Its central type, `Pool`, maintains a bounded set of concurrent worker goroutines. Each goroutine picks a task from an internal work queue, selects the appropriate `Runner` implementation (Claude or Gemini), and launches the agent as an OS subprocess. When a subprocess finishes the pool classifies the outcome, updates storage, and emits a `Result` on a channel that the API layer consumes.
+
+Key responsibilities:
+
+- **Concurrency control** — never exceed `maxConcurrent` simultaneous workers.
+- **Load-balanced agent selection** — prefer the agent type with the fewest active tasks; skip rate-limited agents.
+- **Dynamic model selection** — the `Classifier` picks the cheapest model that fits the task's complexity.
+- **Dependency gating** — poll storage until all `depends_on` tasks reach COMPLETED before starting a worker.
+- **Question / block / resume cycle** — detect when an agent needs user input, park the task as BLOCKED, and re-run it with the user's answer when one arrives.
+- **Rate-limit back-pressure** — exponential backoff within a run; per-agent cooldown across runs.
+- **Sandbox isolation** — optional `project_dir` support clones the project into a temp directory, runs the agent there, then merges committed changes back.
+
+---
+
+## 2. Pool
+
+### Struct fields
+
+```go
+type Pool struct {
+    maxConcurrent   int                          // global concurrency ceiling
+    runners         map[string]Runner            // "claude" → ClaudeRunner, "gemini" → GeminiRunner
+    store           Store                        // storage interface (GetTask, UpdateTaskState, …)
+    logger          *slog.Logger
+    depPollInterval time.Duration                // how often waitForDependencies polls; default 5 s
+
+    mu             sync.Mutex
+    active         int                          // number of goroutines currently in execute/executeResume
+    activePerAgent map[string]int               // per-agent-type active count (used for load balancing)
+    rateLimited    map[string]time.Time         // agentType → cooldown deadline
+    cancels        map[string]context.CancelFunc // taskID → cancellation function
+
+    resultCh chan *Result    // emits results; buffered at maxConcurrent*2
+    workCh   chan workItem   // internal work queue; buffered at maxConcurrent*10+100
+    doneCh   chan struct{}   // signals when a worker slot is freed; buffered at maxConcurrent
+
+    Questions  *QuestionRegistry
+    Classifier *Classifier
+}
+```
+
+### Concurrency model
+
+The pool enforces a **global** ceiling (`maxConcurrent`) rather than per-agent-type limits. Within that ceiling, `activePerAgent` tracks how many goroutines are running for each agent type; this counter is the input to the load balancer, not an additional hard limit.
+
+A single long-lived `dispatch()` goroutine serialises the slot-allocation decision. It blocks on `doneCh` when all `maxConcurrent` slots are taken, then launches a worker goroutine as soon as any slot is freed. This prevents `Submit()` from blocking the caller while still honouring the concurrency ceiling.
+
+```
+workCh (buffered) ──► dispatch goroutine ──► execute goroutine (×N)
+                                │                     │
+                                └── waits on doneCh ◄─┘ (on exit)
+```
+
+### Dependency polling loop
+
+When a task has a non-empty `DependsOn` list, `waitForDependencies()` is called inside the worker goroutine (after agent selection but before subprocess launch). It polls `store.GetTask()` for each dependency at `depPollInterval` intervals (default 5 s). It returns:
+
+- `nil` — all dependencies reached COMPLETED.
+- error — a dependency reached a terminal failure state (FAILED, TIMED_OUT, CANCELLED, BUDGET_EXCEEDED) or the context was cancelled.
+
+---
+
+## 3. Pool.Submit()
+
+```go
+func (p *Pool) Submit(ctx context.Context, t *task.Task) error
+func (p *Pool) SubmitResume(ctx context.Context, t *task.Task, exec *storage.Execution) error
+```
+
+`Submit()` is non-blocking: it sends a `workItem` to `workCh`. If `workCh` is full (capacity `maxConcurrent*10+100`) it returns an error immediately rather than blocking the caller.
+
+`SubmitResume()` is used to re-queue a BLOCKED or TIMED_OUT task. The provided `exec` must have `ResumeSessionID` and `ResumeAnswer` set; the pool routes it through `executeResume()` instead of `execute()`, skipping agent selection and classification.
+
+**Goroutine lifecycle** (normal path via `execute()`):
+
+1. `dispatch` dequeues the `workItem` and waits for a free slot.
+2. `dispatch` increments `p.active` under the mutex and launches `go execute(ctx, t)`.
+3. `execute` selects the agent, optionally classifies the model, and increments `activePerAgent`.
+4. `execute` waits for dependencies, creates the execution record in storage, applies a timeout, and calls `runner.Run()`.
+5. On return, `execute` calls `handleRunResult()`, decrements `p.active` and `activePerAgent`, and sends a token to `doneCh`.
+6. `dispatch` unblocks and processes the next `workItem`.
+
+---
+
+## 4. Runner Interface
+
+```go
+type Runner interface {
+    Run(ctx context.Context, t *task.Task, exec *storage.Execution) error
+}
+
+// Optional extension — lets the pool persist log paths before execution starts.
+type LogPather interface {
+    ExecLogDir(execID string) string
+}
+```
+
+`Run()` is responsible for the full lifecycle of a single subprocess invocation, including log file management and cost extraction. It returns:
+
+- `nil` — task succeeded.
+- `*BlockedError` — agent wrote a question file; pool transitions task to BLOCKED.
+- any other error — pool classifies as FAILED, TIMED_OUT, CANCELLED, or BUDGET_EXCEEDED based on context and error text.
+
+The pool selects a runner by looking up `t.Agent.Type` in the `runners` map. If `t.Agent.Type` is empty it falls back to `"claude"`.
+
+---
+
+## 5. ClaudeRunner
+
+```go
+type ClaudeRunner struct {
+    BinaryPath string // defaults to "claude"
+    Logger     *slog.Logger
+    LogDir     string // base directory for per-execution log directories
+    APIURL     string // injected as CLAUDOMATOR_API_URL env var
+}
+```
+
+### Subprocess invocation
+
+`Run()` calls `buildArgs()` to construct the `claude` CLI invocation, then passes it through `runWithBackoff()` (up to 3 attempts on rate-limit errors, 5 s base delay, exponential backoff, capped at 5 min).
+
+**New execution flags:**
+
+| Flag | Value |
+|---|---|
+| `-p` | task instructions (optionally prefixed with planning preamble) |
+| `--session-id` | execution UUID (doubles as the session UUID for resumability) |
+| `--output-format stream-json` | machine-readable JSONL output |
+| `--verbose` | include cost/token fields |
+| `--model` | from `t.Agent.Model` (if set) |
+| `--max-budget-usd` | from `t.Agent.MaxBudgetUSD` (if > 0) |
+| `--permission-mode` | `bypassPermissions` by default; overridable per task |
+| `--append-system-prompt` | from `t.Agent.SystemPromptAppend` |
+| `--allowedTools` | repeated for each entry in `t.Agent.AllowedTools` |
+| `--disallowedTools` | repeated for each entry in `t.Agent.DisallowedTools` |
+| `--add-dir` | repeated for each entry in `t.Agent.ContextFiles` |
+
+**Resume execution flags** (when `exec.ResumeSessionID != ""`):
+
+| Flag | Value |
+|---|---|
+| `-p` | `exec.ResumeAnswer` (the user's answer) |
+| `--resume` | `exec.ResumeSessionID` (original session to continue) |
+| `--output-format stream-json` | |
+| `--verbose` | |
+| `--permission-mode` | same default/override as above |
+| `--model` | from `t.Agent.Model` (if set) |
+
+**Environment variables** injected into every subprocess:
+
+```
+CLAUDOMATOR_API_URL          — base URL for subtask creation and question answers
+CLAUDOMATOR_TASK_ID          — parent task ID (used in subtask POST bodies)
+CLAUDOMATOR_QUESTION_FILE    — path where the agent writes a question JSON
+```
+
+### Stream-JSON output parsing
+
+`execOnce()` attaches an `os.Pipe` to `cmd.Stdout` and spawns a goroutine that calls `parseStream()`. The goroutine tees the stream to the log file while scanning for JSONL events:
+
+| Event type | Action |
+|---|---|
+| `rate_limit_event` with `status: rejected` | sets `streamErr` to a rate-limit error |
+| `assistant` with `error: rate_limit` | sets `streamErr` to a rate-limit error |
+| `result` with `is_error: true` | sets `streamErr` to a task-failure error |
+| `result` with `total_cost_usd` | captures total cost |
+| `user` containing denied `tool_result` | sets `streamErr` to a permission-denial error |
+| any message with `cost_usd` | updates running cost (legacy field) |
+
+`costUSD` and `streamErr` are read after `cmd.Wait()` returns and the goroutine drains.
+
+### Cost and token extraction
+
+`exec.CostUSD` is set from `total_cost_usd` (preferred) or `cost_usd` (legacy) in the stream. Token counts are not extracted at the runner level; they are available in the raw stdout log.
+
+### Project sandbox enforcement
+
+When `t.Agent.ProjectDir` is set on a new (non-resume) execution:
+
+1. `setupSandbox()` ensures `projectDir` is a git repository (initialising with an empty commit if needed), then `git clone --no-hardlinks` it into a `claudomator-sandbox-*` temp directory. It prefers a remote named `"local"` over `"origin"` as the clone source.
+2. The subprocess runs with its working directory set to the sandbox.
+3. On success, `teardownSandbox()` verifies the sandbox has no uncommitted changes, counts commits ahead of `origin/HEAD`, and pushes them to the bare repo. If the push is rejected due to a concurrent task, it fetches and rebases then retries once.
+4. The sandbox is removed on success. On any failure the sandbox is preserved and its path is included in the error message.
+
+Resume executions skip sandboxing and run directly in `projectDir` so the agent can pick up its previous session state.
+
+### Question file mechanism
+
+The agent preamble instructs the agent to write a JSON file to `$CLAUDOMATOR_QUESTION_FILE` and exit immediately when it needs user input. After `execOnce()` returns successfully (exit code 0), `Run()` attempts to read that file:
+
+- If the file exists: it is consumed (`os.Remove`), and `Run()` returns `&BlockedError{QuestionJSON: ..., SessionID: e.SessionID}`.
+- If the file is absent: `Run()` returns `nil` (success).
+
+---
+
+## 6. GeminiRunner
+
+```go
+type GeminiRunner struct {
+    BinaryPath string // defaults to "gemini"
+    Logger     *slog.Logger
+    LogDir     string
+    APIURL     string
+}
+```
+
+`GeminiRunner` follows the same pattern as `ClaudeRunner` with these differences:
+
+| Aspect | ClaudeRunner | GeminiRunner |
+|---|---|---|
+| Binary | `claude` | `gemini` |
+| Flag structure | `claude -p <instructions> --session-id … --output-format stream-json …` | `gemini <instructions> --output-format stream-json …` (instructions are the first positional argument) |
+| Session/resume | `--session-id` for new; `--resume` for resume | Session ID stored on `exec` but resume flag handling is a stub (not yet implemented) |
+| Rate-limit retries | `runWithBackoff` (up to 3 retries) | Single attempt; no retry loop |
+| Sandbox | Full clone + teardown + push | Not implemented; runs directly in `projectDir` |
+| Stream parsing | `parseStream()` | `parseStream()` (shared implementation; assumes compatible JSONL format) |
+| Question file | Same mechanism | Same mechanism |
+
+`GeminiRunner` implements `LogPather` identically to `ClaudeRunner`.
+
+---
+
+## 7. Classifier
+
+```go
+type Classifier struct {
+    GeminiBinaryPath string // defaults to "gemini"
+}
+
+type Classification struct {
+    AgentType string `json:"agent_type"`
+    Model     string `json:"model"`
+    Reason    string `json:"reason"`
+}
+
+type SystemStatus struct {
+    ActiveTasks map[string]int  // agentType → number of active tasks
+    RateLimited map[string]bool // agentType → true if currently rate-limited
+}
+```
+
+`Classify()` is called once per task, after load-balanced agent selection but before the subprocess starts. Its sole output is the `Model` field; `AgentType` is already fixed by the load balancer.
+
+**Decision factors baked into the prompt:**
+
+- The already-chosen agent type (Claude or Gemini).
+- The task `Name` and `Instructions`.
+- A hard-coded model tier list:
+  - Claude: `haiku-4-5` (cheap/fast) → `sonnet-4-6` (default) → `opus-4-6` (hard tasks only)
+  - Gemini: `gemini-2.5-flash-lite` (cheap) → `gemini-2.5-flash` (default) → `gemini-2.5-pro` (hard tasks only)
+
+**Implementation**: `Classify()` invokes `gemini --prompt <prompt> --model gemini-2.5-flash-lite --output-format json`. It then strips markdown code fences and "Loaded cached credentials" noise before unmarshalling the JSON response into a `Classification`. If classification fails, the pool logs the error and proceeds with whatever model was already on `t.Agent.Model`.
+
+---
+
+## 8. QuestionRegistry
+
+```go
+type QuestionRegistry struct {
+    mu        sync.Mutex
+    questions map[string]*PendingQuestion // keyed by toolUseID
+}
+
+type PendingQuestion struct {
+    TaskID    string
+    ToolUseID string
+    Input     json.RawMessage
+    AnswerCh  chan string // buffered(1); closed when answer delivered
+}
+```
+
+The registry tracks questions that are waiting for a human answer. It is populated via the stream-parsing path (`streamAndParseWithQuestions`) when an agent emits an `AskUserQuestion` tool_use event in the JSONL stream.
+
+| Method | Description |
+|---|---|
+| `Register(taskID, toolUseID, input)` | Adds a `PendingQuestion`; returns the buffered answer channel. |
+| `Answer(toolUseID, answer)` | Delivers an answer, removes the entry, returns `false` if not found. |
+| `Get(toolUseID)` | Returns the pending question or `nil`. |
+| `PendingForTask(taskID)` | Returns all pending questions for a task (used by the API to list them). |
+| `Remove(toolUseID)` | Removes without answering (used on task cancellation). |
+
+**How a blocked task is resumed:**
+
+1. The API receives a `POST /api/tasks/{id}/answer` with the user's answer.
+2. It calls `QuestionRegistry.Answer(toolUseID, answer)`, which unblocks any goroutine waiting on `AnswerCh`.
+3. Separately (for the file-based path), the API creates a resume `Execution` with `ResumeSessionID` and `ResumeAnswer`, then calls `Pool.SubmitResume()`.
+4. The pool runs `executeResume()`, which invokes `runner.Run()` with `--resume <session>` and the answer as `-p`.
+
+---
+
+## 9. Rate Limiter
+
+Rate limiting is implemented at two levels:
+
+### Per-invocation retry (within `ClaudeRunner.Run`)
+
+`runWithBackoff(ctx, maxRetries=3, baseDelay=5s, fn)` calls the subprocess, detects rate-limit errors (`isRateLimitError`), and retries with exponential backoff:
+
+```
+attempt 0: immediate
+attempt 1: 5 s delay  (or Retry-After header value if present)
+attempt 2: 10 s delay
+attempt 3: 20 s delay  (capped at 5 min)
+```
+
+`isRateLimitError` matches: `"rate limit"`, `"too many requests"`, `"429"`, `"overloaded"`.
+
+Quota-exhausted errors (`isQuotaExhausted`) — strings like `"hit your limit"` or `"status: rejected"` — are **not** retried; they propagate immediately to `handleRunResult`.
+
+### Per-agent-type pool cooldown
+
+When `handleRunResult` detects a rate-limit or quota error it records a cooldown deadline in `p.rateLimited[agentType]`:
+
+- Transient rate limit: 1 minute (or `Retry-After` value from the error message).
+- Quota exhausted: 5 hours.
+
+During agent selection in `execute()`, the load balancer skips agents whose cooldown deadline has not passed yet. If all agents are on cooldown, the agent with the fewest active tasks is chosen anyway (best-effort fallback).
+
+---
+
+## 10. BlockedError
+
+```go
+type BlockedError struct {
+    QuestionJSON string // raw JSON: {"text": "...", "options": [...]}
+    SessionID    string // claude session to resume once the user answers
+}
+```
+
+`BlockedError` is returned by `Runner.Run()` when the agent wrote a question to `$CLAUDOMATOR_QUESTION_FILE` and exited. It is **not** a fatal error — it is a signal that the task requires human input before it can continue.
+
+**Pool handling** (`handleRunResult`):
+
+```go
+var blockedErr *BlockedError
+if errors.As(err, &blockedErr) {
+    exec.Status = "BLOCKED"
+    store.UpdateTaskState(t.ID, task.StateBlocked)
+    store.UpdateTaskQuestion(t.ID, blockedErr.QuestionJSON)
+}
+```
+
+The pool does **not** emit a failure result; it still sends a `Result` to `resultCh` with `Err` set to the `BlockedError`. The API layer reads `exec.Status == "BLOCKED"` and exposes the question to the caller. Once the user answers, the API calls `Pool.SubmitResume()` and the task transitions back to RUNNING.
+
+---
+
+## 11. Result Struct and handleRunResult
+
+```go
+type Result struct {
+    TaskID    string
+    Execution *storage.Execution
+    Err       error
+}
+```
+
+`handleRunResult` is shared between `execute()` and `executeResume()`. It inspects the error returned by `Runner.Run()` and maps it to an execution status and task state:
+
+| Condition | exec.Status | task.State |
+|---|---|---|
+| `*BlockedError` | `BLOCKED` | `StateBlocked` |
+| `ctx.Err() == DeadlineExceeded` | `TIMED_OUT` | `StateTimedOut` |
+| `ctx.Err() == Canceled` | `CANCELLED` | `StateCancelled` |
+| `isQuotaExhausted(err)` | `BUDGET_EXCEEDED` | `StateBudgetExceeded` |
+| any other non-nil error | `FAILED` | `StateFailed` |
+| nil, root task, subtasks exist | `BLOCKED` | `StateBlocked` (waits for subtasks) |
+| nil, root task, no subtasks | `READY` | `StateReady` |
+| nil, subtask | `COMPLETED` | `StateCompleted` |
+
+After a subtask completes, `maybeUnblockParent()` checks whether all sibling subtasks are also COMPLETED; if so, the parent transitions from BLOCKED to READY.
+
+---
+
+## Diagrams
+
+### Sequence: Pool.Submit → goroutine → Runner.Run → state update
+
+```mermaid
+sequenceDiagram
+    participant Caller
+    participant Pool
+    participant Dispatch as dispatch goroutine
+    participant Worker as execute goroutine
+    participant Classifier
+    participant Runner
+    participant Store
+
+    Caller->>Pool: Submit(ctx, task)
+    Pool->>Dispatch: workCh ← workItem
+    Dispatch->>Dispatch: wait for free slot (doneCh)
+    Dispatch->>Worker: go execute(ctx, task)
+
+    Worker->>Pool: snapshot activePerAgent / rateLimited
+    Pool-->>Worker: SystemStatus
+    Worker->>Worker: pickAgent(status) → agentType
+    Worker->>Classifier: Classify(taskName, instructions, status, agentType)
+    Classifier-->>Worker: Classification{Model}
+
+    Worker->>Store: waitForDependencies (poll every 5s)
+    Store-->>Worker: all deps COMPLETED
+
+    Worker->>Store: CreateExecution(exec)
+    Worker->>Store: UpdateTaskState(RUNNING)
+
+    Worker->>Runner: Run(ctx, task, exec)
+    Note over Runner: subprocess runs,<br/>stream parsed
+    Runner-->>Worker: error or nil
+
+    Worker->>Pool: handleRunResult(ctx, task, exec, err, agentType)
+    Pool->>Store: UpdateTaskState(new state)
+    Pool->>Store: UpdateExecution(exec)
+    Pool->>Caller: resultCh ← Result{TaskID, Execution, Err}
+
+    Worker->>Dispatch: doneCh ← token (slot freed)
+```
+
+### Flowchart: Question-handling flow
+
+```mermaid
+flowchart TD
+    A([Agent subprocess running]) --> B{Need user input?}
+    B -- No --> C([Task completes normally])
+    B -- Yes --> D["Write JSON to\n$CLAUDOMATOR_QUESTION_FILE\n{text, options}"]
+    D --> E[Agent exits with code 0]
+    E --> F[Runner reads question file]
+    F --> G["Return &BlockedError\n{QuestionJSON, SessionID}"]
+    G --> H[Pool.handleRunResult detects BlockedError]
+    H --> I[exec.Status = BLOCKED]
+    H --> J[Store.UpdateTaskState → StateBlocked]
+    H --> K[Store.UpdateTaskQuestion → QuestionJSON]
+    K --> L[API exposes question to user]
+    L --> M[User submits answer via API]
+    M --> N["API creates Execution:\nResumeSessionID = original SessionID\nResumeAnswer = user answer"]
+    N --> O[Pool.SubmitResume → workCh]
+    O --> P[dispatch launches executeResume goroutine]
+    P --> Q["Runner.Run with\n--resume ResumeSessionID\n-p ResumeAnswer"]
+    Q --> R([Claude session continues from where it left off])
+```
diff --git a/docs/packages/storage.md b/docs/packages/storage.md
new file mode 100644
index 0000000..2e53cea
--- /dev/null
+++ b/docs/packages/storage.md
@@ -0,0 +1,209 @@
+# Package: `internal/storage`
+
+## Overview
+
+The `storage` package provides the SQLite persistence layer for Claudomator. It exposes a `DB` struct that manages all reads and writes to the database, runs schema migrations automatically on `Open`, and enforces valid state transitions when updating task states.
+
+Key characteristics:
+- **Auto-migration**: `Open` creates all tables and indexes if they do not exist, then applies additive column migrations idempotently (duplicate-column errors are silently ignored).
+- **WAL mode**: The database is opened with `?_journal_mode=WAL&_busy_timeout=5000` for concurrent read safety and reduced lock contention.
+- **State enforcement**: `UpdateTaskState` performs the state change inside a transaction and calls `task.ValidTransition` before committing.
+- **Cascading deletes**: `DeleteTask` removes a task, all its descendant subtasks, and all their execution records in a single transaction using a recursive CTE.
+
+---
+
+## Schema
+
+### `tasks` table
+
+| Column           | Type      | Description                                              |
+|------------------|-----------|----------------------------------------------------------|
+| `id`             | `TEXT PK` | UUID; primary key.                                       |
+| `name`           | `TEXT`    | Human-readable task name. Not null.                      |
+| `description`    | `TEXT`    | Optional longer description.                             |
+| `config_json`    | `TEXT`    | JSON blob — serialised `AgentConfig` struct.             |
+| `priority`       | `TEXT`    | `"high"`, `"normal"`, or `"low"`. Default `"normal"`.   |
+| `timeout_ns`     | `INTEGER` | Timeout as nanoseconds (`time.Duration`). Default `0`.   |
+| `retry_json`     | `TEXT`    | JSON blob — serialised `RetryConfig` struct.             |
+| `tags_json`      | `TEXT`    | JSON blob — serialised `[]string`.                       |
+| `depends_on_json`| `TEXT`    | JSON blob — serialised `[]string` of task IDs.          |
+| `parent_task_id` | `TEXT`    | ID of parent task; null for root tasks.                  |
+| `state`          | `TEXT`    | Current `State` value. Default `"PENDING"`.              |
+| `created_at`     | `DATETIME`| Creation timestamp (UTC).                                |
+| `updated_at`     | `DATETIME`| Last-update timestamp (UTC).                             |
+| `rejection_comment` | `TEXT` | Reviewer comment set by `RejectTask`; nullable.         |
+| `question_json`  | `TEXT`    | Pending agent question set by `UpdateTaskQuestion`; nullable. |
+
+**JSON blob columns**: `config_json`, `retry_json`, `tags_json`, `depends_on_json`.
+
+**Indexes**: `idx_tasks_state` (state), `idx_tasks_parent_task_id` (parent_task_id).
+
+### `executions` table
+
+| Column        | Type      | Description                                                   |
+|---------------|-----------|---------------------------------------------------------------|
+| `id`          | `TEXT PK` | UUID; primary key.                                            |
+| `task_id`     | `TEXT`    | Foreign key → `tasks.id`.                                     |
+| `start_time`  | `DATETIME`| When the execution started (UTC).                             |
+| `end_time`    | `DATETIME`| When the execution ended (UTC); null while running.           |
+| `exit_code`   | `INTEGER` | Process exit code; 0 = success.                               |
+| `status`      | `TEXT`    | Execution status string (mirrors task state at completion).   |
+| `stdout_path` | `TEXT`    | Path to the stdout log file under `~/.claudomator/executions/`. |
+| `stderr_path` | `TEXT`    | Path to the stderr log file under `~/.claudomator/executions/`. |
+| `artifact_dir`| `TEXT`    | Directory containing output artifacts.                        |
+| `cost_usd`    | `REAL`    | Total API spend for this execution in USD.                    |
+| `error_msg`   | `TEXT`    | Error description if the execution failed.                    |
+| `session_id`  | `TEXT`    | Claude `--session-id` value; used to resume interrupted runs. |
+
+**Indexes**: `idx_executions_status`, `idx_executions_task_id`, `idx_executions_start_time`.
+
+---
+
+## `DB` Struct Methods
+
+### Lifecycle
+
+#### `Open(path string) (*DB, error)`
+Opens the SQLite database at `path`, runs auto-migration, and returns a `*DB`. The connection string appends `?_journal_mode=WAL&_busy_timeout=5000`.
+
+#### `(*DB) Close() error`
+Closes the underlying database connection.
+
+---
+
+### Task CRUD
+
+#### `(*DB) CreateTask(t *task.Task) error`
+Inserts a new task. `AgentConfig`, `RetryConfig`, `Tags`, and `DependsOn` are serialised as JSON blobs; `Timeout` is stored as nanoseconds.
+
+#### `(*DB) GetTask(id string) (*task.Task, error)`
+Retrieves a single task by ID. Returns an error wrapping `sql.ErrNoRows` if not found.
+
+#### `(*DB) ListTasks(filter TaskFilter) ([]*task.Task, error)`
+Returns tasks matching `filter`, ordered by `created_at DESC`. See [TaskFilter](#taskfilter).
+
+#### `(*DB) ListSubtasks(parentID string) ([]*task.Task, error)`
+Returns all tasks whose `parent_task_id` equals `parentID`, ordered by `created_at ASC`.
+
+#### `(*DB) UpdateTask(id string, u TaskUpdate) error`
+Replaces editable fields (`Name`, `Description`, `Config`, `Priority`, `TimeoutNS`, `Retry`, `Tags`, `DependsOn`) and resets `state` to `PENDING`. Returns an error if the task does not exist.
+
+#### `(*DB) DeleteTask(id string) error`
+Deletes a task, all its descendant subtasks (via recursive CTE), and all their execution records in a single transaction. Returns an error if the task does not exist.
+
+---
+
+### State Management
+
+#### `(*DB) UpdateTaskState(id string, newState task.State) error`
+Atomically updates a task's state inside a transaction. Calls `task.ValidTransition` to reject illegal moves before applying the change.
+
+#### `(*DB) RejectTask(id, comment string) error`
+Sets the task's state to `PENDING` and stores `comment` in `rejection_comment`. Does not go through the state-machine validator (direct update).
+
+#### `(*DB) ResetTaskForRetry(id string) (*task.Task, error)`
+Validates that the current state can transition to `QUEUED`, clears `Agent.Type` and `Agent.Model` so the task can be re-classified, sets `state = QUEUED`, and persists the changes in a transaction. Returns the updated task.
+
+#### `(*DB) UpdateTaskQuestion(taskID, questionJSON string) error`
+Stores a JSON-encoded agent question on the task. Pass an empty string to clear the question after it has been answered.
+
+---
+
+### Execution CRUD
+
+#### `(*DB) CreateExecution(e *Execution) error`
+Inserts a new execution record.
+
+#### `(*DB) GetExecution(id string) (*Execution, error)`
+Retrieves a single execution by ID.
+
+#### `(*DB) ListExecutions(taskID string) ([]*Execution, error)`
+Returns all executions for a task, ordered by `start_time DESC`.
+
+#### `(*DB) GetLatestExecution(taskID string) (*Execution, error)`
+Returns the most recent execution for a task.
+
+#### `(*DB) UpdateExecution(e *Execution) error`
+Updates a completed execution record (end time, exit code, status, cost, paths, session ID).
+
+#### `(*DB) ListRecentExecutions(since time.Time, limit int, taskID string) ([]*RecentExecution, error)`
+Returns executions since `since`, joined with the task name, ordered by `start_time DESC`. If `taskID` is non-empty only executions for that task are included. Returns `RecentExecution` values (see below).
+
+---
+
+## `TaskFilter`
+
+```go
+type TaskFilter struct {
+    State task.State
+    Limit int
+}
+```
+
+| Field   | Type         | Description                                           |
+|---------|--------------|-------------------------------------------------------|
+| `State` | `task.State` | If non-empty, only tasks in this state are returned.  |
+| `Limit` | `int`        | Maximum number of results. `0` means no limit.        |
+
+---
+
+## `Execution` Struct
+
+```go
+type Execution struct {
+    ID              string
+    TaskID          string
+    StartTime       time.Time
+    EndTime         time.Time
+    ExitCode        int
+    Status          string
+    StdoutPath      string
+    StderrPath      string
+    ArtifactDir     string
+    CostUSD         float64
+    ErrorMsg        string
+    SessionID       string   // persisted: claude --session-id for resume
+    ResumeSessionID string   // in-memory only: set when creating a resume execution
+    ResumeAnswer    string   // in-memory only: human answer forwarded to agent
+}
+```
+
+| Field             | Persisted | Description                                                           |
+|-------------------|-----------|-----------------------------------------------------------------------|
+| `ID`              | Yes       | UUID; primary key.                                                    |
+| `TaskID`          | Yes       | ID of the owning task.                                                |
+| `StartTime`       | Yes       | When execution started (UTC).                                         |
+| `EndTime`         | Yes       | When execution ended (UTC); zero while still running.                 |
+| `ExitCode`        | Yes       | Process exit code (0 = success).                                      |
+| `Status`          | Yes       | Status string at completion.                                          |
+| `StdoutPath`      | Yes       | Absolute path to the stdout log file.                                 |
+| `StderrPath`      | Yes       | Absolute path to the stderr log file.                                 |
+| `ArtifactDir`     | Yes       | Directory containing output artifacts produced by the agent.         |
+| `CostUSD`         | Yes       | Total API spend for this run in USD.                                  |
+| `ErrorMsg`        | Yes       | Error description if the run failed.                                  |
+| `SessionID`       | Yes       | Claude session ID; stored to allow resuming an interrupted session.   |
+| `ResumeSessionID` | No        | Session to resume; set in-memory when creating a resume execution.    |
+| `ResumeAnswer`    | No        | Human answer to a blocked question; forwarded to the agent in-memory. |
+
+---
+
+## File Layout
+
+All runtime data lives under `~/.claudomator/`:
+
+```
+~/.claudomator/
+├── claudomator.db          # SQLite database (WAL mode)
+└── executions/             # Execution log files
+    ├── <execution-id>.stdout
+    └── <execution-id>.stderr
+```
+
+The paths are configured by `internal/config`:
+
+| Config field | Value                                |
+|--------------|--------------------------------------|
+| `DBPath`     | `~/.claudomator/claudomator.db`      |
+| `LogDir`     | `~/.claudomator/executions/`         |
+
+Both directories are created automatically on first run if they do not exist.
diff --git a/docs/packages/task.md b/docs/packages/task.md
new file mode 100644
index 0000000..3502f8a
--- /dev/null
+++ b/docs/packages/task.md
@@ -0,0 +1,332 @@
+# Package: `internal/task`
+
+## Overview
+
+The `task` package defines the core domain model for Claudomator. A **Task** represents a discrete unit of work to be executed by a Claude agent — described by natural-language instructions, optional budget and timeout constraints, retry policy, and scheduling metadata. Tasks are authored as YAML files and progress through a well-defined lifecycle from `PENDING` to a terminal state.
+
+---
+
+## Task Struct
+
+```go
+type Task struct {
+    ID               string      `yaml:"id"`
+    ParentTaskID     string      `yaml:"parent_task_id"`
+    Name             string      `yaml:"name"`
+    Description      string      `yaml:"description"`
+    Agent            AgentConfig `yaml:"agent"`
+    Timeout          Duration    `yaml:"timeout"`
+    Retry            RetryConfig `yaml:"retry"`
+    Priority         Priority    `yaml:"priority"`
+    Tags             []string    `yaml:"tags"`
+    DependsOn        []string    `yaml:"depends_on"`
+    State            State       `yaml:"-"`
+    RejectionComment string      `yaml:"-"`
+    QuestionJSON     string      `yaml:"-"`
+    CreatedAt        time.Time   `yaml:"-"`
+    UpdatedAt        time.Time   `yaml:"-"`
+}
+```
+
+| Field              | Type          | Purpose                                                              | Default / Notes                          |
+|--------------------|---------------|----------------------------------------------------------------------|------------------------------------------|
+| `ID`               | `string`      | Unique identifier (UUID).                                            | Auto-generated by `Parse` if empty.      |
+| `ParentTaskID`     | `string`      | ID of the parent task; empty for top-level tasks.                   | Optional.                                |
+| `Name`             | `string`      | Human-readable name. **Required.**                                   | Must be non-empty.                       |
+| `Description`      | `string`      | Longer explanation shown in the UI.                                  | Optional.                                |
+| `Agent`            | `AgentConfig` | Configuration passed to the Claude agent executor.                  | See [AgentConfig](#agentconfig-struct).  |
+| `Timeout`          | `Duration`    | Maximum wall-clock time for a single execution attempt.              | `0` means no limit.                      |
+| `Retry`            | `RetryConfig` | Retry policy applied when a run fails.                               | See [RetryConfig](#retryconfig-struct).  |
+| `Priority`         | `Priority`    | Scheduling priority: `high`, `normal`, or `low`.                    | Default: `normal`.                       |
+| `Tags`             | `[]string`    | Arbitrary labels for filtering and grouping.                         | Optional; defaults to `[]`.              |
+| `DependsOn`        | `[]string`    | IDs of tasks that must reach `COMPLETED` before this one is queued. | Optional; defaults to `[]`.              |
+| `State`            | `State`       | Current lifecycle state. Not read from YAML (`yaml:"-"`).           | Set to `PENDING` by `Parse`.             |
+| `RejectionComment` | `string`      | Free-text comment left by a reviewer when rejecting a task.         | Not in YAML; populated by storage layer. |
+| `QuestionJSON`     | `string`      | JSON-encoded question posted by the agent while `BLOCKED`.          | Not in YAML; populated by storage layer. |
+| `CreatedAt`        | `time.Time`   | Timestamp set when the task is parsed.                               | Auto-set by `Parse`.                     |
+| `UpdatedAt`        | `time.Time`   | Timestamp updated on every state change.                             | Auto-set by `Parse`; updated by DB.      |
+
+Fields tagged `yaml:"-"` are runtime-only and are never parsed from YAML task files.
+
+---
+
+## AgentConfig Struct
+
+```go
+type AgentConfig struct {
+    Type               string   `yaml:"type"`
+    Model              string   `yaml:"model"`
+    ContextFiles       []string `yaml:"context_files"`
+    Instructions       string   `yaml:"instructions"`
+    ProjectDir         string   `yaml:"project_dir"`
+    MaxBudgetUSD       float64  `yaml:"max_budget_usd"`
+    PermissionMode     string   `yaml:"permission_mode"`
+    AllowedTools       []string `yaml:"allowed_tools"`
+    DisallowedTools    []string `yaml:"disallowed_tools"`
+    SystemPromptAppend string   `yaml:"system_prompt_append"`
+    AdditionalArgs     []string `yaml:"additional_args"`
+    SkipPlanning       bool     `yaml:"skip_planning"`
+}
+```
+
+| Field                | Type       | Purpose                                                                                                                         |
+|----------------------|------------|---------------------------------------------------------------------------------------------------------------------------------|
+| `Type`               | `string`   | Agent type classifier. Cleared on retry by `ResetTaskForRetry` so the task can be re-classified.                               |
+| `Model`              | `string`   | Claude model ID (e.g. `"claude-sonnet-4-6"`). Cleared on retry alongside `Type`.                                               |
+| `ContextFiles`       | `[]string` | Paths to files pre-loaded as context for the agent.                                                                             |
+| `Instructions`       | `string`   | Natural-language task instructions. **Required.**                                                                               |
+| `ProjectDir`         | `string`   | Working directory the agent operates in.                                                                                        |
+| `MaxBudgetUSD`       | `float64`  | API spend cap in USD. `0` means no limit. Must be ≥ 0.                                                                         |
+| `PermissionMode`     | `string`   | Claude permission mode. Valid values: `default`, `acceptEdits`, `bypassPermissions`, `plan`, `dontAsk`, `delegate`.            |
+| `AllowedTools`       | `[]string` | Explicit tool whitelist passed to the agent.                                                                                    |
+| `DisallowedTools`    | `[]string` | Explicit tool blacklist passed to the agent.                                                                                    |
+| `SystemPromptAppend` | `string`   | Text appended to the Claude system prompt.                                                                                      |
+| `AdditionalArgs`     | `[]string` | Extra CLI arguments forwarded verbatim to the Claude executable.                                                                |
+| `SkipPlanning`       | `bool`     | When `true`, the agent skips the planning phase.                                                                                |
+
+---
+
+## RetryConfig Struct
+
+```go
+type RetryConfig struct {
+    MaxAttempts int    `yaml:"max_attempts"`
+    Backoff     string `yaml:"backoff"` // "linear" or "exponential"
+}
+```
+
+| Field         | Type     | Purpose                                                    | Default         |
+|---------------|----------|------------------------------------------------------------|-----------------|
+| `MaxAttempts` | `int`    | Total number of attempts (first run + retries). Min: 1.   | `1`             |
+| `Backoff`     | `string` | Wait strategy between retries: `linear` or `exponential`. | `"exponential"` |
+
+---
+
+## YAML Task File Format
+
+A task file may contain a single task or a batch (see next section). Every supported field is shown below with annotations.
+
+```yaml
+# Unique identifier. Omit to have Claudomator generate a UUID.
+id: "fix-login-bug"
+
+# Display name. Required.
+name: "Fix login redirect bug"
+
+# Optional longer description.
+description: "Users are redirected to /home instead of /dashboard after login."
+
+agent:
+  # Agent type for routing/classification. Optional; auto-assigned if blank.
+  type: "claude"
+
+  # Claude model ID. Optional; auto-assigned if blank.
+  model: "claude-sonnet-4-6"
+
+  # Task instructions passed to the agent. Required.
+  instructions: |
+    Fix the post-login redirect in src/auth/login.go so that users are
+    sent to /dashboard instead of /home. Add a regression test.
+
+  # Working directory for the agent process.
+  project_dir: "/workspace/myapp"
+
+  # Files loaded into the agent's context before execution starts.
+  context_files:
+    - "src/auth/login.go"
+    - "docs/adr/0003-auth.md"
+
+  # USD spending cap. 0 = no limit.
+  max_budget_usd: 1.00
+
+  # Claude permission mode.
+  # Values: default | acceptEdits | bypassPermissions | plan | dontAsk | delegate
+  permission_mode: "acceptEdits"
+
+  # Tool whitelist. Empty = all tools allowed.
+  allowed_tools:
+    - "Read"
+    - "Edit"
+    - "Bash"
+
+  # Tool blacklist.
+  disallowed_tools:
+    - "WebFetch"
+
+  # Appended to the agent's system prompt.
+  system_prompt_append: "Always write tests before implementation."
+
+  # Extra arguments forwarded verbatim to the Claude executable.
+  additional_args:
+    - "--verbose"
+
+  # Skip the planning phase.
+  skip_planning: false
+
+# Maximum wall-clock time per attempt. Uses Go duration syntax: "30m", "1h", "45s".
+# 0 or omitted means no limit.
+timeout: "30m"
+
+retry:
+  # Total attempts including the first run. Must be >= 1.
+  max_attempts: 3
+  # "linear" or "exponential"
+  backoff: "exponential"
+
+# Scheduling priority: high | normal | low
+priority: "normal"
+
+# Arbitrary labels for filtering.
+tags:
+  - "bug"
+  - "auth"
+
+# IDs of tasks that must reach COMPLETED before this task is queued.
+depends_on:
+  - "setup-test-db"
+```
+
+---
+
+## Batch File Format
+
+A batch file wraps multiple task definitions under a `tasks:` key. All top-level task fields are supported per entry.
+
+```yaml
+tasks:
+  - name: "Step 1 — scaffold"
+    agent:
+      instructions: "Create the initial project structure."
+    priority: "high"
+
+  - name: "Step 2 — implement"
+    agent:
+      instructions: "Implement the feature described in docs/feature.md."
+    depends_on:
+      - "step-1-id"
+
+  - name: "Step 3 — test"
+    agent:
+      instructions: "Write and run integration tests."
+    depends_on:
+      - "step-2-id"
+    retry:
+      max_attempts: 2
+      backoff: "linear"
+```
+
+`ParseFile` / `Parse` detect the batch format automatically: if unmarshaling into `BatchFile` succeeds and produces at least one task, the batch path is used; otherwise single-task parsing is attempted.
+
+---
+
+## State Constants
+
+```go
+type State string
+
+const (
+    StatePending        State = "PENDING"
+    StateQueued         State = "QUEUED"
+    StateRunning        State = "RUNNING"
+    StateReady          State = "READY"
+    StateCompleted      State = "COMPLETED"
+    StateFailed         State = "FAILED"
+    StateTimedOut       State = "TIMED_OUT"
+    StateCancelled      State = "CANCELLED"
+    StateBudgetExceeded State = "BUDGET_EXCEEDED"
+    StateBlocked        State = "BLOCKED"
+)
+```
+
+| Constant              | String value      | Description                                                                                           |
+|-----------------------|-------------------|-------------------------------------------------------------------------------------------------------|
+| `StatePending`        | `PENDING`         | Newly created; awaiting acceptance for scheduling.                                                    |
+| `StateQueued`         | `QUEUED`          | Accepted by the scheduler and waiting for an available executor.                                      |
+| `StateRunning`        | `RUNNING`         | An agent is actively executing the task.                                                              |
+| `StateReady`          | `READY`           | Agent finished execution; output is awaiting human review/approval.                                   |
+| `StateCompleted`      | `COMPLETED`       | Approved and finished. **Terminal** — no further transitions.                                         |
+| `StateFailed`         | `FAILED`          | Execution failed (non-zero exit or unrecoverable error). Eligible for retry → `QUEUED`.              |
+| `StateTimedOut`       | `TIMED_OUT`       | Execution exceeded the configured `timeout`. Eligible for retry or resume → `QUEUED`.                |
+| `StateCancelled`      | `CANCELLED`       | Manually cancelled. Can be restarted by transitioning back to `QUEUED`.                               |
+| `StateBudgetExceeded` | `BUDGET_EXCEEDED` | The `max_budget_usd` cap was reached during execution. Eligible for retry → `QUEUED`.                |
+| `StateBlocked`        | `BLOCKED`         | The running agent posted a question requiring a human response before it can continue.                |
+
+---
+
+## State Machine
+
+### Valid Transitions
+
+| From                | To                  | Condition                                                         |
+|---------------------|---------------------|-------------------------------------------------------------------|
+| `PENDING`           | `QUEUED`            | Task accepted for scheduling.                                     |
+| `PENDING`           | `CANCELLED`         | Manually cancelled before queuing.                                |
+| `QUEUED`            | `RUNNING`           | An executor picks up the task.                                    |
+| `QUEUED`            | `CANCELLED`         | Manually cancelled while waiting in queue.                        |
+| `RUNNING`           | `READY`             | Agent finished; output ready for human review.                    |
+| `RUNNING`           | `COMPLETED`         | Agent finished successfully with no review required.              |
+| `RUNNING`           | `FAILED`            | Agent exited with an error.                                       |
+| `RUNNING`           | `TIMED_OUT`         | Wall-clock timeout exceeded.                                      |
+| `RUNNING`           | `CANCELLED`         | Manually cancelled mid-execution.                                 |
+| `RUNNING`           | `BUDGET_EXCEEDED`   | API spend reached `max_budget_usd`.                               |
+| `RUNNING`           | `BLOCKED`           | Agent posted a question requiring human input before continuing.  |
+| `READY`             | `COMPLETED`         | Reviewer approves the output.                                     |
+| `READY`             | `PENDING`           | Reviewer rejects the output; task sent back for revision.         |
+| `FAILED`            | `QUEUED`            | Retry: task re-queued for another attempt.                        |
+| `TIMED_OUT`         | `QUEUED`            | Retry or resume: task re-queued.                                  |
+| `CANCELLED`         | `QUEUED`            | Restart: task re-queued from scratch.                             |
+| `BUDGET_EXCEEDED`   | `QUEUED`            | Retry with adjusted budget.                                       |
+| `BLOCKED`           | `QUEUED`            | Answer provided; task re-queued to resume.                        |
+| `BLOCKED`           | `READY`             | Question resolved; output is ready for review.                    |
+| `COMPLETED`         | *(none)*            | Terminal state — no outgoing transitions.                         |
+
+---
+
+## Key Functions
+
+### `ParseFile`
+
+```go
+func ParseFile(path string) ([]Task, error)
+```
+
+Reads a YAML file at `path` and returns the parsed tasks. Supports both single-task and batch formats. Initialises defaults (UUID, priority, retry policy, initial state, timestamps) on every returned task.
+
+### `Parse`
+
+```go
+func Parse(data []byte) ([]Task, error)
+```
+
+Same as `ParseFile` but operates on raw YAML bytes instead of a file path.
+
+### `ValidTransition`
+
+```go
+func ValidTransition(from, to State) bool
+```
+
+Returns `true` if transitioning from `from` to `to` is permitted by the state machine. Used by `storage.DB.UpdateTaskState` to enforce valid transitions inside a transaction.
+
+### `Validate`
+
+```go
+func Validate(t *Task) error
+```
+
+Checks a task for constraint violations. Returns a `*ValidationError` (implementing `error`) listing all violations, or `nil` if the task is valid. All violations are collected before returning so callers receive the full list at once.
+
+---
+
+## Validation Rules
+
+| Rule                                                                         | Error message                                                      |
+|------------------------------------------------------------------------------|--------------------------------------------------------------------|
+| `name` must be non-empty.                                                    | `name is required`                                                 |
+| `agent.instructions` must be non-empty.                                      | `agent.instructions is required`                                   |
+| `agent.max_budget_usd` must be ≥ 0.                                          | `agent.max_budget_usd must be non-negative`                        |
+| `timeout` must be ≥ 0.                                                       | `timeout must be non-negative`                                     |
+| `retry.max_attempts` must be ≥ 1.                                            | `retry.max_attempts must be at least 1`                            |
+| `retry.backoff`, if set, must be `linear` or `exponential`.                  | `retry.backoff must be 'linear' or 'exponential'`                  |
+| `priority`, if set, must be `high`, `normal`, or `low`.                      | `invalid priority "…"; must be high, normal, or low`               |
+| `agent.permission_mode`, if set, must be one of the six recognised values.   | `invalid permission_mode "…"`                                      |