diff options
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/RAW_NARRATIVE.md | 390 | ||||
| -rw-r--r-- | docs/adr/002-task-state-machine.md | 4 | ||||
| -rw-r--r-- | docs/adr/004-multi-agent-routing-and-classification.md | 107 | ||||
| -rw-r--r-- | docs/adr/005-sandbox-execution-model.md | 130 | ||||
| -rw-r--r-- | docs/architecture.md | 392 | ||||
| -rw-r--r-- | docs/packages/executor.md | 447 | ||||
| -rw-r--r-- | docs/packages/storage.md | 209 | ||||
| -rw-r--r-- | docs/packages/task.md | 332 |
8 files changed, 2010 insertions, 1 deletions
diff --git a/docs/RAW_NARRATIVE.md b/docs/RAW_NARRATIVE.md new file mode 100644 index 0000000..a9c1492 --- /dev/null +++ b/docs/RAW_NARRATIVE.md @@ -0,0 +1,390 @@ +# Claudomator: Development Narrative + +This document is a chronological engineering history of the Claudomator project, +reconstructed from the git log, ADRs, and source code. + +--- + +## 1. Initial commit — core scaffolding (2e2b218) + +The project started with a single commit that established the full skeleton: +task model, executor, API server, CLI, storage layer, and reporter. The Go module +was `github.com/thepeterstone/claudomator`. The initial `Task` struct had a +`ClaudeConfig` field (later renamed to `AgentConfig`) holding the model, +instructions, `working_dir`, budget, permission mode, and tool lists. SQLite was +chosen as the storage backend (see ADR-001). The executor pool used a bounded +goroutine model. The API server was plain `net/http` with no external framework. +The CLI was Cobra. + +## 2. JSON tags, module rename, gitignore (8ee1fb5, 46ba3f5, 2bf317d) + +Early housekeeping: added JSON struct tags to all exported types, renamed the Go +module to its final identifier, and set up the `.gitignore` to exclude the compiled +binary and local Claude settings. + +## 3. Verbose flag, logs CLI command (0377c06, f27d4f7) + +Added `--verbose` to the Claude subprocess invocation and a `logs` CLI subcommand +for tailing execution output. + +## 4. Embedded web UI and HTTP wiring (135d8eb) + +The first web UI was embedded into the binary using `go:embed`. This made the +binary fully self-contained: no separate static file server was needed. + +## 5. CLAUDE.md, clickable fold, subtask support (bdcc33f, 3881f80, 704d007) + +Added the project-level `CLAUDE.md` guidance document. Added a clickable fold to +the web UI to expand hidden completed/failed tasks. Added `parent_task_id` to the +`Task` struct, `ListSubtasks` to storage, and `UpdateTask` — the foundational +subtask plumbing. + +## 6. Dependency waiting and planning preamble (f527972) + +The executor gained dependency waiting: tasks with `depends_on` now block in a +polling loop until all dependencies reach `COMPLETED`. Any dependency entering a +terminal failure state (`FAILED`, `TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`) +immediately fails the waiting task. + +The planning preamble was also introduced here — a system prompt prefix injected +into every task's instructions that explains to the agent how to write question +files, how to break tasks into subtasks via the `claudomator` CLI, and how to +commit all changes in git sandboxes. + +## 7. Elaborate, logs-stream, templates, subtask-list endpoints (74cc740) + +The API gained several new endpoints: +- `POST /api/elaborate` — calls Claude to expand a brief task description into + structured YAML. +- `GET /api/executions/{id}/stream` — live-streams the execution log. +- `GET /api/templates` / `POST /api/templates` — task template CRUD (later removed). +- `GET /api/tasks/{id}/subtasks` — lists subtasks for a parent task. + +## 8. Web UI: tabs, new task modal, templates panel (e8d1b80) + +The web UI got a tabbed layout (Running / Done / Templates), a modal for creating +new tasks with AI-drafted instructions, and a templates panel. This was the first +version of the UI that matched the current design. + +## 9. READY state for human-in-the-loop review (6511d6e) + +A critical design point: when a top-level task's runner exits successfully, the +task does not immediately go to `COMPLETED`. Instead it transitions to `READY`, +meaning it paused for the operator to review the agent's output and explicitly +accept or reject it. `READY → COMPLETED` requires `POST /api/tasks/{id}/accept`. +`READY → PENDING` (for re-running) requires `POST /api/tasks/{id}/reject`. + +This is specific to top-level tasks. Subtasks (`parent_task_id != ""`) bypass READY +and go directly to `COMPLETED` — only the root task requires human sign-off. + +## 10. Fix working_dir failures, hardcoded /root removed (3962597) + +Early deployments hardcoded `/root` as the base path for `working_dir`. This was +removed. `working_dir` is now validated to exist before the subprocess starts. + +## 11. Scripts, debug-execution, deploy (2bbae74, f7c6de4) + +Added the `scripts/` directory with `debug-execution` (inspects a specific +execution's logs) and `deploy` (builds and deploys the binary to the production +server). Added a CLI `start` command and the `version` package. + +## 12. Rescue from recovery branch — question/answer, rate limiting, start-next-task (cf83444) + +A batch of features rescued from a detached-work branch: +- **Question/answer flow (`BLOCKED` state)**: agents can write a `question.json` + file before exiting. The pool detects this and transitions the task to `BLOCKED`, + storing the question for the user. `POST /api/tasks/{id}/answer` resumes the + Claude session with the user's answer injected as the next message. +- **Rate limiting**: the pool tracks which agents are rate-limited and when. + `isRateLimitError` and `isQuotaExhausted` distinguish transient throttles from + 5-hour quota exhaustion. The per-agent `rateLimited` map stores the deadline. +- **Start-next-task script**: a shell script that picks the highest-priority pending + task and starts it. + +## 13. Accept/Reject for READY tasks, Start Next button in UI (9e790e3) + +The web UI gained explicit Accept/Reject buttons for tasks in the `READY` state +and a "Start Next" button in the header that triggers the `start-next-task` script. + +## 14. Stream-level failure detection when claude exits 0 (4c0ee5c) + +Claude can exit 0 even when the task actually failed — for example when the +permission mode denies a tool_use and Claude exits politely. `parseStream` was +updated to detect `is_error: true` in the result message and +`tool_result.is_error: true` with permission-denial text, returning an error in +both cases so the task goes to `FAILED` rather than silently succeeding. + +## 15. Persist log paths at CreateExecution time (f8b5f25) + +Previously, `StdoutPath`, `StderrPath`, and `ArtifactDir` were only written to the +execution record at `UpdateExecution` time (after the subprocess finished). This +prevented live log tailing. Introduced the `LogPather` interface: runners that +implement `ExecLogDir(execID)` allow the pool to pre-populate paths before calling +`CreateExecution`, making them available for streaming before the process ends. + +## 16. bypassPermissions as executor default (a33211d) + +`permission_mode` defaults to `bypassPermissions` when not set in the task YAML. +This was a deliberate trade-off: unattended automation needs to proceed without +tool-use confirmation prompts. Operators can override per-task via `permission_mode`. + +## 17. Cancel endpoint and pool cancel mechanism (3672981) + +`POST /api/tasks/{id}/cancel` was implemented. The pool maintains a `cancels` map +from taskID to context cancel functions. Cancellation sends a SIGKILL to the +entire process group (via `syscall.Kill(-pgid, SIGKILL)`) to reap MCP servers and +bash children that the claude subprocess spawned. + +## 18. BLOCKED state, session resume, fix: persist session_id (7466b17, 40d9ace) + +The full BLOCKED cycle was wired end-to-end: +1. Agent writes `question.json` to `$CLAUDOMATOR_QUESTION_FILE` and exits. +2. Runner detects the file and returns `*BlockedError`. +3. Pool transitions task to `BLOCKED` and stores the question JSON. +4. User answers via `POST /api/tasks/{id}/answer`. +5. Pool calls `SubmitResume` with a new `Execution` carrying `ResumeSessionID` + and `ResumeAnswer`. +6. Runner invokes `claude --resume <session-id> -p <answer>`. + +A bug was found and fixed: `session_id` was not persisted in `UpdateExecution`, +causing the BLOCKED → answer → resume cycle to fail because `GetLatestExecution` +returned no session ID. + +## 19. Context.Background for resume execution; CANCELLED→QUEUED restart (7d4890c) + +Resume executions now use `context.Background()` instead of inheriting a potentially +stale context. `CANCELLED → QUEUED` was added as a valid transition so cancelled +tasks can be manually restarted. + +## 20. git sandbox execution, project_dir rename (1f36e23) + +The `working_dir` field was renamed to `project_dir` across all layers (task YAML, +storage, API, UI). When `project_dir` is set, the runner no longer executes +directly in that directory. Instead it: + +1. Detects whether `project_dir` is a git repo (initialising one if not). +2. Clones the repo into `/tmp/claudomator-sandbox-*` (using `--no-hardlinks` + to avoid permission issues with mixed-owner `.git/objects`). +3. Runs the agent in the sandbox clone. +4. After the agent exits, verifies no uncommitted changes remain and pushes + new commits to the canonical bare repo. +5. Removes the sandbox. + +On BLOCKED, the sandbox is preserved so the agent can resume where it left off +in the same working tree. + +Concurrent push conflicts (two sandboxes pushing at the same time) are handled +by a fetch-rebase-retry sequence. + +## 21. Storage: enforce valid state transitions in UpdateTaskState (8777bf2) + +`storage.DB.UpdateTaskState` now calls `task.ValidTransition` before writing. If +the transition is not allowed by the state machine, the function returns an error +and no write occurs. This is the enforcement point for the state machine invariants. + +## 22. Executor internal dispatch queue; remove at-capacity rejection (2cf6d97) + +The previous pool rejected `Submit` when all slots were taken. This was replaced +with an internal `workCh` channel and a `dispatch` goroutine: tasks submitted +while the pool is at capacity are buffered in the channel and picked up as soon +as a slot opens. `Submit` now only returns an error if the channel itself is full +(which requires an enormous backlog). + +## 23. API hardening — WebSocket auth, per-IP rate limiter, script registry (363fc9e, 417034b, 181a376) + +Several API reliability improvements: +- WebSocket connections now require an API token (if `SetAPIToken` was called) and + are capped at a configurable maximum number of clients. A ping/pong keepalive + prevents stale connections from accumulating. +- A per-IP rate limiter was added to the `/api/elaborate` and `/api/validate` + endpoints to prevent abuse. +- The scripts endpoints were collapsed into a generic `ScriptRegistry`: instead of + individual handlers per script, a single handler dispatches to registered scripts + by name. + +## 24. API: extend executions and log streaming endpoints (7914153) + +`GET /api/executions` gained filtering and sorting. `GET /api/executions/{id}/logs` +was added for fetching completed log files. Live streaming via SSE and the log +tail endpoint were polished. + +## 25. CLI: newLogger, shared HTTP client, report command (1ce83b6) + +CLI utilities consolidated: a shared logger constructor (`newLogger`), a shared +HTTP client, a default server URL (`http://localhost:8484`). Added the `report` +CLI subcommand for fetching execution summaries from the server. + +## 26. Generic agent architecture — transition from Claude-only (306482d to f2d6822) + +This was a major refactor over several commits: +1. `ClaudeConfig` was renamed to `AgentConfig` with a new `Type` field (`"claude"`, + `"gemini"`, etc.). +2. `Pool` was changed from holding a single `ClaudeRunner` to holding a + `map[string]Runner` — one runner per agent type. +3. `GeminiRunner` was implemented, mirroring `ClaudeRunner` but invoking the + `gemini` CLI. +4. The storage layer, API handlers, elaborate/validate endpoints, and all tests + were updated to use `AgentConfig`. +5. The web UI was updated to expose agent type selection. + +## 27. Gemini-based task classification and explicit load balancing (406247b) + +`Classifier` and `pickAgent` were introduced to automate agent and model selection: + +- **`pickAgent(SystemStatus)`** — explicit load balancing: picks the available + (non-rate-limited) agent with the fewest active tasks. Falls back to fewest-active + if all agents are rate-limited. +- **`Classifier`** — calls the Gemini CLI with a meta-prompt asking it to pick + the best model for the task. This is intentionally model-picks-model: use a fast, + cheap classifier to avoid wasting expensive tokens. + +After this commit the flow is: `execute()` → pick agent → call classifier → set +`t.Agent.Type` and `t.Agent.Model` → dispatch to runner. + +## 28. ADR-003: Security Model (93a4c85) + +The security model was documented formally: no auth, permissive CORS, `bypassPermissions` +as default, and the known risk inventory (see `docs/adr/003-security-model.md`). + +## 29. Various web UI improvements (91fd904, 7b53b9e, 560f42b, cdfdc30) + +Running tasks became the default view. A "Running view" showing currently running +tasks alongside the 24h execution history was added. Agent type and model were +surfaced on running task cards. The Done/Interrupted tabs were filtered to 24h. + +## 30. Quota exhaustion detection from stream (076c0fa) + +Previously, quota exhaustion (the 5-hour usage limit) was treated identically to +generic failures. `isQuotaExhausted` was introduced to distinguish it: quota +exhaustion maps to `BUDGET_EXCEEDED` and sets a 5-hour rate-limit deadline on the +agent, rather than failing the task with a generic error. + +## 31. Sandbox fixes — push via bare repo, fetch/rebase (cfbcc7b, f135ab8, 07061ac) + +The sandbox teardown strategy was revised: instead of pushing directly into the +working copy (which fails for non-bare repos), the sandbox pushes to a bare repo +(`remote "local"` or `remote "origin"`) and the working copy is pulled separately +by the developer. This avoids permission errors from mixed-owner `.git/objects`. +The `--no-hardlinks` clone flag was added to prevent object sharing. + +## 32. BLOCKED→READY for parent tasks with subtasks (441ed9e, c8e3b46) + +When a top-level task exits the runner successfully but has subtasks, it transitions +to `BLOCKED` (waiting for subtasks to finish) rather than `READY`. A new +`maybeUnblockParent` function is called every time a subtask completes: if all +siblings are `COMPLETED`, the parent transitions `BLOCKED → READY` and is +presented for operator review. + +## 33. Stale RUNNING task recovery on server startup (9159572) + +`Pool.RecoverStaleRunning()` was added and called from `cli.serve`. It queries for +tasks still in `RUNNING` state (left over from a previous server crash) and marks +them `FAILED`, closing their open execution records. This prevents stuck tasks +after server restarts. + +## 34. API: configurable mockRunner, async error-path tests (b33566b) + +The `api` test suite was hardened with a configurable `mockRunner` that can be +injected into the test server. Async error paths (runner returns an error, DB +update fails mid-execution) were now exercised in tests. + +## 35. Storage: missing indexes, ListRecentExecutions tests, DeleteTask atomicity (8b6c97e, 3610409) + +Several storage correctness fixes: +- `idx_tasks_state`, `idx_tasks_parent_task_id`, `idx_executions_status`, + `idx_executions_task_id`, and `idx_executions_start_time` indexes were added. +- `ListRecentExecutions` had an off-by-one that caused it to miss recent executions; + tests were added to catch this. +- `DeleteTask` was made atomic using a recursive CTE to delete the task and all + its subtasks in a single transaction. + +## 36. API: validate ?state= param, standardize operation response shapes (933af81) + +`GET /api/tasks?state=XYZ` now validates the state value. All mutating operation +responses (`/run`, `/cancel`, `/accept`, `/reject`, `/answer`) were standardised +to return `{"status": "ok"}` on success. + +## 37. Re-classify on manual restart; handleRunResult extraction (0676f0f, 7d6943c) + +Tasks that are manually restarted (from `FAILED`, `CANCELLED`, etc.) now go through +classification again so they pick up the latest agent/model selection logic. The +post-run error classification block was extracted into `handleRunResult` — a shared +helper called by both `execute` and `executeResume` — eliminating 60+ lines of +duplication. + +## 38. Legacy Claude field removed (b4371d0, a782bbf) + +The last remnants of the original `ClaudeConfig` type and backward-compat `working_dir` +shim were removed. The schema is now fully generic. + +## 39. Kill-goroutine safety documentation, goroutine-leak test (3b4c50e) + +A documented invariant was added to the `execOnce` goroutine that kills the +subprocess process group: it cannot block indefinitely. Tests were added to verify +no goroutine leak occurs when a task is cancelled. + +## 40. Rate-limit avoidance in classifier; model list updates (8ec366d, fc1459b) + +The classifier now skips calling itself if the selected agent is rate-limited, +avoiding a redundant Gemini API call when the rate-limited agent is already known. +The model list was updated to Claude 4.x (`claude-sonnet-4-6`, `claude-opus-4-6`, +`claude-haiku-4-5-20251001`) and current Gemini models (`gemini-2.5-flash-lite`, +`gemini-2.5-flash`, `gemini-2.5-pro`). + +## 41. Map leak fixes — activePerAgent and rateLimited (7c7dd2b) + +Two map leak bugs were fixed in the pool: +- `activePerAgent[agentType]` was decremented but never deleted when the count hit + zero, so inactive agents accumulated as dead entries. +- Expired `rateLimited[agentType]` entries were not deleted, so the map grew + unboundedly over long runs. + +## 42. Sandbox teardown: remove working-copy pull, retry push on concurrent rejection (5c85624) + +The sandbox teardown removed the `git pull` into the working copy (which was failing +due to mixed-owner object dirs). The retry-push-on-rejection path was tightened to +detect `"fetch first"` and `"non-fast-forward"` as the rejection signals. + +## 43. Explicit load balancing separated from classification (e033504) + +Previously the `Classifier` both picked the agent and selected the model. This was +split: `pickAgent` is deterministic code that picks the agent from the registered +runners using the load-balancing algorithm. The `Classifier` only picks the model +for the already-selected agent. This makes load balancing reliable and fast even +when the Gemini classifier is unavailable. + +## 44. Session ID fix on second block-and-resume cycle (65c7638) + +A bug was found where the second BLOCKED→answer→resume cycle passed the wrong +`--resume` session ID to Claude. The fix ensures that resume executions propagate +the original session ID rather than the new execution's UUID. + +## 45. validTransitions promoted to package-level var (3226af3) + +`validTransitions` was promoted to a package-level variable in `internal/task/task.go` +for clarity and potential reuse outside the package. ADR-002 was updated to reflect +the current state machine including the `BLOCKED→READY` transition for parent tasks. + +--- + +## Feature Summary (current state) + +| Feature | Status | +|---|---| +| Task YAML parsing, batch files | Done | +| SQLite persistence | Done | +| REST API (CRUD + lifecycle) | Done | +| WebSocket real-time events | Done | +| Claude subprocess execution | Done | +| Gemini subprocess execution | Done | +| Explicit load balancing (pickAgent) | Done | +| Gemini-based model classification | Done | +| BLOCKED / question-answer / resume | Done | +| git sandbox isolation | Done | +| Subtask creation and unblocking | Done | +| READY state / human accept-reject | Done | +| Rate-limit and quota tracking | Done | +| Stale RUNNING recovery on startup | Done | +| Per-IP rate limiter on elaborate | Done | +| Web UI (PWA) | Done | +| Push notifications (PWA) | Planned | diff --git a/docs/adr/002-task-state-machine.md b/docs/adr/002-task-state-machine.md index 1d41619..310c337 100644 --- a/docs/adr/002-task-state-machine.md +++ b/docs/adr/002-task-state-machine.md @@ -23,7 +23,7 @@ execution (subprocess), user interaction (review, Q&A), retries, and cancellatio | `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` (terminal) | | `BLOCKED` | Agent paused and wrote a question file; awaiting user answer | -Terminal states with no outgoing transitions: `COMPLETED`, `CANCELLED`, `BUDGET_EXCEEDED`. +True terminal state (no outgoing transitions): `COMPLETED`. All other non-success states (`CANCELLED`, `FAILED`, `TIMED_OUT`, `BUDGET_EXCEEDED`) may transition back to `QUEUED` to restart or retry. ## State Transition Diagram @@ -77,6 +77,8 @@ Terminal states with no outgoing transitions: `COMPLETED`, `CANCELLED`, `BUDGET_ | `READY` | `PENDING` | `POST /api/tasks/{id}/reject` (with optional comment) | | `FAILED` | `QUEUED` | Retry (manual re-run via `POST /api/tasks/{id}/run`) | | `TIMED_OUT` | `QUEUED` | `POST /api/tasks/{id}/resume` (resumes with session ID) | +| `CANCELLED` | `QUEUED` | Restart (manual re-run via `POST /api/tasks/{id}/run`) | +| `BUDGET_EXCEEDED` | `QUEUED` | Retry (manual re-run via `POST /api/tasks/{id}/run`) | | `BLOCKED` | `QUEUED` | `POST /api/tasks/{id}/answer` (resumes with user answer) | | `BLOCKED` | `READY` | All subtasks reached `COMPLETED` (parent task unblocked by subtask completion watcher) | diff --git a/docs/adr/004-multi-agent-routing-and-classification.md b/docs/adr/004-multi-agent-routing-and-classification.md new file mode 100644 index 0000000..7afb10d --- /dev/null +++ b/docs/adr/004-multi-agent-routing-and-classification.md @@ -0,0 +1,107 @@ +# ADR-004: Multi-Agent Routing and Gemini-Based Classification + +## Status +Accepted + +## Context + +Claudomator started as a Claude-only system. As Gemini became a viable coding +agent, the architecture needed to support multiple agent backends without requiring +operators to manually select an agent or model for each task. + +Two distinct problems needed solving: + +1. **Which agent should run this task?** — Claude and Gemini have different API + quotas and rate limits. When Claude is rate-limited, tasks should flow to + Gemini automatically. +2. **Which model tier should the agent use?** — Both agents offer a spectrum from + fast/cheap to slow/powerful models. Using the wrong tier wastes money or + produces inferior results. + +## Decision + +The two problems are solved independently: + +### Agent selection: explicit load balancing in code (`pickAgent`) + +`pickAgent(SystemStatus)` selects the agent with the fewest active tasks, +preferring non-rate-limited agents. The algorithm is: + +1. First pass: consider only non-rate-limited agents; pick the one with the + fewest active tasks (alphabetical tie-break for determinism). +2. Fallback: if all agents are rate-limited, pick the least-active regardless + of rate-limit status. + +This is deterministic code, not an AI call. It runs in-process with no I/O and +cannot fail in ways that would block task execution. + +### Model selection: Gemini-based classifier (`Classifier`) + +Once the agent is selected, `Classifier.Classify` invokes the Gemini CLI with +`gemini-2.5-flash-lite` to select the best model tier for the task. The classifier +receives the task name, instructions, and the required agent type, and returns +a `Classification` with `agent_type`, `model`, and `reason`. + +The classifier uses a cheap, fast model for classification to minimise the cost +overhead. The response is parsed from JSON, with fallback handling for markdown +code blocks and credential noise in the output. + +### Separation of concerns + +These two decisions were initially merged (the classifier picked both agent and +model). They were separated in commit `e033504` because: + +- Load balancing must be reliable even when the Gemini API is unavailable. +- Classifier failures are non-fatal: if classification fails, the pool logs the + error and proceeds with the agent's default model. + +### Re-classification on manual restart + +When an operator manually restarts a task from a non-`QUEUED` state (e.g. `FAILED`, +`CANCELLED`), the task goes through `execute()` again and is re-classified. This +ensures restarts pick up any changes to agent availability or rate-limit status. + +## Rationale + +- **AI-picks-model**: the model selection decision is genuinely complex and + subjective. Using an AI classifier avoids hardcoding heuristics that would need + constant tuning. +- **Code-picks-agent**: load balancing is a scheduling problem with measurable + inputs (active task counts, rate-limit deadlines). Delegating this to an AI + would introduce unnecessary non-determinism and latency. +- **Gemini for classification**: using Gemini to classify Claude tasks (and vice + versa) prevents circular dependencies. Using the cheapest available Gemini model + keeps classification cost negligible. + +## Alternatives Considered + +- **Operator always picks agent and model**: too much manual overhead. Operators + should be able to submit tasks without knowing which agent is currently + rate-limited. +- **Single classifier picks both agent and model**: rejected after operational + experience showed that load balancing needs to work even when the Gemini API + is unavailable or returning errors. +- **Round-robin agent selection**: simpler but does not account for rate limits + or imbalanced task durations. + +## Consequences + +- Agent selection is deterministic and testable without mocking AI APIs. +- Classification failures are logged but non-fatal; the task runs with the + agent's default model. +- The classifier adds ~1–2 seconds of latency to task start (one Gemini API call). +- Tasks with `agent.type` pre-set in YAML still go through load balancing; + `pickAgent` may override the requested type if the requested type is not a + registered runner. This is by design: the operator's type hint is overridden + by the load balancer to ensure tasks are always routable. + +## Relevant Code Locations + +| Concern | File | +|---|---| +| `pickAgent` | `internal/executor/executor.go` | +| `Classifier` | `internal/executor/classifier.go` | +| Load balancing in `execute()` | `internal/executor/executor.go` | +| Re-classification gate | `internal/api/server.go` (handleRunTask) | +| `pickAgent` tests | `internal/executor/executor_test.go` | +| `Classifier` mock test | `internal/executor/classifier_test.go` | diff --git a/docs/adr/005-sandbox-execution-model.md b/docs/adr/005-sandbox-execution-model.md new file mode 100644 index 0000000..b374561 --- /dev/null +++ b/docs/adr/005-sandbox-execution-model.md @@ -0,0 +1,130 @@ +# ADR-005: Git Sandbox Execution Model + +## Status +Accepted + +## Context + +Tasks that modify source code need a safe execution environment. Running agents +directly in the canonical working copy creates several problems: + +1. **Concurrent corruption**: multiple agents running in the same directory stomp + on each other's changes. +2. **Partial work leaks**: if a task is cancelled mid-run, half-written files + remain in the working tree, blocking other work. +3. **No rollback**: a failed agent may leave the codebase in a broken state with + no clean way to recover without manual `git reset`. +4. **Audit trail**: changes made by an agent should be visible as discrete, + attributable git commits — not as an anonymous blob of working-tree noise. + +## Decision + +When a task has `agent.project_dir` set, `ClaudeRunner.Run` executes the agent +in an isolated git clone (a "sandbox") rather than in the project directory +directly. + +### Sandbox lifecycle + +``` +project_dir (canonical working copy) + | + | git clone --no-hardlinks <clone-source> /tmp/claudomator-sandbox-* + | + v +sandbox (temp clone) + | + | agent runs here; commits its changes + | + | git push (to bare repo "local" or "origin") + | +teardown: verify no uncommitted changes, remove sandbox dir +``` + +### Clone source + +`sandboxCloneSource` prefers a remote named `"local"` (a local bare repo). +If not present it falls back to the `"origin"` remote. Using a bare repo +accepts pushes cleanly; pushing to a non-bare working copy fails when the +receiving branch is checked out. + +### Uncommitted-change enforcement + +Before teardown, the runner runs `git status --porcelain` in the sandbox. +If any uncommitted changes are detected, the task is failed with an error +message listing the files. The sandbox is **preserved** so the operator +can inspect or recover the partial work. The error message includes the +sandbox path. + +### Concurrent push conflicts + +If two sandboxes try to push at the same time, the second push is rejected +(`"fetch first"` or `"non-fast-forward"` in the error output). The runner +detects these signals and performs a fetch → rebase → retry sequence, up to +a fixed retry limit, before giving up. + +### BLOCKED state and sandbox preservation + +When an agent exits with a `question.json` file (entering the `BLOCKED` +state), the sandbox is **not** torn down. The preserved sandbox allows the +resumed execution to pick up the same working tree state, including any +in-progress file changes made before the agent asked its question. + +Resume executions (`SubmitResume`) skip sandbox setup entirely and run +directly in `project_dir`, passing `--resume <session-id>` to the agent +so Claude can continue its previous conversation. + +### Session ID propagation on resume + +A subtle bug was found and fixed: when a resumed execution is itself blocked +again (a second BLOCKED→answer→resume cycle), the new execution record must +carry the **original** `ResumeSessionID`, not the new execution's own UUID. +If the wrong ID is used, `claude --resume` fails with "No conversation found". +The fix is in `ClaudeRunner.Run`: if `e.ResumeSessionID != ""`, use it as +`e.SessionID` rather than `e.ID`. + +## Rationale + +- **`--no-hardlinks`**: git defaults to hardlinking objects between clone and + source when both are on the same filesystem. This causes permission errors + when the source is owned by a different user (e.g. `www-data` vs. `root`). + The flag forces a full copy. +- **Bare repo for push target**: non-bare repos reject pushes to checked-out + branches. A bare repo (`git init --bare`) accepts all pushes safely. +- **Preserve sandbox on failure**: partial agent work may be valuable for + debugging or resumption. Destroying it immediately on failure was considered + and rejected. +- **Agent must commit**: requiring the agent to commit all changes before + exiting ensures git history is always the source of truth. The enforcement + check (uncommitted files → FAILED) makes this invariant observable. + +## Alternatives Considered + +- **Run directly in working copy**: rejected because of concurrent corruption + and partial-work leakage. +- **Copy files instead of git clone**: rejected because the agent needs a + working git history (for `git log`, `git blame`, and to push commits back). +- **Docker/container isolation**: considered for stronger isolation but + rejected due to operational complexity, dependency on container runtime, + and inability to use the host's claude/gemini credentials. + +## Consequences + +- Tasks without `project_dir` are unaffected; they run in whatever working + directory the server process inherited. +- If a sandbox's push repeatedly fails (e.g. due to a bare repo that is + itself broken), the task is failed with the sandbox preserved. +- If `/tmp` runs out of space (many large sandboxes), tasks will fail at + clone time. This is a known operational risk with no current mitigation. +- The `project_dir` field in task YAML must point to a git repository with + a configured `"local"` or `"origin"` remote that accepts pushes. + +## Relevant Code Locations + +| Concern | File | +|---|---| +| Sandbox setup/teardown | `internal/executor/claude.go` | +| `setupSandbox`, `teardownSandbox` | `internal/executor/claude.go` | +| `sandboxCloneSource` | `internal/executor/claude.go` | +| Resume skips sandbox | `internal/executor/claude.go` (Run) | +| Session ID propagation fix | `internal/executor/claude.go` (Run) | +| Sandbox tests | `internal/executor/claude_test.go` | diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..27c7601 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,392 @@ +# Claudomator — System Architecture + +## 1. System Purpose and Design Goals + +Claudomator is a local developer tool that captures tasks, dispatches them to AI agents (Claude, +Gemini), and reports results. Its primary use case is unattended, automated execution of agent +tasks with real-time status streaming to a mobile Progressive Web App. + +**Design goals:** + +- **Single binary, zero runtime deps** — Go for the backend; CGo only for SQLite. +- **Bounded concurrency** — a pool of goroutines prevents unbounded subprocess spawning. +- **Durability** — all task state survives server restarts (SQLite + WAL mode). +- **Real-time feedback** — WebSocket pushes task completion events to connected clients. +- **Multi-agent routing** — deterministic load balancing across Claude and Gemini; AI-driven + model-tier selection via a cheap Gemini classifier. +- **Git-isolated execution** — tasks that modify source code run in a temporary git clone + (sandbox) to prevent concurrent corruption and partial-work leakage. +- **Review gate** — top-level tasks wait in `READY` state for operator accept/reject before + reaching `COMPLETED`. + +See [ADR-001](adr/001-language-and-architecture.md) for the language and architecture rationale. + +--- + +## 2. High-Level Architecture + +```mermaid +flowchart TD + CLI["CLI\ncmd/claudomator\n(cobra)"] + API["HTTP API\ninternal/api\nREST + WebSocket"] + Pool["Executor Pool\ninternal/executor.Pool\n(bounded goroutines)"] + Classifier["Gemini Classifier\ninternal/executor.Classifier"] + ClaudeRunner["ClaudeRunner\ninternal/executor.ClaudeRunner"] + GeminiRunner["GeminiRunner\ninternal/executor.GeminiRunner"] + Sandbox["Git Sandbox\n/tmp/claudomator-sandbox-*"] + Subprocess["AI subprocess\nclaude -p / gemini"] + SQLite["SQLite\ntasks.db"] + LogFiles["Log Files\n~/.claudomator/executions/"] + Hub["WebSocket Hub\ninternal/api.Hub"] + Clients["Clients\nbrowser / mobile PWA"] + Notifier["Notifier\ninternal/notify"] + + CLI -->|run / serve| API + CLI -->|direct YAML run| Pool + API -->|POST /api/tasks/run| Pool + Pool -->|pickAgent + Classify| Classifier + Pool -->|Submit / SubmitResume| ClaudeRunner + Pool -->|Submit / SubmitResume| GeminiRunner + ClaudeRunner -->|git clone| Sandbox + Sandbox -->|claude -p| Subprocess + GeminiRunner -->|gemini| Subprocess + Subprocess -->|stream-json stdout| LogFiles + Pool -->|UpdateTaskState| SQLite + Pool -->|resultCh| API + API -->|Broadcast| Hub + Hub -->|fan-out| Clients + API -->|Notify| Notifier +``` + +--- + +## 3. Component Table + +| Package | Role | Key Exported Types | +|---|---|---| +| `internal/task` | `Task` struct, YAML parsing, state machine, validation | `Task`, `AgentConfig`, `RetryConfig`, `State`, `Priority`, `ValidTransition` | +| `internal/executor` | Bounded goroutine pool; subprocess manager; multi-agent routing; classification | `Pool`, `ClaudeRunner`, `GeminiRunner`, `Classifier`, `Runner`, `Result`, `BlockedError`, `QuestionRegistry` | +| `internal/storage` | SQLite wrapper; auto-migrating schema; all task and execution CRUD | `DB`, `Execution`, `TaskFilter` | +| `internal/api` | HTTP server (REST + WebSocket); result forwarding; elaboration; log streaming | `Server`, `Hub` | +| `internal/reporter` | Formats and emits execution results (text, HTML) | `Reporter`, `TextReporter`, `HTMLReporter` | +| `internal/config` | TOML config loading; data-directory layout | `Config` | +| `internal/cli` | Cobra CLI commands (`run`, `serve`, `list`, `status`, `init`) | `RootCmd` | +| `internal/notify` | Webhook notifier for task completion events | `Notifier`, `WebhookNotifier`, `Event` | +| `web` | Embedded PWA static files (served by `internal/api`) | `Files` (embed.FS) | +| `version` | Build-time version string | `Version` | + +--- + +## 4. Package Dependency Graph + +```mermaid +graph LR + cli["internal/cli"] + api["internal/api"] + executor["internal/executor"] + storage["internal/storage"] + task["internal/task"] + config["internal/config"] + reporter["internal/reporter"] + notify["internal/notify"] + web["web"] + version["version"] + + cli --> api + cli --> executor + cli --> storage + cli --> config + cli --> version + api --> executor + api --> storage + api --> task + api --> notify + api --> web + executor --> storage + executor --> task + reporter --> task + reporter --> storage + storage --> task +``` + +--- + +## 5. Task Execution Pipeline + +The following numbered steps trace a task from API submission to final state, with file and line +references to the key logic in each step. + +1. **Task creation** — `POST /api/tasks` calls `task.Validate` and `storage.DB.CreateTask`. + Task is written to SQLite in `PENDING` state. + (`internal/api/server.go:349`, `internal/task/parse.go`, `internal/storage/db.go`) + +2. **Run request** — `POST /api/tasks/{id}/run` calls `storage.DB.ResetTaskForRetry` (validates + the `PENDING → QUEUED` transition) then `executor.Pool.Submit`. + (`internal/api/server.go:460`, `internal/executor/executor.go:125`) + +3. **Pool dispatch** — The `dispatch` goroutine reads from `workCh`, waits for a free slot + (blocks on `doneCh` if at capacity), then spawns `go execute(ctx, task)`. + (`internal/executor/executor.go:102`) + +4. **Agent selection** — `pickAgent(SystemStatus)` selects the available agent with the fewest + active tasks (deterministic, no I/O). If the pool has a `Classifier`, it invokes + `Classifier.Classify` (one Gemini API call) to select the model tier; failures are non-fatal. + (`internal/executor/executor.go:349`, `internal/executor/executor.go:396`) + +5. **Dependency wait** — If `t.DependsOn` is non-empty, `waitForDependencies` polls SQLite + every 5 s until all dependencies reach `COMPLETED` (or a terminal failure state). + (`internal/executor/executor.go:642`) + +6. **Execution record created** — A new `storage.Execution` row is inserted with `RUNNING` + status. Log paths (`stdout.log`, `stderr.log`) are pre-populated via `LogPather` so they are + immediately available for tailing. + (`internal/executor/executor.go:483`) + +7. **Subprocess launch** — `ClaudeRunner.Run` (or `GeminiRunner.Run`) builds the CLI argument + list and calls `exec.CommandContext`. If `project_dir` is set and this is not a resume + execution, `setupSandbox` clones the project to `/tmp/claudomator-sandbox-*` first. + (`internal/executor/claude.go:63`, `internal/executor/claude.go:setupSandbox`) + +8. **Output streaming** — stdout is written to `<LogDir>/<execID>/stdout.log`; the runner + concurrently parses the `stream-json` lines for cost and session ID. + (`internal/executor/claude.go`) + +9. **Execution outcome → state** — After the subprocess exits, `handleRunResult` maps the error + type to a final task state and calls `storage.DB.UpdateTaskState`. + (`internal/executor/executor.go:256`) + + | Outcome | Final state | + |---|---| + | `runner.Run` → `nil`, top-level, no subtasks | `READY` | + | `runner.Run` → `nil`, top-level, has subtasks | `BLOCKED` | + | `runner.Run` → `nil`, subtask | `COMPLETED` | + | `runner.Run` → `*BlockedError` (question file) | `BLOCKED` | + | `ctx.Err() == DeadlineExceeded` | `TIMED_OUT` | + | `ctx.Err() == Canceled` | `CANCELLED` | + | quota exhausted | `BUDGET_EXCEEDED` | + | any other error | `FAILED` | + +10. **Result broadcast** — The pool emits a `*Result` to `resultCh`. `Server.forwardResults` + reads it, marshals a `task_completed` JSON event, and calls `hub.Broadcast`. + (`internal/api/server.go:123`, `internal/api/server.go:129`) + +11. **Sandbox teardown** — If a sandbox was used and no uncommitted changes remain, + `teardownSandbox` removes the temp directory. If uncommitted changes are detected, the task + fails and the sandbox is preserved for inspection. + (`internal/executor/claude.go:teardownSandbox`) + +12. **Review gate** — Operator calls `POST /api/tasks/{id}/accept` (`READY → COMPLETED`) or + `POST /api/tasks/{id}/reject` (`READY → PENDING`). + (`internal/api/server.go:487`, `internal/api/server.go:507`) + +--- + +## 6. Task State Machine + +```mermaid +stateDiagram-v2 + [*] --> PENDING : task created + + PENDING --> QUEUED : POST /run + PENDING --> CANCELLED : POST /cancel + + QUEUED --> RUNNING : pool goroutine starts + QUEUED --> CANCELLED : POST /cancel + + RUNNING --> READY : exit 0, top-level, no subtasks + RUNNING --> BLOCKED : exit 0, top-level, has subtasks + RUNNING --> BLOCKED : question.json written + RUNNING --> COMPLETED : exit 0, subtask + RUNNING --> FAILED : exit non-zero / stream error + RUNNING --> TIMED_OUT : context deadline exceeded + RUNNING --> CANCELLED : context cancelled + RUNNING --> BUDGET_EXCEEDED : quota exhausted + + READY --> COMPLETED : POST /accept + READY --> PENDING : POST /reject + + BLOCKED --> QUEUED : POST /answer + BLOCKED --> READY : all subtasks COMPLETED + + FAILED --> QUEUED : POST /run (retry) + TIMED_OUT --> QUEUED : POST /resume + CANCELLED --> QUEUED : POST /run (restart) + BUDGET_EXCEEDED --> QUEUED : POST /run (retry) + + COMPLETED --> [*] +``` + +**State definitions** (`internal/task/task.go:9`): + +| State | Meaning | +|---|---| +| `PENDING` | Created; not yet submitted for execution | +| `QUEUED` | Submitted to pool; waiting for a goroutine slot | +| `RUNNING` | Subprocess actively executing | +| `READY` | Top-level task done; awaiting operator accept/reject | +| `COMPLETED` | Fully done (only true terminal state) | +| `FAILED` | Execution error; eligible for retry | +| `TIMED_OUT` | Exceeded configured timeout; resumable | +| `CANCELLED` | Explicitly cancelled by operator | +| `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` | +| `BLOCKED` | Agent wrote a `question.json`, or parent waiting for subtasks | + +`ValidTransition(from, to State) bool` enforces the allowed edges at runtime before every state +write. (`internal/task/task.go:113`) + +--- + +## 7. WebSocket Broadcast Flow + +```mermaid +sequenceDiagram + participant Runner as ClaudeRunner / GeminiRunner + participant Pool as executor.Pool + participant API as api.Server (forwardResults) + participant Hub as api.Hub + participant Client1 as WebSocket client 1 + participant Client2 as WebSocket client 2 + + Runner->>Pool: runner.Run() returns + Pool->>Pool: handleRunResult() — sets exec.Status + Pool->>SQLite: UpdateTaskState + UpdateExecution + Pool->>Pool: resultCh <- &Result{...} + API->>Pool: <-pool.Results() + API->>API: marshal task_completed JSON event + API->>Hub: hub.Broadcast(data) + Hub->>Client1: ws.Write(data) + Hub->>Client2: ws.Write(data) + API->>Notifier: notifier.Notify(Event{...}) [if set] +``` + +**Event payload** (JSON): +```json +{ + "type": "task_completed", + "task_id": "<uuid>", + "status": "READY | COMPLETED | FAILED | ...", + "exit_code": 0, + "cost_usd": 0.042, + "error": "", + "timestamp": "2026-03-11T12:00:00Z" +} +``` + +The `Hub` also emits `task_question` events via `Server.BroadcastQuestion` when an agent uses +an interactive question tool (currently unused in the primary file-based `BLOCKED` flow). + +WebSocket endpoint: `GET /api/ws`. Supports optional bearer-token auth when `--api-token` is +configured. Up to 1000 concurrent clients; periodic 30-second pings detect dead connections. +(`internal/api/websocket.go`) + +--- + +## 8. Subtask / Parent-Task Dependency Resolution + +Claudomator supports two distinct dependency mechanisms: + +### 8a. `depends_on` — explicit task ordering + +Tasks declare `depends_on: [<task-id>, ...]` in their YAML or creation payload. When the +executor goroutine starts, `waitForDependencies` polls SQLite every 5 seconds until all listed +tasks reach `COMPLETED`. If any dependency reaches a terminal failure state (`FAILED`, +`TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`), the waiting task transitions to `FAILED` +immediately. (`internal/executor/executor.go:642`) + +### 8b. Parent / subtask blocking (`BLOCKED` state) + +A top-level task that creates subtasks (tasks with `parent_task_id` pointing back to it) +transitions to `BLOCKED` — not `READY` — when its own runner exits successfully. This allows +an agent to dispatch subtasks and then wait for them. + +``` +Parent task runner exits 0 + ├── no subtasks → READY (normal review gate) + └── has subtasks → BLOCKED (waiting for subtasks) + │ + Each subtask completes → COMPLETED + │ + All subtasks COMPLETED? + ├── yes → maybeUnblockParent() → parent READY + └── no → parent stays BLOCKED +``` + +`maybeUnblockParent(parentID)` is called every time a subtask transitions to `COMPLETED`. It +loads all subtasks and checks whether every one is `COMPLETED`. If so, it calls +`UpdateTaskState(parentID, StateReady)`. (`internal/executor/executor.go:616`) + +The `BLOCKED` state also covers the interactive question flow: when `ClaudeRunner.Run` detects +a `question.json` file in the execution log directory, it returns a `*BlockedError` containing +the question JSON and session ID. The pool stores the question in the `tasks.question_json` +column and the session ID in the `executions.session_id` column. `POST /api/tasks/{id}/answer` +resumes the task by calling `pool.SubmitResume` with a new `Execution` carrying +`ResumeSessionID` and `ResumeAnswer`. (`internal/executor/claude.go:103`, +`internal/api/server.go:221`) + +--- + +## 9. External Go Dependencies + +| Module | Version | Purpose | +|---|---|---| +| `github.com/mattn/go-sqlite3` | v1.14.33 | CGo SQLite driver (requires C compiler) | +| `github.com/google/uuid` | v1.6.0 | UUID generation for task and execution IDs | +| `github.com/spf13/cobra` | v1.10.2 | CLI command framework | +| `github.com/BurntSushi/toml` | v1.6.0 | TOML config file parsing | +| `golang.org/x/net` | v0.49.0 | `golang.org/x/net/websocket` — WebSocket server | +| `gopkg.in/yaml.v3` | v3.0.1 | YAML task definition parsing | + +--- + +## 10. Related Documentation + +### Architectural Decision Records + +| ADR | Title | Summary | +|---|---|---| +| [ADR-001](adr/001-language-and-architecture.md) | Go + SQLite + WebSocket Architecture | Language choice, pipeline design, storage and API rationale | +| [ADR-002](adr/002-task-state-machine.md) | Task State Machine Design | All 10 states, transition table, side effects, known edge cases | +| [ADR-003](adr/003-security-model.md) | Security Model | Trust boundary, no-auth posture, known risks, hardening checklist | +| [ADR-004](adr/004-multi-agent-routing-and-classification.md) | Multi-Agent Routing | `pickAgent` load balancing, Gemini-based model classifier | +| [ADR-005](adr/005-sandbox-execution-model.md) | Git Sandbox Execution Model | Isolated git clone per task, push-back flow, BLOCKED preservation | + +### Package-Level Docs + +Per-package design notes live in [`docs/packages/`](packages/) (in progress). + +### Task YAML Reference + +```yaml +name: "My Task" +agent: + type: "claude" # "claude" | "gemini"; optional — load balancer may override + model: "sonnet" # model tier hint; optional — classifier may override + instructions: | + Do something useful. + project_dir: "/workspace/myproject" # if set, runs in a git sandbox + max_budget_usd: 1.00 + permission_mode: "bypassPermissions" # default + allowed_tools: ["Bash", "Read"] + context_files: ["README.md"] +timeout: "15m" +priority: "normal" # high | normal | low +tags: ["ci", "backend"] +depends_on: ["<task-uuid>"] # explicit ordering +parent_task_id: "<task-uuid>" # set by parent agent when creating subtasks +``` + +Batch files wrap multiple tasks under a `tasks:` key and are accepted by `claudomator run`. + +### Storage Schema + +Two tables auto-migrated on `storage.Open()`: + +- **`tasks`** — `id`, `name`, `description`, `config_json` (AgentConfig), `priority`, + `timeout_ns`, `retry_json`, `tags_json`, `depends_on_json`, `parent_task_id`, `state`, + `question_json`, `rejection_comment`, `created_at`, `updated_at` +- **`executions`** — `id`, `task_id`, `start_time`, `end_time`, `exit_code`, `status`, + `stdout_path`, `stderr_path`, `artifact_dir`, `cost_usd`, `error_msg`, `session_id`, + `resume_session_id`, `resume_answer` + +Indexed columns: `tasks.state`, `tasks.parent_task_id`, `executions.task_id`, +`executions.status`, `executions.start_time`. diff --git a/docs/packages/executor.md b/docs/packages/executor.md new file mode 100644 index 0000000..29d6a0c --- /dev/null +++ b/docs/packages/executor.md @@ -0,0 +1,447 @@ +# Package: executor + +`internal/executor` — Bounded goroutine pool that drives agent subprocesses. + +--- + +## 1. Overview + +The `executor` package is Claudomator's task-running engine. Its central type, `Pool`, maintains a bounded set of concurrent worker goroutines. Each goroutine picks a task from an internal work queue, selects the appropriate `Runner` implementation (Claude or Gemini), and launches the agent as an OS subprocess. When a subprocess finishes the pool classifies the outcome, updates storage, and emits a `Result` on a channel that the API layer consumes. + +Key responsibilities: + +- **Concurrency control** — never exceed `maxConcurrent` simultaneous workers. +- **Load-balanced agent selection** — prefer the agent type with the fewest active tasks; skip rate-limited agents. +- **Dynamic model selection** — the `Classifier` picks the cheapest model that fits the task's complexity. +- **Dependency gating** — poll storage until all `depends_on` tasks reach COMPLETED before starting a worker. +- **Question / block / resume cycle** — detect when an agent needs user input, park the task as BLOCKED, and re-run it with the user's answer when one arrives. +- **Rate-limit back-pressure** — exponential backoff within a run; per-agent cooldown across runs. +- **Sandbox isolation** — optional `project_dir` support clones the project into a temp directory, runs the agent there, then merges committed changes back. + +--- + +## 2. Pool + +### Struct fields + +```go +type Pool struct { + maxConcurrent int // global concurrency ceiling + runners map[string]Runner // "claude" → ClaudeRunner, "gemini" → GeminiRunner + store Store // storage interface (GetTask, UpdateTaskState, …) + logger *slog.Logger + depPollInterval time.Duration // how often waitForDependencies polls; default 5 s + + mu sync.Mutex + active int // number of goroutines currently in execute/executeResume + activePerAgent map[string]int // per-agent-type active count (used for load balancing) + rateLimited map[string]time.Time // agentType → cooldown deadline + cancels map[string]context.CancelFunc // taskID → cancellation function + + resultCh chan *Result // emits results; buffered at maxConcurrent*2 + workCh chan workItem // internal work queue; buffered at maxConcurrent*10+100 + doneCh chan struct{} // signals when a worker slot is freed; buffered at maxConcurrent + + Questions *QuestionRegistry + Classifier *Classifier +} +``` + +### Concurrency model + +The pool enforces a **global** ceiling (`maxConcurrent`) rather than per-agent-type limits. Within that ceiling, `activePerAgent` tracks how many goroutines are running for each agent type; this counter is the input to the load balancer, not an additional hard limit. + +A single long-lived `dispatch()` goroutine serialises the slot-allocation decision. It blocks on `doneCh` when all `maxConcurrent` slots are taken, then launches a worker goroutine as soon as any slot is freed. This prevents `Submit()` from blocking the caller while still honouring the concurrency ceiling. + +``` +workCh (buffered) ──► dispatch goroutine ──► execute goroutine (×N) + │ │ + └── waits on doneCh ◄─┘ (on exit) +``` + +### Dependency polling loop + +When a task has a non-empty `DependsOn` list, `waitForDependencies()` is called inside the worker goroutine (after agent selection but before subprocess launch). It polls `store.GetTask()` for each dependency at `depPollInterval` intervals (default 5 s). It returns: + +- `nil` — all dependencies reached COMPLETED. +- error — a dependency reached a terminal failure state (FAILED, TIMED_OUT, CANCELLED, BUDGET_EXCEEDED) or the context was cancelled. + +--- + +## 3. Pool.Submit() + +```go +func (p *Pool) Submit(ctx context.Context, t *task.Task) error +func (p *Pool) SubmitResume(ctx context.Context, t *task.Task, exec *storage.Execution) error +``` + +`Submit()` is non-blocking: it sends a `workItem` to `workCh`. If `workCh` is full (capacity `maxConcurrent*10+100`) it returns an error immediately rather than blocking the caller. + +`SubmitResume()` is used to re-queue a BLOCKED or TIMED_OUT task. The provided `exec` must have `ResumeSessionID` and `ResumeAnswer` set; the pool routes it through `executeResume()` instead of `execute()`, skipping agent selection and classification. + +**Goroutine lifecycle** (normal path via `execute()`): + +1. `dispatch` dequeues the `workItem` and waits for a free slot. +2. `dispatch` increments `p.active` under the mutex and launches `go execute(ctx, t)`. +3. `execute` selects the agent, optionally classifies the model, and increments `activePerAgent`. +4. `execute` waits for dependencies, creates the execution record in storage, applies a timeout, and calls `runner.Run()`. +5. On return, `execute` calls `handleRunResult()`, decrements `p.active` and `activePerAgent`, and sends a token to `doneCh`. +6. `dispatch` unblocks and processes the next `workItem`. + +--- + +## 4. Runner Interface + +```go +type Runner interface { + Run(ctx context.Context, t *task.Task, exec *storage.Execution) error +} + +// Optional extension — lets the pool persist log paths before execution starts. +type LogPather interface { + ExecLogDir(execID string) string +} +``` + +`Run()` is responsible for the full lifecycle of a single subprocess invocation, including log file management and cost extraction. It returns: + +- `nil` — task succeeded. +- `*BlockedError` — agent wrote a question file; pool transitions task to BLOCKED. +- any other error — pool classifies as FAILED, TIMED_OUT, CANCELLED, or BUDGET_EXCEEDED based on context and error text. + +The pool selects a runner by looking up `t.Agent.Type` in the `runners` map. If `t.Agent.Type` is empty it falls back to `"claude"`. + +--- + +## 5. ClaudeRunner + +```go +type ClaudeRunner struct { + BinaryPath string // defaults to "claude" + Logger *slog.Logger + LogDir string // base directory for per-execution log directories + APIURL string // injected as CLAUDOMATOR_API_URL env var +} +``` + +### Subprocess invocation + +`Run()` calls `buildArgs()` to construct the `claude` CLI invocation, then passes it through `runWithBackoff()` (up to 3 attempts on rate-limit errors, 5 s base delay, exponential backoff, capped at 5 min). + +**New execution flags:** + +| Flag | Value | +|---|---| +| `-p` | task instructions (optionally prefixed with planning preamble) | +| `--session-id` | execution UUID (doubles as the session UUID for resumability) | +| `--output-format stream-json` | machine-readable JSONL output | +| `--verbose` | include cost/token fields | +| `--model` | from `t.Agent.Model` (if set) | +| `--max-budget-usd` | from `t.Agent.MaxBudgetUSD` (if > 0) | +| `--permission-mode` | `bypassPermissions` by default; overridable per task | +| `--append-system-prompt` | from `t.Agent.SystemPromptAppend` | +| `--allowedTools` | repeated for each entry in `t.Agent.AllowedTools` | +| `--disallowedTools` | repeated for each entry in `t.Agent.DisallowedTools` | +| `--add-dir` | repeated for each entry in `t.Agent.ContextFiles` | + +**Resume execution flags** (when `exec.ResumeSessionID != ""`): + +| Flag | Value | +|---|---| +| `-p` | `exec.ResumeAnswer` (the user's answer) | +| `--resume` | `exec.ResumeSessionID` (original session to continue) | +| `--output-format stream-json` | | +| `--verbose` | | +| `--permission-mode` | same default/override as above | +| `--model` | from `t.Agent.Model` (if set) | + +**Environment variables** injected into every subprocess: + +``` +CLAUDOMATOR_API_URL — base URL for subtask creation and question answers +CLAUDOMATOR_TASK_ID — parent task ID (used in subtask POST bodies) +CLAUDOMATOR_QUESTION_FILE — path where the agent writes a question JSON +``` + +### Stream-JSON output parsing + +`execOnce()` attaches an `os.Pipe` to `cmd.Stdout` and spawns a goroutine that calls `parseStream()`. The goroutine tees the stream to the log file while scanning for JSONL events: + +| Event type | Action | +|---|---| +| `rate_limit_event` with `status: rejected` | sets `streamErr` to a rate-limit error | +| `assistant` with `error: rate_limit` | sets `streamErr` to a rate-limit error | +| `result` with `is_error: true` | sets `streamErr` to a task-failure error | +| `result` with `total_cost_usd` | captures total cost | +| `user` containing denied `tool_result` | sets `streamErr` to a permission-denial error | +| any message with `cost_usd` | updates running cost (legacy field) | + +`costUSD` and `streamErr` are read after `cmd.Wait()` returns and the goroutine drains. + +### Cost and token extraction + +`exec.CostUSD` is set from `total_cost_usd` (preferred) or `cost_usd` (legacy) in the stream. Token counts are not extracted at the runner level; they are available in the raw stdout log. + +### Project sandbox enforcement + +When `t.Agent.ProjectDir` is set on a new (non-resume) execution: + +1. `setupSandbox()` ensures `projectDir` is a git repository (initialising with an empty commit if needed), then `git clone --no-hardlinks` it into a `claudomator-sandbox-*` temp directory. It prefers a remote named `"local"` over `"origin"` as the clone source. +2. The subprocess runs with its working directory set to the sandbox. +3. On success, `teardownSandbox()` verifies the sandbox has no uncommitted changes, counts commits ahead of `origin/HEAD`, and pushes them to the bare repo. If the push is rejected due to a concurrent task, it fetches and rebases then retries once. +4. The sandbox is removed on success. On any failure the sandbox is preserved and its path is included in the error message. + +Resume executions skip sandboxing and run directly in `projectDir` so the agent can pick up its previous session state. + +### Question file mechanism + +The agent preamble instructs the agent to write a JSON file to `$CLAUDOMATOR_QUESTION_FILE` and exit immediately when it needs user input. After `execOnce()` returns successfully (exit code 0), `Run()` attempts to read that file: + +- If the file exists: it is consumed (`os.Remove`), and `Run()` returns `&BlockedError{QuestionJSON: ..., SessionID: e.SessionID}`. +- If the file is absent: `Run()` returns `nil` (success). + +--- + +## 6. GeminiRunner + +```go +type GeminiRunner struct { + BinaryPath string // defaults to "gemini" + Logger *slog.Logger + LogDir string + APIURL string +} +``` + +`GeminiRunner` follows the same pattern as `ClaudeRunner` with these differences: + +| Aspect | ClaudeRunner | GeminiRunner | +|---|---|---| +| Binary | `claude` | `gemini` | +| Flag structure | `claude -p <instructions> --session-id … --output-format stream-json …` | `gemini <instructions> --output-format stream-json …` (instructions are the first positional argument) | +| Session/resume | `--session-id` for new; `--resume` for resume | Session ID stored on `exec` but resume flag handling is a stub (not yet implemented) | +| Rate-limit retries | `runWithBackoff` (up to 3 retries) | Single attempt; no retry loop | +| Sandbox | Full clone + teardown + push | Not implemented; runs directly in `projectDir` | +| Stream parsing | `parseStream()` | `parseStream()` (shared implementation; assumes compatible JSONL format) | +| Question file | Same mechanism | Same mechanism | + +`GeminiRunner` implements `LogPather` identically to `ClaudeRunner`. + +--- + +## 7. Classifier + +```go +type Classifier struct { + GeminiBinaryPath string // defaults to "gemini" +} + +type Classification struct { + AgentType string `json:"agent_type"` + Model string `json:"model"` + Reason string `json:"reason"` +} + +type SystemStatus struct { + ActiveTasks map[string]int // agentType → number of active tasks + RateLimited map[string]bool // agentType → true if currently rate-limited +} +``` + +`Classify()` is called once per task, after load-balanced agent selection but before the subprocess starts. Its sole output is the `Model` field; `AgentType` is already fixed by the load balancer. + +**Decision factors baked into the prompt:** + +- The already-chosen agent type (Claude or Gemini). +- The task `Name` and `Instructions`. +- A hard-coded model tier list: + - Claude: `haiku-4-5` (cheap/fast) → `sonnet-4-6` (default) → `opus-4-6` (hard tasks only) + - Gemini: `gemini-2.5-flash-lite` (cheap) → `gemini-2.5-flash` (default) → `gemini-2.5-pro` (hard tasks only) + +**Implementation**: `Classify()` invokes `gemini --prompt <prompt> --model gemini-2.5-flash-lite --output-format json`. It then strips markdown code fences and "Loaded cached credentials" noise before unmarshalling the JSON response into a `Classification`. If classification fails, the pool logs the error and proceeds with whatever model was already on `t.Agent.Model`. + +--- + +## 8. QuestionRegistry + +```go +type QuestionRegistry struct { + mu sync.Mutex + questions map[string]*PendingQuestion // keyed by toolUseID +} + +type PendingQuestion struct { + TaskID string + ToolUseID string + Input json.RawMessage + AnswerCh chan string // buffered(1); closed when answer delivered +} +``` + +The registry tracks questions that are waiting for a human answer. It is populated via the stream-parsing path (`streamAndParseWithQuestions`) when an agent emits an `AskUserQuestion` tool_use event in the JSONL stream. + +| Method | Description | +|---|---| +| `Register(taskID, toolUseID, input)` | Adds a `PendingQuestion`; returns the buffered answer channel. | +| `Answer(toolUseID, answer)` | Delivers an answer, removes the entry, returns `false` if not found. | +| `Get(toolUseID)` | Returns the pending question or `nil`. | +| `PendingForTask(taskID)` | Returns all pending questions for a task (used by the API to list them). | +| `Remove(toolUseID)` | Removes without answering (used on task cancellation). | + +**How a blocked task is resumed:** + +1. The API receives a `POST /api/tasks/{id}/answer` with the user's answer. +2. It calls `QuestionRegistry.Answer(toolUseID, answer)`, which unblocks any goroutine waiting on `AnswerCh`. +3. Separately (for the file-based path), the API creates a resume `Execution` with `ResumeSessionID` and `ResumeAnswer`, then calls `Pool.SubmitResume()`. +4. The pool runs `executeResume()`, which invokes `runner.Run()` with `--resume <session>` and the answer as `-p`. + +--- + +## 9. Rate Limiter + +Rate limiting is implemented at two levels: + +### Per-invocation retry (within `ClaudeRunner.Run`) + +`runWithBackoff(ctx, maxRetries=3, baseDelay=5s, fn)` calls the subprocess, detects rate-limit errors (`isRateLimitError`), and retries with exponential backoff: + +``` +attempt 0: immediate +attempt 1: 5 s delay (or Retry-After header value if present) +attempt 2: 10 s delay +attempt 3: 20 s delay (capped at 5 min) +``` + +`isRateLimitError` matches: `"rate limit"`, `"too many requests"`, `"429"`, `"overloaded"`. + +Quota-exhausted errors (`isQuotaExhausted`) — strings like `"hit your limit"` or `"status: rejected"` — are **not** retried; they propagate immediately to `handleRunResult`. + +### Per-agent-type pool cooldown + +When `handleRunResult` detects a rate-limit or quota error it records a cooldown deadline in `p.rateLimited[agentType]`: + +- Transient rate limit: 1 minute (or `Retry-After` value from the error message). +- Quota exhausted: 5 hours. + +During agent selection in `execute()`, the load balancer skips agents whose cooldown deadline has not passed yet. If all agents are on cooldown, the agent with the fewest active tasks is chosen anyway (best-effort fallback). + +--- + +## 10. BlockedError + +```go +type BlockedError struct { + QuestionJSON string // raw JSON: {"text": "...", "options": [...]} + SessionID string // claude session to resume once the user answers +} +``` + +`BlockedError` is returned by `Runner.Run()` when the agent wrote a question to `$CLAUDOMATOR_QUESTION_FILE` and exited. It is **not** a fatal error — it is a signal that the task requires human input before it can continue. + +**Pool handling** (`handleRunResult`): + +```go +var blockedErr *BlockedError +if errors.As(err, &blockedErr) { + exec.Status = "BLOCKED" + store.UpdateTaskState(t.ID, task.StateBlocked) + store.UpdateTaskQuestion(t.ID, blockedErr.QuestionJSON) +} +``` + +The pool does **not** emit a failure result; it still sends a `Result` to `resultCh` with `Err` set to the `BlockedError`. The API layer reads `exec.Status == "BLOCKED"` and exposes the question to the caller. Once the user answers, the API calls `Pool.SubmitResume()` and the task transitions back to RUNNING. + +--- + +## 11. Result Struct and handleRunResult + +```go +type Result struct { + TaskID string + Execution *storage.Execution + Err error +} +``` + +`handleRunResult` is shared between `execute()` and `executeResume()`. It inspects the error returned by `Runner.Run()` and maps it to an execution status and task state: + +| Condition | exec.Status | task.State | +|---|---|---| +| `*BlockedError` | `BLOCKED` | `StateBlocked` | +| `ctx.Err() == DeadlineExceeded` | `TIMED_OUT` | `StateTimedOut` | +| `ctx.Err() == Canceled` | `CANCELLED` | `StateCancelled` | +| `isQuotaExhausted(err)` | `BUDGET_EXCEEDED` | `StateBudgetExceeded` | +| any other non-nil error | `FAILED` | `StateFailed` | +| nil, root task, subtasks exist | `BLOCKED` | `StateBlocked` (waits for subtasks) | +| nil, root task, no subtasks | `READY` | `StateReady` | +| nil, subtask | `COMPLETED` | `StateCompleted` | + +After a subtask completes, `maybeUnblockParent()` checks whether all sibling subtasks are also COMPLETED; if so, the parent transitions from BLOCKED to READY. + +--- + +## Diagrams + +### Sequence: Pool.Submit → goroutine → Runner.Run → state update + +```mermaid +sequenceDiagram + participant Caller + participant Pool + participant Dispatch as dispatch goroutine + participant Worker as execute goroutine + participant Classifier + participant Runner + participant Store + + Caller->>Pool: Submit(ctx, task) + Pool->>Dispatch: workCh ← workItem + Dispatch->>Dispatch: wait for free slot (doneCh) + Dispatch->>Worker: go execute(ctx, task) + + Worker->>Pool: snapshot activePerAgent / rateLimited + Pool-->>Worker: SystemStatus + Worker->>Worker: pickAgent(status) → agentType + Worker->>Classifier: Classify(taskName, instructions, status, agentType) + Classifier-->>Worker: Classification{Model} + + Worker->>Store: waitForDependencies (poll every 5s) + Store-->>Worker: all deps COMPLETED + + Worker->>Store: CreateExecution(exec) + Worker->>Store: UpdateTaskState(RUNNING) + + Worker->>Runner: Run(ctx, task, exec) + Note over Runner: subprocess runs,<br/>stream parsed + Runner-->>Worker: error or nil + + Worker->>Pool: handleRunResult(ctx, task, exec, err, agentType) + Pool->>Store: UpdateTaskState(new state) + Pool->>Store: UpdateExecution(exec) + Pool->>Caller: resultCh ← Result{TaskID, Execution, Err} + + Worker->>Dispatch: doneCh ← token (slot freed) +``` + +### Flowchart: Question-handling flow + +```mermaid +flowchart TD + A([Agent subprocess running]) --> B{Need user input?} + B -- No --> C([Task completes normally]) + B -- Yes --> D["Write JSON to\n$CLAUDOMATOR_QUESTION_FILE\n{text, options}"] + D --> E[Agent exits with code 0] + E --> F[Runner reads question file] + F --> G["Return &BlockedError\n{QuestionJSON, SessionID}"] + G --> H[Pool.handleRunResult detects BlockedError] + H --> I[exec.Status = BLOCKED] + H --> J[Store.UpdateTaskState → StateBlocked] + H --> K[Store.UpdateTaskQuestion → QuestionJSON] + K --> L[API exposes question to user] + L --> M[User submits answer via API] + M --> N["API creates Execution:\nResumeSessionID = original SessionID\nResumeAnswer = user answer"] + N --> O[Pool.SubmitResume → workCh] + O --> P[dispatch launches executeResume goroutine] + P --> Q["Runner.Run with\n--resume ResumeSessionID\n-p ResumeAnswer"] + Q --> R([Claude session continues from where it left off]) +``` diff --git a/docs/packages/storage.md b/docs/packages/storage.md new file mode 100644 index 0000000..2e53cea --- /dev/null +++ b/docs/packages/storage.md @@ -0,0 +1,209 @@ +# Package: `internal/storage` + +## Overview + +The `storage` package provides the SQLite persistence layer for Claudomator. It exposes a `DB` struct that manages all reads and writes to the database, runs schema migrations automatically on `Open`, and enforces valid state transitions when updating task states. + +Key characteristics: +- **Auto-migration**: `Open` creates all tables and indexes if they do not exist, then applies additive column migrations idempotently (duplicate-column errors are silently ignored). +- **WAL mode**: The database is opened with `?_journal_mode=WAL&_busy_timeout=5000` for concurrent read safety and reduced lock contention. +- **State enforcement**: `UpdateTaskState` performs the state change inside a transaction and calls `task.ValidTransition` before committing. +- **Cascading deletes**: `DeleteTask` removes a task, all its descendant subtasks, and all their execution records in a single transaction using a recursive CTE. + +--- + +## Schema + +### `tasks` table + +| Column | Type | Description | +|------------------|-----------|----------------------------------------------------------| +| `id` | `TEXT PK` | UUID; primary key. | +| `name` | `TEXT` | Human-readable task name. Not null. | +| `description` | `TEXT` | Optional longer description. | +| `config_json` | `TEXT` | JSON blob — serialised `AgentConfig` struct. | +| `priority` | `TEXT` | `"high"`, `"normal"`, or `"low"`. Default `"normal"`. | +| `timeout_ns` | `INTEGER` | Timeout as nanoseconds (`time.Duration`). Default `0`. | +| `retry_json` | `TEXT` | JSON blob — serialised `RetryConfig` struct. | +| `tags_json` | `TEXT` | JSON blob — serialised `[]string`. | +| `depends_on_json`| `TEXT` | JSON blob — serialised `[]string` of task IDs. | +| `parent_task_id` | `TEXT` | ID of parent task; null for root tasks. | +| `state` | `TEXT` | Current `State` value. Default `"PENDING"`. | +| `created_at` | `DATETIME`| Creation timestamp (UTC). | +| `updated_at` | `DATETIME`| Last-update timestamp (UTC). | +| `rejection_comment` | `TEXT` | Reviewer comment set by `RejectTask`; nullable. | +| `question_json` | `TEXT` | Pending agent question set by `UpdateTaskQuestion`; nullable. | + +**JSON blob columns**: `config_json`, `retry_json`, `tags_json`, `depends_on_json`. + +**Indexes**: `idx_tasks_state` (state), `idx_tasks_parent_task_id` (parent_task_id). + +### `executions` table + +| Column | Type | Description | +|---------------|-----------|---------------------------------------------------------------| +| `id` | `TEXT PK` | UUID; primary key. | +| `task_id` | `TEXT` | Foreign key → `tasks.id`. | +| `start_time` | `DATETIME`| When the execution started (UTC). | +| `end_time` | `DATETIME`| When the execution ended (UTC); null while running. | +| `exit_code` | `INTEGER` | Process exit code; 0 = success. | +| `status` | `TEXT` | Execution status string (mirrors task state at completion). | +| `stdout_path` | `TEXT` | Path to the stdout log file under `~/.claudomator/executions/`. | +| `stderr_path` | `TEXT` | Path to the stderr log file under `~/.claudomator/executions/`. | +| `artifact_dir`| `TEXT` | Directory containing output artifacts. | +| `cost_usd` | `REAL` | Total API spend for this execution in USD. | +| `error_msg` | `TEXT` | Error description if the execution failed. | +| `session_id` | `TEXT` | Claude `--session-id` value; used to resume interrupted runs. | + +**Indexes**: `idx_executions_status`, `idx_executions_task_id`, `idx_executions_start_time`. + +--- + +## `DB` Struct Methods + +### Lifecycle + +#### `Open(path string) (*DB, error)` +Opens the SQLite database at `path`, runs auto-migration, and returns a `*DB`. The connection string appends `?_journal_mode=WAL&_busy_timeout=5000`. + +#### `(*DB) Close() error` +Closes the underlying database connection. + +--- + +### Task CRUD + +#### `(*DB) CreateTask(t *task.Task) error` +Inserts a new task. `AgentConfig`, `RetryConfig`, `Tags`, and `DependsOn` are serialised as JSON blobs; `Timeout` is stored as nanoseconds. + +#### `(*DB) GetTask(id string) (*task.Task, error)` +Retrieves a single task by ID. Returns an error wrapping `sql.ErrNoRows` if not found. + +#### `(*DB) ListTasks(filter TaskFilter) ([]*task.Task, error)` +Returns tasks matching `filter`, ordered by `created_at DESC`. See [TaskFilter](#taskfilter). + +#### `(*DB) ListSubtasks(parentID string) ([]*task.Task, error)` +Returns all tasks whose `parent_task_id` equals `parentID`, ordered by `created_at ASC`. + +#### `(*DB) UpdateTask(id string, u TaskUpdate) error` +Replaces editable fields (`Name`, `Description`, `Config`, `Priority`, `TimeoutNS`, `Retry`, `Tags`, `DependsOn`) and resets `state` to `PENDING`. Returns an error if the task does not exist. + +#### `(*DB) DeleteTask(id string) error` +Deletes a task, all its descendant subtasks (via recursive CTE), and all their execution records in a single transaction. Returns an error if the task does not exist. + +--- + +### State Management + +#### `(*DB) UpdateTaskState(id string, newState task.State) error` +Atomically updates a task's state inside a transaction. Calls `task.ValidTransition` to reject illegal moves before applying the change. + +#### `(*DB) RejectTask(id, comment string) error` +Sets the task's state to `PENDING` and stores `comment` in `rejection_comment`. Does not go through the state-machine validator (direct update). + +#### `(*DB) ResetTaskForRetry(id string) (*task.Task, error)` +Validates that the current state can transition to `QUEUED`, clears `Agent.Type` and `Agent.Model` so the task can be re-classified, sets `state = QUEUED`, and persists the changes in a transaction. Returns the updated task. + +#### `(*DB) UpdateTaskQuestion(taskID, questionJSON string) error` +Stores a JSON-encoded agent question on the task. Pass an empty string to clear the question after it has been answered. + +--- + +### Execution CRUD + +#### `(*DB) CreateExecution(e *Execution) error` +Inserts a new execution record. + +#### `(*DB) GetExecution(id string) (*Execution, error)` +Retrieves a single execution by ID. + +#### `(*DB) ListExecutions(taskID string) ([]*Execution, error)` +Returns all executions for a task, ordered by `start_time DESC`. + +#### `(*DB) GetLatestExecution(taskID string) (*Execution, error)` +Returns the most recent execution for a task. + +#### `(*DB) UpdateExecution(e *Execution) error` +Updates a completed execution record (end time, exit code, status, cost, paths, session ID). + +#### `(*DB) ListRecentExecutions(since time.Time, limit int, taskID string) ([]*RecentExecution, error)` +Returns executions since `since`, joined with the task name, ordered by `start_time DESC`. If `taskID` is non-empty only executions for that task are included. Returns `RecentExecution` values (see below). + +--- + +## `TaskFilter` + +```go +type TaskFilter struct { + State task.State + Limit int +} +``` + +| Field | Type | Description | +|---------|--------------|-------------------------------------------------------| +| `State` | `task.State` | If non-empty, only tasks in this state are returned. | +| `Limit` | `int` | Maximum number of results. `0` means no limit. | + +--- + +## `Execution` Struct + +```go +type Execution struct { + ID string + TaskID string + StartTime time.Time + EndTime time.Time + ExitCode int + Status string + StdoutPath string + StderrPath string + ArtifactDir string + CostUSD float64 + ErrorMsg string + SessionID string // persisted: claude --session-id for resume + ResumeSessionID string // in-memory only: set when creating a resume execution + ResumeAnswer string // in-memory only: human answer forwarded to agent +} +``` + +| Field | Persisted | Description | +|-------------------|-----------|-----------------------------------------------------------------------| +| `ID` | Yes | UUID; primary key. | +| `TaskID` | Yes | ID of the owning task. | +| `StartTime` | Yes | When execution started (UTC). | +| `EndTime` | Yes | When execution ended (UTC); zero while still running. | +| `ExitCode` | Yes | Process exit code (0 = success). | +| `Status` | Yes | Status string at completion. | +| `StdoutPath` | Yes | Absolute path to the stdout log file. | +| `StderrPath` | Yes | Absolute path to the stderr log file. | +| `ArtifactDir` | Yes | Directory containing output artifacts produced by the agent. | +| `CostUSD` | Yes | Total API spend for this run in USD. | +| `ErrorMsg` | Yes | Error description if the run failed. | +| `SessionID` | Yes | Claude session ID; stored to allow resuming an interrupted session. | +| `ResumeSessionID` | No | Session to resume; set in-memory when creating a resume execution. | +| `ResumeAnswer` | No | Human answer to a blocked question; forwarded to the agent in-memory. | + +--- + +## File Layout + +All runtime data lives under `~/.claudomator/`: + +``` +~/.claudomator/ +├── claudomator.db # SQLite database (WAL mode) +└── executions/ # Execution log files + ├── <execution-id>.stdout + └── <execution-id>.stderr +``` + +The paths are configured by `internal/config`: + +| Config field | Value | +|--------------|--------------------------------------| +| `DBPath` | `~/.claudomator/claudomator.db` | +| `LogDir` | `~/.claudomator/executions/` | + +Both directories are created automatically on first run if they do not exist. diff --git a/docs/packages/task.md b/docs/packages/task.md new file mode 100644 index 0000000..3502f8a --- /dev/null +++ b/docs/packages/task.md @@ -0,0 +1,332 @@ +# Package: `internal/task` + +## Overview + +The `task` package defines the core domain model for Claudomator. A **Task** represents a discrete unit of work to be executed by a Claude agent — described by natural-language instructions, optional budget and timeout constraints, retry policy, and scheduling metadata. Tasks are authored as YAML files and progress through a well-defined lifecycle from `PENDING` to a terminal state. + +--- + +## Task Struct + +```go +type Task struct { + ID string `yaml:"id"` + ParentTaskID string `yaml:"parent_task_id"` + Name string `yaml:"name"` + Description string `yaml:"description"` + Agent AgentConfig `yaml:"agent"` + Timeout Duration `yaml:"timeout"` + Retry RetryConfig `yaml:"retry"` + Priority Priority `yaml:"priority"` + Tags []string `yaml:"tags"` + DependsOn []string `yaml:"depends_on"` + State State `yaml:"-"` + RejectionComment string `yaml:"-"` + QuestionJSON string `yaml:"-"` + CreatedAt time.Time `yaml:"-"` + UpdatedAt time.Time `yaml:"-"` +} +``` + +| Field | Type | Purpose | Default / Notes | +|--------------------|---------------|----------------------------------------------------------------------|------------------------------------------| +| `ID` | `string` | Unique identifier (UUID). | Auto-generated by `Parse` if empty. | +| `ParentTaskID` | `string` | ID of the parent task; empty for top-level tasks. | Optional. | +| `Name` | `string` | Human-readable name. **Required.** | Must be non-empty. | +| `Description` | `string` | Longer explanation shown in the UI. | Optional. | +| `Agent` | `AgentConfig` | Configuration passed to the Claude agent executor. | See [AgentConfig](#agentconfig-struct). | +| `Timeout` | `Duration` | Maximum wall-clock time for a single execution attempt. | `0` means no limit. | +| `Retry` | `RetryConfig` | Retry policy applied when a run fails. | See [RetryConfig](#retryconfig-struct). | +| `Priority` | `Priority` | Scheduling priority: `high`, `normal`, or `low`. | Default: `normal`. | +| `Tags` | `[]string` | Arbitrary labels for filtering and grouping. | Optional; defaults to `[]`. | +| `DependsOn` | `[]string` | IDs of tasks that must reach `COMPLETED` before this one is queued. | Optional; defaults to `[]`. | +| `State` | `State` | Current lifecycle state. Not read from YAML (`yaml:"-"`). | Set to `PENDING` by `Parse`. | +| `RejectionComment` | `string` | Free-text comment left by a reviewer when rejecting a task. | Not in YAML; populated by storage layer. | +| `QuestionJSON` | `string` | JSON-encoded question posted by the agent while `BLOCKED`. | Not in YAML; populated by storage layer. | +| `CreatedAt` | `time.Time` | Timestamp set when the task is parsed. | Auto-set by `Parse`. | +| `UpdatedAt` | `time.Time` | Timestamp updated on every state change. | Auto-set by `Parse`; updated by DB. | + +Fields tagged `yaml:"-"` are runtime-only and are never parsed from YAML task files. + +--- + +## AgentConfig Struct + +```go +type AgentConfig struct { + Type string `yaml:"type"` + Model string `yaml:"model"` + ContextFiles []string `yaml:"context_files"` + Instructions string `yaml:"instructions"` + ProjectDir string `yaml:"project_dir"` + MaxBudgetUSD float64 `yaml:"max_budget_usd"` + PermissionMode string `yaml:"permission_mode"` + AllowedTools []string `yaml:"allowed_tools"` + DisallowedTools []string `yaml:"disallowed_tools"` + SystemPromptAppend string `yaml:"system_prompt_append"` + AdditionalArgs []string `yaml:"additional_args"` + SkipPlanning bool `yaml:"skip_planning"` +} +``` + +| Field | Type | Purpose | +|----------------------|------------|---------------------------------------------------------------------------------------------------------------------------------| +| `Type` | `string` | Agent type classifier. Cleared on retry by `ResetTaskForRetry` so the task can be re-classified. | +| `Model` | `string` | Claude model ID (e.g. `"claude-sonnet-4-6"`). Cleared on retry alongside `Type`. | +| `ContextFiles` | `[]string` | Paths to files pre-loaded as context for the agent. | +| `Instructions` | `string` | Natural-language task instructions. **Required.** | +| `ProjectDir` | `string` | Working directory the agent operates in. | +| `MaxBudgetUSD` | `float64` | API spend cap in USD. `0` means no limit. Must be ≥ 0. | +| `PermissionMode` | `string` | Claude permission mode. Valid values: `default`, `acceptEdits`, `bypassPermissions`, `plan`, `dontAsk`, `delegate`. | +| `AllowedTools` | `[]string` | Explicit tool whitelist passed to the agent. | +| `DisallowedTools` | `[]string` | Explicit tool blacklist passed to the agent. | +| `SystemPromptAppend` | `string` | Text appended to the Claude system prompt. | +| `AdditionalArgs` | `[]string` | Extra CLI arguments forwarded verbatim to the Claude executable. | +| `SkipPlanning` | `bool` | When `true`, the agent skips the planning phase. | + +--- + +## RetryConfig Struct + +```go +type RetryConfig struct { + MaxAttempts int `yaml:"max_attempts"` + Backoff string `yaml:"backoff"` // "linear" or "exponential" +} +``` + +| Field | Type | Purpose | Default | +|---------------|----------|------------------------------------------------------------|-----------------| +| `MaxAttempts` | `int` | Total number of attempts (first run + retries). Min: 1. | `1` | +| `Backoff` | `string` | Wait strategy between retries: `linear` or `exponential`. | `"exponential"` | + +--- + +## YAML Task File Format + +A task file may contain a single task or a batch (see next section). Every supported field is shown below with annotations. + +```yaml +# Unique identifier. Omit to have Claudomator generate a UUID. +id: "fix-login-bug" + +# Display name. Required. +name: "Fix login redirect bug" + +# Optional longer description. +description: "Users are redirected to /home instead of /dashboard after login." + +agent: + # Agent type for routing/classification. Optional; auto-assigned if blank. + type: "claude" + + # Claude model ID. Optional; auto-assigned if blank. + model: "claude-sonnet-4-6" + + # Task instructions passed to the agent. Required. + instructions: | + Fix the post-login redirect in src/auth/login.go so that users are + sent to /dashboard instead of /home. Add a regression test. + + # Working directory for the agent process. + project_dir: "/workspace/myapp" + + # Files loaded into the agent's context before execution starts. + context_files: + - "src/auth/login.go" + - "docs/adr/0003-auth.md" + + # USD spending cap. 0 = no limit. + max_budget_usd: 1.00 + + # Claude permission mode. + # Values: default | acceptEdits | bypassPermissions | plan | dontAsk | delegate + permission_mode: "acceptEdits" + + # Tool whitelist. Empty = all tools allowed. + allowed_tools: + - "Read" + - "Edit" + - "Bash" + + # Tool blacklist. + disallowed_tools: + - "WebFetch" + + # Appended to the agent's system prompt. + system_prompt_append: "Always write tests before implementation." + + # Extra arguments forwarded verbatim to the Claude executable. + additional_args: + - "--verbose" + + # Skip the planning phase. + skip_planning: false + +# Maximum wall-clock time per attempt. Uses Go duration syntax: "30m", "1h", "45s". +# 0 or omitted means no limit. +timeout: "30m" + +retry: + # Total attempts including the first run. Must be >= 1. + max_attempts: 3 + # "linear" or "exponential" + backoff: "exponential" + +# Scheduling priority: high | normal | low +priority: "normal" + +# Arbitrary labels for filtering. +tags: + - "bug" + - "auth" + +# IDs of tasks that must reach COMPLETED before this task is queued. +depends_on: + - "setup-test-db" +``` + +--- + +## Batch File Format + +A batch file wraps multiple task definitions under a `tasks:` key. All top-level task fields are supported per entry. + +```yaml +tasks: + - name: "Step 1 — scaffold" + agent: + instructions: "Create the initial project structure." + priority: "high" + + - name: "Step 2 — implement" + agent: + instructions: "Implement the feature described in docs/feature.md." + depends_on: + - "step-1-id" + + - name: "Step 3 — test" + agent: + instructions: "Write and run integration tests." + depends_on: + - "step-2-id" + retry: + max_attempts: 2 + backoff: "linear" +``` + +`ParseFile` / `Parse` detect the batch format automatically: if unmarshaling into `BatchFile` succeeds and produces at least one task, the batch path is used; otherwise single-task parsing is attempted. + +--- + +## State Constants + +```go +type State string + +const ( + StatePending State = "PENDING" + StateQueued State = "QUEUED" + StateRunning State = "RUNNING" + StateReady State = "READY" + StateCompleted State = "COMPLETED" + StateFailed State = "FAILED" + StateTimedOut State = "TIMED_OUT" + StateCancelled State = "CANCELLED" + StateBudgetExceeded State = "BUDGET_EXCEEDED" + StateBlocked State = "BLOCKED" +) +``` + +| Constant | String value | Description | +|-----------------------|-------------------|-------------------------------------------------------------------------------------------------------| +| `StatePending` | `PENDING` | Newly created; awaiting acceptance for scheduling. | +| `StateQueued` | `QUEUED` | Accepted by the scheduler and waiting for an available executor. | +| `StateRunning` | `RUNNING` | An agent is actively executing the task. | +| `StateReady` | `READY` | Agent finished execution; output is awaiting human review/approval. | +| `StateCompleted` | `COMPLETED` | Approved and finished. **Terminal** — no further transitions. | +| `StateFailed` | `FAILED` | Execution failed (non-zero exit or unrecoverable error). Eligible for retry → `QUEUED`. | +| `StateTimedOut` | `TIMED_OUT` | Execution exceeded the configured `timeout`. Eligible for retry or resume → `QUEUED`. | +| `StateCancelled` | `CANCELLED` | Manually cancelled. Can be restarted by transitioning back to `QUEUED`. | +| `StateBudgetExceeded` | `BUDGET_EXCEEDED` | The `max_budget_usd` cap was reached during execution. Eligible for retry → `QUEUED`. | +| `StateBlocked` | `BLOCKED` | The running agent posted a question requiring a human response before it can continue. | + +--- + +## State Machine + +### Valid Transitions + +| From | To | Condition | +|---------------------|---------------------|-------------------------------------------------------------------| +| `PENDING` | `QUEUED` | Task accepted for scheduling. | +| `PENDING` | `CANCELLED` | Manually cancelled before queuing. | +| `QUEUED` | `RUNNING` | An executor picks up the task. | +| `QUEUED` | `CANCELLED` | Manually cancelled while waiting in queue. | +| `RUNNING` | `READY` | Agent finished; output ready for human review. | +| `RUNNING` | `COMPLETED` | Agent finished successfully with no review required. | +| `RUNNING` | `FAILED` | Agent exited with an error. | +| `RUNNING` | `TIMED_OUT` | Wall-clock timeout exceeded. | +| `RUNNING` | `CANCELLED` | Manually cancelled mid-execution. | +| `RUNNING` | `BUDGET_EXCEEDED` | API spend reached `max_budget_usd`. | +| `RUNNING` | `BLOCKED` | Agent posted a question requiring human input before continuing. | +| `READY` | `COMPLETED` | Reviewer approves the output. | +| `READY` | `PENDING` | Reviewer rejects the output; task sent back for revision. | +| `FAILED` | `QUEUED` | Retry: task re-queued for another attempt. | +| `TIMED_OUT` | `QUEUED` | Retry or resume: task re-queued. | +| `CANCELLED` | `QUEUED` | Restart: task re-queued from scratch. | +| `BUDGET_EXCEEDED` | `QUEUED` | Retry with adjusted budget. | +| `BLOCKED` | `QUEUED` | Answer provided; task re-queued to resume. | +| `BLOCKED` | `READY` | Question resolved; output is ready for review. | +| `COMPLETED` | *(none)* | Terminal state — no outgoing transitions. | + +--- + +## Key Functions + +### `ParseFile` + +```go +func ParseFile(path string) ([]Task, error) +``` + +Reads a YAML file at `path` and returns the parsed tasks. Supports both single-task and batch formats. Initialises defaults (UUID, priority, retry policy, initial state, timestamps) on every returned task. + +### `Parse` + +```go +func Parse(data []byte) ([]Task, error) +``` + +Same as `ParseFile` but operates on raw YAML bytes instead of a file path. + +### `ValidTransition` + +```go +func ValidTransition(from, to State) bool +``` + +Returns `true` if transitioning from `from` to `to` is permitted by the state machine. Used by `storage.DB.UpdateTaskState` to enforce valid transitions inside a transaction. + +### `Validate` + +```go +func Validate(t *Task) error +``` + +Checks a task for constraint violations. Returns a `*ValidationError` (implementing `error`) listing all violations, or `nil` if the task is valid. All violations are collected before returning so callers receive the full list at once. + +--- + +## Validation Rules + +| Rule | Error message | +|------------------------------------------------------------------------------|--------------------------------------------------------------------| +| `name` must be non-empty. | `name is required` | +| `agent.instructions` must be non-empty. | `agent.instructions is required` | +| `agent.max_budget_usd` must be ≥ 0. | `agent.max_budget_usd must be non-negative` | +| `timeout` must be ≥ 0. | `timeout must be non-negative` | +| `retry.max_attempts` must be ≥ 1. | `retry.max_attempts must be at least 1` | +| `retry.backoff`, if set, must be `linear` or `exponential`. | `retry.backoff must be 'linear' or 'exponential'` | +| `priority`, if set, must be `high`, `normal`, or `low`. | `invalid priority "…"; must be high, normal, or low` | +| `agent.permission_mode`, if set, must be one of the six recognised values. | `invalid permission_mode "…"` | |
