# Claudomator: Development Narrative

This document is a chronological engineering history of the Claudomator project, reconstructed from the git log, ADRs, and source code.

---

## 1. Initial commit: core scaffolding (2e2b218)

The project started with a single commit that established the full skeleton: task model, executor, API server, CLI, storage layer, and reporter. The Go module was `github.com/thepeterstone/claudomator`. The initial `Task` struct had a `ClaudeConfig` field (later renamed to `AgentConfig`) holding the model, instructions, `working_dir`, budget, permission mode, and tool lists. SQLite was chosen as the storage backend (see ADR-001). The executor pool used a bounded goroutine model. The API server was plain `net/http` with no external framework. The CLI was Cobra.

## 2. JSON tags, module rename, gitignore (8ee1fb5, 46ba3f5, 2bf317d)

Early housekeeping: added JSON struct tags to all exported types, renamed the Go module to its final identifier, and set up the `.gitignore` to exclude the compiled binary and local Claude settings.

## 3. Verbose flag, logs CLI command (0377c06, f27d4f7)

Added `--verbose` to the Claude subprocess invocation and a `logs` CLI subcommand for tailing execution output.

## 4. Embedded web UI and HTTP wiring (135d8eb)

The first web UI was embedded into the binary using `go:embed`. This made the binary fully self-contained: no separate static file server was needed.

## 5. CLAUDE.md, clickable fold, subtask support (bdcc33f, 3881f80, 704d007)

Added the project-level `CLAUDE.md` guidance document. Added a clickable fold to the web UI to expand hidden completed/failed tasks. Added `parent_task_id` to the `Task` struct, `ListSubtasks` to storage, and `UpdateTask`: the foundational subtask plumbing.

## 6. Dependency waiting and planning preamble (f527972)

The executor gained dependency waiting: tasks with `depends_on` now block in a polling loop until all dependencies reach `COMPLETED`.
Any dependency entering a terminal failure state (`FAILED`, `TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`) immediately fails the waiting task. The planning preamble was also introduced here: a system prompt prefix injected into every task's instructions that explains to the agent how to write question files, how to break tasks into subtasks via the `claudomator` CLI, and how to commit all changes in git sandboxes.

## 7. Elaborate, logs-stream, templates, subtask-list endpoints (74cc740)

The API gained several new endpoints:

- `POST /api/elaborate`: calls Claude to expand a brief task description into structured YAML.
- `GET /api/executions/{id}/stream`: live-streams the execution log.
- `GET /api/templates` / `POST /api/templates`: task template CRUD (later removed).
- `GET /api/tasks/{id}/subtasks`: lists subtasks for a parent task.

## 8. Web UI: tabs, new task modal, templates panel (e8d1b80)

The web UI got a tabbed layout (Running / Done / Templates), a modal for creating new tasks with AI-drafted instructions, and a templates panel. This was the first version of the UI that matched the current design.

## 9. READY state for human-in-the-loop review (6511d6e)

A critical design point: when a top-level task's runner exits successfully, the task does not immediately go to `COMPLETED`. Instead it transitions to `READY`, meaning it is paused until the operator reviews the agent's output and explicitly accepts or rejects it. `READY → COMPLETED` requires `POST /api/tasks/{id}/accept`. `READY → PENDING` (for re-running) requires `POST /api/tasks/{id}/reject`. This is specific to top-level tasks. Subtasks (`parent_task_id != ""`) bypass `READY` and go directly to `COMPLETED`; only the root task requires human sign-off.

## 10. Fix working_dir failures, hardcoded /root removed (3962597)

Early deployments hardcoded `/root` as the base path for `working_dir`. This was removed. `working_dir` is now validated to exist before the subprocess starts.

## 11. Scripts, debug-execution, deploy (2bbae74, f7c6de4)

Added the `scripts/` directory with `debug-execution` (inspects a specific execution's logs) and `deploy` (builds and deploys the binary to the production server). Added a CLI `start` command and the `version` package.

## 12. Rescue from recovery branch: question/answer, rate limiting, start-next-task (cf83444)

A batch of features rescued from a detached-work branch:

- **Question/answer flow (`BLOCKED` state)**: agents can write a `question.json` file before exiting. The pool detects this and transitions the task to `BLOCKED`, storing the question for the user. `POST /api/tasks/{id}/answer` resumes the Claude session with the user's answer injected as the next message.
- **Rate limiting**: the pool tracks which agents are rate-limited and when. `isRateLimitError` and `isQuotaExhausted` distinguish transient throttles from 5-hour quota exhaustion. The per-agent `rateLimited` map stores the deadline.
- **Start-next-task script**: a shell script that picks the highest-priority pending task and starts it.

## 13. Accept/Reject for READY tasks, Start Next button in UI (9e790e3)

The web UI gained explicit Accept/Reject buttons for tasks in the `READY` state and a "Start Next" button in the header that triggers the `start-next-task` script.

## 14. Stream-level failure detection when claude exits 0 (4c0ee5c)

Claude can exit 0 even when the task actually failed, for example when the permission mode denies a tool_use and Claude exits politely. `parseStream` was updated to detect `is_error: true` in the result message and `tool_result.is_error: true` with permission-denial text, returning an error in both cases so the task goes to `FAILED` rather than silently succeeding.

## 15. Persist log paths at CreateExecution time (f8b5f25)

Previously, `StdoutPath`, `StderrPath`, and `ArtifactDir` were only written to the execution record at `UpdateExecution` time (after the subprocess finished). This prevented live log tailing.
Introduced the `LogPather` interface: runners that implement `ExecLogDir(execID)` allow the pool to pre-populate paths before calling `CreateExecution`, making them available for streaming before the process ends.

## 16. bypassPermissions as executor default (a33211d)

`permission_mode` defaults to `bypassPermissions` when not set in the task YAML. This was a deliberate trade-off: unattended automation needs to proceed without tool-use confirmation prompts. Operators can override per-task via `permission_mode`.

## 17. Cancel endpoint and pool cancel mechanism (3672981)

`POST /api/tasks/{id}/cancel` was implemented. The pool maintains a `cancels` map from taskID to context cancel functions. Cancellation sends a SIGKILL to the entire process group (via `syscall.Kill(-pgid, SIGKILL)`) to reap MCP servers and bash children that the claude subprocess spawned.

## 18. BLOCKED state, session resume, fix: persist session_id (7466b17, 40d9ace)

The full BLOCKED cycle was wired end-to-end:

1. Agent writes `question.json` to `$CLAUDOMATOR_QUESTION_FILE` and exits.
2. Runner detects the file and returns `*BlockedError`.
3. Pool transitions task to `BLOCKED` and stores the question JSON.
4. User answers via `POST /api/tasks/{id}/answer`.
5. Pool calls `SubmitResume` with a new `Execution` carrying `ResumeSessionID` and `ResumeAnswer`.
6. Runner invokes `claude --resume -p `.

A bug was found and fixed: `session_id` was not persisted in `UpdateExecution`, causing the BLOCKED → answer → resume cycle to fail because `GetLatestExecution` returned no session ID.

## 19. Context.Background for resume execution; CANCELLED→QUEUED restart (7d4890c)

Resume executions now use `context.Background()` instead of inheriting a potentially stale context. `CANCELLED → QUEUED` was added as a valid transition so cancelled tasks can be manually restarted.

## 20. git sandbox execution, project_dir rename (1f36e23)

The `working_dir` field was renamed to `project_dir` across all layers (task YAML, storage, API, UI). When `project_dir` is set, the runner no longer executes directly in that directory. Instead it:

1. Detects whether `project_dir` is a git repo (initialising one if not).
2. Clones the repo into `/tmp/claudomator-sandbox-*` (using `--no-hardlinks` to avoid permission issues with mixed-owner `.git/objects`).
3. Runs the agent in the sandbox clone.
4. After the agent exits, verifies no uncommitted changes remain and pushes new commits to the canonical bare repo.
5. Removes the sandbox.

On BLOCKED, the sandbox is preserved so the agent can resume where it left off in the same working tree. Concurrent push conflicts (two sandboxes pushing at the same time) are handled by a fetch-rebase-retry sequence.

## 21. Storage: enforce valid state transitions in UpdateTaskState (8777bf2)

`storage.DB.UpdateTaskState` now calls `task.ValidTransition` before writing. If the transition is not allowed by the state machine, the function returns an error and no write occurs. This is the enforcement point for the state machine invariants.

## 22. Executor internal dispatch queue; remove at-capacity rejection (2cf6d97)

The previous pool rejected `Submit` when all slots were taken. This was replaced with an internal `workCh` channel and a `dispatch` goroutine: tasks submitted while the pool is at capacity are buffered in the channel and picked up as soon as a slot opens. `Submit` now only returns an error if the channel itself is full (which requires an enormous backlog).

## 23. API hardening: WebSocket auth, per-IP rate limiter, script registry (363fc9e, 417034b, 181a376)

Several API reliability improvements:

- WebSocket connections now require an API token (if `SetAPIToken` was called) and are capped at a configurable maximum number of clients. A ping/pong keepalive prevents stale connections from accumulating.
- A per-IP rate limiter was added to the `/api/elaborate` and `/api/validate` endpoints to prevent abuse.
- The scripts endpoints were collapsed into a generic `ScriptRegistry`: instead of individual handlers per script, a single handler dispatches to registered scripts by name.

## 24. API: extend executions and log streaming endpoints (7914153)

`GET /api/executions` gained filtering and sorting. `GET /api/executions/{id}/logs` was added for fetching completed log files. Live streaming via SSE and the log tail endpoint were polished.

## 25. CLI: newLogger, shared HTTP client, report command (1ce83b6)

CLI utilities consolidated: a shared logger constructor (`newLogger`), a shared HTTP client, a default server URL (`http://localhost:8484`). Added the `report` CLI subcommand for fetching execution summaries from the server.

## 26. Generic agent architecture: transition from Claude-only (306482d to f2d6822)

This was a major refactor over several commits:

1. `ClaudeConfig` was renamed to `AgentConfig` with a new `Type` field (`"claude"`, `"gemini"`, etc.).
2. `Pool` was changed from holding a single `ClaudeRunner` to holding a `map[string]Runner`, one runner per agent type.
3. `GeminiRunner` was implemented, mirroring `ClaudeRunner` but invoking the `gemini` CLI.
4. The storage layer, API handlers, elaborate/validate endpoints, and all tests were updated to use `AgentConfig`.
5. The web UI was updated to expose agent type selection.

## 27. Gemini-based task classification and explicit load balancing (406247b)

`Classifier` and `pickAgent` were introduced to automate agent and model selection:

- **`pickAgent(SystemStatus)`**: explicit load balancing. Picks the available (non-rate-limited) agent with the fewest active tasks. Falls back to fewest-active if all agents are rate-limited.
- **`Classifier`**: calls the Gemini CLI with a meta-prompt asking it to pick the best model for the task.
  This is intentionally model-picks-model: use a fast, cheap classifier to avoid wasting expensive tokens.

After this commit the flow is: `execute()` → pick agent → call classifier → set `t.Agent.Type` and `t.Agent.Model` → dispatch to runner.

## 28. ADR-003: Security Model (93a4c85)

The security model was documented formally: no auth, permissive CORS, `bypassPermissions` as default, and the known risk inventory (see `docs/adr/003-security-model.md`).

## 29. Various web UI improvements (91fd904, 7b53b9e, 560f42b, cdfdc30)

Running tasks became the default view. A "Running view" showing currently running tasks alongside the 24h execution history was added. Agent type and model were surfaced on running task cards. The Done/Interrupted tabs were filtered to 24h.

## 30. Quota exhaustion detection from stream (076c0fa)

Previously, quota exhaustion (the 5-hour usage limit) was treated identically to generic failures. `isQuotaExhausted` was introduced to distinguish it: quota exhaustion maps to `BUDGET_EXCEEDED` and sets a 5-hour rate-limit deadline on the agent, rather than failing the task with a generic error.

## 31. Sandbox fixes: push via bare repo, fetch/rebase (cfbcc7b, f135ab8, 07061ac)

The sandbox teardown strategy was revised: instead of pushing directly into the working copy (which fails for non-bare repos), the sandbox pushes to a bare repo (`remote "local"` or `remote "origin"`) and the working copy is pulled separately by the developer. This avoids permission errors from mixed-owner `.git/objects`. The `--no-hardlinks` clone flag was added to prevent object sharing.

## 32. BLOCKED→READY for parent tasks with subtasks (441ed9e, c8e3b46)

When a top-level task exits the runner successfully but has subtasks, it transitions to `BLOCKED` (waiting for subtasks to finish) rather than `READY`.
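
Taken together with the `READY` rules from section 9, the state a task enters after a successful run depends on two facts: whether it has a parent, and whether it created subtasks. The sketch below is a minimal illustration under stated assumptions: `stateAfterSuccess`, the trimmed-down `Task` struct, and the `subtaskCount` parameter are hypothetical and only restate the rules described in this document, not the project's actual code.

```go
package main

import "fmt"

// State mirrors the task state names used in this document.
type State string

const (
	Ready     State = "READY"
	Blocked   State = "BLOCKED"
	Completed State = "COMPLETED"
)

// Task is a hypothetical, trimmed-down task record.
// ParentTaskID corresponds to parent_task_id; empty means top-level.
type Task struct {
	ID           string
	ParentTaskID string
}

// stateAfterSuccess returns the state a task enters when its runner
// exits successfully. The case of a subtask that has its own subtasks
// is not covered by the narrative; this sketch treats it as a subtask.
func stateAfterSuccess(t Task, subtaskCount int) State {
	switch {
	case t.ParentTaskID != "":
		// Subtasks bypass READY and need no human sign-off.
		return Completed
	case subtaskCount > 0:
		// Top-level task waits for its subtasks to finish.
		return Blocked
	default:
		// Top-level task without subtasks waits for operator review.
		return Ready
	}
}

func main() {
	fmt.Println(stateAfterSuccess(Task{ID: "child", ParentTaskID: "root"}, 0)) // COMPLETED
	fmt.Println(stateAfterSuccess(Task{ID: "root"}, 2))                        // BLOCKED
	fmt.Println(stateAfterSuccess(Task{ID: "solo"}, 0))                        // READY
}
```
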
A new `maybeUnblockParent` function is called every time a subtask completes: if all siblings are `COMPLETED`, the parent transitions `BLOCKED → READY` and is presented for operator review.

## 33. Stale RUNNING task recovery on server startup (9159572)

`Pool.RecoverStaleRunning()` was added and called from `cli.serve`. It queries for tasks still in `RUNNING` state (left over from a previous server crash) and marks them `FAILED`, closing their open execution records. This prevents stuck tasks after server restarts.

## 34. API: configurable mockRunner, async error-path tests (b33566b)

The `api` test suite was hardened with a configurable `mockRunner` that can be injected into the test server. Async error paths (runner returns an error, DB update fails mid-execution) are now exercised in tests.

## 35. Storage: missing indexes, ListRecentExecutions tests, DeleteTask atomicity (8b6c97e, 3610409)

Several storage correctness fixes:

- `idx_tasks_state`, `idx_tasks_parent_task_id`, `idx_executions_status`, `idx_executions_task_id`, and `idx_executions_start_time` indexes were added.
- `ListRecentExecutions` had an off-by-one that caused it to miss recent executions; tests were added to catch this.
- `DeleteTask` was made atomic using a recursive CTE to delete the task and all its subtasks in a single transaction.

## 36. API: validate ?state= param, standardize operation response shapes (933af81)

`GET /api/tasks?state=XYZ` now validates the state value. All mutating operation responses (`/run`, `/cancel`, `/accept`, `/reject`, `/answer`) were standardised to return `{"status": "ok"}` on success.

## 37. Re-classify on manual restart; handleRunResult extraction (0676f0f, 7d6943c)

Tasks that are manually restarted (from `FAILED`, `CANCELLED`, etc.) now go through classification again so they pick up the latest agent/model selection logic.
The post-run error classification block was extracted into `handleRunResult`, a shared helper called by both `execute` and `executeResume`, eliminating 60+ lines of duplication.

## 38. Legacy Claude field removed (b4371d0, a782bbf)

The last remnants of the original `ClaudeConfig` type and backward-compat `working_dir` shim were removed. The schema is now fully generic.

## 39. Kill-goroutine safety documentation, goroutine-leak test (3b4c50e)

A documented invariant was added to the `execOnce` goroutine that kills the subprocess process group: it cannot block indefinitely. Tests were added to verify no goroutine leak occurs when a task is cancelled.

## 40. Rate-limit avoidance in classifier; model list updates (8ec366d, fc1459b)

The classifier call is now skipped when the selected agent is already known to be rate-limited, avoiding a redundant Gemini API call. The model list was updated to Claude 4.x (`claude-sonnet-4-6`, `claude-opus-4-6`, `claude-haiku-4-5-20251001`) and current Gemini models (`gemini-2.5-flash-lite`, `gemini-2.5-flash`, `gemini-2.5-pro`).

## 41. Map leak fixes: activePerAgent and rateLimited (7c7dd2b)

Two map leak bugs were fixed in the pool:

- `activePerAgent[agentType]` was decremented but never deleted when the count hit zero, so inactive agents accumulated as dead entries.
- Expired `rateLimited[agentType]` entries were not deleted, so the map grew without bound over long runs.

## 42. Sandbox teardown: remove working-copy pull, retry push on concurrent rejection (5c85624)

The sandbox teardown removed the `git pull` into the working copy (which was failing due to mixed-owner object dirs). The retry-push-on-rejection path was tightened to detect `"fetch first"` and `"non-fast-forward"` as the rejection signals.

## 43. Explicit load balancing separated from classification (e033504)

Previously the `Classifier` both picked the agent and selected the model.
This was split: `pickAgent` is deterministic code that picks the agent from the registered runners using the load-balancing algorithm. The `Classifier` only picks the model for the already-selected agent. This makes load balancing reliable and fast even when the Gemini classifier is unavailable.

## 44. Session ID fix on second block-and-resume cycle (65c7638)

A bug was found where the second BLOCKED→answer→resume cycle passed the wrong `--resume` session ID to Claude. The fix ensures that resume executions propagate the original session ID rather than the new execution's UUID.

## 45. validTransitions promoted to package-level var (3226af3)

`validTransitions` was promoted to a package-level variable in `internal/task/task.go` for clarity and potential reuse outside the package. ADR-002 was updated to reflect the current state machine including the `BLOCKED→READY` transition for parent tasks.

---

## Feature Summary (current state)

| Feature | Status |
|---|---|
| Task YAML parsing, batch files | Done |
| SQLite persistence | Done |
| REST API (CRUD + lifecycle) | Done |
| WebSocket real-time events | Done |
| Claude subprocess execution | Done |
| Gemini subprocess execution | Done |
| Explicit load balancing (pickAgent) | Done |
| Gemini-based model classification | Done |
| BLOCKED / question-answer / resume | Done |
| git sandbox isolation | Done |
| Subtask creation and unblocking | Done |
| READY state / human accept-reject | Done |
| Rate-limit and quota tracking | Done |
| Stale RUNNING recovery on startup | Done |
| Per-IP rate limiter on elaborate | Done |
| Web UI (PWA) | Done |
| Push notifications (PWA) | Planned |

---

2026-03-16T00:56:20Z

---

Converter sudoku to rust

---

2026-03-16T01:14:27Z

---

For claudomator tasks that are ready, check the deployed server version against their fix commit

---

2026-03-16T01:17:00Z

---

For every claudomator task that is ready, display on the task whether the currently deployed server includes the commit which fixes that task