From 2c8ec3e53a0f4c6f2d16e94a95fcdce706717091 Mon Sep 17 00:00:00 2001
From: Peter Stone <thepeterstone@gmail.com>
Date: Sun, 22 Mar 2026 23:48:03 +0000
Subject: chore: unify and centralize agent configuration in .agent/

---
 docs/RAW_NARRATIVE.md | 399 --------------------------------------------------
 1 file changed, 399 deletions(-)
 delete mode 100644 docs/RAW_NARRATIVE.md

diff --git a/docs/RAW_NARRATIVE.md b/docs/RAW_NARRATIVE.md
deleted file mode 100644
index 834d812..0000000
--- a/docs/RAW_NARRATIVE.md
+++ /dev/null
@@ -1,399 +0,0 @@
-# Claudomator: Development Narrative
-
-This document is a chronological engineering history of the Claudomator project,
-reconstructed from the git log, ADRs, and source code.
-
----
-
-## 1. Initial commit — core scaffolding (2e2b218)
-
-The project started with a single commit that established the full skeleton:
-task model, executor, API server, CLI, storage layer, and reporter. The Go module
-was `github.com/thepeterstone/claudomator`. The initial `Task` struct had a
-`ClaudeConfig` field (later renamed to `AgentConfig`) holding the model,
-instructions, `working_dir`, budget, permission mode, and tool lists. SQLite was
-chosen as the storage backend (see ADR-001). The executor pool used a bounded
-goroutine model. The API server was plain `net/http` with no external framework.
-The CLI was Cobra.
-
-## 2. JSON tags, module rename, gitignore (8ee1fb5, 46ba3f5, 2bf317d)
-
-Early housekeeping: added JSON struct tags to all exported types, renamed the Go
-module to its final identifier, and set up the `.gitignore` to exclude the compiled
-binary and local Claude settings.
-
-## 3. Verbose flag, logs CLI command (0377c06, f27d4f7)
-
-Added `--verbose` to the Claude subprocess invocation and a `logs` CLI subcommand
-for tailing execution output.
-
-## 4. Embedded web UI and HTTP wiring (135d8eb)
-
-The first web UI was embedded into the binary using `go:embed`. This made the
-binary fully self-contained: no separate static file server was needed.
-
-## 5. CLAUDE.md, clickable fold, subtask support (bdcc33f, 3881f80, 704d007)
-
-Added the project-level `CLAUDE.md` guidance document. Added a clickable fold to
-the web UI to expand hidden completed/failed tasks. Added `parent_task_id` to the
-`Task` struct, `ListSubtasks` to storage, and `UpdateTask` — the foundational
-subtask plumbing.
-
-## 6. Dependency waiting and planning preamble (f527972)
-
-The executor gained dependency waiting: tasks with `depends_on` now block in a
-polling loop until all dependencies reach `COMPLETED`. Any dependency entering a
-terminal failure state (`FAILED`, `TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`)
-immediately fails the waiting task.
-
-The planning preamble was also introduced here — a system prompt prefix injected
-into every task's instructions that explains to the agent how to write question
-files, how to break tasks into subtasks via the `claudomator` CLI, and how to
-commit all changes in git sandboxes.
-
-## 7. Elaborate, logs-stream, templates, subtask-list endpoints (74cc740)
-
-The API gained several new endpoints:
-- `POST /api/elaborate` — calls Claude to expand a brief task description into
-  structured YAML.
-- `GET /api/executions/{id}/stream` — live-streams the execution log.
-- `GET /api/templates` / `POST /api/templates` — task template CRUD (later removed).
-- `GET /api/tasks/{id}/subtasks` — lists subtasks for a parent task.
-
-## 8. Web UI: tabs, new task modal, templates panel (e8d1b80)
-
-The web UI got a tabbed layout (Running / Done / Templates), a modal for creating
-new tasks with AI-drafted instructions, and a templates panel. This was the first
-version of the UI that matched the current design.
-
-## 9. READY state for human-in-the-loop review (6511d6e)
-
-A critical design point: when a top-level task's runner exits successfully, the
-task does not immediately go to `COMPLETED`. Instead it transitions to `READY`,
-meaning it pauses for the operator to review the agent's output and explicitly
-accept or reject it. `READY → COMPLETED` requires `POST /api/tasks/{id}/accept`.
-`READY → PENDING` (for re-running) requires `POST /api/tasks/{id}/reject`.
-
-This is specific to top-level tasks. Subtasks (`parent_task_id != ""`) bypass READY
-and go directly to `COMPLETED` — only the root task requires human sign-off.
-
-## 10. Fix working_dir failures, hardcoded /root removed (3962597)
-
-Early deployments hardcoded `/root` as the base path for `working_dir`. This was
-removed. `working_dir` is now validated to exist before the subprocess starts.
-
-## 11. Scripts, debug-execution, deploy (2bbae74, f7c6de4)
-
-Added the `scripts/` directory with `debug-execution` (inspects a specific
-execution's logs) and `deploy` (builds and deploys the binary to the production
-server). Added a CLI `start` command and the `version` package.
-
-## 12. Rescue from recovery branch — question/answer, rate limiting, start-next-task (cf83444)
-
-A batch of features rescued from a detached-work branch:
-- **Question/answer flow (`BLOCKED` state)**: agents can write a `question.json`
-  file before exiting. The pool detects this and transitions the task to `BLOCKED`,
-  storing the question for the user. `POST /api/tasks/{id}/answer` resumes the
-  Claude session with the user's answer injected as the next message.
-- **Rate limiting**: the pool tracks which agents are rate-limited and when.
-  `isRateLimitError` and `isQuotaExhausted` distinguish transient throttles from
-  5-hour quota exhaustion. The per-agent `rateLimited` map stores the deadline.
-- **Start-next-task script**: a shell script that picks the highest-priority pending
-  task and starts it.
-
-## 13. Accept/Reject for READY tasks, Start Next button in UI (9e790e3)
-
-The web UI gained explicit Accept/Reject buttons for tasks in the `READY` state
-and a "Start Next" button in the header that triggers the `start-next-task` script.
-
-## 14. Stream-level failure detection when claude exits 0 (4c0ee5c)
-
-Claude can exit 0 even when the task actually failed — for example when the
-permission mode denies a tool_use and Claude exits politely. `parseStream` was
-updated to detect `is_error: true` in the result message and
-`tool_result.is_error: true` with permission-denial text, returning an error in
-both cases so the task goes to `FAILED` rather than silently succeeding.
-
-## 15. Persist log paths at CreateExecution time (f8b5f25)
-
-Previously, `StdoutPath`, `StderrPath`, and `ArtifactDir` were only written to the
-execution record at `UpdateExecution` time (after the subprocess finished). This
-prevented live log tailing. Introduced the `LogPather` interface: runners that
-implement `ExecLogDir(execID)` allow the pool to pre-populate paths before calling
-`CreateExecution`, making them available for streaming before the process ends.
-
-## 16. bypassPermissions as executor default (a33211d)
-
-`permission_mode` defaults to `bypassPermissions` when not set in the task YAML.
-This was a deliberate trade-off: unattended automation needs to proceed without
-tool-use confirmation prompts. Operators can override per-task via `permission_mode`.
-
-## 17. Cancel endpoint and pool cancel mechanism (3672981)
-
-`POST /api/tasks/{id}/cancel` was implemented. The pool maintains a `cancels` map
-from taskID to context cancel functions. Cancellation sends a SIGKILL to the
-entire process group (via `syscall.Kill(-pgid, SIGKILL)`) to reap MCP servers and
-bash children that the claude subprocess spawned.
-
-## 18. BLOCKED state, session resume, fix: persist session_id (7466b17, 40d9ace)
-
-The full BLOCKED cycle was wired end-to-end:
-1. Agent writes `question.json` to `$CLAUDOMATOR_QUESTION_FILE` and exits.
-2. Runner detects the file and returns `*BlockedError`.
-3. Pool transitions task to `BLOCKED` and stores the question JSON.
-4. User answers via `POST /api/tasks/{id}/answer`.
-5. Pool calls `SubmitResume` with a new `Execution` carrying `ResumeSessionID`
-   and `ResumeAnswer`.
-6. Runner invokes `claude --resume <session-id> -p <answer>`.
-
-A bug was found and fixed: `session_id` was not persisted in `UpdateExecution`,
-causing the BLOCKED → answer → resume cycle to fail because `GetLatestExecution`
-returned no session ID.
-
-## 19. Context.Background for resume execution; CANCELLED→QUEUED restart (7d4890c)
-
-Resume executions now use `context.Background()` instead of inheriting a potentially
-stale context. `CANCELLED → QUEUED` was added as a valid transition so cancelled
-tasks can be manually restarted.
-
-## 20. git sandbox execution, project_dir rename (1f36e23)
-
-The `working_dir` field was renamed to `project_dir` across all layers (task YAML,
-storage, API, UI). When `project_dir` is set, the runner no longer executes
-directly in that directory. Instead it:
-
-1. Detects whether `project_dir` is a git repo (initialising one if not).
-2. Clones the repo into `/tmp/claudomator-sandbox-*` (using `--no-hardlinks`
-   to avoid permission issues with mixed-owner `.git/objects`).
-3. Runs the agent in the sandbox clone.
-4. After the agent exits, verifies no uncommitted changes remain and pushes
-   new commits to the canonical bare repo.
-5. Removes the sandbox.
-
-On BLOCKED, the sandbox is preserved so the agent can resume where it left off
-in the same working tree.
-
-Concurrent push conflicts (two sandboxes pushing at the same time) are handled
-by a fetch-rebase-retry sequence.
-
-## 21. Storage: enforce valid state transitions in UpdateTaskState (8777bf2)
-
-`storage.DB.UpdateTaskState` now calls `task.ValidTransition` before writing. If
-the transition is not allowed by the state machine, the function returns an error
-and no write occurs. This is the enforcement point for the state machine invariants.
-
-## 22. Executor internal dispatch queue; remove at-capacity rejection (2cf6d97)
-
-The previous pool rejected `Submit` when all slots were taken. This was replaced
-with an internal `workCh` channel and a `dispatch` goroutine: tasks submitted
-while the pool is at capacity are buffered in the channel and picked up as soon
-as a slot opens. `Submit` now only returns an error if the channel itself is full
-(which requires an enormous backlog).
-
-## 23. API hardening — WebSocket auth, per-IP rate limiter, script registry (363fc9e, 417034b, 181a376)
-
-Several API reliability improvements:
-- WebSocket connections now require an API token (if `SetAPIToken` was called) and
-  are capped at a configurable maximum number of clients. A ping/pong keepalive
-  prevents stale connections from accumulating.
-- A per-IP rate limiter was added to the `/api/elaborate` and `/api/validate`
-  endpoints to prevent abuse.
-- The scripts endpoints were collapsed into a generic `ScriptRegistry`: instead of
-  individual handlers per script, a single handler dispatches to registered scripts
-  by name.
-
-## 24. API: extend executions and log streaming endpoints (7914153)
-
-`GET /api/executions` gained filtering and sorting. `GET /api/executions/{id}/logs`
-was added for fetching completed log files. Live streaming via SSE and the log
-tail endpoint were polished.
-
-## 25. CLI: newLogger, shared HTTP client, report command (1ce83b6)
-
-CLI utilities were consolidated: a shared logger constructor (`newLogger`), a shared
-HTTP client, and a default server URL (`http://localhost:8484`). Added the `report`
-CLI subcommand for fetching execution summaries from the server.
-
-## 26. Generic agent architecture — transition from Claude-only (306482d to f2d6822)
-
-This was a major refactor over several commits:
-1. `ClaudeConfig` was renamed to `AgentConfig` with a new `Type` field (`"claude"`,
-   `"gemini"`, etc.).
-2. `Pool` was changed from holding a single `ClaudeRunner` to holding a
-   `map[string]Runner` — one runner per agent type.
-3. `GeminiRunner` was implemented, mirroring `ClaudeRunner` but invoking the
-   `gemini` CLI.
-4. The storage layer, API handlers, elaborate/validate endpoints, and all tests
-   were updated to use `AgentConfig`.
-5. The web UI was updated to expose agent type selection.
-
-## 27. Gemini-based task classification and explicit load balancing (406247b)
-
-`Classifier` and `pickAgent` were introduced to automate agent and model selection:
-
-- **`pickAgent(SystemStatus)`** — explicit load balancing: picks the available
-  (non-rate-limited) agent with the fewest active tasks. Falls back to the agent with
-  the fewest active tasks overall if all agents are rate-limited.
-- **`Classifier`** — calls the Gemini CLI with a meta-prompt asking it to pick
-  the best model for the task. This is intentionally model-picks-model: use a fast,
-  cheap classifier to avoid wasting expensive tokens.
-
-After this commit the flow is: `execute()` → pick agent → call classifier → set
-`t.Agent.Type` and `t.Agent.Model` → dispatch to runner.
-
-## 28. ADR-003: Security Model (93a4c85)
-
-The security model was documented formally: no auth, permissive CORS, `bypassPermissions`
-as default, and the known risk inventory (see `docs/adr/003-security-model.md`).
-
-## 29. Various web UI improvements (91fd904, 7b53b9e, 560f42b, cdfdc30)
-
-Running tasks became the default view. A "Running view" showing currently running
-tasks alongside the 24h execution history was added. Agent type and model were
-surfaced on running task cards. The Done/Interrupted tabs were filtered to 24h.
-
-## 30. Quota exhaustion detection from stream (076c0fa)
-
-Previously, quota exhaustion (the 5-hour usage limit) was treated identically to
-generic failures. `isQuotaExhausted` was introduced to distinguish it: quota
-exhaustion maps to `BUDGET_EXCEEDED` and sets a 5-hour rate-limit deadline on the
-agent, rather than failing the task with a generic error.
-
-## 31. Sandbox fixes — push via bare repo, fetch/rebase (cfbcc7b, f135ab8, 07061ac)
-
-The sandbox teardown strategy was revised: instead of pushing directly into the
-working copy (which fails for non-bare repos), the sandbox pushes to a bare repo
-(`remote "local"` or `remote "origin"`) and the working copy is pulled separately
-by the developer. This avoids permission errors from mixed-owner `.git/objects`.
-The `--no-hardlinks` clone flag was added to prevent object sharing.
-
-## 32. BLOCKED→READY for parent tasks with subtasks (441ed9e, c8e3b46)
-
-When a top-level task exits the runner successfully but has subtasks, it transitions
-to `BLOCKED` (waiting for subtasks to finish) rather than `READY`. A new
-`maybeUnblockParent` function is called every time a subtask completes: if all
-siblings are `COMPLETED`, the parent transitions `BLOCKED → READY` and is
-presented for operator review.
-
-## 33. Stale RUNNING task recovery on server startup (9159572)
-
-`Pool.RecoverStaleRunning()` was added and called from `cli.serve`. It queries for
-tasks still in `RUNNING` state (left over from a previous server crash) and marks
-them `FAILED`, closing their open execution records. This prevents stuck tasks
-after server restarts.
-
-## 34. API: configurable mockRunner, async error-path tests (b33566b)
-
-The `api` test suite was hardened with a configurable `mockRunner` that can be
-injected into the test server. Async error paths (runner returns an error, DB
-update fails mid-execution) are now exercised in tests.
-
-## 35. Storage: missing indexes, ListRecentExecutions tests, DeleteTask atomicity (8b6c97e, 3610409)
-
-Several storage correctness fixes:
-- `idx_tasks_state`, `idx_tasks_parent_task_id`, `idx_executions_status`,
-  `idx_executions_task_id`, and `idx_executions_start_time` indexes were added.
-- `ListRecentExecutions` had an off-by-one that caused it to miss recent executions;
-  tests were added to catch this.
-- `DeleteTask` was made atomic using a recursive CTE to delete the task and all
-  its subtasks in a single transaction.
-
-## 36. API: validate ?state= param, standardize operation response shapes (933af81)
-
-`GET /api/tasks?state=XYZ` now validates the state value. All mutating operation
-responses (`/run`, `/cancel`, `/accept`, `/reject`, `/answer`) were standardised
-to return `{"status": "ok"}` on success.
-
-## 37. Re-classify on manual restart; handleRunResult extraction (0676f0f, 7d6943c)
-
-Tasks that are manually restarted (from `FAILED`, `CANCELLED`, etc.) now go through
-classification again so they pick up the latest agent/model selection logic. The
-post-run error classification block was extracted into `handleRunResult` — a shared
-helper called by both `execute` and `executeResume` — eliminating 60+ lines of
-duplication.
-
-## 38. Legacy Claude field removed (b4371d0, a782bbf)
-
-The last remnants of the original `ClaudeConfig` type and backward-compat `working_dir`
-shim were removed. The schema is now fully generic.
-
-## 39. Kill-goroutine safety documentation, goroutine-leak test (3b4c50e)
-
-A documented invariant was added to the `execOnce` goroutine that kills the
-subprocess process group: it cannot block indefinitely. Tests were added to verify
-no goroutine leak occurs when a task is cancelled.
-
-## 40. Rate-limit avoidance in classifier; model list updates (8ec366d, fc1459b)
-
-The classifier call is now skipped if the selected agent is rate-limited,
-avoiding a redundant Gemini API call when the rate-limited agent is already known.
-The model list was updated to Claude 4.x (`claude-sonnet-4-6`, `claude-opus-4-6`,
-`claude-haiku-4-5-20251001`) and current Gemini models (`gemini-2.5-flash-lite`,
-`gemini-2.5-flash`, `gemini-2.5-pro`).
-
-## 41. Map leak fixes — activePerAgent and rateLimited (7c7dd2b)
-
-Two map leak bugs were fixed in the pool:
-- `activePerAgent[agentType]` was decremented but never deleted when the count hit
-  zero, so inactive agents accumulated as dead entries.
-- Expired `rateLimited[agentType]` entries were not deleted, so the map grew
-  unboundedly over long runs.
-
-## 42. Sandbox teardown: remove working-copy pull, retry push on concurrent rejection (5c85624)
-
-Sandbox teardown no longer runs `git pull` into the working copy (which was failing
-due to mixed-owner object dirs). The retry-push-on-rejection path was tightened to
-detect `"fetch first"` and `"non-fast-forward"` as the rejection signals.
-
-## 43. Explicit load balancing separated from classification (e033504)
-
-Previously the `Classifier` both picked the agent and selected the model. This was
-split: `pickAgent` is deterministic code that picks the agent from the registered
-runners using the load-balancing algorithm. The `Classifier` only picks the model
-for the already-selected agent. This makes load balancing reliable and fast even
-when the Gemini classifier is unavailable.
-
-## 44. Session ID fix on second block-and-resume cycle (65c7638)
-
-A bug was found where the second BLOCKED→answer→resume cycle passed the wrong
-`--resume` session ID to Claude. The fix ensures that resume executions propagate
-the original session ID rather than the new execution's UUID.
-
-## 45. validTransitions promoted to package-level var (3226af3)
-
-`validTransitions` was promoted to a package-level variable in `internal/task/task.go`
-for clarity and potential reuse outside the package. ADR-002 was updated to reflect
-the current state machine including the `BLOCKED→READY` transition for parent tasks.
-
----
-
-## Feature Summary (current state)
-
-| Feature | Status |
-|---|---|
-| Task YAML parsing, batch files | Done |
-| SQLite persistence | Done |
-| REST API (CRUD + lifecycle) | Done |
-| WebSocket real-time events | Done |
-| Claude subprocess execution | Done |
-| Gemini subprocess execution | Done |
-| Explicit load balancing (pickAgent) | Done |
-| Gemini-based model classification | Done |
-| BLOCKED / question-answer / resume | Done |
-| git sandbox isolation | Done |
-| Subtask creation and unblocking | Done |
-| READY state / human accept-reject | Done |
-| Rate-limit and quota tracking | Done |
-| Stale RUNNING recovery on startup | Done |
-| Per-IP rate limiter on elaborate | Done |
-| Web UI (PWA) | Done |
-| Push notifications (PWA) | Planned |
-
---- 2026-03-16T00:56:20Z ---
-Convert sudoku to Rust
-
---- 2026-03-16T01:14:27Z ---
-For claudomator tasks that are ready, check the deployed server version against their fix commit
-
---- 2026-03-16T01:17:00Z ---
-For every claudomator task that is ready, display on the task whether the currently deployed server includes the commit which fixes that task
-- 
cgit v1.2.3