From 2c8ec3e53a0f4c6f2d16e94a95fcdce706717091 Mon Sep 17 00:00:00 2001 From: Peter Stone Date: Sun, 22 Mar 2026 23:48:03 +0000 Subject: chore: unify and centralize agent configuration in .agent/ --- .agent/coding_standards.md | 17 ++ .agent/config.md | 50 ++++++ .agent/design.md | 392 ++++++++++++++++++++++++++++++++++++++++++++ .agent/mission.md | 15 ++ .agent/narrative.md | 399 +++++++++++++++++++++++++++++++++++++++++++++ .agent/preferences.md | 8 + .agent/worklog.md | 72 ++++++++ .gemini/GEMINI.md | 7 + CLAUDE.md | 231 +------------------------- SESSION_STATE.md | 72 -------- docs/RAW_NARRATIVE.md | 399 --------------------------------------------- docs/architecture.md | 392 -------------------------------------------- 12 files changed, 968 insertions(+), 1086 deletions(-) create mode 100644 .agent/coding_standards.md create mode 100644 .agent/config.md create mode 100644 .agent/design.md create mode 100644 .agent/mission.md create mode 100644 .agent/narrative.md create mode 100644 .agent/preferences.md create mode 100644 .agent/worklog.md create mode 100644 .gemini/GEMINI.md delete mode 100644 SESSION_STATE.md delete mode 100644 docs/RAW_NARRATIVE.md delete mode 100644 docs/architecture.md diff --git a/.agent/coding_standards.md b/.agent/coding_standards.md new file mode 100644 index 0000000..dc39d7f --- /dev/null +++ b/.agent/coding_standards.md @@ -0,0 +1,17 @@ +# Coding Standards + +Technical standards for the **Claudomator** project. + +## 1. Go (Backend) +- **CGo Dependency:** `go-sqlite3` requires a C compiler (`gcc`). +- **Concurrency:** Uses a bounded goroutine pool. Always test with `go test -race ./...`. +- **State Machine:** Follow `task.ValidTransition` for all state updates. +- **Sandboxing:** Task modifications happen in `/tmp/claudomator-sandbox-*`. + +## 2. Testing +- **Reproduction:** Always create a failing test case for bugs. +- **Race Detector:** Mandatory for `internal/executor` and `internal/api/hub` changes. + +## 3. 
Architecture +- **Single Binary:** Keep the binary self-contained using `go:embed` for web assets. +- **Durability:** Use SQLite with WAL mode. diff --git a/.agent/config.md b/.agent/config.md new file mode 100644 index 0000000..d8ca024 --- /dev/null +++ b/.agent/config.md @@ -0,0 +1,50 @@ +# Agent Configuration & Master Rulebook (.agent/config.md) + +This is the primary source of truth for all AI agents working on **Claudomator**. These instructions take absolute precedence over general defaults. + +## 1. Project Directory Structure (.agent/) + +| File | Purpose | +|------|---------| +| `config.md` | **Main Entry Point** — Rules, workflows, and core mandates. | +| `worklog.md` | **Session State** — Current focus, recently completed, and next steps. | +| `design.md` | **Architecture** — High-level design, component table, and state machine. | +| `coding_standards.md` | **Technical Standards** — Go idioms, concurrency, and testing. | +| `mission.md` | **Mission & Values** — Strategic goals and agent personality. | +| `narrative.md` | **Background** — Historical context and evolution of the project. | +| `preferences.md` | **User Prefs** — Living record of user-specific likes/dislikes. | + +## 2. Core Mandates + +### ULTRA-STRICT ROOT SAFETY PROTOCOL +1. **Inquiry-Only Default:** Treat every message as research/analysis unless it is an explicit, imperative command (Directive). +2. **Zero Unsolicited Implementation:** Never modify files, directories, or processes based on assumptions. +3. **Interactive Strategy Checkpoint:** Research first, present a strategy, and **WAIT** for an explicit "GO" before any system-changing tool call. +4. **No Destructive Assumptions:** Always verify state (`ps`, `ls`, `git status`) before proposing actions. +5. **Root-Awareness:** Prioritize system integrity and user confirmation over proactiveness. + +### Living Documentation Mandate +1. 
**Continuous Capture:** Agents MUST proactively update the files in `.agent/` as new decisions, patterns, or user preferences are revealed. +2. **No Stale Instructions:** If a workflow or technical standard evolves, the agent is responsible for reflecting that change in the Master Rulebook immediately. +3. **Worklog Integrity:** The `.agent/worklog.md` must be updated at the start and end of EVERY session. + +## 3. Workflows + +### Research -> Strategy -> Execution +1. **Research:** Map codebase, validate assumptions, reproduce bugs. +2. **Strategy:** Share a summary. Wait for approval if significant. +3. **Execution (Plan-Act-Validate):** + - **Plan:** Define implementation and testing. + - **Act:** Apply surgical, idiomatic changes. + - **Validate:** Run `go test ./...`. Use `-race` for concurrency-heavy code. + +## 4. Essential Commands + +| Command | Action | +|---------|--------| +| `./claudomator serve` | Start API server | +| `go test ./...` | Run all tests | +| `go test -race ./...` | Run tests with race detector (recommended) | +| `./claudomator run ` | Run a task file directly | +| `./claudomator list` | List tasks via CLI | +| `./scripts/deploy` | Build and deploy binary | diff --git a/.agent/design.md b/.agent/design.md new file mode 100644 index 0000000..27c7601 --- /dev/null +++ b/.agent/design.md @@ -0,0 +1,392 @@ +# Claudomator — System Architecture + +## 1. System Purpose and Design Goals + +Claudomator is a local developer tool that captures tasks, dispatches them to AI agents (Claude, +Gemini), and reports results. Its primary use case is unattended, automated execution of agent +tasks with real-time status streaming to a mobile Progressive Web App. + +**Design goals:** + +- **Single binary, zero runtime deps** — Go for the backend; CGo only for SQLite. +- **Bounded concurrency** — a pool of goroutines prevents unbounded subprocess spawning. +- **Durability** — all task state survives server restarts (SQLite + WAL mode). 
+- **Real-time feedback** — WebSocket pushes task completion events to connected clients. +- **Multi-agent routing** — deterministic load balancing across Claude and Gemini; AI-driven + model-tier selection via a cheap Gemini classifier. +- **Git-isolated execution** — tasks that modify source code run in a temporary git clone + (sandbox) to prevent concurrent corruption and partial-work leakage. +- **Review gate** — top-level tasks wait in `READY` state for operator accept/reject before + reaching `COMPLETED`. + +See [ADR-001](adr/001-language-and-architecture.md) for the language and architecture rationale. + +--- + +## 2. High-Level Architecture + +```mermaid +flowchart TD + CLI["CLI\ncmd/claudomator\n(cobra)"] + API["HTTP API\ninternal/api\nREST + WebSocket"] + Pool["Executor Pool\ninternal/executor.Pool\n(bounded goroutines)"] + Classifier["Gemini Classifier\ninternal/executor.Classifier"] + ClaudeRunner["ClaudeRunner\ninternal/executor.ClaudeRunner"] + GeminiRunner["GeminiRunner\ninternal/executor.GeminiRunner"] + Sandbox["Git Sandbox\n/tmp/claudomator-sandbox-*"] + Subprocess["AI subprocess\nclaude -p / gemini"] + SQLite["SQLite\ntasks.db"] + LogFiles["Log Files\n~/.claudomator/executions/"] + Hub["WebSocket Hub\ninternal/api.Hub"] + Clients["Clients\nbrowser / mobile PWA"] + Notifier["Notifier\ninternal/notify"] + + CLI -->|run / serve| API + CLI -->|direct YAML run| Pool + API -->|POST /api/tasks/run| Pool + Pool -->|pickAgent + Classify| Classifier + Pool -->|Submit / SubmitResume| ClaudeRunner + Pool -->|Submit / SubmitResume| GeminiRunner + ClaudeRunner -->|git clone| Sandbox + Sandbox -->|claude -p| Subprocess + GeminiRunner -->|gemini| Subprocess + Subprocess -->|stream-json stdout| LogFiles + Pool -->|UpdateTaskState| SQLite + Pool -->|resultCh| API + API -->|Broadcast| Hub + Hub -->|fan-out| Clients + API -->|Notify| Notifier +``` + +--- + +## 3. 
Component Table + +| Package | Role | Key Exported Types | +|---|---|---| +| `internal/task` | `Task` struct, YAML parsing, state machine, validation | `Task`, `AgentConfig`, `RetryConfig`, `State`, `Priority`, `ValidTransition` | +| `internal/executor` | Bounded goroutine pool; subprocess manager; multi-agent routing; classification | `Pool`, `ClaudeRunner`, `GeminiRunner`, `Classifier`, `Runner`, `Result`, `BlockedError`, `QuestionRegistry` | +| `internal/storage` | SQLite wrapper; auto-migrating schema; all task and execution CRUD | `DB`, `Execution`, `TaskFilter` | +| `internal/api` | HTTP server (REST + WebSocket); result forwarding; elaboration; log streaming | `Server`, `Hub` | +| `internal/reporter` | Formats and emits execution results (text, HTML) | `Reporter`, `TextReporter`, `HTMLReporter` | +| `internal/config` | TOML config loading; data-directory layout | `Config` | +| `internal/cli` | Cobra CLI commands (`run`, `serve`, `list`, `status`, `init`) | `RootCmd` | +| `internal/notify` | Webhook notifier for task completion events | `Notifier`, `WebhookNotifier`, `Event` | +| `web` | Embedded PWA static files (served by `internal/api`) | `Files` (embed.FS) | +| `version` | Build-time version string | `Version` | + +--- + +## 4. Package Dependency Graph + +```mermaid +graph LR + cli["internal/cli"] + api["internal/api"] + executor["internal/executor"] + storage["internal/storage"] + task["internal/task"] + config["internal/config"] + reporter["internal/reporter"] + notify["internal/notify"] + web["web"] + version["version"] + + cli --> api + cli --> executor + cli --> storage + cli --> config + cli --> version + api --> executor + api --> storage + api --> task + api --> notify + api --> web + executor --> storage + executor --> task + reporter --> task + reporter --> storage + storage --> task +``` + +--- + +## 5. 
Task Execution Pipeline + +The following numbered steps trace a task from API submission to final state, with file and line +references to the key logic in each step. + +1. **Task creation** — `POST /api/tasks` calls `task.Validate` and `storage.DB.CreateTask`. + Task is written to SQLite in `PENDING` state. + (`internal/api/server.go:349`, `internal/task/parse.go`, `internal/storage/db.go`) + +2. **Run request** — `POST /api/tasks/{id}/run` calls `storage.DB.ResetTaskForRetry` (validates + the `PENDING → QUEUED` transition) then `executor.Pool.Submit`. + (`internal/api/server.go:460`, `internal/executor/executor.go:125`) + +3. **Pool dispatch** — The `dispatch` goroutine reads from `workCh`, waits for a free slot + (blocks on `doneCh` if at capacity), then spawns `go execute(ctx, task)`. + (`internal/executor/executor.go:102`) + +4. **Agent selection** — `pickAgent(SystemStatus)` selects the available agent with the fewest + active tasks (deterministic, no I/O). If the pool has a `Classifier`, it invokes + `Classifier.Classify` (one Gemini API call) to select the model tier; failures are non-fatal. + (`internal/executor/executor.go:349`, `internal/executor/executor.go:396`) + +5. **Dependency wait** — If `t.DependsOn` is non-empty, `waitForDependencies` polls SQLite + every 5 s until all dependencies reach `COMPLETED` (or a terminal failure state). + (`internal/executor/executor.go:642`) + +6. **Execution record created** — A new `storage.Execution` row is inserted with `RUNNING` + status. Log paths (`stdout.log`, `stderr.log`) are pre-populated via `LogPather` so they are + immediately available for tailing. + (`internal/executor/executor.go:483`) + +7. **Subprocess launch** — `ClaudeRunner.Run` (or `GeminiRunner.Run`) builds the CLI argument + list and calls `exec.CommandContext`. If `project_dir` is set and this is not a resume + execution, `setupSandbox` clones the project to `/tmp/claudomator-sandbox-*` first. 
+ (`internal/executor/claude.go:63`, `internal/executor/claude.go:setupSandbox`) + +8. **Output streaming** — stdout is written to `//stdout.log`; the runner + concurrently parses the `stream-json` lines for cost and session ID. + (`internal/executor/claude.go`) + +9. **Execution outcome → state** — After the subprocess exits, `handleRunResult` maps the error + type to a final task state and calls `storage.DB.UpdateTaskState`. + (`internal/executor/executor.go:256`) + + | Outcome | Final state | + |---|---| + | `runner.Run` → `nil`, top-level, no subtasks | `READY` | + | `runner.Run` → `nil`, top-level, has subtasks | `BLOCKED` | + | `runner.Run` → `nil`, subtask | `COMPLETED` | + | `runner.Run` → `*BlockedError` (question file) | `BLOCKED` | + | `ctx.Err() == DeadlineExceeded` | `TIMED_OUT` | + | `ctx.Err() == Canceled` | `CANCELLED` | + | quota exhausted | `BUDGET_EXCEEDED` | + | any other error | `FAILED` | + +10. **Result broadcast** — The pool emits a `*Result` to `resultCh`. `Server.forwardResults` + reads it, marshals a `task_completed` JSON event, and calls `hub.Broadcast`. + (`internal/api/server.go:123`, `internal/api/server.go:129`) + +11. **Sandbox teardown** — If a sandbox was used and no uncommitted changes remain, + `teardownSandbox` removes the temp directory. If uncommitted changes are detected, the task + fails and the sandbox is preserved for inspection. + (`internal/executor/claude.go:teardownSandbox`) + +12. **Review gate** — Operator calls `POST /api/tasks/{id}/accept` (`READY → COMPLETED`) or + `POST /api/tasks/{id}/reject` (`READY → PENDING`). + (`internal/api/server.go:487`, `internal/api/server.go:507`) + +--- + +## 6. 
Task State Machine + +```mermaid +stateDiagram-v2 + [*] --> PENDING : task created + + PENDING --> QUEUED : POST /run + PENDING --> CANCELLED : POST /cancel + + QUEUED --> RUNNING : pool goroutine starts + QUEUED --> CANCELLED : POST /cancel + + RUNNING --> READY : exit 0, top-level, no subtasks + RUNNING --> BLOCKED : exit 0, top-level, has subtasks + RUNNING --> BLOCKED : question.json written + RUNNING --> COMPLETED : exit 0, subtask + RUNNING --> FAILED : exit non-zero / stream error + RUNNING --> TIMED_OUT : context deadline exceeded + RUNNING --> CANCELLED : context cancelled + RUNNING --> BUDGET_EXCEEDED : quota exhausted + + READY --> COMPLETED : POST /accept + READY --> PENDING : POST /reject + + BLOCKED --> QUEUED : POST /answer + BLOCKED --> READY : all subtasks COMPLETED + + FAILED --> QUEUED : POST /run (retry) + TIMED_OUT --> QUEUED : POST /resume + CANCELLED --> QUEUED : POST /run (restart) + BUDGET_EXCEEDED --> QUEUED : POST /run (retry) + + COMPLETED --> [*] +``` + +**State definitions** (`internal/task/task.go:9`): + +| State | Meaning | +|---|---| +| `PENDING` | Created; not yet submitted for execution | +| `QUEUED` | Submitted to pool; waiting for a goroutine slot | +| `RUNNING` | Subprocess actively executing | +| `READY` | Top-level task done; awaiting operator accept/reject | +| `COMPLETED` | Fully done (only true terminal state) | +| `FAILED` | Execution error; eligible for retry | +| `TIMED_OUT` | Exceeded configured timeout; resumable | +| `CANCELLED` | Explicitly cancelled by operator | +| `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` | +| `BLOCKED` | Agent wrote a `question.json`, or parent waiting for subtasks | + +`ValidTransition(from, to State) bool` enforces the allowed edges at runtime before every state +write. (`internal/task/task.go:113`) + +--- + +## 7. 
WebSocket Broadcast Flow + +```mermaid +sequenceDiagram + participant Runner as ClaudeRunner / GeminiRunner + participant Pool as executor.Pool + participant API as api.Server (forwardResults) + participant Hub as api.Hub + participant Client1 as WebSocket client 1 + participant Client2 as WebSocket client 2 + + Runner->>Pool: runner.Run() returns + Pool->>Pool: handleRunResult() — sets exec.Status + Pool->>SQLite: UpdateTaskState + UpdateExecution + Pool->>Pool: resultCh <- &Result{...} + API->>Pool: <-pool.Results() + API->>API: marshal task_completed JSON event + API->>Hub: hub.Broadcast(data) + Hub->>Client1: ws.Write(data) + Hub->>Client2: ws.Write(data) + API->>Notifier: notifier.Notify(Event{...}) [if set] +``` + +**Event payload** (JSON): +```json +{ + "type": "task_completed", + "task_id": "", + "status": "READY | COMPLETED | FAILED | ...", + "exit_code": 0, + "cost_usd": 0.042, + "error": "", + "timestamp": "2026-03-11T12:00:00Z" +} +``` + +The `Hub` also emits `task_question` events via `Server.BroadcastQuestion` when an agent uses +an interactive question tool (currently unused in the primary file-based `BLOCKED` flow). + +WebSocket endpoint: `GET /api/ws`. Supports optional bearer-token auth when `--api-token` is +configured. Up to 1000 concurrent clients; periodic 30-second pings detect dead connections. +(`internal/api/websocket.go`) + +--- + +## 8. Subtask / Parent-Task Dependency Resolution + +Claudomator supports two distinct dependency mechanisms: + +### 8a. `depends_on` — explicit task ordering + +Tasks declare `depends_on: [, ...]` in their YAML or creation payload. When the +executor goroutine starts, `waitForDependencies` polls SQLite every 5 seconds until all listed +tasks reach `COMPLETED`. If any dependency reaches a terminal failure state (`FAILED`, +`TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`), the waiting task transitions to `FAILED` +immediately. (`internal/executor/executor.go:642`) + +### 8b. 
Parent / subtask blocking (`BLOCKED` state) + +A top-level task that creates subtasks (tasks with `parent_task_id` pointing back to it) +transitions to `BLOCKED` — not `READY` — when its own runner exits successfully. This allows +an agent to dispatch subtasks and then wait for them. + +``` +Parent task runner exits 0 + ├── no subtasks → READY (normal review gate) + └── has subtasks → BLOCKED (waiting for subtasks) + │ + Each subtask completes → COMPLETED + │ + All subtasks COMPLETED? + ├── yes → maybeUnblockParent() → parent READY + └── no → parent stays BLOCKED +``` + +`maybeUnblockParent(parentID)` is called every time a subtask transitions to `COMPLETED`. It +loads all subtasks and checks whether every one is `COMPLETED`. If so, it calls +`UpdateTaskState(parentID, StateReady)`. (`internal/executor/executor.go:616`) + +The `BLOCKED` state also covers the interactive question flow: when `ClaudeRunner.Run` detects +a `question.json` file in the execution log directory, it returns a `*BlockedError` containing +the question JSON and session ID. The pool stores the question in the `tasks.question_json` +column and the session ID in the `executions.session_id` column. `POST /api/tasks/{id}/answer` +resumes the task by calling `pool.SubmitResume` with a new `Execution` carrying +`ResumeSessionID` and `ResumeAnswer`. (`internal/executor/claude.go:103`, +`internal/api/server.go:221`) + +--- + +## 9. External Go Dependencies + +| Module | Version | Purpose | +|---|---|---| +| `github.com/mattn/go-sqlite3` | v1.14.33 | CGo SQLite driver (requires C compiler) | +| `github.com/google/uuid` | v1.6.0 | UUID generation for task and execution IDs | +| `github.com/spf13/cobra` | v1.10.2 | CLI command framework | +| `github.com/BurntSushi/toml` | v1.6.0 | TOML config file parsing | +| `golang.org/x/net` | v0.49.0 | `golang.org/x/net/websocket` — WebSocket server | +| `gopkg.in/yaml.v3` | v3.0.1 | YAML task definition parsing | + +--- + +## 10. 
Related Documentation + +### Architectural Decision Records + +| ADR | Title | Summary | +|---|---|---| +| [ADR-001](adr/001-language-and-architecture.md) | Go + SQLite + WebSocket Architecture | Language choice, pipeline design, storage and API rationale | +| [ADR-002](adr/002-task-state-machine.md) | Task State Machine Design | All 10 states, transition table, side effects, known edge cases | +| [ADR-003](adr/003-security-model.md) | Security Model | Trust boundary, no-auth posture, known risks, hardening checklist | +| [ADR-004](adr/004-multi-agent-routing-and-classification.md) | Multi-Agent Routing | `pickAgent` load balancing, Gemini-based model classifier | +| [ADR-005](adr/005-sandbox-execution-model.md) | Git Sandbox Execution Model | Isolated git clone per task, push-back flow, BLOCKED preservation | + +### Package-Level Docs + +Per-package design notes live in [`docs/packages/`](packages/) (in progress). + +### Task YAML Reference + +```yaml +name: "My Task" +agent: + type: "claude" # "claude" | "gemini"; optional — load balancer may override + model: "sonnet" # model tier hint; optional — classifier may override + instructions: | + Do something useful. + project_dir: "/workspace/myproject" # if set, runs in a git sandbox + max_budget_usd: 1.00 + permission_mode: "bypassPermissions" # default + allowed_tools: ["Bash", "Read"] + context_files: ["README.md"] +timeout: "15m" +priority: "normal" # high | normal | low +tags: ["ci", "backend"] +depends_on: [""] # explicit ordering +parent_task_id: "" # set by parent agent when creating subtasks +``` + +Batch files wrap multiple tasks under a `tasks:` key and are accepted by `claudomator run`. 
+ +### Storage Schema + +Two tables auto-migrated on `storage.Open()`: + +- **`tasks`** — `id`, `name`, `description`, `config_json` (AgentConfig), `priority`, + `timeout_ns`, `retry_json`, `tags_json`, `depends_on_json`, `parent_task_id`, `state`, + `question_json`, `rejection_comment`, `created_at`, `updated_at` +- **`executions`** — `id`, `task_id`, `start_time`, `end_time`, `exit_code`, `status`, + `stdout_path`, `stderr_path`, `artifact_dir`, `cost_usd`, `error_msg`, `session_id`, + `resume_session_id`, `resume_answer` + +Indexed columns: `tasks.state`, `tasks.parent_task_id`, `executions.task_id`, +`executions.status`, `executions.start_time`. diff --git a/.agent/mission.md b/.agent/mission.md new file mode 100644 index 0000000..03fdcf1 --- /dev/null +++ b/.agent/mission.md @@ -0,0 +1,15 @@ +# Project Mission & Strategic Values + +## 1. Core Mission +**Autonomous Engineering Muscle.** Claudomator is the engine that executes complex, multi-repo engineering tasks with human-in-the-loop safety. + +## 2. Strategic Values +- **Durability:** Task state must survive restarts. +- **Isolation:** Execution happens in sandboxes to protect the host environment. +- **Real-time Feedback:** Operators should always know exactly what the agent is doing. +- **Human-in-the-Loop:** Top-level tasks require explicit acceptance. + +## 3. Agent Personality & Role +- **The Proactive Chief of Staff:** Anticipate gaps and propose subtasks. +- **Continuous Clarification:** A "GO" is not a mandate to stop asking questions. +- **Surgical Execution:** Minimal, idiomatic changes. diff --git a/.agent/narrative.md b/.agent/narrative.md new file mode 100644 index 0000000..834d812 --- /dev/null +++ b/.agent/narrative.md @@ -0,0 +1,399 @@ +# Claudomator: Development Narrative + +This document is a chronological engineering history of the Claudomator project, +reconstructed from the git log, ADRs, and source code. + +--- + +## 1. 
Initial commit — core scaffolding (2e2b218) + +The project started with a single commit that established the full skeleton: +task model, executor, API server, CLI, storage layer, and reporter. The Go module +was `github.com/thepeterstone/claudomator`. The initial `Task` struct had a +`ClaudeConfig` field (later renamed to `AgentConfig`) holding the model, +instructions, `working_dir`, budget, permission mode, and tool lists. SQLite was +chosen as the storage backend (see ADR-001). The executor pool used a bounded +goroutine model. The API server was plain `net/http` with no external framework. +The CLI was Cobra. + +## 2. JSON tags, module rename, gitignore (8ee1fb5, 46ba3f5, 2bf317d) + +Early housekeeping: added JSON struct tags to all exported types, renamed the Go +module to its final identifier, and set up the `.gitignore` to exclude the compiled +binary and local Claude settings. + +## 3. Verbose flag, logs CLI command (0377c06, f27d4f7) + +Added `--verbose` to the Claude subprocess invocation and a `logs` CLI subcommand +for tailing execution output. + +## 4. Embedded web UI and HTTP wiring (135d8eb) + +The first web UI was embedded into the binary using `go:embed`. This made the +binary fully self-contained: no separate static file server was needed. + +## 5. CLAUDE.md, clickable fold, subtask support (bdcc33f, 3881f80, 704d007) + +Added the project-level `CLAUDE.md` guidance document. Added a clickable fold to +the web UI to expand hidden completed/failed tasks. Added `parent_task_id` to the +`Task` struct, `ListSubtasks` to storage, and `UpdateTask` — the foundational +subtask plumbing. + +## 6. Dependency waiting and planning preamble (f527972) + +The executor gained dependency waiting: tasks with `depends_on` now block in a +polling loop until all dependencies reach `COMPLETED`. Any dependency entering a +terminal failure state (`FAILED`, `TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`) +immediately fails the waiting task. 
+The planning preamble was also introduced here — a system prompt prefix injected +into every task's instructions that explains to the agent how to write question +files, how to break tasks into subtasks via the `claudomator` CLI, and how to +commit all changes in git sandboxes. + +## 7. Elaborate, logs-stream, templates, subtask-list endpoints (74cc740) + +The API gained several new endpoints: +- `POST /api/elaborate` — calls Claude to expand a brief task description into + structured YAML. +- `GET /api/executions/{id}/stream` — live-streams the execution log. +- `GET /api/templates` / `POST /api/templates` — task template CRUD (later removed). +- `GET /api/tasks/{id}/subtasks` — lists subtasks for a parent task. + +## 8. Web UI: tabs, new task modal, templates panel (e8d1b80) + +The web UI got a tabbed layout (Running / Done / Templates), a modal for creating +new tasks with AI-drafted instructions, and a templates panel. This was the first +version of the UI that matched the current design. + +## 9. READY state for human-in-the-loop review (6511d6e) + +A critical design point: when a top-level task's runner exits successfully, the +task does not immediately go to `COMPLETED`. Instead it transitions to `READY`, +meaning it pauses for the operator to review the agent's output and explicitly +accept or reject it. `READY → COMPLETED` requires `POST /api/tasks/{id}/accept`. +`READY → PENDING` (for re-running) requires `POST /api/tasks/{id}/reject`. + +This is specific to top-level tasks. Subtasks (`parent_task_id != ""`) bypass READY +and go directly to `COMPLETED` — only the root task requires human sign-off. + +## 10. Fix working_dir failures, hardcoded /root removed (3962597) + +Early deployments hardcoded `/root` as the base path for `working_dir`. This was +removed. `working_dir` is now validated to exist before the subprocess starts. + +## 11. 
Scripts, debug-execution, deploy (2bbae74, f7c6de4) + +Added the `scripts/` directory with `debug-execution` (inspects a specific +execution's logs) and `deploy` (builds and deploys the binary to the production +server). Added a CLI `start` command and the `version` package. + +## 12. Rescue from recovery branch — question/answer, rate limiting, start-next-task (cf83444) + +A batch of features rescued from a detached-work branch: +- **Question/answer flow (`BLOCKED` state)**: agents can write a `question.json` + file before exiting. The pool detects this and transitions the task to `BLOCKED`, + storing the question for the user. `POST /api/tasks/{id}/answer` resumes the + Claude session with the user's answer injected as the next message. +- **Rate limiting**: the pool tracks which agents are rate-limited and when. + `isRateLimitError` and `isQuotaExhausted` distinguish transient throttles from + 5-hour quota exhaustion. The per-agent `rateLimited` map stores the deadline. +- **Start-next-task script**: a shell script that picks the highest-priority pending + task and starts it. + +## 13. Accept/Reject for READY tasks, Start Next button in UI (9e790e3) + +The web UI gained explicit Accept/Reject buttons for tasks in the `READY` state +and a "Start Next" button in the header that triggers the `start-next-task` script. + +## 14. Stream-level failure detection when claude exits 0 (4c0ee5c) + +Claude can exit 0 even when the task actually failed — for example when the +permission mode denies a tool_use and Claude exits politely. `parseStream` was +updated to detect `is_error: true` in the result message and +`tool_result.is_error: true` with permission-denial text, returning an error in +both cases so the task goes to `FAILED` rather than silently succeeding. + +## 15. 
Persist log paths at CreateExecution time (f8b5f25) + +Previously, `StdoutPath`, `StderrPath`, and `ArtifactDir` were only written to the +execution record at `UpdateExecution` time (after the subprocess finished). This +prevented live log tailing. Introduced the `LogPather` interface: runners that +implement `ExecLogDir(execID)` allow the pool to pre-populate paths before calling +`CreateExecution`, making them available for streaming before the process ends. + +## 16. bypassPermissions as executor default (a33211d) + +`permission_mode` defaults to `bypassPermissions` when not set in the task YAML. +This was a deliberate trade-off: unattended automation needs to proceed without +tool-use confirmation prompts. Operators can override per-task via `permission_mode`. + +## 17. Cancel endpoint and pool cancel mechanism (3672981) + +`POST /api/tasks/{id}/cancel` was implemented. The pool maintains a `cancels` map +from taskID to context cancel functions. Cancellation sends a SIGKILL to the +entire process group (via `syscall.Kill(-pgid, SIGKILL)`) to reap MCP servers and +bash children that the claude subprocess spawned. + +## 18. BLOCKED state, session resume, fix: persist session_id (7466b17, 40d9ace) + +The full BLOCKED cycle was wired end-to-end: +1. Agent writes `question.json` to `$CLAUDOMATOR_QUESTION_FILE` and exits. +2. Runner detects the file and returns `*BlockedError`. +3. Pool transitions task to `BLOCKED` and stores the question JSON. +4. User answers via `POST /api/tasks/{id}/answer`. +5. Pool calls `SubmitResume` with a new `Execution` carrying `ResumeSessionID` + and `ResumeAnswer`. +6. Runner invokes `claude --resume -p `. + +A bug was found and fixed: `session_id` was not persisted in `UpdateExecution`, +causing the BLOCKED → answer → resume cycle to fail because `GetLatestExecution` +returned no session ID. + +## 19. 
Context.Background for resume execution; CANCELLED→QUEUED restart (7d4890c) + +Resume executions now use `context.Background()` instead of inheriting a potentially +stale context. `CANCELLED → QUEUED` was added as a valid transition so cancelled +tasks can be manually restarted. + +## 20. git sandbox execution, project_dir rename (1f36e23) + +The `working_dir` field was renamed to `project_dir` across all layers (task YAML, +storage, API, UI). When `project_dir` is set, the runner no longer executes +directly in that directory. Instead it: + +1. Detects whether `project_dir` is a git repo (initialising one if not). +2. Clones the repo into `/tmp/claudomator-sandbox-*` (using `--no-hardlinks` + to avoid permission issues with mixed-owner `.git/objects`). +3. Runs the agent in the sandbox clone. +4. After the agent exits, verifies no uncommitted changes remain and pushes + new commits to the canonical bare repo. +5. Removes the sandbox. + +On BLOCKED, the sandbox is preserved so the agent can resume where it left off +in the same working tree. + +Concurrent push conflicts (two sandboxes pushing at the same time) are handled +by a fetch-rebase-retry sequence. + +## 21. Storage: enforce valid state transitions in UpdateTaskState (8777bf2) + +`storage.DB.UpdateTaskState` now calls `task.ValidTransition` before writing. If +the transition is not allowed by the state machine, the function returns an error +and no write occurs. This is the enforcement point for the state machine invariants. + +## 22. Executor internal dispatch queue; remove at-capacity rejection (2cf6d97) + +The previous pool rejected `Submit` when all slots were taken. This was replaced +with an internal `workCh` channel and a `dispatch` goroutine: tasks submitted +while the pool is at capacity are buffered in the channel and picked up as soon +as a slot opens. `Submit` now only returns an error if the channel itself is full +(which requires an enormous backlog). + +## 23. 
API hardening — WebSocket auth, per-IP rate limiter, script registry (363fc9e, 417034b, 181a376) + +Several API reliability improvements: +- WebSocket connections now require an API token (if `SetAPIToken` was called) and + are capped at a configurable maximum number of clients. A ping/pong keepalive + prevents stale connections from accumulating. +- A per-IP rate limiter was added to the `/api/elaborate` and `/api/validate` + endpoints to prevent abuse. +- The scripts endpoints were collapsed into a generic `ScriptRegistry`: instead of + individual handlers per script, a single handler dispatches to registered scripts + by name. + +## 24. API: extend executions and log streaming endpoints (7914153) + +`GET /api/executions` gained filtering and sorting. `GET /api/executions/{id}/logs` +was added for fetching completed log files. Live streaming via SSE and the log +tail endpoint were polished. + +## 25. CLI: newLogger, shared HTTP client, report command (1ce83b6) + +CLI utilities consolidated: a shared logger constructor (`newLogger`), a shared +HTTP client, a default server URL (`http://localhost:8484`). Added the `report` +CLI subcommand for fetching execution summaries from the server. + +## 26. Generic agent architecture — transition from Claude-only (306482d to f2d6822) + +This was a major refactor over several commits: +1. `ClaudeConfig` was renamed to `AgentConfig` with a new `Type` field (`"claude"`, + `"gemini"`, etc.). +2. `Pool` was changed from holding a single `ClaudeRunner` to holding a + `map[string]Runner` — one runner per agent type. +3. `GeminiRunner` was implemented, mirroring `ClaudeRunner` but invoking the + `gemini` CLI. +4. The storage layer, API handlers, elaborate/validate endpoints, and all tests + were updated to use `AgentConfig`. +5. The web UI was updated to expose agent type selection. + +## 27. 
Gemini-based task classification and explicit load balancing (406247b) + +`Classifier` and `pickAgent` were introduced to automate agent and model selection: + +- **`pickAgent(SystemStatus)`** — explicit load balancing: picks the available + (non-rate-limited) agent with the fewest active tasks. Falls back to fewest-active + if all agents are rate-limited. +- **`Classifier`** — calls the Gemini CLI with a meta-prompt asking it to pick + the best model for the task. This is intentionally model-picks-model: use a fast, + cheap classifier to avoid wasting expensive tokens. + +After this commit the flow is: `execute()` → pick agent → call classifier → set +`t.Agent.Type` and `t.Agent.Model` → dispatch to runner. + +## 28. ADR-003: Security Model (93a4c85) + +The security model was documented formally: no auth, permissive CORS, `bypassPermissions` +as default, and the known risk inventory (see `docs/adr/003-security-model.md`). + +## 29. Various web UI improvements (91fd904, 7b53b9e, 560f42b, cdfdc30) + +Running tasks became the default view. A "Running view" showing currently running +tasks alongside the 24h execution history was added. Agent type and model were +surfaced on running task cards. The Done/Interrupted tabs were filtered to 24h. + +## 30. Quota exhaustion detection from stream (076c0fa) + +Previously, quota exhaustion (the 5-hour usage limit) was treated identically to +generic failures. `isQuotaExhausted` was introduced to distinguish it: quota +exhaustion maps to `BUDGET_EXCEEDED` and sets a 5-hour rate-limit deadline on the +agent, rather than failing the task with a generic error. + +## 31. Sandbox fixes — push via bare repo, fetch/rebase (cfbcc7b, f135ab8, 07061ac) + +The sandbox teardown strategy was revised: instead of pushing directly into the +working copy (which fails for non-bare repos), the sandbox pushes to a bare repo +(`remote "local"` or `remote "origin"`) and the working copy is pulled separately +by the developer. 
This avoids permission errors from mixed-owner `.git/objects`. +The `--no-hardlinks` clone flag was added to prevent object sharing. + +## 32. BLOCKED→READY for parent tasks with subtasks (441ed9e, c8e3b46) + +When a top-level task exits the runner successfully but has subtasks, it transitions +to `BLOCKED` (waiting for subtasks to finish) rather than `READY`. A new +`maybeUnblockParent` function is called every time a subtask completes: if all +siblings are `COMPLETED`, the parent transitions `BLOCKED → READY` and is +presented for operator review. + +## 33. Stale RUNNING task recovery on server startup (9159572) + +`Pool.RecoverStaleRunning()` was added and called from `cli.serve`. It queries for +tasks still in `RUNNING` state (left over from a previous server crash) and marks +them `FAILED`, closing their open execution records. This prevents stuck tasks +after server restarts. + +## 34. API: configurable mockRunner, async error-path tests (b33566b) + +The `api` test suite was hardened with a configurable `mockRunner` that can be +injected into the test server. Async error paths (runner returns an error, DB +update fails mid-execution) were now exercised in tests. + +## 35. Storage: missing indexes, ListRecentExecutions tests, DeleteTask atomicity (8b6c97e, 3610409) + +Several storage correctness fixes: +- `idx_tasks_state`, `idx_tasks_parent_task_id`, `idx_executions_status`, + `idx_executions_task_id`, and `idx_executions_start_time` indexes were added. +- `ListRecentExecutions` had an off-by-one that caused it to miss recent executions; + tests were added to catch this. +- `DeleteTask` was made atomic using a recursive CTE to delete the task and all + its subtasks in a single transaction. + +## 36. API: validate ?state= param, standardize operation response shapes (933af81) + +`GET /api/tasks?state=XYZ` now validates the state value. 
All mutating operation +responses (`/run`, `/cancel`, `/accept`, `/reject`, `/answer`) were standardised +to return `{"status": "ok"}` on success. + +## 37. Re-classify on manual restart; handleRunResult extraction (0676f0f, 7d6943c) + +Tasks that are manually restarted (from `FAILED`, `CANCELLED`, etc.) now go through +classification again so they pick up the latest agent/model selection logic. The +post-run error classification block was extracted into `handleRunResult` — a shared +helper called by both `execute` and `executeResume` — eliminating 60+ lines of +duplication. + +## 38. Legacy Claude field removed (b4371d0, a782bbf) + +The last remnants of the original `ClaudeConfig` type and backward-compat `working_dir` +shim were removed. The schema is now fully generic. + +## 39. Kill-goroutine safety documentation, goroutine-leak test (3b4c50e) + +A documented invariant was added to the `execOnce` goroutine that kills the +subprocess process group: it cannot block indefinitely. Tests were added to verify +no goroutine leak occurs when a task is cancelled. + +## 40. Rate-limit avoidance in classifier; model list updates (8ec366d, fc1459b) + +The classifier now skips calling itself if the selected agent is rate-limited, +avoiding a redundant Gemini API call when the rate-limited agent is already known. +The model list was updated to Claude 4.x (`claude-sonnet-4-6`, `claude-opus-4-6`, +`claude-haiku-4-5-20251001`) and current Gemini models (`gemini-2.5-flash-lite`, +`gemini-2.5-flash`, `gemini-2.5-pro`). + +## 41. Map leak fixes — activePerAgent and rateLimited (7c7dd2b) + +Two map leak bugs were fixed in the pool: +- `activePerAgent[agentType]` was decremented but never deleted when the count hit + zero, so inactive agents accumulated as dead entries. +- Expired `rateLimited[agentType]` entries were not deleted, so the map grew + unboundedly over long runs. + +## 42. 
Sandbox teardown: remove working-copy pull, retry push on concurrent rejection (5c85624) + +The sandbox teardown removed the `git pull` into the working copy (which was failing +due to mixed-owner object dirs). The retry-push-on-rejection path was tightened to +detect `"fetch first"` and `"non-fast-forward"` as the rejection signals. + +## 43. Explicit load balancing separated from classification (e033504) + +Previously the `Classifier` both picked the agent and selected the model. This was +split: `pickAgent` is deterministic code that picks the agent from the registered +runners using the load-balancing algorithm. The `Classifier` only picks the model +for the already-selected agent. This makes load balancing reliable and fast even +when the Gemini classifier is unavailable. + +## 44. Session ID fix on second block-and-resume cycle (65c7638) + +A bug was found where the second BLOCKED→answer→resume cycle passed the wrong +`--resume` session ID to Claude. The fix ensures that resume executions propagate +the original session ID rather than the new execution's UUID. + +## 45. validTransitions promoted to package-level var (3226af3) + +`validTransitions` was promoted to a package-level variable in `internal/task/task.go` +for clarity and potential reuse outside the package. ADR-002 was updated to reflect +the current state machine including the `BLOCKED→READY` transition for parent tasks. 
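
The idea of a package-level transition table can be sketched as follows. This is a minimal illustration, not the contents of `internal/task/task.go`: the `State` type, the constant names, and the exact set of edges are assumptions pieced together from transitions mentioned in this narrative, and the table is deliberately a subset (for example, the resume path out of `BLOCKED` is left out because the narrative does not name its target state).

```go
package main

import "fmt"

// State is a task lifecycle state. The string values mirror the state names
// used in this narrative; the type and constant names are hypothetical.
type State string

const (
	StatePending   State = "PENDING"
	StateQueued    State = "QUEUED"
	StateRunning   State = "RUNNING"
	StateReady     State = "READY"
	StateBlocked   State = "BLOCKED"
	StateCompleted State = "COMPLETED"
	StateFailed    State = "FAILED"
	StateCancelled State = "CANCELLED"
)

// validTransitions maps a from-state to the set of states it may move to.
// Declaring it at package level lets other functions in the package (and
// tests) read the same table. Illustrative subset only.
var validTransitions = map[State]map[State]bool{
	StatePending: {StateQueued: true},
	StateQueued:  {StateRunning: true},
	StateRunning: {
		StateReady:     true, // top-level task finished, awaiting review
		StateBlocked:   true, // agent asked a question or has subtasks
		StateCompleted: true, // subtasks skip READY
		StateFailed:    true,
		StateCancelled: true,
	},
	StateReady:     {StateCompleted: true, StatePending: true}, // accept / reject
	StateBlocked:   {StateReady: true},                         // parent unblocked by subtasks
	StateFailed:    {StateQueued: true},                        // retry
	StateCancelled: {StateQueued: true},                        // manual restart
}

// ValidTransition reports whether moving from one state to another is
// allowed. Unknown from-states yield false because indexing a nil inner
// map returns the zero value.
func ValidTransition(from, to State) bool {
	return validTransitions[from][to]
}

func main() {
	fmt.Println(ValidTransition(StateBlocked, StateReady))     // true
	fmt.Println(ValidTransition(StateCompleted, StateRunning)) // false
}
```

A storage-layer guard in the spirit of section 21 would call a function like `ValidTransition` before writing and return an error when it reports `false`.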
+ +--- + +## Feature Summary (current state) + +| Feature | Status | +|---|---| +| Task YAML parsing, batch files | Done | +| SQLite persistence | Done | +| REST API (CRUD + lifecycle) | Done | +| WebSocket real-time events | Done | +| Claude subprocess execution | Done | +| Gemini subprocess execution | Done | +| Explicit load balancing (pickAgent) | Done | +| Gemini-based model classification | Done | +| BLOCKED / question-answer / resume | Done | +| git sandbox isolation | Done | +| Subtask creation and unblocking | Done | +| READY state / human accept-reject | Done | +| Rate-limit and quota tracking | Done | +| Stale RUNNING recovery on startup | Done | +| Per-IP rate limiter on elaborate | Done | +| Web UI (PWA) | Done | +| Push notifications (PWA) | Planned | + +--- 2026-03-16T00:56:20Z --- +Converter sudoku to rust + +--- 2026-03-16T01:14:27Z --- +For claudomator tasks that are ready, check the deployed server version against their fix commit + +--- 2026-03-16T01:17:00Z --- +For every claudomator task that is ready, display on the task whether the currently deployed server includes the commit which fixes that task diff --git a/.agent/preferences.md b/.agent/preferences.md new file mode 100644 index 0000000..20ddfd6 --- /dev/null +++ b/.agent/preferences.md @@ -0,0 +1,8 @@ +# User Preferences & Workflow Quirks + +This file is a living record of user-specific preferences. Agents must update it as new facts are revealed. + +## 1. Interaction & Workflow +- **Safety First:** Cautious and deliberate action is preferred. +- **Checkpoint Model:** Research -> Strategy -> GO. +- **Clarification:** Ask for help when hitting ambiguities during execution. 
diff --git a/.agent/worklog.md b/.agent/worklog.md new file mode 100644 index 0000000..6fb8033 --- /dev/null +++ b/.agent/worklog.md @@ -0,0 +1,72 @@ +# SESSION_STATE.md + +## Current Task Goal +ADR-007 implementation: Epic→Story→Task→Subtask hierarchy, project registry, Doot integration + +## Status: IN_PROGRESS + +--- + +## Completed Items + +| Step | Description | Test / Verification | +|------|-------------|---------------------| +| Phase 1 | Doot dead code removal: Bug struct, BugToAtom, bug store methods, bug handlers, bug routes, bugs.html template, TypeNote, AddMealToPlanner stub | `go test ./...` in /workspace/doot — all pass (2 pre-existing failures unrelated) | +| Phase 2 | Claudomator project registry: `task.Project` type, storage CRUD + UpsertProject, seed.go, API endpoints (GET/POST /api/projects, GET/PUT /api/projects/{id}), legacy AgentConfig.ProjectDir/RepositoryURL/SkipPlanning fields removed, container.go fallback removed, fallbackGitInit removed, processResult changestats extraction removed (pool-side only) | `TestCreateProject`, `TestListProjects`, `TestUpdateProject`, `TestProjects_CRUD` — all pass | + +--- + +## Next Steps (Claudomator tasks created) + +Phases 3–6 are queued as Claudomator tasks. See `ct task list` or the web UI. 
+ +| Task ID | Phase | Status | Depends On | +|---------|-------|--------|------------| +| f8829d6f-b8b6-4ff2-9c1a-e55dd3ab300e | Phase 3: Stories data model | PENDING | — | +| c8a0dc6c-0605-4acb-a789-1155ad8824cb | Phase 4: Story execution and deploy | PENDING | Phase 3 | +| faf5a371-8f1c-46a3-bb74-b0df1f062dee | Phase 5: Story elaboration | PENDING | Phase 3 | +| f39af70f-72c5-4ac1-9522-83c2e11b37c9 | Phase 6: Doot — Claudomator integration | PENDING | Phase 3 | + +Instruction files: `scripts/.claude/phase{3,4,5,6}-*-instructions.txt` + +### Phase 3: Stories data model (claudomator repo) +- `internal/task/story.go` — Story struct + ValidStoryTransition +- `internal/storage/db.go` — stories table + story_id on tasks, CRUD + ListTasksByStory +- `internal/api/stories.go` — story API endpoints +- Tests: ValidStoryTransition, CRUD, depends_on auto-wire + +### Phase 4: Story execution and deploy (claudomator repo, depends Phase 3) +- `internal/executor/executor.go` — checkStoryCompletion → SHIPPABLE +- `internal/executor/container.go` — checkout story branch after clone +- `internal/api/stories.go` — POST /api/stories/{id}/branch + +### Phase 5: Story elaboration (claudomator repo, depends Phase 3) +- `internal/api/elaborate.go` — POST /api/stories/elaborate + approve +- SeedProjects called at server startup + +### Phase 6: Doot — Claudomator integration (doot repo, depends Phase 3) +- `internal/api/claudomator.go` — ClaudomatorClient +- `internal/models/atom.go` — StoryToAtom, SourceClaudomator +- `internal/handlers/atoms.go` — BuildUnifiedAtomList extended +- `cmd/dashboard/main.go` — wire ClaudomatorURL config + +--- + +## Key Files Changed (Phases 1-2) + +### Claudomator +- `internal/task/project.go` — new Project struct +- `internal/task/task.go` — removed Agent.ProjectDir, Agent.RepositoryURL, Agent.SkipPlanning +- `internal/storage/db.go` — projects table migration + CRUD +- `internal/storage/seed.go` — SeedProjects upserts claudomator + nav on startup +- 
`internal/api/projects.go` — project CRUD handlers +- `internal/api/server.go` — project routes; processResult no longer extracts changestats +- `internal/api/deployment.go` + `task_view.go` — use tk.RepositoryURL (was tk.Agent.ProjectDir) +- `internal/executor/container.go` — fallback logic removed; requires t.RepositoryURL + +### Doot +- Bug feature removed entirely (models, handlers, store, routes, template, migration) +- `migrations/018_drop_bugs.sql` — DROP TABLE IF EXISTS bugs +- `internal/api/interfaces.go` — AddMealToPlanner removed from PlanToEatAPI +- `internal/api/plantoeat.go` — AddMealToPlanner stub removed +- `internal/models/atom.go` — SourceBug, TypeBug, TypeNote, BugToAtom removed diff --git a/.gemini/GEMINI.md b/.gemini/GEMINI.md new file mode 100644 index 0000000..6f30096 --- /dev/null +++ b/.gemini/GEMINI.md @@ -0,0 +1,7 @@ +# Claudomator — Gemini CLI Instructions + +This repository uses a centralized agent configuration. + +**Primary Source of Truth:** ".agent/config.md" + +Refer to ".agent/config.md" before performing any tasks. These project-specific instructions take absolute precedence over general defaults. diff --git a/CLAUDE.md b/CLAUDE.md index 7ef8d63..5b01d7b 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,227 +1,12 @@ -# CLAUDE.md +# Claudomator — Agent Instructions -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. +This repository uses a centralized agent configuration. -Also check `~/.claude/CLAUDE.md` for user-level development standards (TDD workflow, git practices, session state management, etc.) that apply globally across all projects. +**Primary Source of Truth:** ".agent/config.md" -## Canonical Repository +## Quick Reference +- **Worklog:** ".agent/worklog.md" +- **Design:** ".agent/design.md" +- **Coding Standards:** ".agent/coding_standards.md" -**The canonical source of truth is `/workspace/claudomator`.** All development must happen here. 
-Do not work in any other directory unless explicitly instructed. Do not explore `/site/doot.terst.org/` for source files. - -## Build & Test Commands - -```bash -# Build -go build ./... - -# Run all tests -go test ./... - -# Run a single package's tests -go test ./internal/executor/... - -# Run a single test by name -go test ./internal/api/ -run TestServer_CreateTask_MissingName - -# Run with race detector (important for executor/pool tests) -go test -race ./... - -# Build the binary -go build -o claudomator ./cmd/claudomator/ -``` - -> **Note:** `go-sqlite3` uses CGo. A C compiler (`gcc`) must be present for builds and tests. - -## Running the Server - -```bash -# Initialize data directory -./claudomator init - -# Start API server (default :8484) -./claudomator serve - -# Run a task file directly (bypasses server) -./claudomator run ./test/fixtures/tasks/simple-task.yaml - -# List tasks via CLI -./claudomator list -``` - -Config defaults to `~/.claudomator/config.toml`. Data is stored in `~/.claudomator/` (SQLite DB + execution logs). - -## Architecture - -**Pipeline:** CLI/API → `executor.Pool` → `executor.ContainerRunner` → Docker container → SQLite + log files - -### Packages - -| Package | Role | -|---|---| -| `internal/task` | `Task` struct, YAML parsing, state machine, validation | -| `internal/executor` | `Pool` (bounded goroutine pool) + `ContainerRunner` (Docker-based executor) | -| `internal/storage` | SQLite wrapper; stores tasks and execution records | -| `internal/api` | HTTP server (REST + WebSocket via `internal/api.Hub`) | -| `internal/reporter` | Formats and emits execution results | -| `internal/config` | TOML config + data dir layout | -| `internal/cli` | Cobra CLI commands (`run`, `serve`, `list`, `status`, `init`) | - -### Key Data Flows - -**Task execution:** -1. Task created via `POST /api/tasks` or YAML file (`task.ParseFile`) -2. `POST /api/tasks/{id}/run` → `executor.Pool.Submit()` → goroutine in pool -3. 
`ContainerRunner.Run()` clones `repository_url`, runs `docker run claudomator-agent:latest` -4. Agent runs `claude -p` inside the container; stdout streamed to `executions//stdout.log` -5. On success, runner pushes commits back to the remote; execution result written to SQLite + WebSocket broadcast - -**State machine** (`task.ValidTransition`): -`PENDING` → `QUEUED` → `RUNNING` → `COMPLETED | FAILED | TIMED_OUT | CANCELLED | BUDGET_EXCEEDED` -Failed tasks can retry: `FAILED` → `QUEUED` - -**WebSocket:** `Hub` fans out task completion events to all connected clients. `Server.StartHub()` must be called after creating the server. - -### Task YAML Format - -```yaml -name: "My Task" -claude: - model: "sonnet" - instructions: | - Do something useful. - working_dir: "/path/to/project" - max_budget_usd: 1.00 - permission_mode: "default" - allowed_tools: ["Bash", "Read"] -timeout: "15m" -priority: "normal" # high | normal | low -tags: ["ci"] -``` - -Batch files wrap multiple tasks under a `tasks:` key. - -### Storage Schema - -Two tables: `tasks` (with `config_json`, `retry_json`, `tags_json`, `depends_on_json` as JSON blobs) and `executions` (with paths to log files). Schema is auto-migrated on `storage.Open()`. - -## Features - -### Changestats - -After each task execution, Claudomator extracts git diff statistics from the execution's stdout log. If the log contains a git `diff --stat` summary line (e.g. `5 files changed, 127 insertions(+), 43 deletions(-)`), the stats are parsed and stored in the `executions.changestats_json` column via `storage.DB.UpdateExecutionChangestats`. - -**Extraction points:** -- `internal/executor.Pool.handleRunResult` — calls `task.ParseChangestatFromFile(exec.StdoutPath)` after every execution; stores via `Store.UpdateExecutionChangestats`. -- `internal/api.Server.processResult` — also extracts changestats when the API server processes a result (same file, idempotent second write). 
- -**Parser location:** `internal/task/changestats.go` — exported functions `ParseChangestatFromOutput` and `ParseChangestatFromFile` usable by any package without creating circular imports. - -**Frontend display:** `web/app.js` renders a `.changestats-badge` on COMPLETED/READY task cards and in execution history rows. - -## GitHub Webhook Integration - -Claudomator can automatically create tasks when CI builds fail on GitHub. - -### Endpoint - -`POST /api/webhooks/github` - -Accepts `check_run` and `workflow_run` events from GitHub. Returns `{"task_id": "..."}` (200) when a task is created, or 204 when the event is ignored. - -### Config (`~/.claudomator/config.toml`) - -```toml -# Optional: HMAC-SHA256 secret set in the GitHub webhook settings. -# If omitted, signature validation is skipped. -webhook_secret = "your-github-webhook-secret" - -# Projects for matching incoming webhook repository names to local directories. -[[projects]] -name = "myrepo" -dir = "/workspace/myrepo" - -[[projects]] -name = "other-service" -dir = "/workspace/other-service" -``` - -### Matching logic - -The handler matches the webhook's `repository.name` against each project's `name` and the basename of its `dir` (case-insensitive substring). If no match is found and only one project is configured, that project is used as a fallback. - -### GitHub webhook setup - -In your GitHub repository → Settings → Webhooks → Add webhook: -- **Payload URL:** `https:///api/webhooks/github` -- **Content type:** `application/json` -- **Secret:** value of `webhook_secret` in config (or leave blank if not configured) -- **Events:** select *Workflow runs* and *Check runs* - -### Task creation - -A task is created for: -- `check_run` events with `action: completed` and `conclusion: failure` -- `workflow_run` events with `action: completed` and `conclusion: failure` or `timed_out` - -Tasks are tagged `["ci", "auto"]`, capped at $3 USD, and use tools: Read, Edit, Bash, Glob, Grep. 
- -## System Maintenance (Cron) - -The following crontab entries are required for system operation and must be maintained for the root user: - -```crontab -# Sync Claude and Gemini credentials every 10 minutes -*/10 * * * * /workspace/claudomator/scripts/sync-credentials - -# Start the next queued task every 20 minutes -*/20 * * * * /workspace/claudomator/scripts/start-next-task >> /var/log/claudomator-cron.log 2>&1 -``` - -> **Note:** These requirements are critical for agent authentication and automated task progression. - -## Agent Tooling (`ct` CLI) - -Agents running inside containers have access to `ct`, a pre-built CLI for interacting with the Claudomator API. It is installed at `/usr/local/bin/ct` in the container image. **Use `ct` to create and manage subtasks — do not attempt raw `curl` API calls.** - -### Environment (injected automatically) - -| Variable | Purpose | -|---|---| -| `CLAUDOMATOR_API_URL` | Base URL of the Claudomator API (e.g. `http://host.docker.internal:8484`) | -| `CLAUDOMATOR_TASK_ID` | ID of the currently-running task; used as the default `parent_task_id` for new subtasks | - -### Commands - -```bash -# Create a subtask and immediately queue it (returns task ID) -ct task submit --name "Fix tests" --instructions "Run tests and fix any failures." [--model sonnet] [--budget 3.0] - -# Create, queue, and wait for completion (exits 0=COMPLETED, 1=FAILED, 2=BLOCKED) -ct task submit --name "Fix tests" --instructions "..." --wait - -# Read instructions from a file instead of inline -ct task submit --name "Fix tests" --file /workspace/subtask-instructions.txt --wait - -# Lower-level: create only (returns task ID), then run separately -TASK_ID=$(ct task create --name "..." --instructions "...") -ct task run "$TASK_ID" -ct task wait "$TASK_ID" --timeout 600 - -# Check status of any task -ct task status - -# List recent tasks -ct task list -``` - -### Notes - -- Default model is `sonnet`; default budget is `$3.00 USD`. 
Override with `--model` / `--budget`. -- `ct task wait` polls every 5 seconds and exits with the task's terminal state on stdout. -- Subtasks inherit the current task as their parent automatically (via `$CLAUDOMATOR_TASK_ID`). -- Override parent with `--parent ` if needed. - -## ADRs - -See `docs/adr/001-language-and-architecture.md` for the Go + SQLite + WebSocket rationale. +Refer to ".agent/config.md" before performing any tasks. diff --git a/SESSION_STATE.md b/SESSION_STATE.md deleted file mode 100644 index 6fb8033..0000000 --- a/SESSION_STATE.md +++ /dev/null @@ -1,72 +0,0 @@ -# SESSION_STATE.md - -## Current Task Goal -ADR-007 implementation: Epic→Story→Task→Subtask hierarchy, project registry, Doot integration - -## Status: IN_PROGRESS - ---- - -## Completed Items - -| Step | Description | Test / Verification | -|------|-------------|---------------------| -| Phase 1 | Doot dead code removal: Bug struct, BugToAtom, bug store methods, bug handlers, bug routes, bugs.html template, TypeNote, AddMealToPlanner stub | `go test ./...` in /workspace/doot — all pass (2 pre-existing failures unrelated) | -| Phase 2 | Claudomator project registry: `task.Project` type, storage CRUD + UpsertProject, seed.go, API endpoints (GET/POST /api/projects, GET/PUT /api/projects/{id}), legacy AgentConfig.ProjectDir/RepositoryURL/SkipPlanning fields removed, container.go fallback removed, fallbackGitInit removed, processResult changestats extraction removed (pool-side only) | `TestCreateProject`, `TestListProjects`, `TestUpdateProject`, `TestProjects_CRUD` — all pass | - ---- - -## Next Steps (Claudomator tasks created) - -Phases 3–6 are queued as Claudomator tasks. See `ct task list` or the web UI. 
- -| Task ID | Phase | Status | Depends On | -|---------|-------|--------|------------| -| f8829d6f-b8b6-4ff2-9c1a-e55dd3ab300e | Phase 3: Stories data model | PENDING | — | -| c8a0dc6c-0605-4acb-a789-1155ad8824cb | Phase 4: Story execution and deploy | PENDING | Phase 3 | -| faf5a371-8f1c-46a3-bb74-b0df1f062dee | Phase 5: Story elaboration | PENDING | Phase 3 | -| f39af70f-72c5-4ac1-9522-83c2e11b37c9 | Phase 6: Doot — Claudomator integration | PENDING | Phase 3 | - -Instruction files: `scripts/.claude/phase{3,4,5,6}-*-instructions.txt` - -### Phase 3: Stories data model (claudomator repo) -- `internal/task/story.go` — Story struct + ValidStoryTransition -- `internal/storage/db.go` — stories table + story_id on tasks, CRUD + ListTasksByStory -- `internal/api/stories.go` — story API endpoints -- Tests: ValidStoryTransition, CRUD, depends_on auto-wire - -### Phase 4: Story execution and deploy (claudomator repo, depends Phase 3) -- `internal/executor/executor.go` — checkStoryCompletion → SHIPPABLE -- `internal/executor/container.go` — checkout story branch after clone -- `internal/api/stories.go` — POST /api/stories/{id}/branch - -### Phase 5: Story elaboration (claudomator repo, depends Phase 3) -- `internal/api/elaborate.go` — POST /api/stories/elaborate + approve -- SeedProjects called at server startup - -### Phase 6: Doot — Claudomator integration (doot repo, depends Phase 3) -- `internal/api/claudomator.go` — ClaudomatorClient -- `internal/models/atom.go` — StoryToAtom, SourceClaudomator -- `internal/handlers/atoms.go` — BuildUnifiedAtomList extended -- `cmd/dashboard/main.go` — wire ClaudomatorURL config - ---- - -## Key Files Changed (Phases 1-2) - -### Claudomator -- `internal/task/project.go` — new Project struct -- `internal/task/task.go` — removed Agent.ProjectDir, Agent.RepositoryURL, Agent.SkipPlanning -- `internal/storage/db.go` — projects table migration + CRUD -- `internal/storage/seed.go` — SeedProjects upserts claudomator + nav on startup -- 
`internal/api/projects.go` — project CRUD handlers -- `internal/api/server.go` — project routes; processResult no longer extracts changestats -- `internal/api/deployment.go` + `task_view.go` — use tk.RepositoryURL (was tk.Agent.ProjectDir) -- `internal/executor/container.go` — fallback logic removed; requires t.RepositoryURL - -### Doot -- Bug feature removed entirely (models, handlers, store, routes, template, migration) -- `migrations/018_drop_bugs.sql` — DROP TABLE IF EXISTS bugs -- `internal/api/interfaces.go` — AddMealToPlanner removed from PlanToEatAPI -- `internal/api/plantoeat.go` — AddMealToPlanner stub removed -- `internal/models/atom.go` — SourceBug, TypeBug, TypeNote, BugToAtom removed diff --git a/docs/RAW_NARRATIVE.md b/docs/RAW_NARRATIVE.md deleted file mode 100644 index 834d812..0000000 --- a/docs/RAW_NARRATIVE.md +++ /dev/null @@ -1,399 +0,0 @@ -# Claudomator: Development Narrative - -This document is a chronological engineering history of the Claudomator project, -reconstructed from the git log, ADRs, and source code. - ---- - -## 1. Initial commit — core scaffolding (2e2b218) - -The project started with a single commit that established the full skeleton: -task model, executor, API server, CLI, storage layer, and reporter. The Go module -was `github.com/thepeterstone/claudomator`. The initial `Task` struct had a -`ClaudeConfig` field (later renamed to `AgentConfig`) holding the model, -instructions, `working_dir`, budget, permission mode, and tool lists. SQLite was -chosen as the storage backend (see ADR-001). The executor pool used a bounded -goroutine model. The API server was plain `net/http` with no external framework. -The CLI was Cobra. - -## 2. JSON tags, module rename, gitignore (8ee1fb5, 46ba3f5, 2bf317d) - -Early housekeeping: added JSON struct tags to all exported types, renamed the Go -module to its final identifier, and set up the `.gitignore` to exclude the compiled -binary and local Claude settings. - -## 3. 
Verbose flag, logs CLI command (0377c06, f27d4f7) - -Added `--verbose` to the Claude subprocess invocation and a `logs` CLI subcommand -for tailing execution output. - -## 4. Embedded web UI and HTTP wiring (135d8eb) - -The first web UI was embedded into the binary using `go:embed`. This made the -binary fully self-contained: no separate static file server was needed. - -## 5. CLAUDE.md, clickable fold, subtask support (bdcc33f, 3881f80, 704d007) - -Added the project-level `CLAUDE.md` guidance document. Added a clickable fold to -the web UI to expand hidden completed/failed tasks. Added `parent_task_id` to the -`Task` struct, `ListSubtasks` to storage, and `UpdateTask` — the foundational -subtask plumbing. - -## 6. Dependency waiting and planning preamble (f527972) - -The executor gained dependency waiting: tasks with `depends_on` now block in a -polling loop until all dependencies reach `COMPLETED`. Any dependency entering a -terminal failure state (`FAILED`, `TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`) -immediately fails the waiting task. - -The planning preamble was also introduced here — a system prompt prefix injected -into every task's instructions that explains to the agent how to write question -files, how to break tasks into subtasks via the `claudomator` CLI, and how to -commit all changes in git sandboxes. - -## 7. Elaborate, logs-stream, templates, subtask-list endpoints (74cc740) - -The API gained several new endpoints: -- `POST /api/elaborate` — calls Claude to expand a brief task description into - structured YAML. -- `GET /api/executions/{id}/stream` — live-streams the execution log. -- `GET /api/templates` / `POST /api/templates` — task template CRUD (later removed). -- `GET /api/tasks/{id}/subtasks` — lists subtasks for a parent task. - -## 8. Web UI: tabs, new task modal, templates panel (e8d1b80) - -The web UI got a tabbed layout (Running / Done / Templates), a modal for creating -new tasks with AI-drafted instructions, and a templates panel. 
This was the first -version of the UI that matched the current design. - -## 9. READY state for human-in-the-loop review (6511d6e) - -A critical design point: when a top-level task's runner exits successfully, the -task does not immediately go to `COMPLETED`. Instead it transitions to `READY`, -meaning it paused for the operator to review the agent's output and explicitly -accept or reject it. `READY → COMPLETED` requires `POST /api/tasks/{id}/accept`. -`READY → PENDING` (for re-running) requires `POST /api/tasks/{id}/reject`. - -This is specific to top-level tasks. Subtasks (`parent_task_id != ""`) bypass READY -and go directly to `COMPLETED` — only the root task requires human sign-off. - -## 10. Fix working_dir failures, hardcoded /root removed (3962597) - -Early deployments hardcoded `/root` as the base path for `working_dir`. This was -removed. `working_dir` is now validated to exist before the subprocess starts. - -## 11. Scripts, debug-execution, deploy (2bbae74, f7c6de4) - -Added the `scripts/` directory with `debug-execution` (inspects a specific -execution's logs) and `deploy` (builds and deploys the binary to the production -server). Added a CLI `start` command and the `version` package. - -## 12. Rescue from recovery branch — question/answer, rate limiting, start-next-task (cf83444) - -A batch of features rescued from a detached-work branch: -- **Question/answer flow (`BLOCKED` state)**: agents can write a `question.json` - file before exiting. The pool detects this and transitions the task to `BLOCKED`, - storing the question for the user. `POST /api/tasks/{id}/answer` resumes the - Claude session with the user's answer injected as the next message. -- **Rate limiting**: the pool tracks which agents are rate-limited and when. - `isRateLimitError` and `isQuotaExhausted` distinguish transient throttles from - 5-hour quota exhaustion. The per-agent `rateLimited` map stores the deadline. 
-- **Start-next-task script**: a shell script that picks the highest-priority pending - task and starts it. - -## 13. Accept/Reject for READY tasks, Start Next button in UI (9e790e3) - -The web UI gained explicit Accept/Reject buttons for tasks in the `READY` state -and a "Start Next" button in the header that triggers the `start-next-task` script. - -## 14. Stream-level failure detection when claude exits 0 (4c0ee5c) - -Claude can exit 0 even when the task actually failed — for example when the -permission mode denies a tool_use and Claude exits politely. `parseStream` was -updated to detect `is_error: true` in the result message and -`tool_result.is_error: true` with permission-denial text, returning an error in -both cases so the task goes to `FAILED` rather than silently succeeding. - -## 15. Persist log paths at CreateExecution time (f8b5f25) - -Previously, `StdoutPath`, `StderrPath`, and `ArtifactDir` were only written to the -execution record at `UpdateExecution` time (after the subprocess finished). This -prevented live log tailing. Introduced the `LogPather` interface: runners that -implement `ExecLogDir(execID)` allow the pool to pre-populate paths before calling -`CreateExecution`, making them available for streaming before the process ends. - -## 16. bypassPermissions as executor default (a33211d) - -`permission_mode` defaults to `bypassPermissions` when not set in the task YAML. -This was a deliberate trade-off: unattended automation needs to proceed without -tool-use confirmation prompts. Operators can override per-task via `permission_mode`. - -## 17. Cancel endpoint and pool cancel mechanism (3672981) - -`POST /api/tasks/{id}/cancel` was implemented. The pool maintains a `cancels` map -from taskID to context cancel functions. Cancellation sends a SIGKILL to the -entire process group (via `syscall.Kill(-pgid, SIGKILL)`) to reap MCP servers and -bash children that the claude subprocess spawned. - -## 18. 
BLOCKED state, session resume, fix: persist session_id (7466b17, 40d9ace) - -The full BLOCKED cycle was wired end-to-end: -1. Agent writes `question.json` to `$CLAUDOMATOR_QUESTION_FILE` and exits. -2. Runner detects the file and returns `*BlockedError`. -3. Pool transitions task to `BLOCKED` and stores the question JSON. -4. User answers via `POST /api/tasks/{id}/answer`. -5. Pool calls `SubmitResume` with a new `Execution` carrying `ResumeSessionID` - and `ResumeAnswer`. -6. Runner invokes `claude --resume -p `. - -A bug was found and fixed: `session_id` was not persisted in `UpdateExecution`, -causing the BLOCKED → answer → resume cycle to fail because `GetLatestExecution` -returned no session ID. - -## 19. Context.Background for resume execution; CANCELLED→QUEUED restart (7d4890c) - -Resume executions now use `context.Background()` instead of inheriting a potentially -stale context. `CANCELLED → QUEUED` was added as a valid transition so cancelled -tasks can be manually restarted. - -## 20. git sandbox execution, project_dir rename (1f36e23) - -The `working_dir` field was renamed to `project_dir` across all layers (task YAML, -storage, API, UI). When `project_dir` is set, the runner no longer executes -directly in that directory. Instead it: - -1. Detects whether `project_dir` is a git repo (initialising one if not). -2. Clones the repo into `/tmp/claudomator-sandbox-*` (using `--no-hardlinks` - to avoid permission issues with mixed-owner `.git/objects`). -3. Runs the agent in the sandbox clone. -4. After the agent exits, verifies no uncommitted changes remain and pushes - new commits to the canonical bare repo. -5. Removes the sandbox. - -On BLOCKED, the sandbox is preserved so the agent can resume where it left off -in the same working tree. - -Concurrent push conflicts (two sandboxes pushing at the same time) are handled -by a fetch-rebase-retry sequence. - -## 21. 
Storage: enforce valid state transitions in UpdateTaskState (8777bf2) - -`storage.DB.UpdateTaskState` now calls `task.ValidTransition` before writing. If -the transition is not allowed by the state machine, the function returns an error -and no write occurs. This is the enforcement point for the state machine invariants. - -## 22. Executor internal dispatch queue; remove at-capacity rejection (2cf6d97) - -The previous pool rejected `Submit` when all slots were taken. This was replaced -with an internal `workCh` channel and a `dispatch` goroutine: tasks submitted -while the pool is at capacity are buffered in the channel and picked up as soon -as a slot opens. `Submit` now only returns an error if the channel itself is full -(which requires an enormous backlog). - -## 23. API hardening — WebSocket auth, per-IP rate limiter, script registry (363fc9e, 417034b, 181a376) - -Several API reliability improvements: -- WebSocket connections now require an API token (if `SetAPIToken` was called) and - are capped at a configurable maximum number of clients. A ping/pong keepalive - prevents stale connections from accumulating. -- A per-IP rate limiter was added to the `/api/elaborate` and `/api/validate` - endpoints to prevent abuse. -- The scripts endpoints were collapsed into a generic `ScriptRegistry`: instead of - individual handlers per script, a single handler dispatches to registered scripts - by name. - -## 24. API: extend executions and log streaming endpoints (7914153) - -`GET /api/executions` gained filtering and sorting. `GET /api/executions/{id}/logs` -was added for fetching completed log files. Live streaming via SSE and the log -tail endpoint were polished. - -## 25. CLI: newLogger, shared HTTP client, report command (1ce83b6) - -CLI utilities consolidated: a shared logger constructor (`newLogger`), a shared -HTTP client, a default server URL (`http://localhost:8484`). Added the `report` -CLI subcommand for fetching execution summaries from the server. - -## 26. 
Generic agent architecture — transition from Claude-only (306482d to f2d6822) - -This was a major refactor over several commits: -1. `ClaudeConfig` was renamed to `AgentConfig` with a new `Type` field (`"claude"`, - `"gemini"`, etc.). -2. `Pool` was changed from holding a single `ClaudeRunner` to holding a - `map[string]Runner` — one runner per agent type. -3. `GeminiRunner` was implemented, mirroring `ClaudeRunner` but invoking the - `gemini` CLI. -4. The storage layer, API handlers, elaborate/validate endpoints, and all tests - were updated to use `AgentConfig`. -5. The web UI was updated to expose agent type selection. - -## 27. Gemini-based task classification and explicit load balancing (406247b) - -`Classifier` and `pickAgent` were introduced to automate agent and model selection: - -- **`pickAgent(SystemStatus)`** — explicit load balancing: picks the available - (non-rate-limited) agent with the fewest active tasks. Falls back to fewest-active - if all agents are rate-limited. -- **`Classifier`** — calls the Gemini CLI with a meta-prompt asking it to pick - the best model for the task. This is intentionally model-picks-model: use a fast, - cheap classifier to avoid wasting expensive tokens. - -After this commit the flow is: `execute()` → pick agent → call classifier → set -`t.Agent.Type` and `t.Agent.Model` → dispatch to runner. - -## 28. ADR-003: Security Model (93a4c85) - -The security model was documented formally: no auth, permissive CORS, `bypassPermissions` -as default, and the known risk inventory (see `docs/adr/003-security-model.md`). - -## 29. Various web UI improvements (91fd904, 7b53b9e, 560f42b, cdfdc30) - -Running tasks became the default view. A "Running view" showing currently running -tasks alongside the 24h execution history was added. Agent type and model were -surfaced on running task cards. The Done/Interrupted tabs were filtered to 24h. - -## 30. 
Quota exhaustion detection from stream (076c0fa) - -Previously, quota exhaustion (the 5-hour usage limit) was treated identically to -generic failures. `isQuotaExhausted` was introduced to distinguish it: quota -exhaustion maps to `BUDGET_EXCEEDED` and sets a 5-hour rate-limit deadline on the -agent, rather than failing the task with a generic error. - -## 31. Sandbox fixes — push via bare repo, fetch/rebase (cfbcc7b, f135ab8, 07061ac) - -The sandbox teardown strategy was revised: instead of pushing directly into the -working copy (which fails for non-bare repos), the sandbox pushes to a bare repo -(`remote "local"` or `remote "origin"`) and the working copy is pulled separately -by the developer. This avoids permission errors from mixed-owner `.git/objects`. -The `--no-hardlinks` clone flag was added to prevent object sharing. - -## 32. BLOCKED→READY for parent tasks with subtasks (441ed9e, c8e3b46) - -When a top-level task exits the runner successfully but has subtasks, it transitions -to `BLOCKED` (waiting for subtasks to finish) rather than `READY`. A new -`maybeUnblockParent` function is called every time a subtask completes: if all -siblings are `COMPLETED`, the parent transitions `BLOCKED → READY` and is -presented for operator review. - -## 33. Stale RUNNING task recovery on server startup (9159572) - -`Pool.RecoverStaleRunning()` was added and called from `cli.serve`. It queries for -tasks still in `RUNNING` state (left over from a previous server crash) and marks -them `FAILED`, closing their open execution records. This prevents stuck tasks -after server restarts. - -## 34. API: configurable mockRunner, async error-path tests (b33566b) - -The `api` test suite was hardened with a configurable `mockRunner` that can be -injected into the test server. Async error paths (runner returns an error, DB -update fails mid-execution) were now exercised in tests. - -## 35. 
Storage: missing indexes, ListRecentExecutions tests, DeleteTask atomicity (8b6c97e, 3610409) - -Several storage correctness fixes: -- `idx_tasks_state`, `idx_tasks_parent_task_id`, `idx_executions_status`, - `idx_executions_task_id`, and `idx_executions_start_time` indexes were added. -- `ListRecentExecutions` had an off-by-one that caused it to miss recent executions; - tests were added to catch this. -- `DeleteTask` was made atomic using a recursive CTE to delete the task and all - its subtasks in a single transaction. - -## 36. API: validate ?state= param, standardize operation response shapes (933af81) - -`GET /api/tasks?state=XYZ` now validates the state value. All mutating operation -responses (`/run`, `/cancel`, `/accept`, `/reject`, `/answer`) were standardised -to return `{"status": "ok"}` on success. - -## 37. Re-classify on manual restart; handleRunResult extraction (0676f0f, 7d6943c) - -Tasks that are manually restarted (from `FAILED`, `CANCELLED`, etc.) now go through -classification again so they pick up the latest agent/model selection logic. The -post-run error classification block was extracted into `handleRunResult` — a shared -helper called by both `execute` and `executeResume` — eliminating 60+ lines of -duplication. - -## 38. Legacy Claude field removed (b4371d0, a782bbf) - -The last remnants of the original `ClaudeConfig` type and backward-compat `working_dir` -shim were removed. The schema is now fully generic. - -## 39. Kill-goroutine safety documentation, goroutine-leak test (3b4c50e) - -A documented invariant was added to the `execOnce` goroutine that kills the -subprocess process group: it cannot block indefinitely. Tests were added to verify -no goroutine leak occurs when a task is cancelled. - -## 40. 
Rate-limit avoidance in classifier; model list updates (8ec366d, fc1459b) - -The classifier now skips calling itself if the selected agent is rate-limited, -avoiding a redundant Gemini API call when the rate-limited agent is already known. -The model list was updated to Claude 4.x (`claude-sonnet-4-6`, `claude-opus-4-6`, -`claude-haiku-4-5-20251001`) and current Gemini models (`gemini-2.5-flash-lite`, -`gemini-2.5-flash`, `gemini-2.5-pro`). - -## 41. Map leak fixes — activePerAgent and rateLimited (7c7dd2b) - -Two map leak bugs were fixed in the pool: -- `activePerAgent[agentType]` was decremented but never deleted when the count hit - zero, so inactive agents accumulated as dead entries. -- Expired `rateLimited[agentType]` entries were not deleted, so the map grew - unboundedly over long runs. - -## 42. Sandbox teardown: remove working-copy pull, retry push on concurrent rejection (5c85624) - -The sandbox teardown removed the `git pull` into the working copy (which was failing -due to mixed-owner object dirs). The retry-push-on-rejection path was tightened to -detect `"fetch first"` and `"non-fast-forward"` as the rejection signals. - -## 43. Explicit load balancing separated from classification (e033504) - -Previously the `Classifier` both picked the agent and selected the model. This was -split: `pickAgent` is deterministic code that picks the agent from the registered -runners using the load-balancing algorithm. The `Classifier` only picks the model -for the already-selected agent. This makes load balancing reliable and fast even -when the Gemini classifier is unavailable. - -## 44. Session ID fix on second block-and-resume cycle (65c7638) - -A bug was found where the second BLOCKED→answer→resume cycle passed the wrong -`--resume` session ID to Claude. The fix ensures that resume executions propagate -the original session ID rather than the new execution's UUID. - -## 45. 
validTransitions promoted to package-level var (3226af3) - -`validTransitions` was promoted to a package-level variable in `internal/task/task.go` -for clarity and potential reuse outside the package. ADR-002 was updated to reflect -the current state machine including the `BLOCKED→READY` transition for parent tasks. - ---- - -## Feature Summary (current state) - -| Feature | Status | -|---|---| -| Task YAML parsing, batch files | Done | -| SQLite persistence | Done | -| REST API (CRUD + lifecycle) | Done | -| WebSocket real-time events | Done | -| Claude subprocess execution | Done | -| Gemini subprocess execution | Done | -| Explicit load balancing (pickAgent) | Done | -| Gemini-based model classification | Done | -| BLOCKED / question-answer / resume | Done | -| git sandbox isolation | Done | -| Subtask creation and unblocking | Done | -| READY state / human accept-reject | Done | -| Rate-limit and quota tracking | Done | -| Stale RUNNING recovery on startup | Done | -| Per-IP rate limiter on elaborate | Done | -| Web UI (PWA) | Done | -| Push notifications (PWA) | Planned | - ---- 2026-03-16T00:56:20Z --- -Converter sudoku to rust - ---- 2026-03-16T01:14:27Z --- -For claudomator tasks that are ready, check the deployed server version against their fix commit - ---- 2026-03-16T01:17:00Z --- -For every claudomator task that is ready, display on the task whether the currently deployed server includes the commit which fixes that task diff --git a/docs/architecture.md b/docs/architecture.md deleted file mode 100644 index 27c7601..0000000 --- a/docs/architecture.md +++ /dev/null @@ -1,392 +0,0 @@ -# Claudomator — System Architecture - -## 1. System Purpose and Design Goals - -Claudomator is a local developer tool that captures tasks, dispatches them to AI agents (Claude, -Gemini), and reports results. Its primary use case is unattended, automated execution of agent -tasks with real-time status streaming to a mobile Progressive Web App. 
- -**Design goals:** - -- **Single binary, zero runtime deps** — Go for the backend; CGo only for SQLite. -- **Bounded concurrency** — a pool of goroutines prevents unbounded subprocess spawning. -- **Durability** — all task state survives server restarts (SQLite + WAL mode). -- **Real-time feedback** — WebSocket pushes task completion events to connected clients. -- **Multi-agent routing** — deterministic load balancing across Claude and Gemini; AI-driven - model-tier selection via a cheap Gemini classifier. -- **Git-isolated execution** — tasks that modify source code run in a temporary git clone - (sandbox) to prevent concurrent corruption and partial-work leakage. -- **Review gate** — top-level tasks wait in `READY` state for operator accept/reject before - reaching `COMPLETED`. - -See [ADR-001](adr/001-language-and-architecture.md) for the language and architecture rationale. - ---- - -## 2. High-Level Architecture - -```mermaid -flowchart TD - CLI["CLI\ncmd/claudomator\n(cobra)"] - API["HTTP API\ninternal/api\nREST + WebSocket"] - Pool["Executor Pool\ninternal/executor.Pool\n(bounded goroutines)"] - Classifier["Gemini Classifier\ninternal/executor.Classifier"] - ClaudeRunner["ClaudeRunner\ninternal/executor.ClaudeRunner"] - GeminiRunner["GeminiRunner\ninternal/executor.GeminiRunner"] - Sandbox["Git Sandbox\n/tmp/claudomator-sandbox-*"] - Subprocess["AI subprocess\nclaude -p / gemini"] - SQLite["SQLite\ntasks.db"] - LogFiles["Log Files\n~/.claudomator/executions/"] - Hub["WebSocket Hub\ninternal/api.Hub"] - Clients["Clients\nbrowser / mobile PWA"] - Notifier["Notifier\ninternal/notify"] - - CLI -->|run / serve| API - CLI -->|direct YAML run| Pool - API -->|POST /api/tasks/run| Pool - Pool -->|pickAgent + Classify| Classifier - Pool -->|Submit / SubmitResume| ClaudeRunner - Pool -->|Submit / SubmitResume| GeminiRunner - ClaudeRunner -->|git clone| Sandbox - Sandbox -->|claude -p| Subprocess - GeminiRunner -->|gemini| Subprocess - Subprocess -->|stream-json 
stdout| LogFiles - Pool -->|UpdateTaskState| SQLite - Pool -->|resultCh| API - API -->|Broadcast| Hub - Hub -->|fan-out| Clients - API -->|Notify| Notifier -``` - ---- - -## 3. Component Table - -| Package | Role | Key Exported Types | -|---|---|---| -| `internal/task` | `Task` struct, YAML parsing, state machine, validation | `Task`, `AgentConfig`, `RetryConfig`, `State`, `Priority`, `ValidTransition` | -| `internal/executor` | Bounded goroutine pool; subprocess manager; multi-agent routing; classification | `Pool`, `ClaudeRunner`, `GeminiRunner`, `Classifier`, `Runner`, `Result`, `BlockedError`, `QuestionRegistry` | -| `internal/storage` | SQLite wrapper; auto-migrating schema; all task and execution CRUD | `DB`, `Execution`, `TaskFilter` | -| `internal/api` | HTTP server (REST + WebSocket); result forwarding; elaboration; log streaming | `Server`, `Hub` | -| `internal/reporter` | Formats and emits execution results (text, HTML) | `Reporter`, `TextReporter`, `HTMLReporter` | -| `internal/config` | TOML config loading; data-directory layout | `Config` | -| `internal/cli` | Cobra CLI commands (`run`, `serve`, `list`, `status`, `init`) | `RootCmd` | -| `internal/notify` | Webhook notifier for task completion events | `Notifier`, `WebhookNotifier`, `Event` | -| `web` | Embedded PWA static files (served by `internal/api`) | `Files` (embed.FS) | -| `version` | Build-time version string | `Version` | - ---- - -## 4. 
Package Dependency Graph - -```mermaid -graph LR - cli["internal/cli"] - api["internal/api"] - executor["internal/executor"] - storage["internal/storage"] - task["internal/task"] - config["internal/config"] - reporter["internal/reporter"] - notify["internal/notify"] - web["web"] - version["version"] - - cli --> api - cli --> executor - cli --> storage - cli --> config - cli --> version - api --> executor - api --> storage - api --> task - api --> notify - api --> web - executor --> storage - executor --> task - reporter --> task - reporter --> storage - storage --> task -``` - ---- - -## 5. Task Execution Pipeline - -The following numbered steps trace a task from API submission to final state, with file and line -references to the key logic in each step. - -1. **Task creation** — `POST /api/tasks` calls `task.Validate` and `storage.DB.CreateTask`. - Task is written to SQLite in `PENDING` state. - (`internal/api/server.go:349`, `internal/task/parse.go`, `internal/storage/db.go`) - -2. **Run request** — `POST /api/tasks/{id}/run` calls `storage.DB.ResetTaskForRetry` (validates - the `PENDING → QUEUED` transition) then `executor.Pool.Submit`. - (`internal/api/server.go:460`, `internal/executor/executor.go:125`) - -3. **Pool dispatch** — The `dispatch` goroutine reads from `workCh`, waits for a free slot - (blocks on `doneCh` if at capacity), then spawns `go execute(ctx, task)`. - (`internal/executor/executor.go:102`) - -4. **Agent selection** — `pickAgent(SystemStatus)` selects the available agent with the fewest - active tasks (deterministic, no I/O). If the pool has a `Classifier`, it invokes - `Classifier.Classify` (one Gemini API call) to select the model tier; failures are non-fatal. - (`internal/executor/executor.go:349`, `internal/executor/executor.go:396`) - -5. **Dependency wait** — If `t.DependsOn` is non-empty, `waitForDependencies` polls SQLite - every 5 s until all dependencies reach `COMPLETED` (or a terminal failure state). 
- (`internal/executor/executor.go:642`) - -6. **Execution record created** — A new `storage.Execution` row is inserted with `RUNNING` - status. Log paths (`stdout.log`, `stderr.log`) are pre-populated via `LogPather` so they are - immediately available for tailing. - (`internal/executor/executor.go:483`) - -7. **Subprocess launch** — `ClaudeRunner.Run` (or `GeminiRunner.Run`) builds the CLI argument - list and calls `exec.CommandContext`. If `project_dir` is set and this is not a resume - execution, `setupSandbox` clones the project to `/tmp/claudomator-sandbox-*` first. - (`internal/executor/claude.go:63`, `internal/executor/claude.go:setupSandbox`) - -8. **Output streaming** — stdout is written to `//stdout.log`; the runner - concurrently parses the `stream-json` lines for cost and session ID. - (`internal/executor/claude.go`) - -9. **Execution outcome → state** — After the subprocess exits, `handleRunResult` maps the error - type to a final task state and calls `storage.DB.UpdateTaskState`. - (`internal/executor/executor.go:256`) - - | Outcome | Final state | - |---|---| - | `runner.Run` → `nil`, top-level, no subtasks | `READY` | - | `runner.Run` → `nil`, top-level, has subtasks | `BLOCKED` | - | `runner.Run` → `nil`, subtask | `COMPLETED` | - | `runner.Run` → `*BlockedError` (question file) | `BLOCKED` | - | `ctx.Err() == DeadlineExceeded` | `TIMED_OUT` | - | `ctx.Err() == Canceled` | `CANCELLED` | - | quota exhausted | `BUDGET_EXCEEDED` | - | any other error | `FAILED` | - -10. **Result broadcast** — The pool emits a `*Result` to `resultCh`. `Server.forwardResults` - reads it, marshals a `task_completed` JSON event, and calls `hub.Broadcast`. - (`internal/api/server.go:123`, `internal/api/server.go:129`) - -11. **Sandbox teardown** — If a sandbox was used and no uncommitted changes remain, - `teardownSandbox` removes the temp directory. If uncommitted changes are detected, the task - fails and the sandbox is preserved for inspection. 
- (`internal/executor/claude.go:teardownSandbox`) - -12. **Review gate** — Operator calls `POST /api/tasks/{id}/accept` (`READY → COMPLETED`) or - `POST /api/tasks/{id}/reject` (`READY → PENDING`). - (`internal/api/server.go:487`, `internal/api/server.go:507`) - ---- - -## 6. Task State Machine - -```mermaid -stateDiagram-v2 - [*] --> PENDING : task created - - PENDING --> QUEUED : POST /run - PENDING --> CANCELLED : POST /cancel - - QUEUED --> RUNNING : pool goroutine starts - QUEUED --> CANCELLED : POST /cancel - - RUNNING --> READY : exit 0, top-level, no subtasks - RUNNING --> BLOCKED : exit 0, top-level, has subtasks - RUNNING --> BLOCKED : question.json written - RUNNING --> COMPLETED : exit 0, subtask - RUNNING --> FAILED : exit non-zero / stream error - RUNNING --> TIMED_OUT : context deadline exceeded - RUNNING --> CANCELLED : context cancelled - RUNNING --> BUDGET_EXCEEDED : quota exhausted - - READY --> COMPLETED : POST /accept - READY --> PENDING : POST /reject - - BLOCKED --> QUEUED : POST /answer - BLOCKED --> READY : all subtasks COMPLETED - - FAILED --> QUEUED : POST /run (retry) - TIMED_OUT --> QUEUED : POST /resume - CANCELLED --> QUEUED : POST /run (restart) - BUDGET_EXCEEDED --> QUEUED : POST /run (retry) - - COMPLETED --> [*] -``` - -**State definitions** (`internal/task/task.go:9`): - -| State | Meaning | -|---|---| -| `PENDING` | Created; not yet submitted for execution | -| `QUEUED` | Submitted to pool; waiting for a goroutine slot | -| `RUNNING` | Subprocess actively executing | -| `READY` | Top-level task done; awaiting operator accept/reject | -| `COMPLETED` | Fully done (only true terminal state) | -| `FAILED` | Execution error; eligible for retry | -| `TIMED_OUT` | Exceeded configured timeout; resumable | -| `CANCELLED` | Explicitly cancelled by operator | -| `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` | -| `BLOCKED` | Agent wrote a `question.json`, or parent waiting for subtasks | - -`ValidTransition(from, to State) bool` enforces 
the allowed edges at runtime before every state -write. (`internal/task/task.go:113`) - ---- - -## 7. WebSocket Broadcast Flow - -```mermaid -sequenceDiagram - participant Runner as ClaudeRunner / GeminiRunner - participant Pool as executor.Pool - participant API as api.Server (forwardResults) - participant Hub as api.Hub - participant Client1 as WebSocket client 1 - participant Client2 as WebSocket client 2 - - Runner->>Pool: runner.Run() returns - Pool->>Pool: handleRunResult() — sets exec.Status - Pool->>SQLite: UpdateTaskState + UpdateExecution - Pool->>Pool: resultCh <- &Result{...} - API->>Pool: <-pool.Results() - API->>API: marshal task_completed JSON event - API->>Hub: hub.Broadcast(data) - Hub->>Client1: ws.Write(data) - Hub->>Client2: ws.Write(data) - API->>Notifier: notifier.Notify(Event{...}) [if set] -``` - -**Event payload** (JSON): -```json -{ - "type": "task_completed", - "task_id": "", - "status": "READY | COMPLETED | FAILED | ...", - "exit_code": 0, - "cost_usd": 0.042, - "error": "", - "timestamp": "2026-03-11T12:00:00Z" -} -``` - -The `Hub` also emits `task_question` events via `Server.BroadcastQuestion` when an agent uses -an interactive question tool (currently unused in the primary file-based `BLOCKED` flow). - -WebSocket endpoint: `GET /api/ws`. Supports optional bearer-token auth when `--api-token` is -configured. Up to 1000 concurrent clients; periodic 30-second pings detect dead connections. -(`internal/api/websocket.go`) - ---- - -## 8. Subtask / Parent-Task Dependency Resolution - -Claudomator supports two distinct dependency mechanisms: - -### 8a. `depends_on` — explicit task ordering - -Tasks declare `depends_on: [, ...]` in their YAML or creation payload. When the -executor goroutine starts, `waitForDependencies` polls SQLite every 5 seconds until all listed -tasks reach `COMPLETED`. 
If any dependency reaches a terminal failure state (`FAILED`, -`TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`), the waiting task transitions to `FAILED` -immediately. (`internal/executor/executor.go:642`) - -### 8b. Parent / subtask blocking (`BLOCKED` state) - -A top-level task that creates subtasks (tasks with `parent_task_id` pointing back to it) -transitions to `BLOCKED` — not `READY` — when its own runner exits successfully. This allows -an agent to dispatch subtasks and then wait for them. - -``` -Parent task runner exits 0 - ├── no subtasks → READY (normal review gate) - └── has subtasks → BLOCKED (waiting for subtasks) - │ - Each subtask completes → COMPLETED - │ - All subtasks COMPLETED? - ├── yes → maybeUnblockParent() → parent READY - └── no → parent stays BLOCKED -``` - -`maybeUnblockParent(parentID)` is called every time a subtask transitions to `COMPLETED`. It -loads all subtasks and checks whether every one is `COMPLETED`. If so, it calls -`UpdateTaskState(parentID, StateReady)`. (`internal/executor/executor.go:616`) - -The `BLOCKED` state also covers the interactive question flow: when `ClaudeRunner.Run` detects -a `question.json` file in the execution log directory, it returns a `*BlockedError` containing -the question JSON and session ID. The pool stores the question in the `tasks.question_json` -column and the session ID in the `executions.session_id` column. `POST /api/tasks/{id}/answer` -resumes the task by calling `pool.SubmitResume` with a new `Execution` carrying -`ResumeSessionID` and `ResumeAnswer`. (`internal/executor/claude.go:103`, -`internal/api/server.go:221`) - ---- - -## 9. 
External Go Dependencies - -| Module | Version | Purpose | -|---|---|---| -| `github.com/mattn/go-sqlite3` | v1.14.33 | CGo SQLite driver (requires C compiler) | -| `github.com/google/uuid` | v1.6.0 | UUID generation for task and execution IDs | -| `github.com/spf13/cobra` | v1.10.2 | CLI command framework | -| `github.com/BurntSushi/toml` | v1.6.0 | TOML config file parsing | -| `golang.org/x/net` | v0.49.0 | `golang.org/x/net/websocket` — WebSocket server | -| `gopkg.in/yaml.v3` | v3.0.1 | YAML task definition parsing | - ---- - -## 10. Related Documentation - -### Architectural Decision Records - -| ADR | Title | Summary | -|---|---|---| -| [ADR-001](adr/001-language-and-architecture.md) | Go + SQLite + WebSocket Architecture | Language choice, pipeline design, storage and API rationale | -| [ADR-002](adr/002-task-state-machine.md) | Task State Machine Design | All 10 states, transition table, side effects, known edge cases | -| [ADR-003](adr/003-security-model.md) | Security Model | Trust boundary, no-auth posture, known risks, hardening checklist | -| [ADR-004](adr/004-multi-agent-routing-and-classification.md) | Multi-Agent Routing | `pickAgent` load balancing, Gemini-based model classifier | -| [ADR-005](adr/005-sandbox-execution-model.md) | Git Sandbox Execution Model | Isolated git clone per task, push-back flow, BLOCKED preservation | - -### Package-Level Docs - -Per-package design notes live in [`docs/packages/`](packages/) (in progress). - -### Task YAML Reference - -```yaml -name: "My Task" -agent: - type: "claude" # "claude" | "gemini"; optional — load balancer may override - model: "sonnet" # model tier hint; optional — classifier may override - instructions: | - Do something useful. 
- project_dir: "/workspace/myproject" # if set, runs in a git sandbox - max_budget_usd: 1.00 - permission_mode: "bypassPermissions" # default - allowed_tools: ["Bash", "Read"] - context_files: ["README.md"] -timeout: "15m" -priority: "normal" # high | normal | low -tags: ["ci", "backend"] -depends_on: [""] # explicit ordering -parent_task_id: "" # set by parent agent when creating subtasks -``` - -Batch files wrap multiple tasks under a `tasks:` key and are accepted by `claudomator run`. - -### Storage Schema - -Two tables auto-migrated on `storage.Open()`: - -- **`tasks`** — `id`, `name`, `description`, `config_json` (AgentConfig), `priority`, - `timeout_ns`, `retry_json`, `tags_json`, `depends_on_json`, `parent_task_id`, `state`, - `question_json`, `rejection_comment`, `created_at`, `updated_at` -- **`executions`** — `id`, `task_id`, `start_time`, `end_time`, `exit_code`, `status`, - `stdout_path`, `stderr_path`, `artifact_dir`, `cost_usd`, `error_msg`, `session_id`, - `resume_session_id`, `resume_answer` - -Indexed columns: `tasks.state`, `tasks.parent_task_id`, `executions.task_id`, -`executions.status`, `executions.start_time`.
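Reviewer note: the state diagram in section 6 of the deleted `docs/architecture.md` (now `.agent/design.md`) can be read as a simple lookup table. The following is a minimal sketch of that idea, not the project's actual code. The names `ValidTransition`, `validTransitions`, `State`, and `StateReady` appear in the patch; the other constant names, the `string` underlying type, and the nested-map layout are assumptions made for illustration.

```go
package main

import "fmt"

// State mirrors the ten task states listed in the architecture doc.
// The underlying type and every constant name except StateReady are assumptions.
type State string

const (
	StatePending        State = "PENDING"
	StateQueued         State = "QUEUED"
	StateRunning        State = "RUNNING"
	StateReady          State = "READY"
	StateCompleted      State = "COMPLETED"
	StateFailed         State = "FAILED"
	StateTimedOut       State = "TIMED_OUT"
	StateCancelled      State = "CANCELLED"
	StateBudgetExceeded State = "BUDGET_EXCEEDED"
	StateBlocked        State = "BLOCKED"
)

// validTransitions encodes the edges of the state diagram.
// COMPLETED has no entry because the doc describes it as the only terminal state.
var validTransitions = map[State]map[State]bool{
	StatePending: {StateQueued: true, StateCancelled: true},
	StateQueued:  {StateRunning: true, StateCancelled: true},
	StateRunning: {
		StateReady: true, StateBlocked: true, StateCompleted: true,
		StateFailed: true, StateTimedOut: true, StateCancelled: true,
		StateBudgetExceeded: true,
	},
	StateReady:          {StateCompleted: true, StatePending: true},
	StateBlocked:        {StateQueued: true, StateReady: true},
	StateFailed:         {StateQueued: true},
	StateTimedOut:       {StateQueued: true},
	StateCancelled:      {StateQueued: true},
	StateBudgetExceeded: {StateQueued: true},
}

// ValidTransition reports whether the diagram contains the edge from -> to.
// Indexing a missing (nil) inner map yields false, so unknown states are rejected.
func ValidTransition(from, to State) bool {
	return validTransitions[from][to]
}

func main() {
	fmt.Println(ValidTransition(StateReady, StateCompleted))  // true
	fmt.Println(ValidTransition(StateBlocked, StateReady))    // true
	fmt.Println(ValidTransition(StateCompleted, StateQueued)) // false
}
```

A table like this keeps the allowed edges in one place, which fits the narrative's description of `UpdateTaskState` calling `task.ValidTransition` before every write and refusing the write when the edge is missing.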