# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Also check `~/.claude/CLAUDE.md` for user-level development standards (TDD workflow, git practices, session state management, etc.) that apply globally across all projects.

## Canonical Repository

**The canonical source of truth is `/workspace/claudomator`.** All development must happen here. Do not work in any other directory unless explicitly instructed. Do not explore `/site/doot.terst.org/` for source files.

## Build & Test Commands

```bash
# Build
go build ./...

# Run all tests
go test ./...

# Run a single package's tests
go test ./internal/executor/...

# Run a single test by name
go test ./internal/api/ -run TestServer_CreateTask_MissingName

# Run with race detector (important for executor/pool tests)
go test -race ./...

# Build the binary
go build -o claudomator ./cmd/claudomator/
```

> **Note:** `go-sqlite3` uses CGo. A C compiler (`gcc`) must be present for builds and tests.

## Running the Server

```bash
# Initialize data directory
./claudomator init

# Start API server (default :8484)
./claudomator serve

# Run a task file directly (bypasses server)
./claudomator run ./test/fixtures/tasks/simple-task.yaml

# List tasks via CLI
./claudomator list
```

Config defaults to `~/.claudomator/config.toml`. Data is stored in `~/.claudomator/` (SQLite DB + execution logs).
---

## Architecture

**Pipeline:** CLI/API → `executor.Pool` → `executor.ContainerRunner` → `claude -p` subprocess → SQLite + log files

### Package Overview

| Package | Role |
|---|---|
| `internal/task` | `Task` struct, YAML/JSON parsing, state machine constants, validation |
| `internal/executor` | `Pool` (bounded goroutine dispatcher) + `ClaudeRunner` (subprocess + sandbox) + `GeminiRunner` (stub) + `Classifier` + preamble + question/summary helpers |
| `internal/storage` | SQLite wrapper; additive migrations; tasks + executions tables |
| `internal/api` | HTTP/WebSocket server: REST endpoints, webhook handler, elaborate/validate, script runner |
| `internal/notify` | `Notifier` interface; webhook, multi, log implementations |
| `internal/reporter` | Console/JSON/HTML report generation |
| `internal/deployment` | Deployment-status checking (polls URL for expected version) |
| `internal/config` | TOML config loading + data-dir layout helpers |
| `internal/cli` | Cobra commands: `run`, `serve`, `list`, `status`, `start`, `logs`, `create`, `report`, `init` |
| `internal/version` | VCS version detection (`debug.ReadBuildInfo`) |
| `web` | Embedded static UI (`embed.go`) |

### Key Data Flows

**Task execution:**

1. Task created via `POST /api/tasks` or YAML file (`task.ParseFile`)
2. `POST /api/tasks/{id}/run` → `executor.Pool.Submit()` → buffered work queue
3. `dispatch()` goroutine picks from queue, waits for slot, launches `execute()`
4. `execute()` calls `ContainerRunner.Run()` → `claude -p --output-format stream-json`
5. stdout piped through `parseStream()` to `~/.claudomator/executions//stdout.log`
6. Execution result written to SQLite, broadcast via WebSocket to connected clients

**Task state machine** (enforced in `storage.UpdateTaskState` via `task.ValidTransition`):

```
PENDING ──→ QUEUED ──→ RUNNING ──→ READY ──→ COMPLETED
              ↑              │       └──→ PENDING (rejected)
              │              │
              │              ├──→ BLOCKED ──→ READY (all subtasks done)
              │              │       └──→ QUEUED (question answered)
              │              │
              └──────────────├──→ FAILED
                             ├──→ TIMED_OUT
                             ├──→ CANCELLED
                             └──→ BUDGET_EXCEEDED
```

- **BLOCKED**: Parent task completed but has subtasks that are not yet COMPLETED, OR agent wrote a question file. Unblocked by `maybeUnblockParent()` or user answer via `/api/tasks/{id}/answer`.
- **READY**: Execution succeeded; awaits manual accept/reject via `/api/tasks/{id}/accept` or `/api/tasks/{id}/reject`.
- **COMPLETED**: Terminal; entered only via user accept (top-level) or automatic subtask completion.
- `FAILED/TIMED_OUT/CANCELLED/BUDGET_EXCEEDED` all re-enter at `QUEUED` for retry/resume.

**WebSocket:** `Hub` fans out task completion events to all connected clients. `Server.StartHub()` must be called before `ListenAndServe`.

### Sandbox Lifecycle (ContainerRunner, Docker-based)

When `agent.project_dir` is set:

1. `setupSandbox()` clones the project into `/tmp/claudomator-sandbox-*` via the "local" remote (bare repo), then falls back to "origin", then the working copy path.
2. The claude subprocess runs inside the sandbox.
3. After successful execution, `teardownSandbox()` auto-commits any uncommitted changes (after running a build if `Makefile`/`go.mod`/`gradlew` is present), then pushes new commits to the bare repo (`origin` from the sandbox's perspective). The sandbox is then removed.
4. On failure the sandbox is preserved and its path is returned in the error.
5. On BLOCKED (question written), the sandbox path is stored in `executions.sandbox_dir` so the resume execution can reuse it.
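Steps 3 to 5 all depend on `Run()` still holding the sandbox path once setup has finished. The following is a minimal, hypothetical sketch (the function names and the path are illustrative, not the repository's code) of how `:=` inside a nested block can leave an outer variable empty in Go, and how plain assignment avoids it:

```go
package main

import "fmt"

// setupSandbox is a hypothetical stand-in that returns a fixed sandbox path.
func setupSandbox(projectDir string) (string, error) {
	return "/tmp/claudomator-sandbox-demo", nil
}

// runShadowed uses := inside the if block, which declares a NEW sandboxDir
// scoped to that block. The outer variable is never assigned.
func runShadowed(projectDir string) string {
	var sandboxDir string
	if projectDir != "" {
		sandboxDir, err := setupSandbox(projectDir) // shadows the outer sandboxDir
		if err != nil {
			return ""
		}
		_ = sandboxDir // the inner variable is only visible here
	}
	return sandboxDir // still the empty string
}

// runFixed declares err separately and assigns with =, so the outer
// sandboxDir receives the path.
func runFixed(projectDir string) string {
	var sandboxDir string
	if projectDir != "" {
		var err error
		sandboxDir, err = setupSandbox(projectDir)
		if err != nil {
			return ""
		}
	}
	return sandboxDir
}

func main() {
	fmt.Printf("shadowed: %q\n", runShadowed("/workspace/myrepo"))
	fmt.Printf("fixed:    %q\n", runFixed("/workspace/myrepo"))
}
```

Any cleanup or resume logic that checks the outer variable after the block sees an empty path in the shadowed version, even though setup succeeded.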
> **Known bug:** Variable shadowing in `claude.go` `Run()` means the outer `sandboxDir` is never assigned (both `setupSandbox` calls use `:=` inside nested blocks). This causes: (a) `teardownSandbox` is never called, so work is discarded and sandboxes accumulate in `/tmp`; (b) `BlockedError.SandboxDir` is always `""`, so resume clones a fresh sandbox and loses the agent's partial work. See [Known Bugs](#known-bugs).

> **Known bug:** `teardownSandbox` hardcodes `origin/master` when rebasing on conflict. Repos using `main` will fail on concurrent push. See [Known Bugs](#known-bugs).

### Task YAML Format

```yaml
name: "My Task"
description: "Optional longer description"
agent:
  type: "claude"                        # "claude" (default) or "gemini" (stub, not production-ready)
  model: "sonnet"                       # optional; auto-classified by Classifier if omitted
  instructions: |
    Do something useful.
  project_dir: "/path/to/project"       # optional; triggers sandbox isolation
  max_budget_usd: 1.00
  permission_mode: "bypassPermissions"  # default; or "default", "acceptEdits"
  allowed_tools: ["Bash", "Read", "Edit"]
  disallowed_tools: []
  context_files: ["/extra/context/path"]
  system_prompt_append: "Extra instructions appended to system prompt."
  skip_planning: false                  # if false, prepends planning/orchestration preamble
  additional_args: []                   # extra flags forwarded verbatim to claude CLI
timeout: "15m"
priority: "normal"                      # "high" | "normal" | "low" (stored but not yet used for scheduling)
tags: ["ci"]
depends_on: ["other-task-id"]
retry:
  max_attempts: 1                       # stored but retry is currently manual via /resume
  backoff: "exponential"
```

> **Note:** The YAML key is `agent:`, not `claude:`. Earlier docs showed `claude:` which was wrong.

Batch files wrap multiple tasks under a `tasks:` key.

### Storage Schema

Two tables. Schema is auto-migrated additively on `storage.Open()`: new columns are `ALTER TABLE ... ADD COLUMN` statements that silently succeed if the column already exists.
```
tasks:      id, name, description, config_json, priority, timeout_ns,
            retry_json, tags_json, depends_on_json, parent_task_id, state,
            rejection_comment, question_json, summary, elaboration_input,
            interactions_json, created_at, updated_at

executions: id, task_id, start_time, end_time, exit_code, status,
            stdout_path, stderr_path, artifact_dir, cost_usd, error_msg,
            session_id, sandbox_dir, changestats_json, commits_json
```

JSON blobs: `config_json` (AgentConfig), `retry_json`, `tags_json`, `depends_on_json`, `interactions_json`, `changestats_json`, `commits_json`.

---

## Features

### Planning Preamble & Orchestration

When `agent.skip_planning` is false (the default), `withPlanningPreamble()` prepends a system-level prompt to the agent's instructions that:

- Instructs the agent to POST subtasks to `$CLAUDOMATOR_API_URL/api/tasks` and stop if the task will take more than ~3 minutes
- Instructs the agent to write a JSON question to `$CLAUDOMATOR_QUESTION_FILE` and exit if it needs user input
- Requires all changes to be committed before exit
- Requires a summary written to `$CLAUDOMATOR_SUMMARY_FILE`

Env vars injected into every execution: `CLAUDOMATOR_API_URL`, `CLAUDOMATOR_TASK_ID`, `CLAUDOMATOR_PROJECT_DIR`, `CLAUDOMATOR_QUESTION_FILE`, `CLAUDOMATOR_SUMMARY_FILE`.

### Changestats

After each execution, changestats (files changed, lines added/removed) are parsed from `git diff --stat` output in `stdout.log` and stored in `executions.changestats_json`.

> **Duplication debt:** Changestats are extracted in two places: `executor.Pool.handleRunResult()` and `api.Server.processResult()`. Both write the same value to the same row (idempotent), but the double-extraction is confusing and should be consolidated. See [Design Debt](#design-debt).

**Parser:** `internal/task/changestats.go` (`ParseChangestatFromOutput`, `ParseChangestatFromFile`).

**Frontend:** `web/app.js` renders a `.changestats-badge` on COMPLETED/READY task cards.
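As a rough illustration of what parsing `git diff --stat` output involves, here is a minimal sketch that extracts the summary line. The type, the function name `parseDiffStatSummary`, and the choice to use the last summary line found are assumptions made for this example; the repository's `ParseChangestatFromOutput` may work differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// changestats is a hypothetical result type for this sketch.
type changestats struct {
	FilesChanged int
	LinesAdded   int
	LinesRemoved int
}

// summaryRe matches the summary line of `git diff --stat`, e.g.
// " 3 files changed, 10 insertions(+), 2 deletions(-)".
// Git omits the insertions or deletions part when the count is zero.
var summaryRe = regexp.MustCompile(
	`(\d+) files? changed(?:, (\d+) insertions?\(\+\))?(?:, (\d+) deletions?\(-\))?`)

// parseDiffStatSummary returns the stats from the LAST summary line in output
// (an assumption of this sketch), or false if no summary line is present.
func parseDiffStatSummary(output string) (changestats, bool) {
	matches := summaryRe.FindAllStringSubmatch(output, -1)
	if len(matches) == 0 {
		return changestats{}, false
	}
	m := matches[len(matches)-1]
	// Missing optional groups are empty strings; Atoi fails and n stays 0.
	atoi := func(s string) int { n, _ := strconv.Atoi(s); return n }
	return changestats{
		FilesChanged: atoi(m[1]),
		LinesAdded:   atoi(m[2]),
		LinesRemoved: atoi(m[3]),
	}, true
}

func main() {
	out := "some agent output\n 3 files changed, 10 insertions(+), 2 deletions(-)\n"
	stats, ok := parseDiffStatSummary(out)
	fmt.Println(stats, ok)
}
```

Because agent output is free-form, any text that happens to look like a summary line would also match, so a real parser may need stricter anchoring than this sketch uses.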
### GitHub Webhook Integration

`POST /api/webhooks/github` accepts `check_run` and `workflow_run` events. Returns `{"task_id": "..."}` (200) on task creation or 204 if ignored.

#### Config (`~/.claudomator/config.toml`)

```toml
webhook_secret = "your-github-webhook-secret"  # HMAC-SHA256; validation is skipped if omitted

[[projects]]
name = "myrepo"
dir = "/workspace/myrepo"
```

#### Matching logic

The repository name is matched case-insensitively against each project's `name` and the basename of its `dir`. Falls back to the only configured project if no match is found.

#### Task creation

Tasks are created for:

- `check_run` with `action: completed` and `conclusion: failure`
- `workflow_run` with `action: completed` and `conclusion: failure` or `timed_out`

Created tasks are tagged `["ci", "auto"]`, capped at $3 USD, and restricted to these tools: Read, Edit, Bash, Glob, Grep.

### Elaborate Endpoint

`POST /api/tasks/elaborate` converts natural language → task JSON via a `claude -p` invocation. Optionally reads `CLAUDE.md` / `SESSION_STATE.md` from a configured working directory for context. Per-IP rate-limited.

> **Implementation gap:** The elaborate endpoint is not tested against real Claude invocations. `sanitizeElaboratedTask()` uses keyword heuristics to infer missing tools (fragile). No caching.

### Model Classifier

`executor.Classifier` calls the Gemini CLI (`gemini-2.5-flash-lite`) to pick the best Claude or Gemini model for a task. Falls back to the default model (`sonnet`) if Gemini fails. Agent type is selected first by load balancer; classifier only picks the model within that agent.

> **Implementation gap:** Output parsing is brittle: it strips `"Loaded cached credentials."` lines and markdown fences by string matching. No fallback if Gemini CLI isn't installed. Classification results are not cached or logged for learning.

---

## Design Debt

### GeminiRunner is a non-functional stub

`internal/executor/gemini.go` `execOnce()` does not run the `gemini` binary. It starts a goroutine that writes hardcoded fake JSON to a pipe. `parseGeminiStream()` strips markdown fences but does no semantic parsing. There is no session/resume support. Any task with `agent.type: "gemini"` will silently return canned output. This is dangerous in production.

**Decision needed:** Either implement GeminiRunner properly (subprocess + stream parsing + sandbox integration mirroring ClaudeRunner) or remove it and the `Classifier` from the codebase until it's ready.

### Priority field is stored but never used

`task.Priority` (`high`, `normal`, `low`) is persisted in SQLite and surfaced in the API. The executor `dispatch()` goroutine uses a simple FIFO channel (`workCh`) with no priority ordering.

### RetryConfig is stored but retry is manual

`task.RetryConfig.MaxAttempts` and `Backoff` are parsed and stored. No code reads them during execution. Retries must be triggered manually via `POST /api/tasks/{id}/resume`.

### Changestats extracted in two places

`executor.Pool.handleRunResult()` and `api.Server.processResult()` both call `task.ParseChangestatFromFile()` and write to `executions.changestats_json`. The second write is idempotent but wasteful and confusing. One of the two should be removed.

### context.Background() in resume path

`api.Server.handleAnswerQuestion()` calls `p.SubmitResume(context.Background(), ...)`. If the HTTP request context is cancelled, the resume still runs. Conversely, if the server shuts down, in-flight resumes using the server's root context would be cancelled while this one would not. It should use a long-lived server-level context, not `Background()`.

### Non-transactional execution creation

`pool.execute()` calls `store.CreateExecution(exec)` followed by `store.UpdateTaskState(t.ID, task.StateRunning)` as separate statements. If the server crashes between them, the task stays PENDING while an execution record exists with status RUNNING. Recovery (`RecoverStaleRunning`) partially handles this but the root cause is the missing transaction.

### Elaborate/validate cmd path indirection

`Server` has two separate fields `elaborateCmdPath` and `validateCmdPath` that override `claudeBinPath` only for tests. This is a testing-time seam that leaks into the production struct. A cleaner approach would be to inject an `Elaborator` interface.

### `withFailureHistory` mutates a shallow copy

In `executor.go`, `withFailureHistory` creates a copy of the task struct (`copy := *t`) but `copy.Agent = t.Agent` copies only the struct value: slices inside AgentConfig (`AllowedTools`, `DisallowedTools`, etc.) share the backing array. Appending to `SystemPromptAppend` is safe but any mutation of slices would affect the original.

### Additive migration strategy is fragile

`storage.migrate()` lists every `ALTER TABLE ADD COLUMN` statement in code order. The only idempotency guard is catching "column already exists" errors. There is no migration version tracking. Columns dropped in `CREATE TABLE IF NOT EXISTS` and added back via ALTER are indistinguishable from new columns. Concurrent server instances running migrations simultaneously have no protection.
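To make the idempotency guard concrete, here is a minimal sketch of the pattern under stated assumptions. `execFunc`, `applyAdditive`, `isDuplicateColumn`, and `newFakeDB` are hypothetical names, the database is simulated in memory, and the check relies on SQLite's `duplicate column name` error text; the actual `storage.migrate()` may match errors differently.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// execFunc is a hypothetical stand-in for a database Exec call, reduced to
// the returned error.
type execFunc func(stmt string) error

// isDuplicateColumn reports whether err looks like SQLite's error for adding
// a column that already exists.
func isDuplicateColumn(err error) bool {
	return err != nil && strings.Contains(err.Error(), "duplicate column name")
}

// applyAdditive runs every statement in order and ignores only
// duplicate-column errors. Nothing records which statements already ran.
func applyAdditive(exec execFunc, stmts []string) error {
	for _, s := range stmts {
		if err := exec(s); err != nil && !isDuplicateColumn(err) {
			return fmt.Errorf("migration %q: %w", s, err)
		}
	}
	return nil
}

// newFakeDB simulates a database: a repeated statement fails with a
// duplicate-column error, and an empty statement fails with a syntax error.
func newFakeDB() execFunc {
	applied := map[string]bool{}
	return func(stmt string) error {
		if stmt == "" {
			return errors.New("syntax error (simulated)")
		}
		if applied[stmt] {
			return errors.New("duplicate column name: (simulated)")
		}
		applied[stmt] = true
		return nil
	}
}

func main() {
	db := newFakeDB()
	stmts := []string{
		"ALTER TABLE tasks ADD COLUMN summary TEXT",
		"ALTER TABLE executions ADD COLUMN sandbox_dir TEXT",
	}
	fmt.Println("first run: ", applyAdditive(db, stmts))
	fmt.Println("second run:", applyAdditive(db, stmts))
}
```

Both runs succeed, which is the point of the guard, but the sketch also shows its limit: the second run learns nothing about schema history and simply depends on the error text staying stable.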
---

## REST API Reference

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/tasks` | List tasks; `?state=RUNNING&since=&limit=50` |
| POST | `/api/tasks` | Create task (JSON body) |
| GET | `/api/tasks/{id}` | Get task |
| DELETE | `/api/tasks/{id}` | Delete task + subtasks + executions |
| POST | `/api/tasks/{id}/run` | Submit PENDING task to executor |
| POST | `/api/tasks/{id}/cancel` | Cancel RUNNING/QUEUED task |
| POST | `/api/tasks/{id}/accept` | Accept READY task → COMPLETED |
| POST | `/api/tasks/{id}/reject` | Reject READY task → PENDING |
| POST | `/api/tasks/{id}/answer` | Answer BLOCKED task question → QUEUED |
| POST | `/api/tasks/{id}/resume` | Resume FAILED/TIMED_OUT/CANCELLED task |
| GET | `/api/tasks/{id}/subtasks` | List subtasks |
| GET | `/api/tasks/{id}/executions` | List execution history |
| GET | `/api/executions/{id}` | Get execution |
| GET | `/api/executions/{id}/log` | Get execution log (`?tail=100`) |
| GET | `/api/executions/{id}/logs/stream` | Stream logs as SSE |
| GET | `/api/tasks/{id}/logs/stream` | Stream latest execution logs |
| GET | `/api/executions` | List recent executions across all tasks |
| GET | `/api/tasks/{id}/deployment-status` | Poll deployment readiness |
| POST | `/api/tasks/elaborate` | Convert natural language → task JSON |
| POST | `/api/tasks/validate` | Validate task JSON |
| POST | `/api/scripts/{name}` | Run named script with task context |
| GET | `/api/ws` | WebSocket upgrade (live task updates) |
| GET | `/api/workspaces` | List directories under `workspace_root` |
| GET | `/api/health` | Server health |
| POST | `/api/webhooks/github` | GitHub CI webhook |

---

## ADRs

See `docs/adr/001-language-and-architecture.md` for the Go + SQLite + WebSocket rationale.