Diffstat (limited to 'CLAUDE.md')
| -rw-r--r-- | CLAUDE.md | 260 |
1 files changed, 202 insertions, 58 deletions
@@ -51,120 +51,264 @@ go build -o claudomator ./cmd/claudomator/
 Config defaults to `~/.claudomator/config.toml`. Data is stored in `~/.claudomator/` (SQLite DB + execution logs).
 
+---
+
 ## Architecture
 
-**Pipeline:** CLI/API → `executor.Pool` → `executor.ClaudeRunner` → `claude -p` subprocess → SQLite + log files
+**Pipeline:** CLI/API → `executor.Pool` → `executor.ContainerRunner` → `claude -p` subprocess → SQLite + log files
 
-### Packages
+### Package Overview
 
 | Package | Role |
 |---|---|
-| `internal/task` | `Task` struct, YAML parsing, state machine, validation |
-| `internal/executor` | `Pool` (bounded goroutine pool) + `ClaudeRunner` (subprocess manager) |
-| `internal/storage` | SQLite wrapper; stores tasks and execution records |
-| `internal/api` | HTTP server (REST + WebSocket via `internal/api.Hub`) |
-| `internal/reporter` | Formats and emits execution results |
-| `internal/config` | TOML config + data dir layout |
-| `internal/cli` | Cobra CLI commands (`run`, `serve`, `list`, `status`, `init`) |
+| `internal/task` | `Task` struct, YAML/JSON parsing, state machine constants, validation |
+| `internal/executor` | `Pool` (bounded goroutine dispatcher) + `ClaudeRunner` (subprocess + sandbox) + `GeminiRunner` (stub) + `Classifier` + preamble + question/summary helpers |
+| `internal/storage` | SQLite wrapper; additive migrations; tasks + executions tables |
+| `internal/api` | HTTP/WebSocket server — REST endpoints, webhook handler, elaborate/validate, script runner |
+| `internal/notify` | `Notifier` interface; webhook, multi, log implementations |
+| `internal/reporter` | Console/JSON/HTML report generation |
+| `internal/deployment` | Deployment-status checking (polls URL for expected version) |
+| `internal/config` | TOML config loading + data-dir layout helpers |
+| `internal/cli` | Cobra commands: `run`, `serve`, `list`, `status`, `start`, `logs`, `create`, `report`, `init` |
+| `internal/version` | VCS version detection (`debug.ReadBuildInfo`) |
+| `web` | Embedded static UI (`embed.go`) |
 
 ### Key Data Flows
 
 **Task execution:**
 1. Task created via `POST /api/tasks` or YAML file (`task.ParseFile`)
-2. `POST /api/tasks/{id}/run` → `executor.Pool.Submit()` → goroutine in pool
-3. `ClaudeRunner.Run()` invokes `claude -p <instructions> --output-format stream-json`
-4. stdout streamed to `~/.claudomator/executions/<exec-id>/stdout.log`; cost parsed from stream-json
-5. Execution result written to SQLite; broadcast via WebSocket to connected clients
+2. `POST /api/tasks/{id}/run` → `executor.Pool.Submit()` → buffered work queue
+3. `dispatch()` goroutine picks from queue, waits for slot, launches `execute()`
+4. `execute()` calls `ContainerRunner.Run()` → `claude -p <instructions> --output-format stream-json`
+5. stdout piped through `parseStream()` to `~/.claudomator/executions/<exec-id>/stdout.log`
+6. Execution result written to SQLite, broadcast via WebSocket to connected clients
+
+**Task state machine** (enforced in `storage.UpdateTaskState` via `task.ValidTransition`):
+
+```
+PENDING ──→ QUEUED ──→ RUNNING ──→ READY ──→ COMPLETED
+             ↑              │        └──→ PENDING (rejected)
+             │              │
+             │              ├──→ BLOCKED ──→ READY (all subtasks done)
+             │              │       └──→ QUEUED (question answered)
+             │              │
+             └──────────────├──→ FAILED
+                            ├──→ TIMED_OUT
+                            ├──→ CANCELLED
+                            └──→ BUDGET_EXCEEDED
+```
+
+- **BLOCKED**: Parent task completed but has subtasks that are not yet COMPLETED, OR agent wrote a question file. Unblocked by `maybeUnblockParent()` or user answer via `/api/tasks/{id}/answer`.
+- **READY**: Execution succeeded; awaits manual accept/reject via `/api/tasks/{id}/accept` or `/api/tasks/{id}/reject`.
+- **COMPLETED**: Terminal — entered only via user accept (top-level) or automatic subtask completion.
+- `FAILED/TIMED_OUT/CANCELLED/BUDGET_EXCEEDED` all re-enter at `QUEUED` for retry/resume.
+
+**WebSocket:** `Hub` fans out task completion events to all connected clients. `Server.StartHub()` must be called before `ListenAndServe`.
 
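The transitions drawn in the state-machine diagram can be captured as an adjacency table. Below is a minimal Go sketch of such a check; the `State` type, the constant names, and the `transitions` map are assumptions for illustration, not the actual contents of `internal/task`. It encodes only the arrows in the diagram, so it omits anything the real `task.ValidTransition` may additionally allow (the REST table, for example, lets a QUEUED task be cancelled).

```go
package main

import "fmt"

// State mirrors the task states named in the diagram. The type and constant
// names are illustrative; the real definitions live in internal/task.
type State string

const (
	StatePending        State = "PENDING"
	StateQueued         State = "QUEUED"
	StateRunning        State = "RUNNING"
	StateReady          State = "READY"
	StateCompleted      State = "COMPLETED"
	StateBlocked        State = "BLOCKED"
	StateFailed         State = "FAILED"
	StateTimedOut       State = "TIMED_OUT"
	StateCancelled      State = "CANCELLED"
	StateBudgetExceeded State = "BUDGET_EXCEEDED"
)

// transitions lists, for each state, the states it may move to.
// Only the arrows drawn in the diagram are encoded here.
var transitions = map[State][]State{
	StatePending:        {StateQueued},
	StateQueued:         {StateRunning},
	StateRunning:        {StateReady, StateBlocked, StateFailed, StateTimedOut, StateCancelled, StateBudgetExceeded},
	StateReady:          {StateCompleted, StatePending},
	StateBlocked:        {StateReady, StateQueued},
	StateFailed:         {StateQueued},
	StateTimedOut:       {StateQueued},
	StateCancelled:      {StateQueued},
	StateBudgetExceeded: {StateQueued},
	// StateCompleted is terminal: having no entry means no outgoing transitions.
}

// ValidTransition reports whether a task may move from one state to another.
func ValidTransition(from, to State) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(ValidTransition(StateRunning, StateReady))    // true
	fmt.Println(ValidTransition(StateCompleted, StateQueued)) // false
	fmt.Println(ValidTransition(StatePending, StateRunning))  // false
}
```

A table keeps the rules in one place, so adding a state means adding one map entry rather than editing scattered `if` chains.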
-**State machine** (`task.ValidTransition`):
-`PENDING` → `QUEUED` → `RUNNING` → `COMPLETED | FAILED | TIMED_OUT | CANCELLED | BUDGET_EXCEEDED`
-Failed tasks can retry: `FAILED` → `QUEUED`
+### Sandbox Lifecycle (ContainerRunner (Docker-based))
 
-**WebSocket:** `Hub` fans out task completion events to all connected clients. `Server.StartHub()` must be called after creating the server.
+When `agent.project_dir` is set:
+1. `setupSandbox()` clones the project into `/tmp/claudomator-sandbox-*` via the "local" remote (bare repo), then falls back to "origin", then the working copy path.
+2. The claude subprocess runs inside the sandbox.
+3. After successful execution, `teardownSandbox()` auto-commits any uncommitted changes (after running a build if `Makefile`/`go.mod`/`gradlew` is present), then pushes new commits to the bare repo (`origin` from the sandbox's perspective). The sandbox is then removed.
+4. On failure the sandbox is preserved and its path is returned in the error.
+5. On BLOCKED (question written), the sandbox path is stored in `executions.sandbox_dir` so the resume execution can reuse it.
+
+> **Known bug:** Variable shadowing in `claude.go` `Run()` means the outer `sandboxDir` is never assigned (both `setupSandbox` calls use `:=` inside nested blocks). This causes: (a) `teardownSandbox` is never called — work is discarded, sandboxes accumulate in `/tmp`; (b) `BlockedError.SandboxDir` is always `""`, so resume clones a fresh sandbox and loses the agent's partial work. See [Known Bugs](#known-bugs).
+
+> **Known bug:** `teardownSandbox` hardcodes `origin/master` when rebasing on conflict. Repos using `main` will fail on concurrent push. See [Known Bugs](#known-bugs).
 
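The variable-shadowing bug described in the first known-bug note is easy to reproduce in isolation. This is a hypothetical minimal program, not the real `claude.go`: `setupSandbox` here is a stub that returns a fixed path, and `buggyRun`/`fixedRun` are illustrative names. It shows why `:=` inside an inner block leaves the outer `sandboxDir` empty, and the usual fix of declaring `err` separately and assigning with `=`.

```go
package main

import "fmt"

// setupSandbox is a hypothetical stub that returns a fixed path; the real
// function clones the project into /tmp/claudomator-sandbox-*.
func setupSandbox(projectDir string) (string, error) {
	return "/tmp/claudomator-sandbox-example", nil
}

// buggyRun reproduces the bug pattern: `:=` inside the if-block declares a
// new sandboxDir that shadows the outer one, so the outer variable stays "".
func buggyRun(projectDir string) string {
	var sandboxDir string
	if projectDir != "" {
		sandboxDir, err := setupSandbox(projectDir)
		if err != nil {
			return ""
		}
		_ = sandboxDir // the inner variable is set, then goes out of scope
	}
	return sandboxDir
}

// fixedRun declares err separately and assigns with `=`, so the outer
// sandboxDir is the variable that receives the path.
func fixedRun(projectDir string) string {
	var sandboxDir string
	if projectDir != "" {
		var err error
		sandboxDir, err = setupSandbox(projectDir)
		if err != nil {
			return ""
		}
	}
	return sandboxDir
}

func main() {
	fmt.Printf("buggy: %q\n", buggyRun("/workspace/myrepo")) // buggy: ""
	fmt.Printf("fixed: %q\n", fixedRun("/workspace/myrepo")) // fixed: "/tmp/claudomator-sandbox-example"
}
```

The compiler accepts both versions, which is why this class of bug survives review; `go vet` with the separate `shadow` analyzer enabled can flag it.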
 ### Task YAML Format
 
 ```yaml
 name: "My Task"
-claude:
-  model: "sonnet"
+description: "Optional longer description"
+agent:
+  type: "claude"  # "claude" (default) or "gemini" (stub, not production-ready)
+  model: "sonnet"  # optional; auto-classified by Classifier if omitted
   instructions: |
     Do something useful.
-  working_dir: "/path/to/project"
+  project_dir: "/path/to/project"  # optional; triggers sandbox isolation
   max_budget_usd: 1.00
-  permission_mode: "default"
-  allowed_tools: ["Bash", "Read"]
+  permission_mode: "bypassPermissions"  # default; or "default", "acceptEdits"
+  allowed_tools: ["Bash", "Read", "Edit"]
+  disallowed_tools: []
+  context_files: ["/extra/context/path"]
+  system_prompt_append: "Extra instructions appended to system prompt."
+  skip_planning: false  # if false, prepends planning/orchestration preamble
+  additional_args: []  # extra flags forwarded verbatim to claude CLI
 timeout: "15m"
-priority: "normal"  # high | normal | low
+priority: "normal"  # "high" | "normal" | "low" (stored but not yet used for scheduling)
 tags: ["ci"]
+depends_on: ["other-task-id"]
+retry:
+  max_attempts: 1  # stored but retry is currently manual via /resume
+  backoff: "exponential"
 ```
 
+> **Note:** The YAML key is `agent:`, not `claude:`. Earlier docs showed `claude:` which was wrong.
+
 Batch files wrap multiple tasks under a `tasks:` key.
 
 ### Storage Schema
 
-Two tables: `tasks` (with `config_json`, `retry_json`, `tags_json`, `depends_on_json` as JSON blobs) and `executions` (with paths to log files). Schema is auto-migrated on `storage.Open()`.
+Two tables. Schema is auto-migrated additively on `storage.Open()` — new columns are `ALTER TABLE ... ADD COLUMN` statements that silently succeed if the column already exists.
+
+```
+tasks:      id, name, description, config_json, priority, timeout_ns, retry_json,
+            tags_json, depends_on_json, parent_task_id, state, rejection_comment,
+            question_json, summary, elaboration_input, interactions_json,
+            created_at, updated_at
+
+executions: id, task_id, start_time, end_time, exit_code, status, stdout_path,
+            stderr_path, artifact_dir, cost_usd, error_msg, session_id,
+            sandbox_dir, changestats_json, commits_json
+```
+
+JSON blobs: `config_json` (AgentConfig), `retry_json`, `tags_json`, `depends_on_json`, `interactions_json`, `changestats_json`, `commits_json`.
+
+---
 
 ## Features
 
-### Changestats
+### Planning Preamble & Orchestration
 
-After each task execution, Claudomator extracts git diff statistics from the execution's stdout log. If the log contains a git `diff --stat` summary line (e.g. `5 files changed, 127 insertions(+), 43 deletions(-)`), the stats are parsed and stored in the `executions.changestats_json` column via `storage.DB.UpdateExecutionChangestats`.
+When `agent.skip_planning` is false (the default), `withPlanningPreamble()` prepends a system-level prompt to the agent's instructions that:
+- Instructs the agent to POST subtasks to `$CLAUDOMATOR_API_URL/api/tasks` and stop if the task will take more than ~3 minutes
+- Instructs the agent to write a JSON question to `$CLAUDOMATOR_QUESTION_FILE` and exit if it needs user input
+- Requires all changes to be committed before exit
+- Requires a summary written to `$CLAUDOMATOR_SUMMARY_FILE`
 
-**Extraction points:**
-- `internal/executor.Pool.handleRunResult` — calls `task.ParseChangestatFromFile(exec.StdoutPath)` after every execution; stores via `Store.UpdateExecutionChangestats`.
-- `internal/api.Server.processResult` — also extracts changestats when the API server processes a result (same file, idempotent second write).
+Env vars injected into every execution: `CLAUDOMATOR_API_URL`, `CLAUDOMATOR_TASK_ID`, `CLAUDOMATOR_PROJECT_DIR`, `CLAUDOMATOR_QUESTION_FILE`, `CLAUDOMATOR_SUMMARY_FILE`.
 
-**Parser location:** `internal/task/changestats.go` — exported functions `ParseChangestatFromOutput` and `ParseChangestatFromFile` usable by any package without creating circular imports.
+### Changestats
 
-**Frontend display:** `web/app.js` renders a `.changestats-badge` on COMPLETED/READY task cards and in execution history rows.
+After each execution, changestats (files changed, lines added/removed) are parsed from git `diff --stat` output in `stdout.log` and stored in `executions.changestats_json`.
 
-## GitHub Webhook Integration
+> **Duplication debt:** Changestats are extracted in two places: `executor.Pool.handleRunResult()` and `api.Server.processResult()`. Both write the same value to the same row (idempotent), but the double-extraction is confusing and should be consolidated. See [Design Debt](#design-debt).
 
-Claudomator can automatically create tasks when CI builds fail on GitHub.
+**Parser:** `internal/task/changestats.go` — `ParseChangestatFromOutput`, `ParseChangestatFromFile`.
 
-### Endpoint
+**Frontend:** `web/app.js` renders a `.changestats-badge` on COMPLETED/READY task cards.
 
-`POST /api/webhooks/github`
+### GitHub Webhook Integration
 
-Accepts `check_run` and `workflow_run` events from GitHub. Returns `{"task_id": "..."}` (200) when a task is created, or 204 when the event is ignored.
+`POST /api/webhooks/github` accepts `check_run` and `workflow_run` events. Returns `{"task_id": "..."}` (200) on task creation or 204 if ignored.
 
-### Config (`~/.claudomator/config.toml`)
+#### Config (`~/.claudomator/config.toml`)
 
 ```toml
-# Optional: HMAC-SHA256 secret set in the GitHub webhook settings.
-# If omitted, signature validation is skipped.
-webhook_secret = "your-github-webhook-secret"
+webhook_secret = "your-github-webhook-secret"  # HMAC-SHA256; skip validation if omitted
 
-# Projects for matching incoming webhook repository names to local directories.
 [[projects]]
 name = "myrepo"
 dir = "/workspace/myrepo"
-
-[[projects]]
-name = "other-service"
-dir = "/workspace/other-service"
 ```
 
-### Matching logic
+#### Matching logic
+
+Repository name matched case-insensitively against each project's `name` and the basename of its `dir`. Falls back to the only configured project if no match found.
+
+#### Task creation
+
+Tasks created for:
+- `check_run` with `action: completed` and `conclusion: failure`
+- `workflow_run` with `action: completed` and `conclusion: failure` or `timed_out`
+
+Tagged `["ci", "auto"]`, capped at $3 USD, allowed tools: Read, Edit, Bash, Glob, Grep.
+
+### Elaborate Endpoint
+
+`POST /api/tasks/elaborate` converts natural language → task JSON via a `claude --prompt` invocation. Optionally reads `CLAUDE.md` / `SESSION_STATE.md` from a configured working directory for context. Per-IP rate-limited.
+
+> **Implementation gap:** The elaborate endpoint is not tested against real Claude invocations. `sanitizeElaboratedTask()` uses keyword heuristics to infer missing tools (fragile). No caching.
+
+### Model Classifier
+
+`executor.Classifier` calls the Gemini CLI (`gemini-2.5-flash-lite`) to pick the best Claude or Gemini model for a task. Falls back to the default model (`sonnet`) if Gemini fails. Agent type is selected first by load balancer; classifier only picks the model within that agent.
+
+> **Implementation gap:** Output parsing is brittle — strips `"Loaded cached credentials."` lines and markdown fences by string matching. No fallback if Gemini CLI isn't installed. Classification results are not cached or logged for learning.
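For the `webhook_secret` setting: GitHub signs each webhook delivery with HMAC-SHA256 over the raw request body and sends the digest in the `X-Hub-Signature-256` header as `sha256=<hex>`. The handler's verification code is not part of this diff, so the Go sketch below is one conventional way to check that header, not Claudomator's implementation; `validSignature` and `sign` are hypothetical names. It keeps the documented behavior of skipping validation when no secret is configured.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// validSignature checks an X-Hub-Signature-256 header value ("sha256=<hex>")
// against an HMAC-SHA256 of the raw body. An empty secret skips validation.
func validSignature(secret, header string, body []byte) bool {
	if secret == "" {
		return true
	}
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	// hmac.Equal compares in constant time, avoiding timing side channels.
	return hmac.Equal(got, mac.Sum(nil))
}

// sign builds a header value the same way GitHub does; used only for the demo.
func sign(secret string, body []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	body := []byte(`{"action":"completed"}`)
	h := sign("s3cret", body)
	fmt.Println(validSignature("s3cret", h, body)) // true
	fmt.Println(validSignature("wrong", h, body))  // false
	fmt.Println(validSignature("", "", body))      // true (validation skipped)
}
```

The signature must be computed over the exact bytes received, so a handler would read the body once, verify it, and only then decode the JSON.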
+
+---
+
+
+---
+
+## Design Debt
+
+### GeminiRunner is a non-functional stub
+
+`internal/executor/gemini.go` `execOnce()` does not run the `gemini` binary. It starts a goroutine that writes hardcoded fake JSON to a pipe. `parseGeminiStream()` strips markdown fences but does no semantic parsing. There is no session/resume support.
+
+Any task with `agent.type: "gemini"` will silently return canned output. This is dangerous in production.
+
+**Decision needed:** Either implement GeminiRunner properly (subprocess + stream parsing + sandbox integration mirroring ClaudeRunner) or remove it and the `Classifier` from the codebase until it's ready.
+
+### Priority field is stored but never used
+
+`task.Priority` (`high`, `normal`, `low`) is persisted in SQLite and surfaced in the API. The executor `dispatch()` goroutine uses a simple FIFO channel (`workCh`) with no priority ordering.
+
+### RetryConfig is stored but retry is manual
+
+`task.RetryConfig.MaxAttempts` and `Backoff` are parsed and stored. No code reads them during execution. Retries must be triggered manually via `POST /api/tasks/{id}/resume`.
+
+### Changestats extracted in two places
+
+`executor.Pool.handleRunResult()` and `api.Server.processResult()` both call `task.ParseChangestatFromFile()` and write to `executions.changestats_json`. The second write is idempotent but wasteful and confusing. One of the two should be removed.
+
+### context.Background() in resume path
+
+`api.Server.handleAnswerQuestion()` calls `p.SubmitResume(context.Background(), ...)`. If the HTTP request context is cancelled, the resume still runs. Inversely, if the server shuts down, in-flight resumes using the server's root context would be cancelled while this one would not. Should use a long-lived server-level context, not `Background()`.
+
+### Non-transactional execution creation
+
+`pool.execute()` calls `store.CreateExecution(exec)` followed by `store.UpdateTaskState(t.ID, task.StateRunning)` as separate statements. If the server crashes between them, the task stays PENDING while an execution record exists with status RUNNING. Recovery (`RecoverStaleRunning`) partially handles this but the root cause is the missing transaction.
+
+### Elaborate/validate cmd path indirection
+
+`Server` has two separate fields `elaborateCmdPath` and `validateCmdPath` that override `claudeBinPath` only for tests. This is a testing-time seam that leaks into the production struct. A cleaner approach would be to inject an `Elaborator` interface.
+
+### `withFailureHistory` mutates a shallow copy
+
+In `executor.go`, `withFailureHistory` creates a copy of the task struct (`copy := *t`) but `copy.Agent = t.Agent` copies the struct value — slices inside AgentConfig (`AllowedTools`, `DisallowedTools`, etc.) share the backing array. Appending to `SystemPromptAppend` is safe but any mutation of slices would affect the original.
 
-The handler matches the webhook's `repository.name` against each project's `name` and the basename of its `dir` (case-insensitive substring). If no match is found and only one project is configured, that project is used as a fallback.
+### Additive migration strategy is fragile
 
-### GitHub webhook setup
+`storage.migrate()` lists every `ALTER TABLE ADD COLUMN` statement in code order. The only idempotency guard is catching "column already exists" errors. There is no migration version tracking. Columns dropped in `CREATE TABLE IF NOT EXISTS` and added back via ALTER are indistinguishable from new columns. Concurrent server instances running migrations simultaneously have no protection.
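The `withFailureHistory` design-debt item can be illustrated with trimmed, hypothetical `Task` and `AgentConfig` types that keep only the fields needed for the demo. Copying a struct copies slice headers but not the arrays they point to, so element writes through the copy show up in the original, while string fields such as `SystemPromptAppend` are immutable values and stay independent. Cloning each slice, as `deepCopy` does here, is one way to make the copy fully independent; it is a sketch, not the fix adopted in `executor.go`.

```go
package main

import "fmt"

// AgentConfig and Task are trimmed, hypothetical stand-ins for the real
// structs; only the fields needed to show the aliasing are included.
type AgentConfig struct {
	AllowedTools       []string
	SystemPromptAppend string
}

type Task struct {
	Agent AgentConfig
}

// shallowCopy copies the struct value. The slice header is copied, but it
// still points at the same backing array as the original.
func shallowCopy(t *Task) Task {
	cp := *t
	return cp
}

// deepCopy additionally clones the slice so element writes stay local.
func deepCopy(t *Task) Task {
	cp := *t
	cp.Agent.AllowedTools = append([]string(nil), t.Agent.AllowedTools...)
	return cp
}

func main() {
	orig := &Task{Agent: AgentConfig{AllowedTools: []string{"Bash", "Read"}}}

	s := shallowCopy(orig)
	s.Agent.SystemPromptAppend += "extra" // string: the original is unaffected
	s.Agent.AllowedTools[0] = "Edit"      // slice element: leaks into orig
	fmt.Println(orig.Agent.AllowedTools[0], orig.Agent.SystemPromptAppend == "") // Edit true

	orig.Agent.AllowedTools[0] = "Bash"
	d := deepCopy(orig)
	d.Agent.AllowedTools[0] = "Edit"
	fmt.Println(orig.Agent.AllowedTools[0]) // Bash
}
```

Appending to a shared slice is subtler: it aliases only while the append fits in the existing capacity, which is why element writes are the clearer demonstration.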
 
-In your GitHub repository → Settings → Webhooks → Add webhook:
-- **Payload URL:** `https://<your-claudomator-host>/api/webhooks/github`
-- **Content type:** `application/json`
-- **Secret:** value of `webhook_secret` in config (or leave blank if not configured)
-- **Events:** select *Workflow runs* and *Check runs*
+---
 
-### Task creation
+## REST API Reference
 
-A task is created for:
-- `check_run` events with `action: completed` and `conclusion: failure`
-- `workflow_run` events with `action: completed` and `conclusion: failure` or `timed_out`
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/tasks` | List tasks; `?state=RUNNING&since=<RFC3339>&limit=50` |
+| POST | `/api/tasks` | Create task (JSON body) |
+| GET | `/api/tasks/{id}` | Get task |
+| DELETE | `/api/tasks/{id}` | Delete task + subtasks + executions |
+| POST | `/api/tasks/{id}/run` | Submit PENDING task to executor |
+| POST | `/api/tasks/{id}/cancel` | Cancel RUNNING/QUEUED task |
+| POST | `/api/tasks/{id}/accept` | Accept READY task → COMPLETED |
+| POST | `/api/tasks/{id}/reject` | Reject READY task → PENDING |
+| POST | `/api/tasks/{id}/answer` | Answer BLOCKED task question → QUEUED |
+| POST | `/api/tasks/{id}/resume` | Resume FAILED/TIMED_OUT/CANCELLED task |
+| GET | `/api/tasks/{id}/subtasks` | List subtasks |
+| GET | `/api/tasks/{id}/executions` | List execution history |
+| GET | `/api/executions/{id}` | Get execution |
+| GET | `/api/executions/{id}/log` | Get execution log (`?tail=100`) |
+| GET | `/api/executions/{id}/logs/stream` | Stream logs as SSE |
+| GET | `/api/tasks/{id}/logs/stream` | Stream latest execution logs |
+| GET | `/api/executions` | List recent executions across all tasks |
+| GET | `/api/tasks/{id}/deployment-status` | Poll deployment readiness |
+| POST | `/api/tasks/elaborate` | Convert natural language → task JSON |
+| POST | `/api/tasks/validate` | Validate task JSON |
+| POST | `/api/scripts/{name}` | Run named script with task context |
+| GET | `/api/ws` | WebSocket upgrade (live task updates) |
+| GET | `/api/workspaces` | List directories under `workspace_root` |
+| GET | `/api/health` | Server health |
+| POST | `/api/webhooks/github` | GitHub CI webhook |
 
-Tasks are tagged `["ci", "auto"]`, capped at $3 USD, and use tools: Read, Edit, Bash, Glob, Grep.
+---
 
 ## ADRs
