# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Also check `~/.claude/CLAUDE.md` for user-level development standards (TDD workflow, git practices, session state management, etc.) that apply globally across all projects.

## Canonical Repository

**The canonical source of truth is `/workspace/claudomator`.** All development must happen here.
Do not work in any other directory unless explicitly instructed. Do not explore `/site/doot.terst.org/` for source files.

## Build & Test Commands

```bash
# Build
go build ./...

# Run all tests
go test ./...

# Run a single package's tests
go test ./internal/executor/...

# Run a single test by name
go test ./internal/api/ -run TestServer_CreateTask_MissingName

# Run with race detector (important for executor/pool tests)
go test -race ./...

# Build the binary
go build -o claudomator ./cmd/claudomator/
```

> **Note:** `go-sqlite3` uses CGo. A C compiler (`gcc`) must be present for builds and tests.

## Running the Server

```bash
# Initialize data directory
./claudomator init

# Start API server (default :8484)
./claudomator serve

# Run a task file directly (bypasses server)
./claudomator run ./test/fixtures/tasks/simple-task.yaml

# List tasks via CLI
./claudomator list
```

Config defaults to `~/.claudomator/config.toml`. Data is stored in `~/.claudomator/` (SQLite DB + execution logs).

---

## Architecture

**Pipeline:** CLI/API → `executor.Pool` → `executor.ContainerRunner` → `claude -p` subprocess → SQLite + log files

### Package Overview

| Package | Role |
|---|---|
| `internal/task` | `Task` struct, YAML/JSON parsing, state machine constants, validation |
| `internal/executor` | `Pool` (bounded goroutine dispatcher) + `ClaudeRunner` (subprocess + sandbox) + `GeminiRunner` (stub) + `Classifier` + preamble + question/summary helpers |
| `internal/storage` | SQLite wrapper; additive migrations; tasks + executions tables |
| `internal/api` | HTTP/WebSocket server — REST endpoints, webhook handler, elaborate/validate, script runner |
| `internal/notify` | `Notifier` interface; webhook, multi, log implementations |
| `internal/reporter` | Console/JSON/HTML report generation |
| `internal/deployment` | Deployment-status checking (polls URL for expected version) |
| `internal/config` | TOML config loading + data-dir layout helpers |
| `internal/cli` | Cobra commands: `run`, `serve`, `list`, `status`, `start`, `logs`, `create`, `report`, `init` |
| `internal/version` | VCS version detection (`debug.ReadBuildInfo`) |
| `web` | Embedded static UI (`embed.go`) |

### Key Data Flows

**Task execution:**
1. Task created via `POST /api/tasks` or YAML file (`task.ParseFile`)
2. `POST /api/tasks/{id}/run` → `executor.Pool.Submit()` → buffered work queue
3. `dispatch()` goroutine picks from queue, waits for slot, launches `execute()`
4. `execute()` calls `ContainerRunner.Run()` → `claude -p <instructions> --output-format stream-json`
5. stdout piped through `parseStream()` to `~/.claudomator/executions/<exec-id>/stdout.log`
6. Execution result written to SQLite, broadcast via WebSocket to connected clients
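
The queue-and-slot behaviour in steps 2 and 3 can be sketched as follows. This is a simplified illustration of a bounded goroutine dispatcher, not the real `executor.Pool`: the field names, the `Close` method, and the `runDemo` helper are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Pool is a simplified, hypothetical bounded dispatcher (not the real executor.Pool).
type Pool struct {
	workCh chan string   // buffered work queue of task IDs
	slots  chan struct{} // counting semaphore: one token per running task
	wg     sync.WaitGroup
	run    func(taskID string)
}

func NewPool(maxConcurrent, queueSize int, run func(string)) *Pool {
	p := &Pool{
		workCh: make(chan string, queueSize),
		slots:  make(chan struct{}, maxConcurrent),
		run:    run,
	}
	go p.dispatch()
	return p
}

// Submit enqueues a task ID and reports false when the queue is full.
func (p *Pool) Submit(taskID string) bool {
	p.wg.Add(1)
	select {
	case p.workCh <- taskID:
		return true
	default:
		p.wg.Done()
		return false
	}
}

// dispatch picks IDs from the queue, waits for a free slot, then launches the task.
func (p *Pool) dispatch() {
	for id := range p.workCh {
		p.slots <- struct{}{} // blocks while maxConcurrent tasks are running
		go func(id string) {
			defer p.wg.Done()
			defer func() { <-p.slots }() // release the slot first
			p.run(id)
		}(id)
	}
}

// Close stops accepting work and waits for queued and running tasks to finish.
// Calling Submit after Close would panic in this sketch.
func (p *Pool) Close() {
	close(p.workCh)
	p.wg.Wait()
}

// runDemo submits n tasks and returns how many completed and the peak concurrency seen.
func runDemo(n, maxConcurrent int) (completed, peak int) {
	var mu sync.Mutex
	active := 0
	p := NewPool(maxConcurrent, n, func(string) {
		mu.Lock()
		active++
		if active > peak {
			peak = active
		}
		mu.Unlock()
		time.Sleep(time.Millisecond) // stand-in for the real execution
		mu.Lock()
		active--
		completed++
		mu.Unlock()
	})
	for i := 0; i < n; i++ {
		p.Submit(fmt.Sprintf("task-%d", i))
	}
	p.Close()
	return completed, peak
}

func main() {
	completed, peak := runDemo(10, 3)
	fmt.Printf("completed=%d peak<=3: %v\n", completed, peak <= 3)
}
```

The semaphore channel is what keeps the number of concurrent executions bounded; the buffered `workCh` only controls how many submissions can wait.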

**Task state machine** (enforced in `storage.UpdateTaskState` via `task.ValidTransition`):

```
PENDING ──→ QUEUED ──→ RUNNING ──→ READY ──→ COMPLETED
               ↑              │     └──→ PENDING (rejected)
               │              │
               │              ├──→ BLOCKED ──→ READY (all subtasks done)
               │              │         └──→ QUEUED (question answered)
               │              │
               └──────────────├──→ FAILED
                              ├──→ TIMED_OUT
                              ├──→ CANCELLED
                              └──→ BUDGET_EXCEEDED
```

- **BLOCKED**: Parent task completed but has subtasks that are not yet COMPLETED, OR agent wrote a question file. Unblocked by `maybeUnblockParent()` or user answer via `/api/tasks/{id}/answer`.
- **READY**: Execution succeeded; awaits manual accept/reject via `/api/tasks/{id}/accept` or `/api/tasks/{id}/reject`.
- **COMPLETED**: Terminal — entered only via user accept (top-level) or automatic subtask completion.
- `FAILED/TIMED_OUT/CANCELLED/BUDGET_EXCEEDED` all re-enter at `QUEUED` for retry/resume.
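
The diagram and bullets above can be read as a lookup table. The sketch below is an illustration only, not the actual `task.ValidTransition` code: the `State` type, the constant names, and the table contents are assumptions reconstructed from the edges drawn in the diagram, so any transition the diagram omits is also absent here.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the real task state type.
type State string

const (
	Pending        State = "PENDING"
	Queued         State = "QUEUED"
	Running        State = "RUNNING"
	Ready          State = "READY"
	Completed      State = "COMPLETED"
	Blocked        State = "BLOCKED"
	Failed         State = "FAILED"
	TimedOut       State = "TIMED_OUT"
	Cancelled      State = "CANCELLED"
	BudgetExceeded State = "BUDGET_EXCEEDED"
)

// validTransitions encodes only the edges shown in the diagram (an assumption,
// not the real table).
var validTransitions = map[State][]State{
	Pending:        {Queued},
	Queued:         {Running},
	Running:        {Ready, Blocked, Failed, TimedOut, Cancelled, BudgetExceeded},
	Ready:          {Completed, Pending}, // accept, reject
	Blocked:        {Ready, Queued},      // subtasks done, question answered
	Failed:         {Queued},
	TimedOut:       {Queued},
	Cancelled:      {Queued},
	BudgetExceeded: {Queued},
	Completed:      {}, // terminal
}

// ValidTransition reports whether moving from one state to another is allowed.
func ValidTransition(from, to State) bool {
	for _, s := range validTransitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(ValidTransition(Running, Ready))    // true
	fmt.Println(ValidTransition(Completed, Queued)) // false
}
```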

**WebSocket:** `Hub` fans out task completion events to all connected clients. `Server.StartHub()` must be called before `ListenAndServe`.

### Sandbox Lifecycle (ContainerRunner, Docker-based)

When `agent.project_dir` is set:
1. `setupSandbox()` clones the project into `/tmp/claudomator-sandbox-*` via the "local" remote (bare repo), then falls back to "origin", then the working copy path.
2. The claude subprocess runs inside the sandbox.
3. After successful execution, `teardownSandbox()` auto-commits any uncommitted changes (after running a build if `Makefile`/`go.mod`/`gradlew` is present), then pushes new commits to the bare repo (`origin` from the sandbox's perspective). The sandbox is then removed.
4. On failure the sandbox is preserved and its path is returned in the error.
5. On BLOCKED (question written), the sandbox path is stored in `executions.sandbox_dir` so the resume execution can reuse it.

> **Known bug:** Variable shadowing in `claude.go` `Run()` means the outer `sandboxDir` is never assigned (both `setupSandbox` calls use `:=` inside nested blocks). This causes: (a) `teardownSandbox` is never called — work is discarded, sandboxes accumulate in `/tmp`; (b) `BlockedError.SandboxDir` is always `""`, so resume clones a fresh sandbox and loses the agent's partial work. See [Known Bugs](#known-bugs).
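
The shadowing pattern behind this bug can be reproduced in isolation. The sketch below is not the code from `claude.go`; `setupSandbox` here is a hypothetical stand-in that returns a fixed path, and `runShadowed`/`runFixed` are invented names that contrast `:=` with plain assignment.

```go
package main

import "fmt"

// setupSandbox is a hypothetical stand-in that returns a sandbox path.
func setupSandbox(projectDir string) (string, error) {
	return "/tmp/claudomator-sandbox-demo", nil
}

// runShadowed reproduces the bug pattern: := inside the block declares a new,
// inner sandboxDir, so the outer variable is never assigned.
func runShadowed(projectDir string) string {
	var sandboxDir string
	if projectDir != "" {
		sandboxDir, err := setupSandbox(projectDir) // shadows the outer sandboxDir
		if err != nil {
			return ""
		}
		_ = sandboxDir // the inner variable is only visible inside this block
	}
	return sandboxDir // always ""
}

// runFixed declares err separately and assigns to the outer variable with =.
func runFixed(projectDir string) string {
	var sandboxDir string
	if projectDir != "" {
		var err error
		sandboxDir, err = setupSandbox(projectDir)
		if err != nil {
			return ""
		}
	}
	return sandboxDir
}

func main() {
	fmt.Printf("shadowed=%q fixed=%q\n", runShadowed("/p"), runFixed("/p"))
	// prints: shadowed="" fixed="/tmp/claudomator-sandbox-demo"
}
```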

> **Known bug:** `teardownSandbox` hardcodes `origin/master` when rebasing on conflict. Repos using `main` will fail on concurrent push. See [Known Bugs](#known-bugs).

### Task YAML Format

```yaml
name: "My Task"
description: "Optional longer description"
agent:
  type: "claude"              # "claude" (default) or "gemini" (stub, not production-ready)
  model: "sonnet"             # optional; auto-classified by Classifier if omitted
  instructions: |
    Do something useful.
  project_dir: "/path/to/project"  # optional; triggers sandbox isolation
  max_budget_usd: 1.00
  permission_mode: "bypassPermissions"  # default; or "default", "acceptEdits"
  allowed_tools: ["Bash", "Read", "Edit"]
  disallowed_tools: []
  context_files: ["/extra/context/path"]
  system_prompt_append: "Extra instructions appended to system prompt."
  skip_planning: false        # if false, prepends planning/orchestration preamble
  additional_args: []         # extra flags forwarded verbatim to claude CLI
timeout: "15m"
priority: "normal"            # "high" | "normal" | "low" (stored but not yet used for scheduling)
tags: ["ci"]
depends_on: ["other-task-id"]
retry:
  max_attempts: 1             # stored but retry is currently manual via /resume
  backoff: "exponential"
```

> **Note:** The YAML key is `agent:`, not `claude:`. Earlier docs showed `claude:` which was wrong.

Batch files wrap multiple tasks under a `tasks:` key.

### Storage Schema

Two tables. The schema is auto-migrated additively on `storage.Open()`: new columns are added with `ALTER TABLE ... ADD COLUMN` statements, and the error raised when a column already exists is ignored.

```
tasks:      id, name, description, config_json, priority, timeout_ns, retry_json,
            tags_json, depends_on_json, parent_task_id, state, rejection_comment,
            question_json, summary, elaboration_input, interactions_json,
            created_at, updated_at

executions: id, task_id, start_time, end_time, exit_code, status, stdout_path,
            stderr_path, artifact_dir, cost_usd, error_msg, session_id,
            sandbox_dir, changestats_json, commits_json
```

JSON blobs: `config_json` (AgentConfig), `retry_json`, `tags_json`, `depends_on_json`, `interactions_json`, `changestats_json`, `commits_json`.

---

## Features

### Planning Preamble & Orchestration

When `agent.skip_planning` is false (the default), `withPlanningPreamble()` prepends a system-level prompt to the agent's instructions that:
- Instructs the agent to POST subtasks to `$CLAUDOMATOR_API_URL/api/tasks` and stop if the task will take more than ~3 minutes
- Instructs the agent to write a JSON question to `$CLAUDOMATOR_QUESTION_FILE` and exit if it needs user input
- Requires all changes to be committed before exit
- Requires a summary written to `$CLAUDOMATOR_SUMMARY_FILE`

Env vars injected into every execution: `CLAUDOMATOR_API_URL`, `CLAUDOMATOR_TASK_ID`, `CLAUDOMATOR_PROJECT_DIR`, `CLAUDOMATOR_QUESTION_FILE`, `CLAUDOMATOR_SUMMARY_FILE`.
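
As a rough illustration of how these five variables might be assembled for a subprocess, the sketch below builds an environment slice suitable for `exec.Cmd.Env`. The `ExecEnv` type, its field names, and the example values are assumptions; only the variable names come from the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// ExecEnv holds hypothetical per-execution values; the field names are assumptions.
type ExecEnv struct {
	APIURL, TaskID, ProjectDir, QuestionFile, SummaryFile string
}

// Environ returns base plus the five CLAUDOMATOR_* variables, without mutating base.
func (e ExecEnv) Environ(base []string) []string {
	env := make([]string, 0, len(base)+5)
	env = append(env, base...)
	return append(env,
		"CLAUDOMATOR_API_URL="+e.APIURL,
		"CLAUDOMATOR_TASK_ID="+e.TaskID,
		"CLAUDOMATOR_PROJECT_DIR="+e.ProjectDir,
		"CLAUDOMATOR_QUESTION_FILE="+e.QuestionFile,
		"CLAUDOMATOR_SUMMARY_FILE="+e.SummaryFile,
	)
}

// lookup returns the value of key in an environment slice of KEY=VALUE entries.
func lookup(env []string, key string) (string, bool) {
	for _, kv := range env {
		if v, ok := strings.CutPrefix(kv, key+"="); ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	e := ExecEnv{
		APIURL:       "http://localhost:8484", // example value
		TaskID:       "task-123",              // example value
		ProjectDir:   "/workspace/myrepo",
		QuestionFile: "/tmp/example/question.json", // hypothetical path
		SummaryFile:  "/tmp/example/summary.txt",   // hypothetical path
	}
	env := e.Environ([]string{"PATH=/usr/bin"})
	id, _ := lookup(env, "CLAUDOMATOR_TASK_ID")
	fmt.Println(len(env), id)
}
```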

### Changestats

After each execution, changestats (files changed, lines added/removed) are parsed from `git diff --stat` output in `stdout.log` and stored in `executions.changestats_json`.

> **Duplication debt:** Changestats are extracted in two places: `executor.Pool.handleRunResult()` and `api.Server.processResult()`. Both write the same value to the same row (idempotent), but the double-extraction is confusing and should be consolidated. See [Design Debt](#design-debt).

**Parser:** `internal/task/changestats.go` — `ParseChangestatFromOutput`, `ParseChangestatFromFile`.
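
For orientation, the summary line that `git diff --stat` prints (for example ` 3 files changed, 10 insertions(+), 2 deletions(-)`) can be parsed roughly as below. This is a sketch, not the code in `changestats.go`: the `Changestats` field names, the `parseChangestats` function, and the choice to use the last summary line in the output are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Changestats is a hypothetical result type; the real field names may differ.
type Changestats struct {
	FilesChanged, LinesAdded, LinesRemoved int
}

// summaryRe matches the git diff --stat summary line. Git omits the insertion
// or deletion part when its count is zero and uses singular forms for 1.
var summaryRe = regexp.MustCompile(
	`(\d+) files? changed(?:, (\d+) insertions?\(\+\))?(?:, (\d+) deletions?\(-\))?`)

// parseChangestats returns the stats from the last summary line found in output.
func parseChangestats(output string) (Changestats, bool) {
	matches := summaryRe.FindAllStringSubmatch(output, -1)
	if len(matches) == 0 {
		return Changestats{}, false
	}
	m := matches[len(matches)-1]
	atoi := func(s string) int {
		n, _ := strconv.Atoi(s) // an unmatched optional group is "", which yields 0
		return n
	}
	return Changestats{
		FilesChanged: atoi(m[1]),
		LinesAdded:   atoi(m[2]),
		LinesRemoved: atoi(m[3]),
	}, true
}

func main() {
	out := " main.go | 12 ++++++++++--\n 3 files changed, 10 insertions(+), 2 deletions(-)\n"
	stats, ok := parseChangestats(out)
	fmt.Println(stats, ok) // {3 10 2} true
}
```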

**Frontend:** `web/app.js` renders a `.changestats-badge` on COMPLETED/READY task cards.

### GitHub Webhook Integration

`POST /api/webhooks/github` accepts `check_run` and `workflow_run` events. Returns `{"task_id": "..."}` (200) on task creation or 204 if ignored.

#### Config (`~/.claudomator/config.toml`)

```toml
webhook_secret = "your-github-webhook-secret"   # HMAC-SHA256; validation is skipped if omitted

[[projects]]
name = "myrepo"
dir  = "/workspace/myrepo"
```
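
GitHub signs each delivery with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. A minimal sketch of the check follows; the function names are hypothetical and the actual handler may be structured differently. Skipping validation for an empty secret mirrors the config comment above.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign computes the header value GitHub would send for body and secret.
func sign(secret string, body []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// validSignature checks the X-Hub-Signature-256 header value against body.
func validSignature(secret string, body []byte, header string) bool {
	if secret == "" {
		return true // no webhook_secret configured: validation is skipped
	}
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil)) // constant-time comparison
}

func main() {
	body := []byte(`{"action":"completed"}`)
	header := sign("your-github-webhook-secret", body)
	fmt.Println(validSignature("your-github-webhook-secret", body, header)) // true
	fmt.Println(validSignature("other-secret", body, header))               // false
}
```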

#### Matching logic

The repository name is matched case-insensitively against each project's `name` and the basename of its `dir`. If no match is found and exactly one project is configured, that project is used as the fallback.
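
The matching rule can be sketched as below. The `Project` type and `matchProject` function are hypothetical names, and the behaviour when several projects are configured and none matches (no fallback) is an assumption drawn from the wording above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Project mirrors a [[projects]] entry from config.toml (hypothetical type).
type Project struct {
	Name string
	Dir  string
}

// matchProject finds the project for a repository name, comparing
// case-insensitively against Name and the basename of Dir.
func matchProject(repo string, projects []Project) (Project, bool) {
	for _, p := range projects {
		if strings.EqualFold(repo, p.Name) || strings.EqualFold(repo, filepath.Base(p.Dir)) {
			return p, true
		}
	}
	if len(projects) == 1 {
		return projects[0], true // fall back to the only configured project
	}
	return Project{}, false
}

func main() {
	projects := []Project{
		{Name: "myrepo", Dir: "/workspace/myrepo"},
		{Name: "site", Dir: "/workspace/website"},
	}
	p, ok := matchProject("WebSite", projects)
	fmt.Println(p.Name, ok) // site true
}
```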

#### Task creation

Tasks created for:
- `check_run` with `action: completed` and `conclusion: failure`
- `workflow_run` with `action: completed` and `conclusion: failure` or `timed_out`

Tagged `["ci", "auto"]`, capped at $3 USD, allowed tools: Read, Edit, Bash, Glob, Grep.

### Elaborate Endpoint

`POST /api/tasks/elaborate` converts natural language → task JSON via a `claude -p` invocation. Optionally reads `CLAUDE.md` / `SESSION_STATE.md` from a configured working directory for context. Per-IP rate-limited.

> **Implementation gap:** The elaborate endpoint is not tested against real Claude invocations. `sanitizeElaboratedTask()` uses keyword heuristics to infer missing tools (fragile). No caching.

### Model Classifier

`executor.Classifier` calls the Gemini CLI (`gemini-2.5-flash-lite`) to pick the best Claude or Gemini model for a task. Falls back to the default model (`sonnet`) if Gemini fails. The agent type is selected first by the load balancer; the classifier only picks the model within that agent.

> **Implementation gap:** Output parsing is brittle — strips `"Loaded cached credentials."` lines and markdown fences by string matching. No fallback if Gemini CLI isn't installed. Classification results are not cached or logged for learning.

---


## Design Debt

### GeminiRunner is a non-functional stub

`internal/executor/gemini.go` `execOnce()` does not run the `gemini` binary. It starts a goroutine that writes hardcoded fake JSON to a pipe. `parseGeminiStream()` strips markdown fences but does no semantic parsing. There is no session/resume support.

Any task with `agent.type: "gemini"` will silently return canned output. This is dangerous in production.

**Decision needed:** Either implement GeminiRunner properly (subprocess + stream parsing + sandbox integration mirroring ClaudeRunner) or remove it and the `Classifier` from the codebase until it's ready.

### Priority field is stored but never used

`task.Priority` (`high`, `normal`, `low`) is persisted in SQLite and surfaced in the API. The executor `dispatch()` goroutine uses a simple FIFO channel (`workCh`) with no priority ordering.

### RetryConfig is stored but retry is manual

`task.RetryConfig.MaxAttempts` and `Backoff` are parsed and stored. No code reads them during execution. Retries must be triggered manually via `POST /api/tasks/{id}/resume`.

### Changestats extracted in two places

`executor.Pool.handleRunResult()` and `api.Server.processResult()` both call `task.ParseChangestatFromFile()` and write to `executions.changestats_json`. The second write is idempotent but wasteful and confusing. One of the two should be removed.

### context.Background() in resume path

`api.Server.handleAnswerQuestion()` calls `p.SubmitResume(context.Background(), ...)`. Because the context is detached from the HTTP request, the resume keeps running if the request is cancelled. Conversely, on server shutdown, in-flight resumes that use the server's root context are cancelled while this one is not. It should use a long-lived server-level context instead of `Background()`.
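
The difference between the two contexts on shutdown can be demonstrated in a few lines. This sketch uses only the standard `context` package; `simulateShutdown` and the variable names are invented for the illustration and do not correspond to code in the repository.

```go
package main

import (
	"context"
	"fmt"
)

// cancelled reports whether ctx has been cancelled, without blocking.
func cancelled(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// simulateShutdown cancels a hypothetical server root context and reports
// which resume contexts observe the shutdown.
func simulateShutdown() (serverDerived, background bool) {
	serverCtx, shutdown := context.WithCancel(context.Background())

	resumeA, cancelA := context.WithCancel(serverCtx) // derived from the server context
	defer cancelA()
	resumeB := context.Background() // detached, as in the resume path described above

	shutdown() // server begins shutting down
	return cancelled(resumeA), cancelled(resumeB)
}

func main() {
	a, b := simulateShutdown()
	fmt.Println(a, b) // true false
}
```

A context derived from the server's root context is independent of any single HTTP request yet still stops on shutdown, which is the property the paragraph above asks for.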

### Non-transactional execution creation

`pool.execute()` calls `store.CreateExecution(exec)` followed by `store.UpdateTaskState(t.ID, task.StateRunning)` as separate statements. If the server crashes between them, the task stays PENDING while an execution record exists with status RUNNING. Recovery (`RecoverStaleRunning`) partially handles this but the root cause is the missing transaction.

### Elaborate/validate cmd path indirection

`Server` has two separate fields `elaborateCmdPath` and `validateCmdPath` that override `claudeBinPath` only for tests. This is a testing-time seam that leaks into the production struct. A cleaner approach would be to inject an `Elaborator` interface.

### `withFailureHistory` mutates a shallow copy

In `executor.go`, `withFailureHistory` copies the task struct by value (`copy := *t`), and `copy.Agent = t.Agent` copies the `AgentConfig` struct value. Both are shallow copies: slices inside `AgentConfig` (`AllowedTools`, `DisallowedTools`, etc.) still share their backing arrays with the original. Appending to `SystemPromptAppend` is safe because strings are immutable, but any in-place mutation of those slices would affect the original task.
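
The aliasing can be shown with cut-down types. The `Task` and `AgentConfig` definitions below are reduced stand-ins with only two fields, and `shallowCopy`/`deepCopy` are hypothetical helpers, not functions from `executor.go`.

```go
package main

import "fmt"

// AgentConfig and Task are reduced stand-ins for the real structs.
type AgentConfig struct {
	AllowedTools       []string
	SystemPromptAppend string
}

type Task struct {
	Agent AgentConfig
}

// shallowCopy copies the struct value; slice fields still share backing arrays.
func shallowCopy(t *Task) Task {
	return *t
}

// deepCopy additionally clones the slice so the copy can be mutated safely.
func deepCopy(t *Task) Task {
	c := *t
	c.Agent.AllowedTools = append([]string(nil), t.Agent.AllowedTools...)
	return c
}

func main() {
	orig := &Task{Agent: AgentConfig{AllowedTools: []string{"Bash", "Read"}}}

	s := shallowCopy(orig)
	s.Agent.SystemPromptAppend += "history" // safe: strings are immutable values
	s.Agent.AllowedTools[0] = "Edit"        // writes through to orig
	fmt.Println(orig.Agent.AllowedTools[0], orig.Agent.SystemPromptAppend == "") // Edit true

	d := deepCopy(orig)
	d.Agent.AllowedTools[1] = "Glob" // does not affect orig
	fmt.Println(orig.Agent.AllowedTools[1]) // Read
}
```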

### Additive migration strategy is fragile

`storage.migrate()` lists every `ALTER TABLE ADD COLUMN` statement in code order. The only idempotency guard is catching "column already exists" errors. There is no migration version tracking. Columns dropped in `CREATE TABLE IF NOT EXISTS` and added back via ALTER are indistinguishable from new columns. Concurrent server instances running migrations simultaneously have no protection.
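
The error-swallowing guard described here looks roughly like the sketch below. It is not the code in `storage.migrate()`: the `exec` callback stands in for a database handle's `Exec` method so the example runs without a SQLite driver, and the check assumes SQLite's `duplicate column name` error text.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isDuplicateColumn reports whether err looks like SQLite's error for adding
// a column that already exists (assumed message: "duplicate column name: x").
func isDuplicateColumn(err error) bool {
	return err != nil && strings.Contains(err.Error(), "duplicate column name")
}

// migrate runs each statement in order, ignoring duplicate-column errors and
// stopping at the first other failure. exec is a stand-in for the DB call.
func migrate(exec func(stmt string) error, stmts []string) error {
	for _, s := range stmts {
		if err := exec(s); err != nil && !isDuplicateColumn(err) {
			return fmt.Errorf("migration %q: %w", s, err)
		}
	}
	return nil
}

// fakeExec simulates a table that already has some columns.
func fakeExec(existing map[string]bool) func(string) error {
	return func(stmt string) error {
		fields := strings.Fields(stmt)
		if len(fields) < 2 {
			return errors.New("syntax error")
		}
		col := fields[len(fields)-2] // "... ADD COLUMN <name> <type>"
		if existing[col] {
			return fmt.Errorf("duplicate column name: %s", col)
		}
		existing[col] = true
		return nil
	}
}

func main() {
	cols := map[string]bool{"summary": true}
	err := migrate(fakeExec(cols), []string{
		"ALTER TABLE tasks ADD COLUMN summary TEXT",           // already exists, ignored
		"ALTER TABLE tasks ADD COLUMN elaboration_input TEXT", // newly added
	})
	fmt.Println(err, cols["elaboration_input"]) // <nil> true
}
```

Matching on error text is exactly the fragility the paragraph describes: nothing records which statements have already run.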

---

## REST API Reference

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/tasks` | List tasks; `?state=RUNNING&since=<RFC3339>&limit=50` |
| POST | `/api/tasks` | Create task (JSON body) |
| GET | `/api/tasks/{id}` | Get task |
| DELETE | `/api/tasks/{id}` | Delete task + subtasks + executions |
| POST | `/api/tasks/{id}/run` | Submit PENDING task to executor |
| POST | `/api/tasks/{id}/cancel` | Cancel RUNNING/QUEUED task |
| POST | `/api/tasks/{id}/accept` | Accept READY task → COMPLETED |
| POST | `/api/tasks/{id}/reject` | Reject READY task → PENDING |
| POST | `/api/tasks/{id}/answer` | Answer BLOCKED task question → QUEUED |
| POST | `/api/tasks/{id}/resume` | Resume FAILED/TIMED_OUT/CANCELLED task |
| GET | `/api/tasks/{id}/subtasks` | List subtasks |
| GET | `/api/tasks/{id}/executions` | List execution history |
| GET | `/api/executions/{id}` | Get execution |
| GET | `/api/executions/{id}/log` | Get execution log (`?tail=100`) |
| GET | `/api/executions/{id}/logs/stream` | Stream logs as SSE |
| GET | `/api/tasks/{id}/logs/stream` | Stream latest execution logs |
| GET | `/api/executions` | List recent executions across all tasks |
| GET | `/api/tasks/{id}/deployment-status` | Poll deployment readiness |
| POST | `/api/tasks/elaborate` | Convert natural language → task JSON |
| POST | `/api/tasks/validate` | Validate task JSON |
| POST | `/api/scripts/{name}` | Run named script with task context |
| GET | `/api/ws` | WebSocket upgrade (live task updates) |
| GET | `/api/workspaces` | List directories under `workspace_root` |
| GET | `/api/health` | Server health |
| POST | `/api/webhooks/github` | GitHub CI webhook |

---

## ADRs

See `docs/adr/001-language-and-architecture.md` for the Go + SQLite + WebSocket rationale.