summaryrefslogtreecommitdiff
path: root/.agent/design.md
blob: 27c7601b3cde37aaeccc5c2b9bcb058ea782228a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
# Claudomator — System Architecture

## 1. System Purpose and Design Goals

Claudomator is a local developer tool that captures tasks, dispatches them to AI agents (Claude,
Gemini), and reports results. Its primary use case is unattended, automated execution of agent
tasks with real-time status streaming to a mobile Progressive Web App.

**Design goals:**

- **Single binary, zero runtime deps** — Go for the backend; CGo only for SQLite.
- **Bounded concurrency** — a pool of goroutines prevents unbounded subprocess spawning.
- **Durability** — all task state survives server restarts (SQLite + WAL mode).
- **Real-time feedback** — WebSocket pushes task completion events to connected clients.
- **Multi-agent routing** — deterministic load balancing across Claude and Gemini; AI-driven
  model-tier selection via a cheap Gemini classifier.
- **Git-isolated execution** — tasks that modify source code run in a temporary git clone
  (sandbox) to prevent concurrent corruption and partial-work leakage.
- **Review gate** — top-level tasks wait in `READY` state for operator accept/reject before
  reaching `COMPLETED`.

See [ADR-001](adr/001-language-and-architecture.md) for the language and architecture rationale.

---

## 2. High-Level Architecture

```mermaid
flowchart TD
    CLI["CLI\ncmd/claudomator\n(cobra)"]
    API["HTTP API\ninternal/api\nREST + WebSocket"]
    Pool["Executor Pool\ninternal/executor.Pool\n(bounded goroutines)"]
    Classifier["Gemini Classifier\ninternal/executor.Classifier"]
    ClaudeRunner["ClaudeRunner\ninternal/executor.ClaudeRunner"]
    GeminiRunner["GeminiRunner\ninternal/executor.GeminiRunner"]
    Sandbox["Git Sandbox\n/tmp/claudomator-sandbox-*"]
    Subprocess["AI subprocess\nclaude -p / gemini"]
    SQLite["SQLite\ntasks.db"]
    LogFiles["Log Files\n~/.claudomator/executions/"]
    Hub["WebSocket Hub\ninternal/api.Hub"]
    Clients["Clients\nbrowser / mobile PWA"]
    Notifier["Notifier\ninternal/notify"]

    CLI -->|run / serve| API
    CLI -->|direct YAML run| Pool
    API -->|POST /api/tasks/run| Pool
    Pool -->|pickAgent + Classify| Classifier
    Pool -->|Submit / SubmitResume| ClaudeRunner
    Pool -->|Submit / SubmitResume| GeminiRunner
    ClaudeRunner -->|git clone| Sandbox
    Sandbox -->|claude -p| Subprocess
    GeminiRunner -->|gemini| Subprocess
    Subprocess -->|stream-json stdout| LogFiles
    Pool -->|UpdateTaskState| SQLite
    Pool -->|resultCh| API
    API -->|Broadcast| Hub
    Hub -->|fan-out| Clients
    API -->|Notify| Notifier
```

---

## 3. Component Table

| Package | Role | Key Exported Types |
|---|---|---|
| `internal/task` | `Task` struct, YAML parsing, state machine, validation | `Task`, `AgentConfig`, `RetryConfig`, `State`, `Priority`, `ValidTransition` |
| `internal/executor` | Bounded goroutine pool; subprocess manager; multi-agent routing; classification | `Pool`, `ClaudeRunner`, `GeminiRunner`, `Classifier`, `Runner`, `Result`, `BlockedError`, `QuestionRegistry` |
| `internal/storage` | SQLite wrapper; auto-migrating schema; all task and execution CRUD | `DB`, `Execution`, `TaskFilter` |
| `internal/api` | HTTP server (REST + WebSocket); result forwarding; elaboration; log streaming | `Server`, `Hub` |
| `internal/reporter` | Formats and emits execution results (text, HTML) | `Reporter`, `TextReporter`, `HTMLReporter` |
| `internal/config` | TOML config loading; data-directory layout | `Config` |
| `internal/cli` | Cobra CLI commands (`run`, `serve`, `list`, `status`, `init`) | `RootCmd` |
| `internal/notify` | Webhook notifier for task completion events | `Notifier`, `WebhookNotifier`, `Event` |
| `web` | Embedded PWA static files (served by `internal/api`) | `Files` (embed.FS) |
| `version` | Build-time version string | `Version` |

---

## 4. Package Dependency Graph

```mermaid
graph LR
    cli["internal/cli"]
    api["internal/api"]
    executor["internal/executor"]
    storage["internal/storage"]
    task["internal/task"]
    config["internal/config"]
    reporter["internal/reporter"]
    notify["internal/notify"]
    web["web"]
    version["version"]

    cli --> api
    cli --> executor
    cli --> storage
    cli --> config
    cli --> version
    api --> executor
    api --> storage
    api --> task
    api --> notify
    api --> web
    executor --> storage
    executor --> task
    reporter --> task
    reporter --> storage
    storage --> task
```

---

## 5. Task Execution Pipeline

The following numbered steps trace a task from API submission to final state, with file and line
references to the key logic in each step.

1. **Task creation** — `POST /api/tasks` calls `task.Validate` and `storage.DB.CreateTask`.
   Task is written to SQLite in `PENDING` state.
   (`internal/api/server.go:349`, `internal/task/parse.go`, `internal/storage/db.go`)

2. **Run request** — `POST /api/tasks/{id}/run` calls `storage.DB.ResetTaskForRetry` (validates
   the `PENDING → QUEUED` transition) then `executor.Pool.Submit`.
   (`internal/api/server.go:460`, `internal/executor/executor.go:125`)

3. **Pool dispatch** — The `dispatch` goroutine reads from `workCh`, waits for a free slot
   (blocks on `doneCh` if at capacity), then spawns `go execute(ctx, task)`.
   (`internal/executor/executor.go:102`)

4. **Agent selection** — `pickAgent(SystemStatus)` selects the available agent with the fewest
   active tasks (deterministic, no I/O). If the pool has a `Classifier`, it invokes
   `Classifier.Classify` (one Gemini API call) to select the model tier; failures are non-fatal.
   (`internal/executor/executor.go:349`, `internal/executor/executor.go:396`)

5. **Dependency wait** — If `t.DependsOn` is non-empty, `waitForDependencies` polls SQLite
   every 5 s until all dependencies reach `COMPLETED` (or a terminal failure state).
   (`internal/executor/executor.go:642`)

6. **Execution record created** — A new `storage.Execution` row is inserted with `RUNNING`
   status. Log paths (`stdout.log`, `stderr.log`) are pre-populated via `LogPather` so they are
   immediately available for tailing.
   (`internal/executor/executor.go:483`)

7. **Subprocess launch** — `ClaudeRunner.Run` (or `GeminiRunner.Run`) builds the CLI argument
   list and calls `exec.CommandContext`. If `project_dir` is set and this is not a resume
   execution, `setupSandbox` clones the project to `/tmp/claudomator-sandbox-*` first.
   (`internal/executor/claude.go:63`, `internal/executor/claude.go:setupSandbox`)

8. **Output streaming** — stdout is written to `<LogDir>/<execID>/stdout.log`; the runner
   concurrently parses the `stream-json` lines for cost and session ID.
   (`internal/executor/claude.go`)

9. **Execution outcome → state** — After the subprocess exits, `handleRunResult` maps the error
   type to a final task state and calls `storage.DB.UpdateTaskState`.
   (`internal/executor/executor.go:256`)

   | Outcome | Final state |
   |---|---|
   | `runner.Run` → `nil`, top-level, no subtasks | `READY` |
   | `runner.Run` → `nil`, top-level, has subtasks | `BLOCKED` |
   | `runner.Run` → `nil`, subtask | `COMPLETED` |
   | `runner.Run` → `*BlockedError` (question file) | `BLOCKED` |
   | `ctx.Err() == DeadlineExceeded` | `TIMED_OUT` |
   | `ctx.Err() == Canceled` | `CANCELLED` |
   | quota exhausted | `BUDGET_EXCEEDED` |
   | any other error | `FAILED` |

10. **Result broadcast** — The pool emits a `*Result` to `resultCh`. `Server.forwardResults`
    reads it, marshals a `task_completed` JSON event, and calls `hub.Broadcast`.
    (`internal/api/server.go:123`, `internal/api/server.go:129`)

11. **Sandbox teardown** — If a sandbox was used and no uncommitted changes remain,
    `teardownSandbox` removes the temp directory. If uncommitted changes are detected, the task
    fails and the sandbox is preserved for inspection.
    (`internal/executor/claude.go:teardownSandbox`)

12. **Review gate** — Operator calls `POST /api/tasks/{id}/accept` (`READY → COMPLETED`) or
    `POST /api/tasks/{id}/reject` (`READY → PENDING`).
    (`internal/api/server.go:487`, `internal/api/server.go:507`)

---

## 6. Task State Machine

```mermaid
stateDiagram-v2
    [*] --> PENDING : task created

    PENDING --> QUEUED : POST /run
    PENDING --> CANCELLED : POST /cancel

    QUEUED --> RUNNING : pool goroutine starts
    QUEUED --> CANCELLED : POST /cancel

    RUNNING --> READY : exit 0, top-level, no subtasks
    RUNNING --> BLOCKED : exit 0, top-level, has subtasks
    RUNNING --> BLOCKED : question.json written
    RUNNING --> COMPLETED : exit 0, subtask
    RUNNING --> FAILED : exit non-zero / stream error
    RUNNING --> TIMED_OUT : context deadline exceeded
    RUNNING --> CANCELLED : context cancelled
    RUNNING --> BUDGET_EXCEEDED : quota exhausted

    READY --> COMPLETED : POST /accept
    READY --> PENDING : POST /reject

    BLOCKED --> QUEUED : POST /answer
    BLOCKED --> READY : all subtasks COMPLETED

    FAILED --> QUEUED : POST /run (retry)
    TIMED_OUT --> QUEUED : POST /resume
    CANCELLED --> QUEUED : POST /run (restart)
    BUDGET_EXCEEDED --> QUEUED : POST /run (retry)

    COMPLETED --> [*]
```

**State definitions** (`internal/task/task.go:9`):

| State | Meaning |
|---|---|
| `PENDING` | Created; not yet submitted for execution |
| `QUEUED` | Submitted to pool; waiting for a goroutine slot |
| `RUNNING` | Subprocess actively executing |
| `READY` | Top-level task done; awaiting operator accept/reject |
| `COMPLETED` | Fully done (only true terminal state) |
| `FAILED` | Execution error; eligible for retry |
| `TIMED_OUT` | Exceeded configured timeout; resumable |
| `CANCELLED` | Explicitly cancelled by operator |
| `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` |
| `BLOCKED` | Agent wrote a `question.json`, or parent waiting for subtasks |

`ValidTransition(from, to State) bool` enforces the allowed edges at runtime before every state
write. (`internal/task/task.go:113`)

---

## 7. WebSocket Broadcast Flow

```mermaid
sequenceDiagram
    participant Runner as ClaudeRunner / GeminiRunner
    participant Pool as executor.Pool
    participant API as api.Server (forwardResults)
    participant Hub as api.Hub
    participant Client1 as WebSocket client 1
    participant Client2 as WebSocket client 2

    Runner->>Pool: runner.Run() returns
    Pool->>Pool: handleRunResult() — sets exec.Status
    Pool->>SQLite: UpdateTaskState + UpdateExecution
    Pool->>Pool: resultCh <- &Result{...}
    API->>Pool: <-pool.Results()
    API->>API: marshal task_completed JSON event
    API->>Hub: hub.Broadcast(data)
    Hub->>Client1: ws.Write(data)
    Hub->>Client2: ws.Write(data)
    API->>Notifier: notifier.Notify(Event{...}) [if set]
```

**Event payload** (JSON):
```json
{
  "type":      "task_completed",
  "task_id":   "<uuid>",
  "status":    "READY | COMPLETED | FAILED | ...",
  "exit_code": 0,
  "cost_usd":  0.042,
  "error":     "",
  "timestamp": "2026-03-11T12:00:00Z"
}
```

The `Hub` also emits `task_question` events via `Server.BroadcastQuestion` when an agent uses
an interactive question tool (currently unused in the primary file-based `BLOCKED` flow).

WebSocket endpoint: `GET /api/ws`. Supports optional bearer-token auth when `--api-token` is
configured. Up to 1000 concurrent clients; periodic 30-second pings detect dead connections.
(`internal/api/websocket.go`)

---

## 8. Subtask / Parent-Task Dependency Resolution

Claudomator supports two distinct dependency mechanisms:

### 8a. `depends_on` — explicit task ordering

Tasks declare `depends_on: [<task-id>, ...]` in their YAML or creation payload. When the
executor goroutine starts, `waitForDependencies` polls SQLite every 5 seconds until all listed
tasks reach `COMPLETED`. If any dependency reaches a terminal failure state (`FAILED`,
`TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`), the waiting task transitions to `FAILED`
immediately. (`internal/executor/executor.go:642`)

### 8b. Parent / subtask blocking (`BLOCKED` state)

A top-level task that creates subtasks (tasks with `parent_task_id` pointing back to it)
transitions to `BLOCKED` — not `READY` — when its own runner exits successfully. This allows
an agent to dispatch subtasks and then wait for them.

```
Parent task runner exits 0
    ├── no subtasks → READY (normal review gate)
    └── has subtasks → BLOCKED (waiting for subtasks)
                          │
            Each subtask completes → COMPLETED
                          │
            All subtasks COMPLETED?
                  ├── yes → maybeUnblockParent() → parent READY
                  └── no  → parent stays BLOCKED
```

`maybeUnblockParent(parentID)` is called every time a subtask transitions to `COMPLETED`. It
loads all subtasks and checks whether every one is `COMPLETED`. If so, it calls
`UpdateTaskState(parentID, StateReady)`. (`internal/executor/executor.go:616`)

The `BLOCKED` state also covers the interactive question flow: when `ClaudeRunner.Run` detects
a `question.json` file in the execution log directory, it returns a `*BlockedError` containing
the question JSON and session ID. The pool stores the question in the `tasks.question_json`
column and the session ID in the `executions.session_id` column. `POST /api/tasks/{id}/answer`
resumes the task by calling `pool.SubmitResume` with a new `Execution` carrying
`ResumeSessionID` and `ResumeAnswer`. (`internal/executor/claude.go:103`,
`internal/api/server.go:221`)

---

## 9. External Go Dependencies

| Module | Version | Purpose |
|---|---|---|
| `github.com/mattn/go-sqlite3` | v1.14.33 | CGo SQLite driver (requires C compiler) |
| `github.com/google/uuid` | v1.6.0 | UUID generation for task and execution IDs |
| `github.com/spf13/cobra` | v1.10.2 | CLI command framework |
| `github.com/BurntSushi/toml` | v1.6.0 | TOML config file parsing |
| `golang.org/x/net` | v0.49.0 | `golang.org/x/net/websocket` — WebSocket server |
| `gopkg.in/yaml.v3` | v3.0.1 | YAML task definition parsing |

---

## 10. Related Documentation

### Architectural Decision Records

| ADR | Title | Summary |
|---|---|---|
| [ADR-001](adr/001-language-and-architecture.md) | Go + SQLite + WebSocket Architecture | Language choice, pipeline design, storage and API rationale |
| [ADR-002](adr/002-task-state-machine.md) | Task State Machine Design | All 10 states, transition table, side effects, known edge cases |
| [ADR-003](adr/003-security-model.md) | Security Model | Trust boundary, no-auth posture, known risks, hardening checklist |
| [ADR-004](adr/004-multi-agent-routing-and-classification.md) | Multi-Agent Routing | `pickAgent` load balancing, Gemini-based model classifier |
| [ADR-005](adr/005-sandbox-execution-model.md) | Git Sandbox Execution Model | Isolated git clone per task, push-back flow, BLOCKED preservation |

### Package-Level Docs

Per-package design notes live in [`docs/packages/`](packages/) (in progress).

### Task YAML Reference

```yaml
name: "My Task"
agent:
  type: "claude"               # "claude" | "gemini"; optional — load balancer may override
  model: "sonnet"              # model tier hint; optional — classifier may override
  instructions: |
    Do something useful.
  project_dir: "/workspace/myproject"   # if set, runs in a git sandbox
  max_budget_usd: 1.00
  permission_mode: "bypassPermissions"  # default
  allowed_tools: ["Bash", "Read"]
  context_files: ["README.md"]
timeout: "15m"
priority: "normal"             # high | normal | low
tags: ["ci", "backend"]
depends_on: ["<task-uuid>"]    # explicit ordering
parent_task_id: "<task-uuid>"  # set by parent agent when creating subtasks
```

Batch files wrap multiple tasks under a `tasks:` key and are accepted by `claudomator run`.

### Storage Schema

Two tables auto-migrated on `storage.Open()`:

- **`tasks`** — `id`, `name`, `description`, `config_json` (AgentConfig), `priority`,
  `timeout_ns`, `retry_json`, `tags_json`, `depends_on_json`, `parent_task_id`, `state`,
  `question_json`, `rejection_comment`, `created_at`, `updated_at`
- **`executions`** — `id`, `task_id`, `start_time`, `end_time`, `exit_code`, `status`,
  `stdout_path`, `stderr_path`, `artifact_dir`, `cost_usd`, `error_msg`, `session_id`,
  `resume_session_id`, `resume_answer`

Indexed columns: `tasks.state`, `tasks.parent_task_id`, `executions.task_id`,
`executions.status`, `executions.start_time`.