# ADR-002: Task State Machine Design
## Status
Accepted
## Context
Claudomator tasks move through a well-defined lifecycle: from creation through
queuing, execution, and final resolution. The lifecycle must handle asynchronous
execution (subprocess), user interaction (review, Q&A), retries, and cancellation.
## States
| State | Meaning |
|---|---|
| `PENDING` | Task created; not yet submitted for execution |
| `QUEUED` | Submitted to the executor pool; may be waiting for a free goroutine slot or for its dependencies |
| `RUNNING` | Claude subprocess is actively executing |
| `READY` | Top-level task completed execution; awaiting user accept/reject |
| `COMPLETED` | Task is fully done (terminal) |
| `FAILED` | Execution error occurred; can be re-run manually via `POST /run` (retry limits are not enforced) |
| `TIMED_OUT` | Task exceeded its configured timeout; stays here until resumed via `POST /resume` |
| `CANCELLED` | Explicitly cancelled by user or API (terminal) |
| `BUDGET_EXCEEDED` | Exceeded `max_budget_usd` (terminal) |
| `BLOCKED` | Agent paused and wrote a question file; awaiting user answer |

Terminal states (no outgoing transitions): `COMPLETED`, `CANCELLED`, `BUDGET_EXCEEDED`.
## State Transition Diagram
```
                      ┌─────────┐
      ┌───────────────┤ PENDING │◄────────────────────────────┐
      │               └────┬────┘                             │
      │                    │ POST /run                        │ POST /reject
      │               ┌────▼────┐                             │
      ├───────────────┤ QUEUED  │                             │
      │               └────┬────┘                             │
      │ POST /cancel       │ pool picks up                    │
┌─────▼─────┐         ┌────▼────┐  exit 0 (top-level)   ┌─────┴─────┐
│ CANCELLED │◄────────┤ RUNNING ├──────────────────────►│   READY   │
└───────────┘         └────┬────┘                       └─────┬─────┘
                           │                                  │ POST /accept
                           │  exit 0 (subtask)          ┌─────▼─────┐
                           ├───────────────────────────►│ COMPLETED │
                           │                            └───────────┘
     ┌──────────────┬──────┴───────────┬─────────────────┐
┌────▼────┐   ┌─────▼─────┐   ┌────────▼────────┐   ┌────▼────┐
│ FAILED  │   │ TIMED_OUT │   │ BUDGET_EXCEEDED │   │ BLOCKED │
└────┬────┘   └─────┬─────┘   └─────────────────┘   └────┬────┘
     │ POST /run    │ POST /resume                       │ POST /answer
     └──────────────┴──────┬─────────────────────────────┘
                           ▼ QUEUED
```
### Transition Table
| From | To | Trigger |
|---|---|---|
| `PENDING` | `QUEUED` | `POST /api/tasks/{id}/run` |
| `PENDING` | `CANCELLED` | `POST /api/tasks/{id}/cancel` |
| `QUEUED` | `RUNNING` | Pool goroutine starts execution |
| `QUEUED` | `CANCELLED` | `POST /api/tasks/{id}/cancel` |
| `RUNNING` | `READY` | Runner exits 0, no question file, top-level task (`parent_task_id == ""`) |
| `RUNNING` | `COMPLETED` | Runner exits 0, no question file, subtask (`parent_task_id != ""`) |
| `RUNNING` | `FAILED` | Runner exits non-zero or stream signals `is_error: true` |
| `RUNNING` | `TIMED_OUT` | Context deadline exceeded (`context.DeadlineExceeded`) |
| `RUNNING` | `CANCELLED` | Context cancelled (`context.Canceled`) |
| `RUNNING` | `BUDGET_EXCEEDED` | `--max-budget-usd` exceeded (signalled by runner) |
| `RUNNING` | `BLOCKED` | Runner exits 0 but left a `question.json` file in log dir |
| `READY` | `COMPLETED` | `POST /api/tasks/{id}/accept` |
| `READY` | `PENDING` | `POST /api/tasks/{id}/reject` (with optional comment) |
| `FAILED` | `QUEUED` | Retry (manual re-run via `POST /api/tasks/{id}/run`) |
| `TIMED_OUT` | `QUEUED` | `POST /api/tasks/{id}/resume` (resumes with session ID) |
| `BLOCKED` | `QUEUED` | `POST /api/tasks/{id}/answer` (resumes with user answer) |
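
A minimal Go sketch of how the transition table above could be encoded as data, assuming a string-backed `State` type and a map from each source state to its allowed targets. The constant names and string values are assumptions; the real constants and `ValidTransition` live in `internal/task/task.go` and may be structured differently.

```go
package main

import "fmt"

// State mirrors the task states in this ADR. The string values are an
// assumption; the real constants live in internal/task/task.go.
type State string

const (
	Pending        State = "PENDING"
	Queued         State = "QUEUED"
	Running        State = "RUNNING"
	Ready          State = "READY"
	Completed      State = "COMPLETED"
	Failed         State = "FAILED"
	TimedOut       State = "TIMED_OUT"
	Cancelled      State = "CANCELLED"
	BudgetExceeded State = "BUDGET_EXCEEDED"
	Blocked        State = "BLOCKED"
)

// transitions lists the allowed targets for each source state, following the
// transition table. Terminal states have no key, so nothing leaves them.
var transitions = map[State][]State{
	Pending:  {Queued, Cancelled},
	Queued:   {Running, Cancelled},
	Running:  {Ready, Completed, Failed, TimedOut, Cancelled, BudgetExceeded, Blocked},
	Ready:    {Completed, Pending},
	Failed:   {Queued},
	TimedOut: {Queued},
	Blocked:  {Queued},
}

// ValidTransition reports whether a task may move from one state to another.
func ValidTransition(from, to State) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(ValidTransition(Pending, Queued))        // true
	fmt.Println(ValidTransition(Ready, Pending))         // true: reject
	fmt.Println(ValidTransition(BudgetExceeded, Queued)) // false: terminal
}
```

Keeping the table as data means the terminal states fall out naturally: `COMPLETED`, `CANCELLED`, and `BUDGET_EXCEEDED` have no key in the map, so every transition out of them is rejected.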
## Implementation
**Validation:** `task.ValidTransition(from, to State) bool`
(`internal/task/task.go:93`) is called by the API handlers before each state change they make.

**State writes:** `storage.DB.UpdateTaskState(id, state)` is the single write path for
task state; both the API handlers and the executor pool call it.

**Execution outcome → state mapping** (in `executor.Pool.execute` and `executeResume`):
```
runner.Run() returns nil AND parent_task_id == "" → READY
runner.Run() returns nil AND parent_task_id != "" → COMPLETED
runner.Run() returns *BlockedError → BLOCKED (question stored)
ctx.Err() == DeadlineExceeded → TIMED_OUT
ctx.Err() == Canceled → CANCELLED
any other error → FAILED
```
## Key Invariants
1. **`READY` is top-level only.** Subtasks (tasks with `parent_task_id != ""`) skip
`READY` and go directly to `COMPLETED` on success. This allows parent review
flows without requiring per-subtask acknowledgement.
2. **`BLOCKED` requires a `session_id`.** The executor persists `session_id` in
the execution record at start time. `POST /answer` retrieves it via
`GetLatestExecution` to resume the correct Claude session.
3. **`DELETE` is blocked for `RUNNING` and `QUEUED` tasks.** (`server.go:127`)
4. **`CANCELLED` is reachable from `PENDING`, `QUEUED`, and `RUNNING`.** For
`RUNNING` tasks, `pool.Cancel(taskID)` sends a context cancellation signal;
the state transition happens asynchronously when the goroutine exits. For
`PENDING`/`QUEUED` tasks, `UpdateTaskState` is called directly.
5. **Dependency waiting.** Tasks with `depends_on` stay conceptually in `QUEUED`
(goroutine running but not executing) while `waitForDependencies` polls until
all dependencies reach `COMPLETED`. If any dependency reaches a terminal
failure state, the waiting task transitions to `FAILED`.
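6. **Retry re-enters at `QUEUED`.** `FAILED → QUEUED` and `TIMED_OUT → QUEUED`
are the only retry edges (`READY → PENDING` and `BLOCKED → QUEUED` are also back-edges,
but they are driven by review and answers). Retry is manual: the caller must call `/run`
again for `FAILED` or `/resume` for `TIMED_OUT`. The `RetryConfig.MaxAttempts` field
exists, but enforcement is left to callers.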
## Side Effects on Transition
| Transition | Side effects |
|---|---|
| → `QUEUED` | `pool.Submit()` called; execution record created in DB with log paths |
| → `RUNNING` | `store.UpdateTaskState`; cancel func registered in `pool.cancels` |
| → `BLOCKED` | `store.UpdateTaskQuestion(taskID, questionJSON)`; `session_id` kept in the execution record |
| → `READY/COMPLETED/FAILED/TIMED_OUT/CANCELLED/BUDGET_EXCEEDED` | Execution end time set; `store.UpdateExecution`; result broadcast via WebSocket (`pool.resultCh → forwardResults → hub.Broadcast`) |
| `READY → PENDING` (reject) | `store.RejectTask(id, comment)` — stores `rejection_comment` |
| `BLOCKED → QUEUED` (answer) | `store.UpdateTaskQuestion(taskID, "")` clears question; new `storage.Execution` created with `ResumeSessionID` + `ResumeAnswer` |
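
The two resume paths differ mainly in where the prompt comes from. A hedged sketch of building the new execution record, assuming a pared-down `Execution` struct: `SessionID`, `ResumeSessionID`, and `ResumeAnswer` follow the names in this ADR, while `TaskID` and `newResumeExecution` are hypothetical, and carrying the hardcoded timeout message in `ResumeAnswer` is an assumption (the ADR says only that the message is hardcoded).

```go
package main

import (
	"errors"
	"fmt"
)

type State string

const (
	Blocked  State = "BLOCKED"
	TimedOut State = "TIMED_OUT"
)

// Execution is a pared-down stand-in for storage.Execution.
type Execution struct {
	TaskID          string // hypothetical
	SessionID       string
	ResumeSessionID string
	ResumeAnswer    string
}

// timeoutResumeMessage is the hardcoded prompt for resuming TIMED_OUT tasks.
const timeoutResumeMessage = "Your previous execution timed out. Please continue where you left off."

// newResumeExecution builds the record for BLOCKED → QUEUED (answer) or
// TIMED_OUT → QUEUED (resume). latest plays the role of the record returned
// by GetLatestExecution.
func newResumeExecution(latest Execution, from State, answer string) (Execution, error) {
	if latest.SessionID == "" {
		// Without a session_id there is no Claude session to resume.
		return Execution{}, errors.New("latest execution has no session_id")
	}
	var prompt string
	switch from {
	case Blocked:
		prompt = answer // the user's answer from POST /answer
	case TimedOut:
		prompt = timeoutResumeMessage // assumption: same field as an answer
	default:
		return Execution{}, fmt.Errorf("cannot resume a task in state %s", from)
	}
	return Execution{
		TaskID:          latest.TaskID,
		ResumeSessionID: latest.SessionID,
		ResumeAnswer:    prompt,
	}, nil
}

func main() {
	latest := Execution{TaskID: "t1", SessionID: "sess-123"}
	ex, err := newResumeExecution(latest, Blocked, "Use PostgreSQL.")
	fmt.Println(ex.ResumeSessionID, ex.ResumeAnswer, err)
}
```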
## WebSocket Events
Task lifecycle changes produce WebSocket broadcasts to all connected clients:
- `task_completed`: emitted on any terminal or quasi-terminal transition (`READY`,
`COMPLETED`, `FAILED`, `TIMED_OUT`, `CANCELLED`, `BUDGET_EXCEEDED`, `BLOCKED`)
- `task_question`: emitted by `BroadcastQuestion` when an agent uses
`AskUserQuestion`. The file-based flow does not currently use it; the question
file is the primary path into `BLOCKED`.
## Known Limitations and Edge Cases
- **`BUDGET_EXCEEDED` transition.** `BUDGET_EXCEEDED` appears in `terminalFailureStates`
(used by `waitForDependencies`) but has no outgoing transitions in `ValidTransition`,
making it permanently terminal. There is no `/resume` endpoint for it.
- **Retry enforcement.** `RetryConfig.MaxAttempts` is stored but not enforced by
the pool. The API allows unlimited manual retries via `POST /run` from `FAILED`.
- **`TIMED_OUT` resumability.** Only `POST /api/tasks/{id}/resume` resumes timed-out
tasks. Unlike `BLOCKED`, there is no question or answer; the resume message is
hardcoded: _"Your previous execution timed out. Please continue where you left off."_
- **Concurrent cancellation race.** If `POST /cancel` arrives while a task is finishing
`RUNNING → COMPLETED`, `pool.Cancel()` may return `true` (the cancel func is still
registered) even though the goroutine has already finished its work. The goroutine's
earlier `ctx.Err()` check wins, and the task ends in `COMPLETED`.
## Relevant Code Locations
| Concern | File | Lines |
|---|---|---|
| State constants | `internal/task/task.go` | 7–18 |
| `ValidTransition` | `internal/task/task.go` | 93–109 |
| State machine tests | `internal/task/task_test.go` | 8–72 |
| Pool execute | `internal/executor/executor.go` | 194–303 |
| Pool executeResume | `internal/executor/executor.go` | 116–185 |
| Dependency wait | `internal/executor/executor.go` | 305–340 |
| `BlockedError` | `internal/executor/claude.go` | 31–37 |
| Question file detection | `internal/executor/claude.go` | 103–110 |
| API state transitions | `internal/api/server.go` | 138–415 |