From 5303a68d67e435da863353cdce09fa2e3a8c2ccd Mon Sep 17 00:00:00 2001
From: Peter Stone
Date: Fri, 13 Mar 2026 03:14:40 +0000
Subject: feat: resume support, summary extraction, and task state improvements

- Extend Resume to CANCELLED, FAILED, and BUDGET_EXCEEDED tasks
- Add summary extraction from agent stdout stream-json output
- Fix storage: persist stdout/stderr/artifact_dir paths in UpdateExecution
- Clear question_json on ResetTaskForRetry
- Resume BLOCKED tasks in preserved sandbox so Claude finds its session
- Add planning preamble: CLAUDOMATOR_SUMMARY_FILE env var + summary step
- Update ADR-002 with new state transitions
- UI style improvements

Co-Authored-By: Claude Sonnet 4.6
---
 docs/adr/002-task-state-machine.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

(limited to 'docs/adr/002-task-state-machine.md')

diff --git a/docs/adr/002-task-state-machine.md b/docs/adr/002-task-state-machine.md
index 310c337..6910f6a 100644
--- a/docs/adr/002-task-state-machine.md
+++ b/docs/adr/002-task-state-machine.md
@@ -66,13 +66,13 @@ True terminal state (no outgoing transitions): `COMPLETED`. All other non-succes
 | `QUEUED` | `RUNNING` | Pool goroutine starts execution |
 | `QUEUED` | `CANCELLED` | `POST /api/tasks/{id}/cancel` |
 | `RUNNING` | `READY` | Runner exits 0, no question file, top-level task (`parent_task_id == ""`), and task has no subtasks |
-| `RUNNING` | `BLOCKED` | Runner exits 0, no question file, top-level task (`parent_task_id == ""`), and task has subtasks |
+| `RUNNING` | `BLOCKED` (subtasks) | Runner exits 0, no question file, top-level task (`parent_task_id == ""`), and task has subtasks |
+| `RUNNING` | `BLOCKED` (question) | Runner exits 0 but left a `question.json` file in log dir (any task type) |
 | `RUNNING` | `COMPLETED` | Runner exits 0, no question file, subtask (`parent_task_id != ""`) |
 | `RUNNING` | `FAILED` | Runner exits non-zero or stream signals `is_error: true` |
 | `RUNNING` | `TIMED_OUT` | Context deadline exceeded (`context.DeadlineExceeded`) |
 | `RUNNING` | `CANCELLED` | Context cancelled (`context.Canceled`) |
 | `RUNNING` | `BUDGET_EXCEEDED` | `--max-budget-usd` exceeded (signalled by runner) |
-| `RUNNING` | `BLOCKED` | Runner exits 0 but left a `question.json` file in log dir |
 | `READY` | `COMPLETED` | `POST /api/tasks/{id}/accept` |
 | `READY` | `PENDING` | `POST /api/tasks/{id}/reject` (with optional comment) |
 | `FAILED` | `QUEUED` | Retry (manual re-run via `POST /api/tasks/{id}/run`) |
@@ -85,7 +85,7 @@ True terminal state (no outgoing transitions): `COMPLETED`. All other non-succes
 ## Implementation
 
 **Validation:** `task.ValidTransition(from, to State) bool`
-(`internal/task/task.go:93`) — called by API handlers before every state change.
+(`internal/task/task.go:123`) — called by API handlers before every state change.
 
 **State writes:** `storage.DB.UpdateTaskState(id, state)` — single source of write;
 called by both API handlers and the executor pool.
@@ -158,9 +158,9 @@ Task lifecycle changes produce WebSocket broadcasts to all connected clients:
 
 ## Known Limitations and Edge Cases
 
-- **`BUDGET_EXCEEDED` transition.** `BUDGET_EXCEEDED` appears in `terminalFailureStates`
-  (used by `waitForDependencies`) but has no outgoing transitions in `ValidTransition`,
-  making it permanently terminal. There is no `/resume` endpoint for it.
+- **`BUDGET_EXCEEDED` retry.** `BUDGET_EXCEEDED → QUEUED` is a valid transition (retry via
+  `POST /run`), matching `FAILED` and `CANCELLED` behaviour. However, there is no dedicated
+  `/resume` endpoint for it — callers must use the standard `/run` restart path.
 
 - **Retry enforcement.** `RetryConfig.MaxAttempts` is stored but not enforced by the pool.
   The API allows unlimited manual retries via `POST /run` from `FAILED`.
@@ -178,9 +178,9 @@ Task lifecycle changes produce WebSocket broadcasts to all connected clients:
 
 | Concern | File | Lines |
 |---|---|---|
-| State constants | `internal/task/task.go` | 7–18 |
-| `ValidTransition` | `internal/task/task.go` | 93–109 |
-| State machine tests | `internal/task/task_test.go` | 8–72 |
+| State constants | `internal/task/task.go` | 9–20 |
+| `ValidTransition` | `internal/task/task.go` | 107–130 |
+| State machine tests | `internal/task/task_test.go` | 8–75 |
 | Pool execute | `internal/executor/executor.go` | 194–303 |
 | Pool executeResume | `internal/executor/executor.go` | 116–185 |
 | Dependency wait | `internal/executor/executor.go` | 305–340 |
-- 
cgit v1.2.3
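
Reviewer note: the transition rows touched by this patch can be read as a lookup table. The Go sketch below is illustrative only and is not the code in `internal/task/task.go`. The `State` type, the constant names, and the `transitions` map are assumptions made for the example, and it covers only the rows visible in this diff (the ADR table has further rows that the hunks do not show). The `CANCELLED → QUEUED` entry is inferred from the "matching `FAILED` and `CANCELLED` behaviour" wording rather than from a table row.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the task state type; the real
// definition in internal/task/task.go may differ.
type State string

const (
	Pending        State = "PENDING"
	Queued         State = "QUEUED"
	Running        State = "RUNNING"
	Ready          State = "READY"
	Blocked        State = "BLOCKED"
	Completed      State = "COMPLETED"
	Failed         State = "FAILED"
	TimedOut       State = "TIMED_OUT"
	Cancelled      State = "CANCELLED"
	BudgetExceeded State = "BUDGET_EXCEEDED"
)

// transitions lists only the edges that appear in this diff. States with
// no entry (for example COMPLETED) have no outgoing edges in this sketch.
var transitions = map[State][]State{
	Queued:         {Running, Cancelled},
	Running:        {Ready, Blocked, Completed, Failed, TimedOut, Cancelled, BudgetExceeded},
	Ready:          {Completed, Pending},
	Failed:         {Queued},
	Cancelled:      {Queued}, // inferred from the ADR prose, not a visible table row
	BudgetExceeded: {Queued}, // the retry edge documented by this patch
}

// ValidTransition reports whether the edge from -> to is in the table above.
func ValidTransition(from, to State) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(ValidTransition(BudgetExceeded, Queued)) // true
	fmt.Println(ValidTransition(Completed, Queued))      // false
}
```

Keeping the edges in one table makes a change like the new `BUDGET_EXCEEDED → QUEUED` retry a one-line edit, and both `BLOCKED` causes (subtasks and question file) share the single `RUNNING → BLOCKED` edge.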