| Age | Commit message | Author |
|
Two fixes for BLOCKED task issues:
1. Allow BLOCKED → CANCELLED state transition so users can cancel tasks
stuck waiting for input. Adds Cancel button to BLOCKED task cards in
the UI alongside the question/answer controls.
2. Detect when agents write completion reports to $CLAUDOMATOR_QUESTION_FILE
instead of real questions. If the question JSON has no options and no "?"
in the text, treat it as a summary (stored on the execution) and fall
through to normal completion + sandbox teardown rather than blocking.
Also tightened the preamble to make the distinction explicit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
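
The detection rule in item 2 can be sketched as below. Only the rule itself (no options and no "?" in the text) comes from the message above; the `question` struct, its JSON field names, and `isCompletionReport` are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// question mirrors the JSON an agent writes to $CLAUDOMATOR_QUESTION_FILE.
// The field names are assumptions made for this sketch.
type question struct {
	Text    string   `json:"text"`
	Options []string `json:"options"`
}

// isCompletionReport reports whether the payload looks like a summary rather
// than a real question: it has no options and no "?" anywhere in the text.
func isCompletionReport(raw []byte) (bool, error) {
	var q question
	if err := json.Unmarshal(raw, &q); err != nil {
		return false, err
	}
	return len(q.Options) == 0 && !strings.Contains(q.Text, "?"), nil
}

func main() {
	for _, raw := range []string{
		`{"text":"Implemented the feature and ran the tests."}`,
		`{"text":"Which database should I use?","options":["sqlite","postgres"]}`,
	} {
		report, err := isCompletionReport([]byte(raw))
		fmt.Println(report, err)
	}
}
```

A payload classified as a report would be stored as the execution summary and the task would complete normally instead of going BLOCKED.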
|
|
The summary+interactions feature was already fully implemented but
lacked storage-layer tests. Added tests covering round-trip persistence
of task summaries, accumulation of Q&A interactions, and error handling
for nonexistent tasks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
All, Stats, Settings)
- Replace Tasks/Active tabs with Queue (QUEUED+PENDING), Interrupted, Ready top-level tabs
- Add All tab (COMPLETED, TIMED_OUT, BUDGET_EXCEEDED within last 24h) and Settings placeholder
- Export filterQueueTasks, filterReadyTasks, filterAllDoneTasks from app.js
- Refactor poll() to dispatch to active tab's render function instead of always rendering all panels
- Add renderQueuePanel, renderInterruptedPanel, renderReadyPanel, renderAllPanel helpers
- Add tests in web/test/tab-filters.test.mjs covering all new filter functions (16 tests)
- All 165 JS tests and all Go tests pass
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Previously appendRawNarrative was called with the server's default workDir
(os.Getwd()) when no project_dir was in the request, causing test runs and
any elaboration without a project to pollute the repo's own RAW_NARRATIVE.md.
The narrative is per-project human input — only write it when the caller
explicitly specifies which project they're working in.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
GeminiRunner.buildArgs was missing --yolo (auto-approve all tools)
so the gemini CLI only registered 3 tools (read_file, write_todos,
cli_help) and write_file was not available. Agents that needed to
create files silently failed (exit 0, no files written).
Also switch instructions from bare positional arg to -p flag, which
is required for non-interactive headless mode.
Update preamble tests to match file-based summary approach
(CLAUDOMATOR_SUMMARY_FILE) kept from the merge conflict resolution.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
Removed the switchTab() logic that hid btn-new-task on non-tasks tabs.
The button lives in the global header so no structural changes were needed.
Added new-task-button.test.mjs to contract-test the always-visible behavior.
|
|
Keep file-based summary approach (CLAUDOMATOR_SUMMARY_FILE) from HEAD.
Combine Q&A History and Stats tab CSS from both branches.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Add Pool.RecoverStaleQueued() that lists all QUEUED tasks from the DB on
startup and re-submits them to the in-memory pool. Previously, tasks that
were QUEUED when the server restarted would remain stuck indefinitely since
only RUNNING tasks were recovered (and marked FAILED).
Called in serve.go immediately after RecoverStaleRunning().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
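
A minimal sketch of the startup recovery described above. `Pool.RecoverStaleQueued` is named in the message; the `Store` method `ListTasksByState`, the `queue` channel, and the in-memory `memStore` are assumptions used only to make the example self-contained.

```go
package main

import "fmt"

// Task is a minimal stand-in; the real type is assumed to carry more fields.
type Task struct {
	ID    string
	State string
}

// Store is a hypothetical slice of the storage interface.
type Store interface {
	ListTasksByState(state string) ([]Task, error)
}

// Pool holds the store and an in-memory work queue (assumed shape).
type Pool struct {
	store Store
	queue chan Task
}

// RecoverStaleQueued re-submits every task that is still QUEUED in the store
// and returns how many were re-submitted. The send blocks if the queue is full.
func (p *Pool) RecoverStaleQueued() (int, error) {
	tasks, err := p.store.ListTasksByState("QUEUED")
	if err != nil {
		return 0, fmt.Errorf("list queued tasks: %w", err)
	}
	for _, t := range tasks {
		p.queue <- t
	}
	return len(tasks), nil
}

// memStore is an in-memory Store used only for this example.
type memStore []Task

func (m memStore) ListTasksByState(state string) ([]Task, error) {
	var out []Task
	for _, t := range m {
		if t.State == state {
			out = append(out, t)
		}
	}
	return out, nil
}

func main() {
	p := &Pool{
		store: memStore{{ID: "a", State: "QUEUED"}, {ID: "b", State: "RUNNING"}, {ID: "c", State: "QUEUED"}},
		queue: make(chan Task, 8),
	}
	n, err := p.RecoverStaleQueued()
	fmt.Println(n, err)
}
```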
|
|
practices
Add sanitizeElaboratedTask() called after every elaboration response:
- Infers missing allowed_tools from instruction keywords (Write/Edit/Read/Bash/Grep/Glob)
- Auto-adds Read when Edit is present
- Appends Acceptance Criteria section if none present
- Appends TDD reminder for coding tasks without test mention
Also tighten buildElaboratePrompt to require acceptance criteria and
list concrete tool examples, reducing how often the model omits tools.
Fixes class of failures where agents couldn't create files because
the elaborator omitted Write from allowed_tools.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
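
The tool-inference part of the sanitizer might look roughly like this. The keyword table is an illustrative assumption, not the project's actual list, and `inferTools` is a hypothetical helper name; only the behaviour (add missing tools suggested by the instructions, add Read whenever Edit is present) follows the message above.

```go
package main

import (
	"fmt"
	"strings"
)

// toolKeywords pairs each tool with lowercase keywords that suggest it.
// These keyword lists are assumptions made for this sketch.
var toolKeywords = []struct {
	tool     string
	keywords []string
}{
	{"Write", []string{"create", "write"}},
	{"Edit", []string{"edit", "modify", "refactor"}},
	{"Read", []string{"read", "inspect"}},
	{"Bash", []string{"run ", "execute", "go test"}},
	{"Grep", []string{"search", "grep"}},
	{"Glob", []string{"glob", "find files"}},
}

// inferTools returns allowed plus any tool whose keywords appear in the
// instructions, and adds Read whenever Edit ends up in the list.
func inferTools(instructions string, allowed []string) []string {
	text := strings.ToLower(instructions)
	have := make(map[string]bool, len(allowed))
	out := make([]string, 0, len(allowed))
	add := func(tool string) {
		if !have[tool] {
			have[tool] = true
			out = append(out, tool)
		}
	}
	for _, t := range allowed {
		add(t)
	}
	for _, tk := range toolKeywords {
		for _, kw := range tk.keywords {
			if strings.Contains(text, kw) {
				add(tk.tool)
				break
			}
		}
	}
	if have["Edit"] {
		add("Read") // editing a file requires reading it first
	}
	return out
}

func main() {
	fmt.Println(inferTools("Create cmd/main.go and run go test", nil))
}
```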
|
|
- Extend Resume to CANCELLED, FAILED, and BUDGET_EXCEEDED tasks
- Add summary extraction from agent stdout stream-json output
- Fix storage: persist stdout/stderr/artifact_dir paths in UpdateExecution
- Clear question_json on ResetTaskForRetry
- Resume BLOCKED tasks in preserved sandbox so Claude finds its session
- Add planning preamble: CLAUDOMATOR_SUMMARY_FILE env var + summary step
- Update ADR-002 with new state transitions
- UI style improvements
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Interrupted tasks (CANCELLED, FAILED, BUDGET_EXCEEDED) now support session
resume in addition to restart. Both buttons are shown on the task card.
- executor: extend resumablePoolStates to include CANCELLED, FAILED, BUDGET_EXCEEDED
- api: extend handleResumeTimedOutTask to accept all resumable states with
state-specific resume messages; replace hard-coded TIMED_OUT check with a
resumableStates map
- web: add RESUME_STATES set; render Resume + Restart buttons for interrupted
states; TIMED_OUT keeps Resume only
- tests: 5 new Go tests (TestResumeInterrupted_*); updated task-actions.test.mjs
with 17 tests covering dual-button behaviour
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Export computeTaskStats and computeExecutionStats from app.js
- Add renderStatsPanel with state count grid, KPI row (total/success-rate/cost/avg-duration), and outcome bar chart
- Wire stats tab into switchTab and poll for live refresh
- Add Stats tab button and panel to index.html
- Add CSS for .stats-counts, .stats-kpis, .stats-bar-chart using existing state color variables
- Add docs/stats-tab-plan.md with component structure and data flow
- 14 new unit tests in web/test/stats.test.mjs (140 total, all passing)
No backend changes — derives all metrics from existing /api/tasks and /api/executions endpoints.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Add a mandatory '## Final Summary' section to planningPreamble
instructing agents to output a 2-5 sentence summary paragraph
(headed by '## Summary') as their last output before exiting.
Adds three tests to verify the section and its required content
are present in the preamble.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
A BLOCKED task that fails on resume would keep its stale question_json
after being restarted. The frontend then showed "waiting for your input"
with the old prompt even though the task was running fresh.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
When a task ran in a sandbox (/tmp/claudomator-sandbox-*) and went BLOCKED,
Claude stored its session under the sandbox path as the project slug. The
resume execution was running in project_dir, causing Claude to look for the
session in the wrong project directory and fail with "No conversation found".
Fix: carry SandboxDir through BlockedError → Execution → resume execution,
and run the resume in that directory so the session lookup succeeds.
- BlockedError gains SandboxDir field; claude.go sets it on BLOCKED exit
- storage.Execution gains SandboxDir (persisted via new sandbox_dir column)
- executor.go stores blockedErr.SandboxDir in the execution record
- server.go copies SandboxDir from latest execution to the resume execution
- claude.go uses e.SandboxDir as working dir for resume when set
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
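
The data flow above can be sketched as follows. `BlockedError.SandboxDir` and `Execution.SandboxDir` are named in the message; the helper functions `recordBlocked` and `resumeWorkDir`, the `ProjectDir` field, and the example paths are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// BlockedError signals that the agent stopped to ask a question.
type BlockedError struct {
	QuestionJSON string
	SandboxDir   string // sandbox the agent was running in when it blocked
}

func (e *BlockedError) Error() string { return "task blocked: waiting for input" }

// Execution is a minimal stand-in for the stored execution record.
type Execution struct {
	ProjectDir string
	SandboxDir string
}

// recordBlocked copies the sandbox path from a (possibly wrapped)
// BlockedError onto the execution record. It reports whether err was one.
func recordBlocked(exec *Execution, err error) bool {
	var be *BlockedError
	if !errors.As(err, &be) {
		return false
	}
	exec.SandboxDir = be.SandboxDir
	return true
}

// resumeWorkDir picks the working directory for a resume: the preserved
// sandbox when there is one, otherwise the project directory.
func resumeWorkDir(prev Execution) string {
	if prev.SandboxDir != "" {
		return prev.SandboxDir
	}
	return prev.ProjectDir
}

func main() {
	exec := Execution{ProjectDir: "/srv/project"}
	err := fmt.Errorf("run: %w", &BlockedError{SandboxDir: "/tmp/claudomator-sandbox-example"})
	recordBlocked(&exec, err)
	fmt.Println(resumeWorkDir(exec))
}
```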
|
|
|
|
|
|
executor: add 7 tests for sandboxCloneSource, setupSandbox, and
teardownSandbox (uncommitted-changes error, clean-no-commits removal).
api: fix two data races in WebSocket tests — wsPingInterval/Deadline
are now captured as locals before goroutine start; maxWsClients is
moved from a package-level var into Hub.maxClients (with SetMaxClients
method) so concurrent tests don't stomp each other.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
RAW_NARRATIVE.md: comprehensive chronological engineering history
reconstructed from the git log covering all 45 major milestones.
ADR-004: multi-agent routing — explicit load balancing in code (pickAgent)
plus Gemini-based model classification (Classifier), and why the two
decisions are intentionally separated.
ADR-005: git sandbox execution model — clone isolation, bare-repo push,
uncommitted-change enforcement, BLOCKED preservation, and session ID
propagation on second resume cycle.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Hoists the map out of ValidTransition so it's not reallocated on every
call. Adds missing CANCELLED→QUEUED and BUDGET_EXCEEDED→QUEUED entries
to the ADR transition table to match the implemented state machine.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
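
A sketch of the hoisting: the table is built once at package initialisation and `ValidTransition` only performs lookups. The entries shown are a subset assembled from transitions mentioned elsewhere in this log, not the full state machine, and the `State` type and map shape are assumptions.

```go
package main

import "fmt"

// State is a task state name.
type State string

// validTransitions is allocated once at package init instead of on every
// call. Only a subset of the real table is reproduced here.
var validTransitions = map[State]map[State]bool{
	"RUNNING":         {"READY": true, "BLOCKED": true},
	"BLOCKED":         {"READY": true, "CANCELLED": true},
	"CANCELLED":       {"QUEUED": true},
	"BUDGET_EXCEEDED": {"QUEUED": true},
}

// ValidTransition reports whether from -> to is allowed. Indexing a missing
// outer key yields a nil map, and reading a nil map safely returns false.
func ValidTransition(from, to State) bool {
	return validTransitions[from][to]
}

func main() {
	fmt.Println(ValidTransition("CANCELLED", "QUEUED"))
	fmt.Println(ValidTransition("COMPLETED", "QUEUED"))
}
```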
|
|
When a resumed execution is blocked again, SessionID was set to the new
exec's own UUID instead of the original ResumeSessionID. The next resume
would then pass the wrong --resume argument to claude and fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
pickAgent() deterministically selects the agent with the fewest active tasks,
skipping rate-limited agents. The classifier now only selects the model for the
pre-assigned agent, so Gemini gets tasks from the start rather than only as a
fallback when Claude's quota is exhausted.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
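
One way the selection could work, as a sketch. The signature is an assumption: rate limiting is reduced to a boolean set here, and the alphabetical tie-break is an addition to keep the result deterministic, not something the message specifies.

```go
package main

import (
	"fmt"
	"sort"
)

// pickAgent returns the non-rate-limited agent with the fewest active tasks.
// Ties go to the alphabetically first agent. The second result is false when
// every agent is rate-limited.
func pickAgent(agents []string, active map[string]int, limited map[string]bool) (string, bool) {
	candidates := append([]string(nil), agents...)
	sort.Strings(candidates)

	best, found := "", false
	for _, a := range candidates {
		if limited[a] {
			continue // skip agents that are currently rate-limited
		}
		if !found || active[a] < active[best] {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	agents := []string{"claude", "gemini"}
	active := map[string]int{"claude": 2, "gemini": 1}
	fmt.Println(pickAgent(agents, active, nil))
	fmt.Println(pickAgent(agents, active, map[string]bool{"gemini": true}))
}
```

Keeping this choice in plain code leaves the classifier with a single job: picking a model for an agent that has already been assigned.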
|
|
The elaborator now logs every user prompt to docs/RAW_NARRATIVE.md within the project directory. This is done in a background goroutine to ensure it doesn't delay the response.
|
|
The elaborator now reads CLAUDE.md and SESSION_STATE.md from the project directory (if they exist) and prepends their content to the user prompt. This allows the AI to generate tasks that are more context-aware.
|
|
Updated handleRunTask to use ResetTaskForRetry, which clears the agent type and model. This ensures that manually restarted tasks are always re-classified, allowing the system to switch to a different agent if the previous one is rate-limited. Also improved Claude quota-exhaustion detection.
|
|
|
|
The --config flag was registered but silently ignored. Now:
- config.LoadFile loads a TOML file on top of defaults
- PersistentPreRunE applies the file when --config is set
- Explicit CLI flags (--data-dir, --claude-bin) take precedence over the file
Tests: TestLoadFile_OverridesDefaults, TestLoadFile_MissingFile_ReturnsError,
TestRootCmd_ConfigFile_Loaded, TestRootCmd_ConfigFile_CLIFlagOverrides,
TestRootCmd_ConfigFile_Missing_ReturnsError
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Both execute() and executeResume() shared ~80% identical post-run logic:
error classification (BLOCKED, TIMED_OUT, CANCELLED, BUDGET_EXCEEDED, FAILED),
state transitions, result emission, and UpdateExecution. Extract this into
handleRunResult(ctx, t, exec, err, agentType) on *Pool. Both functions now
call it after runner.Run() returns.
Also adds TestHandleRunResult_SharedPath which directly exercises the new
function via a minimalMockStore, covering FAILED, READY, COMPLETED, and
TIMED_OUT classification paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
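
The classification step at the heart of the shared path might be sketched like this. How the real code recognises each outcome is not stated in the message: the use of `context` errors for timeout and cancellation, the `ErrBudgetExceeded` sentinel, and the `isSubtask` parameter are all assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// BlockedError signals that the agent is waiting for user input.
type BlockedError struct{ Question string }

func (e *BlockedError) Error() string { return "blocked: " + e.Question }

// ErrBudgetExceeded is a hypothetical sentinel for a spent budget.
var ErrBudgetExceeded = errors.New("budget exceeded")

// classify maps the error returned by a run to the task's next state.
// A successful subtask is assumed to go to COMPLETED and a successful
// top-level task to READY.
func classify(err error, isSubtask bool) string {
	var be *BlockedError
	switch {
	case err == nil && isSubtask:
		return "COMPLETED"
	case err == nil:
		return "READY"
	case errors.As(err, &be):
		return "BLOCKED"
	case errors.Is(err, context.DeadlineExceeded):
		return "TIMED_OUT"
	case errors.Is(err, context.Canceled):
		return "CANCELLED"
	case errors.Is(err, ErrBudgetExceeded):
		return "BUDGET_EXCEEDED"
	default:
		return "FAILED"
	}
}

func main() {
	fmt.Println(classify(nil, false))
	fmt.Println(classify(fmt.Errorf("run: %w", context.DeadlineExceeded), false))
	fmt.Println(classify(errors.New("exit status 1"), true))
}
```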
|
|
- Remove Claude field alias from Task struct (already removed in sandbox)
- Remove UnmarshalJSON from AgentConfig that silently accepted working_dir
- Remove legacy claude fallback in scanTask (db.go)
- Remove TestGetTask_BackwardCompatibility test that validated removed behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Add maybeUnblockParent helper that transitions a BLOCKED parent task to
READY once every subtask is in the COMPLETED state. Called in both
execute() and executeResume() immediately after a subtask is marked
COMPLETED. Any non-COMPLETED sibling (RUNNING, FAILED, etc.) keeps the
parent BLOCKED.
Tests added:
- TestPool_Submit_LastSubtask_UnblocksParent
- TestPool_Submit_NotLastSubtask_ParentStaysBlocked
- TestPool_Submit_ParentNotBlocked_NoTransition
|
|
When a top-level task (ParentTaskID == "") finishes successfully,
check for subtasks before deciding the next state:
- subtasks exist → BLOCKED (waiting for subtasks to complete)
- no subtasks → READY (existing behavior, unchanged)
This applies to both execute() and executeResume().
Adds ListSubtasks to the Store interface.
Tests:
- TestPool_Submit_TopLevel_WithSubtasks_GoesBlocked
- TestPool_Submit_TopLevel_NoSubtasks_GoesReady
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Transition table: add BLOCKED→READY (trigger: all subtasks COMPLETED)
- Transition table: clarify RUNNING→READY only when no subtasks exist
- Transition table: add RUNNING→BLOCKED for parent-with-subtasks path
- Execution outcome mapping: reflect subtask check
- State diagram: show BLOCKED→READY arc
- Key Invariants: add #7 parent-with-subtasks goes BLOCKED on runner exit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
All previously ignored errors from p.store.UpdateTaskState() and
p.store.UpdateTaskQuestion() in execute() and executeResume() now log
with structured context (taskID, state, error).
Introduces a Store interface so tests can inject a failing mock store.
Adds TestPool_UpdateTaskState_DBError_IsLoggedAndResultDelivered to
verify that a DB write failure is logged and the result is still
delivered to resultCh.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Replace BFS loop with a single recursive CTE to collect all descendant
task IDs in one query, and wrap all DELETE statements in a transaction
so a partial failure cannot leave orphaned executions.
Add TestDeleteTask_DeepSubtaskCascadeAtomic: creates a 3-level task
hierarchy with executions at each level, deletes the root, and verifies
all tasks and executions are removed with an explicit orphan-row check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Add two schema indexes that were missing:
- idx_executions_start_time on executions(start_time): speeds up
ListRecentExecutions WHERE start_time >= ? ORDER BY start_time DESC
- idx_tasks_parent_task_id on tasks(parent_task_id): speeds up
ListSubtasks WHERE parent_task_id = ?
Both use CREATE INDEX IF NOT EXISTS so they are safe to apply on
existing databases without a migration version bump.
Add TestListRecentExecutions_LargeDataset (100 rows, two tasks) covering:
- returns all rows in descending start_time order
- respects the limit parameter
- filters correctly by since time
- filters correctly by task_id
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
response shapes
- handleListTasks: validate ?state= against known states, return 400 with clear
error for unrecognized values (e.g. ?state=BOGUS)
- handleCancelTask: replace {"status":"cancelling"|"cancelled"} with
{"message":"...","task_id":"..."} to match run/resume shape
- handleAnswerQuestion: replace {"status":"queued"} with
{"message":"task queued for resume","task_id":"..."}
- Tests: add TestListTasks_InvalidState_Returns400, TestListTasks_ValidState_Returns200,
TestCancelTask_ResponseShape, TestAnswerQuestion_ResponseShape,
TestRunTask_ResponseShape, TestResumeTimedOut_ResponseShape
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
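
A sketch of the `?state=` validation using only the standard library. The set of known states is collected from state names that appear in this log and may not match the real list; the error body shape, the `/api/tasks` handler body, and the `statusFor` helper are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// knownStates lists the state names seen in this log (assumed complete).
var knownStates = map[string]bool{
	"PENDING": true, "QUEUED": true, "RUNNING": true, "READY": true,
	"COMPLETED": true, "FAILED": true, "TIMED_OUT": true,
	"CANCELLED": true, "BUDGET_EXCEEDED": true, "BLOCKED": true,
}

// handleListTasks rejects unrecognised ?state= values with 400 before
// doing any work. An absent filter is allowed.
func handleListTasks(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	state := r.URL.Query().Get("state")
	if state != "" && !knownStates[state] {
		w.WriteHeader(http.StatusBadRequest)
		json.NewEncoder(w).Encode(map[string]string{
			"error": fmt.Sprintf("invalid state %q", state),
		})
		return
	}
	// A real handler would query the store here; the sketch returns no tasks.
	json.NewEncoder(w).Encode([]string{})
}

// statusFor exercises the handler without a live server.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	handleListTasks(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/api/tasks?state=BOGUS"))
	fmt.Println(statusFor("/api/tasks?state=QUEUED"))
}
```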
|
|
Replace the no-op mockRunner in server_test.go with a configurable
version that supports err and sleep fields. Add testServerWithRunner
helper and a pollState utility for async assertions.
Add three new tests that exercise the pool's error paths end-to-end:
- TestRunTask_AgentFails_TaskSetToFailed
- TestRunTask_AgentTimesOut_TaskSetToTimedOut
- TestRunTask_AgentCancelled_TaskSetToCancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
The pgid-kill goroutine in execOnce() uses a select with both ctx.Done()
and the killDone channel. Add a detailed comment explaining why the goroutine
cannot block indefinitely: the killDone arm fires unconditionally when
cmd.Wait() returns (whether the process exited naturally or was killed),
so the goroutine always exits before execOnce() returns.
Add TestExecOnce_NoGoroutineLeak_OnNaturalExit to verify this: it samples
runtime.NumGoroutine() before and after execOnce() with a no-op binary
("true") and a background context (never cancelled), asserting no net
goroutine growth.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
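
The select pattern described above, as a sketch that does not spawn a real process. `wait` stands in for `cmd.Wait()` and `kill` for the process-group kill; both are injected, and `runWithKill` and `demo` are hypothetical names. The `sync.WaitGroup` is an addition in this sketch that makes the "goroutine exits before the function returns" guarantee explicit.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// runWithKill waits for the process while a helper goroutine watches for
// cancellation. The killDone arm fires as soon as wait returns, so the
// goroutine cannot outlive this function.
func runWithKill(ctx context.Context, wait func() error, kill func()) error {
	killDone := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		select {
		case <-ctx.Done():
			kill() // context cancelled: terminate the process
		case <-killDone:
			// process already exited: nothing to kill
		}
	}()

	err := wait()
	close(killDone)
	wg.Wait()
	return err
}

// demo runs the pattern with a fake process. With cancelFirst the context is
// cancelled up front and the fake process only exits once it is killed.
func demo(cancelFirst bool) (killed bool, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	exited := make(chan struct{})
	kill := func() { killed = true; close(exited) }
	wait := func() error {
		if cancelFirst {
			<-exited
			return context.Canceled
		}
		return nil
	}
	if cancelFirst {
		cancel()
	}
	err = runWithKill(ctx, wait, kill)
	return killed, err
}

func main() {
	fmt.Println(demo(false))
	fmt.Println(demo(true))
}
```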
|
|
|
|
|
|
Updated isQuotaExhausted to detect more Claude quota messages and added 'rate limit reached (rejected)' to the quota-exhausted checks. Strengthened the classifier prompt to explicitly forbid selecting rate-limited agents. The Pool now sets a 5h rate limit on quota exhaustion.
|
|
concurrent rejection
- Remove git pull into project_dir: working copy is the developer workspace
and should be pulled manually; www-data can't write to root-owned .git/objects
- On non-fast-forward push rejection (concurrent task pushed first), fetch and
rebase then retry once instead of failing the entire task
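
The retry-once behaviour can be sketched with the git operations injected as functions. `errNonFastForward` is a hypothetical sentinel; how the real code detects a non-fast-forward rejection is not stated in the message.

```go
package main

import (
	"errors"
	"fmt"
)

// errNonFastForward is a hypothetical sentinel for a rejected push.
var errNonFastForward = errors.New("push rejected: non-fast-forward")

// pushWithRebaseRetry pushes once. If the push is rejected as
// non-fast-forward, it fetches and rebases, then retries exactly once.
// Any other error, or a second rejection, is returned to the caller.
func pushWithRebaseRetry(push, fetchRebase func() error) error {
	err := push()
	if !errors.Is(err, errNonFastForward) {
		return err // success (nil) or an unrelated failure
	}
	if err := fetchRebase(); err != nil {
		return fmt.Errorf("rebase after rejected push: %w", err)
	}
	return push()
}

func main() {
	attempts := 0
	push := func() error {
		attempts++
		if attempts == 1 {
			return errNonFastForward // a concurrent task pushed first
		}
		return nil
	}
	err := pushWithRebaseRetry(push, func() error { return nil })
	fmt.Println(err, attempts)
}
```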
|
|
activePerAgent: delete zero-count entries after decrement so the map
doesn't accumulate stale keys for agent types that are no longer active.
rateLimited: delete entries whose deadline has passed when reading them
(in both the classifier block and the execute() pre-flight), so stale
entries are cleaned up on the next check rather than accumulating forever.
Both fixes are covered by new regression tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
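
Both cleanups, sketched on a small standalone type. The map names `activePerAgent` and `rateLimited` come from the message; the `tracker` type, its methods, and the mutex are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tracker holds per-agent bookkeeping (hypothetical wrapper type).
type tracker struct {
	mu             sync.Mutex
	activePerAgent map[string]int
	rateLimited    map[string]time.Time
}

func newTracker() *tracker {
	return &tracker{
		activePerAgent: make(map[string]int),
		rateLimited:    make(map[string]time.Time),
	}
}

func (t *tracker) acquire(agent string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.activePerAgent[agent]++
}

// release decrements the count and deletes the key once it reaches zero,
// so idle agent types do not linger in the map.
func (t *tracker) release(agent string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.activePerAgent[agent]--
	if t.activePerAgent[agent] <= 0 {
		delete(t.activePerAgent, agent)
	}
}

// limit marks the agent as rate-limited for duration d from now.
func (t *tracker) limit(agent string, d time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.rateLimited[agent] = time.Now().Add(d)
}

// isRateLimited reports whether the agent is still limited and deletes the
// entry when its deadline has passed.
func (t *tracker) isRateLimited(agent string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	deadline, ok := t.rateLimited[agent]
	if !ok {
		return false
	}
	if time.Now().After(deadline) {
		delete(t.rateLimited, agent)
		return false
	}
	return true
}

func main() {
	tr := newTracker()
	tr.acquire("claude")
	tr.release("claude")
	tr.limit("gemini", -time.Second) // already expired
	fmt.Println(tr.isRateLimited("gemini"), len(tr.activePerAgent), len(tr.rateLimited))
}
```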
|