summaryrefslogtreecommitdiff
path: root/internal/executor/executor.go
AgeCommit message (Collapse)Author
2026-05-13merge: integrate github/main — LocalRunner, real GeminiRunner, llm clientHEADmainPeter Stone
Merges 12 commits from github/main (formerly master) that were developed independently. Key additions: - LocalRunner: OpenAI-compatible local LLM execution (Ollama, LM Studio) - Real GeminiRunner with full sandbox parity to ClaudeRunner - llm.Client for enriching CI failures and elaboration via local model - retry.ParseRetryAfter moved to shared package - tokens_in/tokens_out columns in executions table Conflict resolutions: - Kept local main's VAPID/push, stories, projects, agent events schema - Merged both sets of Config fields (local + LocalModel from github/main) - Unified activePerAgent accounting (decActiveAgent helper) - Removed duplicate helpers from claude.go (now in helpers.go) - Fixed double-decrement bug in handleRunResult vs decActiveAgent Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03fix: atomic execution creation + RUNNING state transitionPeter Stone
Add CreateExecutionAndSetRunning to storage.DB and Store interface, replacing the two sequential CreateExecution/UpdateTaskState calls in executor.go. Eliminates the crash window where a task stays PENDING with an orphaned RUNNING execution record. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03fix: prevent SHIPPABLE stories and wrong READY state on failed tasksPeter Stone
Three related bugs fixed: 1. maybeUnblockParent: guard against promoting QUEUED leaf tasks (no subtasks) to READY. The vacuously-true 'all subtasks done' check was advancing tasks that stalled in QUEUED (due to a prior SQLite lock error) to READY on server restart via RecoverStaleBlocked, despite having only failed executions and no commits. 2. checkStoryCompletion: require COMPLETED (not just READY) for all top-level tasks before advancing a story to SHIPPABLE. READY means the checker agent is still pending or the task awaits human review; a story with READY tasks is not ready to ship. 3. handleAcceptTask: call CheckStoryCompletion after a task is accepted so stories with parent tasks (whose subtasks are all done and then the parent is manually accepted) can auto-advance to SHIPPABLE. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03chore: post-epic cleanup — green test suite, no skipsClaude
Addresses the cleanup queue captured in docs/plans/local-oss-runner.md after the local-OSS-models epic landed. After this commit `go test -race ./...` is green across every package with zero `t.Skip` calls and no excluded tests. Real bugs fixed: - claude.go setupSandbox callsites used `sandboxDir, err := ...` which shadowed the outer variable, so BlockedError.SandboxDir was always empty. Resume-after-block was broken for both new and stale-sandbox paths. TestBlockedError_IncludesSandboxDir now exercises the right invariant. - TestPool_ActivePerAgent_DeletesZeroEntries flake under -race: the cleanup defer in execute()/executeResume() runs AFTER handleRunResult sends on resultCh, so consumers observing a result could see a still-counted activePerAgent entry. Extracted decActiveAgent(agentType, *cleaned) helper; called explicitly before every resultCh send, defer becomes a no-op via the cleaned flag. Verified clean over `go test -race -count=10`. Test infrastructure made hermetic: - gitSafe now also passes -c commit.gpgsign=false / -c tag.gpgsign=false so sandbox tests pass on hosts whose global config requires signing. - Bare repos in tests initialized with `-b main` (HEAD symbolic ref matched to the branch we push) so `git log` after push works. - TestSandboxCloneSource_FallsBackToOrigin uses a local-FS origin URL, matching sandboxCloneSource's intentional filter against network URLs. - TestGeminiLogs_ParsedCorrectly URL fixed to the actual log route (/api/executions/{id}/log). GeminiRunner gap closed (partial): - parseGeminiStream now walks lines for `result` events, surfacing is_error as an error and total_cost_usd as the float return value. - GeminiRunner.Run propagates parsed cost to Execution.CostUSD. - TestParseGeminiStream_ParsesStructuredOutput unskipped. Notes: - GeminiRunner is still simulated end-to-end (Run writes hardcoded stream data instead of execing the binary). The result/cost parser now exists; finishing the runner is a smaller, contained follow-up. Kept on the deferred queue. - Frontend "Local" agent option and a minor storage.db.go logger TODO remain on the deferred queue, both intentionally — neither blocks anything in flight. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-05-02feat(executor): synthesize execution summary via local LLM fallbackClaude
Phase 4 of "local OSS models as agents" plan. Closes the epic. When an execution finishes and the agent did NOT write a "## Summary" heading in its stdout (so the existing extractSummary path returns empty), and the Pool has a local LLM configured, we now synthesize a 2-4 sentence summary from the assistant text content of the log tail. Behavior: - Primary path unchanged: if the agent wrote "## Summary", that wins byte-for-byte (TestPool_HandleRunResult_ExtractSummaryWins guards). - Fallback path: empty extractSummary + Pool.LLM != nil → synthesize. - All-empty path: when no LLM is configured, summary stays empty — identical to pre-Phase-4 behavior. Implementation: - Pool gains an LLM *llm.Client field, wired in serve.go and run.go alongside Classifier.LLM (same localClient used everywhere). - New synthesizeSummary in internal/executor/summary.go: * 6s timeout so a slow local model can't stall finalization * 16 KB tail cap on the stdout log * readAssistantTextTail seeks to the last 16 KB and skips the first (likely partial) line, parses each line as a stream-json event, joins assistant `text` blocks (skips system/result/etc). * Returns "" on any error so the caller's behavior never regresses. - handleRunResult: 3-tier summary resolution — exec.Summary set by runner → extractSummary → synthesizeSummary → empty. - minimalMockStore now records UpdateTaskSummary calls (additive; existing tests unaffected) so integration tests can assert. Tests (9 new): - synthesizeSummary nil client / empty path / missing file all return "" without HTTP calls. - empty assistant content short-circuits without LLM call. - success path returns trimmed body, with both assistant texts in the user prompt. - LLM 500 returns "" (caller handles same as no-summary). - readAssistantTextTail seeks past early content in a large file. - Pool integration: ## Summary present → LLM not called, agent text used. ## Summary absent + LLM set → LLM called, synthesized summary recorded against the right task ID. Plan: docs/plans/local-oss-runner.md. Epic complete. Post-epic deep cleanup queue captured in the same plan file for follow-up. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-04-28feat(executor): add LocalRunner and OpenAI-compat LLM clientClaude
Phase 1 of "local OSS models as agents" plan. Adds a third Runner backed by any OpenAI-compatible HTTP server (Ollama, vLLM, LM Studio, llama.cpp), and migrates the Gemini-CLI classifier to route through the same client when configured. Two-layer split: internal/llm.Client is the workhorse (HTTP, no Pool, no DB) used directly by the classifier and any future internal helper that needs cheap reasoning. internal/executor.LocalRunner is a thin adapter implementing Runner for user-facing tasks. This avoids Pool reentrancy/deadlock when sub-second internal calls fire from inside Pool.execute(). Highlights: - internal/retry: relocated runWithBackoff/IsRateLimitError/ParseRetryAfter into a shared package reused by executor and llm. - internal/llm: Chat (non-streaming) and ChatStream (SSE) over /chat/completions with optional bearer auth, json_object response format, retry on 429/503, Retry-After parsing. - internal/executor/LocalRunner: streams deltas into stdout.log in the same stream-json envelope ClaudeRunner emits, then writes one consolidated assistant block plus a result terminator so existing parsers (extractSummary, ParseChangestatFromOutput) work unchanged. - internal/executor/Classifier: gains optional LLM field; uses json_object response format (no markdown-fence cleanup needed). Falls back to Gemini-CLI subprocess when LLM is nil. - Pool.skipClassification: now skips only when the requested agent type is registered, so unknown types still reach the load balancer. - Storage: additive tokens_in/tokens_out ALTERs on executions; CLI runners record cost_usd as before, LocalRunner records 0 + tokens. - Config: [local_model] section (endpoint, model, timeout_seconds, default_temperature, api_key). Empty endpoint = no LocalRunner registered, classifier falls back to Gemini. Pre-existing test issues fixed in passing: - claude_test.go setupSandbox callsites updated to current signature. - gemini_test.go TestParseGeminiStream skipped (asserts unimplemented GeminiRunner stream-error parsing; tracked separately). Plan: docs/plans/local-oss-runner.md. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-04-11cleanup: remove dead code (QuestionRegistry, changestats wrappers, scanner.Err)Claudomator Agent
Fix 1: Remove QuestionRegistry and related types (QuestionHandler, PendingQuestion) from question.go -- nothing reads Pool.Questions or uses the registry. Remove NewQuestionRegistry() call from NewPool and the Questions field from Pool. Remove the now-superfluous registry tests; keep stream/parse helpers which are still used by the claude runner. Fix 2: Check scanner.Err() after the parseStream loop so I/O errors from the scanner are not silently swallowed when streamErr is still nil. Fix 3: Delete internal/api/changestats.go -- the parseChangestatFromFile and parseChangestatFromOutput wrappers were only needed to support processResult(), which no longer calls them; they are unreachable dead code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04fix: ShipStory — use context.Background() to avoid request context ↵Peter Stone
cancellation killing deploy
2026-04-04feat: story ship gate — explicit POST /api/stories/{id}/ship; remove ↵Peter Stone
auto-deploy - checkStoryCompletion now guards against re-running on already-SHIPPABLE stories and no longer auto-triggers triggerStoryDeploy on completion - New Pool.ShipStory method validates SHIPPABLE state then fires triggerStoryDeploy - POST /api/stories/{id}/ship route registered and handleShipStory handler added - Two new tests: 202 for SHIPPABLE story, 409 for non-SHIPPABLE story Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04fix: executor checker — close race, test flakiness, checker report for all ↵Peter Stone
failure states Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04feat: spawn checker task on READY; auto-accept on pass; attach report on failPeter Stone
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03fix: remove drain-lock circuit breaker that halted all executions after 3 ↵Peter Stone
consecutive failures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: story stays PENDING when all subtasks complete but parent task stays QUEUEDPeter Stone
When a story is approved with pre-created subtasks, parent tasks are QUEUED but never run. Their subtasks complete, but: - maybeUnblockParent only handled BLOCKED parents, not QUEUED ones - checkStoryCompletion required ALL tasks (incl. subtasks) to be done Fixes: - maybeUnblockParent now also promotes QUEUED parents to READY when all subtasks are COMPLETED - checkStoryCompletion only checks top-level tasks (parent_task_id="") - RecoverStaleBlocked now also scans QUEUED parents on startup and triggers checkStoryCompletion if it promotes them - Add QUEUED→READY to valid state transitions (subtask delegation path) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26feat: graceful shutdown — drain workers before exit (default 3m timeout)Peter Stone
- Add workerWg to Pool; Shutdown() closes workCh and waits for all in-flight execute/executeResume goroutines to finish - Signal handler now shuts down HTTP first, then drains the pool - ShutdownTimeout config field (toml: shutdown_timeout); default 3m - Tests: WaitsForWorkers and TimesOut Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: cancel waiting tasks when dep hits terminal failure (QUEUED→CANCELLED)Peter Stone
QUEUED→FAILED is not a valid state transition. When a dependency enters a terminal failure state, cancel the waiting task instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: story branch push to bare repo; drain at 3 consecutive failuresPeter Stone
createStoryBranch was pushing to 'origin' which doesn't exist — branches never landed in the bare repo so agents couldn't clone them. Now uses the project's RemoteURL (bare repo path) directly for fetch and push. Raise drain threshold from 2 to 3 consecutive failures to reduce false positives from transient errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: resolve dep-chain deadlock; broadcast task_started for UI visibilityPeter Stone
With maxPerAgent=1, tasks with DependsOn were entering waitForDependencies while holding the per-agent slot, preventing the dependency from ever running. Fix: check deps before taking the slot. If not ready, requeue without holding activePerAgent. Also accept StateReady (leaf tasks) as a satisfied dependency, not just StateCompleted. Add startedCh to pool and broadcast task_started WebSocket event when a task transitions to RUNNING, so the UI immediately shows the running state during the clone phase instead of waiting for completion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: expose drained state in agent status API; fix AgentEvent JSON casingPeter Stone
AgentStatusInfo was missing drained field so UI couldn't show drain lock. AgentEvent had no JSON tags so ev.agent/event/timestamp were undefined in the stats timeline. UI now shows "Drain locked" card state with undrain CTA. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25chore: replace all master branch references with mainPeter Stone
- executor.go: merge story branch to main before deploy - container.go: error messages reference git push origin main - api/stories.go: create story branch from origin/main (drop master fallback) - executor_test.go: test setup uses main branch Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24feat: merge story branch to master before deploy, add doot project to registryPeter Stone
- triggerStoryDeploy: fetch/checkout/merge --no-ff/push before running deploy script (ADR-007) - executor_test: TestPool_StoryDeploy_MergesStoryBranch proves merge happens - seed.go: add doot project with deploy script; wire claudomator deploy script Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24Merge branch 'master' of /site/git.terst.org/repos/claudomatorPeter Stone
2026-03-24feat: validation result transitions story to REVIEW_READY or NEEDS_FIX (ADR-007)Claudomator Agent
Add checkValidationResult which inspects the final task.State of a completed validation task and updates the story to REVIEW_READY (pass) or NEEDS_FIX (fail). Wire into handleRunResult so stories in VALIDATING state are dispatched to checkValidationResult instead of checkStoryCompletion, covering both success and FAILED terminal paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24feat: auto-create validation task on story DEPLOYED (ADR-007)Claudomator Agent
2026-03-24feat: trigger deploy script on SHIPPABLE → DEPLOYED (ADR-007)Claudomator Agent
Add triggerStoryDeploy to Pool: fetches story's project, runs its DeployScript via exec.CommandContext, and advances story to DEPLOYED on success. Wire into checkStoryCompletion with go p.triggerStoryDeploy after the SHIPPABLE transition. Covered by TestPool_StoryDeploy_RunsDeployScript. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24fix: resolve merge conflict — integrate agent's story-aware ContainerRunnerPeter Stone
Agent added: Store on ContainerRunner (direct story/project lookup), --reference clone for speed, explicit story branch push, checkStoryCompletion → SHIPPABLE. My additions: BranchName on Task as fallback when Store is nil, tests updated to match checkout-after-clone approach. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24feat: clone story branch in ContainerRunner (ADR-007)Peter Stone
- Add BranchName field to task.Task (populated from story at execution time) - Add GetStory to executor Store interface; resolve BranchName from story in both execute() and executeResume() parallel to RepositoryURL resolution - Pass --branch <name> to git clone when BranchName is set; default clone otherwise Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23feat: Phase 4 — story-aware execution, branch clone, story completion ↵Claudomator Agent
check, deployment status - ContainerRunner: add Store field; clone with --reference when story has a local project path; checkout story branch after clone; push to story branch instead of HEAD - executor.Store interface: add GetStory, ListTasksByStory, UpdateStoryStatus - Pool.handleRunResult: trigger checkStoryCompletion when a story task succeeds - Pool.checkStoryCompletion: transitions story to SHIPPABLE when all tasks done - serve.go: wire Store into each ContainerRunner - stories.go: update createStoryBranch to fetch+checkout from origin/master base; add GET /api/stories/{id}/deployment-status endpoint - server.go: register deployment-status route - Tests: TestPool_CheckStoryCompletion_AllComplete/PartialComplete, TestHandleStoryDeploymentStatus Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23feat: populate RepositoryURL from project registry in executor (ADR-007)Peter Stone
- Add GetProject to Store interface used by executor - Resolve RepositoryURL from project registry when task.RepositoryURL is empty - Call SeedProjects at server startup so the project registry is populated - Add GetProject stub to minimalMockStore in executor tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22fix: make requeueDelay configurable to fix test timeout in ↵Peter Stone
TestPool_MaxPerAgent_BlocksSecondTask
2026-03-21feat: executor reliability — per-agent limit, drain gate, pre-flight ↵Claudomator Agent
creds, auth recovery - maxPerAgent=1: only 1 in-flight execution per agent type at a time; excess tasks are requeued after 30s - Drain gate: after 2 consecutive failures the agent is drained and a question is set on the task; reset on first success; POST /api/pool/agents/{agent}/undrain to acknowledge - Pre-flight credential check: verify .credentials.json and .claude.json exist in agentHome before spinning up a container - Auth error auto-recovery: detect auth errors (Not logged in, OAuth token has expired, etc.) and retry once after running sync-credentials and re-copying fresh credentials - Extracted runContainer() helper from ContainerRunner.Run() to support the retry flow - Wire CredentialSyncCmd in serve.go for all three ContainerRunner instances - Tests: TestPool_MaxPerAgent_*, TestPool_ConsecutiveFailures_*, TestPool_Undrain_*, TestContainerRunner_Missing{Credentials,Settings}_FailsFast, TestIsAuthError_*, TestContainerRunner_AuthError_SyncsAndRetries Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19feat: agent status dashboard with availability timeline and Gemini quota ↵Peter Stone
detection - Detect Gemini TerminalQuotaError (daily quota) as BUDGET_EXCEEDED, not generic FAILED - Surface container stderr tail in error so quota/rate-limit classifiers can match it - Add agent_events table to persist rate-limit start/recovery events across restarts - Add GET /api/agents/status endpoint returning live agent state + 24h event history - Stats dashboard: agent status cards, 24h availability timeline, per-run execution table Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16fix: clean up activePerAgent before sending to resultChClaudomator Agent
Move activePerAgent decrement/deletion out of execute() and executeResume() defers and into the code paths immediately before each resultCh send (handleRunResult and early-return paths). This guarantees that when a result consumer reads from the channel the map is already clean, eliminating a race between defer and result receipt. Remove the polling loop from TestPool_ActivePerAgent_DeletesZeroEntries and check the map state immediately after reading the result instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15fix: promote stale BLOCKED parent tasks to READY on server startupPeter Stone
When the server restarts after all subtasks complete, the parent task was left stuck in BLOCKED state because maybeUnblockParent only fires during a live executor run. RecoverStaleBlocked() scans all BLOCKED tasks on startup and re-evaluates them using the existing logic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15feat: fix task failures via sandbox improvements and display commits in Web UIPeter Stone
- Fix ephemeral sandbox deletion issue by passing $CLAUDOMATOR_PROJECT_DIR to agents and using it for subtask project_dir. - Implement sandbox autocommit in teardown to prevent task failures from uncommitted work. - Track git commits created during executions and persist them in the DB. - Display git commits and changestats badges in the Web UI execution history. - Add badge counts to Web UI tabs for Interrupted, Ready, and Running states. - Improve scripts/next-task to handle QUEUED tasks and configurable DB path.
2026-03-14feat(Phase4): add file changes for changestats executor wiringClaude Sonnet 4.6
Files changed: CLAUDE.md, internal/api/changestats.go, internal/executor/executor.go, internal/executor/executor_test.go, internal/task/changestats.go (new) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14fix: surface agent stderr, auto-retry restart-killed tasks, handle stale ↵Peter Stone
sandboxes #1 - Diagnostics: tailFile() reads last 20 lines of subprocess stderr and appends to error message when claude/gemini exits non-zero. Previously all exit-1 failures were opaque; now the error_msg carries the actual subprocess output. #4 - Restart recovery: RecoverStaleRunning() now re-queues tasks after marking them FAILED, so tasks killed by a server restart automatically retry on the next boot rather than staying permanently FAILED. #2 - Stale sandbox: If a resume execution's preserved SandboxDir no longer exists (e.g. /tmp purge after reboot), clone a fresh sandbox instead of failing immediately with "no such file or directory". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14feat: persist agent assignment before task executionClaudomator Agent
- Add UpdateTaskAgent to Store interface and DB implementation - Call UpdateTaskAgent in Pool.execute to persist assigned agent/model to database before the runner starts - Update runTask in app.js to pass selected agent as query param Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14feat: add agent selector to UI and support direct agent assignmentPeter Stone
- Added an agent selector (Auto, Claude, Gemini) to the Start Next Task button. - Updated the backend to pass query parameters as environment variables to scripts. - Modified the executor pool to skip classification when a specific agent is requested. - Added --agent flag to claudomator start command. - Updated tests to cover the new functionality.
2026-03-13fix: resubmit QUEUED tasks on server startup to prevent them getting stuckPeter Stone
Add Pool.RecoverStaleQueued() that lists all QUEUED tasks from the DB on startup and re-submits them to the in-memory pool. Previously, tasks that were QUEUED when the server restarted would remain stuck indefinitely since only RUNNING tasks were recovered (and marked FAILED). Called in serve.go immediately after RecoverStaleRunning(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12feat: add Resume support for CANCELLED, FAILED, and BUDGET_EXCEEDED tasksClaudomator Agent
Interrupted tasks (CANCELLED, FAILED, BUDGET_EXCEEDED) now support session resume in addition to restart. Both buttons are shown on the task card. - executor: extend resumablePoolStates to include CANCELLED, FAILED, BUDGET_EXCEEDED - api: extend handleResumeTimedOutTask to accept all resumable states with state-specific resume messages; replace hard-coded TIMED_OUT check with a resumableStates map - web: add RESUME_STATES set; render Resume + Restart buttons for interrupted states; TIMED_OUT keeps Resume only - tests: 5 new Go tests (TestResumeInterrupted_*); updated task-actions.test.mjs with 17 tests covering dual-button behaviour Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11fix: resume BLOCKED tasks in preserved sandbox so Claude finds its sessionPeter Stone
When a task ran in a sandbox (/tmp/claudomator-sandbox-*) and went BLOCKED, Claude stored its session under the sandbox path as the project slug. The resume execution was running in project_dir, causing Claude to look for the session in the wrong project directory and fail with "No conversation found". Fix: carry SandboxDir through BlockedError → Execution → resume execution, and run the resume in that directory so the session lookup succeeds. - BlockedError gains SandboxDir field; claude.go sets it on BLOCKED exit - storage.Execution gains SandboxDir (persisted via new sandbox_dir column) - executor.go stores blockedErr.SandboxDir in the execution record - server.go copies SandboxDir from latest execution to the resume execution - claude.go uses e.SandboxDir as working dir for resume when set Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10executor: explicit load balancing — code picks agent, classifier picks modelPeter Stone
pickAgent() deterministically selects the agent with the fewest active tasks, skipping rate-limited agents. The classifier now only selects the model for the pre-assigned agent, so Gemini gets tasks from the start rather than only as a fallback when Claude's quota is exhausted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10fix: ensure tasks are re-classified on manual restartPeter Stone
Updated handleRunTask to use ResetTaskForRetry, which clears the agent type and model. This ensures that manually restarted tasks are always re-classified, allowing the system to switch to a different agent if the previous one is rate-limited. Also improved Claude quota-exhaustion detection.
2026-03-10executor: extract handleRunResult to deduplicate error-classification logicClaudomator Agent
Both execute() and executeResume() shared ~80% identical post-run logic: error classification (BLOCKED, TIMED_OUT, CANCELLED, BUDGET_EXCEEDED, FAILED), state transitions, result emission, and UpdateExecution. Extract this into handleRunResult(ctx, t, exec, err, agentType) on *Pool. Both functions now call it after runner.Run() returns. Also adds TestHandleRunResult_SharedPath which directly exercises the new function via a minimalMockStore, covering FAILED, READY, COMPLETED, and TIMED_OUT classification paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09executor: unblock parent task when all subtasks completeClaudomator Agent
Add maybeUnblockParent helper that transitions a BLOCKED parent task to READY once every subtask is in the COMPLETED state. Called in both execute() and executeResume() immediately after a subtask is marked COMPLETED. Any non-COMPLETED sibling (RUNNING, FAILED, etc.) keeps the parent BLOCKED. Tests added: - TestPool_Submit_LastSubtask_UnblocksParent - TestPool_Submit_NotLastSubtask_ParentStaysBlocked - TestPool_Submit_ParentNotBlocked_NoTransition
2026-03-09executor: BLOCKED→READY for top-level tasks with subtasksClaudomator Agent
When a top-level task (ParentTaskID == "") finishes successfully, check for subtasks before deciding the next state: - subtasks exist → BLOCKED (waiting for subtasks to complete) - no subtasks → READY (existing behavior, unchanged) This applies to both execute() and executeResume(). Adds ListSubtasks to the Store interface. Tests: - TestPool_Submit_TopLevel_WithSubtasks_GoesBlocked - TestPool_Submit_TopLevel_NoSubtasks_GoesReady Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09executor: log errors from all unchecked UpdateTaskState/UpdateTaskQuestion callsClaudomator Agent
All previously ignored errors from p.store.UpdateTaskState() and p.store.UpdateTaskQuestion() in execute() and executeResume() now log with structured context (taskID, state, error). Introduces a Store interface so tests can inject a failing mock store. Adds TestPool_UpdateTaskState_DBError_IsLoggedAndResultDelivered to verify that a DB write failure is logged and the result is still delivered to resultCh. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09executor: strengthen rate-limit avoidance in classifierPeter Stone
Updated isQuotaExhausted to detect more Claude quota messages. Added 'rate limit reached (rejected)' to quota exhausted checks. Strengthened classifier prompt to explicitly forbid selecting rate-limited agents. Improved Pool to set 5h rate limit on quota exhaustion.
2026-03-09executor: fix map leaks in activePerAgent and rateLimitedClaudomator Agent
activePerAgent: delete zero-count entries after decrement so the map doesn't accumulate stale keys for agent types that are no longer active. rateLimited: delete entries whose deadline has passed when reading them (in both the classifier block and the execute() pre-flight), so stale entries are cleaned up on the next check rather than accumulating forever. Both fixes are covered by new regression tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09executor: recover stale RUNNING tasks on server startupPeter Stone
On restart, any tasks in RUNNING state have no active goroutine. RecoverStaleRunning() marks them FAILED (retryable) and closes their open execution records with an appropriate error message. Called once from serve.go after the pool is created.