path: root/internal/executor/claude_test.go
Age  Commit message  Author
2026-05-07  test(executor): verify explicit Claude commits are captured in execRecord  Claude
Adds TestTeardownSandbox_CapturesExplicitCommits to cover the case where the agent explicitly commits changes (no autocommit needed). Previously only the autocommit path was tested; this confirms teardownSandbox populates Commits for any commits ahead of origin.

https://claude.ai/code/session_01G4dT9JBWFFb8xGcSHenzRS
2026-05-03  chore: post-epic cleanup — green test suite, no skips  Claude
Addresses the cleanup queue captured in docs/plans/local-oss-runner.md after the local-OSS-models epic landed. After this commit `go test -race ./...` is green across every package with zero `t.Skip` calls and no excluded tests.

Real bugs fixed:
- claude.go setupSandbox callsites used `sandboxDir, err := ...` which shadowed the outer variable, so BlockedError.SandboxDir was always empty. Resume-after-block was broken for both new and stale-sandbox paths. TestBlockedError_IncludesSandboxDir now exercises the right invariant.
- TestPool_ActivePerAgent_DeletesZeroEntries flake under -race: the cleanup defer in execute()/executeResume() runs AFTER handleRunResult sends on resultCh, so consumers observing a result could see a still-counted activePerAgent entry. Extracted decActiveAgent(agentType, *cleaned) helper; called explicitly before every resultCh send, defer becomes a no-op via the cleaned flag. Verified clean over `go test -race -count=10`.

Test infrastructure made hermetic:
- gitSafe now also passes -c commit.gpgsign=false / -c tag.gpgsign=false so sandbox tests pass on hosts whose global config requires signing.
- Bare repos in tests initialized with `-b main` (HEAD symbolic ref matched to the branch we push) so `git log` after push works.
- TestSandboxCloneSource_FallsBackToOrigin uses a local-FS origin URL, matching sandboxCloneSource's intentional filter against network URLs.
- TestGeminiLogs_ParsedCorrectly URL fixed to the actual log route (/api/executions/{id}/log).

GeminiRunner gap closed (partial):
- parseGeminiStream now walks lines for `result` events, surfacing is_error as an error and total_cost_usd as the float return value.
- GeminiRunner.Run propagates parsed cost to Execution.CostUSD.
- TestParseGeminiStream_ParsesStructuredOutput unskipped.

Notes:
- GeminiRunner is still simulated end-to-end (Run writes hardcoded stream data instead of execing the binary). The result/cost parser now exists; finishing the runner is a smaller, contained follow-up. Kept on the deferred queue.
- Frontend "Local" agent option and a minor storage.db.go logger TODO remain on the deferred queue, both intentionally — neither blocks anything in flight.

https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
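The shadowing bug described in this commit is easy to reproduce in isolation. The following is a minimal sketch, not the project's code: `setupSandbox`, `buggy`, and `fixed` are hypothetical stand-ins, and the sandbox path is made up for illustration.

```go
package main

import "fmt"

// setupSandbox is a hypothetical stand-in for the real function.
func setupSandbox() (string, error) {
	return "/tmp/sandbox-example", nil
}

// buggy mirrors the shadowing pattern: ":=" inside the block declares a new
// sandboxDir scoped to the if body, so the outer variable is never assigned.
func buggy(useSandbox bool) string {
	var sandboxDir string
	if useSandbox {
		sandboxDir, err := setupSandbox()
		if err != nil {
			return ""
		}
		_ = sandboxDir // only the inner copy holds the path
	}
	return sandboxDir
}

// fixed declares err separately and assigns with "=", so the outer
// sandboxDir receives the value.
func fixed(useSandbox bool) string {
	var sandboxDir string
	if useSandbox {
		var err error
		sandboxDir, err = setupSandbox()
		if err != nil {
			return ""
		}
	}
	return sandboxDir
}

func main() {
	fmt.Printf("buggy: %q\n", buggy(true))
	fmt.Printf("fixed: %q\n", fixed(true))
}
```

Any error value built from the outer variable in `buggy` would carry an empty path, which matches the symptom reported for BlockedError.SandboxDir.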
2026-04-28  feat(executor): add LocalRunner and OpenAI-compat LLM client  Claude
Phase 1 of "local OSS models as agents" plan. Adds a third Runner backed by any OpenAI-compatible HTTP server (Ollama, vLLM, LM Studio, llama.cpp), and migrates the Gemini-CLI classifier to route through the same client when configured.

Two-layer split: internal/llm.Client is the workhorse (HTTP, no Pool, no DB) used directly by the classifier and any future internal helper that needs cheap reasoning. internal/executor.LocalRunner is a thin adapter implementing Runner for user-facing tasks. This avoids Pool reentrancy/deadlock when sub-second internal calls fire from inside Pool.execute().

Highlights:
- internal/retry: relocated runWithBackoff/IsRateLimitError/ParseRetryAfter into a shared package reused by executor and llm.
- internal/llm: Chat (non-streaming) and ChatStream (SSE) over /chat/completions with optional bearer auth, json_object response format, retry on 429/503, Retry-After parsing.
- internal/executor/LocalRunner: streams deltas into stdout.log in the same stream-json envelope ClaudeRunner emits, then writes one consolidated assistant block plus a result terminator so existing parsers (extractSummary, ParseChangestatFromOutput) work unchanged.
- internal/executor/Classifier: gains optional LLM field; uses json_object response format (no markdown-fence cleanup needed). Falls back to Gemini-CLI subprocess when LLM is nil.
- Pool.skipClassification: now skips only when the requested agent type is registered, so unknown types still reach the load balancer.
- Storage: additive tokens_in/tokens_out ALTERs on executions; CLI runners record cost_usd as before, LocalRunner records 0 + tokens.
- Config: [local_model] section (endpoint, model, timeout_seconds, default_temperature, api_key). Empty endpoint = no LocalRunner registered, classifier falls back to Gemini.

Pre-existing test issues fixed in passing:
- claude_test.go setupSandbox callsites updated to current signature.
- gemini_test.go TestParseGeminiStream skipped (asserts unimplemented GeminiRunner stream-error parsing; tracked separately).

Plan: docs/plans/local-oss-runner.md.

https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-03-15  feat: run build (Makefile, gradlew, or go build) before sandbox autocommit  Peter Stone
2026-03-15  feat: fix task failures via sandbox improvements and display commits in Web UI  Peter Stone
- Fix ephemeral sandbox deletion issue by passing $CLAUDOMATOR_PROJECT_DIR to agents and using it for subtask project_dir.
- Implement sandbox autocommit in teardown to prevent task failures from uncommitted work.
- Track git commits created during executions and persist them in the DB.
- Display git commits and changestats badges in the Web UI execution history.
- Add badge counts to Web UI tabs for Interrupted, Ready, and Running states.
- Improve scripts/next-task to handle QUEUED tasks and configurable DB path.
2026-03-14  fix: surface agent stderr, auto-retry restart-killed tasks, handle stale sandboxes  Peter Stone
#1 - Diagnostics: tailFile() reads last 20 lines of subprocess stderr and appends to error message when claude/gemini exits non-zero. Previously all exit-1 failures were opaque; now the error_msg carries the actual subprocess output.

#4 - Restart recovery: RecoverStaleRunning() now re-queues tasks after marking them FAILED, so tasks killed by a server restart automatically retry on the next boot rather than staying permanently FAILED.

#2 - Stale sandbox: If a resume execution's preserved SandboxDir no longer exists (e.g. /tmp purge after reboot), clone a fresh sandbox instead of failing immediately with "no such file or directory".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14  fix: trust all directory owners in sandbox git commands  Peter Stone
Sandbox setup runs git commands against project_dir which may be owned by a different OS user, triggering git's 'dubious ownership' error. Fix by passing -c safe.directory=* on all git commands that touch project directories. Also add wildcard to global config for immediate effect on the running server.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
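One way to apply the `-c safe.directory=*` option uniformly is to route every git invocation through a small wrapper. The sketch below assumes that approach; `gitSafeArgs` and this version of `gitSafe` are hypothetical and not taken from the repository, and the project path is a placeholder.

```go
package main

import (
	"fmt"
	"os/exec"
)

// gitSafeArgs prepends the config override. git requires -c options to come
// before the subcommand, so they go at the front of the argument list.
func gitSafeArgs(args ...string) []string {
	return append([]string{"-c", "safe.directory=*"}, args...)
}

// gitSafe builds (but does not run) a git command rooted at dir.
func gitSafe(dir string, args ...string) *exec.Cmd {
	cmd := exec.Command("git", gitSafeArgs(args...)...)
	cmd.Dir = dir
	return cmd
}

func main() {
	cmd := gitSafe("/path/to/project", "status", "--porcelain")
	fmt.Println(cmd.Args)
}
```

A per-command `-c` override affects only that invocation, whereas the global-config wildcard mentioned in the commit applies to every git process run by that user.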
2026-03-14  fix: cancel blocked tasks + auto-complete completion reports  Peter Stone
Two fixes for BLOCKED task issues:

1. Allow BLOCKED → CANCELLED state transition so users can cancel tasks stuck waiting for input. Adds Cancel button to BLOCKED task cards in the UI alongside the question/answer controls.
2. Detect when agents write completion reports to $CLAUDOMATOR_QUESTION_FILE instead of real questions. If the question JSON has no options and no "?" in the text, treat it as a summary (stored on the execution) and fall through to normal completion + sandbox teardown rather than blocking. Also tightened the preamble to make the distinction explicit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
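The completion-report heuristic from the second fix can be expressed in a few lines. This is a sketch only: the JSON field names `text` and `options`, the `question` type, and the function name are assumptions, since the commit message does not give the file's schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// question models the question file; field names are hypothetical.
type question struct {
	Text    string   `json:"text"`
	Options []string `json:"options"`
}

// isCompletionReport applies the rule from the commit message: no options
// and no "?" in the text means the agent wrote a summary, not a question.
func isCompletionReport(raw []byte) (bool, error) {
	var q question
	if err := json.Unmarshal(raw, &q); err != nil {
		return false, err
	}
	return len(q.Options) == 0 && !strings.Contains(q.Text, "?"), nil
}

func main() {
	report := []byte(`{"text": "All tests pass; refactor complete."}`)
	ask := []byte(`{"text": "Which database should I use?", "options": ["sqlite", "postgres"]}`)
	for _, raw := range [][]byte{report, ask} {
		ok, err := isCompletionReport(raw)
		fmt.Println(ok, err)
	}
}
```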
2026-03-11  fix: resume BLOCKED tasks in preserved sandbox so Claude finds its session  Peter Stone
When a task ran in a sandbox (/tmp/claudomator-sandbox-*) and went BLOCKED, Claude stored its session under the sandbox path as the project slug. The resume execution was running in project_dir, causing Claude to look for the session in the wrong project directory and fail with "No conversation found".

Fix: carry SandboxDir through BlockedError → Execution → resume execution, and run the resume in that directory so the session lookup succeeds.

- BlockedError gains SandboxDir field; claude.go sets it on BLOCKED exit
- storage.Execution gains SandboxDir (persisted via new sandbox_dir column)
- executor.go stores blockedErr.SandboxDir in the execution record
- server.go copies SandboxDir from latest execution to the resume execution
- claude.go uses e.SandboxDir as working dir for resume when set

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10  test: sandbox coverage + fix WebSocket races  Peter Stone
executor: add 7 tests for sandboxCloneSource, setupSandbox, and teardownSandbox (uncommitted-changes error, clean-no-commits removal).

api: fix two data races in WebSocket tests — wsPingInterval/Deadline are now captured as locals before goroutine start; maxWsClients is moved from a package-level var into Hub.maxClients (with SetMaxClients method) so concurrent tests don't stomp each other.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10  executor: fix session ID on second block-and-resume cycle  Peter Stone
When a resumed execution is blocked again, SessionID was set to the new exec's own UUID instead of the original ResumeSessionID. The next resume would then pass the wrong --resume argument to claude and fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
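The rule this fix enforces can be stated as a small pure function. The sketch below is an assumption about shape, not the project's code: `sessionIDForBlock` and its parameters are hypothetical names, and the IDs in `main` are placeholders.

```go
package main

import "fmt"

// sessionIDForBlock picks the session ID to record when an execution blocks.
// A resumed execution must keep the original session ID so the next
// --resume call targets the same conversation; a first run uses its own ID.
func sessionIDForBlock(execID, resumeSessionID string) string {
	if resumeSessionID != "" {
		return resumeSessionID
	}
	return execID
}

func main() {
	fmt.Println(sessionIDForBlock("exec-1", ""))       // first block
	fmt.Println(sessionIDForBlock("exec-2", "exec-1")) // blocked again after resume
}
```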
2026-03-09  executor: document kill-goroutine safety and add goroutine-leak test  Claudomator Agent
The pgid-kill goroutine in execOnce() uses a select with both ctx.Done() and the killDone channel. Add a detailed comment explaining why the goroutine cannot block indefinitely: the killDone arm fires unconditionally when cmd.Wait() returns (whether the process exited naturally or was killed), so the goroutine always exits before execOnce() returns.

Add TestExecOnce_NoGoroutineLeak_OnNaturalExit to verify this: it samples runtime.NumGoroutine() before and after execOnce() with a no-op binary ("true") and a background context (never cancelled), asserting no net goroutine growth.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08  merge: pull latest from master and resolve conflicts  Peter Stone
- Resolve conflicts in API server, CLI, and executor.
- Maintain Gemini classification and assignment logic.
- Update UI to use generic agent config and project_dir.
- Fix ProjectDir/WorkingDir inconsistencies in Gemini runner.
- All tests passing after merge.
2026-03-08  feat: rename working_dir→project_dir; git sandbox execution  Peter Stone
- ClaudeConfig.WorkingDir → ProjectDir (json: project_dir)
- UnmarshalJSON fallback reads legacy working_dir from DB records
- New executions with project_dir clone into a temp sandbox via git clone --local
- Non-git project_dirs get git init + initial commit before clone
- After success: verify clean working tree, merge --ff-only back to project_dir, remove sandbox
- On failure/BLOCKED: sandbox preserved, path included in error message
- Resume executions run directly in project_dir (no re-clone)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08  test(executor): update claude_test.go to use AgentConfig  Peter Stone
2026-03-06  feat: blocked task state for agent questions via session resume  Peter Stone
When an agent needs user input it writes a question to $CLAUDOMATOR_QUESTION_FILE and exits. The runner detects the file and returns BlockedError; the pool transitions the task to BLOCKED and stores the question JSON on the task record.

The user answers via POST /api/tasks/{id}/answer. The server looks up the claude session_id from the most recent execution and submits a resume execution (claude --resume <session-id> "<answer>"), freeing the executor slot entirely while waiting.

Changes:
- task: add StateBlocked, transitions RUNNING→BLOCKED, BLOCKED→QUEUED
- storage: add session_id to executions, question_json to tasks; add GetLatestExecution and UpdateTaskQuestion methods
- executor: BlockedError type; ClaudeRunner pre-assigns --session-id, sets CLAUDOMATOR_QUESTION_FILE env var, detects question file on exit; buildArgs handles --resume mode; Pool.SubmitResume for resume path
- api: handleAnswerQuestion rewritten to create resume execution
- preamble: add question protocol instructions for agents
- web: BLOCKED state badge (indigo), question text + option buttons or free-text input with Submit on the task card footer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
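The two transitions added by this commit fit naturally into a lookup table. The following sketch covers only the transitions named in the task bullet; the `State` type, the `StateQueued` and `StateRunning` constant names, the table layout, and `canTransition` are assumptions, and the real state machine has more states and edges.

```go
package main

import "fmt"

// State is a task lifecycle state.
type State string

const (
	StateQueued  State = "QUEUED"
	StateRunning State = "RUNNING"
	StateBlocked State = "BLOCKED"
)

// allowed lists only the transitions this commit introduces.
var allowed = map[State][]State{
	StateRunning: {StateBlocked}, // agent asked a question
	StateBlocked: {StateQueued},  // user answered; resume is queued
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to State) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(StateRunning, StateBlocked))
	fmt.Println(canTransition(StateBlocked, StateRunning))
}
```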
2026-03-05  executor: default permission_mode to bypassPermissions  Peter Stone
claudomator runs tasks unattended; prompting for write permission always stalls execution. Any task without an explicit permission_mode now gets --permission-mode bypassPermissions automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
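The defaulting rule amounts to a single branch when the argument list is built. In this sketch the helper name `permissionArgs` is hypothetical, and `"plan"` in `main` stands in for any explicitly configured mode.

```go
package main

import "fmt"

// permissionArgs returns the permission flag pair for the subprocess.
// An empty mode falls back to bypassPermissions so unattended runs never
// stall on a permission prompt.
func permissionArgs(mode string) []string {
	if mode == "" {
		mode = "bypassPermissions"
	}
	return []string{"--permission-mode", mode}
}

func main() {
	fmt.Println(permissionArgs(""))
	fmt.Println(permissionArgs("plan"))
}
```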
2026-03-03  Fix working_dir failures: validate path early, remove hardcoded /root  Peter Stone
executor/claude.go: stat working_dir before cmd.Start() so a missing or inaccessible directory surfaces as a clear error ("working_dir \"/bad/path\": no such file or directory") rather than an opaque chdir failure wrapped in "starting claude".

api/elaborate.go: replace the hardcoded /root/workspace/claudomator path with buildElaboratePrompt(workDir) which injects the server's actual working directory (from os.Getwd() at startup). Empty workDir tells the model to leave working_dir blank.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03  Executor: dependency waiting and planning preamble  Peter Stone
- Pool.waitForDependencies polls depends_on task states before running
- ClaudeRunner prepends planningPreamble to task instructions to prompt a plan-then-implement approach
- Rate-limit test helper updated to match new ClaudeRunner signature

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24  Add --verbose flag to Claude subprocess invocation  Peter Stone
Ensures richer stream-json output for cost parsing and debugging. Adds a test to verify --verbose is always present in built args.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-08  Rename Go module to github.com/thepeterstone/claudomator  Peter Stone
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08  Initial project: task model, executor, API server, CLI, storage, reporter  Peter Stone
Claudomator automation toolkit for Claude Code with:
- Task model with YAML parsing, validation, state machine (49 tests, 0 races)
- SQLite storage for tasks and executions
- Executor pool with bounded concurrency, timeout, cancellation
- REST API + WebSocket for mobile PWA integration
- Webhook/multi-notifier system
- CLI: init, run, serve, list, status commands
- Console, JSON, HTML reporters with cost tracking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>