claudomator.git - claudomator — task automation server

Age	Commit message (Collapse)	Author
2026-05-13	merge: integrate github/main — LocalRunner, real GeminiRunner, llm clientHEAD main	Peter Stone
	Merges 12 commits from github/main (formerly master) that were developed independently. Key additions: - LocalRunner: OpenAI-compatible local LLM execution (Ollama, LM Studio) - Real GeminiRunner with full sandbox parity to ClaudeRunner - llm.Client for enriching CI failures and elaboration via local model - retry.ParseRetryAfter moved to shared package - tokens_in/tokens_out columns in executions table Conflict resolutions: - Kept local main's VAPID/push, stories, projects, agent events schema - Merged both sets of Config fields (local + LocalModel from github/main) - Unified activePerAgent accounting (decActiveAgent helper) - Removed duplicate helpers from claude.go (now in helpers.go) - Fixed double-decrement bug in handleRunResult vs decActiveAgent Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03	chore: post-epic cleanup — green test suite, no skips	Claude
	Addresses the cleanup queue captured in docs/plans/local-oss-runner.md after the local-OSS-models epic landed. After this commit `go test -race ./...` is green across every package with zero `t.Skip` calls and no excluded tests. Real bugs fixed: - claude.go setupSandbox callsites used `sandboxDir, err := ...` which shadowed the outer variable, so BlockedError.SandboxDir was always empty. Resume-after-block was broken for both new and stale-sandbox paths. TestBlockedError_IncludesSandboxDir now exercises the right invariant. - TestPool_ActivePerAgent_DeletesZeroEntries flake under -race: the cleanup defer in execute()/executeResume() runs AFTER handleRunResult sends on resultCh, so consumers observing a result could see a still-counted activePerAgent entry. Extracted decActiveAgent(agentType, *cleaned) helper; called explicitly before every resultCh send, defer becomes a no-op via the cleaned flag. Verified clean over `go test -race -count=10`. Test infrastructure made hermetic: - gitSafe now also passes -c commit.gpgsign=false / -c tag.gpgsign=false so sandbox tests pass on hosts whose global config requires signing. - Bare repos in tests initialized with `-b main` (HEAD symbolic ref matched to the branch we push) so `git log` after push works. - TestSandboxCloneSource_FallsBackToOrigin uses a local-FS origin URL, matching sandboxCloneSource's intentional filter against network URLs. - TestGeminiLogs_ParsedCorrectly URL fixed to the actual log route (/api/executions/{id}/log). GeminiRunner gap closed (partial): - parseGeminiStream now walks lines for `result` events, surfacing is_error as an error and total_cost_usd as the float return value. - GeminiRunner.Run propagates parsed cost to Execution.CostUSD. - TestParseGeminiStream_ParsesStructuredOutput unskipped. Notes: - GeminiRunner is still simulated end-to-end (Run writes hardcoded stream data instead of execing the binary). The result/cost parser now exists; finishing the runner is a smaller, contained follow-up. Kept on the deferred queue. - Frontend "Local" agent option and a minor storage.db.go logger TODO remain on the deferred queue, both intentionally — neither blocks anything in flight. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-04-28	feat(executor): add LocalRunner and OpenAI-compat LLM client	Claude
	Phase 1 of "local OSS models as agents" plan. Adds a third Runner backed by any OpenAI-compatible HTTP server (Ollama, vLLM, LM Studio, llama.cpp), and migrates the Gemini-CLI classifier to route through the same client when configured. Two-layer split: internal/llm.Client is the workhorse (HTTP, no Pool, no DB) used directly by the classifier and any future internal helper that needs cheap reasoning. internal/executor.LocalRunner is a thin adapter implementing Runner for user-facing tasks. This avoids Pool reentrancy/deadlock when sub-second internal calls fire from inside Pool.execute(). Highlights: - internal/retry: relocated runWithBackoff/IsRateLimitError/ParseRetryAfter into a shared package reused by executor and llm. - internal/llm: Chat (non-streaming) and ChatStream (SSE) over /chat/completions with optional bearer auth, json_object response format, retry on 429/503, Retry-After parsing. - internal/executor/LocalRunner: streams deltas into stdout.log in the same stream-json envelope ClaudeRunner emits, then writes one consolidated assistant block plus a result terminator so existing parsers (extractSummary, ParseChangestatFromOutput) work unchanged. - internal/executor/Classifier: gains optional LLM field; uses json_object response format (no markdown-fence cleanup needed). Falls back to Gemini-CLI subprocess when LLM is nil. - Pool.skipClassification: now skips only when the requested agent type is registered, so unknown types still reach the load balancer. - Storage: additive tokens_in/tokens_out ALTERs on executions; CLI runners record cost_usd as before, LocalRunner records 0 + tokens. - Config: [local_model] section (endpoint, model, timeout_seconds, default_temperature, api_key). Empty endpoint = no LocalRunner registered, classifier falls back to Gemini. Pre-existing test issues fixed in passing: - claude_test.go setupSandbox callsites updated to current signature. - gemini_test.go TestParseGeminiStream skipped (asserts unimplemented GeminiRunner stream-error parsing; tracked separately). Plan: docs/plans/local-oss-runner.md. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-03-18	fix: comprehensive addressing of container execution review feedback	Peter Stone
	- Fix Critical Bug 1: Only remove workspace on success, preserve on failure/BLOCKED. - Fix Critical Bug 2: Use correct Claude flag (--resume) and pass instructions via file. - Fix Critical Bug 3: Actually mount and use the instructions file in the container. - Address Design Issue 4: Implement Resume/BLOCKED detection and host-side workspace re-use. - Address Design Issue 5: Consolidate RepositoryURL to Task level and fix API fallback. - Address Design Issue 6: Make agent images configurable per runner type via CLI flags. - Address Design Issue 7: Secure API keys via .claudomator-env file and --env-file flag. - Address Code Quality 8: Add unit tests for ContainerRunner arg construction. - Address Code Quality 9: Fix indentation regression in app.js. - Address Code Quality 10: Clean up orphaned Claude/Gemini runner files and move helpers. - Fix tests: Update server_test.go and executor_test.go to work with new model.
2026-03-16	feat: add web push notifications and file drop	Peter Stone
	Web Push: - WebPushNotifier with VAPID auth; urgency mapped to event type (BLOCKED=urgent, FAILED=high, COMPLETED=low) - Auto-generates VAPID keys on first serve, persists to config file - push_subscriptions table in SQLite (upsert by endpoint) - GET /api/push/vapid-key, POST/DELETE /api/push/subscribe endpoints - Service worker (sw.js) handles push events and notification clicks - Notification bell button in web UI; subscribes on click File Drop: - GET /api/drops, GET /api/drops/{filename}, POST /api/drops - Persistent ~/.claudomator/drops/ directory - CLAUDOMATOR_DROP_DIR env var passed to agent subprocesses - Drops tab (📁) in web UI with file listing and download links Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16	fix: repair test regressions and add pre-commit/pre-push verification gates	Peter Stone
	Fix four pre-existing bugs exposed after resolving a build failure: - sandboxCloneSource: accept any URL scheme for origin remote (was filtering out https://) - setupSandbox callers: fix := shadow variable so sandboxDir is set on BlockedError - parseGeminiStream: parse result lines to return execution errors and cost - TestElaborateTask_InvalidJSONFromClaude: stub Gemini fallback so test is hermetic Add verification infrastructure: - scripts/verify: runs go build + go test -race, used by hooks and deploy - scripts/hooks/pre-commit: blocks commits that don't compile - scripts/hooks/pre-push: blocks pushes where tests fail - scripts/install-hooks: symlinks version-controlled hooks into .git/hooks/ - scripts/deploy: runs scripts/verify before building the binary Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16	fix: permission denied and host key verification errors; add gemini ↵	Peter Stone
	elaboration fallback
2026-03-15	feat: run build (Makefile, gradlew, or go build) before sandbox autocommit	Peter Stone

2026-03-15	feat: fix task failures via sandbox improvements and display commits in Web UI	Peter Stone
	- Fix ephemeral sandbox deletion issue by passing $CLAUDOMATOR_PROJECT_DIR to agents and using it for subtask project_dir. - Implement sandbox autocommit in teardown to prevent task failures from uncommitted work. - Track git commits created during executions and persist them in the DB. - Display git commits and changestats badges in the Web UI execution history. - Add badge counts to Web UI tabs for Interrupted, Ready, and Running states. - Improve scripts/next-task to handle QUEUED tasks and configurable DB path.
2026-03-14	fix: surface agent stderr, auto-retry restart-killed tasks, handle stale ↵	Peter Stone
	sandboxes #1 - Diagnostics: tailFile() reads last 20 lines of subprocess stderr and appends to error message when claude/gemini exits non-zero. Previously all exit-1 failures were opaque; now the error_msg carries the actual subprocess output. #4 - Restart recovery: RecoverStaleRunning() now re-queues tasks after marking them FAILED, so tasks killed by a server restart automatically retry on the next boot rather than staying permanently FAILED. #2 - Stale sandbox: If a resume execution's preserved SandboxDir no longer exists (e.g. /tmp purge after reboot), clone a fresh sandbox instead of failing immediately with "no such file or directory". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14	fix: trust all directory owners in sandbox git commands	Peter Stone
	Sandbox setup runs git commands against project_dir which may be owned by a different OS user, triggering git's 'dubious ownership' error. Fix by passing -c safe.directory=* on all git commands that touch project directories. Also add wildcard to global config for immediate effect on the running server. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14	fix: cancel blocked tasks + auto-complete completion reports	Peter Stone
	Two fixes for BLOCKED task issues: 1. Allow BLOCKED → CANCELLED state transition so users can cancel tasks stuck waiting for input. Adds Cancel button to BLOCKED task cards in the UI alongside the question/answer controls. 2. Detect when agents write completion reports to $CLAUDOMATOR_QUESTION_FILE instead of real questions. If the question JSON has no options and no "?" in the text, treat it as a summary (stored on the execution) and fall through to normal completion + sandbox teardown rather than blocking. Also tightened the preamble to make the distinction explicit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13	feat: resume support, summary extraction, and task state improvements	Peter Stone
	- Extend Resume to CANCELLED, FAILED, and BUDGET_EXCEEDED tasks - Add summary extraction from agent stdout stream-json output - Fix storage: persist stdout/stderr/artifact_dir paths in UpdateExecution - Clear question_json on ResetTaskForRetry - Resume BLOCKED tasks in preserved sandbox so Claude finds its session - Add planning preamble: CLAUDOMATOR_SUMMARY_FILE env var + summary step - Update ADR-002 with new state transitions - UI style improvements Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11	fix: resume BLOCKED tasks in preserved sandbox so Claude finds its session	Peter Stone
	When a task ran in a sandbox (/tmp/claudomator-sandbox-*) and went BLOCKED, Claude stored its session under the sandbox path as the project slug. The resume execution was running in project_dir, causing Claude to look for the session in the wrong project directory and fail with "No conversation found". Fix: carry SandboxDir through BlockedError → Execution → resume execution, and run the resume in that directory so the session lookup succeeds. - BlockedError gains SandboxDir field; claude.go sets it on BLOCKED exit - storage.Execution gains SandboxDir (persisted via new sandbox_dir column) - executor.go stores blockedErr.SandboxDir in the execution record - server.go copies SandboxDir from latest execution to the resume execution - claude.go uses e.SandboxDir as working dir for resume when set Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10	executor: fix session ID on second block-and-resume cycle	Peter Stone
	When a resumed execution is blocked again, SessionID was set to the new exec's own UUID instead of the original ResumeSessionID. The next resume would then pass the wrong --resume argument to claude and fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09	executor: document kill-goroutine safety and add goroutine-leak test	Claudomator Agent
	The pgid-kill goroutine in execOnce() uses a select with both ctx.Done() and the killDone channel. Add a detailed comment explaining why the goroutine cannot block indefinitely: the killDone arm fires unconditionally when cmd.Wait() returns (whether the process exited naturally or was killed), so the goroutine always exits before execOnce() returns. Add TestExecOnce_NoGoroutineLeak_OnNaturalExit to verify this: it samples runtime.NumGoroutine() before and after execOnce() with a no-op binary ("true") and a background context (never cancelled), asserting no net goroutine growth. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09	executor: strengthen rate-limit avoidance in classifier	Peter Stone
	Updated isQuotaExhausted to detect more Claude quota messages. Added 'rate limit reached (rejected)' to quota exhausted checks. Strengthened classifier prompt to explicitly forbid selecting rate-limited agents. Improved Pool to set 5h rate limit on quota exhaustion.
2026-03-09	executor: fix sandbox teardown — remove working copy pull, retry push on ↵	Peter Stone
	concurrent rejection - Remove git pull into project_dir: working copy is the developer workspace and should be pulled manually; www-data can't write to root-owned .git/objects - On non-fast-forward push rejection (concurrent task pushed first), fetch and rebase then retry once instead of failing the entire task
2026-03-09	executor: fix Claude rate-limit detection and prioritize Gemini when limited	Peter Stone
	Updated parseStream to detect 'rate_limit_event' and 'assistant' error:rate_limit messages from the Claude CLI. Updated Classifier to strongly prefer non-rate-limited agents. Added logging to Pool to track rate-limit status during classification.
2026-03-08	executor: push sandbox commits via bare repo, pull into working copy	Peter Stone
	Instead of git fetch/merge INTO the working copy (which fails with mixed-owner .git/objects), clone FROM a bare repo, push BACK to it, then pull into the working copy: sandbox clone ← bare repo (local remote or origin) agent commits in sandbox git push sandbox → bare repo git pull bare repo → working copy sandboxCloneSource() prefers a remote named "local" (local bare repo), then "origin", then falls back to the working copy path. Set up: git remote add local /site/git.terst.org/repos/claudomator.git The bare repo was created with: git clone --bare /workspace/claudomator Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08	executor: fix sandbox git fetch + inject prior failure history	Peter Stone
	Fix: use file:// prefix in git fetch during sandbox teardown to force pack-protocol transfer. The local optimization uses hard links which fail across devices and with mixed-owner object stores. Feature: before running a task, query prior failed/timed-out executions and prepend their error messages to the agent's --append-system-prompt. This tells the agent what went wrong in previous attempts so it doesn't repeat the same mistakes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08	fix(executor): use --no-hardlinks for sandbox git clone	Peter Stone
	git clone --local fails with "Invalid cross-device link" when /workspace and /tmp are on different filesystems. --no-hardlinks forces object copying instead, which works across devices. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08	merge: pull latest from master and resolve conflicts	Peter Stone
	- Resolve conflicts in API server, CLI, and executor. - Maintain Gemini classification and assignment logic. - Update UI to use generic agent config and project_dir. - Fix ProjectDir/WorkingDir inconsistencies in Gemini runner. - All tests passing after merge.
2026-03-08	feat: rename working_dir→project_dir; git sandbox execution	Peter Stone
	- ClaudeConfig.WorkingDir → ProjectDir (json: project_dir) - UnmarshalJSON fallback reads legacy working_dir from DB records - New executions with project_dir clone into a temp sandbox via git clone --local - Non-git project_dirs get git init + initial commit before clone - After success: verify clean working tree, merge --ff-only back to project_dir, remove sandbox - On failure/BLOCKED: sandbox preserved, path included in error message - Resume executions run directly in project_dir (no re-clone) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08	refactor(executor): update runners and tests for generic agents	Peter Stone

2026-03-08	fix: detect quota exhaustion from stream; map to BUDGET_EXCEEDED not FAILED	Peter Stone
	When claude hits the 5-hour usage limit it exits 1. execOnce was returning the generic "exit status 1" error, hiding the real cause from the retry loop and the task state machine. Fix: - execOnce now surfaces streamErr when it indicates rate limiting or quota exhaustion, so callers see the actual message. - New isQuotaExhausted() detects "hit your limit" messages — these are not retried (retrying a depleted 5h bucket wastes nothing but is pointless), and map to BUDGET_EXCEEDED in both execute/executeResume. - isRateLimitError() remains for transient throttling (429/overloaded), which continues to trigger exponential backoff retries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06	feat: blocked task state for agent questions via session resume	Peter Stone
	When an agent needs user input it writes a question to $CLAUDOMATOR_QUESTION_FILE and exits. The runner detects the file and returns BlockedError; the pool transitions the task to BLOCKED and stores the question JSON on the task record. The user answers via POST /api/tasks/{id}/answer. The server looks up the claude session_id from the most recent execution and submits a resume execution (claude --resume <session-id> "<answer>"), freeing the executor slot entirely while waiting. Changes: - task: add StateBlocked, transitions RUNNING→BLOCKED, BLOCKED→QUEUED - storage: add session_id to executions, question_json to tasks; add GetLatestExecution and UpdateTaskQuestion methods - executor: BlockedError type; ClaudeRunner pre-assigns --session-id, sets CLAUDOMATOR_QUESTION_FILE env var, detects question file on exit; buildArgs handles --resume mode; Pool.SubmitResume for resume path - api: handleAnswerQuestion rewritten to create resume execution - preamble: add question protocol instructions for agents - web: BLOCKED state badge (indigo), question text + option buttons or free-text input with Submit on the task card footer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05	executor: default permission_mode to bypassPermissions	Peter Stone
	claudomator runs tasks unattended; prompting for write permission always stalls execution. Any task without an explicit permission_mode now gets --permission-mode bypassPermissions automatically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05	executor: persist log paths at execution create time, not just at end	Peter Stone
	Add LogPather interface; ClaudeRunner implements it via ExecLogDir(). Pool pre-populates stdout_path/stderr_path/artifact_dir on the execution record before CreateExecution, so paths are in the DB from the moment a task starts running. ClaudeRunner.Run() skips path assignment when already set by the pool. Also update scripts/debug-execution to derive paths from the known convention (<data-dir>/executions/<exec-id>/) as a fallback for historical records that predate this change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05	executor: detect stream-level failures when claude exits 0	Peter Stone
	Rename streamAndParseCost → parseStream (returns float64, error). Detect two failure modes that claude reports via exit 0: - result message with is_error:true - tool_result permission denial ("haven't granted it yet") Also fix cost extraction to read total_cost_usd from the result message (the actual field name), keeping cost_usd as legacy fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05	Rescue work from claudomator-work: question/answer, ratelimit, start-next-task	Peter Stone
	Merges features developed in /site/doot.terst.org/claudomator-work (a stale clone) into the canonical repo: - executor: QuestionRegistry for human-in-the-loop answers, rate limit detection and exponential backoff retry (ratelimit.go, question.go) - executor/claude.go: process group isolation (SIGKILL orphans on cancel), os.Pipe for reliable stdout drain, backoff retry on rate limits - api/scripts.go: POST /api/scripts/start-next-task handler - api/server.go: startNextTaskScript field, answer-question route, BroadcastQuestion for WebSocket question events - web: Cancel/Restart buttons, question banner UI, log viewer, validate section, WebSocket auto-connect All tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03	Fix working_dir failures: validate path early, remove hardcoded /root	Peter Stone
	executor/claude.go: stat working_dir before cmd.Start() so a missing or inaccessible directory surfaces as a clear error ("working_dir \"/bad/path\": no such file or directory") rather than an opaque chdir failure wrapped in "starting claude". api/elaborate.go: replace the hardcoded /root/workspace/claudomator path with buildElaboratePrompt(workDir) which injects the server's actual working directory (from os.Getwd() at startup). Empty workDir tells the model to leave working_dir blank. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03	Executor: dependency waiting and planning preamble	Peter Stone
	- Pool.waitForDependencies polls depends_on task states before running - ClaudeRunner prepends planningPreamble to task instructions to prompt a plan-then-implement approach - Rate-limit test helper updated to match new ClaudeRunner signature Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24	Add --verbose flag to Claude subprocess invocation	Peter Stone
	Ensures richer stream-json output for cost parsing and debugging. Adds a test to verify --verbose is always present in built args. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-08	Rename Go module to github.com/thepeterstone/claudomator	Peter Stone
	Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08	Initial project: task model, executor, API server, CLI, storage, reporter	Peter Stone
	Claudomator automation toolkit for Claude Code with: - Task model with YAML parsing, validation, state machine (49 tests, 0 races) - SQLite storage for tasks and executions - Executor pool with bounded concurrency, timeout, cancellation - REST API + WebSocket for mobile PWA integration - Webhook/multi-notifier system - CLI: init, run, serve, list, status commands - Console, JSON, HTML reporters with cost tracking Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>