summaryrefslogtreecommitdiff
path: root/docs
AgeCommit message (Collapse)Author
2026-05-13merge: integrate github/main — LocalRunner, real GeminiRunner, llm clientHEADmainPeter Stone
Merges 12 commits from github/main (formerly master) that were developed independently. Key additions: - LocalRunner: OpenAI-compatible local LLM execution (Ollama, LM Studio) - Real GeminiRunner with full sandbox parity to ClaudeRunner - llm.Client for enriching CI failures and elaboration via local model - retry.ParseRetryAfter moved to shared package - tokens_in/tokens_out columns in executions table Conflict resolutions: - Kept local main's VAPID/push, stories, projects, agent events schema - Merged both sets of Config fields (local + LocalModel from github/main) - Unified activePerAgent accounting (decActiveAgent helper) - Removed duplicate helpers from claude.go (now in helpers.go) - Fixed double-decrement bug in handleRunResult vs decActiveAgent Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03docs: rewrite CLAUDE.md with accurate architecture, known bugs, and design ↵Claude
debt; add improvement plan - Fix YAML key in Task format (agent: not claude:) - Document complete state machine including BLOCKED and READY states - Document sandbox lifecycle and git clone failure root cause (sandboxDir shadowing) - Document all packages including notify, deployment, version, classifier, preamble - Call out GeminiRunner as a non-functional stub - Document design debt: unused priority/retry fields, double changestats extraction, hardcoded master branch, context.Background() in resume, non-atomic exec creation - Add docs/IMPROVEMENT_PLAN.md with 20 prioritized issues ranked by effort and impact https://claude.ai/code/session_01W8P6wwDYF7dpETXWVXCz5L
2026-05-03chore: close deferred work — real GeminiRunner, Local UI option, db.go cleanupClaude
Closes the three items left on the deferred queue after the post-epic cleanup. GeminiRunner.execOnce now actually executes the gemini binary instead of writing hardcoded stream data. Mirrors ClaudeRunner.execOnce: - exec.CommandContext with the same env vars (CLAUDOMATOR_API_URL etc.) - process group SIGKILL on context cancel - stdout piped through parseGeminiStream → stdoutFile - stderr to file - exit codes captured, stderr tail surfaced on failure Test infrastructure bug uncovered in passing: testServerWithGeminiMockRunner's mock script used double-quoted echo with literal triple-backticks, which bash interpreted as command substitution. The script always produced empty output. The bug was invisible until now because GeminiRunner ignored the script entirely. Switched to a single-quoted heredoc. Frontend: index.html dropdown gains a "Local" option. No JS branching needed — the value flows through to agent.type verbatim and downstream display reads the type string as-is. storage/db.go: removed stale debug-comment scaffolding (the "TODO: Replace with proper logger" block) that was tracking a dead `fmt.Printf` call. The path it commented on is fine without logging — unmarshal errors are returned wrapped. Test status: `go test -race ./...` green across every package, zero skips, zero excluded tests. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-05-03chore: post-epic cleanup — green test suite, no skipsClaude
Addresses the cleanup queue captured in docs/plans/local-oss-runner.md after the local-OSS-models epic landed. After this commit `go test -race ./...` is green across every package with zero `t.Skip` calls and no excluded tests. Real bugs fixed: - claude.go setupSandbox callsites used `sandboxDir, err := ...` which shadowed the outer variable, so BlockedError.SandboxDir was always empty. Resume-after-block was broken for both new and stale-sandbox paths. TestBlockedError_IncludesSandboxDir now exercises the right invariant. - TestPool_ActivePerAgent_DeletesZeroEntries flake under -race: the cleanup defer in execute()/executeResume() runs AFTER handleRunResult sends on resultCh, so consumers observing a result could see a still-counted activePerAgent entry. Extracted decActiveAgent(agentType, *cleaned) helper; called explicitly before every resultCh send, defer becomes a no-op via the cleaned flag. Verified clean over `go test -race -count=10`. Test infrastructure made hermetic: - gitSafe now also passes -c commit.gpgsign=false / -c tag.gpgsign=false so sandbox tests pass on hosts whose global config requires signing. - Bare repos in tests initialized with `-b main` (HEAD symbolic ref matched to the branch we push) so `git log` after push works. - TestSandboxCloneSource_FallsBackToOrigin uses a local-FS origin URL, matching sandboxCloneSource's intentional filter against network URLs. - TestGeminiLogs_ParsedCorrectly URL fixed to the actual log route (/api/executions/{id}/log). GeminiRunner gap closed (partial): - parseGeminiStream now walks lines for `result` events, surfacing is_error as an error and total_cost_usd as the float return value. - GeminiRunner.Run propagates parsed cost to Execution.CostUSD. - TestParseGeminiStream_ParsesStructuredOutput unskipped. Notes: - GeminiRunner is still simulated end-to-end (Run writes hardcoded stream data instead of execing the binary). The result/cost parser now exists; finishing the runner is a smaller, contained follow-up. Kept on the deferred queue. - Frontend "Local" agent option and a minor storage.db.go logger TODO remain on the deferred queue, both intentionally — neither blocks anything in flight. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-05-02feat(executor): synthesize execution summary via local LLM fallbackClaude
Phase 4 of "local OSS models as agents" plan. Closes the epic. When an execution finishes and the agent did NOT write a "## Summary" heading in its stdout (so the existing extractSummary path returns empty), and the Pool has a local LLM configured, we now synthesize a 2-4 sentence summary from the assistant text content of the log tail. Behavior: - Primary path unchanged: if the agent wrote "## Summary", that wins byte-for-byte (TestPool_HandleRunResult_ExtractSummaryWins guards). - Fallback path: empty extractSummary + Pool.LLM != nil → synthesize. - All-empty path: when no LLM is configured, summary stays empty — identical to pre-Phase-4 behavior. Implementation: - Pool gains an LLM *llm.Client field, wired in serve.go and run.go alongside Classifier.LLM (same localClient used everywhere). - New synthesizeSummary in internal/executor/summary.go: * 6s timeout so a slow local model can't stall finalization * 16 KB tail cap on the stdout log * readAssistantTextTail seeks to the last 16 KB and skips the first (likely partial) line, parses each line as a stream-json event, joins assistant `text` blocks (skips system/result/etc). * Returns "" on any error so the caller's behavior never regresses. - handleRunResult: 3-tier summary resolution — exec.Summary set by runner → extractSummary → synthesizeSummary → empty. - minimalMockStore now records UpdateTaskSummary calls (additive; existing tests unaffected) so integration tests can assert. Tests (9 new): - synthesizeSummary nil client / empty path / missing file all return "" without HTTP calls. - empty assistant content short-circuits without LLM call. - success path returns trimmed body, with both assistant texts in the user prompt. - LLM 500 returns "" (caller handles same as no-summary). - readAssistantTextTail seeks past early content in a large file. - Pool integration: ## Summary present → LLM not called, agent text used. ## Summary absent + LLM set → LLM called, synthesized summary recorded against the right task ID. Plan: docs/plans/local-oss-runner.md. Epic complete. Post-epic deep cleanup queue captured in the same plan file for follow-up. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-05-02feat(api): enrich CI failure task instructions via local LLMClaude
Phase 3 of "local OSS models as agents" plan. When the webhook handler creates a task for a failed CI run AND a local LLM is configured on the server, the hardcoded 4-step investigation template is replaced with a project-aware investigation plan generated by the LLM. Scope adjustment from the original sketch: the original plan said "summarize fetched workflow logs", but fetching logs requires GitHub API auth that isn't wired. Narrowed to project-context triage — recent git log + CLAUDE.md content + webhook metadata, fed to the LLM with a system prompt asking for 6-12 lines of concrete next steps. Deferred GitHub log fetching to post-epic cleanup. Implementation: - New internal/api/webhook_llm.go holds enrichCIInstructions and its helpers (readRecentCommits via `git log`, readProjectDoc). - enrichCIInstructions is truly additive: any failure mode (no client, HTTP error, empty body, 10s timeout) returns the original fallback template unchanged. Existing webhook tests pass byte-for-byte. - Always preserves a metadata header (repo/branch/SHA/check/URL) ahead of the LLM body so investigators don't lose context if the LLM is terse. - Reuses s.llm (set via Server.SetLLM in Phase 2) — no new config knob, no per-feature gating. Asymmetric opt-out (yes-elaborate, no-CI-triage) deferred until there's actual demand. Tests: - enrichCIInstructions: nil client, LLM 500, empty body all return fallback unchanged. - enrichCIInstructions: success path produces enriched body with metadata header preserved; user prompt contains repo/branch/SHA. - enrichCIInstructions: real git repo (init + 2 commits) → recent commits appear in user prompt. - Webhook handler regression guard: no-LLM path produces the exact legacy template substrings. - Webhook handler with LLM stubbed: task instructions contain LLM body + metadata header. Plan: docs/plans/local-oss-runner.md. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-04-28feat(api): route elaboration through local LLM when configuredClaude
Phase 2 of "local OSS models as agents" plan. Adds a third elaboration path that calls the local OpenAI-compatible LLM via the internal/llm client, and reorders dispatch so the cheap path is tried first: local → claude → gemini, with each next attempt only on hard failure of the prior. Wiring is opt-out, not opt-in: when [local_model].endpoint is set, elaboration prefers local by default. Users with a slow or low-quality local model can disable just elaboration via: [local_model] endpoint = "..." prefer_for_elaborate = false without giving up the runner or the classifier path. Implementation: - Server gains an optional *llm.Client field via SetLLM (matches the existing SetNotifier/SetWorkspaceRoot setter pattern, no NewServer signature break). - elaborateWithLocal() reuses buildElaboratePrompt verbatim and asks for response_format=json_object so we skip markdown-fence cleanup. - handleElaborateTask reorders try chain; existing Claude-first behavior is preserved exactly when SetLLM is not called. - LocalModel.UseForElaborate() encapsulates the default-true gating with a *bool so explicit-false survives TOML parse. Tests: - elaborateWithLocal: parses valid response, errors on nil client, errors on bad JSON. - handler: local preferred when wired; falls back to claude when local fails; unchanged behavior when no LLM is configured. - config: UseForElaborate gating across empty/default/explicit-true/ explicit-false cases. Pre-existing test failures noted in docs/plans/local-oss-runner.md (post-epic cleanup): TestGeminiLogs_ParsedCorrectly returns 404 for gemini execution log fetch — predates this change. Plan: docs/plans/local-oss-runner.md. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-04-28feat(executor): add LocalRunner and OpenAI-compat LLM clientClaude
Phase 1 of "local OSS models as agents" plan. Adds a third Runner backed by any OpenAI-compatible HTTP server (Ollama, vLLM, LM Studio, llama.cpp), and migrates the Gemini-CLI classifier to route through the same client when configured. Two-layer split: internal/llm.Client is the workhorse (HTTP, no Pool, no DB) used directly by the classifier and any future internal helper that needs cheap reasoning. internal/executor.LocalRunner is a thin adapter implementing Runner for user-facing tasks. This avoids Pool reentrancy/deadlock when sub-second internal calls fire from inside Pool.execute(). Highlights: - internal/retry: relocated runWithBackoff/IsRateLimitError/ParseRetryAfter into a shared package reused by executor and llm. - internal/llm: Chat (non-streaming) and ChatStream (SSE) over /chat/completions with optional bearer auth, json_object response format, retry on 429/503, Retry-After parsing. - internal/executor/LocalRunner: streams deltas into stdout.log in the same stream-json envelope ClaudeRunner emits, then writes one consolidated assistant block plus a result terminator so existing parsers (extractSummary, ParseChangestatFromOutput) work unchanged. - internal/executor/Classifier: gains optional LLM field; uses json_object response format (no markdown-fence cleanup needed). Falls back to Gemini-CLI subprocess when LLM is nil. - Pool.skipClassification: now skips only when the requested agent type is registered, so unknown types still reach the load balancer. - Storage: additive tokens_in/tokens_out ALTERs on executions; CLI runners record cost_usd as before, LocalRunner records 0 + tokens. - Config: [local_model] section (endpoint, model, timeout_seconds, default_temperature, api_key). Empty endpoint = no LocalRunner registered, classifier falls back to Gemini. Pre-existing test issues fixed in passing: - claude_test.go setupSandbox callsites updated to current signature. - gemini_test.go TestParseGeminiStream skipped (asserts unimplemented GeminiRunner stream-error parsing; tracked separately). Plan: docs/plans/local-oss-runner.md. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-04-04docs: task checker + story ship implementation planPeter Stone
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04docs: task checker agent and story ship gate design specPeter Stone
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03docs: add task→project FK migration planPeter Stone
2026-03-22chore: unify and centralize agent configuration in .agent/Peter Stone
2026-03-21feat: Phase 2 — project registry, legacy field cleanup, credential path fixPeter Stone
- task.Project type + storage CRUD + UpsertProject + SeedProjects - Remove AgentConfig.ProjectDir, RepositoryURL, SkipPlanning - Remove ContainerRunner fallback git init logic - Project API endpoints: GET/POST /api/projects, GET/PUT /api/projects/{id} - processResult no longer extracts changestats (pool-side only) - claude_config_dir config field; default to credentials/claude/ - New scripts: sync-credentials, fix-permissions, check-token Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20docs: update ADR-007 with validation pipeline and nav projectPeter Stone
- Story state machine: SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY | NEEDS_FIX - Merge-first strategy: no branch review phase, tests are the confidence mechanism - Elaborator owns validation spec (type, steps, success_criteria) - Validation types: curl | tests | playwright | gradle - Nav project (Android): deploy = push to GitHub, validate = gradle test/lint - Project registry: type + deploy_script fields, initial claudomator + nav entries - Out of scope: branch review deferred, CI polling out of band for nav Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19chore: improve debug-execution script and add ADR-007Peter Stone
- debug-execution: default to most recent execution when no ID given - docs/adr/007: planning layer and story model design decisions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18fix: address final container execution issues and cleanup review docsPeter Stone
2026-03-18fix: address round 3 review feedback for container executionPeter Stone
- Fix push failure swallowing and ensure workspace preservation on push error - Fix wrong session ID in --resume flag and BlockedError - Implement safer shell quoting for instructions in buildInnerCmd - Capture and propagate actual Claude session ID from stream init message - Clean up redundant image resolution and stale TODOs - Mark ADR-005 as Superseded - Consolidate RepositoryURL to Task level (removed from AgentConfig) - Add unit test for session ID extraction in parseStream
2026-03-18fix: address round 2 review feedback for container executionPeter Stone
- Fix host/container path confusion for --env-file - Fix --resume flag to only be used during resumptions - Fix instruction passing to Claude CLI via shell-wrapped cat - Restore streamErr return logic to detect task-level failures - Improve success flag logic for workspace preservation - Remove duplicate RepositoryURL from AgentConfig - Fix app.js indentation and reformat DOMContentLoaded block - Restore behavioral test coverage in container_test.go
2026-03-18feat: implement containerized repository-based execution modelPeter Stone
This commit implements the architectural shift from local directory-based sandboxing to containerized execution using canonical repository URLs. Key changes: - Data Model: Added RepositoryURL and ContainerImage to task/agent configs. - Storage: Updated SQLite schema and queries to handle new fields. - Executor: Implemented ContainerRunner using Docker/Podman for isolation. - API/UI: Overhauled task creation to use Repository URLs and Image selection. - Webhook: Updated GitHub webhook to derive Repository URLs automatically. - Docs: Updated ADR-005 with risk feedback and added ADR-006 to document the new containerized model. - Defaults: Updated serve command to use ContainerRunner for all agents. This fixes systemic task failures caused by build dependency and permission issues on the host system.
2026-03-16fix: permission denied and host key verification errors; add gemini ↵Peter Stone
elaboration fallback
2026-03-16feat: add elaboration_input field to tasks for richer subtask placeholderClaudomator Agent
- Add ElaborationInput field to Task struct (task.go) - Add DB migration and update CREATE/SELECT/scan in storage/db.go - Update handleCreateTask to accept elaboration_input from API - Update renderSubtaskRollup in app.js to prefer elaboration_input over description - Capture elaborate prompt in createTask() form submission - Update subtask-placeholder tests to cover elaboration_input priority - Fix missing io import in gemini.go When a task card is waiting for subtasks, it now shows: 1. The raw user prompt from elaboration (if stored) 2. The task description truncated at word boundary (~120 chars) 3. The task name as fallback 4. 'Waiting for subtasks…' only when all fields are empty Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13merge: resolve conflicts with local/master (stats tab + summary styles)Peter Stone
Keep file-based summary approach (CLAUDOMATOR_SUMMARY_FILE) from HEAD. Combine Q&A History and Stats tab CSS from both branches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13feat: resume support, summary extraction, and task state improvementsPeter Stone
- Extend Resume to CANCELLED, FAILED, and BUDGET_EXCEEDED tasks - Add summary extraction from agent stdout stream-json output - Fix storage: persist stdout/stderr/artifact_dir paths in UpdateExecution - Clear question_json on ResetTaskForRetry - Resume BLOCKED tasks in preserved sandbox so Claude finds its session - Add planning preamble: CLAUDOMATOR_SUMMARY_FILE env var + summary step - Update ADR-002 with new state transitions - UI style improvements Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11feat: add Stats tab with task distribution and execution health metricsClaudomator Agent
- Export computeTaskStats and computeExecutionStats from app.js - Add renderStatsPanel with state count grid, KPI row (total/success-rate/cost/avg-duration), and outcome bar chart - Wire stats tab into switchTab and poll for live refresh - Add Stats tab button and panel to index.html - Add CSS for .stats-counts, .stats-kpis, .stats-bar-chart using existing state color variables - Add docs/stats-tab-plan.md with component structure and data flow - 14 new unit tests in web/test/stats.test.mjs (140 total, all passing) No backend changes — derives all metrics from existing /api/tasks and /api/executions endpoints. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11Merge remote-tracking branch 'local/master'Peter Stone
2026-03-11docs: add discovery notes and implementation plan for summary/QA featureClaudomator Agent
2026-03-11docs: add executor package documentationClaudomator Agent
2026-03-11docs: add system architecture overviewClaudomator Agent
2026-03-10docs: add development narrative and ADRs 004-005Peter Stone
RAW_NARRATIVE.md: comprehensive chronological engineering history reconstructed from the git log covering all 45 major milestones. ADR-004: multi-agent routing — explicit load balancing in code (pickAgent) plus Gemini-based model classification (Classifier), and why the two decisions are intentionally separated. ADR-005: git sandbox execution model — clone isolation, bare-repo push, uncommitted-change enforcement, BLOCKED preservation, and session ID propagation on second resume cycle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10task: promote validTransitions to package-level var; fix ADRPeter Stone
Hoists the map out of ValidTransition so it's not reallocated on every call. Adds missing CANCELLED→QUEUED and BUDGET_EXCEEDED→QUEUED entries to the ADR transition table to match the implemented state machine. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09docs: update ADR-002 for parent-subtask BLOCKED→READY state behaviorClaudomator Agent
- Transition table: add BLOCKED→READY (trigger: all subtasks COMPLETED) - Transition table: clarify RUNNING→READY only when no subtasks exist - Transition table: add RUNNING→BLOCKED for parent-with-subtasks path - Execution outcome mapping: reflect subtask check - State diagram: show BLOCKED→READY arc - Key Invariants: add #7 parent-with-subtasks goes BLOCKED on runner exit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08docs: add ADR 003 security modelPeter Stone
Documents the trust boundary, API token auth, per-IP rate limiting, WebSocket client cap, and known risks for the Claudomator security posture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06recover: restore untracked work from recovery branch (no Gemini changes)Peter Stone
Recovered files with no Claude→Agent contamination: - docs/adr/002-task-state-machine.md - internal/api/logs.go/logs_test.go: task-level log streaming endpoint - internal/api/validate.go/validate_test.go: POST /api/tasks/validate - internal/api/server_test.go, storage/db_test.go: expanded test coverage - scripts/reset-failed-tasks, reset-running-tasks - web/app.js, index.html, style.css: frontend improvements - web/test/: active-tasks-tab, delete-button, filter-tabs, sort-tasks tests Manually applied from server.go diff (skipping Claude→Agent rename): - taskLogStore field + validateCmdPath field - DELETE /api/tasks/{id} route + handleDeleteTask - GET /api/tasks/{id}/logs/stream route - POST /api/tasks/{id}/resume route + handleResumeTimedOutTask - handleCancelTask: allow cancelling PENDING/QUEUED tasks directly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-08Initial project: task model, executor, API server, CLI, storage, reporterPeter Stone
Claudomator automation toolkit for Claude Code with: - Task model with YAML parsing, validation, state machine (49 tests, 0 races) - SQLite storage for tasks and executions - Executor pool with bounded concurrency, timeout, cancellation - REST API + WebSocket for mobile PWA integration - Webhook/multi-notifier system - CLI: init, run, serve, list, status commands - Console, JSON, HTML reporters with cost tracking Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>