claudomator.git/internal/cli, branch main

fix(executor): periodically sweep stale dispatch workspace directories

2026-07-10T22:13:49+00:00

ContainerRunner preserves a failed execution's workspace indefinitely for debugging, with no expiry. That let 161 stale directories (~17.6GB) accumulate and filled the host's disk to 100% on 2026-07-10. Pool.RunWorkspaceCleanup now sweeps claudomator-workspace-* dirs older than 24h every hour; it is started from serve.go and mirrors StoryOrchestrator.Run's ticker shape. It never removes a directory still referenced as a currently-BLOCKED task's sandbox_dir, regardless of age.
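
The sweep described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: sweepWorkspaces, runCleanup, demoSweep, and the protected-set parameter are hypothetical names, and using the directory's mtime as its age is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// sweepWorkspaces removes claudomator-workspace-* directories under root whose
// mtime is older than maxAge, skipping any path listed in protected (for
// example the sandbox_dir of a currently BLOCKED task). It returns the
// directories it removed. Hypothetical helper, not the real Pool method.
func sweepWorkspaces(root string, maxAge time.Duration, now time.Time, protected map[string]bool) ([]string, error) {
	matches, err := filepath.Glob(filepath.Join(root, "claudomator-workspace-*"))
	if err != nil {
		return nil, err
	}
	var removed []string
	for _, dir := range matches {
		info, err := os.Stat(dir)
		if err != nil || !info.IsDir() {
			continue // vanished between glob and stat, or not a directory
		}
		if protected[dir] || now.Sub(info.ModTime()) <= maxAge {
			continue // still referenced, or not old enough yet
		}
		if err := os.RemoveAll(dir); err != nil {
			return removed, err
		}
		removed = append(removed, dir)
	}
	return removed, nil
}

// runCleanup calls sweepWorkspaces once per interval until ctx is cancelled.
func runCleanup(ctx context.Context, root string, interval, maxAge time.Duration, protected func() map[string]bool) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if _, err := sweepWorkspaces(root, maxAge, time.Now(), protected()); err != nil {
				fmt.Fprintln(os.Stderr, "workspace sweep:", err)
			}
		}
	}
}

// demoSweep builds three workspaces in a temp dir (stale, stale but protected,
// fresh) and reports how many were removed and which of the others survived.
func demoSweep() (removed int, protectedKept, freshKept bool) {
	root, err := os.MkdirTemp("", "sweep-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	stale := filepath.Join(root, "claudomator-workspace-stale")
	blocked := filepath.Join(root, "claudomator-workspace-blocked")
	fresh := filepath.Join(root, "claudomator-workspace-fresh")
	for _, dir := range []string{stale, blocked, fresh} {
		if err := os.Mkdir(dir, 0o755); err != nil {
			panic(err)
		}
	}
	old := time.Now().Add(-48 * time.Hour)
	for _, dir := range []string{stale, blocked} {
		if err := os.Chtimes(dir, old, old); err != nil {
			panic(err)
		}
	}

	gone, err := sweepWorkspaces(root, 24*time.Hour, time.Now(), map[string]bool{blocked: true})
	if err != nil {
		panic(err)
	}
	_, errBlocked := os.Stat(blocked)
	_, errFresh := os.Stat(fresh)
	return len(gone), errBlocked == nil, errFresh == nil
}

func main() {
	n, p, f := demoSweep()
	fmt.Println("removed:", n, "protected kept:", p, "fresh kept:", f)
}
```

Passing the protected set as a function into the ticker loop means each tick sees the current BLOCKED tasks rather than a snapshot taken at startup.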

feat(storage): seed the builder role's system prompt on server startup

2026-07-10T20:22:30+00:00

feat(story,scheduler): add epic-proposal tool + AskUser-timeout escalation (Phase 7c)

2026-07-04T04:39:28+00:00

Two independent pieces, completing Phase 7.

Epic-proposal tool: AgentChannel gains a 5th method, ProposeEpic(ctx, EpicProposal{Name, Description, StoryIDs}) (epicID, err), implemented on storeChannel -- matches an existing epic by exact name or creates one (DiscoverySource: "agent"), sets epic_id on each resolvable story (skips, doesn't fail, on an unresolved ID), emits KindEpicProposed attached to the epic's own ID with payload {epic_id, name, story_ids}. Wired into both transports exactly like Phase 6 wired role into spawn_subtask: a new propose_epic tool in the native tool-use loop (internal/agentloop/tools.go) and the MCP transport (internal/executor/agentmcp.go). This is the mechanism for a discovery/planner-role agent to act on its own judgment that several stories it's been given form one cohesive initiative -- the judgment itself lives in the calling agent's instructions/model, not in this code.

AskUser-timeout escalation: extends the existing Scheduler (Phase 5's retry-then-escalate watcher) rather than adding a new component, since "stuck task needs escalation" is exactly what it already does. Finds role-typed BLOCKED tasks whose question has been outstanding longer than SchedulerConfig.AskUserTimeoutSeconds (default 10 minutes) using task.UpdatedAt as the outstanding-since timestamp -- no new column needed, since UpdateTaskQuestion already stamps it the instant a question is recorded and nothing else touches the row while BLOCKED. Resolves the next ladder tier from the latest execution's EscalationRung, records the system-authored fallback answer as an audit-trail task.Interaction, clears the question, sets the new tasks.needs_review flag, emits KindEscalated (now carrying a trigger field: "failure" vs "ask_user_timeout" for the existing failure-retry path vs this one), and resumes via Pool.SubmitResume at the escalated tier -- degrading to same-tier resume with final:true if the ladder's exhausted or no role config exists, since unblocking the task takes priority over having somewhere higher to escalate to. GET /api/tasks?needs_review=true surfaces auto-decided tasks for human review.

go build/vet/test -race -count=1 all pass, full suite (20 packages), run twice to rule out flakiness in the new tests. (One pre-existing, unrelated test -- TestHandleRunTask_CascadesRetryToFailedDeps, a tempdir-cleanup race -- appeared once under full-suite load per the implementing agent's report and did not reproduce in this verification's runs either; not a regression from this work.)

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs
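
The timeout check and the "escalate, or degrade to a final same-tier resume" rule from the AskUser-timeout piece could look roughly like this. It is a sketch under assumptions: blockedTask, questionTimedOut, and nextRung are hypothetical names, and the real Scheduler reads the rung from the latest execution rather than from the task.

```go
package main

import (
	"fmt"
	"time"
)

// blockedTask is a hypothetical, trimmed-down view of a role-typed BLOCKED task.
type blockedTask struct {
	ID        string
	UpdatedAt time.Time // stamped when the question was recorded
	Rung      int       // escalation rung of the latest execution
}

// questionTimedOut reports whether the task's question has been outstanding
// longer than timeout as of now.
func questionTimedOut(t blockedTask, now time.Time, timeout time.Duration) bool {
	return now.Sub(t.UpdatedAt) > timeout
}

// nextRung picks the rung to resume at. With a higher tier available it moves
// up one; with the ladder exhausted (or no ladder at all, ladderLen == 0) it
// stays put and marks the escalation final.
func nextRung(current, ladderLen int) (rung int, final bool) {
	if ladderLen == 0 || current+1 >= ladderLen {
		return current, true
	}
	return current + 1, false
}

func main() {
	now := time.Now()
	t := blockedTask{ID: "task-1", UpdatedAt: now.Add(-11 * time.Minute), Rung: 0}
	if questionTimedOut(t, now, 10*time.Minute) {
		rung, final := nextRung(t.Rung, 3)
		fmt.Printf("resume %s at rung %d (final=%v)\n", t.ID, rung, final)
	}
}
```

Returning the current rung together with final=true keeps the caller's resume path identical in both cases, which matches the stated priority of unblocking the task over finding a higher tier.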

feat(story): add StoryOrchestrator -- Builder->Evaluators->Arbitration->accept (Phase 7b)

2026-07-04T04:08:41+00:00

A deterministic, poll-based watcher (internal/scheduler.StoryOrchestrator, sibling to the Phase 5 Scheduler) that drives a story.Story through its execution pipeline, rather than relying on an LLM agent to correctly orchestrate its own fan-out via tool calls.

Mechanism: polling, not a handleRunResult hook. Every task the orchestrator watches (a story's root/Builder task, 4 Evaluators, Arbitration) is top-level (no ParentTaskID), and executor.Pool.handleRunResult only ever lands a top-level task at READY or BLOCKED -- never COMPLETED directly, since that transition normally requires a human/chatbot POST /api/tasks/{id}/accept in a different package. A handleRunResult hook would never observe it; polling doesn't care how/whether a task reached a given state.

Stages:
- Builder COMPLETED -> spawn 4 role-typed Evaluator tasks (evaluator_quality/security/correctness/performance, DependsOn: [builder], no ParentTaskID -- true DAG siblings, not delegated subtasks) + story -> VALIDATING.
- Each Evaluator COMPLETED -> emit KindEvalVerdict (attached to the story's ID, so one GET /api/stories/{id}/events call surfaces every verdict).
- All 4 Evaluators COMPLETED -> spawn 1 Arbitration task (role: planner, DependsOn: all 4 evaluator IDs).
- Arbitration COMPLETED -> emit KindArbitrationDecided, story -> REVIEW_READY.
- POST /api/stories/{id}/accept (mirrors handleAcceptTask) -> DONE, emits KindHumanAccepted.

Fixes a gap caught before merging: since none of Builder/Evaluators/Arbitration have a ParentTaskID, none of them auto-complete -- each would otherwise need a separate manual /api/tasks/{id}/accept, meaning 6 human clicks per story before ever reaching the intended single story-level gate. StoryOrchestrator.autoAccept now transitions each of these specific tasks READY->COMPLETED itself (via the same validated Store.UpdateTaskState path acceptTask uses), scoped only to tasks already established as part of a story's pipeline (root task, or role-matched dependents from ensureEvaluators/ensureArbitration) -- never a blanket sweep of unrelated READY tasks. This makes POST /api/stories/{id}/accept the system's only required human touchpoint for the whole chain, matching the design goal that story (not task/subtask) is the human-interaction atom.

Idempotency: structural for task-creation stages (ensureEvaluators/ensureArbitration check ListDependents for already-existing role-matched tasks before creating -- crash/restart-safe); story.Status=="VALIDATING" gates the Arbitration->REVIEW_READY write (nothing further downstream to check structurally there); an in-memory handledVerdicts set (mirrors Scheduler.handled) dedupes per-evaluator KindEvalVerdict emission across poll ticks, resetting harmlessly on restart.

Documented simplification: finalizeArbitration never parses the Arbitration summary for approve/reject -- always routes to REVIEW_READY; NEEDS_FIX is manually settable via PUT /api/stories/{id}. A later phase could close this with a dedicated verdict-reporting AgentChannel method instead of parsing free text.

go build/vet/test -race -count=1 all pass, full suite (20 packages).

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs
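
One way to picture why polling plus structural checks stays idempotent: each tick derives its next step purely from what already exists, so a crash between ticks just repeats the same decision. The sketch below is illustrative only; the pipeline struct, nextAction, and the action strings are hypothetical and do not mirror StoryOrchestrator's actual types.

```go
package main

import "fmt"

// pipeline is a hypothetical snapshot of what one poll tick observes for a story.
type pipeline struct {
	Status            string // story status, e.g. "VALIDATING" or "REVIEW_READY"
	BuilderCompleted  bool
	Evaluators        int // evaluator tasks that already exist
	EvaluatorsDone    int // evaluator tasks that are COMPLETED
	ArbitrationExists bool
	ArbitrationDone   bool
}

const wantEvaluators = 4

// nextAction returns the single step a poll tick should take. Every decision
// is derived from the observed state, so running it twice on the same
// snapshot yields the same answer and never spawns a task that already exists.
func nextAction(p pipeline) string {
	switch {
	case !p.BuilderCompleted:
		return "wait"
	case p.Evaluators < wantEvaluators:
		return "spawn_evaluators" // create only the missing ones
	case p.EvaluatorsDone < wantEvaluators:
		return "wait"
	case !p.ArbitrationExists:
		return "spawn_arbitration"
	case !p.ArbitrationDone:
		return "wait"
	case p.Status == "VALIDATING":
		return "mark_review_ready" // the status gate keeps this write one-shot
	default:
		return "none"
	}
}

func main() {
	ticks := []pipeline{
		{},
		{BuilderCompleted: true},
		{Status: "VALIDATING", BuilderCompleted: true, Evaluators: 4, EvaluatorsDone: 2},
		{Status: "VALIDATING", BuilderCompleted: true, Evaluators: 4, EvaluatorsDone: 4},
		{Status: "VALIDATING", BuilderCompleted: true, Evaluators: 4, EvaluatorsDone: 4, ArbitrationExists: true, ArbitrationDone: true},
	}
	for i, p := range ticks {
		fmt.Printf("tick %d: %s\n", i, nextAction(p))
	}
}
```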

feat(role): add versioned role configs + escalation ladder + scheduler (Phase 5)

2026-07-03T23:01:50+00:00

Two parts:

Part A (fixes a gap from Phase 1): Groq/OpenRouter/OpenAI were documented in docs/api-keys-setup.md as usable once configured, but nothing actually constructed runners for them. internal/cli/cloudrunners.go consolidates anthropic/google/groq/openrouter/openai NativeRunner construction into one table-driven registerCloudRunners() helper, replacing the two hand-written per-provider blocks in serve.go/run.go. Groq/OpenRouter/OpenAI reuse openaicompat (no new adapter code) at SandboxKind: "docker".

Part B: the token-husbanding harness's core routing mechanism.
- internal/role: RoleConfig/Tier/Rung -- a role's system prompt and a multi-tier (provider, model) escalation ladder, versioned via config_json.
- storage: new role_configs table (draft/active/retired, UNIQUE(role, version)) with transactional activate-retires-prior-active semantics; new executions.escalation_rung column.
- task.AgentConfig.Role string -- purely additive; every existing task shape (Agent.Role == "") is unaffected, proven by TestPool_Execute_NonRoleTask_Unaffected plus the full pre-existing suite passing unchanged.
- executor.Pool.execute(): role-typed tasks with no Agent.Type yet resolve tier 0 of their active ladder (round-robin across multi-candidate tiers, skipping rate-limited providers, falling back to soonest-clearing) before the existing pickAgent/Classifier path runs; SystemPrompt applies to Agent.SystemPromptAppend. Already-resolved role tasks (scheduler resubmits) get their escalation_rung re-derived read-only via findTierIndex.
- internal/scheduler: polls role-typed FAILED tasks, retries at the same rung under MaxRetries or escalates to the next tier's first candidate when budget.Accountant.Allow() permits (emitting event.KindEscalated), else leaves the task FAILED with a final:true KindEscalated event. An in-memory per-execution-ID "handled" set keeps the poll loop convergent. Started by `serve` only, config knob [scheduler].poll_interval_seconds.
- internal/api: POST/GET /api/roles/{role}/versions, POST /api/roles/{role}/activate -- unauthenticated, matching the existing projects/tasks REST endpoints' auth posture (only chatbot MCP, agent MCP, and WebSocket are api_token-gated in this codebase today).

Documented as stored-but-not-yet-enforced (CLAUDE.md Design Debt, matching how task.Priority/RetryConfig are already documented): RoleConfig.Tools/SandboxKind don't affect dispatch yet; DefaultBudgetUSD is read narrowly as the scheduler's escalation cost estimate, not enforced at initial dispatch; scheduler escalation always targets Candidates[0] (no round-robin, unlike initial-dispatch tier-0 resolution); the scheduler's dedupe is per-process and resets on restart (idempotent, harmless).

go build/vet/test -race -count=1 all pass, 21 packages.

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs
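
The tier-0 resolution rule (round-robin, skip rate-limited providers, fall back to the soonest-clearing one) might be sketched as below. All names here are hypothetical: candidate, pickCandidate, the limitedUntil map, and the placeholder model names are assumptions for illustration, not the executor's real data structures.

```go
package main

import (
	"fmt"
	"time"
)

// candidate is one (provider, model) pair within a ladder tier.
type candidate struct {
	Provider string
	Model    string
}

// pickCandidate walks the tier round-robin starting at index start (which must
// be non-negative) and returns the first candidate whose provider is not
// rate-limited at now. If every provider is limited, it returns the one whose
// limit clears soonest. limitedUntil maps a provider name to the time its
// rate limit clears; providers absent from the map are considered available.
func pickCandidate(tier []candidate, start int, limitedUntil map[string]time.Time, now time.Time) (candidate, bool) {
	if len(tier) == 0 {
		return candidate{}, false
	}
	soonest := -1
	for i := 0; i < len(tier); i++ {
		idx := (start + i) % len(tier)
		until, limited := limitedUntil[tier[idx].Provider]
		if !limited || !until.After(now) {
			return tier[idx], true
		}
		if soonest == -1 || until.Before(limitedUntil[tier[soonest].Provider]) {
			soonest = idx
		}
	}
	return tier[soonest], true
}

// demoPick exercises the three cases: plain round-robin, skipping a limited
// provider, and falling back when every provider is limited.
func demoPick() (plain, skipped, fallback string) {
	now := time.Date(2026, 7, 3, 12, 0, 0, 0, time.UTC)
	tier := []candidate{
		{Provider: "groq", Model: "model-a"},
		{Provider: "openrouter", Model: "model-b"},
		{Provider: "openai", Model: "model-c"},
	}
	a, _ := pickCandidate(tier, 1, nil, now)
	b, _ := pickCandidate(tier, 1, map[string]time.Time{
		"openrouter": now.Add(time.Minute),
	}, now)
	c, _ := pickCandidate(tier, 0, map[string]time.Time{
		"groq":       now.Add(5 * time.Minute),
		"openrouter": now.Add(1 * time.Minute),
		"openai":     now.Add(3 * time.Minute),
	}, now)
	return a.Provider, b.Provider, c.Provider
}

func main() {
	fmt.Println(demoPick())
}
```

In this sketch the caller owns the round-robin counter and passes it in as start, which keeps the selection function free of shared state.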

feat(provider): add native Google Gemini API adapter (Phase 4)

2026-07-03T22:33:22+00:00

Adds internal/provider/google, the second native cloud adapter (following internal/provider/anthropic's pattern) on top of Phase 1's provider-neutral tool-use loop, wired to a Docker-sandboxed NativeRunner under agent.type: "google" -- a separate execution path and budget bucket from the existing CLI-subprocess "gemini" ContainerRunner, which is untouched.

Wire-format research (the highest-risk part of this adapter): Gemini's multi-turn function-calling shape was resolved by cross-referencing the REST API reference's own generateContent example against the go-genai SDK's struct tags on GitHub -- both agree on functionCall/functionResponse parts keyed by "name" (with an optional "id" for round-tripping ToolCall.ID), with the response fed back inside a "user"-role Content (Gemini has no tool/function role, mirroring Anthropic's lack of one). A separate fetched source (the function-calling guide page) was deliberately discarded as a reference for this shape -- it documents a different, newer "Interactions API" whose call_id/type:"function_result" structure doesn't fit the contents/parts/candidates shape used everywhere else.

- internal/provider/google: request/response translation, systemInstruction handling, role mapping (assistant->model, tool-results->user role), per-model-prefix pricing table (2.5 Pro/Flash/Flash-Lite, 2.0, 1.5 tiers)
- internal/retry: IsRateLimitError additively extended for RESOURCE_EXHAUSTED
- internal/config: RunnersConfig.Google/GoogleEnabled()
- internal/cli/serve.go, run.go: runners["google"] construction mirroring the Anthropic wiring exactly (Docker sandbox default)
- docs/api-keys-setup.md: Google marked wired-up, budget-bucket/disable/verify guidance added matching the Anthropic section

go build/vet/test -race all pass. No live Gemini API key available in this environment; verified via fake-httptest-server adapter tests (plain text, tool-use round-trip, multi-turn tool-result, rate-limit error matching) plus a Pool/NativeRunner routing test. Live E2E is a follow-up once a key is configured, same as Phase 2's Anthropic adapter.

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs
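
For readers unfamiliar with the contents/parts shape discussed above, here is a small sketch of how a tool result might be encoded as a "user"-role Content carrying a functionResponse part. The struct and function names are hypothetical, and the "output" key inside response is an assumption made for illustration (the response payload is a free-form object); only the role/parts/functionCall/functionResponse/name/id field names come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// functionCall is the part a model turn uses to request a tool.
type functionCall struct {
	ID   string         `json:"id,omitempty"`
	Name string         `json:"name"`
	Args map[string]any `json:"args,omitempty"`
}

// functionResponse is the part that carries a tool's result back.
type functionResponse struct {
	ID       string         `json:"id,omitempty"`
	Name     string         `json:"name"`
	Response map[string]any `json:"response"`
}

type part struct {
	Text             string            `json:"text,omitempty"`
	FunctionCall     *functionCall     `json:"functionCall,omitempty"`
	FunctionResponse *functionResponse `json:"functionResponse,omitempty"`
}

type content struct {
	Role  string `json:"role"`
	Parts []part `json:"parts"`
}

// geminiRole maps a neutral role to the two roles the contents array accepts.
func geminiRole(neutral string) string {
	if neutral == "assistant" {
		return "model"
	}
	return "user" // includes tool results: there is no tool/function role
}

// toolResultContent wraps a tool's output as a user-role functionResponse.
// The "output" key is an assumption of this sketch.
func toolResultContent(callID, name, output string) content {
	return content{
		Role: geminiRole("tool"),
		Parts: []part{{FunctionResponse: &functionResponse{
			ID:       callID,
			Name:     name,
			Response: map[string]any{"output": output},
		}}},
	}
}

// mustJSON marshals v or panics; convenient for examples.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	call := content{Role: geminiRole("assistant"), Parts: []part{{FunctionCall: &functionCall{
		ID: "call-1", Name: "read_file", Args: map[string]any{"path": "README.md"},
	}}}}
	fmt.Println(mustJSON(call))
	fmt.Println(mustJSON(toolResultContent("call-1", "read_file", "hello")))
}
```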

feat(sandbox): add DockerSandbox + pre-tool-use guardrail hooks (Phase 3)

2026-07-03T09:21:32+00:00

Gives native-API-driven agents (currently just the Phase 2 Anthropic adapter) real container isolation, decoupled from model invocation -- the model call happens in the Go process via provider.Provider, only tool execution happens in the container, unlike the CLI-subprocess ContainerRunner (left completely untouched) where the claude/gemini CLI runs inside the container.

- internal/sandbox/dockersandbox.go: Sandbox via a long-lived `docker run -d ... sleep infinity` container (started once per execution, not per tool call), host-side git clone + bind-mount matching ContainerRunner's existing pattern, docker exec for read/write/bash/glob. Reuses images/agent-base (claudomator-agent:latest) rather than standing up a second image. WorkDir()/resume persists the host bind-mount directory (matching HostSandbox's contract); a resumed sandbox lazily starts a fresh container against that directory rather than trying to reattach to a possibly-gone one.
- internal/sandbox/guard.go, hooks.go: Hook interface (CheckBash/CheckWrite), Guarded wrapper, DenylistBashHook (rm -rf /, force-push, curl|sh, sudo, chmod 777, dd if=) and ProtectedPathHook (.git/**, .env*, credentials/, .github/workflows/**). A rejection returns *RejectionError, which agentloop/tools.go now recognizes and feeds back to the model as a normal (non-fatal) tool-error result instead of aborting the run.
- NativeRunner wraps whichever Sandbox it builds (Host or Docker) in Guarded{Hooks: DefaultHooks()} uniformly. The "anthropic" runner now uses DockerSandbox; "local" stays on HostSandbox by design (local models are the harness's more-trusted, lower-stakes-to-run tier).

Docker is not installed in this dev environment (no docker/podman/containerd on PATH), so DockerSandbox's real container-lifecycle behavior is verified via mocked-command unit tests only -- go test -race ./... passes throughout, with the two real-daemon integration tests gated behind a dockerAvailable(t) check and skipping here. Live verification against an actual Docker host is a follow-up before relying on the "anthropic" agent type in production.

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs
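
The guardrail idea (a pre-tool-use check that rejects with a typed error, which the loop then reports to the model instead of aborting) can be illustrated as follows. This is a sketch, not the contents of guard.go or hooks.go: the regular expressions are an illustrative subset written for this example, and checkBash, guardedBash, and toolResult are hypothetical names. Only RejectionError and the denylisted command families come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// RejectionError marks a tool call that a guardrail hook refused.
type RejectionError struct{ Reason string }

func (e *RejectionError) Error() string { return "rejected: " + e.Reason }

// denylist is an illustrative subset of dangerous bash patterns; the real
// hook's patterns are not shown in the commit message.
var denylist = []struct {
	name string
	re   *regexp.Regexp
}{
	{"rm -rf /", regexp.MustCompile(`\brm\s+-rf\s+/(\s|$)`)},
	{"curl piped to a shell", regexp.MustCompile(`\bcurl\b[^|]*\|\s*(ba)?sh\b`)},
	{"sudo", regexp.MustCompile(`\bsudo\b`)},
	{"chmod 777", regexp.MustCompile(`\bchmod\s+777\b`)},
	{"dd if=", regexp.MustCompile(`\bdd\s+if=`)},
}

// checkBash returns a *RejectionError if cmd matches a denylisted pattern.
func checkBash(cmd string) error {
	for _, d := range denylist {
		if d.re.MatchString(cmd) {
			return &RejectionError{Reason: d.name}
		}
	}
	return nil
}

// guardedBash runs cmd through run only when the guard allows it.
func guardedBash(cmd string, run func(string) (string, error)) (string, error) {
	if err := checkBash(cmd); err != nil {
		return "", err
	}
	return run(cmd)
}

// toolResult turns an execution outcome into what the loop should do with it:
// a rejection becomes an ordinary tool-error result for the model to read
// (fatal=false), while any other error aborts the run (fatal=true).
func toolResult(out string, err error) (result string, fatal bool) {
	var rej *RejectionError
	switch {
	case err == nil:
		return out, false
	case errors.As(err, &rej):
		return "tool error: " + rej.Error(), false
	default:
		return err.Error(), true
	}
}

func main() {
	echo := func(cmd string) (string, error) { return "ran: " + cmd, nil }
	for _, cmd := range []string{"ls -la", "sudo reboot", "rm -rf /tmp/build"} {
		result, fatal := toolResult(guardedBash(cmd, echo))
		fmt.Printf("%q -> %q (fatal=%v)\n", cmd, result, fatal)
	}
}
```

Using errors.As rather than a direct type assertion means a rejection is still recognized if a wrapper adds context with %w on the way up.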

feat(provider): add native Anthropic Messages API adapter (Phase 2)

2026-07-03T09:03:13+00:00

Adds internal/provider/anthropic, the first genuinely new provider.Provider implementation on top of Phase 1's provider-neutral tool-use loop, alongside (not replacing) the existing Docker/CLI-subprocess ContainerRunner path for the "claude" agent type:

- internal/provider/anthropic: translates the neutral ChatRequest/ChatResponse shape to/from Anthropic's Messages API content-block format (system as a top-level field, tool_use/tool_result blocks, no "tool" role -- tool results become user-role messages), with a per-model-prefix pricing table for CostUSD
- internal/retry: IsRateLimitError additively extended to recognize Anthropic's rate_limit_error/overloaded_error/529 shapes
- internal/config: RunnersConfig.Anthropic/AnthropicEnabled() gate
- internal/cli/serve.go, run.go: register runners["anthropic"] as a NativeRunner when [providers.anthropic].api_key is set and enabled -- tracked as a distinct executions.agent="anthropic" budget bucket, separate from the CLI-subprocess "claude" runner even though both bill the same Anthropic account

go build/vet/test -race all pass. No live Anthropic API key is available in this environment, so verification is via fake-httptest-server adapter tests (12 cases, incl. multi-turn tool_result round-trip and rate-limit error matching) plus a Pool/NativeRunner routing test proving agent.type: "anthropic" actually reaches the new provider. Live end-to-end verification against the real API is a follow-up once a key is configured.

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs
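
The two translation rules called out above (system hoisted to a top-level field, tool results sent as user-role tool_result blocks) might look like this in miniature. The message, block, and request types and the toAnthropic function are hypothetical stand-ins for the adapter's real types, the model name is a placeholder, and assistant tool_use blocks are left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message is a hypothetical neutral chat message.
type message struct {
	Role       string // "system", "user", "assistant", or "tool"
	Text       string
	ToolCallID string // set when Role == "tool"
}

// block is one content block in the Messages API format.
type block struct {
	Type      string `json:"type"`
	Text      string `json:"text,omitempty"`
	ToolUseID string `json:"tool_use_id,omitempty"`
	Content   string `json:"content,omitempty"`
}

type apiMessage struct {
	Role    string  `json:"role"`
	Content []block `json:"content"`
}

type request struct {
	Model     string       `json:"model"`
	MaxTokens int          `json:"max_tokens"`
	System    string       `json:"system,omitempty"`
	Messages  []apiMessage `json:"messages"`
}

// toAnthropic converts neutral messages into a request body. System messages
// are hoisted to the top-level field; tool results become user-role messages
// holding a tool_result block, since the format has no "tool" role.
func toAnthropic(model string, maxTokens int, msgs []message) request {
	req := request{Model: model, MaxTokens: maxTokens, Messages: []apiMessage{}}
	for _, m := range msgs {
		switch m.Role {
		case "system":
			req.System = m.Text
		case "tool":
			req.Messages = append(req.Messages, apiMessage{
				Role:    "user",
				Content: []block{{Type: "tool_result", ToolUseID: m.ToolCallID, Content: m.Text}},
			})
		default:
			req.Messages = append(req.Messages, apiMessage{
				Role:    m.Role,
				Content: []block{{Type: "text", Text: m.Text}},
			})
		}
	}
	return req
}

func main() {
	req := toAnthropic("example-model", 1024, []message{
		{Role: "system", Text: "be brief"},
		{Role: "user", Text: "hi"},
		{Role: "tool", Text: "ok", ToolCallID: "toolu_1"},
	})
	b, err := json.MarshalIndent(req, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```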

refactor(executor): extract provider-neutral tool-use loop (Phase 1)

2026-07-03T08:49:43+00:00

Splits LocalRunner's OpenAI-specific agentic loop into reusable, provider-agnostic pieces so later phases can add native Anthropic/OpenAI/Google/Groq/OpenRouter adapters without duplicating the control flow:

- internal/provider: neutral Provider/ChatRequest/ChatResponse types, plus an openaicompat adapter wrapping the existing internal/llm.Client unchanged
- internal/sandbox: Sandbox interface + HostSandbox (git clone/push/cleanup, read_file/write_file/run_bash/glob), lifted verbatim from local.go/localtools.go
- internal/agentloop: the extracted tool-use loop (request/response/tool-dispatch/loop, ask_user blocking, stream-json envelope, summary fallback)
- internal/agentchannel: AgentChannel/SubtaskSpec/BlockedError/ErrAgentBlocked moved out of internal/executor so agentloop can use them without an import cycle; internal/executor re-exports via type aliases, so no call site changes
- internal/executor/nativerunner.go: NativeRunner replaces LocalRunner, wiring agentloop.Loop + openaicompat + HostSandbox together
- config.Providers map[string]ProviderConfig added (unused until Phase 2+)

Zero intended behavior change: go test -race ./... passes across all packages, and end-to-end stream-json/summary/changestats output was verified byte-compatible against a fake OpenAI-compatible server. Adds test coverage for sandbox tool-dispatch (git clone/push, read/write/bash/glob) that LocalRunner never had.

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs
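
To make the split concrete, here is a much-reduced sketch of a provider-neutral tool-use loop: the loop depends only on a small Provider interface, so adapters can be swapped without touching the control flow. The type shapes, runLoop, fakeProvider, and demoLoop are all assumptions for illustration; the real internal/provider and internal/agentloop types are richer (ask_user blocking, stream-json envelope, summary fallback) and may differ in every field.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type ToolCall struct{ ID, Name, Args string }

type Message struct {
	Role       string     // "user", "assistant", or "tool"
	Content    string
	ToolCalls  []ToolCall // set on assistant messages that request tools
	ToolCallID string     // set on tool messages
}

type ChatRequest struct{ Messages []Message }
type ChatResponse struct{ Message Message }

// Provider is the only thing the loop knows about a model backend.
type Provider interface {
	Chat(ctx context.Context, req ChatRequest) (ChatResponse, error)
}

// Tool executes one tool call and returns its textual output.
type Tool func(args string) (string, error)

// runLoop alternates model turns and tool dispatch until the model answers
// without requesting a tool, or maxTurns model calls have been made.
func runLoop(ctx context.Context, p Provider, tools map[string]Tool, prompt string, maxTurns int) (string, error) {
	msgs := []Message{{Role: "user", Content: prompt}}
	for turn := 0; turn < maxTurns; turn++ {
		resp, err := p.Chat(ctx, ChatRequest{Messages: msgs})
		if err != nil {
			return "", err
		}
		msgs = append(msgs, resp.Message)
		if len(resp.Message.ToolCalls) == 0 {
			return resp.Message.Content, nil // a plain answer ends the loop
		}
		for _, call := range resp.Message.ToolCalls {
			out := "unknown tool: " + call.Name
			if tool, ok := tools[call.Name]; ok {
				if out, err = tool(call.Args); err != nil {
					out = "tool error: " + err.Error() // reported, not fatal
				}
			}
			msgs = append(msgs, Message{Role: "tool", Content: out, ToolCallID: call.ID})
		}
	}
	return "", errors.New("max turns exceeded")
}

// fakeProvider requests read_file once, then answers using the tool's output.
type fakeProvider struct{}

func (fakeProvider) Chat(_ context.Context, req ChatRequest) (ChatResponse, error) {
	last := req.Messages[len(req.Messages)-1]
	if last.Role == "tool" {
		return ChatResponse{Message: Message{Role: "assistant", Content: "file says: " + last.Content}}, nil
	}
	call := ToolCall{ID: "c1", Name: "read_file", Args: "README"}
	return ChatResponse{Message: Message{Role: "assistant", ToolCalls: []ToolCall{call}}}, nil
}

// demoLoop wires the fake provider to a single in-memory tool.
func demoLoop() (string, error) {
	tools := map[string]Tool{
		"read_file": func(args string) (string, error) { return "hello from " + args, nil },
	}
	return runLoop(context.Background(), fakeProvider{}, tools, "read the README", 4)
}

func main() {
	fmt.Println(demoLoop())
}
```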

chore: remove stories/checker backend dead code

2026-06-05T09:21:32+00:00

Remove the dead stories/checker backend tracked as design debt in CLAUDE.md:

- Delete internal/api/stories.go, stories_test.go, elaborate.go
- Delete internal/task/story.go, story_test.go
- Remove story/checker columns from storage schema and all CRUD methods
- Remove checkStoryCompletion, spawnCheckerTask, triggerStoryDeploy, createValidationTask, checkValidationResult, ShipStory, CheckStoryCompletion from executor; simplify handleRunResult
- Remove story/checker fields from task.Task (StoryID, AcceptanceCriteria, CheckerForTaskID, CheckerReport)
- Remove story routes and geminiBinPath from API server
- Move claudeJSONResult/extractJSON from elaborate.go to validate.go
- Rename ensureStoryBranch -> ensureBranch in ContainerRunner
- Fix git identity in bare-repo tests; fix initial-branch to use main

Co-Authored-By: Claude Sonnet 4.6