claudomator.git/internal/api, branch main

fix(api): reject accepting a BLOCKED task directly

2026-07-09T04:07:04+00:00

The nested-subtask-completion fix (ad00e18) necessarily added BLOCKED -> COMPLETED to task.ValidTransition so executor.Pool.maybeUnblockParent can complete a subtask whose own children all finished. That widened acceptTask's gate (shared by POST /api/tasks/{id}/accept and the chatbot MCP accept_task tool, which validates only via task.ValidTransition) to also permit accepting ANY BLOCKED task directly -- one still awaiting ask_user, or with genuinely incomplete subtasks -- bypassing the completion invariant this session's fix was built to protect. Caught by local subagent review of that fix. Not live in production: the deployed claudomator service is still on an earlier commit.

feat(api): add create_story/get_story/list_stories/accept_story to chatbot MCP

2026-07-08T22:19:49+00:00

refactor(api): extract acceptStory service-layer helper, mirroring acceptTask

2026-07-08T22:05:23+00:00

test(api): add missing ReportVerdict stub to fakeAgentChannel

2026-07-08T09:21:19+00:00

Task 4's plan scope covered agentchannel.go/channel.go/channel_test.go but missed this second AgentChannel test double, which broke `go test ./...` (compile failure) even though `go build ./...` was clean.

fix: task submission context cancellation, executions success-rate calc

2026-07-05T23:51:58+00:00

submitTask was using the inbound HTTP/MCP request context to submit work to the executor pool, so every chatbot/REST-submitted task was cancelled the moment the request returned rather than actually running. Use the server's long-lived lifecycle context instead, matching the other Submit call sites.

Separately, computeExecutionStats compared execution state against a lowercase 'completed' literal while the backend always emits uppercase state values, so success rate silently read 0% for every execution. Also treat READY as success alongside COMPLETED, since a top-level task's execution lands at READY on success; COMPLETED requires a later, separate accept step.

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01VTUSAEKfsPc6WGDq45yPHD

feat(web,api): add Budget/Roles dashboard -- final phase of the harness redesign (Phase 9b)

2026-07-04T08:50:46+00:00

Two new tabs, matching Phase 9a's conventions: Budget (per-provider spend meters, escalation funnel, spend-over-time) and Roles (version history, draft activation, readable escalation-ladder view) -- the human-facing surface for the token-husbanding value proposition and the Phase 5/8 versioned role-config system.

New backend (minimal, additive, matching existing endpoint conventions -- unauthenticated like /api/budget and /api/roles/*):

- internal/storage/dashboard.go: QueryEscalationFunnel (executions grouped by escalation_rung + agent, count + cost) and QuerySpendTimeseries (cost per provider bucketed hourly/daily, mirroring QueryDashboardStats' existing bucketing expressions). Documented honestly: rung 0 is not exclusively "resolved locally" -- escalation_rung defaults to 0 uniformly, so non-role-typed executions (which never climb a ladder) land there too, alongside role-typed tasks genuinely resolved at tier 0. A role-only variant would need a join against tasks.config_json with no queryable role column on executions -- intentionally out of scope.
- internal/api/dashboard.go: GET /api/escalation-funnel, GET /api/spend-timeseries (both ?window=5h|24h|7d|, default 24h).
- internal/storage.ListRoleNames + GET /api/roles: the "which roles exist" gap -- there was no way to discover role names before this, only to list versions for a role you already knew the name of.

Chart-form decisions (dataviz skill, invoked before writing chart code): horizontal stacked bar for the escalation funnel (rung order already encodes the funnel shape positionally; color only needed for per-provider segments within each rung); multi-line for spend-over-time; a fixed-order categorical palette from the skill's validated palette.md slots for provider identity (re-validated against this app's dark surface, passing); a separate status (good/warning/critical) palette for budget meters, deliberately distinct from both --state-* and the provider palette.

Caught and fixed a real bug during visual QA: converging near-zero end-labels on the spend chart were overlapping (an anti-pattern the skill explicitly flags) -- fixed with a 14px minimum-gap check before direct-labeling an endpoint, leaning on the legend/tooltip otherwise.

Verified with a real running server, real seeded data (65 executions across rungs 0-2 with a realistic provider mix, 2 roles with active/draft/retired role_configs versions written directly via internal/storage), and a real headless-browser session (reusing Phase 9a's Chromium/proxy scaffolding): confirmed correct rung totals/percentages, provider legends, a 3-line spend chart, live window-selector re-render, correct active-version highlighting on the role panel, and a real Activate click on a draft version -- verified via both DOM re-render and a direct backend GET that it truly persisted.

go build/vet/test -race -count=1 all pass, full suite. node --test web/test/*.mjs: 291/291 passing (16 new).

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs

feat(story,role): add retro ceremony -- closes the self-improvement loop (Phase 8)

2026-07-04T05:05:36+00:00

The final mechanism the versioned role-config model (Phase 5) was built for: when a story reaches DONE, StoryOrchestrator spawns a retro-role task that reflects on the story's full history and proposes draft role_configs versions for a human to review and activate via the existing (unchanged) POST /api/roles/{role}/activate.

- AgentChannel gains a 6th method, ProposeRoleConfig(ctx, role.RoleConfig) (version, err), following ProposeEpic's precedent (Phase 7c): a structured tool call, not summary-parsing. storeChannel.ProposeRoleConfig calls the same Store.CreateRoleConfig the human-facing POST /api/roles/{role}/versions endpoint already uses (proposed_by: "retro"), landing a new draft row without touching whatever's currently active. Wired through both transports exactly like ProposeEpic: internal/agentloop/tools.go (native loop) and internal/executor/agentmcp.go (MCP).
- StoryOrchestrator.Tick now routes a story at status DONE to a new processRetro stage instead of processStory -- a sibling stage, not a continuation, since the Builder->Evaluators->Arbitration chain is long settled by then. processRetro only *reads* that settled pipeline (read-only findEvaluators/findArbitration counterparts to ensureEvaluators/ensureArbitration -- it never spawns/mutates Builder-pipeline tasks) to locate the Arbitration task the retro task depends on, then spawns (idempotently -- checks for an existing retro-role dependent first) one retro-role task with instructions assembled from the story's spec/acceptance-criteria, full task tree, per-task cost/escalation history, active role_configs per role encountered, and the story's own event stream (evaluator verdicts, arbitration decision).
- event.KindRetroCaptured (attached to the story's ID, matching KindEvalVerdict/KindArbitrationDecided's convention) fires once the retro task completes (auto-accepted like every other pipeline task), aggregating every event.KindRoleConfigProposed the retro task recorded (one per propose_role_config call) into {task_id, proposals: [{role, version}], summary} -- the summary is the "capturing lessons" half of this ceremony, the proposals are the versioned-config half.
- Human activation is completely untouched: drafts land through the identical CreateRoleConfig/config_json path Phase 5's endpoints already handle, confirmed via existing role-endpoint tests passing unmodified.

go build/vet/test -race -count=1 all pass, full suite (20 packages) -- one run hit a known, pre-existing, intermittent flake under full-suite load (unrelated to this phase's files) that did not reproduce on two immediate reruns, both in isolation and full-suite.

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs

feat(story,scheduler): add epic-proposal tool + AskUser-timeout escalation (Phase 7c)

2026-07-04T04:39:28+00:00

Two independent pieces, completing Phase 7.

Epic-proposal tool: AgentChannel gains a 5th method, ProposeEpic(ctx, EpicProposal{Name, Description, StoryIDs}) (epicID, err), implemented on storeChannel -- matches an existing epic by exact name or creates one (DiscoverySource: "agent"), sets epic_id on each resolvable story (skips, doesn't fail, on an unresolved ID), emits KindEpicProposed attached to the epic's own ID with payload {epic_id, name, story_ids}. Wired into both transports exactly like Phase 6 wired role into spawn_subtask: a new propose_epic tool in the native tool-use loop (internal/agentloop/tools.go) and the MCP transport (internal/executor/agentmcp.go). This is the mechanism for a discovery/planner-role agent to act on its own judgment that several stories it's been given form one cohesive initiative -- the judgment itself lives in the calling agent's instructions/model, not in this code.

AskUser-timeout escalation: extends the existing Scheduler (Phase 5's retry-then-escalate watcher) rather than adding a new component, since "stuck task needs escalation" is exactly what it already does. Finds role-typed BLOCKED tasks whose question has been outstanding longer than SchedulerConfig.AskUserTimeoutSeconds (default 10 minutes) using task.UpdatedAt as the outstanding-since timestamp -- no new column needed, since UpdateTaskQuestion already stamps it the instant a question is recorded and nothing else touches the row while BLOCKED. Resolves the next ladder tier from the latest execution's EscalationRung, records the system-authored fallback answer as an audit-trail task.Interaction, clears the question, sets the new tasks.needs_review flag, emits KindEscalated (now carrying a trigger field: "failure" vs "ask_user_timeout" for the existing failure-retry path vs this one), and resumes via Pool.SubmitResume at the escalated tier -- degrading to same-tier resume with final:true if the ladder's exhausted or no role config exists, since unblocking the task takes priority over having somewhere higher to escalate to. GET /api/tasks?needs_review=true surfaces auto-decided tasks for human review.

go build/vet/test -race -count=1 all pass, full suite (20 packages), run twice to rule out flakiness in the new tests. (One pre-existing, unrelated test -- TestHandleRunTask_CascadesRetryToFailedDeps, a tempdir-cleanup race -- appeared once under full-suite load per the implementing agent's report and did not reproduce in this verification's runs either; not a regression from this work.)

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs

feat(story): add StoryOrchestrator -- Builder->Evaluators->Arbitration->accept (Phase 7b)

2026-07-04T04:08:41+00:00

A deterministic, poll-based watcher (internal/scheduler.StoryOrchestrator, sibling to the Phase 5 Scheduler) that drives a story.Story through its execution pipeline, rather than relying on an LLM agent to correctly orchestrate its own fan-out via tool calls.

Mechanism: polling, not a handleRunResult hook. Every task the orchestrator watches (a story's root/Builder task, 4 Evaluators, Arbitration) is top-level (no ParentTaskID), and executor.Pool.handleRunResult only ever lands a top-level task at READY or BLOCKED -- never COMPLETED directly, since that transition normally requires a human/chatbot POST /api/tasks/{id}/accept in a different package. A handleRunResult hook would never observe it; polling doesn't care how/whether a task reached a given state.

Stages:

- Builder COMPLETED -> spawn 4 role-typed Evaluator tasks (evaluator_quality/security/correctness/performance, DependsOn: [builder], no ParentTaskID -- true DAG siblings, not delegated subtasks) + story -> VALIDATING.
- Each Evaluator COMPLETED -> emit KindEvalVerdict (attached to the story's ID, so one GET /api/stories/{id}/events call surfaces every verdict).
- All 4 Evaluators COMPLETED -> spawn 1 Arbitration task (role: planner, DependsOn: all 4 evaluator IDs).
- Arbitration COMPLETED -> emit KindArbitrationDecided, story -> REVIEW_READY.
- POST /api/stories/{id}/accept (mirrors handleAcceptTask) -> DONE, emits KindHumanAccepted.

Fixes a gap caught before merging: since none of Builder/Evaluators/Arbitration have a ParentTaskID, none of them auto-complete -- each would otherwise need a separate manual /api/tasks/{id}/accept, meaning 6 human clicks per story before ever reaching the intended single story-level gate. StoryOrchestrator.autoAccept now transitions each of these specific tasks READY->COMPLETED itself (via the same validated Store.UpdateTaskState path acceptTask uses), scoped only to tasks already established as part of a story's pipeline (root task, or role-matched dependents from ensureEvaluators/ensureArbitration) -- never a blanket sweep of unrelated READY tasks. This makes POST /api/stories/{id}/accept the system's only required human touchpoint for the whole chain, matching the design goal that story (not task/subtask) is the human-interaction atom.

Idempotency: structural for task-creation stages (ensureEvaluators/ensureArbitration check ListDependents for already-existing role-matched tasks before creating -- crash/restart-safe); story.Status=="VALIDATING" gates the Arbitration->REVIEW_READY write (nothing further downstream to check structurally there); an in-memory handledVerdicts set (mirrors Scheduler.handled) dedupes per-evaluator KindEvalVerdict emission across poll ticks, resetting harmlessly on restart.

Documented simplification: finalizeArbitration never parses the Arbitration summary for approve/reject -- always routes to REVIEW_READY; NEEDS_FIX is manually settable via PUT /api/stories/{id}. A later phase could close this with a dedicated verdict-reporting AgentChannel method instead of parsing free text.

go build/vet/test -race -count=1 all pass, full suite (20 packages).

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs
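The structural idempotency described above (check existing role-matched dependents before creating) can be illustrated with an in-memory store. This is a sketch under assumptions: `memStore`, `listDependents`, the lowercase `ensureEvaluators`, and the ID scheme are hypothetical and only stand in for the real storage-backed versions.

```go
package main

import "fmt"

// task is a pared-down, hypothetical task record.
type task struct {
	ID        string
	Role      string
	DependsOn []string
}

// evaluatorRoles are the four roles named in the commit message.
var evaluatorRoles = []string{
	"evaluator_quality",
	"evaluator_security",
	"evaluator_correctness",
	"evaluator_performance",
}

type memStore struct{ tasks []task }

// listDependents returns every task that depends on the given ID.
func (s *memStore) listDependents(id string) []task {
	var out []task
	for _, t := range s.tasks {
		for _, dep := range t.DependsOn {
			if dep == id {
				out = append(out, t)
				break
			}
		}
	}
	return out
}

// ensureEvaluators creates only the evaluator tasks that do not already
// exist as role-matched dependents of the builder, and returns how many it
// created. Re-running it after a crash or on the next poll tick is a no-op.
func (s *memStore) ensureEvaluators(builderID string) int {
	existing := map[string]bool{}
	for _, t := range s.listDependents(builderID) {
		existing[t.Role] = true
	}
	created := 0
	for _, role := range evaluatorRoles {
		if existing[role] {
			continue
		}
		s.tasks = append(s.tasks, task{
			ID:        builderID + "/" + role, // illustrative ID scheme
			Role:      role,
			DependsOn: []string{builderID},
		})
		created++
	}
	return created
}

func main() {
	s := &memStore{}
	fmt.Println(s.ensureEvaluators("builder-1"))
	fmt.Println(s.ensureEvaluators("builder-1"))
}
```

Because the check reads persisted state rather than an in-memory flag, a partially completed spawn (say, two of four evaluators written before a crash) is topped up on the next tick instead of duplicated.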

feat(story): add Epic/Story data model + REST CRUD (Phase 7a)

2026-07-03T23:46:26+00:00

Pure data model + CRUD -- no orchestration behavior yet (that's Phase 7b). Revives ADR-007's stories concept (adapted; ADR-007 itself was deleted as "more machinery than the usage pattern needed") while staying lean: reuses the existing events/projects/task-tree machinery rather than building a parallel hierarchy.

- internal/story: Epic/Story types. Epic is a loosely-scoped initiative (OPEN/CLOSED, no execution semantics). Story is a shippable slice of work realized by a task tree rooted at root_task_id; epic_id/project_id/root_task_id are loose references (no FK enforcement, matching the codebase's existing tolerance for unmatched reference strings like tasks.project).
- storage: new epics/stories tables (additive migrations) + CRUD methods. Story status is intentionally unvalidated pass-through in this phase -- no task.ValidTransition-style state machine, since there's no orchestrator yet to make transitions meaningful.
- event: 9 new Kind constants for the ceremony this layer will eventually drive (epic_proposed, discovery_proposed, framing_decided, groomed, prioritized, eval_verdict, arbitration_decided, retro_captured, human_accepted) -- defined but nothing emits them yet.
- api: GET/POST /api/epics, GET/PUT /api/epics/{id}, GET /api/epics/{id}/stories, GET/POST /api/stories, GET/PUT /api/stories/{id}, and GET /api/stories/{id}/task-tree -- BFS walk from root_task_id following both parent_task_id children and depends_on-edge dependents (visited-set guarded), returning a flat node list for a later UI phase to render as a graph. Unauthenticated, matching the existing projects/tasks endpoints' posture.

go build/vet/test -race -count=1 all pass, full suite.

Co-Authored-By: Claude Sonnet 5
Claude-Session: https://claude.ai/code/session_01V1moSNCJRcP6kykA4tyUSs
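The task-tree walk described above (BFS from the root, following both parent/child and depends_on edges, guarded by a visited set) might look like the following. It is a minimal sketch: `taskNode` and `walkTaskTree` are hypothetical names, and the function works on an in-memory slice rather than the storage layer.

```go
package main

import "fmt"

// taskNode is a hypothetical, reduced task row.
type taskNode struct {
	ID           string
	ParentTaskID string
	DependsOn    []string
}

// walkTaskTree returns the IDs reachable from rootID in BFS order, following
// parent -> child edges and dependency -> dependent edges. The visited set
// ensures a task reachable by several paths is emitted once.
func walkTaskTree(rootID string, tasks []taskNode) []string {
	children := map[string][]string{}
	dependents := map[string][]string{}
	for _, t := range tasks {
		if t.ParentTaskID != "" {
			children[t.ParentTaskID] = append(children[t.ParentTaskID], t.ID)
		}
		for _, dep := range t.DependsOn {
			dependents[dep] = append(dependents[dep], t.ID)
		}
	}

	visited := map[string]bool{rootID: true}
	queue := []string{rootID}
	var order []string

	// enqueue adds each not-yet-seen ID to the back of the queue.
	enqueue := func(ids []string) {
		for _, id := range ids {
			if !visited[id] {
				visited[id] = true
				queue = append(queue, id)
			}
		}
	}

	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		order = append(order, id)
		enqueue(children[id])
		enqueue(dependents[id])
	}
	return order
}

func main() {
	tasks := []taskNode{
		{ID: "root"},
		{ID: "sub", ParentTaskID: "root"},
		{ID: "eval", DependsOn: []string{"root"}},
		{ID: "arb", DependsOn: []string{"sub", "eval"}},
		{ID: "unrelated"},
	}
	fmt.Println(walkTaskTree("root", tasks))
}
```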