summaryrefslogtreecommitdiff
path: root/internal
AgeCommit message (Collapse)Author
2026-05-03fix: atomic execution creation + RUNNING state transitionPeter Stone
Add CreateExecutionAndSetRunning to storage.DB and Store interface, replacing the two sequential CreateExecution/UpdateTaskState calls in executor.go. Eliminates the crash window where a task stays PENDING with an orphaned RUNNING execution record. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03fix: prevent SHIPPABLE stories and wrong READY state on failed tasksPeter Stone
Three related bugs fixed: 1. maybeUnblockParent: guard against promoting QUEUED leaf tasks (no subtasks) to READY. The vacuously-true 'all subtasks done' check was advancing tasks that stalled in QUEUED (due to a prior SQLite lock error) to READY on server restart via RecoverStaleBlocked, despite having only failed executions and no commits. 2. checkStoryCompletion: require COMPLETED (not just READY) for all top-level tasks before advancing a story to SHIPPABLE. READY means the checker agent is still pending or the task awaits human review; a story with READY tasks is not ready to ship. 3. handleAcceptTask: call CheckStoryCompletion after a task is accepted so stories with parent tasks (whose subtasks are all done and then the parent is manually accepted) can auto-advance to SHIPPABLE. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11fix: tie pool submissions to server lifecycle contextClaude Agent
Fix 1 (server.go): Replace context.Background() with s.ctx in handleAnswerQuestion, handleResumeTimedOutTask, and handleRunTask. Add a ctx field to Server (defaulting to context.Background()) and a SetContext method so the serve command can wire in the signal- cancellable lifecycle context. Fix 2 (serve.go): Call srv.SetContext(ctx) before StartHub so all pool submissions use the server's root context (already cancelled on SIGTERM/SIGINT). Pool.Shutdown and its wiring were already present. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11cleanup: remove dead code (QuestionRegistry, changestats wrappers, scanner.Err)Claudomator Agent
Fix 1: Remove QuestionRegistry and related types (QuestionHandler, PendingQuestion) from question.go -- nothing reads Pool.Questions or uses the registry. Remove NewQuestionRegistry() call from NewPool and the Questions field from Pool. Remove the now-superfluous registry tests; keep stream/parse helpers which are still used by the claude runner. Fix 2: Check scanner.Err() after the parseStream loop so I/O errors from the scanner are not silently swallowed when streamErr is still nil. Fix 3: Delete internal/api/changestats.go -- the parseChangestatFromFile and parseChangestatFromOutput wrappers were only needed to support processResult(), which no longer calls them; they are unreachable dead code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04feat: acceptance_criteria per story task in elaboration and approvalPeter Stone
Add AcceptanceCriteria field to elaboratedStoryTask struct, update buildStoryElaboratePrompt schema and rules, pass the value through handleApproveStory into task.Task, and add a test verifying the field is persisted on approved story tasks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04fix: ShipStory — use context.Background() to avoid request context ↵Peter Stone
cancellation killing deploy
2026-04-04feat: story ship gate — explicit POST /api/stories/{id}/ship; remove ↵Peter Stone
auto-deploy - checkStoryCompletion now guards against re-running on already-SHIPPABLE stories and no longer auto-triggers triggerStoryDeploy on completion - New Pool.ShipStory method validates SHIPPABLE state then fires triggerStoryDeploy - POST /api/stories/{id}/ship route registered and handleShipStory handler added - Two new tests: 202 for SHIPPABLE story, 409 for non-SHIPPABLE story Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04fix: executor checker — close race, test flakiness, checker report for all ↵Peter Stone
failure states Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04feat: spawn checker task on READY; auto-accept on pass; attach report on failPeter Stone
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04feat: add checker task columns, UpdateTaskCheckerReport, GetCheckerTaskPeter Stone
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04feat: add AcceptanceCriteria, CheckerForTaskID, CheckerReport to Task structPeter Stone
2026-04-03feat: require repository_url on tasks; fix UpdateTask to persist it; fix ↵Peter Stone
cascade-retry test race Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03fix: remove drain-lock circuit breaker that halted all executions after 3 ↵Peter Stone
consecutive failures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29feat: register modal-shell project; document project registryPeter Stone
Adds modal-shell to SeedProjects() and documents how to find and edit seed.go in .agent/design.md.
2026-03-26fix: story stays PENDING when all subtasks complete but parent task stays QUEUEDPeter Stone
When a story is approved with pre-created subtasks, parent tasks are QUEUED but never run. Their subtasks complete, but: - maybeUnblockParent only handled BLOCKED parents, not QUEUED ones - checkStoryCompletion required ALL tasks (incl. subtasks) to be done Fixes: - maybeUnblockParent now also promotes QUEUED parents to READY when all subtasks are COMPLETED - checkStoryCompletion only checks top-level tasks (parent_task_id="") - RecoverStaleBlocked now also scans QUEUED parents on startup and triggers checkStoryCompletion if it promotes them - Add QUEUED→READY to valid state transitions (subtask delegation path) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26feat: graceful shutdown — drain workers before exit (default 3m timeout)Peter Stone
- Add workerWg to Pool; Shutdown() closes workCh and waits for all in-flight execute/executeResume goroutines to finish - Signal handler now shuts down HTTP first, then drains the pool - ShutdownTimeout config field (toml: shutdown_timeout); default 3m - Tests: WaitsForWorkers and TimesOut Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: ensure story branch exists before cloning at task startPeter Stone
Add ensureStoryBranch() that runs git ls-remote to check, then clones into a temp dir to create and push the branch if missing. Called before the task's own clone so checkout is guaranteed to succeed. Removes the post-checkout fallback hack added in the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: auto-create story branch if missing at clone timePeter Stone
If git checkout of the story branch fails (branch never pushed to bare repo), create it from HEAD and push to origin instead of hard-failing the task. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26feat: cascade retry deps when running a task with failed dependenciesPeter Stone
When /run is called on a CANCELLED/FAILED task that has deps in a terminal failure state, automatically reset and resubmit those deps so the task isn't immediately re-cancelled by the pool's dep check. Also update reset-failed-tasks script to handle CANCELLED tasks and clean up preserved sandbox workspaces. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: cancel waiting tasks when dep hits terminal failure (QUEUED→CANCELLED)Peter Stone
QUEUED→FAILED is not a valid state transition. When a dependency enters a terminal failure state, cancel the waiting task instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: story branch push to bare repo; drain at 3 consecutive failuresPeter Stone
createStoryBranch was pushing to 'origin' which doesn't exist — branches never landed in the bare repo so agents couldn't clone them. Now uses the project's RemoteURL (bare repo path) directly for fetch and push. Raise drain threshold from 2 to 3 consecutive failures to reduce false positives from transient errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26test: add TestPool_DependsOn_NoDeadlockstory/repo-urlPeter Stone
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: resolve dep-chain deadlock; broadcast task_started for UI visibilityPeter Stone
With maxPerAgent=1, tasks with DependsOn were entering waitForDependencies while holding the per-agent slot, preventing the dependency from ever running. Fix: check deps before taking the slot. If not ready, requeue without holding activePerAgent. Also accept StateReady (leaf tasks) as a satisfied dependency, not just StateCompleted. Add startedCh to pool and broadcast task_started WebSocket event when a task transitions to RUNNING, so the UI immediately shows the running state during the clone phase instead of waiting for completion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: expose drained state in agent status API; fix AgentEvent JSON casingPeter Stone
AgentStatusInfo was missing drained field so UI couldn't show drain lock. AgentEvent had no JSON tags so ev.agent/event/timestamp were undefined in the stats timeline. UI now shows "Drain locked" card state with undrain CTA. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: set RepositoryURL on tasks created via story approvePeter Stone
handleApproveStory was creating tasks without RepositoryURL, causing executions to fail with "task has no repository_url". Now looks up the project's RemoteURL and sets it on all tasks and subtasks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26fix: story tasks get Project field; elaborate reads worklog; deploy chmod ↵Peter Stone
scripts - handleApproveStory: set Project = input.ProjectID on tasks and subtasks so the executor can resolve RepositoryURL from the project registry (was causing "task has no repository_url" on every story task) - elaborate.go: read .agent/worklog.md instead of SESSION_STATE.md for project context injected into elaboration prompts - deploy: explicitly chmod +x all scripts before restart (same root cause as the binary execute-bit loss — chown -R was stripping it) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25chore: replace all master branch references with mainPeter Stone
- executor.go: merge story branch to main before deploy - container.go: error messages reference git push origin main - api/stories.go: create story branch from origin/main (drop master fallback) - executor_test.go: test setup uses main branch Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24feat: merge story branch to master before deploy, add doot project to registryPeter Stone
- triggerStoryDeploy: fetch/checkout/merge --no-ff/push before running deploy script (ADR-007) - executor_test: TestPool_StoryDeploy_MergesStoryBranch proves merge happens - seed.go: add doot project with deploy script; wire claudomator deploy script Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24Merge branch 'master' of /site/git.terst.org/repos/claudomatorPeter Stone
2026-03-24feat: validation result transitions story to REVIEW_READY or NEEDS_FIX (ADR-007)Claudomator Agent
Add checkValidationResult which inspects the final task.State of a completed validation task and updates the story to REVIEW_READY (pass) or NEEDS_FIX (fail). Wire into handleRunResult so stories in VALIDATING state are dispatched to checkValidationResult instead of checkStoryCompletion, covering both success and FAILED terminal paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24feat: auto-create validation task on story DEPLOYED (ADR-007)Claudomator Agent
2026-03-24feat: trigger deploy script on SHIPPABLE → DEPLOYED (ADR-007)Claudomator Agent
Add triggerStoryDeploy to Pool: fetches story's project, runs its DeployScript via exec.CommandContext, and advances story to DEPLOYED on success. Wire into checkStoryCompletion with go p.triggerStoryDeploy after the SHIPPABLE transition. Covered by TestPool_StoryDeploy_RunsDeployScript. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24fix: resolve merge conflict — integrate agent's story-aware ContainerRunnerPeter Stone
Agent added: Store on ContainerRunner (direct story/project lookup), --reference clone for speed, explicit story branch push, checkStoryCompletion → SHIPPABLE. My additions: BranchName on Task as fallback when Store is nil, tests updated to match checkout-after-clone approach. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24feat: clone story branch in ContainerRunner (ADR-007)Peter Stone
- Add BranchName field to task.Task (populated from story at execution time) - Add GetStory to executor Store interface; resolve BranchName from story in both execute() and executeResume() parallel to RepositoryURL resolution - Pass --branch <name> to git clone when BranchName is set; default clone otherwise Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23feat: Phase 4 — story-aware execution, branch clone, story completion ↵Claudomator Agent
check, deployment status - ContainerRunner: add Store field; clone with --reference when story has a local project path; checkout story branch after clone; push to story branch instead of HEAD - executor.Store interface: add GetStory, ListTasksByStory, UpdateStoryStatus - Pool.handleRunResult: trigger checkStoryCompletion when a story task succeeds - Pool.checkStoryCompletion: transitions story to SHIPPABLE when all tasks done - serve.go: wire Store into each ContainerRunner - stories.go: update createStoryBranch to fetch+checkout from origin/master base; add GET /api/stories/{id}/deployment-status endpoint - server.go: register deployment-status route - Tests: TestPool_CheckStoryCompletion_AllComplete/PartialComplete, TestHandleStoryDeploymentStatus Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23feat: populate RepositoryURL from project registry in executor (ADR-007)Peter Stone
- Add GetProject to Store interface used by executor - Resolve RepositoryURL from project registry when task.RepositoryURL is empty - Call SeedProjects at server startup so the project registry is populated - Add GetProject stub to minimalMockStore in executor tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22feat: color logo based on build version hashPeter Stone
Adds GET /api/version endpoint and uses the first 6 hex chars of the commit hash to derive an HSL hue for the header h1 logo color. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22feat: Phase 5 — story elaboration endpoint, approve flow, branch creationClaudomator Agent
- POST /api/stories/elaborate: runs Claude/Gemini against project LocalPath to produce a structured story plan (name, branch_name, tasks, validation) - POST /api/stories/approve: creates story + sequentially-wired tasks/subtasks from the elaborate output and pushes the story branch to origin - createStoryBranch helper: git checkout -b + push -u origin - Tests: TestBuildStoryElaboratePrompt, TestHandleStoryApprove_WiresDepends Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22feat: surface error_msg on failed task cards in UIPeter Stone
2026-03-22feat: Phase 3 — stories data model, ValidStoryTransition, storage CRUD, ↵Claudomator Agent
API endpoints - internal/task/story.go: Story struct, StoryState constants, ValidStoryTransition - internal/task/task.go: add StoryID field - internal/storage/db.go: stories table + story_id on tasks migrations; CreateStory, GetStory, ListStories, UpdateStoryStatus, ListTasksByStory; update all task SELECT/INSERT to include story_id; scanTask extended with sql.NullString for story_id; added modernc timestamp format to GetMaxUpdatedAt - internal/storage/sqlite_cgo.go + sqlite_nocgo.go: build-tag based driver selection (mattn/go-sqlite3 with CGO, modernc.org/sqlite pure-Go fallback) so tests run without a C compiler - internal/api/stories.go: GET/POST /api/stories, GET /api/stories/{id}, GET/POST /api/stories/{id}/tasks (auto-wires depends_on chain), PUT /api/stories/{id}/status (validates transition) - internal/api/server.go: register all story routes - go.mod/go.sum: add modernc.org/sqlite pure-Go dependency Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22fix: make requeueDelay configurable to fix test timeout in ↵Peter Stone
TestPool_MaxPerAgent_BlocksSecondTask
2026-03-21feat: executor reliability — per-agent limit, drain gate, pre-flight ↵Claudomator Agent
creds, auth recovery - maxPerAgent=1: only 1 in-flight execution per agent type at a time; excess tasks are requeued after 30s - Drain gate: after 2 consecutive failures the agent is drained and a question is set on the task; reset on first success; POST /api/pool/agents/{agent}/undrain to acknowledge - Pre-flight credential check: verify .credentials.json and .claude.json exist in agentHome before spinning up a container - Auth error auto-recovery: detect auth errors (Not logged in, OAuth token has expired, etc.) and retry once after running sync-credentials and re-copying fresh credentials - Extracted runContainer() helper from ContainerRunner.Run() to support the retry flow - Wire CredentialSyncCmd in serve.go for all three ContainerRunner instances - Tests: TestPool_MaxPerAgent_*, TestPool_ConsecutiveFailures_*, TestPool_Undrain_*, TestContainerRunner_Missing{Credentials,Settings}_FailsFast, TestIsAuthError_*, TestContainerRunner_AuthError_SyncsAndRetries Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21fix: copy .claude.json (account settings) alongside credentialsPeter Stone
ClaudeConfigDir moved from /root/.claude to credentials/claude/, but container.go was still deriving .claude.json from filepath.Dir which no longer pointed anywhere useful. Claude CLI needs .claude.json for OAuth account info or it says "Not logged in". Also update sync-credentials to copy /root/.claude.json into the credentials dir so it stays fresh alongside the token. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21feat: fail loudly when agent leaves uncommitted work in sandboxPeter Stone
After a successful run with no commits pushed, detectUncommittedChanges checks for modified tracked files and untracked source files. If any exist the task fails with an explicit error rather than silently succeeding while the work evaporates when the sandbox is deleted. Scaffold files written by the harness (.claudomator-env, .claudomator-instructions.txt, .agent-home/) are excluded from the check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21feat: Phase 2 — project registry, legacy field cleanup, credential path fixPeter Stone
- task.Project type + storage CRUD + UpsertProject + SeedProjects - Remove AgentConfig.ProjectDir, RepositoryURL, SkipPlanning - Remove ContainerRunner fallback git init logic - Project API endpoints: GET/POST /api/projects, GET/PUT /api/projects/{id} - processResult no longer extracts changestats (pool-side only) - claude_config_dir config field; default to credentials/claude/ - New scripts: sync-credentials, fix-permissions, check-token Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21fix: use configured claude_config_dir for container credentialsPeter Stone
The server runs as www-data whose HOME is /var/www — deriving credentials from $HOME/.claude always produced an empty path. Now reads from ClaudeConfigDir (default: /workspace/claudomator/credentials/claude), which sync-credentials keeps populated with fresh OAuth tokens. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19feat: add errors, throughput, and billing sections to stats dashboardPeter Stone
- GET /api/stats?window=7d: pre-aggregated SQL queries for errors, throughput, billing - Errors section: category summary (quota/rate_limit/timeout/git/failed) + failure table - Throughput section: stacked hourly bar chart (completed/failed/other) over 7d - Billing section: KPIs (7d total, avg/day, cost/run) + daily cost bar chart Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19feat: agent status dashboard with availability timeline and Gemini quota ↵Peter Stone
detection - Detect Gemini TerminalQuotaError (daily quota) as BUDGET_EXCEEDED, not generic FAILED - Surface container stderr tail in error so quota/rate-limit classifiers can match it - Add agent_events table to persist rate-limit start/recovery events across restarts - Add GET /api/agents/status endpoint returning live agent state + 24h event history - Stats dashboard: agent status cards, 24h availability timeline, per-run execution table Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18feat: containerized execution with agent tooling and deployment fixesPeter Stone
- ContainerRunner replaces ClaudeRunner/GeminiRunner; all agent types run in Docker containers via claudomator-agent:latest - Writable agentHome staging dir (/home/agent) satisfies home-dir requirements for both claude and gemini CLIs without exposing host creds - Copy .credentials.json and .claude.json into staging dir at run time; GEMINI_API_KEY passed via env file - Fix git clone: remove MkdirTemp-created dir before cloning (git rejects pre-existing dirs even when empty) - Replace localhost with host.docker.internal in APIURL so container can reach host API; add --add-host=host.docker.internal:host-gateway - Run container as --user=$(uid):$(gid) so host-owned workspace files are readable; chmod workspace 0755 and instructions file 0644 after clone - Pre-create .gemini/ in staging dir to avoid atomic-rename ENOENT on first gemini-cli run - Add ct CLI tool to container image: pre-built Bash wrapper for Claudomator API (ct task submit/create/run/wait/status/list) - Document ct tool in CLAUDE.md agent instructions section - Add drain-failed-tasks script: retries failed tasks on a 5-minute interval - Update Dockerfile: Node 22 via NodeSource, Go 1.24, gemini-cli, git safe.directory=*, default ~/.claude.json Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18fix: address final container execution issues and cleanup review docsPeter Stone