| Age | Commit message (Collapse) | Author |
|
Add CreateExecutionAndSetRunning to storage.DB and Store interface,
replacing the two sequential CreateExecution/UpdateTaskState calls in
executor.go. Eliminates the crash window where a task stays PENDING
with an orphaned RUNNING execution record.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Three related bugs fixed:
1. maybeUnblockParent: guard against promoting QUEUED leaf tasks (no
subtasks) to READY. The vacuously-true 'all subtasks done' check was
advancing tasks that stalled in QUEUED (due to a prior SQLite lock
error) to READY on server restart via RecoverStaleBlocked, despite
having only failed executions and no commits.
2. checkStoryCompletion: require COMPLETED (not just READY) for all
top-level tasks before advancing a story to SHIPPABLE. READY means
the checker agent is still pending or the task awaits human review;
a story with READY tasks is not ready to ship.
3. handleAcceptTask: call CheckStoryCompletion after a task is accepted
so stories with parent tasks (whose subtasks are all done and then
the parent is manually accepted) can auto-advance to SHIPPABLE.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Fix 1 (server.go): Replace context.Background() with s.ctx in
handleAnswerQuestion, handleResumeTimedOutTask, and handleRunTask.
Add a ctx field to Server (defaulting to context.Background()) and
a SetContext method so the serve command can wire in the signal-
cancellable lifecycle context.
Fix 2 (serve.go): Call srv.SetContext(ctx) before StartHub so all
pool submissions use the server's root context (already cancelled
on SIGTERM/SIGINT). Pool.Shutdown and its wiring were already present.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Fix 1: Remove QuestionRegistry and related types (QuestionHandler, PendingQuestion)
from question.go -- nothing reads Pool.Questions or uses the registry. Remove
NewQuestionRegistry() call from NewPool and the Questions field from Pool.
Remove the now-superfluous registry tests; keep stream/parse helpers which are
still used by the claude runner.
Fix 2: Check scanner.Err() after the parseStream loop so I/O errors from the
scanner are not silently swallowed when streamErr is still nil.
Fix 3: Delete internal/api/changestats.go -- the parseChangestatFromFile and
parseChangestatFromOutput wrappers were only needed to support processResult(),
which no longer calls them; they are unreachable dead code.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Add AcceptanceCriteria field to elaboratedStoryTask struct, update
buildStoryElaboratePrompt schema and rules, pass the value through
handleApproveStory into task.Task, and add a test verifying the
field is persisted on approved story tasks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
cancellation killing deploy
|
|
auto-deploy
- checkStoryCompletion now guards against re-running on already-SHIPPABLE stories
and no longer auto-triggers triggerStoryDeploy on completion
- New Pool.ShipStory method validates SHIPPABLE state then fires triggerStoryDeploy
- POST /api/stories/{id}/ship route registered and handleShipStory handler added
- Two new tests: 202 for SHIPPABLE story, 409 for non-SHIPPABLE story
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
failure states
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
cascade-retry test race
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
consecutive failures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Adds modal-shell to SeedProjects() and documents how to find
and edit seed.go in .agent/design.md.
|
|
When a story is approved with pre-created subtasks, parent tasks are
QUEUED but never run. Their subtasks complete, but:
- maybeUnblockParent only handled BLOCKED parents, not QUEUED ones
- checkStoryCompletion required ALL tasks (incl. subtasks) to be done
Fixes:
- maybeUnblockParent now also promotes QUEUED parents to READY when all
subtasks are COMPLETED
- checkStoryCompletion only checks top-level tasks (parent_task_id="")
- RecoverStaleBlocked now also scans QUEUED parents on startup and
triggers checkStoryCompletion if it promotes them
- Add QUEUED→READY to valid state transitions (subtask delegation path)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Add workerWg to Pool; Shutdown() closes workCh and waits for all
in-flight execute/executeResume goroutines to finish
- Signal handler now shuts down HTTP first, then drains the pool
- ShutdownTimeout config field (toml: shutdown_timeout); default 3m
- Tests: WaitsForWorkers and TimesOut
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Add ensureStoryBranch() that runs git ls-remote to check, then clones
into a temp dir to create and push the branch if missing. Called before
the task's own clone so checkout is guaranteed to succeed.
Removes the post-checkout fallback hack added in the previous commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
If git checkout of the story branch fails (branch never pushed to bare
repo), create it from HEAD and push to origin instead of hard-failing
the task.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
When /run is called on a CANCELLED/FAILED task that has deps in a terminal
failure state, automatically reset and resubmit those deps so the task
isn't immediately re-cancelled by the pool's dep check.
Also update reset-failed-tasks script to handle CANCELLED tasks and clean
up preserved sandbox workspaces.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
QUEUED→FAILED is not a valid state transition. When a dependency enters a
terminal failure state, cancel the waiting task instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
createStoryBranch was pushing to 'origin' which doesn't exist — branches
never landed in the bare repo so agents couldn't clone them. Now uses
the project's RemoteURL (bare repo path) directly for fetch and push.
Raise drain threshold from 2 to 3 consecutive failures to reduce false
positives from transient errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
With maxPerAgent=1, tasks with DependsOn were entering waitForDependencies
while holding the per-agent slot, preventing the dependency from ever running.
Fix: check deps before taking the slot. If not ready, requeue without holding
activePerAgent. Also accept StateReady (leaf tasks) as a satisfied dependency,
not just StateCompleted.
Add startedCh to pool and broadcast task_started WebSocket event when a task
transitions to RUNNING, so the UI immediately shows the running state during
the clone phase instead of waiting for completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
AgentStatusInfo was missing drained field so UI couldn't show drain lock.
AgentEvent had no JSON tags so ev.agent/event/timestamp were undefined in
the stats timeline. UI now shows "Drain locked" card state with undrain CTA.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
handleApproveStory was creating tasks without RepositoryURL, causing
executions to fail with "task has no repository_url". Now looks up
the project's RemoteURL and sets it on all tasks and subtasks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
scripts
- handleApproveStory: set Project = input.ProjectID on tasks and subtasks so
the executor can resolve RepositoryURL from the project registry (was causing
"task has no repository_url" on every story task)
- elaborate.go: read .agent/worklog.md instead of SESSION_STATE.md for project
context injected into elaboration prompts
- deploy: explicitly chmod +x all scripts before restart (same root cause as
the binary execute-bit loss — chown -R was stripping it)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- executor.go: merge story branch to main before deploy
- container.go: error messages reference git push origin main
- api/stories.go: create story branch from origin/main (drop master fallback)
- executor_test.go: test setup uses main branch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- triggerStoryDeploy: fetch/checkout/merge --no-ff/push before running deploy script (ADR-007)
- executor_test: TestPool_StoryDeploy_MergesStoryBranch proves merge happens
- seed.go: add doot project with deploy script; wire claudomator deploy script
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
Add checkValidationResult which inspects the final task.State of a
completed validation task and updates the story to REVIEW_READY (pass)
or NEEDS_FIX (fail). Wire into handleRunResult so stories in
VALIDATING state are dispatched to checkValidationResult instead of
checkStoryCompletion, covering both success and FAILED terminal paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
Add triggerStoryDeploy to Pool: fetches story's project, runs its
DeployScript via exec.CommandContext, and advances story to DEPLOYED on
success. Wire into checkStoryCompletion with go p.triggerStoryDeploy
after the SHIPPABLE transition. Covered by TestPool_StoryDeploy_RunsDeployScript.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Agent added: Store on ContainerRunner (direct story/project lookup), --reference
clone for speed, explicit story branch push, checkStoryCompletion → SHIPPABLE.
My additions: BranchName on Task as fallback when Store is nil, tests updated
to match checkout-after-clone approach.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Add BranchName field to task.Task (populated from story at execution time)
- Add GetStory to executor Store interface; resolve BranchName from story in both
execute() and executeResume() parallel to RepositoryURL resolution
- Pass --branch <name> to git clone when BranchName is set; default clone otherwise
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
check, deployment status
- ContainerRunner: add Store field; clone with --reference when story has a
local project path; checkout story branch after clone; push to story branch
instead of HEAD
- executor.Store interface: add GetStory, ListTasksByStory, UpdateStoryStatus
- Pool.handleRunResult: trigger checkStoryCompletion when a story task succeeds
- Pool.checkStoryCompletion: transitions story to SHIPPABLE when all tasks done
- serve.go: wire Store into each ContainerRunner
- stories.go: update createStoryBranch to fetch+checkout from origin/master base;
add GET /api/stories/{id}/deployment-status endpoint
- server.go: register deployment-status route
- Tests: TestPool_CheckStoryCompletion_AllComplete/PartialComplete,
TestHandleStoryDeploymentStatus
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Add GetProject to Store interface used by executor
- Resolve RepositoryURL from project registry when task.RepositoryURL is empty
- Call SeedProjects at server startup so the project registry is populated
- Add GetProject stub to minimalMockStore in executor tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Adds GET /api/version endpoint and uses the first 6 hex chars of the
commit hash to derive an HSL hue for the header h1 logo color.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- POST /api/stories/elaborate: runs Claude/Gemini against project LocalPath
to produce a structured story plan (name, branch_name, tasks, validation)
- POST /api/stories/approve: creates story + sequentially-wired tasks/subtasks
from the elaborate output and pushes the story branch to origin
- createStoryBranch helper: git checkout -b + push -u origin
- Tests: TestBuildStoryElaboratePrompt, TestHandleStoryApprove_WiresDepends
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
API endpoints
- internal/task/story.go: Story struct, StoryState constants, ValidStoryTransition
- internal/task/task.go: add StoryID field
- internal/storage/db.go: stories table + story_id on tasks migrations; CreateStory,
GetStory, ListStories, UpdateStoryStatus, ListTasksByStory; update all task
SELECT/INSERT to include story_id; scanTask extended with sql.NullString for story_id;
added modernc timestamp format to GetMaxUpdatedAt
- internal/storage/sqlite_cgo.go + sqlite_nocgo.go: build-tag based driver selection
(mattn/go-sqlite3 with CGO, modernc.org/sqlite pure-Go fallback) so tests run
without a C compiler
- internal/api/stories.go: GET/POST /api/stories, GET /api/stories/{id},
GET/POST /api/stories/{id}/tasks (auto-wires depends_on chain),
PUT /api/stories/{id}/status (validates transition)
- internal/api/server.go: register all story routes
- go.mod/go.sum: add modernc.org/sqlite pure-Go dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
TestPool_MaxPerAgent_BlocksSecondTask
|
|
creds, auth recovery
- maxPerAgent=1: only 1 in-flight execution per agent type at a time; excess tasks are requeued after 30s
- Drain gate: after 2 consecutive failures the agent is drained and a question is set on the task; reset on first success; POST /api/pool/agents/{agent}/undrain to acknowledge
- Pre-flight credential check: verify .credentials.json and .claude.json exist in agentHome before spinning up a container
- Auth error auto-recovery: detect auth errors (Not logged in, OAuth token has expired, etc.) and retry once after running sync-credentials and re-copying fresh credentials
- Extracted runContainer() helper from ContainerRunner.Run() to support the retry flow
- Wire CredentialSyncCmd in serve.go for all three ContainerRunner instances
- Tests: TestPool_MaxPerAgent_*, TestPool_ConsecutiveFailures_*, TestPool_Undrain_*, TestContainerRunner_Missing{Credentials,Settings}_FailsFast, TestIsAuthError_*, TestContainerRunner_AuthError_SyncsAndRetries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
ClaudeConfigDir moved from /root/.claude to credentials/claude/, but
container.go was still deriving .claude.json from filepath.Dir which
no longer pointed anywhere useful. Claude CLI needs .claude.json for
OAuth account info or it says "Not logged in".
Also update sync-credentials to copy /root/.claude.json into the
credentials dir so it stays fresh alongside the token.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
After a successful run with no commits pushed, detectUncommittedChanges
checks for modified tracked files and untracked source files. If any
exist the task fails with an explicit error rather than silently
succeeding while the work evaporates when the sandbox is deleted.
Scaffold files written by the harness (.claudomator-env,
.claudomator-instructions.txt, .agent-home/) are excluded from the check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- task.Project type + storage CRUD + UpsertProject + SeedProjects
- Remove AgentConfig.ProjectDir, RepositoryURL, SkipPlanning
- Remove ContainerRunner fallback git init logic
- Project API endpoints: GET/POST /api/projects, GET/PUT /api/projects/{id}
- processResult no longer extracts changestats (pool-side only)
- claude_config_dir config field; default to credentials/claude/
- New scripts: sync-credentials, fix-permissions, check-token
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
The server runs as www-data whose HOME is /var/www — deriving
credentials from $HOME/.claude always produced an empty path.
Now reads from ClaudeConfigDir (default: /workspace/claudomator/credentials/claude),
which sync-credentials keeps populated with fresh OAuth tokens.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- GET /api/stats?window=7d: pre-aggregated SQL queries for errors, throughput, billing
- Errors section: category summary (quota/rate_limit/timeout/git/failed) + failure table
- Throughput section: stacked hourly bar chart (completed/failed/other) over 7d
- Billing section: KPIs (7d total, avg/day, cost/run) + daily cost bar chart
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
detection
- Detect Gemini TerminalQuotaError (daily quota) as BUDGET_EXCEEDED, not generic FAILED
- Surface container stderr tail in error so quota/rate-limit classifiers can match it
- Add agent_events table to persist rate-limit start/recovery events across restarts
- Add GET /api/agents/status endpoint returning live agent state + 24h event history
- Stats dashboard: agent status cards, 24h availability timeline, per-run execution table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- ContainerRunner replaces ClaudeRunner/GeminiRunner; all agent types run
in Docker containers via claudomator-agent:latest
- Writable agentHome staging dir (/home/agent) satisfies home-dir
requirements for both claude and gemini CLIs without exposing host creds
- Copy .credentials.json and .claude.json into staging dir at run time;
GEMINI_API_KEY passed via env file
- Fix git clone: remove MkdirTemp-created dir before cloning (git rejects
pre-existing dirs even when empty)
- Replace localhost with host.docker.internal in APIURL so container can
reach host API; add --add-host=host.docker.internal:host-gateway
- Run container as --user=$(uid):$(gid) so host-owned workspace files are
readable; chmod workspace 0755 and instructions file 0644 after clone
- Pre-create .gemini/ in staging dir to avoid atomic-rename ENOENT on first
gemini-cli run
- Add ct CLI tool to container image: pre-built Bash wrapper for
Claudomator API (ct task submit/create/run/wait/status/list)
- Document ct tool in CLAUDE.md agent instructions section
- Add drain-failed-tasks script: retries failed tasks on a 5-minute interval
- Update Dockerfile: Node 22 via NodeSource, Go 1.24, gemini-cli,
git safe.directory=*, default ~/.claude.json
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|