| Age | Commit message (Collapse) | Author |
|
Addresses the cleanup queue captured in docs/plans/local-oss-runner.md
after the local-OSS-models epic landed. After this commit
`go test -race ./...` is green across every package with zero `t.Skip`
calls and no excluded tests.
Real bugs fixed:
- claude.go setupSandbox callsites used `sandboxDir, err := ...` which
shadowed the outer variable, so BlockedError.SandboxDir was always
empty. Resume-after-block was broken for both new and stale-sandbox
paths. TestBlockedError_IncludesSandboxDir now exercises the right
invariant.
- TestPool_ActivePerAgent_DeletesZeroEntries flake under -race: the
cleanup defer in execute()/executeResume() runs AFTER
handleRunResult sends on resultCh, so consumers observing a result
could see a still-counted activePerAgent entry. Extracted
decActiveAgent(agentType, *cleaned) helper; called explicitly before
every resultCh send, defer becomes a no-op via the cleaned flag.
Verified clean over `go test -race -count=10`.
Test infrastructure made hermetic:
- gitSafe now also passes -c commit.gpgsign=false / -c tag.gpgsign=false
so sandbox tests pass on hosts whose global config requires signing.
- Bare repos in tests initialized with `-b main` (HEAD symbolic ref
matched to the branch we push) so `git log` after push works.
- TestSandboxCloneSource_FallsBackToOrigin uses a local-FS origin URL,
matching sandboxCloneSource's intentional filter against network URLs.
- TestGeminiLogs_ParsedCorrectly URL fixed to the actual log route
(/api/executions/{id}/log).
GeminiRunner gap closed (partial):
- parseGeminiStream now walks lines for `result` events, surfacing
is_error as an error and total_cost_usd as the float return value.
- GeminiRunner.Run propagates parsed cost to Execution.CostUSD.
- TestParseGeminiStream_ParsesStructuredOutput unskipped.
Notes:
- GeminiRunner is still simulated end-to-end (Run writes hardcoded
stream data instead of execing the binary). The result/cost parser
now exists; finishing the runner is a smaller, contained follow-up.
Kept on the deferred queue.
- Frontend "Local" agent option and a minor storage.db.go logger TODO
remain on the deferred queue, both intentionally — neither blocks
anything in flight.
https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
|
|
Phase 3 of "local OSS models as agents" plan. When the webhook handler
creates a task for a failed CI run AND a local LLM is configured on
the server, the hardcoded 4-step investigation template is replaced
with a project-aware investigation plan generated by the LLM.
Scope adjustment from the original sketch: the original plan said
"summarize fetched workflow logs", but fetching logs requires GitHub
API auth that isn't wired. Narrowed to project-context triage —
recent git log + CLAUDE.md content + webhook metadata, fed to the
LLM with a system prompt asking for 6-12 lines of concrete next
steps. Deferred GitHub log fetching to post-epic cleanup.
Implementation:
- New internal/api/webhook_llm.go holds enrichCIInstructions and its
helpers (readRecentCommits via `git log`, readProjectDoc).
- enrichCIInstructions is truly additive: any failure mode (no client,
HTTP error, empty body, 10s timeout) returns the original fallback
template unchanged. Existing webhook tests pass byte-for-byte.
- Always preserves a metadata header (repo/branch/SHA/check/URL)
ahead of the LLM body so investigators don't lose context if the
LLM is terse.
- Reuses s.llm (set via Server.SetLLM in Phase 2) — no new config
knob, no per-feature gating. Asymmetric opt-out (yes-elaborate,
no-CI-triage) deferred until there's actual demand.
Tests:
- enrichCIInstructions: nil client, LLM 500, empty body all return
fallback unchanged.
- enrichCIInstructions: success path produces enriched body with
metadata header preserved; user prompt contains repo/branch/SHA.
- enrichCIInstructions: real git repo (init + 2 commits) → recent
commits appear in user prompt.
- Webhook handler regression guard: no-LLM path produces the exact
legacy template substrings.
- Webhook handler with LLM stubbed: task instructions contain LLM
body + metadata header.
Plan: docs/plans/local-oss-runner.md.
https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
|
|
Phase 2 of "local OSS models as agents" plan. Adds a third elaboration
path that calls the local OpenAI-compatible LLM via the internal/llm
client, and reorders dispatch so the cheap path is tried first:
local → claude → gemini, with each next attempt only on hard failure
of the prior.
Wiring is opt-out, not opt-in: when [local_model].endpoint is set,
elaboration prefers local by default. Users with a slow or low-quality
local model can disable just elaboration via:
[local_model]
endpoint = "..."
prefer_for_elaborate = false
without giving up the runner or the classifier path.
Implementation:
- Server gains an optional *llm.Client field via SetLLM (matches the
existing SetNotifier/SetWorkspaceRoot setter pattern, no NewServer
signature break).
- elaborateWithLocal() reuses buildElaboratePrompt verbatim and asks
for response_format=json_object so we skip markdown-fence cleanup.
- handleElaborateTask reorders try chain; existing Claude-first
behavior is preserved exactly when SetLLM is not called.
- LocalModel.UseForElaborate() encapsulates the default-true gating
with a *bool so explicit-false survives TOML parse.
Tests:
- elaborateWithLocal: parses valid response, errors on nil client,
errors on bad JSON.
- handler: local preferred when wired; falls back to claude when
local fails; unchanged behavior when no LLM is configured.
- config: UseForElaborate gating across empty/default/explicit-true/
explicit-false cases.
Pre-existing test failures noted in docs/plans/local-oss-runner.md
(post-epic cleanup): TestGeminiLogs_ParsedCorrectly returns 404 for
gemini execution log fetch — predates this change.
Plan: docs/plans/local-oss-runner.md.
https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
|
|
Adds POST /api/webhooks/github that receives check_run and workflow_run
events and creates a Claudomator task to investigate and fix the failure.
- Config: new webhook_secret and [[projects]] fields in config.toml
- HMAC-SHA256 validation when webhook_secret is configured
- Ignores non-failure events (success, skipped, etc.) with 204
- Matches repo name to configured project dirs (case-insensitive)
- Falls back to single project when no name match found
- 11 new tests covering all acceptance criteria
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Adds GET /api/tasks/{id}/deployment-status which checks whether the
currently-deployed server binary includes the fix commits from the
task's latest execution. Uses git merge-base --is-ancestor to compare
commit hashes against the running version.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
elaboration fallback
|
|
- Add ElaborationInput field to Task struct (task.go)
- Add DB migration and update CREATE/SELECT/scan in storage/db.go
- Update handleCreateTask to accept elaboration_input from API
- Update renderSubtaskRollup in app.js to prefer elaboration_input over description
- Capture elaborate prompt in createTask() form submission
- Update subtask-placeholder tests to cover elaboration_input priority
- Fix missing io import in gemini.go
When a task card is waiting for subtasks, it now shows:
1. The raw user prompt from elaboration (if stored)
2. The task description truncated at word boundary (~120 chars)
3. The task name as fallback
4. 'Waiting for subtasks…' only when all fields are empty
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
updates
|
|
Files changed: CLAUDE.md, internal/api/changestats.go,
internal/executor/executor.go, internal/executor/executor_test.go,
internal/task/changestats.go (new)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Add parseChangestatFromOutput/File helpers in internal/api/changestats.go
to parse git diff --stat summary lines from execution stdout logs
- Wire parser in processResult: after each execution completes, scan the
stdout log for git diff stats and persist via UpdateExecutionChangestats
- Tests: TestGetTask_IncludesChangestats (verifies processResult wiring),
TestListExecutions_IncludesChangestats (verifies storage round-trip)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
- Added an agent selector (Auto, Claude, Gemini) to the Start Next Task button.
- Updated the backend to pass query parameters as environment variables to scripts.
- Modified the executor pool to skip classification when a specific agent is requested.
- Added --agent flag to claudomator start command.
- Updated tests to cover the new functionality.
|
|
Previously appendRawNarrative was called with the server's default workDir
(os.Getwd()) when no project_dir was in the request, causing test runs and
any elaboration without a project to pollute the repo's own RAW_NARRATIVE.md.
The narrative is per-project human input — only write it when the caller
explicitly specifies which project they're working in.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Keep file-based summary approach (CLAUDOMATOR_SUMMARY_FILE) from HEAD.
Combine Q&A History and Stats tab CSS from both branches.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
practices
Add sanitizeElaboratedTask() called after every elaboration response:
- Infers missing allowed_tools from instruction keywords (Write/Edit/Read/Bash/Grep/Glob)
- Auto-adds Read when Edit is present
- Appends Acceptance Criteria section if none present
- Appends TDD reminder for coding tasks without test mention
Also tighten buildElaboratePrompt to require acceptance criteria and
list concrete tool examples, reducing how often the model omits tools.
Fixes class of failures where agents couldn't create files because
the elaborator omitted Write from allowed_tools.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Extend Resume to CANCELLED, FAILED, and BUDGET_EXCEEDED tasks
- Add summary extraction from agent stdout stream-json output
- Fix storage: persist stdout/stderr/artifact_dir paths in UpdateExecution
- Clear question_json on ResetTaskForRetry
- Resume BLOCKED tasks in preserved sandbox so Claude finds its session
- Add planning preamble: CLAUDOMATOR_SUMMARY_FILE env var + summary step
- Update ADR-002 with new state transitions
- UI style improvements
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Interrupted tasks (CANCELLED, FAILED, BUDGET_EXCEEDED) now support session
resume in addition to restart. Both buttons are shown on the task card.
- executor: extend resumablePoolStates to include CANCELLED, FAILED, BUDGET_EXCEEDED
- api: extend handleResumeTimedOutTask to accept all resumable states with
state-specific resume messages; replace hard-coded TIMED_OUT check with a
resumableStates map
- web: add RESUME_STATES set; render Resume + Restart buttons for interrupted
states; TIMED_OUT keeps Resume only
- tests: 5 new Go tests (TestResumeInterrupted_*); updated task-actions.test.mjs
with 17 tests covering dual-button behaviour
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Export computeTaskStats and computeExecutionStats from app.js
- Add renderStatsPanel with state count grid, KPI row (total/success-rate/cost/avg-duration), and outcome bar chart
- Wire stats tab into switchTab and poll for live refresh
- Add Stats tab button and panel to index.html
- Add CSS for .stats-counts, .stats-kpis, .stats-bar-chart using existing state color variables
- Add docs/stats-tab-plan.md with component structure and data flow
- 14 new unit tests in web/test/stats.test.mjs (140 total, all passing)
No backend changes — derives all metrics from existing /api/tasks and /api/executions endpoints.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
When a task ran in a sandbox (/tmp/claudomator-sandbox-*) and went BLOCKED,
Claude stored its session under the sandbox path as the project slug. The
resume execution was running in project_dir, causing Claude to look for the
session in the wrong project directory and fail with "No conversation found".
Fix: carry SandboxDir through BlockedError → Execution → resume execution,
and run the resume in that directory so the session lookup succeeds.
- BlockedError gains SandboxDir field; claude.go sets it on BLOCKED exit
- storage.Execution gains SandboxDir (persisted via new sandbox_dir column)
- executor.go stores blockedErr.SandboxDir in the execution record
- server.go copies SandboxDir from latest execution to the resume execution
- claude.go uses e.SandboxDir as working dir for resume when set
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
executor: add 7 tests for sandboxCloneSource, setupSandbox, and
teardownSandbox (uncommitted-changes error, clean-no-commits removal).
api: fix two data races in WebSocket tests — wsPingInterval/Deadline
are now captured as locals before goroutine start; maxWsClients is
moved from a package-level var into Hub.maxClients (with SetMaxClients
method) so concurrent tests don't stomp each other.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
The elaborator now logs every user prompt to docs/RAW_NARRATIVE.md within the project directory. This is done in a background goroutine to ensure it doesn't delay the response.
|
|
The elaborator now reads CLAUDE.md and SESSION_STATE.md from the project directory (if they exist) and prepends their content to the user prompt. This allows the AI to generate tasks that are more context-aware.
|
|
Updated handleRunTask to use ResetTaskForRetry, which clears the agent type and model. This ensures that manually restarted tasks are always re-classified, allowing the system to switch to a different agent if the previous one is rate-limited. Also improved Claude quota-exhaustion detection.
|
|
response shapes
- handleListTasks: validate ?state= against known states, return 400 with clear
error for unrecognized values (e.g. ?state=BOGUS)
- handleCancelTask: replace {"status":"cancelling"|"cancelled"} with
{"message":"...","task_id":"..."} to match run/resume shape
- handleAnswerQuestion: replace {"status":"queued"} with
{"message":"task queued for resume","task_id":"..."}
- Tests: add TestListTasks_InvalidState_Returns400, TestListTasks_ValidState_Returns200,
TestCancelTask_ResponseShape, TestAnswerQuestion_ResponseShape,
TestRunTask_ResponseShape, TestResumeTimedOut_ResponseShape
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Replace the no-op mockRunner in server_test.go with a configurable
version that supports err and sleep fields. Add testServerWithRunner
helper and a pollState utility for async assertions.
Add three new tests that exercise the pool's error paths end-to-end:
- TestRunTask_AgentFails_TaskSetToFailed
- TestRunTask_AgentTimesOut_TaskSetToTimedOut
- TestRunTask_AgentCancelled_TaskSetToCancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Add workspaceRoot field (default "/workspace") to Server struct
- Add SetWorkspaceRoot method on Server
- Update handleListWorkspaces to use s.workspaceRoot
- Add WorkspaceRoot field to Config with default "/workspace"
- Wire cfg.WorkspaceRoot into server in serve.go
- Expose --workspace-root flag on the serve command
- Add TestListWorkspaces_UsesConfiguredRoot integration test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
Removed all template-related code from frontend (tabs, modals, logic) and backend (routes, files, DB table). Updated BUDGET_EXCEEDED tasks to be requeueable with a Restart button. Fixed ReferenceError in isUserEditing for Node.js tests.
|
|
Update the default Gemini model and classification prompt to use gemini-2.5-flash-lite, which is the current available model. Improved the classifier's parsing logic to correctly handle the JSON envelope returned by the gemini CLI (stripping 'response' wrapper and 'Loaded cached credentials' noise).
|
|
Remove the MaxAttempts check from POST /api/tasks/{id}/run. A user
explicitly triggering a run is a manual action and should not be gated
by the retry limit. Retry limits will be enforced in the (future)
automatic retry path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- handleCreateTask: add legacy "claude" key fallback in input struct so
old clients and YAML files sending claude:{...} still work
- cli/create: send "agent" key instead of "claude"; add --agent-type flag
- storage/db_test: fix ClaudeConfig → AgentConfig after rename
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Resolve conflicts in API server, CLI, and executor.
- Maintain Gemini classification and assignment logic.
- Update UI to use generic agent config and project_dir.
- Fix ProjectDir/WorkingDir inconsistencies in Gemini runner.
- All tests passing after merge.
|
|
- handleListRecentExecutions: add since/limit/task_id query params
- handleStreamLogs: tighten SSE framing and cleanup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Extract questionStore interface for testability of handleAnswerQuestion
- Add SetAPIToken/SetNotifier methods for post-construction wiring
- Extract processResult() from forwardResults() for direct testability
- Add ipRateLimiter with token-bucket per IP; applied to /elaborate and /validate
- Fix tests for running-task deletion and retry-limit that relied on
invalid state transitions in setup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Replace hardcoded handleStartNextTask/handleDeploy with a single
handleScript handler keyed by name from a ScriptRegistry map.
Scripts are now configured via Server.SetScripts() rather than
individual setter fields.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- Require bearer token on WebSocket connections when apiToken is set
- Cap concurrent WebSocket clients at maxWsClients (1000, overridable)
- Send periodic pings every 30s; close dead connections after 10s write deadline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- ClaudeConfig.WorkingDir → ProjectDir (json: project_dir)
- UnmarshalJSON fallback reads legacy working_dir from DB records
- New executions with project_dir clone into a temp sandbox via git clone --local
- Non-git project_dirs get git init + initial commit before clone
- After success: verify clean working tree, merge --ff-only back to project_dir, remove sandbox
- On failure/BLOCKED: sandbox preserved, path included in error message
- Resume executions run directly in project_dir (no re-clone)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
|
|
|
|
|
|
|
|
|
|
The elaborate call now sends working_dir from the Project dropdown.
The backend uses it (falling back to server workDir) when building
the system prompt, so AI-drafted tasks are contextualised to the
selected project.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
/workspace/claudomator
- Moved working directory to first field, renamed to "Project"
- Replaced text input with a select populated from GET /api/workspaces
(lists subdirs of /workspace dynamically)
- "Create new project…" option reveals a custom path input
- elaborate result handler sets select or falls back to new-project input
- Added GET /api/workspaces endpoint in server.go
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Recovered files with no Claude→Agent contamination:
- docs/adr/002-task-state-machine.md
- internal/api/logs.go/logs_test.go: task-level log streaming endpoint
- internal/api/validate.go/validate_test.go: POST /api/tasks/validate
- internal/api/server_test.go, storage/db_test.go: expanded test coverage
- scripts/reset-failed-tasks, reset-running-tasks
- web/app.js, index.html, style.css: frontend improvements
- web/test/: active-tasks-tab, delete-button, filter-tabs, sort-tasks tests
Manually applied from server.go diff (skipping Claude→Agent rename):
- taskLogStore field + validateCmdPath field
- DELETE /api/tasks/{id} route + handleDeleteTask
- GET /api/tasks/{id}/logs/stream route
- POST /api/tasks/{id}/resume route + handleResumeTimedOutTask
- handleCancelTask: allow cancelling PENDING/QUEUED tasks directly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
restart
Two bugs:
1. SubmitResume was called with r.Context(), which is cancelled as soon
as the HTTP handler returns, immediately cancelling the resume execution.
Switch to context.Background() so the execution runs to completion.
2. CANCELLED→QUEUED was missing from ValidTransition, so the Restart
button on cancelled tasks always returned 409. Added the transition.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
POST /api/tasks/{id}/cancel now works. Pool tracks a cancel func per
running task ID; Cancel(taskID) calls it and returns false if the task
isn't running. The execute goroutine registers/deregisters the cancel
func around the runner call.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
When an agent needs user input it writes a question to
$CLAUDOMATOR_QUESTION_FILE and exits. The runner detects the file and
returns BlockedError; the pool transitions the task to BLOCKED and
stores the question JSON on the task record.
The user answers via POST /api/tasks/{id}/answer. The server looks up
the claude session_id from the most recent execution and submits a
resume execution (claude --resume <session-id> "<answer>"), freeing the
executor slot entirely while waiting.
Changes:
- task: add StateBlocked, transitions RUNNING→BLOCKED, BLOCKED→QUEUED
- storage: add session_id to executions, question_json to tasks;
add GetLatestExecution and UpdateTaskQuestion methods
- executor: BlockedError type; ClaudeRunner pre-assigns --session-id,
sets CLAUDOMATOR_QUESTION_FILE env var, detects question file on exit;
buildArgs handles --resume mode; Pool.SubmitResume for resume path
- api: handleAnswerQuestion rewritten to create resume execution
- preamble: add question protocol instructions for agents
- web: BLOCKED state badge (indigo), question text + option buttons or
free-text input with Submit on the task card footer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Merges features developed in /site/doot.terst.org/claudomator-work (a
stale clone) into the canonical repo:
- executor: QuestionRegistry for human-in-the-loop answers, rate limit
detection and exponential backoff retry (ratelimit.go, question.go)
- executor/claude.go: process group isolation (SIGKILL orphans on cancel),
os.Pipe for reliable stdout drain, backoff retry on rate limits
- api/scripts.go: POST /api/scripts/start-next-task handler
- api/server.go: startNextTaskScript field, answer-question route,
BroadcastQuestion for WebSocket question events
- web: Cancel/Restart buttons, question banner UI, log viewer, validate
section, WebSocket auto-connect
All tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Top-level tasks now land in READY after successful execution instead of
going directly to COMPLETED. Subtasks (with parent_task_id) skip the gate
and remain COMPLETED. Users accept or reject via new API endpoints:
POST /api/tasks/{id}/accept → READY → COMPLETED
POST /api/tasks/{id}/reject → READY → PENDING (with rejection_comment)
- task: add StateReady, RejectionComment field, update ValidTransition
- storage: migrate rejection_comment column, add RejectTask method
- executor: route top-level vs subtask to READY vs COMPLETED
- api: /accept and /reject handlers with 409 on invalid state
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|