# Local OSS Models as a Third Runner

## Context

Today the executor only knows about subprocess CLI agents (Claude, with a stubbed Gemini). Internal LLM-shaped work — model classification, free-form prompt elaboration, webhook CI summarization, execution summary — either shells out to the `gemini` CLI (`internal/executor/classifier.go:60`) or sits in `internal/api/elaborate.go` doing the same. That's expensive in latency and dollars for what are essentially helper completions, and there's no path to keep "internal" reasoning private/local.

This change adds a local OSS model backend (any OpenAI-compatible HTTP server: Ollama, vLLM, LM Studio, llama.cpp server) as a first-class third Runner alongside Claude and Gemini. The unified harness model wins over a separate "internal LLM service" because it preserves a single `Runner` abstraction, single `executions` table, and one set of pool semantics (rate-limit handling, observability, WebSocket events) for any task whose `agent.type == "local"`.

Outcome: a `LocalRunner` for user-facing tasks, plus a lower-level `LocalLLMClient` that internal helpers call directly without paying Pool/Execution overhead. First migration target is the classifier (sub-second, high-volume, lowest blast radius). Elaboration, webhook summarization, and execution summary follow in subsequent passes using the same client.

## Architectural decision: two layers, one backend

`LocalRunner implements Runner` is the user-visible contract. But the classifier runs *inside* `Pool.execute()` (at `internal/executor/executor.go:437`), so submitting recursively to `Pool` would deadlock against `workCh`'s slot accounting and pollute the `executions` table with sub-second rows for every classification.

Resolution: introduce a `LocalLLMClient` (HTTP, no Pool, no DB) as the workhorse. `LocalRunner` is a thin adapter over it for full Pool-managed executions. Internal callers — classifier now, elaborate/webhook/summary later — call `LocalLLMClient` directly. Two code paths to local, but path lengths are wildly unequal (the runner is ~150 lines of glue) and they share one HTTP round-tripper.

Capabilities (e.g. "this runner can edit code, that one can't") are deferred. `LocalRunner` simply leaves `SandboxDir` empty; the Pool already tolerates that. Revisit only when a third non-coding runner appears.

## End state

- **`internal/llm`** (new package) — `LocalLLMClient` with `Chat` and `ChatStream` over OpenAI-compat `/chat/completions`. Handles retries via the existing backoff helper, JSON mode, SSE streaming, optional bearer token.
- **`internal/executor/local.go`** (new) — `LocalRunner` implements `Runner`. Streams response deltas into the same stream-json envelope Claude uses (`{"type":"assistant","message":{"content":[{"type":"text","text":"..."}]}}`) so existing parsers (`internal/executor/summary.go:13`, `internal/task/changestats.go`) keep working unchanged.
- **`Classifier`** (`internal/executor/classifier.go`) — now holds a `*llm.Client`. When set, classification goes through it with `response_format: json_object`; markdown-fence cleanup is skipped on this path. Gemini-CLI path stays as a fallback when `[local_model]` config is empty.
- **Storage** — `executions.tokens_in` and `tokens_out` added (additive `ALTER`, schema pattern at `internal/storage/db.go:78-89`). `cost_usd` stays 0 for local. `session_id`/`sandbox_dir` remain nullable; `LocalRunner` simply doesn't populate them.
- **`AgentConfig`** — adds `Temperature *float64` (pointer so 0 means "unset") and `MaxTokens int` at `internal/task/task.go:30`. Existing Claude-shaped fields (`PermissionMode`, `AllowedTools`, etc.) are silently ignored by `LocalRunner`.
- **Config** — new `[local_model]` TOML section in `internal/config/config.go:18`: `endpoint`, `model`, `timeout_seconds`, `default_temperature`, `api_key`. If `endpoint` is empty, the runner is not registered and the classifier falls back to Gemini-CLI.
- **Routing** — `executor.go:428`'s hardcoded `t.Agent.Type == "claude" || == "gemini"` widens to include `"local"` (or, cleaner, becomes `t.Agent.Type != ""`).
- **Wiring** — `cmd/claudomator/main.go`, `internal/cli/serve.go:60-78`, and `internal/cli/run.go:75-90` build the `*llm.Client` from config and register both `runners["local"]` and `pool.Classifier.LLM`.
- **GeminiRunner** (`internal/executor/gemini.go`) — kept and finished alongside as a separate concern. The shared backoff helper move (below) and the `LogPather` interface it already implements (`gemini.go:26`) are unaffected. Real subprocess invocation replacing the simulated stdout block at `gemini.go:107-116` is a follow-up commit, not gated by this change.

Shared utility move: `runWithBackoff` currently lives at `internal/executor/ratelimit.go:60`. Move it to a new tiny `internal/retry` package so both `internal/executor` and `internal/llm` use it. One-line change at the existing call site in `claude.go`.

## Migration phases

**Phase 1 — this pass. Classifier swap.** All the `internal/llm` + `internal/executor/local.go` + `Classifier` work above. Gated by config: if `[local_model].endpoint` is unset, behavior is unchanged. Net new files; no breaking changes to existing runners.

**Phase 2 — task elaboration.** `internal/api/elaborate.go:208-275` currently has Claude and Gemini paths. Add `elaborateWithLocal`; new try-order is local → claude → gemini, controlled by a `prefer_local_for_elaborate` config flag. `Server` (`internal/api/server.go:76`) gains an `llm *llm.Client` field passed via `NewServer`.

**Phase 3 — webhook CI summarization.** `createCIFailureTask` at `internal/api/webhook.go:154` builds task instructions from a hardcoded template. Add an optional summarization step calling `s.llm.Chat` over the fetched workflow logs to produce a tighter `instructions` body. Pure additive.

**Phase 4 — execution summary.** `extractSummary` (`internal/executor/summary.go:13`) is text-pattern based. Add `summarizeExecution(ctx, *llm.Client, stdoutPath) string` that synthesizes a summary when no `## Summary` section exists. Hook lives in `Pool.handleRunResult` at `executor.go:347-355`; pass `*llm.Client` through `Pool` construction.

## Critical files

**New:**
- `internal/llm/client.go` — `Client`, `Chat`, `ChatStream`, request/response types
- `internal/llm/client_test.go` — `httptest`-driven coverage
- `internal/executor/local.go` — `LocalRunner`
- `internal/executor/local_test.go` — runner tests with stub `*llm.Client`
- `internal/retry/backoff.go` — relocated `runWithBackoff`

**Modified:**
- `internal/executor/classifier.go` — add `LLM *llm.Client` field, route through it when set, keep Gemini fallback path
- `internal/executor/classifier_test.go` — add httptest-backed test
- `internal/executor/executor.go:428` — broaden `skipClassification` predicate
- `internal/executor/ratelimit.go` — remove `runWithBackoff` (moved); update import in `claude.go`
- `internal/task/task.go:30-43` — add `Temperature`, `MaxTokens` to `AgentConfig`
- `internal/config/config.go:18-52` — add `LocalModel` struct + field to `Config`
- `internal/storage/db.go:78-89` — two additive `ALTER` migrations; add `TokensIn`/`TokensOut` to `Execution` struct; update SELECT/INSERT/UPDATE SQL in same file
- `internal/cli/serve.go:60-78`, `internal/cli/run.go:75-90`, `cmd/claudomator/main.go` — build client, register runner, wire classifier

## Reuse, not reinvention

- `runWithBackoff` (`internal/executor/ratelimit.go:60`) → relocated and reused by `LocalLLMClient`
- `isRateLimitError`/`isQuotaExhausted` (`executor.go:271-283`) → emit compatible error strings from `LocalLLMClient` so Pool's existing rate-limit handling treats local 429/503 identically
- Stream-json envelope from `claude.go:600` parsing → `LocalRunner` writes the same envelope so `extractSummary` and `ParseChangestatFromFile` work unchanged
- Existing nullable `session_id`/`sandbox_dir` columns → no schema rework needed for non-coding runners
- `LogPather` interface (`executor.go:38`) → `LocalRunner` implements it for log path pre-population, just like `GeminiRunner` already does

## Verification

**Unit tests:**
- `internal/llm/client_test.go`: httptest server returns canned chat-completion JSON; assert `Chat` returns parsed `Content`, prompt/output tokens, model. Second test: SSE stream (data: lines, terminating `data: [DONE]`); assert `onDelta` called per chunk and final `ChatResponse` aggregated. Third: HTTP 429 with `Retry-After: 1` → assert one retry then success.
- `internal/executor/classifier_test.go`: httptest backend returning JSON-mode response → assert `Classification` parsed correctly. Existing mock-binary test stays for the Gemini fallback path.
- `internal/executor/local_test.go`: stub `*llm.Client` returning fixed text → `Run` writes correct stream-json envelope to `stdout.log`; verify `extractSummary` finds `## Summary` from that envelope.
- `go test -race ./...` passes (Pool reentrancy is the risk this design avoids; race detector would catch slips).

**Manual end-to-end against Ollama:**
1. `ollama pull llama3.1:8b && ollama serve`
2. Add to `~/.claudomator/config.toml`:
   ```toml
   [local_model]
   endpoint = "http://localhost:11434/v1"
   model = "llama3.1:8b"
   ```
3. `./claudomator serve` → submit a normal task → observe a single classification request hit Ollama (no `gemini` subprocess spawned) and a model selection logged at `executor.go:440`.
4. Submit a task with `agent.type = "local"`, `instructions = "Summarize: 2+2"`. Expect `READY`/`COMPLETED` execution, populated `stdout.log` with stream-json text deltas, `cost_usd = 0`, non-zero `tokens_out` in the `executions` row.
5. Stop Ollama → submit another task → classifier should fall back to `gemini` invocation (or fail with a rate-limit-style error if no Gemini binary present). Confirms the `endpoint == ""` and runtime-failure fallback paths both work.

**Build sanity:** `go build ./...` and `go test -race ./...` (CGo / `gcc` required per CLAUDE.md).

---

# Phase 1 — Focused Plan

This is the only phase we execute in this pass. Phases 2–4 will get their own focused plans when we start them; the sketches above are forward intent, not commitments.

## Phase 1 scope (what ships)

- New `internal/llm` package with `Client.Chat` and `Client.ChatStream`
- New `internal/executor/local.go` with `LocalRunner` implementing `Runner`
- New `internal/retry` package holding the relocated `runWithBackoff`
- Classifier (`internal/executor/classifier.go`) routes through `*llm.Client` when configured; Gemini-CLI fallback preserved
- Two additive `executions` migrations: `tokens_in`, `tokens_out`
- `AgentConfig` gains `Temperature *float64`, `MaxTokens int`
- `Config` gains `[local_model]` section (`endpoint`, `model`, `timeout_seconds`, `default_temperature`, `api_key`)
- `executor.go:428` `skipClassification` predicate broadens to all non-empty agent types
- Wiring in `cmd/claudomator/main.go`, `internal/cli/serve.go`, `internal/cli/run.go`

## Phase 1 explicit non-goals

- No changes to `internal/api/elaborate.go` (Phase 2)
- No changes to `internal/api/webhook.go` (Phase 3)
- No changes to `internal/executor/summary.go` summary-generation logic (Phase 4)
- No GeminiRunner completion work (cost parsing, sandbox, real subprocess) — separate parallel commit
- No frontend changes — UI still says "Auto / Claude / Gemini"; "Local" dropdown option deferred until token telemetry surfaces
- No capabilities interface on `Runner`
- No new `executions` columns beyond the two token counters

## Phase 1 task list (in execution order)

1. **Persist this plan to the workspace.** Copy `/root/.claude/plans/major-revision-we-re-going-quizzical-newell.md` to `docs/plans/local-oss-runner.md`. This is the durable record that lives with the codebase. Phase 2/3/4 focused plans will be appended to the same file when started.

2. **Create branch.** `git checkout -b claude/local-oss-model-agents-MEBqj` (already designated; create if it doesn't exist).

3. **`internal/retry/backoff.go`** — relocate `runWithBackoff` from `internal/executor/ratelimit.go:60`. Update the existing call site in `internal/executor/claude.go` to import from the new path. Keep all signature and behavior unchanged. Run `go build ./...` and `go test ./internal/executor/...` to confirm zero behavioral change.

4. **`internal/llm/client.go`** — implement the package. Types from the design:
   - `Client{Endpoint, Model, HTTPClient, APIKey, Logger}`
   - `ChatRequest{Model, Messages, Temperature, MaxTokens, ResponseJSON, Stream}`
   - `Message{Role, Content}`
   - `ChatResponse{Content, PromptTokens, OutputTokens, Model, FinishReason}`
   - `Chat(ctx, req)` — POSTs `/chat/completions`, wraps in `retry.RunWithBackoff`, maps 429/503 to `isRateLimitError`-compatible error strings
   - `ChatStream(ctx, req, onDelta)` — same endpoint with `stream: true`, parses SSE `data:` lines, calls `onDelta(text)` per chunk, terminates on `data: [DONE]`, aggregates final response

5. **`internal/llm/client_test.go`** — three tests:
   - Canned chat-completion JSON → assert `Chat` returns parsed `Content`, prompt/output tokens, model
   - SSE stream of `data:` lines terminated by `data: [DONE]` → assert `onDelta` called per chunk, final `ChatResponse` aggregated
   - HTTP 429 with `Retry-After: 1` → assert one retry then success

6. **`internal/storage/db.go:78-89`** — append two `ALTER TABLE executions ADD COLUMN` migrations for `tokens_in INTEGER` and `tokens_out INTEGER`. Add `TokensIn`, `TokensOut int64` to `Execution` struct. Update SELECT, INSERT, UPDATE SQL in the same file. Existing `isColumnExistsError` swallows duplicate-column errors so re-running is safe.

7. **`internal/task/task.go:30-43`** — add `Temperature *float64` and `MaxTokens int` to `AgentConfig` with appropriate yaml/json tags. Pointer for Temperature so 0 means "unset, use server default."

8. **`internal/config/config.go:18-52`** — add `LocalModel` struct (`Endpoint`, `Model`, `TimeoutSeconds`, `DefaultTemperature`, `APIKey`) and `LocalModel LocalModel` field on `Config`. `Default()` leaves `Endpoint` empty.

9. **`internal/executor/local.go`** — `LocalRunner` struct with `Client *llm.Client`, `Logger`, `LogDir`. Implement `Run(ctx, *task.Task, *storage.Execution) error`:
   - Build messages from `t.Agent.SystemPromptAppend` + `Instructions`
   - Call `Client.ChatStream` with `onDelta` writing `{"type":"assistant","message":{"content":[{"type":"text","text":"<delta>"}]}}` lines to `e.StdoutPath`
   - On completion, write a final `{"type":"result", ...}` line so existing parsers see a recognizable terminator
   - Set `e.TokensIn`, `e.TokensOut`, `e.CostUSD = 0`, `e.Status = "READY"`
   - Implement `LogPather` so log paths pre-populate consistently with other runners

10. **`internal/executor/local_test.go`** — runner tests with a stub `*llm.Client` (use a small interface or test-injected `HTTPClient`):
    - Stub returns fixed text containing a `## Summary` section
    - Assert `Run` writes correct stream-json envelope to `stdout.log`
    - Assert `extractSummary(stdoutPath)` (from `internal/executor/summary.go`) finds the summary
    - Assert `e.TokensOut > 0` and `e.CostUSD == 0`

11. **`internal/executor/classifier.go`** — add `LLM *llm.Client` field on `Classifier`. In `Classify`, when `c.LLM != nil`, use `LLM.Chat` with `ResponseJSON: true`, skip the markdown-fence cleanup. When nil, fall through to the existing Gemini-CLI subprocess path. Existing prompt template stays (already lists Claude+Gemini models, which is what the classifier still picks among).

12. **`internal/executor/classifier_test.go`** — add httptest-backed test for the LLM path. Existing mock-binary test (if present) stays for the Gemini fallback path.

13. **`internal/executor/executor.go:428`** — change `skipClassification := t.Agent.Type == "claude" || t.Agent.Type == "gemini"` to `skipClassification := t.Agent.Type != ""`. This generalizes correctly: any explicitly-set agent type skips selection; unset still goes through `pickAgent` + `Classifier`.

14. **Wire registration** in three files:
    - `cmd/claudomator/main.go` — build `*llm.Client` from `cfg.LocalModel` if `Endpoint != ""`, pass to pool construction
    - `internal/cli/serve.go:60-78` — register `runners["local"] = &executor.LocalRunner{...}`, set `pool.Classifier = &executor.Classifier{LLM: localClient, GeminiBinaryPath: cfg.GeminiBinaryPath}`
    - `internal/cli/run.go:75-90` — same pattern

15. **`go test -race ./...`** — full suite passes. The race detector is the safety net for the reentrancy-avoidance design.

16. **Manual smoke test against Ollama** — five steps documented in the Verification section above. Confirm the fallback path by stopping Ollama mid-session and watching classification fall back to Gemini.

17. **Commit and push** to `claude/local-oss-model-agents-MEBqj`. Single commit covering Phase 1, message in the form: `feat(executor): add LocalRunner and OpenAI-compat LLM client`. Body describes the two-layer split (Client + Runner), the classifier swap, and the config gating.

## Stop conditions for Phase 1

- All unit tests pass under `-race`
- `go build ./...` clean
- Smoke test against a running Ollama instance produces a `READY` execution with non-zero `tokens_out` and `cost_usd = 0`
- Smoke test with `[local_model]` empty produces unchanged behavior (Gemini classifier path, no LocalRunner registered)
- Branch pushed to remote

After Phase 1 lands, we stop and decide whether to begin Phase 2 (elaboration). At that point we'll write a Phase 2 focused plan in `docs/plans/local-oss-runner.md`.

---

# Post-epic follow-up: deep cleanup

After all four phases land, plan and execute a deep cleanup pass. Things noticed in flight that we deliberately did not chase mid-epic:

- **Sandbox/git tests fail in this environment** because `git commit` invokes a signing server that returns 400 ("missing source"). Affected: `TestSandboxCloneSource_*`, `TestSetupSandbox_*`, `TestTeardownSandbox_*`, `TestBlockedError_IncludesSandboxDir`, `TestClaudeRunner_Run_StaleSandboxDir_ClonesAfresh`. Fix: set `commit.gpgsign=false` in test setup so sandbox tests run hermetically.
- **`TestParseGeminiStream_ParsesStructuredOutput` is currently `t.Skip`** as a pre-existing gemini-stub gap. Either implement result-error/cost parsing in `parseGeminiStream` or delete the test until the stub is finished.
- **`TestPool_ActivePerAgent_DeletesZeroEntries` flakes** under `-race` when run with the full suite (passes in isolation and on `-count=3`). Likely goroutine-ordering in the `activePerAgent` map cleanup path. Audit dispatch/finish ordering.
- **`setupSandbox` test signature drift** was just fixed; audit other tests for similar staleness from prior refactors.
- **Pre-existing `executor` tests didn't compile on trunk** until the setupSandbox fix landed. Verify CI reality — is it green via something we're missing, or quietly broken?
- **GeminiRunner is still simulated** (`gemini.go:107-116`). Decide: finish it (real subprocess + cost parsing + sandbox) or delete it and leave only Claude + Local.
- **Frontend "Local" agent option** — UI dropdown still says "Auto / Claude / Gemini". Add Local once token telemetry has a place to render.
- **Audit `*_test.go` for `t.Skip` and other dormant breakage** before shipping more code on top.
- **`TestGeminiLogs_ParsedCorrectly`** in `internal/api` returns 404 from `GET /log` for a gemini execution — pre-existing on Phase 1 baseline. Some routing or log-path resolution mismatch specific to gemini executions. Likely related to the GeminiRunner stub status above.

Goal: clean `go test -race ./...` with zero skips and zero environmental failures on whatever platform CI runs on.

## Cleanup pass — DONE

All eight items in the cleanup queue above have been addressed in the post-epic cleanup commit. Summary of fixes:

- `gitSafe` now disables `commit.gpgsign` and `tag.gpgsign` so sandbox tests pass on hosts with surprise signing config; matching `safe.directory=*` literals in test helpers updated for parity.
- Real bug found and fixed: `setupSandbox(...)` callsites in `claude.go` used `sandboxDir, err := ...` which shadowed the outer variable. `BlockedError.SandboxDir` was always empty as a result; `TestBlockedError_IncludesSandboxDir` now passes for the right reason.
- `parseGeminiStream` now parses `result` events for `is_error`/`total_cost_usd` and returns errors/cost accordingly; `TestParseGeminiStream_ParsesStructuredOutput` is unskipped.
- `GeminiRunner.Run` propagates parsed cost to `Execution.CostUSD`.
- `TestGeminiLogs_ParsedCorrectly` test URL fixed (`/api/tasks/{id}/executions/{exec-id}/log` → `/api/executions/{id}/log`, matching the actual route).
- `TestPool_ActivePerAgent_DeletesZeroEntries` flake root-caused: `handleRunResult` was sending on `resultCh` before `execute()`'s deferred cleanup ran, so consumers could observe a zero-count map entry. Extracted `decActiveAgent(agentType, *cleaned)` helper, called explicitly before each `resultCh` send, defer becomes no-op via the cleaned flag. Verified clean over `-count=10` under `-race`.
- `TestSandboxCloneSource_FallsBackToOrigin` updated to use a local-FS origin URL, matching `sandboxCloneSource`'s actual semantics (it filters non-local URLs to avoid network clones).
- All bare repos in tests created with `git init --bare -b main` so `HEAD` symbolically points at `main` (not the default `master`), unblocking the `git log` queries the tests perform after pushing.

Test-suite state after cleanup: `go test -race ./...` is green across all packages with zero `t.Skip` calls and zero excluded tests.

Items not chased (deferred deliberately):
- **GeminiRunner is still simulated** (`gemini.go` `Run` writes hardcoded stream data instead of executing the binary). The result/cost parsing now exists, so finishing the runner is a smaller, contained change. Kept on the queue but doesn't block anything else.
- **Frontend "Local" agent option** — UI dropdown still says "Auto / Claude / Gemini". Pending token telemetry surface.
- **`storage.db.go:706` TODO comment** — minor logger plumbing nit. Skipping unless it blocks something.

## Deferred work — DONE

Follow-up commit closed the three deferred items above:

- `GeminiRunner.execOnce` now invokes the actual `gemini` binary via `exec.CommandContext`, mirroring the `ClaudeRunner` pattern: pipe stdout to `parseGeminiStream`, kill the process group on context cancel, capture stderr to file, surface exit codes. Hardcoded simulation removed.
- Test infrastructure bug uncovered and fixed in passing: the mock gemini script in `testServerWithGeminiMockRunner` was using `"\``json\`"` which bash interpreted as command substitution, so the script always produced empty output. Switched to a single-quoted heredoc. The bug was masked previously because the runner ignored the script entirely.
- Frontend `index.html` dropdown gains a `Local` option. No JS branching changes needed — the value flows through to `agent.type` verbatim and downstream display reads the type string as-is.
- Stale debug-comment scaffolding around `storage.db.go:706` deleted.

---

# Phase 2 — Focused Plan (Elaboration)

## Phase 2 scope

`internal/api/elaborate.go` currently has two paths: Claude and Gemini. Add a third (local) and make it the preferred path when local model is configured. Try-order: local → claude → gemini, with each next attempt only on hard failure of the prior.

Second-cheapest, second-highest-volume LLM call after classification (one per task creation, sub-second target). Routing through local removes another cost line and lets elaboration work offline.

## What ships

- `Server` (`internal/api/server.go`) gains `llm *llm.Client` threaded through `NewServer`
- `internal/api/elaborate.go` gains `elaborateWithLocal(ctx, *llm.Client, input string) (string, error)`
- Dispatch in `Server.elaborate` reorders to: local → claude → gemini, gated by `PreferLocalForElaborate`
- `Config` gains `PreferLocalForElaborate bool`, defaulted true when `LocalModel.Endpoint != ""`
- Wiring in `internal/cli/serve.go` passes the LLM client into `NewServer`

## Explicit non-goals

- No prompt rework — reuse existing elaboration prompt template verbatim
- No streaming the response into SSE/WebSocket (one-shot RPC)
- No changes to webhook (Phase 3) or summary (Phase 4)
- No UI changes — `/elaborate` endpoint signature stays the same

## Task list

1. Read `internal/api/elaborate.go` end-to-end: dispatch site, Claude path, Gemini path, prompt template
2. Read `internal/api/server.go` `NewServer` signature and `Server` fields
3. Thread `llm *llm.Client` through `NewServer` and update callers (`internal/cli/serve.go`)
4. Implement `elaborateWithLocal` using the same prompt template as Claude/Gemini, returning `(string, error)`
5. Add `PreferLocalForElaborate bool` to `config.Config`, default true when local endpoint configured
6. Reorder dispatch: `if s.llm != nil && cfg.PreferLocalForElaborate { try local; else fall through }` then existing claude → gemini chain
7. httptest-based unit test for `elaborateWithLocal`
8. Dispatch fallback test: local fails → claude attempted
9. `go build ./... && go test -race ./...`
10. Commit Phase 2 on the same branch
11. Push

## Stop conditions

- Tests green under `-race`
- `prefer_local_for_elaborate=false` short-circuits to Claude path (preserves current behavior when user opts out)
- Local-failure fallback to Claude verified by test
- Branch pushed

---

# Phase 3 — Focused Plan (CI Failure Triage)

## Scope adjustment from the original sketch

The original Phase 3 sketch was "summarize fetched workflow logs". Fetching GitHub workflow logs requires authenticated GitHub API access (PAT or app token), which is out of scope and would balloon this phase into a GitHub-integration epic. Narrow Phase 3 to **project-context-based triage** — use signals we already have without new dependencies.

What we have at webhook time: `repository.full_name`, `branch`, `SHA`, `check_name`, `html_url`, plus (when matched) a project directory we can read locally.

What the LLM can do with that: produce a tighter, project-aware investigation prompt that names the recent commits, points at suspect files, and gives the agent better starting hypotheses than the current generic 4-step template.

## What ships

- New helper `enrichCIInstructions(ctx, *llm.Client, ciContext, projectDir, fallback string) string`
- `createCIFailureTask` calls it when `s.llm != nil`; on any error, returns the existing hardcoded template (truly additive — webhook tests for the no-LLM path stay passing unchanged)
- Helper uses: recent git log (last 5 commits from project_dir if it's a git repo), CLAUDE.md content if present, plus all webhook metadata
- One configuration knob: reuse `LocalModel.UseForElaborate()` semantics? No — separate flag. Add `LocalModel.PreferForCITriage *bool` defaulting true when endpoint set, opt-out symmetrical with `PreferForElaborate`.

## Explicit non-goals

- No GitHub API integration (no log fetching, no auth)
- No changes to webhook routing, signature validation, project matching, or task scheduling
- No changes to the task schema (instructions stays a string)
- No streaming — one-shot LLM call, sub-2s target

## Task list

1. Add `LocalModel.PreferForCITriage *bool` and `UseForCITriage()` helper, mirroring elaborate
2. Add `enrichCIInstructions` in `internal/api/webhook.go` (or `webhook_llm.go` if it grows)
3. Read recent git log from project_dir via `git log --oneline -n 5` (best-effort, swallow errors)
4. Read CLAUDE.md from project_dir (best-effort)
5. Build a focused prompt: "CI just failed on this project. Here's metadata + recent commits + project context. Produce a 6-12 line investigation plan that names suspect files/commits when you can, otherwise gives concrete starting steps." Plain text out, not JSON.
6. Update `createCIFailureTask` to call enrichment when `s.llm != nil && cfg.LocalModel.UseForCITriage()`. Note: the server doesn't currently see the cfg directly — pass the gate as a setter `SetCITriageEnabled(bool)` from serve.go, OR (simpler) just gate on `s.llm != nil` and let users opt out by not calling `SetLLM`. Going with the simpler option since it matches the elaborate split: same `s.llm` for both, server doesn't track per-feature gates.
7. Wiring in `serve.go`: when `cfg.LocalModel.Endpoint != ""`, `SetLLM(localClient)`. (Already done in Phase 2.) Per-feature opt-out via the `PreferFor*` config flags is read at wire time and could conditionally not call SetLLM, but that gives elaborate/CI an all-or-nothing toggle which is wrong. Better: introduce a separate setter `SetLLMForCITriage` so each feature can be controlled independently.

   Actually, simplest and cleanest: keep one `SetLLM` setter, and gate each call site (`elaborateWithLocal`, `enrichCIInstructions`) by reading a per-feature config flag passed via separate setters. That's getting fiddly. Step back.

   **Final decision:** the per-feature gate doesn't pull its weight in Phase 3. Ship it as: `s.llm != nil` enables both elaborate and CI triage. Users who want elaborate-yes/CI-triage-no can revisit later. The deferred per-feature toggles get added in the post-epic cleanup along with token telemetry — there's no real demand for the asymmetric case yet.

   Revised: drop `PreferForCITriage` entirely; ship a simpler thing.
8. Tests:
   - `enrichCIInstructions` with stub LLM returns the LLM body
   - `enrichCIInstructions` with failing LLM returns `fallback` unchanged
   - `enrichCIInstructions` includes recent git log when project_dir is a real git repo (use `t.TempDir()` + `git init` + a commit)
   - Webhook handler test: LLM configured → instructions reflect LLM output
   - Webhook handler test: LLM not configured → instructions match the existing template byte-for-byte (regression guard)
9. `go build ./... && go test -race ./...`
10. Commit as Phase 3 on the same branch
11. Push

## Stop conditions

- All new tests green under `-race`
- Existing webhook tests pass byte-for-byte when LLM not configured
- Build clean; pushed

---

# Phase 4 — Focused Plan (Execution Summary)

## Scope

`extractSummary` in `internal/executor/summary.go` is text-pattern based: it returns the body following the last `## Summary` heading in any assistant text block. When the agent didn't write one, summary stays empty. This is fine for Claude (which is prompted to write a summary), but not for arbitrary local-runner outputs, and not for cases where Claude exits early or hits a budget cap before the summary section.

Phase 4 adds an LLM-based fallback: when `extractSummary` returns "" and the Pool has an LLM client, synthesize a 2-4 sentence summary from the tail of the stdout log.

## What ships

- New `synthesizeSummary(ctx, *llm.Client, stdoutPath string) string` in `internal/executor/summary.go`. Reads the last ~16 KB of the stdout log, strips stream-json envelopes to extract just the text content, and asks the LLM to summarize.
- New `LLM *llm.Client` field on `executor.Pool` (wired identically to `Classifier.LLM` in Phase 1).
- Hook into `Pool.handleRunResult` at the existing summary block: after `extractSummary` returns "", call `synthesizeSummary` if `p.LLM != nil`.
- Wiring in `cmd/claudomator/main.go` (none — main.go is a thin wrapper), `internal/cli/serve.go`, `internal/cli/run.go`: pass `localClient` to Pool.

## Explicit non-goals

- No changes to the Claude prompt or the `## Summary` extraction (that path stays primary)
- No changes to the storage schema (summary is already a `tasks.summary` TEXT column via `UpdateTaskSummary`)
- No streaming the summary — one-shot 2-4 sentence completion
- No new config knob for "prefer local for summary" — same `s.llm`/`p.LLM` gate applies; users opt out by not setting LocalModel.Endpoint
- No retroactive backfill of summaries on existing executions

## Task list

1. Add `LLM *llm.Client` field on `executor.Pool` (matches the `Classifier` pattern from Phase 1)
2. Implement `synthesizeSummary(ctx, *llm.Client, stdoutPath) string` in `internal/executor/summary.go`. Reads last ~16 KB, parses each line as a stream-json event, joins the assistant text content, calls `Chat` with a 6-second timeout asking for 2-4 sentences plain text. Returns "" on any error so the caller's existing empty-summary path stays unchanged.
3. Modify `Pool.handleRunResult`: after `extractSummary` returns empty, if `p.LLM != nil`, try `synthesizeSummary(ctx, p.LLM, exec.StdoutPath)`. If it returns non-empty, persist via `UpdateTaskSummary`.
4. Wire `Pool.LLM = localClient` in `internal/cli/serve.go` and `internal/cli/run.go`
5. Tests in `internal/executor/summary_test.go` (or a new file):
   - `synthesizeSummary` with stub LLM: stdout.log containing stream-json text → assistant content extracted → LLM called → returned summary
   - `synthesizeSummary` with no `## Summary` heading anywhere → still produces synthesized summary
   - `synthesizeSummary` LLM failure → returns ""
   - `synthesizeSummary` empty stdout file → returns ""
   - Pool integration test: LocalRunner produces a stdout with no `## Summary` section, Pool's LLM is set, after handleRunResult the task's summary is non-empty
6. `go build ./... && go test -race ./...`
7. Commit as Phase 4 on the branch
8. Push

## Stop conditions

- New tests green under `-race`
- Existing tests unchanged (the extractSummary primary path keeps winning whenever a `## Summary` heading exists)
- Build clean; pushed
- Epic complete: `## Local OSS Models as a Third Runner` shipped end-to-end

After Phase 4 lands, execute the post-epic deep cleanup using the queue at the top of this section.