Diffstat (limited to 'docs/plans')
| -rw-r--r-- | docs/plans/local-oss-runner.md | 50 |
1 files changed, 50 insertions, 0 deletions
diff --git a/docs/plans/local-oss-runner.md b/docs/plans/local-oss-runner.md
index c065483..c3d6291 100644
--- a/docs/plans/local-oss-runner.md
+++ b/docs/plans/local-oss-runner.md
@@ -304,3 +304,53 @@ What the LLM can do with that: produce a tighter, project-aware investigation pr
 - All new tests green under `-race`
 - Existing webhook tests pass byte-for-byte when LLM not configured
 - Build clean; pushed
+
+---
+
+# Phase 4 — Focused Plan (Execution Summary)
+
+## Scope
+
+`extractSummary` in `internal/executor/summary.go` is text-pattern based: it returns the body following the last `## Summary` heading in any assistant text block. When the agent didn't write one, the summary stays empty. This is fine for Claude (which is prompted to write a summary), but not for arbitrary local-runner outputs, and not for cases where Claude exits early or hits a budget cap before writing the summary section.
+
+Phase 4 adds an LLM-based fallback: when `extractSummary` returns "" and the Pool has an LLM client, synthesize a 2-4 sentence summary from the tail of the stdout log.
+
+## What ships
+
+- New `synthesizeSummary(ctx, *llm.Client, stdoutPath string) string` in `internal/executor/summary.go`. Reads the last ~16 KB of the stdout log, strips stream-json envelopes to extract just the text content, and asks the LLM to summarize.
+- New `LLM *llm.Client` field on `executor.Pool` (wired identically to `Classifier.LLM` in Phase 1).
+- Hook into `Pool.handleRunResult` at the existing summary block: after `extractSummary` returns "", call `synthesizeSummary` if `p.LLM != nil`.
+- Wiring in `internal/cli/serve.go` and `internal/cli/run.go`: pass `localClient` to Pool. `cmd/claudomator/main.go` needs no change because it is a thin wrapper.
+
+## Explicit non-goals
+
+- No changes to the Claude prompt or the `## Summary` extraction (that path stays primary)
+- No changes to the storage schema (summary is already a `tasks.summary` TEXT column via `UpdateTaskSummary`)
+- No streaming of the summary; it is a one-shot 2-4 sentence completion
+- No new config knob for "prefer local for summary"; the same `s.llm`/`p.LLM` gate applies, and users opt out by not setting `LocalModel.Endpoint`
+- No retroactive backfill of summaries on existing executions
+
+## Task list
+
+1. Add `LLM *llm.Client` field on `executor.Pool` (matches the `Classifier` pattern from Phase 1)
+2. Implement `synthesizeSummary(ctx, *llm.Client, stdoutPath) string` in `internal/executor/summary.go`. Reads the last ~16 KB, parses each line as a stream-json event, joins the assistant text content, and calls `Chat` with a 6-second timeout asking for 2-4 sentences of plain text. Returns "" on any error so the caller's existing empty-summary path stays unchanged.
+3. Modify `Pool.handleRunResult`: after `extractSummary` returns empty, if `p.LLM != nil`, try `synthesizeSummary(ctx, p.LLM, exec.StdoutPath)`. If it returns non-empty, persist via `UpdateTaskSummary`.
+4. Wire `Pool.LLM = localClient` in `internal/cli/serve.go` and `internal/cli/run.go`
+5. Tests in `internal/executor/summary_test.go` (or a new file):
+   - `synthesizeSummary` with stub LLM: stdout.log containing stream-json text → assistant content extracted → LLM called → returned summary
+   - `synthesizeSummary` with no `## Summary` heading anywhere → still produces synthesized summary
+   - `synthesizeSummary` LLM failure → returns ""
+   - `synthesizeSummary` empty stdout file → returns ""
+   - Pool integration test: LocalRunner produces a stdout with no `## Summary` section, Pool's LLM is set, and after `handleRunResult` the task's summary is non-empty
+6. `go build ./... && go test -race ./...`
+7. Commit as Phase 4 on the branch
+8. Push
+
+## Stop conditions
+
+- New tests green under `-race`
+- Existing tests unchanged (the `extractSummary` primary path keeps winning whenever a `## Summary` heading exists)
+- Build clean; pushed
+- Epic complete: `## Local OSS Models as a Third Runner` shipped end-to-end
+
+After Phase 4 lands, execute the post-epic deep cleanup using the queue at the top of this section.
