summaryrefslogtreecommitdiff
path: root/docs/plans/local-oss-runner.md
AgeCommit message (Collapse)Author
2026-04-28feat(api): route elaboration through local LLM when configuredClaude
Phase 2 of "local OSS models as agents" plan. Adds a third elaboration path that calls the local OpenAI-compatible LLM via the internal/llm client, and reorders dispatch so the cheap path is tried first: local → claude → gemini, with each next attempt only on hard failure of the prior. Wiring is opt-out, not opt-in: when [local_model].endpoint is set, elaboration prefers local by default. Users with a slow or low-quality local model can disable just elaboration via: [local_model] endpoint = "..." prefer_for_elaborate = false without giving up the runner or the classifier path. Implementation: - Server gains an optional *llm.Client field via SetLLM (matches the existing SetNotifier/SetWorkspaceRoot setter pattern, no NewServer signature break). - elaborateWithLocal() reuses buildElaboratePrompt verbatim and asks for response_format=json_object so we skip markdown-fence cleanup. - handleElaborateTask reorders try chain; existing Claude-first behavior is preserved exactly when SetLLM is not called. - LocalModel.UseForElaborate() encapsulates the default-true gating with a *bool so explicit-false survives TOML parse. Tests: - elaborateWithLocal: parses valid response, errors on nil client, errors on bad JSON. - handler: local preferred when wired; falls back to claude when local fails; unchanged behavior when no LLM is configured. - config: UseForElaborate gating across empty/default/explicit-true/ explicit-false cases. Pre-existing test failures noted in docs/plans/local-oss-runner.md (post-epic cleanup): TestGeminiLogs_ParsedCorrectly returns 404 for gemini execution log fetch — predates this change. Plan: docs/plans/local-oss-runner.md. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
2026-04-28feat(executor): add LocalRunner and OpenAI-compat LLM clientClaude
Phase 1 of "local OSS models as agents" plan. Adds a third Runner backed by any OpenAI-compatible HTTP server (Ollama, vLLM, LM Studio, llama.cpp), and migrates the Gemini-CLI classifier to route through the same client when configured. Two-layer split: internal/llm.Client is the workhorse (HTTP, no Pool, no DB) used directly by the classifier and any future internal helper that needs cheap reasoning. internal/executor.LocalRunner is a thin adapter implementing Runner for user-facing tasks. This avoids Pool reentrancy/deadlock when sub-second internal calls fire from inside Pool.execute(). Highlights: - internal/retry: relocated runWithBackoff/IsRateLimitError/ParseRetryAfter into a shared package reused by executor and llm. - internal/llm: Chat (non-streaming) and ChatStream (SSE) over /chat/completions with optional bearer auth, json_object response format, retry on 429/503, Retry-After parsing. - internal/executor/LocalRunner: streams deltas into stdout.log in the same stream-json envelope ClaudeRunner emits, then writes one consolidated assistant block plus a result terminator so existing parsers (extractSummary, ParseChangestatFromOutput) work unchanged. - internal/executor/Classifier: gains optional LLM field; uses json_object response format (no markdown-fence cleanup needed). Falls back to Gemini-CLI subprocess when LLM is nil. - Pool.skipClassification: now skips only when the requested agent type is registered, so unknown types still reach the load balancer. - Storage: additive tokens_in/tokens_out ALTERs on executions; CLI runners record cost_usd as before, LocalRunner records 0 + tokens. - Config: [local_model] section (endpoint, model, timeout_seconds, default_temperature, api_key). Empty endpoint = no LocalRunner registered, classifier falls back to Gemini. Pre-existing test issues fixed in passing: - claude_test.go setupSandbox callsites updated to current signature. - gemini_test.go TestParseGeminiStream skipped (asserts unimplemented GeminiRunner stream-error parsing; tracked separately). Plan: docs/plans/local-oss-runner.md. https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J