| Age | Commit message (Collapse) | Author |
|
Phase 1 of "local OSS models as agents" plan. Adds a third Runner
backed by any OpenAI-compatible HTTP server (Ollama, vLLM, LM Studio,
llama.cpp), and migrates the Gemini-CLI classifier to route through
the same client when configured.
Two-layer split: internal/llm.Client is the workhorse (HTTP, no Pool,
no DB) used directly by the classifier and any future internal helper
that needs cheap reasoning. internal/executor.LocalRunner is a thin
adapter implementing Runner for user-facing tasks. This avoids
Pool reentrancy/deadlock when sub-second internal calls fire from
inside Pool.execute().
Highlights:
- internal/retry: relocated runWithBackoff/IsRateLimitError/ParseRetryAfter
into a shared package reused by executor and llm.
- internal/llm: Chat (non-streaming) and ChatStream (SSE) over
/chat/completions with optional bearer auth, json_object response
format, retry on 429/503, Retry-After parsing.
- internal/executor/LocalRunner: streams deltas into stdout.log in the
same stream-json envelope ClaudeRunner emits, then writes one
consolidated assistant block plus a result terminator so existing
parsers (extractSummary, ParseChangestatFromOutput) work unchanged.
- internal/executor/Classifier: gains optional LLM field; uses
json_object response format (no markdown-fence cleanup needed).
Falls back to Gemini-CLI subprocess when LLM is nil.
- Pool.skipClassification: now skips only when the requested agent
type is registered, so unknown types still reach the load balancer.
- Storage: additive tokens_in/tokens_out ALTERs on executions; CLI
runners record cost_usd as before, LocalRunner records 0 + tokens.
- Config: [local_model] section (endpoint, model, timeout_seconds,
default_temperature, api_key). Empty endpoint = no LocalRunner
registered, classifier falls back to Gemini.
Pre-existing test issues fixed in passing:
- claude_test.go setupSandbox callsites updated to current signature.
- gemini_test.go TestParseGeminiStream skipped (asserts unimplemented
GeminiRunner stream-error parsing; tracked separately).
Plan: docs/plans/local-oss-runner.md.
https://claude.ai/code/session_017Edeq947TpSm1vQTxMhi1J
|
|
pickAgent() deterministically selects the agent with the fewest active tasks,
skipping rate-limited agents. The classifier now only selects the model for the
pre-assigned agent, so Gemini gets tasks from the start rather than only as a
fallback when Claude's quota is exhausted.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Updated isQuotaExhausted to detect more Claude quota messages. Added 'rate limit reached (rejected)' to quota exhausted checks. Strengthened classifier prompt to explicitly forbid selecting rate-limited agents. Improved Pool to set 5h rate limit on quota exhaustion.
|
|
|
|
Updated parseStream to detect 'rate_limit_event' and 'assistant' error:rate_limit messages from the Claude CLI. Updated Classifier to strongly prefer non-rate-limited agents. Added logging to Pool to track rate-limit status during classification.
|
|
Update the default Gemini model and classification prompt to use gemini-2.5-flash-lite, which is the current available model. Improved the classifier's parsing logic to correctly handle the JSON envelope returned by the gemini CLI (stripping 'response' wrapper and 'Loaded cached credentials' noise).
|
|
- Add Classifier using gemini-2.0-flash-lite to automatically select agent/model.
- Update Pool to track per-agent active tasks and rate limit status.
- Enable classification for all tasks (top-level and subtasks).
- Refine SystemStatus to be dynamic across all supported agents.
- Add unit tests for the classifier and updated pool logic.
- Minor UI improvements for project selection and 'Start Next' action.
|