# ADR-004: Multi-Agent Routing and Gemini-Based Classification

## Status

Accepted

## Context

Claudomator started as a Claude-only system. As Gemini became a viable coding agent, the architecture needed to support multiple agent backends without requiring operators to manually select an agent or model for each task.

Two distinct problems needed solving:

1. **Which agent should run this task?** Claude and Gemini have different API quotas and rate limits. When Claude is rate-limited, tasks should flow to Gemini automatically.
2. **Which model tier should the agent use?** Both agents offer a spectrum from fast/cheap to slow/powerful models. Using the wrong tier wastes money or produces inferior results.

## Decision

The two problems are solved independently:

### Agent selection: explicit load balancing in code (`pickAgent`)

`pickAgent(SystemStatus)` selects the agent with the fewest active tasks, preferring non-rate-limited agents. The algorithm is:

1. First pass: consider only non-rate-limited agents; pick the one with the fewest active tasks (alphabetical tie-break for determinism).
2. Fallback: if all agents are rate-limited, pick the least-active regardless of rate-limit status.

This is deterministic code, not an AI call. It runs in-process with no I/O and cannot fail in ways that would block task execution.

### Model selection: Gemini-based classifier (`Classifier`)

Once the agent is selected, `Classifier.Classify` invokes the Gemini CLI with `gemini-2.5-flash-lite` to select the best model tier for the task. The classifier receives the task name, instructions, and the required agent type, and returns a `Classification` with `agent_type`, `model`, and `reason`.

The classifier uses a cheap, fast model for classification to minimise the cost overhead. The response is parsed from JSON, with fallback handling for markdown code blocks and credential noise in the output.
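
The two-pass selection described under agent selection can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `internal/executor/executor.go`: the fields of `SystemStatus` (`ActiveTasks`, `RateLimited`) are hypothetical, and rate-limit state is simplified to a boolean rather than a deadline.

```go
package main

import (
	"fmt"
	"sort"
)

// SystemStatus is a hypothetical snapshot of pool state.
// The real struct may carry different fields (e.g. rate-limit deadlines).
type SystemStatus struct {
	ActiveTasks map[string]int  // agent name -> number of running tasks
	RateLimited map[string]bool // agent name -> currently rate-limited?
}

// pickAgent returns the least-active non-rate-limited agent, falling back to
// the least-active agent overall when every agent is rate-limited.
// It returns "" only when no agents are known.
func pickAgent(s SystemStatus) string {
	names := make([]string, 0, len(s.ActiveTasks))
	for name := range s.ActiveTasks {
		names = append(names, name)
	}
	// Sorted iteration plus a strict "<" comparison below means ties are
	// resolved alphabetically, so the result does not depend on map order.
	sort.Strings(names)

	leastActive := func(include func(string) bool) string {
		best := ""
		for _, name := range names {
			if !include(name) {
				continue
			}
			if best == "" || s.ActiveTasks[name] < s.ActiveTasks[best] {
				best = name
			}
		}
		return best
	}

	// First pass: only agents that are not rate-limited.
	if agent := leastActive(func(n string) bool { return !s.RateLimited[n] }); agent != "" {
		return agent
	}
	// Fallback: every agent is rate-limited, so ignore rate-limit status.
	return leastActive(func(string) bool { return true })
}

func main() {
	status := SystemStatus{
		ActiveTasks: map[string]int{"claude": 1, "gemini": 3},
		RateLimited: map[string]bool{"claude": true},
	}
	// claude is less busy but rate-limited, so the first pass picks gemini.
	fmt.Println(pickAgent(status))
}
```

Because the function only reads an in-memory snapshot, it can be unit-tested with plain struct literals and no mocked AI calls.
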
### Separation of concerns

These two decisions were initially merged (the classifier picked both agent and model). They were separated in commit `e033504` because:

- Load balancing must be reliable even when the Gemini API is unavailable.
- Classifier failures are non-fatal: if classification fails, the pool logs the error and proceeds with the agent's default model.

### Re-classification on manual restart

When an operator manually restarts a task from a non-`QUEUED` state (e.g. `FAILED`, `CANCELLED`), the task goes through `execute()` again and is re-classified. This ensures restarts pick up any changes to agent availability or rate-limit status.

## Rationale

- **AI-picks-model**: the model selection decision is genuinely complex and subjective. Using an AI classifier avoids hardcoding heuristics that would need constant tuning.
- **Code-picks-agent**: load balancing is a scheduling problem with measurable inputs (active task counts, rate-limit deadlines). Delegating this to an AI would introduce unnecessary non-determinism and latency.
- **Gemini for classification**: using Gemini to classify Claude tasks (and vice versa) prevents circular dependencies. Using the cheapest available Gemini model keeps classification cost negligible.

## Alternatives Considered

- **Operator always picks agent and model**: too much manual overhead. Operators should be able to submit tasks without knowing which agent is currently rate-limited.
- **Single classifier picks both agent and model**: rejected after operational experience showed that load balancing needs to work even when the Gemini API is unavailable or returning errors.
- **Round-robin agent selection**: simpler but does not account for rate limits or imbalanced task durations.

## Consequences

- Agent selection is deterministic and testable without mocking AI APIs.
- Classification failures are logged but non-fatal; the task runs with the agent's default model.
- The classifier adds ~1–2 seconds of latency to task start (one Gemini API call).
- Tasks with `agent.type` pre-set in YAML still go through load balancing; `pickAgent` may override the requested type if the requested type is not a registered runner. This is by design: the operator's type hint is overridden by the load balancer to ensure tasks are always routable.

## Relevant Code Locations

| Concern | File |
|---|---|
| `pickAgent` | `internal/executor/executor.go` |
| `Classifier` | `internal/executor/classifier.go` |
| Load balancing in `execute()` | `internal/executor/executor.go` |
| Re-classification gate | `internal/api/server.go` (`handleRunTask`) |
| `pickAgent` tests | `internal/executor/executor_test.go` |
| `Classifier` mock test | `internal/executor/classifier_test.go` |