diff --git a/docs/adr/004-multi-agent-routing-and-classification.md b/docs/adr/004-multi-agent-routing-and-classification.md
new file mode 100644
index 0000000..7afb10d
--- /dev/null
+++ b/docs/adr/004-multi-agent-routing-and-classification.md
@@ -0,0 +1,107 @@
+# ADR-004: Multi-Agent Routing and Gemini-Based Classification
+
+## Status
+Accepted
+
+## Context
+
+Claudomator started as a Claude-only system. As Gemini became a viable coding
+agent, the architecture needed to support multiple agent backends without requiring
+operators to manually select an agent or model for each task.
+
+Two distinct problems needed solving:
+
+1. **Which agent should run this task?** — Claude and Gemini have different API
+ quotas and rate limits. When Claude is rate-limited, tasks should flow to
+ Gemini automatically.
+2. **Which model tier should the agent use?** — Both agents offer a spectrum from
+ fast/cheap to slow/powerful models. Using the wrong tier wastes money or
+ produces inferior results.
+
+## Decision
+
+The two problems are solved independently:
+
+### Agent selection: explicit load balancing in code (`pickAgent`)
+
+`pickAgent(SystemStatus)` selects the agent with the fewest active tasks,
+preferring non-rate-limited agents. The algorithm is:
+
+1. First pass: consider only non-rate-limited agents; pick the one with the
+ fewest active tasks (alphabetical tie-break for determinism).
+2. Fallback: if all agents are rate-limited, pick the least-active regardless
+ of rate-limit status.
+
+This is deterministic code, not an AI call. It runs in-process with no I/O and
+cannot fail in ways that would block task execution.
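+
+The two-pass selection above can be sketched in Go. This is a minimal sketch,
+not the code in `internal/executor/executor.go`. The `AgentStatus` type, its
+`ActiveTasks` and `RateLimited` fields, and a `SystemStatus` keyed by agent name
+are assumptions made for illustration.
+
+```go
+package main
+
+import "sort"
+
+// AgentStatus is a hypothetical per-agent snapshot; the real SystemStatus
+// type may be shaped differently.
+type AgentStatus struct {
+	ActiveTasks int  // tasks currently running on this agent
+	RateLimited bool // true while the agent is inside a rate-limit window
+}
+
+// SystemStatus maps an agent name (e.g. "claude", "gemini") to its status.
+// Assumed shape, for illustration only.
+type SystemStatus map[string]AgentStatus
+
+// pickAgent returns the least-busy agent, preferring agents that are not
+// rate-limited, with an alphabetical tie-break. It returns "" when no agents
+// are registered.
+func pickAgent(status SystemStatus) string {
+	// Sort names so that scanning order, and therefore tie-breaking, is stable.
+	names := make([]string, 0, len(status))
+	for name := range status {
+		names = append(names, name)
+	}
+	sort.Strings(names)
+
+	// leastActive returns the alphabetically first agent with the lowest
+	// ActiveTasks among those accepted by include, or "" if none qualify.
+	leastActive := func(include func(AgentStatus) bool) string {
+		best, bestCount := "", -1
+		for _, name := range names {
+			s := status[name]
+			if !include(s) {
+				continue
+			}
+			// Strict "<" keeps the earlier (alphabetically smaller) name on ties.
+			if bestCount < 0 || s.ActiveTasks < bestCount {
+				best, bestCount = name, s.ActiveTasks
+			}
+		}
+		return best
+	}
+
+	// First pass: only agents that are not rate-limited.
+	if agent := leastActive(func(s AgentStatus) bool { return !s.RateLimited }); agent != "" {
+		return agent
+	}
+	// Fallback: every agent is rate-limited (or none are registered),
+	// so ignore rate limits entirely.
+	return leastActive(func(AgentStatus) bool { return true })
+}
+```
+
+Suppose `claude` is rate-limited with 1 active task and `gemini` is free with 3.
+The first pass returns `gemini`. If both were rate-limited, the fallback would
+return `claude`, the less busy of the two. Go randomises map iteration order, so
+sorting the names before scanning is what makes the alphabetical tie-break hold.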
+
+### Model selection: Gemini-based classifier (`Classifier`)
+
+Once the agent is selected, `Classifier.Classify` invokes the Gemini CLI with
+`gemini-2.5-flash-lite` to select the best model tier for the task. The classifier
+receives the task name, instructions, and the required agent type, and returns
+a `Classification` with `agent_type`, `model`, and `reason`.
+
+A cheap, fast model is used to minimise the cost overhead of classification. The
+response is parsed as JSON, with fallback handling for JSON wrapped in markdown
+code blocks and for credential noise in the CLI output.
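+
+A minimal sketch of that parsing step follows. The `Classification` struct tags
+mirror the field names above. The helper `parseClassification` is hypothetical,
+and so is its strategy of taking the text between the first `{` and the last `}`.
+Neither describes `internal/executor/classifier.go`.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"strings"
+)
+
+// Classification mirrors the fields named in this ADR; the exact Go type in
+// internal/executor/classifier.go may differ.
+type Classification struct {
+	AgentType string `json:"agent_type"`
+	Model     string `json:"model"`
+	Reason    string `json:"reason"`
+}
+
+// parseClassification is a hypothetical helper. It tolerates output where the
+// JSON object is wrapped in a markdown code fence or surrounded by unrelated
+// lines, such as credential messages printed by the CLI.
+func parseClassification(output string) (Classification, error) {
+	var c Classification
+
+	// Fast path: the whole output is already valid JSON.
+	trimmed := strings.TrimSpace(output)
+	if err := json.Unmarshal([]byte(trimmed), &c); err == nil {
+		return c, nil
+	}
+
+	// Fallback: keep only the span from the first '{' to the last '}',
+	// which drops fence markers and any noise before or after the object.
+	start := strings.Index(trimmed, "{")
+	end := strings.LastIndex(trimmed, "}")
+	if start < 0 || end <= start {
+		return c, errors.New("classifier output contains no JSON object")
+	}
+	if err := json.Unmarshal([]byte(trimmed[start:end+1]), &c); err != nil {
+		return c, err
+	}
+	return c, nil
+}
+```
+
+Taking the outermost braces is a deliberately simple heuristic. It would misfire
+if the surrounding noise itself contained `{` or `}`.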
+
+### Separation of concerns
+
+These two decisions were initially merged (the classifier picked both agent and
+model). They were separated in commit `e033504` because:
+
+- Load balancing must be reliable even when the Gemini API is unavailable.
+- Classifier failures are non-fatal: if classification fails, the pool logs the
+ error and proceeds with the agent's default model.
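+
+The non-fatal path can be sketched as follows. `ClassifyFunc` and `chooseModel`
+are hypothetical names. The classifier is passed in as a function value that
+stands in for `Classifier.Classify` and takes the task name, instructions and
+the agent type chosen by `pickAgent`. The real `execute()` may be structured
+differently.
+
+```go
+package main
+
+import (
+	"context"
+	"log"
+)
+
+// Classification holds the classifier's answer (fields as described above).
+type Classification struct {
+	AgentType string
+	Model     string
+	Reason    string
+}
+
+// ClassifyFunc is a hypothetical stand-in for Classifier.Classify.
+type ClassifyFunc func(ctx context.Context, name, instructions, agentType string) (*Classification, error)
+
+// chooseModel asks the classifier for a model tier for an agent that has
+// already been selected. On any failure it logs the problem and returns
+// defaultModel, so a classifier outage never blocks the task.
+func chooseModel(ctx context.Context, classify ClassifyFunc, agent, defaultModel, name, instructions string) string {
+	cls, err := classify(ctx, name, instructions, agent)
+	if err != nil {
+		log.Printf("classifier error for task %q on %s: %v; using default model", name, agent, err)
+		return defaultModel
+	}
+	if cls == nil || cls.Model == "" {
+		log.Printf("classifier returned no model for task %q on %s; using default model", name, agent)
+		return defaultModel
+	}
+	return cls.Model
+}
+```
+
+Passing the classifier as a function value keeps this path testable without
+calling Gemini, much like the `Classifier` mock test listed under Relevant Code
+Locations.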
+
+### Re-classification on manual restart
+
+When an operator manually restarts a task from a non-`QUEUED` state (e.g. `FAILED`,
+`CANCELLED`), the task goes through `execute()` again, so it is re-routed by
+`pickAgent` and re-classified. This ensures restarts pick up any changes to agent
+availability or rate-limit status since the previous run.
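+
+The restart rule amounts to a state check. In this sketch `TaskState`, its
+constants and `needsReclassification` are hypothetical names. The real gate is
+in `handleRunTask` (`internal/api/server.go`) and may inspect more than the
+previous state.
+
+```go
+package main
+
+// TaskState is a hypothetical string enum for the task states named above.
+type TaskState string
+
+const (
+	StateQueued    TaskState = "QUEUED"
+	StateFailed    TaskState = "FAILED"
+	StateCancelled TaskState = "CANCELLED"
+)
+
+// needsReclassification reports whether a manual restart from prev should send
+// the task back through execute(), rerunning pickAgent and the classifier.
+// Per the rule above, any state other than QUEUED qualifies.
+func needsReclassification(prev TaskState) bool {
+	return prev != StateQueued
+}
+```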
+
+## Rationale
+
+- **AI-picks-model**: the model selection decision is genuinely complex and
+ subjective. Using an AI classifier avoids hardcoding heuristics that would need
+ constant tuning.
+- **Code-picks-agent**: load balancing is a scheduling problem with measurable
+ inputs (active task counts, rate-limit deadlines). Delegating this to an AI
+ would introduce unnecessary non-determinism and latency.
+- **Gemini for classification**: classifying with Gemini, including for tasks
+ that will run on Claude, prevents circular dependencies. Using the cheapest
+ available Gemini model keeps classification cost negligible.
+
+## Alternatives Considered
+
+- **Operator always picks agent and model**: too much manual overhead. Operators
+ should be able to submit tasks without knowing which agent is currently
+ rate-limited.
+- **Single classifier picks both agent and model**: rejected after operational
+ experience showed that load balancing needs to work even when the Gemini API
+ is unavailable or returning errors.
+- **Round-robin agent selection**: simpler but does not account for rate limits
+ or imbalanced task durations.
+
+## Consequences
+
+- Agent selection is deterministic and testable without mocking AI APIs.
+- Classification failures are logged but non-fatal; the task runs with the
+ agent's default model.
+- The classifier adds ~1–2 seconds of latency to task start (one Gemini API call).
+- Tasks with `agent.type` pre-set in YAML still go through load balancing;
+ `pickAgent` may override the requested type if the requested type is not a
+ registered runner. This is by design: the operator's type hint is overridden
+ by the load balancer to ensure tasks are always routable.
+
+## Relevant Code Locations
+
+| Concern | File |
+|---|---|
+| `pickAgent` | `internal/executor/executor.go` |
+| `Classifier` | `internal/executor/classifier.go` |
+| Load balancing in `execute()` | `internal/executor/executor.go` |
+| Re-classification gate | `internal/api/server.go` (handleRunTask) |
+| `pickAgent` tests | `internal/executor/executor_test.go` |
+| `Classifier` mock test | `internal/executor/classifier_test.go` |