# Improvement Plan

Comprehensive assessment of stability, architectural coherence, simplicity and cleanliness issues, with prioritized fixes.

---

## Critical Bugs (fix immediately)

### 1. `sandboxDir` variable shadowing — git clone failures

**File:** `internal/executor/claude.go`, lines ~119–137

Both `setupSandbox` calls inside `Run()` use short variable declarations (`:=`) inside nested blocks, creating inner variables that shadow the outer `var sandboxDir string`. The outer variable is always `""`.

**Effects:**
- `teardownSandbox` is never called → agent work discarded, changes never pushed to origin
- Sandboxes accumulate in `/tmp/claudomator-sandbox-*` until disk is full
- BLOCKED task resume always clones fresh → loses agent's partial work from first run
- Error messages omit sandbox path, making debugging harder

**Fix:**

```go
// In the resume path (inside `if projectDir != ""`):
var err error
sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger)  // = not :=

// In the non-resume path (inside `else if projectDir != ""`):
var err error
sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger)  // = not :=
```

Both sites already declare `var err error` on the line above. Simply drop the `:` from `:=`.

### 2. Hardcoded `master` branch in sandbox teardown

**File:** `internal/executor/claude.go`, `teardownSandbox()`, line ~377

```go
git pull --rebase origin master   // fails on repos using "main"
```

**Fix:** Detect the remote default branch before pushing:

```go
branch, _ := exec.Command("git", "-C", sandboxDir, "rev-parse", "--abbrev-ref", "HEAD").Output()
defaultBranch := strings.TrimSpace(string(branch))
if defaultBranch == "" || defaultBranch == "HEAD" {
    defaultBranch = "main"
}
// then use defaultBranch instead of "master"
```

---

## High Priority — Stability & Correctness

### 3. GeminiRunner is a non-functional stub in production code

**File:** `internal/executor/gemini.go`

`execOnce()` spawns a goroutine writing hardcoded JSON to a pipe instead of running the `gemini` binary. Any task with `agent.type: "gemini"` silently returns fake output.

**Options (choose one):**
- **Implement properly:** Mirror ClaudeRunner structure — subprocess + process group + stream parsing + sandbox support. GeminiRunner must also implement `LogPather`.
- **Remove until ready:** Delete `gemini.go`, remove `GeminiRunner` from `NewPool` in `cli/serve.go`, and gate the `Classifier` on Claude-only operation. Add `// TODO: GeminiRunner not implemented` clearly.

The stub in production is a correctness hazard. Removing it is safer than leaving it in place.

### 4. Atomic execution creation + state update

**Files:** `internal/executor/executor.go` `execute()`, `internal/storage/db.go`

`CreateExecution` and `UpdateTaskState(RUNNING)` are two separate DB writes with no transaction. A crash between them leaves the task PENDING with an orphaned RUNNING execution record. `RecoverStaleRunning` handles the RUNNING-task case but not the PENDING-task-with-running-execution case.

**Fix:** Add `storage.DB.CreateExecutionAndSetRunning(exec, taskID)` that wraps both writes in a single transaction.

### 5. `context.Background()` in resume submission

**File:** `internal/api/server.go`, `handleAnswerQuestion()`

```go
p.SubmitResume(context.Background(), task, exec)
```

This detaches the resume from any server lifecycle context. If the server has a root context with a shutdown deadline, resumes submitted this way will not be cancelled.

**Fix:** Pass the server's root context (or a derived context) instead of `Background()`.

### 6. Pool `dispatch()` goroutine leaks on server shutdown

**File:** `internal/executor/executor.go`

`NewPool` starts `go p.dispatch()` but `workCh` is never closed. On server shutdown, `dispatch()` blocks forever on `<-p.workCh`. The pool has no `Shutdown()` method.

**Fix:** Add `Pool.Shutdown(ctx context.Context)` that closes `workCh` and waits for active workers to drain.

---

## Medium Priority — Architectural Coherence

### 7. Priority field is stored but never dispatched

**Files:** `internal/task/task.go`, `internal/executor/executor.go`

Priority is parsed, stored in SQLite, returned via API — but `dispatch()` uses a single FIFO channel, ignoring priority entirely.

**Options:**
- Use a priority queue (e.g., `container/heap`) keyed by `Priority` for `workCh`
- Or remove `Priority` from `Task` and documentation until implemented — don't ship half-features

### 8. RetryConfig is stored but retry is manual

**Files:** `internal/task/task.go`, `internal/executor/executor.go`

`RetryConfig.MaxAttempts` and `Backoff` are stored but never read. Every "retry" is a manual user action via `/resume`.

**Options:**
- Implement automatic retry in `handleRunResult` when `MaxAttempts > 1` and `currentAttempt < MaxAttempts`
- Or remove `RetryConfig` from the schema/API until implemented

### 9. Changestats extracted in two places

**Files:** `internal/executor/executor.go` `handleRunResult()`, `internal/api/server.go` `processResult()`

Both call `task.ParseChangestatFromFile()` on the same file and write to the same row. The second write is idempotent but the duplication is confusing.

**Fix:** Remove the extraction from `api.Server.processResult()`. The pool already handles it reliably before broadcasting the result.

### 10. `withFailureHistory` shallow copy is unsafe

**File:** `internal/executor/executor.go`

```go
copy := *t      // copies struct
copy.Agent = t.Agent  // copies AgentConfig struct, slices share backing array
```

If any downstream code appends to `copy.Agent.AllowedTools`, it will mutate the original. The current code only writes to `SystemPromptAppend` (a string), so this is safe today but fragile.

**Fix:** Deep-copy slices in AgentConfig:
```go
agent := t.Agent
agent.AllowedTools = append([]string(nil), t.Agent.AllowedTools...)
agent.DisallowedTools = append([]string(nil), t.Agent.DisallowedTools...)
agent.ContextFiles = append([]string(nil), t.Agent.ContextFiles...)
agent.AdditionalArgs = append([]string(nil), t.Agent.AdditionalArgs...)
copy.Agent = agent
```

### 11. Additive migration strategy has no version tracking

**File:** `internal/storage/db.go` `migrate()`

Migrations are applied in code order; the only idempotency guard is ignoring "column already exists" errors. There is no way to detect if a migration was partially applied, rolled back incorrectly, or was applied out of order by a different code version.

**Fix (minimal):** Use a `schema_migrations` table with a version counter. Apply only migrations with version > current max. This prevents re-running migrations and makes the migration history explicit.

### 12. Server test-seam fields in production struct

**File:** `internal/api/server.go`

```go
elaborateCmdPath string   // overrides claudeBinPath in tests
validateCmdPath  string   // overrides claudeBinPath in tests
```

These exist solely to inject test paths. They are an internal testing seam leaked into the production struct.

**Fix:** Extract an `Elaborator` interface (e.g., `Elaborate(ctx, request, workDir string) (string, error)`) and inject it. Tests inject a fake; production builds inject the real Claude-backed implementation.

---

## Lower Priority — Simplicity & Cleanliness

### 13. Classifier output parsing is brittle string manipulation

**File:** `internal/executor/classifier.go`

The classifier strips `"Loaded cached credentials."` by line scanning, strips markdown fences by `strings.TrimPrefix/TrimSuffix`, and falls back through multiple parsing paths. This will silently break if Gemini CLI output format changes.

**Fix:** Request `--output-format json` and parse the structured response. Treat any parse failure as a classification error (fall back to default model), and log the raw output for debugging.

### 14. `gitSafe` is inconsistently applied in teardownSandbox

**File:** `internal/executor/claude.go`

`setupSandbox` wraps all git commands with `gitSafe(...)` (adds `-c safe.directory=*`). `teardownSandbox` calls `git` directly without `gitSafe` on most commands. This means teardown can fail with "dubious ownership" errors in mixed-owner environments.

**Fix:** Use `gitSafe` consistently in `teardownSandbox`.

### 15. `tailFile` loads entire file into memory

**File:** `internal/executor/claude.go`

```go
func tailFile(path string, n int) string {
    // scans all lines, keeps last n
}
```

For large log files this reads everything before discarding early lines. Fine in practice (stderr is typically small) but fragile if anything large is written to stderr.

**Fix:** Use a circular buffer of n lines to avoid O(filesize) memory.

### 16. `execute()` and `executeResume()` duplicate context/cancel/defer setup

**File:** `internal/executor/executor.go`

Both methods contain identical blocks for: context timeout + cancel setup, `activePerAgent` increment/decrement, `doneCh` signal, cancel registration/deregistration. This is ~40 lines duplicated.

**Fix:** Extract `withWorkerSlot(ctx, t, fn func(ctx context.Context) error) error` that handles the common setup and calls `fn`. Both paths become 5–10 lines.

### 17. `QuestionRegistry` and `QuestionHandler` are unused

**Files:** `internal/executor/question.go`, `internal/executor/executor.go`

`QuestionRegistry` is created in `NewPool` and stored on `Pool.Questions` but nothing reads from it. `QuestionHandler` interface exists but is never instantiated or wired to anything. The intended design (pool detects `tool_use` and invokes a handler) is not implemented.

**Fix:** Remove `QuestionRegistry`, `QuestionHandler`, and `Pool.Questions` until the feature is actually needed. The current question flow (agent writes file, pool reads BlockedError) already works without them.

### 18. Script registry endpoint is undocumented and empty

**File:** `internal/api/scripts.go`

The `ScriptRegistry` type and `POST /api/scripts/{name}` endpoint are defined but the registry is never populated from config. There are no example scripts. The endpoint exists in production code but can't be used.

**Fix:** Either populate scripts from config (`[[scripts]]` entries in config.toml) with documentation, or remove the endpoint until it's ready.

### 19. `parseStream` ignores scanner errors

**File:** `internal/executor/claude.go`

`parseStream()` loops over `scanner.Scan()` but never checks `scanner.Err()` after the loop. If the scanner hits a read error mid-stream, it silently stops. The function returns `nil` error, making the task appear to complete successfully when it may have lost output.

**Fix:**
```go
if err := scanner.Err(); err != nil {
    if streamErr == nil {
        streamErr = fmt.Errorf("reading claude stdout: %w", err)
    }
}
```

### 20. Log files have no size limit or rotation

**Files:** `internal/executor/claude.go`, `internal/storage/db.go`

Execution logs in `~/.claudomator/executions/` have no size limit, rotation, or TTL. Long-running or repeatedly-retried tasks can produce arbitrarily large log files. Old executions are never cleaned up.

**Fix (minimal):** Add a `CleanupOldExecutions(olderThan time.Duration)` method to `storage.DB` and call it on server startup (or on a ticker). Delete execution log directories for executions older than the TTL.

---

## Summary by Effort

| # | Issue | Effort | Impact |
|---|-------|--------|--------|
| 1 | `sandboxDir` shadowing | XS (2 chars: `=` not `:=`) | Critical — root cause of git clone failures |
| 2 | Hardcoded `master` branch | S | High — breaks all `main`-branch repos |
| 3 | GeminiRunner stub | M–L | High — silent wrong output in production |
| 4 | Atomic exec creation | S | Medium — crash-recovery edge case |
| 5 | context.Background() resume | XS | Low — shutdown correctness |
| 6 | Pool shutdown | S | Medium — goroutine leak on shutdown |
| 7 | Priority dispatch | M | Medium — misleading API |
| 8 | RetryConfig unused | S | Low — misleading API |
| 9 | Changestats duplication | XS | Low — confusing code |
| 10 | Shallow copy in withFailureHistory | S | Low — latent bug |
| 11 | Migration versioning | M | Medium — operational risk |
| 12 | Test seams in Server struct | M | Low — architecture cleanliness |
| 13 | Brittle classifier parsing | S | Medium — silent classification failures |
| 14 | gitSafe inconsistency | XS | Medium — owner errors in teardown |
| 15 | tailFile memory | XS | Low — theoretical |
| 16 | execute/executeResume duplication | M | Low — maintainability |
| 17 | QuestionRegistry dead code | XS | Low — confusion |
| 18 | Script registry empty | S | Low — dead endpoint |
| 19 | parseStream ignores scanner.Err | XS | Medium — silent data loss |
| 20 | Log file cleanup | S | Medium — disk usage |