docs/IMPROVEMENT_PLAN.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278

# Improvement Plan

Comprehensive assessment of stability, architectural coherence, simplicity and cleanliness issues, with prioritized fixes.

---

## Critical Bugs (fix immediately)

### 1. `sandboxDir` variable shadowing — git clone failures

**File:** `internal/executor/claude.go`, lines ~119–137

Both `setupSandbox` calls inside `Run()` use short variable declarations (`:=`) inside nested blocks, creating inner variables that shadow the outer `var sandboxDir string`. The outer variable is always `""`.

**Effects:**
- `teardownSandbox` is never called → agent work discarded, changes never pushed to origin
- Sandboxes accumulate in `/tmp/claudomator-sandbox-*` until disk is full
- BLOCKED task resume always clones fresh → loses agent's partial work from first run
- Error messages omit sandbox path, making debugging harder

**Fix:**

```go
// In the resume path (inside `if projectDir != ""`):
var err error
sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger)  // = not :=

// In the non-resume path (inside `else if projectDir != ""`):
var err error
sandboxDir, err = setupSandbox(t.Agent.ProjectDir, r.Logger)  // = not :=
```

Both sites already declare `var err error` on the line above. Simply drop the `:` from `:=`.

### 2. Hardcoded `master` branch in sandbox teardown

**File:** `internal/executor/claude.go`, `teardownSandbox()`, line ~377

```go
git pull --rebase origin master   // fails on repos using "main"
```

**Fix:** Detect the remote default branch before pushing:

```go
branch, _ := exec.Command("git", "-C", sandboxDir, "rev-parse", "--abbrev-ref", "HEAD").Output()
defaultBranch := strings.TrimSpace(string(branch))
if defaultBranch == "" || defaultBranch == "HEAD" {
    defaultBranch = "main"
}
// then use defaultBranch instead of "master"
```

---

## High Priority — Stability & Correctness

### 3. GeminiRunner is a non-functional stub in production code

**File:** `internal/executor/gemini.go`

`execOnce()` spawns a goroutine writing hardcoded JSON to a pipe instead of running the `gemini` binary. Any task with `agent.type: "gemini"` silently returns fake output.

**Options (choose one):**
- **Implement properly:** Mirror ClaudeRunner structure — subprocess + process group + stream parsing + sandbox support. GeminiRunner must also implement `LogPather`.
- **Remove until ready:** Delete `gemini.go`, remove `GeminiRunner` from `NewPool` in `cli/serve.go`, and gate the `Classifier` on Claude-only operation. Add `// TODO: GeminiRunner not implemented` clearly.

The stub in production is a correctness hazard. Removing it is safer than leaving it in place.

### 4. Atomic execution creation + state update

**Files:** `internal/executor/executor.go` `execute()`, `internal/storage/db.go`

`CreateExecution` and `UpdateTaskState(RUNNING)` are two separate DB writes with no transaction. A crash between them leaves the task PENDING with an orphaned RUNNING execution record. `RecoverStaleRunning` handles the RUNNING-task case but not the PENDING-task-with-running-execution case.

**Fix:** Add `storage.DB.CreateExecutionAndSetRunning(exec, taskID)` that wraps both writes in a single transaction.

### 5. `context.Background()` in resume submission

**File:** `internal/api/server.go`, `handleAnswerQuestion()`

```go
p.SubmitResume(context.Background(), task, exec)
```

This detaches the resume from any server lifecycle context. If the server has a root context with a shutdown deadline, resumes submitted this way will not be cancelled.

**Fix:** Pass the server's root context (or a derived context) instead of `Background()`.

### 6. Pool `dispatch()` goroutine leaks on server shutdown

**File:** `internal/executor/executor.go`

`NewPool` starts `go p.dispatch()` but `workCh` is never closed. On server shutdown, `dispatch()` blocks forever on `<-p.workCh`. The pool has no `Shutdown()` method.

**Fix:** Add `Pool.Shutdown(ctx context.Context)` that closes `workCh` and waits for active workers to drain.

---

## Medium Priority — Architectural Coherence

### 7. Priority field is stored but never dispatched

**Files:** `internal/task/task.go`, `internal/executor/executor.go`

Priority is parsed, stored in SQLite, returned via API — but `dispatch()` uses a single FIFO channel, ignoring priority entirely.

**Options:**
- Use a priority queue (e.g., `container/heap`) keyed by `Priority` for `workCh`
- Or remove `Priority` from `Task` and documentation until implemented — don't ship half-features

### 8. RetryConfig is stored but retry is manual

**Files:** `internal/task/task.go`, `internal/executor/executor.go`

`RetryConfig.MaxAttempts` and `Backoff` are stored but never read. Every "retry" is a manual user action via `/resume`.

**Options:**
- Implement automatic retry in `handleRunResult` when `MaxAttempts > 1` and `currentAttempt < MaxAttempts`
- Or remove `RetryConfig` from the schema/API until implemented

### 9. Changestats extracted in two places

**Files:** `internal/executor/executor.go` `handleRunResult()`, `internal/api/server.go` `processResult()`

Both call `task.ParseChangestatFromFile()` on the same file and write to the same row. The second write is idempotent but the duplication is confusing.

**Fix:** Remove the extraction from `api.Server.processResult()`. The pool already handles it reliably before broadcasting the result.

### 10. `withFailureHistory` shallow copy is unsafe

**File:** `internal/executor/executor.go`

```go
copy := *t      // copies struct
copy.Agent = t.Agent  // copies AgentConfig struct, slices share backing array
```

If any downstream code appends to `copy.Agent.AllowedTools`, it will mutate the original. The current code only writes to `SystemPromptAppend` (a string), so this is safe today but fragile.

**Fix:** Deep-copy slices in AgentConfig:
```go
agent := t.Agent
agent.AllowedTools = append([]string(nil), t.Agent.AllowedTools...)
agent.DisallowedTools = append([]string(nil), t.Agent.DisallowedTools...)
agent.ContextFiles = append([]string(nil), t.Agent.ContextFiles...)
agent.AdditionalArgs = append([]string(nil), t.Agent.AdditionalArgs...)
copy.Agent = agent
```

### 11. Additive migration strategy has no version tracking

**File:** `internal/storage/db.go` `migrate()`

Migrations are applied in code order; the only idempotency guard is ignoring "column already exists" errors. There is no way to detect if a migration was partially applied, rolled back incorrectly, or was applied out of order by a different code version.

**Fix (minimal):** Use a `schema_migrations` table with a version counter. Apply only migrations with version > current max. This prevents re-running migrations and makes the migration history explicit.

### 12. Server test-seam fields in production struct

**File:** `internal/api/server.go`

```go
elaborateCmdPath string   // overrides claudeBinPath in tests
validateCmdPath  string   // overrides claudeBinPath in tests
```

These exist solely to inject test paths. They are an internal testing seam leaked into the production struct.

**Fix:** Extract an `Elaborator` interface (e.g., `Elaborate(ctx, request, workDir string) (string, error)`) and inject it. Tests inject a fake; production builds inject the real Claude-backed implementation.

---

## Lower Priority — Simplicity & Cleanliness

### 13. Classifier output parsing is brittle string manipulation

**File:** `internal/executor/classifier.go`

The classifier strips `"Loaded cached credentials."` by line scanning, strips markdown fences by `strings.TrimPrefix/TrimSuffix`, and falls back through multiple parsing paths. This will silently break if Gemini CLI output format changes.

**Fix:** Request `--output-format json` and parse the structured response. Treat any parse failure as a classification error (fall back to default model), and log the raw output for debugging.

### 14. `gitSafe` is inconsistently applied in teardownSandbox

**File:** `internal/executor/claude.go`

`setupSandbox` wraps all git commands with `gitSafe(...)` (adds `-c safe.directory=*`). `teardownSandbox` calls `git` directly without `gitSafe` on most commands. This means teardown can fail with "dubious ownership" errors in mixed-owner environments.

**Fix:** Use `gitSafe` consistently in `teardownSandbox`.

### 15. `tailFile` loads entire file into memory

**File:** `internal/executor/claude.go`

```go
func tailFile(path string, n int) string {
    // scans all lines, keeps last n
}
```

For large log files this reads everything before discarding early lines. Fine in practice (stderr is typically small) but fragile if anything large is written to stderr.

**Fix:** Use a circular buffer of n lines to avoid O(filesize) memory.

### 16. `execute()` and `executeResume()` duplicate context/cancel/defer setup

**File:** `internal/executor/executor.go`

Both methods contain identical blocks for: context timeout + cancel setup, `activePerAgent` increment/decrement, `doneCh` signal, cancel registration/deregistration. This is ~40 lines duplicated.

**Fix:** Extract `withWorkerSlot(ctx, t, fn func(ctx context.Context) error) error` that handles the common setup and calls `fn`. Both paths become 5–10 lines.

### 17. `QuestionRegistry` and `QuestionHandler` are unused

**Files:** `internal/executor/question.go`, `internal/executor/executor.go`

`QuestionRegistry` is created in `NewPool` and stored on `Pool.Questions` but nothing reads from it. `QuestionHandler` interface exists but is never instantiated or wired to anything. The intended design (pool detects `tool_use` and invokes a handler) is not implemented.

**Fix:** Remove `QuestionRegistry`, `QuestionHandler`, and `Pool.Questions` until the feature is actually needed. The current question flow (agent writes file, pool reads BlockedError) already works without them.

### 18. Script registry endpoint is undocumented and empty

**File:** `internal/api/scripts.go`

The `ScriptRegistry` type and `POST /api/scripts/{name}` endpoint are defined but the registry is never populated from config. There are no example scripts. The endpoint exists in production code but can't be used.

**Fix:** Either populate scripts from config (`[[scripts]]` entries in config.toml) with documentation, or remove the endpoint until it's ready.

### 19. `parseStream` ignores scanner errors

**File:** `internal/executor/claude.go`

`parseStream()` loops over `scanner.Scan()` but never checks `scanner.Err()` after the loop. If the scanner hits a read error mid-stream, it silently stops. The function returns `nil` error, making the task appear to complete successfully when it may have lost output.

**Fix:**
```go
if err := scanner.Err(); err != nil {
    if streamErr == nil {
        streamErr = fmt.Errorf("reading claude stdout: %w", err)
    }
}
```

### 20. Log files have no size limit or rotation

**Files:** `internal/executor/claude.go`, `internal/storage/db.go`

Execution logs in `~/.claudomator/executions/` have no size limit, rotation, or TTL. Long-running or repeatedly-retried tasks can produce arbitrarily large log files. Old executions are never cleaned up.

**Fix (minimal):** Add a `CleanupOldExecutions(olderThan time.Duration)` method to `storage.DB` and call it on server startup (or on a ticker). Delete execution log directories for executions older than the TTL.

---

## Summary by Effort

| # | Issue | Effort | Impact |
|---|-------|--------|--------|
| 1 | `sandboxDir` shadowing | XS (2 chars: `=` not `:=`) | Critical — root cause of git clone failures |
| 2 | Hardcoded `master` branch | S | High — breaks all `main`-branch repos |
| 3 | GeminiRunner stub | M–L | High — silent wrong output in production |
| 4 | Atomic exec creation | S | Medium — crash-recovery edge case |
| 5 | context.Background() resume | XS | Low — shutdown correctness |
| 6 | Pool shutdown | S | Medium — goroutine leak on shutdown |
| 7 | Priority dispatch | M | Medium — misleading API |
| 8 | RetryConfig unused | S | Low — misleading API |
| 9 | Changestats duplication | XS | Low — confusing code |
| 10 | Shallow copy in withFailureHistory | S | Low — latent bug |
| 11 | Migration versioning | M | Medium — operational risk |
| 12 | Test seams in Server struct | M | Low — architecture cleanliness |
| 13 | Brittle classifier parsing | S | Medium — silent classification failures |
| 14 | gitSafe inconsistency | XS | Medium — owner errors in teardown |
| 15 | tailFile memory | XS | Low — theoretical |
| 16 | execute/executeResume duplication | M | Low — maintainability |
| 17 | QuestionRegistry dead code | XS | Low — confusion |
| 18 | Script registry empty | S | Low — dead endpoint |
| 19 | parseStream ignores scanner.Err | XS | Medium — silent data loss |
| 20 | Log file cleanup | S | Medium — disk usage |