Diffstat (limited to 'docs/adr')
| -rw-r--r-- | docs/adr/005-sandbox-execution-model.md | 25 |
| -rw-r--r-- | docs/adr/006-containerized-execution.md | 51 |
| -rw-r--r-- | docs/adr/007-planning-layer-and-story-model.md | 265 |
3 files changed, 333 insertions, 8 deletions
diff --git a/docs/adr/005-sandbox-execution-model.md b/docs/adr/005-sandbox-execution-model.md
index b374561..0c9ef14 100644
--- a/docs/adr/005-sandbox-execution-model.md
+++ b/docs/adr/005-sandbox-execution-model.md
@@ -1,7 +1,7 @@
 # ADR-005: Git Sandbox Execution Model
 
 ## Status
-Accepted
+Superseded by [ADR-006](006-containerized-execution.md)
 
 ## Context
 
@@ -69,9 +69,13 @@ state), the sandbox is **not** torn down. The preserved sandbox allows the
 resumed execution to pick up the same working tree state, including any
 in-progress file changes made before the agent asked its question.
 
-Resume executions (`SubmitResume`) skip sandbox setup entirely and run
-directly in `project_dir`, passing `--resume <session-id>` to the agent
-so Claude can continue its previous conversation.
+**Known Risk: Resume skips sandbox.** Current implementation of
+Resume executions (`SubmitResume`) skips sandbox setup entirely and runs
+directly in `project_dir`. This is a significant behavioral divergence: if a
+resumed task makes further changes, they land directly in the canonical working
+copy, reintroducing the concurrent corruption and partial-work leak risks
+identified in the Context section. A future iteration should ensure resumed
+tasks pick up the preserved sandbox instead.
 
 ### Session ID propagation on resume
 
@@ -113,10 +117,15 @@ The fix is in `ClaudeRunner.Run`: if `e.ResumeSessionID != ""`, use it as
   directory the server process inherited.
 - If a sandbox's push repeatedly fails (e.g. due to a bare repo that is
   itself broken), the task is failed with the sandbox preserved.
-- If `/tmp` runs out of space (many large sandboxes), tasks will fail at
-  clone time. This is a known operational risk with no current mitigation.
-- The `project_dir` field in task YAML must point to a git repository with
-  a configured `"local"` or `"origin"` remote that accepts pushes.
+- **If `/tmp` runs out of space** (many large sandboxes), tasks will fail at
+  clone time. This is a known operational risk. Mitigations such as periodic
+  cleanup of old sandboxes (cron) or pre-clone disk space checks are required
+  as follow-up items.
+- **The `project_dir` field in task YAML** must point to a git repository with
+  a configured `"local"` or `"origin"` remote that accepts pushes. If neither
+  remote exists or the push is rejected for other reasons, the task will be
+  marked as `FAILED` and the sandbox will be preserved for manual recovery.
+
 
 ## Relevant Code Locations
 
diff --git a/docs/adr/006-containerized-execution.md b/docs/adr/006-containerized-execution.md
new file mode 100644
index 0000000..cdd1cc2
--- /dev/null
+++ b/docs/adr/006-containerized-execution.md
@@ -0,0 +1,51 @@
+# ADR-006: Containerized Repository-Based Execution Model
+
+## Status
+Accepted (Supersedes ADR-005)
+
+## Context
+ADR-005 introduced a sandbox execution model based on local git clones and pushes back to a local project directory. While this provided isolation, it had several flaws identified during early adoption:
+1. **Host pollution**: Build dependencies (Node, Go, etc.) had to be installed on the host and were subject to permission issues (e.g., `/root/.nvm` access for `www-data`).
+2. **Fragile Pushes**: Pushing to a checked-out local branch is non-standard and requires risky git configs.
+3. **Resume Divergence**: Resumed tasks bypassed the sandbox, reintroducing corruption risks.
+4. **Scale**: Local directory-based "project selection" is a hack that doesn't scale to multiple repos or environments.
+
+## Decision
+We will move to a containerized execution model where projects are defined by canonical repository URLs and executed in isolated containers.
+
+### 1. Repository-Based Projects
+- The `Task` model now uses `RepositoryURL` as the source of truth for the codebase.
+- This replaces the fragile reliance on local `ProjectDir` paths.
+
+### 2. Containerized Sandboxes
+- Each task execution runs in a fresh container (Docker/Podman).
+- The runner clones the repository into a host-side temporary workspace and mounts it into the container.
+- The container provides a "bare system" with the full build stack (Node, Go, etc.) pre-installed, isolating the host from build dependencies.
+
+### 3. Unified Workspace Management (including RESUME)
+- Unlike ADR-005, the containerized model is designed to handle **Resume** by re-attaching to or re-mounting the same host-side workspace.
+- This ensures that resumed tasks **do not** bypass the sandbox and never land directly in a production directory.
+
+### 4. Push to Actual Remotes
+- Agents commit changes within the sandbox.
+- The runner pushes these commits directly to the `RepositoryURL` (actual remote).
+- If the remote is missing or the push fails, the task is marked `FAILED` and the host-side workspace is preserved for inspection.
+
+## Rationale
+- **Isolation**: Containers prevent host pollution and ensure a consistent build environment.
+- **Safety**: Repository URLs provide a standard way to manage codebases across environments.
+- **Consistency**: Unified workspace management for initial runs and resumes eliminates the behavioral divergence found in ADR-005.
+
+## Consequences
+- Requires a container runtime (Docker) on the host.
+- Requires pre-built agent images (e.g., `claudomator-agent:latest`).
+- **Disk Space Risk**: Host-side clones still consume `/tmp` space. Mitigation requires periodic cleanup of old workspaces or disk-space monitoring.
+- **Git Config**: Repositories no longer require `receive.denyCurrentBranch = updateInstead` because we push to the remote, not a local worktree.
+
+## Relevant Code Locations
+| Concern | File |
+|---|---|
+| Container Lifecycle | `internal/executor/container.go` |
+| Runner Registration | `internal/cli/serve.go` |
+| Task Model | `internal/task/task.go` |
+| API Integration | `internal/api/server.go` |
diff --git a/docs/adr/007-planning-layer-and-story-model.md b/docs/adr/007-planning-layer-and-story-model.md
new file mode 100644
index 0000000..7efb66d
--- /dev/null
+++ b/docs/adr/007-planning-layer-and-story-model.md
@@ -0,0 +1,265 @@
+# ADR-007: Planning Layer, Task Hierarchy, and Story-Gated Deployment
+
+**Status:** Draft
+**Date:** 2026-03-19
+**Context:** Design discussion exploring the integration of Claudomator with Doot and a richer task hierarchy model.
+
+---
+
+## Context
+
+Claudomator currently operates as a flat queue of tasks, each with optional subtasks (`parent_task_id`). There is no concept of grouping tasks into shippable units, no deploy automation, and no integration with personal planning tools. Separately, Doot is a personal dashboard that aggregates tasks, meals, calendar events, and bugs from third-party services (Todoist, Trello, PlanToEat, Google Calendar) into a unified `Atom` model.
+
+The goal of this ADR is to capture a design direction that:
+
+1. Integrates Claudomator into Doot as a first-class data source
+2. Introduces a four-level task hierarchy (Epic → Story → Task → Subtask)
+3. Defines a branching and execution model for stories
+4. Establishes stories as the unit that gates deployment
+
+---
+
+## Decision
+
+### 1. Claudomator as an Atom Source in Doot
+
+Doot already normalizes heterogeneous data sources into a unified `Atom` model (see `internal/models/atom.go`). Claudomator tasks are a natural peer to Todoist and Trello — they are their own source of truth (SQLite, full execution history) and should be surfaced in Doot's aggregation views without duplication elsewhere.
+
+**Design:**
+- Add `SourceClaudomator AtomSource = "claudomator"` to Doot's atom model
+- Implement a Claudomator API client in `internal/api/claudomator.go` (analogous to `todoist.go`, `trello.go`)
+- Map Claudomator tasks to `Atom` with appropriate priority, status, and source icon
+- Individual subtasks are **not** surfaced in the Doot timeline — they are execution-level details, not planning-level items
+
+**Rationale:** Claudomator is a peer to other task sources, not subordinate to them. Users should not need a Todoist card to track agent work — Claudomator is the source of truth for that domain.
+
+---
+
+### 2. Four-Level Task Hierarchy
+
+The current flat model (task + optional subtask) is insufficient for feature-scale work. The following hierarchy is adopted:
+
+| Level | Name | Description |
+|---|---|---|
+| 4 | **Epic** | Large design initiative requiring back-and-forth, resulting in a set of stories. Lives primarily in the planning layer (Doot). Not an execution unit. |
+| 3 | **Story** | A shippable slice of work. Independent and deployable on its own. Groups tasks that together constitute a releasable change. The unit that gates deployment. |
+| 2 | **Task** | A feature- or bug-level unit of work. Individually buildable, but may not make sense to ship alone. Belongs to a story. |
+| 1 | **Subtask** | A discrete, ordered agent action. The actual Claudomator execution unit. Belongs to a task. Performed in sequence. |
+
+**Key properties:**
+- Stories are independently shippable — deployment is gated at this level
+- Tasks are individually buildable but do not gate deployment alone
+- Subtasks are the agent execution primitive — what `ContainerRunner` actually runs
+- Epics are planning artifacts; they live in Doot or a future planning layer, not in Claudomator's execution model
+- Scheduling prefers picking up subtasks from **already-started stories** before beginning new ones (WIP limiting)
+
+**Claudomator data model changes required:**
+- Add `stories` table with deploy configuration and status
+- Add `story_id` to tasks (foreign key to stories)
+- `repository_url` moves from individual tasks to stories (all tasks in a story operate on the same repo)
+- Story status is derived: all tasks completed → story is shippable
+
+---
+
+### 3. Story-Level Branching Model
+
+Each story has a dedicated Git branch. Subtasks execute sequentially, each cloning the repository at the story branch's current HEAD, making commits, and pushing back before the next subtask begins.
+
+**Model:** One branch per story. Fresh clone + container per subtask. Subtasks commit to the story branch in sequence.
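
The per-subtask cycle in the model above (fresh clone of the story branch at HEAD, commit, push back) could be sketched roughly as follows. This is a minimal Python illustration, not the `ContainerRunner` implementation: the helper name `subtask_git_plan`, the commit message format, and the example paths are hypothetical, and only the git flags themselves are standard.

```python
from typing import List


def subtask_git_plan(remote_url: str, story_branch: str,
                     workdir: str, subtask_id: str) -> List[List[str]]:
    """Return the git commands for one subtask run (hypothetical helper)."""
    return [
        # Fresh clone of the story branch at its current HEAD, so this
        # subtask sees every commit made by earlier subtasks.
        ["git", "clone", "--branch", story_branch, "--single-branch",
         remote_url, workdir],
        # The agent edits files inside the container between these steps.
        ["git", "-C", workdir, "add", "--all"],
        # Commit message format is an assumption for illustration.
        ["git", "-C", workdir, "commit", "-m", f"subtask {subtask_id}"],
        # Push back to the story branch before the next subtask starts.
        ["git", "-C", workdir, "push", "origin",
         f"HEAD:refs/heads/{story_branch}"],
    ]


if __name__ == "__main__":
    for cmd in subtask_git_plan("git@example.com:demo/repo.git",
                                "story/demo", "/tmp/ws-1", "sub-1"):
        print(" ".join(cmd))
```

Returning the commands as argument lists keeps the sketch free of side effects; a real runner would execute them and treat a failed push as a failed subtask.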
+
+**Properties:**
+
+- **Each subtask sees all prior subtask work** — it clones the story branch at HEAD, which includes all previous subtask commits
+- **Clean environment per subtask** — no filesystem state leaks between subtasks; the container is ephemeral
+- **Ordered execution enforced** — subtasks run strictly in order; each depends on the previous commit
+- **Reviewable history** — the story branch accumulates one commit per subtask, giving a clean, auditable record before merge
+- **Clear recovery points** — if subtask N fails, roll back to subtask N-1's commit, fix the subtask definition, rerun
+- **Resilient to transient API failures** — transient rate-limit errors (429) do not fail the story or subtask; the executor requeues the task and "pauses" the story until the agent is unblocked, preserving the sequential chain.
+
+**Tradeoffs accepted:**
+- Clone and container creation cost is paid per subtask (not amortized across the story). Acceptable at current usage scale.
+- No parallelism within a story — subtasks are strictly sequential by design
+- Concurrency lock required at the story level to prevent two subtasks running simultaneously (e.g., on retry races)
+
+**Rejected alternatives:**
+
+*Isolated commit model (fresh clone per subtask, independent branches):* Clean but subtasks cannot build on each other's work. Requires careful branch ordering and merging to assemble a story.
+
+*Persistent workspace per story (one container, one clone for the life of the story):* More efficient, natural continuity, but a bad subtask can corrupt the workspace for subsequent subtasks. Recovery is harder. Loses the discipline of enforced commit points.
+
+### Sequential Subtask Execution
+
+Subtasks within a story execute sequentially. This is enforced via `depends_on` links set automatically at task creation time — each subtask added to a story gets `depends_on: [previous_subtask_id]`, forming a linear chain. The existing pool dependency mechanism handles the rest.
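
The auto-wired `depends_on` chain might look like the following minimal Python sketch. `StoryQueue`, `Subtask`, and `runnable` are hypothetical in-memory stand-ins for task creation and the pool's dependency check; the only behaviour taken from the ADR is that each new subtask depends on the previous subtask of the same story.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Subtask:
    id: str
    story_id: str
    depends_on: List[str] = field(default_factory=list)


class StoryQueue:
    """Hypothetical stand-in for task creation; not Claudomator's API."""

    def __init__(self) -> None:
        self._last_by_story: Dict[str, str] = {}
        self.subtasks: List[Subtask] = []

    def add_subtask(self, subtask_id: str, story_id: str) -> Subtask:
        # Wire the new subtask to the previous one in the same story,
        # forming a linear chain; the first subtask has no dependency.
        prev = self._last_by_story.get(story_id)
        subtask = Subtask(subtask_id, story_id, [prev] if prev else [])
        self._last_by_story[story_id] = subtask_id
        self.subtasks.append(subtask)
        return subtask


def runnable(subtasks: List[Subtask], completed: Set[str]) -> List[str]:
    """IDs whose dependencies are all completed (simplified pool check)."""
    return [s.id for s in subtasks
            if s.id not in completed
            and all(dep in completed for dep in s.depends_on)]
```

Because the chain lives in the data rather than in an executor lock, at most one subtask per story is ever runnable, while subtasks from different stories can still be picked up independently.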
+
+**Rejected alternative — pool-level story concurrency lock:** Would require the executor to become story-aware, lock state would be in-memory (fragile across restarts), and the ordering would be invisible in the data model. The `depends_on` approach is durable, inspectable, and reuses existing infrastructure. The 5-second polling delay between subtasks is an accepted tradeoff.
+
+---
+
+### 4. Story-Gated Deployment and Agent Validation
+
+Deployment is triggered at the story level, not the task or subtask level.
+
+#### State Machine
+
+```
+PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY
+                                                          ↘ NEEDS_FIX → IN_PROGRESS (retry)
+```
+
+- **SHIPPABLE:** All tasks completed. Ready to merge and deploy.
+- **DEPLOYED:** Merged to main, deploy triggered.
+- **VALIDATING:** Validation agent is running.
+- **REVIEW_READY:** Validation passed. Awaiting human sign-off.
+- **NEEDS_FIX:** Validation failed. Story returns to `IN_PROGRESS` with the validation report attached.
+
+#### Merge Strategy
+
+Merge to main first, then validate against the live deployment. No branch review phase — tests are the confidence mechanism. If test coverage is insufficient for a given story, the implementor is responsible for adding tests before marking it shippable. Branch review may be introduced later if needed.
+
+#### Deploy Configuration
+
+Stored on the story. Two project types are handled:
+
+| Project | Deploy trigger | What "deployed" means |
+|---|---|---|
+| claudomator | `git push` to local bare repo → systemd pulls and restarts | Live at `doot.terst.org` |
+| nav (Android) | `git push` to GitHub → CI build action fires | APK distributed to testers via Play Store testing track |
+
+For nav, Claudomator does not interact with GitHub CI directly — it pushes the branch/commits; the CI action is an external trigger. "Deployed" is declared once the push succeeds; the CI result is not polled.
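
The story state machine given earlier in this section can be written as a transition table. The Python sketch below is illustrative only: `StoryStatus`, `ALLOWED`, `transition`, and `after_validation` are hypothetical names, and treating `REVIEW_READY` as having no outgoing transitions is an assumption, since the ADR does not say what follows human sign-off.

```python
from enum import Enum


class StoryStatus(Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    SHIPPABLE = "SHIPPABLE"
    DEPLOYED = "DEPLOYED"
    VALIDATING = "VALIDATING"
    REVIEW_READY = "REVIEW_READY"
    NEEDS_FIX = "NEEDS_FIX"


# Allowed transitions, read directly off the state diagram.
ALLOWED = {
    StoryStatus.PENDING: {StoryStatus.IN_PROGRESS},
    StoryStatus.IN_PROGRESS: {StoryStatus.SHIPPABLE},
    StoryStatus.SHIPPABLE: {StoryStatus.DEPLOYED},
    StoryStatus.DEPLOYED: {StoryStatus.VALIDATING},
    StoryStatus.VALIDATING: {StoryStatus.REVIEW_READY, StoryStatus.NEEDS_FIX},
    StoryStatus.NEEDS_FIX: {StoryStatus.IN_PROGRESS},
    # Assumption: nothing is modeled after human sign-off.
    StoryStatus.REVIEW_READY: set(),
}


def transition(current: StoryStatus, target: StoryStatus) -> StoryStatus:
    """Return the new status, or raise if the move is not in the diagram."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def after_validation(passed: bool) -> StoryStatus:
    """Map the validation verdict onto the next story status."""
    target = StoryStatus.REVIEW_READY if passed else StoryStatus.NEEDS_FIX
    return transition(StoryStatus.VALIDATING, target)
```

Keeping the table as data makes it straightforward to reject out-of-order updates, for example a story jumping from `IN_PROGRESS` straight to `DEPLOYED`.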
+
+#### Agent Validation
+
+After a story is deployed, a validation subtask is automatically created. The elaborator is responsible for specifying how validation should be performed — it has full context of what changed and can prescribe the appropriate check level.
+
+**Validation spec** (produced by elaborator, stored on the story):
+
+```yaml
+validation:
+  type: curl  # curl | tests | playwright | gradle
+  steps:
+    - "GET /api/stats — expect 200, body contains throughput[]"
+    - "GET /api/agents/status — expect agents array non-empty"
+  success_criteria: "All steps return expected responses with correct structure"
+```
+
+**Validation types by project:**
+
+| Type | When to use | What the agent does |
+|---|---|---|
+| `curl` | API changes, data model additions, simple UI text | HTTP requests, check status codes and response shape |
+| `tests` | Logic changes with existing test coverage | Runs the project test suite against the live deployment or codebase |
+| `playwright` | Subtle UI changes, interactive flows, visual correctness | Browser automation against the deployed URL |
+| `gradle` | nav (Android) — any change | `./gradlew test`, `./gradlew lint`; optionally `./gradlew assembleDebug` |
+
+The elaborator selects `type` based on change scope. Curl is the default for small targeted changes; playwright is reserved for changes where visual or interactive correctness cannot be inferred from API responses alone.
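
A structural check on the `validation` block could be sketched as below, operating on an already-parsed mapping so that no YAML library is needed. `ValidationSpec` and `parse_validation` are hypothetical names; the four allowed `type` values and the three fields come from the spec above, while requiring at least one step is an assumption.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping

# The four validation types listed in the table above.
ALLOWED_TYPES = ("curl", "tests", "playwright", "gradle")


@dataclass(frozen=True)
class ValidationSpec:
    type: str
    steps: List[str]
    success_criteria: str


def parse_validation(block: Mapping[str, Any]) -> ValidationSpec:
    """Check a parsed `validation` block and return a typed spec."""
    vtype = block.get("type")
    if vtype not in ALLOWED_TYPES:
        raise ValueError(f"unknown validation type: {vtype!r}")
    steps = [str(step) for step in block.get("steps", [])]
    if not steps:
        # Assumption: a spec with no steps is treated as invalid.
        raise ValueError("validation needs at least one step")
    return ValidationSpec(vtype, steps, str(block.get("success_criteria", "")))
```

Rejecting unknown types at story creation time would surface elaborator mistakes before a validation subtask is ever queued.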
+
+**Validation agent inputs:**
+- The validation spec (type, steps, success_criteria)
+- Deployed URL or project path
+- Summary of what changed (story name + task list)
+
+**Validation agent outputs:**
+- Structured pass/fail per step
+- Evidence (response bodies, test output excerpts, screenshots for playwright)
+- Overall verdict: pass → story moves to `REVIEW_READY`; fail → story moves to `NEEDS_FIX` with report attached
+
+Validation subtasks are governed by the same pool-level rate-limit resilience; a 429 during validation will requeue the subtask rather than failing the story.
+
+#### Failure Recovery
+
+If a subtask fails mid-story: pause the story and require human review before resuming. The options at that point are:
+- Roll back to the previous subtask's commit and retry
+- Amend the subtask definition and requeue
+
+Policy beyond this is deferred until failure patterns are observed in practice.
+
+---
+
+## Consequences
+
+**Claudomator changes:**
+- New `stories` table: `id, name, branch_name, project_id, deploy_config, validation_json, status`
+- New `projects` table: `id, name, remote_url, local_path, type, deploy_script`
+- `tasks.story_id` FK; `repository_url` removed from tasks (inherited from story → project)
+- Sequential subtask ordering via auto-wired `depends_on` at task creation time
+- Post-task-completion check: all story tasks COMPLETED → story transitions to SHIPPABLE → merge + deploy trigger
+- Post-deploy: auto-create validation subtask from story's `validation_json` spec
+- Validation subtask completes → story transitions to REVIEW_READY or NEEDS_FIX
+- Story state machine: PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY | NEEDS_FIX
+- `ContainerRunner`: clone at story branch HEAD; push back to story branch after each subtask
+- Deployment status check moves from task level to story level
+- Elaborator output extended: `validation` block (type, steps, success_criteria) stored as `validation_json` on story
+- Remove `Agent.RepositoryURL`, `Agent.ProjectDir` legacy fields, `skip_planning`, `fallbackGitInit()`
+- Remove duplicate changestats extraction (keep pool-side, remove API server-side)
+- Pool-level "requeue-and-skip" logic for rate-limited agents: tasks return to `QUEUED` and release worker slots if all candidate agents are blocked, allowing the system to "wait out" 429 errors without failing stories.
+- Background "Recovery Scheduler" goroutine: periodically (every 30m or as hinted by API) runs minimal "test tasks" to verify agent availability and unblock the pool.
+
+**Doot changes:**
+- New `SourceClaudomator` atom source
+- Claudomator API client (`internal/api/claudomator.go`)
+- Story → Atom mapper (title = story name, description = task progress e.g. "3/5 tasks done", priority from story config, deploy status)
+- Task → Atom mapper (optional, feature-level visibility)
+- Individual subtasks explicitly excluded from all views
+
+**Doot removals (dead code / superseded):**
+- `bugs` table, `BugToAtom`, `SourceBug`, `TypeBug` — bug reporting becomes a thin UI shim that submits to Claudomator; nothing stored in Doot's data model
+- `notes` table and all Obsidian skeleton code — never wired up
+- `AddMealToPlanner()` stub — never called
+- `UpdateCard()` stub — never called
+- All bug handlers, templates, and store methods
+
+**Planning layer (future):**
+- Epics live here, not in Claudomator
+- Story creation via elaboration + validation flow (see below)
+- WIP-limiting scheduler that prefers subtasks from started stories
+
+### Story Creation: Elaboration and Validation Flow
+
+Story creation is driven by a beefed-up version of Claudomator's existing elaboration and validation pipeline, not a YAML file or form.
+
+**Flow:**
+1. User describes the story goal (rough, high-level) in the UI
+2. Elaboration agent runs against a **local working copy** of the project (read-only mount, no clone) — reads the codebase, understands current state, produces story + task + subtask breakdown with `depends_on` chain wired
+3. Validation agent checks the structure: tasks are independently buildable, subtasks properly scoped, story has a clear shippable definition, no dependency cycles
+4. User reviews and approves in the UI
+5. On approval: story branch created (`git checkout -b story/xxx origin/main`, pushed to remote); subtasks queued
+
+**Responsiveness:**
+- Elaboration uses a local working copy — no clone cost, near-instant container start
+- A `git fetch` (not pull) at elaboration start updates remote refs without touching the working tree
+- Branch creation is deferred to approval — elaboration agent is purely read-only
+- Execution clones use `git clone --reference /local/path <remote>` — reuses local object store, fetches only the delta; significantly faster than cold clone
+- Rate-limit aware — if the elaboration agent is blocked, the UI surfaces the status and resumes automatically once unblocked via the Recovery Scheduler.
+
+### Project Registry
+
+The local working copy model requires a formal project concept. A `projects` table replaces the current ad-hoc `repository_url` + `working_dir` fields:
+
+| Field | Purpose |
+|---|---|
+| `id` | UUID |
+| `name` | Human-readable label |
+| `remote_url` | Git remote (clone target for execution) |
+| `local_path` | Local working copy path (read cache for elaboration, object store for `--reference` clones) |
+| `type` | `web` \| `android` — controls available validation types and deploy semantics |
+| `deploy_script` | Optional path to project-specific deploy script |
+
+`repository_url` on stories becomes a FK to `projects`. The existing `project` string field on tasks (currently just a label) is replaced by `project_id`. `Agent.RepositoryURL`, `Agent.ProjectDir`, and `Task.RepositoryURL` are all removed — project is the single source of truth for repo location.
+
+**Initial registered projects:**
+
+| Name | Local path | Remote | Type |
+|---|---|---|---|
+| claudomator | `/workspace/claudomator` | local bare repo | web |
+| nav | `/workspace/nav` | GitHub | android |
+
+---
+
+## Out of Scope (for now)
+
+- Voice interface (noted as a future exploration, not an architectural requirement)
+- Epic management tooling
+- Parallelism within stories
+- Branch review before merge — deferred; merge-first is the current strategy. May be revisited if confidence requires it.
+- Polling GitHub CI result for nav deploys — Claudomator declares "deployed" on push success; CI outcome is out of band
+- ADB / emulator-based UI validation for nav — `gradle` type covers unit and integration tests; device UI testing deferred
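
As a closing illustration of the Project Registry in ADR-007, the sketch below models a `projects` row and builds the `git clone --reference` command that execution clones are described as using. The `Project` dataclass mirrors the table's fields; `reference_clone_cmd`, the example remote URL, and the destination path are hypothetical, and the project itself may structure this differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Project:
    id: str
    name: str
    remote_url: str      # clone target for execution
    local_path: str      # local working copy, reused as an object store
    type: str            # "web" or "android"
    deploy_script: Optional[str] = None


def reference_clone_cmd(project: Project, branch: str, dest: str) -> List[str]:
    """Build a clone command that borrows objects from the local copy."""
    return [
        "git", "clone",
        # --reference lets git reuse objects already present locally,
        # so only the missing delta is fetched from the remote.
        "--reference", project.local_path,
        "--branch", branch,
        project.remote_url, dest,
    ]


if __name__ == "__main__":
    nav = Project(
        id="00000000-0000-0000-0000-000000000000",  # placeholder UUID
        name="nav",
        remote_url="git@example.com:demo/nav.git",  # hypothetical remote
        local_path="/workspace/nav",
        type="android",
    )
    print(" ".join(reference_clone_cmd(nav, "story/demo", "/tmp/ws-nav")))
```

One design point worth noting: a clone made with `--reference` depends on the referenced object store staying intact, so pruning the local working copy can affect workspaces that still borrow from it.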
