| Field | Value | Timestamp |
|---|---|---|
| author | Peter Stone <thepeterstone@gmail.com> | 2026-03-20 08:42:19 +0000 |
| committer | Peter Stone <thepeterstone@gmail.com> | 2026-03-20 08:42:19 +0000 |
| commit | b547a2f13a8416c09937f18afdfcfd8e80102f7c | |
| tree | 6aad0462e01045c861b92c7f38d732862f4546a0 /docs | |
| parent | 1e4649ebbeb1a51cc48f32c7195fe854847d1b10 | |
docs: update ADR-007 with validation pipeline and nav project
- Story state machine: SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY | NEEDS_FIX
- Merge-first strategy: no branch review phase, tests are the confidence mechanism
- Elaborator owns validation spec (type, steps, success_criteria)
- Validation types: curl | tests | playwright | gradle
- Nav project (Android): deploy = push to GitHub, validate = gradle test/lint
- Project registry: type + deploy_script fields, initial claudomator + nav entries
- Out of scope: branch review deferred, CI polling out of band for nav
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
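The story lifecycle in the first bullet (drawn in full in the ADR's State Machine section) amounts to a small transition table. The sketch below encodes it under stated assumptions: Go is chosen only because identifiers such as `Agent.RepositoryURL` and `fallbackGitInit()` look like Go, and `StoryState`, `allowed`, and `Transition` are hypothetical names, not Claudomator's actual API. It models only the edges the diagram shows and treats `REVIEW_READY` as terminal while it awaits human sign-off.

```go
package main

import "fmt"

// StoryState mirrors the story states named in ADR-007.
// The type and constant names are hypothetical, chosen for illustration.
type StoryState string

const (
	Pending     StoryState = "PENDING"
	InProgress  StoryState = "IN_PROGRESS"
	Shippable   StoryState = "SHIPPABLE"
	Deployed    StoryState = "DEPLOYED"
	Validating  StoryState = "VALIDATING"
	ReviewReady StoryState = "REVIEW_READY"
	NeedsFix    StoryState = "NEEDS_FIX"
)

// allowed lists the legal next states for each state in the ADR's diagram.
// ReviewReady has no entry: it waits for human sign-off.
var allowed = map[StoryState][]StoryState{
	Pending:    {InProgress},
	InProgress: {Shippable},
	Shippable:  {Deployed},
	Deployed:   {Validating},
	Validating: {ReviewReady, NeedsFix},
	NeedsFix:   {InProgress}, // retry with the validation report attached
}

// Transition returns `to` if from -> to is a declared edge; otherwise it
// returns `from` unchanged plus an error.
func Transition(from, to StoryState) (StoryState, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal story transition %s -> %s", from, to)
}

func main() {
	s := Pending
	// Walk one failed-validation cycle: ... -> VALIDATING -> NEEDS_FIX -> IN_PROGRESS.
	for _, next := range []StoryState{InProgress, Shippable, Deployed, Validating, NeedsFix, InProgress} {
		var err error
		if s, err = Transition(s, next); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(s)
	}
	// Skipping deploy and validation is rejected.
	if _, err := Transition(Shippable, ReviewReady); err != nil {
		fmt.Println(err)
	}
}
```

Centralizing the edges in one map is what lets an orchestrator reject shortcuts such as `SHIPPABLE → REVIEW_READY`, which would bypass both deploy and validation.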
Diffstat (limited to 'docs')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | docs/adr/007-planning-layer-and-story-model.md | 102 |

1 file changed, 90 insertions, 12 deletions
```diff
diff --git a/docs/adr/007-planning-layer-and-story-model.md b/docs/adr/007-planning-layer-and-story-model.md
index 8b0d16b..2ca2afd 100644
--- a/docs/adr/007-planning-layer-and-story-model.md
+++ b/docs/adr/007-planning-layer-and-story-model.md
@@ -94,31 +94,98 @@ Subtasks within a story execute sequentially. This is enforced via `depends_on`
 
 ---
 
-### 4. Story-Gated Deployment
+### 4. Story-Gated Deployment and Agent Validation
 
 Deployment is triggered at the story level, not the task or subtask level.
 
-**Trigger:** All tasks belonging to a story reach `COMPLETED` status → story transitions to `SHIPPABLE` → deploy is triggered.
+#### State Machine
 
-**Deploy configuration:** Stored on the story (e.g., target environment, deploy script/command, repo). This replaces per-task `repository_url` which moves up to the story level.
+```
+PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY
+                                                         ↘ NEEDS_FIX → IN_PROGRESS (retry)
+```
 
-**Failure recovery policy (TBD):** If a subtask fails mid-story, the options are:
-- Roll back to the previous subtask's commit and retry the failed subtask
-- Pause the story and require human review before resuming
+- **SHIPPABLE:** All tasks completed. Ready to merge and deploy.
+- **DEPLOYED:** Merged to main, deploy triggered.
+- **VALIDATING:** Validation agent is running.
+- **REVIEW_READY:** Validation passed. Awaiting human sign-off.
+- **NEEDS_FIX:** Validation failed. Story returns to `IN_PROGRESS` with the validation report attached.
 
-Policy is not decided here — this is flagged for a future decision once the branching model is implemented and failure patterns are observed.
+#### Merge Strategy
+
+Merge to main first, then validate against the live deployment. No branch review phase — tests are the confidence mechanism. If test coverage is insufficient for a given story, the implementor is responsible for adding tests before marking it shippable. Branch review may be introduced later if needed.
+
+#### Deploy Configuration
+
+Stored on the story. Two project types are handled:
+
+| Project | Deploy trigger | What "deployed" means |
+|---|---|---|
+| claudomator | `git push` to local bare repo → systemd pulls and restarts | Live at `doot.terst.org` |
+| nav (Android) | `git push` to GitHub → CI build action fires | APK distributed to testers via Play Store testing track |
+
+For nav, Claudomator does not interact with GitHub CI directly — it pushes the branch/commits; the CI action is an external trigger. "Deployed" is declared once the push succeeds; the CI result is not polled.
+
+#### Agent Validation
+
+After a story is deployed, a validation subtask is automatically created. The elaborator is responsible for specifying how validation should be performed — it has full context of what changed and can prescribe the appropriate check level.
+
+**Validation spec** (produced by elaborator, stored on the story):
+
+```yaml
+validation:
+  type: curl # curl | tests | playwright | gradle
+  steps:
+    - "GET /api/stats — expect 200, body contains throughput[]"
+    - "GET /api/agents/status — expect agents array non-empty"
+  success_criteria: "All steps return expected responses with correct structure"
+```
+
+**Validation types by project:**
+
+| Type | When to use | What the agent does |
+|---|---|---|
+| `curl` | API changes, data model additions, simple UI text | HTTP requests, check status codes and response shape |
+| `tests` | Logic changes with existing test coverage | Runs the project test suite against the live deployment or codebase |
+| `playwright` | Subtle UI changes, interactive flows, visual correctness | Browser automation against the deployed URL |
+| `gradle` | nav (Android) — any change | `./gradlew test`, `./gradlew lint`; optionally `./gradlew assembleDebug` |
+
+The elaborator selects `type` based on change scope. Curl is the default for small targeted changes; playwright is reserved for changes where visual or interactive correctness cannot be inferred from API responses alone.
+
+**Validation agent inputs:**
+- The validation spec (type, steps, success_criteria)
+- Deployed URL or project path
+- Summary of what changed (story name + task list)
+
+**Validation agent outputs:**
+- Structured pass/fail per step
+- Evidence (response bodies, test output excerpts, screenshots for playwright)
+- Overall verdict: pass → story moves to `REVIEW_READY`; fail → story moves to `NEEDS_FIX` with report attached
+
+#### Failure Recovery
+
+If a subtask fails mid-story: pause the story and require human review before resuming. The options at that point are:
+- Roll back to the previous subtask's commit and retry
+- Amend the subtask definition and requeue
+
+Policy beyond this is deferred until failure patterns are observed in practice.
 
 ---
 
 ## Consequences
 
 **Claudomator changes:**
-- New `stories` table: `id, name, branch_name, repository_url, deploy_config, status`
-- `tasks.story_id` FK; `repository_url` removed from tasks (inherited from story)
+- New `stories` table: `id, name, branch_name, project_id, deploy_config, validation_json, status`
+- New `projects` table: `id, name, remote_url, local_path, type, deploy_script`
+- `tasks.story_id` FK; `repository_url` removed from tasks (inherited from story → project)
 - Sequential subtask ordering via auto-wired `depends_on` at task creation time
-- Post-task-completion check: all story tasks COMPLETED → story transitions to SHIPPABLE → deploy trigger
+- Post-task-completion check: all story tasks COMPLETED → story transitions to SHIPPABLE → merge + deploy trigger
+- Post-deploy: auto-create validation subtask from story's `validation_json` spec
+- Validation subtask completes → story transitions to REVIEW_READY or NEEDS_FIX
+- Story state machine: PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY | NEEDS_FIX
 - `ContainerRunner`: clone at story branch HEAD; push back to story branch after each subtask
 - Deployment status check moves from task level to story level
+- Elaborator output extended: `validation` block (type, steps, success_criteria) stored as `validation_json` on story
 - Remove `Agent.RepositoryURL`, `Agent.ProjectDir` legacy fields, `skip_planning`, `fallbackGitInit()`
 - Remove duplicate changestats extraction (keep pool-side, remove API server-side)
 
@@ -168,14 +235,25 @@ The local working copy model requires a formal project concept. A `projects` tab
 | `name` | Human-readable label |
 | `remote_url` | Git remote (clone target for execution) |
 | `local_path` | Local working copy path (read cache for elaboration, object store for `--reference` clones) |
+| `type` | `web` \| `android` — controls available validation types and deploy semantics |
+| `deploy_script` | Optional path to project-specific deploy script |
 
 `repository_url` on stories becomes a FK to `projects`. The existing `project` string field on tasks (currently just a label) is replaced by `project_id`. `Agent.RepositoryURL`, `Agent.ProjectDir`, and `Task.RepositoryURL` are all removed — project is the single source of truth for repo location.
+
+**Initial registered projects:**
+
+| Name | Local path | Remote | Type |
+|---|---|---|---|
+| claudomator | `/workspace/claudomator` | local bare repo | web |
+| nav | `/workspace/nav` | GitHub | android |
 
 ---
 
-## Out of Scope
+## Out of Scope (for now)
 
 - Voice interface (noted as a future exploration, not an architectural requirement)
 - Epic management tooling
-- Failure recovery policy for mid-story subtask failures
 - Parallelism within stories
+- Branch review before merge — deferred; merge-first is the current strategy. May be revisited if confidence requires it.
+- Polling GitHub CI result for nav deploys — Claudomator declares "deployed" on push success; CI outcome is out of band
+- ADB / emulator-based UI validation for nav — `gradle` type covers unit and integration tests; device UI testing deferred
```
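The validation flow in this diff (the elaborator writes a `validation` block, it is stored as `validation_json`, the agent reports pass/fail per step, and the verdict picks the next story state) can be sketched as below. This is a hypothetical Go sketch, not Claudomator's code: `ValidationSpec`, `StepResult`, `ParseSpec`, and `Verdict` are illustrative names. It assumes `validation_json` holds JSON with the same keys as the ADR's YAML, and the mapping of `web` projects to `curl`/`tests`/`playwright` and `android` projects to `gradle` is inferred from the validation-types table rather than stated there as a rule. Treating an empty report as a failure is also an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ValidationSpec mirrors the elaborator's `validation` block as it might be
// stored in validation_json. Keys follow the ADR's YAML; the type is hypothetical.
type ValidationSpec struct {
	Type            string   `json:"type"`
	Steps           []string `json:"steps"`
	SuccessCriteria string   `json:"success_criteria"`
}

// StepResult is one entry of the validation agent's structured report.
type StepResult struct {
	Step     string
	Pass     bool
	Evidence string // e.g. response body or test output excerpt
}

// allowedTypes is an assumed mapping from project type to validation types.
var allowedTypes = map[string]map[string]bool{
	"web":     {"curl": true, "tests": true, "playwright": true},
	"android": {"gradle": true},
}

// ParseSpec decodes validation_json and rejects specs the project cannot run.
func ParseSpec(raw []byte, projectType string) (ValidationSpec, error) {
	var spec ValidationSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		return spec, fmt.Errorf("decode validation_json: %w", err)
	}
	// Indexing a missing projectType yields a nil map, which reads as false.
	if !allowedTypes[projectType][spec.Type] {
		return spec, fmt.Errorf("validation type %q not allowed for %s project", spec.Type, projectType)
	}
	if len(spec.Steps) == 0 {
		return spec, fmt.Errorf("validation spec has no steps")
	}
	return spec, nil
}

// Verdict returns the story's next state: REVIEW_READY only if there is at
// least one result and every step passed, otherwise NEEDS_FIX.
func Verdict(results []StepResult) string {
	if len(results) == 0 {
		return "NEEDS_FIX"
	}
	for _, r := range results {
		if !r.Pass {
			return "NEEDS_FIX"
		}
	}
	return "REVIEW_READY"
}

func main() {
	raw := []byte(`{
		"type": "curl",
		"steps": [
			"GET /api/stats: expect 200, body contains throughput[]",
			"GET /api/agents/status: expect agents array non-empty"
		],
		"success_criteria": "All steps return expected responses with correct structure"
	}`)
	spec, err := ParseSpec(raw, "web")
	if err != nil {
		fmt.Println(err)
		return
	}
	results := []StepResult{
		{Step: spec.Steps[0], Pass: true},
		{Step: spec.Steps[1], Pass: false, Evidence: `{"agents": []}`},
	}
	fmt.Println(Verdict(results)) // prints NEEDS_FIX: the second step failed

	// The same curl spec is rejected for an android project such as nav.
	_, err = ParseSpec(raw, "android")
	fmt.Println(err)
}
```

Checking the spec against the project type at parse time means a mismatched spec (say, `playwright` for nav) fails before a validation subtask is ever created, rather than surfacing later as a confusing `NEEDS_FIX`.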
