docs: update ADR-007 with validation pipeline and nav project

- Story state machine: SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY | NEEDS_FIX
- Merge-first strategy: no branch review phase, tests are the confidence mechanism
- Elaborator owns validation spec (type, steps, success_criteria)
- Validation types: curl | tests | playwright | gradle
- Nav project (Android): deploy = push to GitHub, validate = gradle test/lint
- Project registry: type + deploy_script fields, initial claudomator + nav entries
- Out of scope: branch review deferred, CI polling out of band for nav

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
author: Peter Stone <thepeterstone@gmail.com> 2026-03-20 08:42:19 +0000
committer: Peter Stone <thepeterstone@gmail.com> 2026-03-20 08:42:19 +0000
commit: b547a2f13a8416c09937f18afdfcfd8e80102f7c (patch)
tree: 6aad0462e01045c861b92c7f38d732862f4546a0 /docs
parent: 1e4649ebbeb1a51cc48f32c7195fe854847d1b10 (diff)
1 file changed, 90 insertions, 12 deletions
diff --git a/docs/adr/007-planning-layer-and-story-model.md b/docs/adr/007-planning-layer-and-story-model.md
index 8b0d16b..2ca2afd 100644
--- a/docs/adr/007-planning-layer-and-story-model.md
+++ b/docs/adr/007-planning-layer-and-story-model.md
@@ -94,31 +94,98 @@ Subtasks within a story execute sequentially. This is enforced via `depends_on`
 
 ---
 
-### 4. Story-Gated Deployment
+### 4. Story-Gated Deployment and Agent Validation
 
 Deployment is triggered at the story level, not the task or subtask level.
 
-**Trigger:** All tasks belonging to a story reach `COMPLETED` status → story transitions to `SHIPPABLE` → deploy is triggered.
+#### State Machine
 
-**Deploy configuration:** Stored on the story (e.g., target environment, deploy script/command, repo). This replaces per-task `repository_url` which moves up to the story level.
+```
+PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY
+                                                           ↘ NEEDS_FIX → IN_PROGRESS (retry)
+```
 
-**Failure recovery policy (TBD):** If a subtask fails mid-story, the options are:
-- Roll back to the previous subtask's commit and retry the failed subtask
-- Pause the story and require human review before resuming
+- **SHIPPABLE:** All tasks completed. Ready to merge and deploy.
+- **DEPLOYED:** Merged to main, deploy triggered.
+- **VALIDATING:** Validation agent is running.
+- **REVIEW_READY:** Validation passed. Awaiting human sign-off.
+- **NEEDS_FIX:** Validation failed. Story returns to `IN_PROGRESS` with the validation report attached.
 
-Policy is not decided here — this is flagged for a future decision once the branching model is implemented and failure patterns are observed.
+#### Merge Strategy
+
+Merge to main first, then validate against the live deployment. No branch review phase — tests are the confidence mechanism. If test coverage is insufficient for a given story, the implementor is responsible for adding tests before marking it shippable. Branch review may be introduced later if needed.
+
+#### Deploy Configuration
+
+Stored on the story. Two project types are handled:
+
+| Project | Deploy trigger | What "deployed" means |
+|---|---|---|
+| claudomator | `git push` to local bare repo → systemd pulls and restarts | Live at `doot.terst.org` |
+| nav (Android) | `git push` to GitHub → CI build action fires | APK distributed to testers via Play Store testing track |
+
+For nav, Claudomator does not interact with GitHub CI directly — it pushes the branch/commits; the CI action is an external trigger. "Deployed" is declared once the push succeeds; the CI result is not polled.
+
+#### Agent Validation
+
+After a story is deployed, a validation subtask is automatically created. The elaborator is responsible for specifying how validation should be performed — it has full context of what changed and can prescribe the appropriate check level.
+
+**Validation spec** (produced by elaborator, stored on the story):
+
+```yaml
+validation:
+  type: curl         # curl | tests | playwright | gradle
+  steps:
+    - "GET /api/stats — expect 200, body contains throughput[]"
+    - "GET /api/agents/status — expect agents array non-empty"
+  success_criteria: "All steps return expected responses with correct structure"
+```
+
+**Validation types by project:**
+
+| Type | When to use | What the agent does |
+|---|---|---|
+| `curl` | API changes, data model additions, simple UI text | HTTP requests, check status codes and response shape |
+| `tests` | Logic changes with existing test coverage | Runs the project test suite against the live deployment or codebase |
+| `playwright` | Subtle UI changes, interactive flows, visual correctness | Browser automation against the deployed URL |
+| `gradle` | nav (Android) — any change | `./gradlew test`, `./gradlew lint`; optionally `./gradlew assembleDebug` |
+
+The elaborator selects `type` based on change scope. Curl is the default for small targeted changes; playwright is reserved for changes where visual or interactive correctness cannot be inferred from API responses alone.
+
+**Validation agent inputs:**
+- The validation spec (type, steps, success_criteria)
+- Deployed URL or project path
+- Summary of what changed (story name + task list)
+
+**Validation agent outputs:**
+- Structured pass/fail per step
+- Evidence (response bodies, test output excerpts, screenshots for playwright)
+- Overall verdict: pass → story moves to `REVIEW_READY`; fail → story moves to `NEEDS_FIX` with report attached
+
+#### Failure Recovery
+
+If a subtask fails mid-story: pause the story and require human review before resuming. The options at that point are:
+- Roll back to the previous subtask's commit and retry
+- Amend the subtask definition and requeue
+
+Policy beyond this is deferred until failure patterns are observed in practice.
 
 ---
 
 ## Consequences
 
 **Claudomator changes:**
-- New `stories` table: `id, name, branch_name, repository_url, deploy_config, status`
-- `tasks.story_id` FK; `repository_url` removed from tasks (inherited from story)
+- New `stories` table: `id, name, branch_name, project_id, deploy_config, validation_json, status`
+- New `projects` table: `id, name, remote_url, local_path, type, deploy_script`
+- `tasks.story_id` FK; `repository_url` removed from tasks (inherited from story → project)
 - Sequential subtask ordering via auto-wired `depends_on` at task creation time
-- Post-task-completion check: all story tasks COMPLETED → story transitions to SHIPPABLE → deploy trigger
+- Post-task-completion check: all story tasks COMPLETED → story transitions to SHIPPABLE → merge + deploy trigger
+- Post-deploy: auto-create validation subtask from story's `validation_json` spec
+- Validation subtask completes → story transitions to REVIEW_READY or NEEDS_FIX
+- Story state machine: PENDING → IN_PROGRESS → SHIPPABLE → DEPLOYED → VALIDATING → REVIEW_READY | NEEDS_FIX
 - `ContainerRunner`: clone at story branch HEAD; push back to story branch after each subtask
 - Deployment status check moves from task level to story level
+- Elaborator output extended: `validation` block (type, steps, success_criteria) stored as `validation_json` on story
 - Remove `Agent.RepositoryURL`, `Agent.ProjectDir` legacy fields, `skip_planning`, `fallbackGitInit()`
 - Remove duplicate changestats extraction (keep pool-side, remove API server-side)
 
@@ -168,14 +235,25 @@ The local working copy model requires a formal project concept. A `projects` tab
 | `name` | Human-readable label |
 | `remote_url` | Git remote (clone target for execution) |
 | `local_path` | Local working copy path (read cache for elaboration, object store for `--reference` clones) |
+| `type` | `web` \| `android` — controls available validation types and deploy semantics |
+| `deploy_script` | Optional path to project-specific deploy script |
 
 `repository_url` on stories becomes a FK to `projects`. The existing `project` string field on tasks (currently just a label) is replaced by `project_id`. `Agent.RepositoryURL`, `Agent.ProjectDir`, and `Task.RepositoryURL` are all removed — project is the single source of truth for repo location.
 
+**Initial registered projects:**
+
+| Name | Local path | Remote | Type |
+|---|---|---|---|
+| claudomator | `/workspace/claudomator` | local bare repo | web |
+| nav | `/workspace/nav` | GitHub | android |
+
 ---
 
-## Out of Scope
+## Out of Scope (for now)
 
 - Voice interface (noted as a future exploration, not an architectural requirement)
 - Epic management tooling
-- Failure recovery policy for mid-story subtask failures
 - Parallelism within stories
+- Branch review before merge — deferred; merge-first is the current strategy. May be revisited if confidence requires it.
+- Polling GitHub CI result for nav deploys — Claudomator declares "deployed" on push success; CI outcome is out of band
+- ADB / emulator-based UI validation for nav — `gradle` type covers unit and integration tests; device UI testing deferred
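
The validation flow this commit adds to ADR-007 (a spec with `type`, `steps`, and `success_criteria`, and a verdict that moves the story to `REVIEW_READY` or `NEEDS_FIX`) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the project's implementation: `ValidationSpec`, `StepResult`, `ALLOWED_TYPES`, `check_spec`, and `verdict` are hypothetical names. The mapping of `web` projects to `curl`, `tests`, and `playwright` is also an assumption; the ADR only states that `gradle` is used for nav (Android) and that the project `type` controls the available validation types.

```python
from dataclasses import dataclass

# Assumed mapping from project type to permitted validation types.
ALLOWED_TYPES = {
    "web": {"curl", "tests", "playwright"},
    "android": {"gradle"},
}


@dataclass
class ValidationSpec:
    type: str
    steps: list[str]
    success_criteria: str


@dataclass
class StepResult:
    step: str
    passed: bool
    evidence: str = ""  # e.g. response body or test output excerpt


def check_spec(spec: ValidationSpec, project_type: str) -> None:
    """Raise ValueError if the spec cannot be used for this project type."""
    allowed = ALLOWED_TYPES.get(project_type, set())
    if spec.type not in allowed:
        raise ValueError(f"{spec.type!r} is not valid for {project_type!r} projects")
    # Assumption: a spec without steps is rejected rather than auto-passed.
    if not spec.steps:
        raise ValueError("validation spec needs at least one step")


def verdict(results: list[StepResult]) -> str:
    """Map per-step results to the next story state named in the ADR.

    Assumption: an empty result list counts as a failure.
    """
    if results and all(r.passed for r in results):
        return "REVIEW_READY"
    return "NEEDS_FIX"


# Usage with a spec shaped like the YAML example in the ADR.
spec = ValidationSpec(
    type="curl",
    steps=["GET /api/stats: expect 200", "GET /api/agents/status: expect agents"],
    success_criteria="All steps return expected responses",
)
check_spec(spec, "web")
results = [StepResult(step, passed=True) for step in spec.steps]
next_state = verdict(results)
```
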