1 files changed, 6 insertions, 0 deletions
diff --git a/docs/adr/007-planning-layer-and-story-model.md b/docs/adr/007-planning-layer-and-story-model.md
index 2ca2afd..7efb66d 100644
--- a/docs/adr/007-planning-layer-and-story-model.md
+++ b/docs/adr/007-planning-layer-and-story-model.md
@@ -74,6 +74,7 @@ Each story has a dedicated Git branch. Subtasks execute sequentially, each cloni
 - **Ordered execution enforced** — subtasks run strictly in order; each depends on the previous commit
 - **Reviewable history** — the story branch accumulates one commit per subtask, giving a clean, auditable record before merge
 - **Clear recovery points** — if subtask N fails, roll back to subtask N-1's commit, fix the subtask definition, rerun
+- **Resilient to transient API failures** — an HTTP 429 (rate-limit) response does not fail the story or subtask; the executor requeues the task and pauses the story until the agent is unblocked, preserving the sequential chain
 
 **Tradeoffs accepted:**
 - Clone and container creation cost is paid per subtask (not amortized across the story). Acceptable at current usage scale.
@@ -162,6 +163,8 @@ The elaborator selects `type` based on change scope. Curl is the default for sma
 - Evidence (response bodies, test output excerpts, screenshots for playwright)
 - Overall verdict: pass → story moves to `REVIEW_READY`; fail → story moves to `NEEDS_FIX` with report attached
 
+Validation subtasks are covered by the same pool-level rate-limit handling: a 429 during validation requeues the subtask rather than failing the story.
+
 #### Failure Recovery
 
 If a subtask fails mid-story: pause the story and require human review before resuming. The options at that point are:
@@ -188,6 +191,8 @@ Policy beyond this is deferred until failure patterns are observed in practice.
 - Elaborator output extended: `validation` block (type, steps, success_criteria) stored as `validation_json` on story
 - Remove `Agent.RepositoryURL`, `Agent.ProjectDir` legacy fields, `skip_planning`, `fallbackGitInit()`
 - Remove duplicate changestats extraction (keep pool-side, remove API server-side)
+- Pool-level "requeue-and-skip" logic for rate-limited agents: if all candidate agents are blocked, tasks return to `QUEUED` and release their worker slots, so the system waits out 429 errors without failing stories
+- Background "Recovery Scheduler" goroutine: periodically (every 30 minutes, or at the interval hinted by the API) runs minimal "test tasks" to verify agent availability and unblock the pool
 
 **Doot changes:**
 - New `SourceClaudomator` atom source
@@ -224,6 +229,7 @@ Story creation is driven by a beefed-up version of Claudomator's existing elabor
 - A `git fetch` (not pull) at elaboration start updates remote refs without touching the working tree
 - Branch creation is deferred to approval — elaboration agent is purely read-only
 - Execution clones use `git clone --reference /local/path <remote>` — reuses local object store, fetches only the delta; significantly faster than cold clone
+- Rate-limit aware — if the elaboration agent is blocked, the UI surfaces the blocked status, and elaboration resumes automatically once the Recovery Scheduler unblocks the agent
 
 ### Project Registry