plan: space feature end-to-end happy path — single space, single task #991
lsm merged 17 commits into `dev` from `plan/space-feature-end-to-end-happy-path-single-space-single-task` on Mar 27, 2026
Changes from 7 commits
Commits (17, all by lsm):
a4452c1 plan: space feature end-to-end happy path — single space, single task…
012f73c fix: address P0/P1 review feedback on space happy path plan
be52a52 fix: address iteration 2 review feedback (P1/P2)
45666b3 ci: trigger rebuild after rebase onto dev (SDK 0.2.84)
9b89e68 fix: address P0/P1 review feedback on space happy path plan
4385150 fix: reconcile iteration cap status with M5 Task 5.1
f386ec2 docs: revise space plan with gate+channel architecture
95a2afd fix: address all P1/P2 review feedback from both reviewers
74dffae fix: address final P2 consistency issues from reviewers
de84898 refactor: unify gate design — one Gate entity with composable conditions
1ed0c47 fix: replace legacy gate terminology with unified gate IDs
966f3dd feat: add composite gate conditions (all/any) and task-title worktree…
02e948a fix: update stale "three condition types" references to five
4a7ee93 feat: separate channels and gates, add bidirectional comms and struct…
3e0911b fix: remove channelId from Gate, specify migration paths, define mode…
7d07ad4 feat: add numeric task IDs and space slugs to data model
96564a0 fix: resolve P1/P2 issues from iteration 17 review
220 changes: 220 additions & 0 deletions
...ans/space-feature-end-to-end-happy-path-single-space-single-task/00-overview.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,220 @@ | ||
| # Space Feature: End-to-End Happy Path | ||
|
|
||
| ## Goal Summary | ||
|
|
||
| Make the happy path for a single space with a single task using a single workflow work end-to-end: human converses with Space Agent, creates a task, Space Agent selects the default coding workflow, and the workflow runs through the full pipeline with proper gate enforcement, agent-to-agent messaging, and completion detection. | ||
|
|
||
| **Scope constraints**: Single task, single space, single workflow run. No goals/missions involved. | ||
|
|
||
| ## Target Workflow Pipeline | ||
|
|
||
| ``` | ||
| Planning → [PR Gate] → Plan Review (reviewer agents) → [Human Gate] → Coding Agent → [PR Gate] → 3 Coding Reviewers (parallel) → [Aggregate Gate: 3 yes votes required] → QA → Task Agent (Done) | ||
| ``` | ||
|
|
||
| **Gate types**: | ||
| - **PR Gate**: Blocks until a PR is created. Stores the PR URL. Agents downstream can read it. | ||
| - **Human Gate**: Blocks until a human approves. Shows artifacts view with all changes in the worktree. | ||
| - **Aggregate Gate**: Blocks until a quorum is met (e.g., 3/3 reviewers vote "yes"). Stores each reviewer's vote. | ||
| - **Task Result Gate**: Simple pass/fail based on agent's `report_done` result. | ||
|
|
||
| ## Core Architecture: Gates + Channels | ||
|
|
||
| **CRITICAL DESIGN DECISION**: The Space workflow uses a Gate + Channel model instead of a complex state machine. This is fundamentally simpler and more composable than tracking many states with complex transition rules. | ||
|
|
||
| ### Gates | ||
|
|
||
| A **Gate** is a simple condition that either passes or does not, **paired with a data store**. Each gate holds the data it needs (PR URLs, review results, approval status), and agents can read and write that data. | ||
|
|
||
| ```typescript | ||
| interface Gate { | ||
| id: string; | ||
| type: 'pr' | 'human' | 'aggregate' | 'task_result' | 'always'; | ||
| // The gate's data store — agents can read/write this | ||
| data: Record<string, unknown>; | ||
| // Evaluate whether the gate passes | ||
| evaluate(): boolean; | ||
| } | ||
| ``` | ||
|
|
||
| **Gate data examples**: | ||
| - PR Gate: `{ prUrl: 'https://github.com/...', prNumber: 123, branch: 'feat/xyz' }` | ||
| - Human Gate: `{ approved: true, approvedBy: 'user123', approvedAt: '2025-...' }` | ||
| - Aggregate Gate: `{ votes: { reviewer1: 'approve', reviewer2: 'approve', reviewer3: 'approve' }, quorum: 3 }` | ||
| - Task Result Gate: `{ result: 'passed', summary: '...' }` | ||
|
|
||
| **Key property**: Gates persist their data to SQLite. Agents read/write gate data via MCP tools (`read_gate`, `write_gate`). The gate's `evaluate()` checks its own data store — no external state machine needed. | ||
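The key property above can be illustrated with a minimal sketch. It assumes an in-memory `Map` in place of the SQLite table, and the names `GateStore`, `readGate`, `writeGate`, and `passConditions` are hypothetical stand-ins, not the project's actual API. Only the simple gate types are covered; the pass conditions are guesses based on the gate data examples.

```typescript
// Sketch only: an in-memory Map stands in for the SQLite table the plan describes.
type GateData = Record<string, unknown>;
type SimpleGateType = 'pr' | 'human' | 'task_result' | 'always';

// Hypothetical pass conditions, one per gate type. Each reads only the gate's own data.
const passConditions: Record<SimpleGateType, (data: GateData) => boolean> = {
  pr: (data) => {
    const url = data.prUrl;
    return typeof url === 'string' && url.length > 0;
  },
  human: (data) => data.approved === true,
  task_result: (data) => data.result === 'passed',
  always: () => true,
};

class GateStore {
  private readonly rows = new Map<string, { type: SimpleGateType; data: GateData }>();

  register(id: string, type: SimpleGateType): void {
    this.rows.set(id, { type, data: {} });
  }

  // Stand-in for the `read_gate` tool: returns a copy so callers cannot mutate stored data.
  readGate(id: string): GateData {
    return { ...this.row(id).data };
  }

  // Stand-in for the `write_gate` tool: shallow-merges the patch into the stored data.
  writeGate(id: string, patch: GateData): void {
    const row = this.row(id);
    row.data = { ...row.data, ...patch };
  }

  // Evaluation consults nothing but the gate's own data store.
  evaluate(id: string): boolean {
    const row = this.row(id);
    return passConditions[row.type](row.data);
  }

  private row(id: string): { type: SimpleGateType; data: GateData } {
    const row = this.rows.get(id);
    if (!row) throw new Error(`Unknown gate: ${id}`);
    return row;
  }
}
```

In this sketch a PR gate stays closed until some agent writes a `prUrl`, after which downstream agents can read the same record.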
|
|
||
| ### Channels | ||
|
|
||
| A **Channel** controls who can talk to whom (communication flow). A channel connects two nodes and has a gate that controls when messages can flow. | ||
|
|
||
| ```typescript | ||
| interface Channel { | ||
| id: string; | ||
| from: string; // source node ID | ||
| to: string; // target node ID | ||
| gate: Gate; // controls when this channel opens | ||
| isCyclic?: boolean; // for feedback loops | ||
| } | ||
| ``` | ||
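As a rough illustration of how a channel's gate could control message flow, the sketch below checks whether a message may travel from one node to another. The helpers `canSend`, `reachableFrom`, and `makePrGate` are hypothetical and are not taken from the existing `ChannelRouter`.

```typescript
interface Gate {
  id: string;
  type: 'pr' | 'human' | 'aggregate' | 'task_result' | 'always';
  data: Record<string, unknown>;
  evaluate(): boolean;
}

interface Channel {
  id: string;
  from: string; // source node ID
  to: string; // target node ID
  gate: Gate; // controls when this channel opens
  isCyclic?: boolean; // for feedback loops
}

// Hypothetical helper: a message may flow only over a declared channel whose gate passes.
function canSend(channels: Channel[], from: string, to: string): boolean {
  return channels.some((c) => c.from === from && c.to === to && c.gate.evaluate());
}

// Hypothetical helper: node IDs currently reachable from a given node.
function reachableFrom(channels: Channel[], from: string): string[] {
  return channels.filter((c) => c.from === from && c.gate.evaluate()).map((c) => c.to);
}

// A minimal PR gate whose evaluate() reads only its own data store.
function makePrGate(id: string): Gate {
  const gate: Gate = {
    id,
    type: 'pr',
    data: {},
    evaluate: () => typeof gate.data.prUrl === 'string',
  };
  return gate;
}
```

Because channels are directional, a channel from `planning` to `plan-review` says nothing about the reverse direction; a reply path would need its own channel.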
|
|
||
| ### Why This Is Simpler | ||
|
|
||
| Instead of a state machine with states like `planning`, `waiting_for_plan_review`, `waiting_for_human_approval`, `coding`, `waiting_for_code_review`, `waiting_for_qa`, `done`, `failed`, `needs_attention` — each with complex transition rules — we have: | ||
|
|
||
| 1. **Nodes** execute agents (one at a time or in parallel) | ||
| 2. **Channels** connect nodes with gates | ||
| 3. **Gates** are simple conditions with data stores | ||
| 4. The workflow "state" is just: which nodes are active + what data is in each gate | ||
|
|
||
| Adding new behaviors = adding new gates and channels, not new states and transition rules. | ||
|
|
||
| ## Current State Analysis | ||
|
|
||
| ### What Already Exists (Working Infrastructure) | ||
|
|
||
| 1. **Space data model**: `Space`, `SpaceTask`, `SpaceWorkflow`, `SpaceWorkflowRun`, `SpaceAgent` types in `packages/shared/src/types/space.ts` — fully defined with channels, gates, multi-agent nodes. | ||
|
|
||
| 2. **Space CRUD**: `SpaceManager`, `SpaceAgentManager`, `SpaceWorkflowManager`, `SpaceTaskManager` — all backed by SQLite repos with reactive DB notifications. | ||
|
|
||
| 3. **Built-in workflows**: `CODING_WORKFLOW` (Plan -> Code -> Verify -> Done with human gate), `RESEARCH_WORKFLOW`, `REVIEW_ONLY_WORKFLOW` in `packages/daemon/src/lib/space/workflows/built-in-workflows.ts`. Seeded at space creation time. | ||
|
|
||
| 4. **Preset agents**: Coder, General, Planner, Reviewer — seeded via `seedPresetAgents()` at space creation. | ||
|
|
||
| 5. **Channel routing**: `ChannelRouter` with gate evaluation (`always`, `human`, `condition`, `task_result`), `ChannelResolver` for channel topology, `ChannelGateEvaluator`. | ||
|
|
||
| 6. **Agent-centric messaging**: Node agents use `send_message` (channel-validated), `report_done`, `list_peers`, `list_reachable_agents` via MCP tools. | ||
|
|
||
| 7. **Task Agent**: Session-level orchestrator (`TaskAgentManager`) that spawns sub-sessions per workflow node, monitors completion via `CompletionDetector`, handles lazy node activation. | ||
|
|
||
| 8. **Custom agent factory**: `createCustomAgentInit()` builds `AgentSessionInit` from `SpaceAgent` config with proper system prompts, tools, and role-based defaults. | ||
|
|
||
| 9. **Space Runtime**: `SpaceRuntime` with tick loop, executor map, rehydration, completion detection, and notification sink. | ||
|
|
||
| 10. **Space chat agent**: Conversational coordinator in `packages/daemon/src/lib/space/agents/space-chat-agent.ts` that can `start_workflow_run`, `create_standalone_task`, `suggest_workflow`, `list_workflows`, etc. | ||
|
|
||
| 11. **E2E tests**: Space creation, workflow visual editor, multi-agent editor, export/import, agent-centric workflow tests. | ||
|
|
||
| 12. **Online tests**: `task-agent-lifecycle.test.ts`, `space-agent-coordination.test.ts`. | ||
|
|
||
| ### What Needs to Be Built / Fixed | ||
|
|
||
| 1. **Gate + Channel architecture refactor**: The existing `ChannelGateEvaluator` supports basic gate types but lacks the **gate data store** concept. Gates need to persist data (PR URLs, review votes, approval status) that agents can read/write. This is the core architectural change. | ||
|
|
||
| 2. **New gate types**: PR Gate (checks PR exists, stores URL), Aggregate Gate (quorum voting), and enhanced Human Gate (stores approval + shows artifacts). | ||
|
|
||
| 3. **Extended workflow template**: Create `CODING_WORKFLOW_V2` matching the target pipeline with PR gates, parallel reviewers, and aggregate gate. | ||
|
|
||
| 4. **Node agent prompt specialization**: Node agents need proper system prompts with git workflow, PR creation, review posting, gate data writing. | ||
|
|
||
| 5. **Parallel reviewer support**: The workflow needs 3 reviewer nodes that run in parallel, with an aggregate gate requiring all 3 to approve before QA runs. | ||
|
|
||
| 6. **QA agent step**: Verification agent that checks test coverage, CI status, and PR mergeability. | ||
|
|
||
| 7. **Human gate UI with canvas visualization**: Live workflow visualization on a canvas. Clicking a human gate opens an artifacts view showing all changes in the worktree. Clicking individual changes renders file diffs. Similar to GitHub Actions visualization but with human-in-the-loop nodes. | ||
|
|
||
| 8. **Worktree isolation (one per task)**: Currently no worktree isolation exists. Need ONE worktree per task (shared by all agents in that task), with short human-readable folder names (e.g., `alpha-3`, `nova-7`). | ||
|
|
||
| 9. **Gate data MCP tools**: Agents need `read_gate` and `write_gate` MCP tools to interact with gate data stores. | ||
|
|
||
| 10. **End-to-end integration testing**: No single test exercises the full pipeline. | ||
|
|
||
| ## High-Level Approach | ||
|
|
||
| **Phase 1 — Gate + Channel architecture and workflow template** (Milestones 1-3): | ||
| - Implement gate data store and new gate types (PR, Aggregate, enhanced Human) | ||
| - Enhance node agent prompts (git workflow, review posting, PR management, gate interaction) | ||
| - Create extended CODING_WORKFLOW_V2 with the full pipeline | ||
| - Implement worktree isolation (one per task, short names) | ||
|
|
||
| **Phase 2 — QA, human gate UI, and completion** (Milestones 4-6): | ||
| - Add QA node to the pipeline | ||
| - Build human gate canvas UI with artifacts view and diff rendering | ||
| - Wire completion flow so Task Agent reports final status | ||
| - Implement conversation-to-task entry point | ||
|
|
||
| **Phase 3 — End-to-end testing and hardening** (Milestones 7-9): | ||
| - Online integration tests with dev proxy | ||
| - E2E Playwright test exercising the full UI flow | ||
| - Bug fixes and hardening | ||
|
|
||
| ## Milestones | ||
|
|
||
| 1. **Gate data store and new gate types** — Implement the gate data store (persisted to SQLite), `read_gate`/`write_gate` MCP tools, and new gate types: PR Gate, Aggregate Gate, enhanced Human Gate | ||
|
|
||
| 2. **Enhanced node agent prompts** — Add git/PR/review-specific system prompts for planner, coder, reviewer, and QA agents, including gate data interaction instructions | ||
|
|
||
| 3. **Extended coding workflow (V2)** — Create CODING_WORKFLOW_V2 with the full pipeline: Planning → [PR Gate] → Plan Review → [Human Gate] → Coding → [PR Gate] → 3 Reviewers (parallel) → [Aggregate Gate] → QA → Done | ||
|
|
||
| 4. **Worktree isolation (one per task)** — Implement single worktree per task with short human-readable names (e.g., `alpha-3`, `nova-7`), shared by all agents in the task | ||
|
|
||
| 5. **QA agent node** — Add QA as the verification step before Done, with QA→Code feedback loop | ||
|
|
||
| 6. **Human gate canvas UI** — Build live workflow canvas visualization with clickable human gates that show artifacts view with file diffs (GitHub Actions-style but with human-in-the-loop) | ||
|
|
||
| 7. **Online integration test** — Exercise the full happy path with dev proxy, broken into focused per-component sub-tests | ||
|
|
||
| 8. **E2E test** — Playwright test exercising the full UI flow from space chat through task creation and workflow execution | ||
|
|
||
| 9. **Bug fixes and hardening** — Fix issues discovered during testing; add error handling and edge case coverage | ||
|
|
||
| ## Final Workflow Graph | ||
|
|
||
| ``` | ||
| Planning ──[PR Gate]──► Plan Review (reviewers) ──[Human Gate]──► Coding ──[PR Gate]──► Reviewer 1 ─┐ | ||
| ▲ Reviewer 2 ─┼─[Aggregate Gate: 3 yes]──► QA ──[Task Result: pass]──► Done | ||
| │ Reviewer 3 ─┘ │ | ||
| │ │ | ||
| └──────────── [Task Result: fail, cyclic] ────────────────────┘ | ||
| │ │ | ||
| └── [Review reject, cyclic]┘ | ||
| ``` | ||
|
|
||
| **Gate data flow**: | ||
| - Planner writes to PR Gate: `{ prUrl, prNumber, branch }` | ||
| - Plan reviewers read PR Gate data to find the plan PR | ||
| - Human reads gate artifacts view, clicks approve → Human Gate data: `{ approved: true }` | ||
| - Coder writes to PR Gate: `{ prUrl, prNumber, branch }` | ||
| - Each reviewer writes to Aggregate Gate: `{ votes: { [reviewerId]: 'approve' | 'reject' } }` | ||
| - Aggregate Gate evaluates: passes when `Object.values(votes).filter(v => v === 'approve').length >= quorum` | ||
| - QA reads PR Gate data, runs tests, writes Task Result Gate data | ||
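The Aggregate Gate step in the data flow above can be sketched as a pure function over the gate's data. The type names and the `recordVote` helper are hypothetical; the pass condition mirrors the expression given in the list.

```typescript
type Vote = 'approve' | 'reject';

interface AggregateGateData {
  votes: Record<string, Vote>;
  quorum: number;
}

// Mirrors the stated rule: pass when the number of 'approve' votes reaches the quorum.
function aggregatePasses(data: AggregateGateData): boolean {
  return Object.values(data.votes).filter((v) => v === 'approve').length >= data.quorum;
}

// Hypothetical write helper: a reviewer's latest vote replaces any earlier one.
// Returns a new object so the previous gate data is left untouched.
function recordVote(data: AggregateGateData, reviewerId: string, vote: Vote): AggregateGateData {
  return { ...data, votes: { ...data.votes, [reviewerId]: vote } };
}
```

Keying votes by reviewer ID means a reviewer who first rejects and later approves is counted once, which is one plausible way to make the cyclic review loop converge.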
|
|
||
| **All cyclic channels route back to Coding, never to Planning.** This ensures: | ||
| - Code-level issues (review feedback, QA failures) are fixed by the Coder directly without re-planning | ||
| - The human gate only fires once (Plan Review → Coding), not on every iteration | ||
| - The Coder can iterate on feedback from both reviewers and QA independently | ||
|
|
||
| **Iteration cap**: `maxIterations` is a global counter on the workflow run, incremented each time ANY cyclic channel is traversed. When the cap is reached, the workflow transitions to `failed` with a `failureReason` of `'maxIterationsReached'`. | ||
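A minimal sketch of the iteration cap follows. The types here are local and hypothetical (they are not the project's `WorkflowRunStatus`), and the sketch assumes the run fails as soon as the counter equals the cap; the plan does not say whether the comparison is `>=` or `>`.

```typescript
// Local, hypothetical types for illustration only.
type RunStatus = 'running' | 'failed';

interface RunCounters {
  status: RunStatus;
  iterations: number; // cyclic traversals so far, across ALL cyclic channels
  maxIterations: number; // global cap for the workflow run
  failureReason?: 'maxIterationsReached';
}

// Called each time any cyclic channel is traversed. Returns a new state object.
function onCyclicTraversal(run: RunCounters): RunCounters {
  if (run.status === 'failed') return run; // already terminal, nothing to count
  const iterations = run.iterations + 1;
  // Assumption: the run fails as soon as the counter equals the cap.
  if (iterations >= run.maxIterations) {
    return { ...run, iterations, status: 'failed', failureReason: 'maxIterationsReached' };
  }
  return { ...run, iterations };
}
```

Because the counter is global, a QA failure and a review rejection draw from the same budget in this sketch.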
|
|
||
| ## Cross-Milestone Dependencies | ||
|
|
||
| - Milestone 1 (gate data store) is the foundation — M2 and M3 depend on it | ||
| - Milestone 2 (prompts) depends on M1 (agents need `read_gate`/`write_gate` instructions) | ||
| - Milestone 3 (V2 workflow) depends on M1 (new gate types must exist) | ||
| - Milestone 4 (worktree) can start in parallel with M2/M3 | ||
| - Milestone 5 (QA) depends on M3 (V2 workflow template must exist) | ||
| - Milestone 6 (human gate UI) depends on M1 (gate data store) and M3 (V2 workflow with human gate) | ||
| - Milestone 7 (online test) depends on M5 and M6 | ||
| - Milestone 8 (E2E test) depends on M6; can start in parallel with M7 | ||
| - Milestone 9 (hardening) depends on M7 and M8 | ||
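The dependency list above can be turned into groups of milestones that may start together. This is a sketch for illustration: the `DEPS` table transcribes the bullets, with the assumption that Milestone 4 has no prerequisites (the plan only says it can run in parallel with M2/M3), and `startWaves` is a hypothetical helper.

```typescript
// Milestone number -> milestones it depends on, transcribed from the list above.
// Assumption: M4 has no prerequisites.
const DEPS: Record<number, number[]> = {
  1: [],
  2: [1],
  3: [1],
  4: [],
  5: [3],
  6: [1, 3],
  7: [5, 6],
  8: [6],
  9: [7, 8],
};

// Groups milestones into waves: each wave contains every milestone whose
// dependencies were all completed in earlier waves.
function startWaves(deps: Record<number, number[]>): number[][] {
  const done = new Set<number>();
  const remaining = new Set<number>(Object.keys(deps).map(Number));
  const waves: number[][] = [];
  while (remaining.size > 0) {
    const ready = Array.from(remaining)
      .filter((m) => (deps[m] ?? []).every((d) => done.has(d)))
      .sort((a, b) => a - b);
    if (ready.length === 0) throw new Error('Dependency cycle detected');
    for (const m of ready) {
      done.add(m);
      remaining.delete(m);
    }
    waves.push(ready);
  }
  return waves;
}
```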
|
|
||
| ## V2 Workflow Seeding Strategy | ||
|
|
||
| - `CODING_WORKFLOW_V2` is seeded alongside existing workflows (additive, not replacing) | ||
| - Existing spaces are not affected (idempotent seeding) | ||
| - V2 gets `tag: 'default'` so workflow selector ranks it first for coding-type requests | ||
| - Existing `CODING_WORKFLOW` (V1) kept for backward compatibility | ||
| - **V1→V2 migration is out of scope** | ||
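One way to read "additive" and "idempotent" here is sketched below. It assumes templates are matched by name and that the selector simply prefers the `'default'` tag; `WorkflowTemplate`, `seedBuiltIns`, and `rankForSelection` are hypothetical names, not the project's seeding code.

```typescript
interface WorkflowTemplate {
  name: string;
  tag?: string;
}

// Hypothetical seeding step: adds only the built-ins the space does not already have,
// matched by name, so running it twice yields the same set.
function seedBuiltIns(existing: WorkflowTemplate[], builtIns: WorkflowTemplate[]): WorkflowTemplate[] {
  const names = new Set(existing.map((w) => w.name));
  const missing = builtIns.filter((w) => !names.has(w.name));
  return existing.concat(missing);
}

// Hypothetical ranking: templates tagged 'default' come first; order is otherwise preserved.
function rankForSelection(workflows: WorkflowTemplate[]): WorkflowTemplate[] {
  const isDefault = (w: WorkflowTemplate): boolean => w.tag === 'default';
  return workflows.filter(isDefault).concat(workflows.filter((w) => !isDefault(w)));
}
```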
|
|
||
| ## Worktree Strategy | ||
|
|
||
| - **One worktree per task** (shared by all agents in that task — planner, coder, reviewer, QA all work in the same worktree) | ||
| - **Short, human-readable folder names**: `alpha-3`, `nova-7`, `flux-2` — short word + dash + number (similar to Codex naming) | ||
| - The worktree name does NOT need to associate with session IDs — the DB links everything | ||
| - Folder name just needs to be unique and memorable | ||
| - Agents work sequentially in the task worktree, so no conflicts | ||
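The naming rule can be sketched as a small generator that retries on collision. The word list is limited to the three examples from the plan, and the function name, the 1 to 9 number range, and the retry limit are assumptions made for illustration.

```typescript
// Words taken from the plan's examples; a real list would presumably be longer.
const WORKTREE_WORDS = ['alpha', 'nova', 'flux'] as const;

// Picks `<word>-<digit>` and retries until the name is not already taken.
// `random` is injectable so the behavior can be made deterministic in tests.
function makeWorktreeName(
  taken: ReadonlySet<string>,
  random: () => number = Math.random,
  maxAttempts = 100,
): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const word = WORKTREE_WORDS[Math.floor(random() * WORKTREE_WORDS.length)];
    const n = 1 + Math.floor(random() * 9); // assumed range: 1..9
    const name = `${word}-${n}`;
    if (!taken.has(name)) return name;
  }
  throw new Error('Could not find a free worktree name');
}
```

Uniqueness is checked against the set of existing names only; linking a worktree to its task would still go through the database, as the strategy above states.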
|
|
||
| ## Total Estimated Task Count | ||
|
|
||
| ~30 tasks across 9 milestones | ||
P1: `WorkflowRunStatus` does not include `'failed'`

The plan references `failed` as a `WorkflowRunStatus` value here and in multiple other milestones (M6 Task 6.1, M9 Task 9.4). However, the current type in `packages/shared/src/types/space.ts:304` does not include it. The existing status machine maps failure scenarios to `'needs_attention'`. Adding `'failed'` is a cross-cutting type-system change that needs an explicit task, not just a subtask buried in M6.

Action: Add a dedicated task in M1 to extend `WorkflowRunStatus` with `'failed'` and `'failureReason'`, and update `workflow-run-status-machine.ts` accordingly. All downstream milestones reference this status.