Skip to content
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,220 @@
# Space Feature: End-to-End Happy Path

## Goal Summary

Make the happy path for a single space with a single task using a single workflow work end-to-end: human converses with Space Agent, creates a task, Space Agent selects the default coding workflow, and the workflow runs through the full pipeline with proper gate enforcement, agent-to-agent messaging, and completion detection.

**Scope constraints**: Single task, single space, single workflow run. No goals/missions involved.

## Target Workflow Pipeline

```
Planning → [PR Gate] → Plan Review (reviewer agents) → [Human Gate] → Coding Agent → [PR Gate] → 3 Coding Reviewers (parallel) → [Aggregate Gate: 3 yes votes required] → QA → Task Agent (Done)
```

**Gate types**:
- **PR Gate**: Blocks until a PR is created. Stores the PR URL. Agents downstream can read it.
- **Human Gate**: Blocks until a human approves. Shows artifacts view with all changes in the worktree.
- **Aggregate Gate**: Blocks until a quorum is met (e.g., 3/3 reviewers vote "yes"). Stores each reviewer's vote.
- **Task Result Gate**: Simple pass/fail based on agent's `report_done` result.

## Core Architecture: Gates + Channels

**CRITICAL DESIGN DECISION**: The Space workflow uses a Gate + Channel model instead of a complex state machine. This is fundamentally simpler and more composable than tracking many states with complex transition rules.

### Gates

A **Gate** is a simple condition that can pass or not, **with a data store**. Gates hold the data they need (PR URLs, review results, approval status). Agents can read and write gate data.

```typescript
interface Gate {
id: string;
type: 'pr' | 'human' | 'aggregate' | 'task_result' | 'always';
// The gate's data store — agents can read/write this
data: Record<string, unknown>;
// Evaluate whether the gate passes
evaluate(): boolean;
}
```

**Gate data examples**:
- PR Gate: `{ prUrl: 'https://github.com/...', prNumber: 123, branch: 'feat/xyz' }`
- Human Gate: `{ approved: true, approvedBy: 'user123', approvedAt: '2025-...' }`
- Aggregate Gate: `{ votes: { reviewer1: 'approve', reviewer2: 'approve', reviewer3: 'approve' }, quorum: 3 }`
- Task Result Gate: `{ result: 'passed', summary: '...' }`

**Key property**: Gates persist their data to SQLite. Agents read/write gate data via MCP tools (`read_gate`, `write_gate`). The gate's `evaluate()` checks its own data store — no external state machine needed.

### Channels

A **Channel** controls who can talk to whom (communication flow). A channel connects two nodes and has a gate that controls when messages can flow.

```typescript
interface Channel {
id: string;
from: string; // source node ID
to: string; // target node ID
gate: Gate; // controls when this channel opens
isCyclic?: boolean; // for feedback loops
}
```

### Why This Is Simpler

Instead of a state machine with states like `planning`, `waiting_for_plan_review`, `waiting_for_human_approval`, `coding`, `waiting_for_code_review`, `waiting_for_qa`, `done`, `failed`, `needs_attention` — each with complex transition rules — we have:

1. **Nodes** execute agents (one at a time or in parallel)
2. **Channels** connect nodes with gates
3. **Gates** are simple conditions with data stores
4. The workflow "state" is just: which nodes are active + what data is in each gate

Adding new behaviors = adding new gates and channels, not new states and transition rules.

## Current State Analysis

### What Already Exists (Working Infrastructure)

1. **Space data model**: `Space`, `SpaceTask`, `SpaceWorkflow`, `SpaceWorkflowRun`, `SpaceAgent` types in `packages/shared/src/types/space.ts` — fully defined with channels, gates, multi-agent nodes.

2. **Space CRUD**: `SpaceManager`, `SpaceAgentManager`, `SpaceWorkflowManager`, `SpaceTaskManager` — all backed by SQLite repos with reactive DB notifications.

3. **Built-in workflows**: `CODING_WORKFLOW` (Plan -> Code -> Verify -> Done with human gate), `RESEARCH_WORKFLOW`, `REVIEW_ONLY_WORKFLOW` in `packages/daemon/src/lib/space/workflows/built-in-workflows.ts`. Seeded at space creation time.

4. **Preset agents**: Coder, General, Planner, Reviewer — seeded via `seedPresetAgents()` at space creation.

5. **Channel routing**: `ChannelRouter` with gate evaluation (`always`, `human`, `condition`, `task_result`), `ChannelResolver` for channel topology, `ChannelGateEvaluator`.

6. **Agent-centric messaging**: Node agents use `send_message` (channel-validated), `report_done`, `list_peers`, `list_reachable_agents` via MCP tools.

7. **Task Agent**: Session-level orchestrator (`TaskAgentManager`) that spawns sub-sessions per workflow node, monitors completion via `CompletionDetector`, handles lazy node activation.

8. **Custom agent factory**: `createCustomAgentInit()` builds `AgentSessionInit` from `SpaceAgent` config with proper system prompts, tools, and role-based defaults.

9. **Space Runtime**: `SpaceRuntime` with tick loop, executor map, rehydration, completion detection, and notification sink.

10. **Space chat agent**: Conversational coordinator in `packages/daemon/src/lib/space/agents/space-chat-agent.ts` that can `start_workflow_run`, `create_standalone_task`, `suggest_workflow`, `list_workflows`, etc.

11. **E2E tests**: Space creation, workflow visual editor, multi-agent editor, export/import, agent-centric workflow tests.

12. **Online tests**: `task-agent-lifecycle.test.ts`, `space-agent-coordination.test.ts`.

### What Needs to Be Built / Fixed

1. **Gate + Channel architecture refactor**: The existing `ChannelGateEvaluator` supports basic gate types but lacks the **gate data store** concept. Gates need to persist data (PR URLs, review votes, approval status) that agents can read/write. This is the core architectural change.

2. **New gate types**: PR Gate (checks PR exists, stores URL), Aggregate Gate (quorum voting), and enhanced Human Gate (stores approval + shows artifacts).

3. **Extended workflow template**: Create `CODING_WORKFLOW_V2` matching the target pipeline with PR gates, parallel reviewers, and aggregate gate.

4. **Node agent prompt specialization**: Node agents need proper system prompts with git workflow, PR creation, review posting, gate data writing.

5. **Parallel reviewer support**: The workflow needs 3 reviewer nodes that run in parallel, with an aggregate gate requiring all 3 to approve before QA runs.

6. **QA agent step**: Verification agent that checks test coverage, CI status, and PR mergeability.

7. **Human gate UI with canvas visualization**: Live workflow visualization on a canvas. Clicking a human gate opens an artifacts view showing all changes in the worktree. Clicking individual changes renders file diffs. Similar to GitHub Actions visualization but with human-in-the-loop nodes.

8. **Worktree isolation (one per task)**: Currently no worktree isolation exists. Need ONE worktree per task (shared by all agents in that task), with short human-readable folder names (e.g., `alpha-3`, `nova-7`).

9. **Gate data MCP tools**: Agents need `read_gate` and `write_gate` MCP tools to interact with gate data stores.

10. **End-to-end integration testing**: No single test exercises the full pipeline.

## High-Level Approach

**Phase 1 — Gate + Channel architecture and workflow template** (Milestones 1-3):
- Implement gate data store and new gate types (PR, Aggregate, enhanced Human)
- Enhance node agent prompts (git workflow, review posting, PR management, gate interaction)
- Create extended CODING_WORKFLOW_V2 with the full pipeline
- Implement worktree isolation (one per task, short names)

**Phase 2 — QA, human gate UI, and completion** (Milestones 4-6):
- Add QA node to the pipeline
- Build human gate canvas UI with artifacts view and diff rendering
- Wire completion flow so Task Agent reports final status
- Implement conversation-to-task entry point

**Phase 3 — End-to-end testing and hardening** (Milestones 7-9):
- Online integration tests with dev proxy
- E2E Playwright test exercising the full UI flow
- Bug fixes and hardening

## Milestones

1. **Gate data store and new gate types** — Implement the gate data store (persisted to SQLite), `read_gate`/`write_gate` MCP tools, and new gate types: PR Gate, Aggregate Gate, enhanced Human Gate

2. **Enhanced node agent prompts** — Add git/PR/review-specific system prompts for planner, coder, reviewer, and QA agents, including gate data interaction instructions

3. **Extended coding workflow (V2)** — Create CODING_WORKFLOW_V2 with the full pipeline: Planning → [PR Gate] → Plan Review → [Human Gate] → Coding → [PR Gate] → 3 Reviewers (parallel) → [Aggregate Gate] → QA → Done

4. **Worktree isolation (one per task)** — Implement single worktree per task with short human-readable names (e.g., `alpha-3`, `nova-7`), shared by all agents in the task

5. **QA agent node** — Add QA as the verification step before Done, with QA→Code feedback loop

6. **Human gate canvas UI** — Build live workflow canvas visualization with clickable human gates that show artifacts view with file diffs (GitHub Actions-style but with human-in-the-loop)

7. **Online integration test** — Exercise the full happy path with dev proxy, broken into focused per-component sub-tests

8. **E2E test** — Playwright test exercising the full UI flow from space chat through task creation and workflow execution

9. **Bug fixes and hardening** — Fix issues discovered during testing; add error handling and edge case coverage

## Final Workflow Graph

```
Planning ──[PR Gate]──► Plan Review (reviewers) ──[Human Gate]──► Coding ──[PR Gate]──► Reviewer 1 ─┐
▲ Reviewer 2 ─┼─[Aggregate Gate: 3 yes]──► QA ──[Task Result: pass]──► Done
│ Reviewer 3 ─┘ │
│ │
└──────────── [Task Result: fail, cyclic] ────────────────────┘
│ │
└── [Review reject, cyclic]┘
```

**Gate data flow**:
- Planner writes to PR Gate: `{ prUrl, prNumber, branch }`
- Plan reviewers read PR Gate data to find the plan PR
- Human reads gate artifacts view, clicks approve → Human Gate data: `{ approved: true }`
- Coder writes to PR Gate: `{ prUrl, prNumber, branch }`
- Each reviewer writes to Aggregate Gate: `{ votes: { [reviewerId]: 'approve' | 'reject' } }`
- Aggregate Gate evaluates: passes when `Object.values(votes).filter(v => v === 'approve').length >= quorum`
- QA reads PR Gate data, runs tests, writes Task Result Gate data

**All cyclic channels route back to Coding, never to Planning.** This ensures:
- Code-level issues (review feedback, QA failures) are fixed by the Coder directly without re-planning
- The human gate only fires once (Plan Review → Coding), not on every iteration
- The Coder can iterate on feedback from both reviewers and QA independently

**Iteration cap**: `maxIterations` is a global counter on the workflow run, incremented each time ANY cyclic channel is traversed. When the cap is reached, the workflow transitions to `failed` with a `failureReason` of `'maxIterationsReached'`.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: WorkflowRunStatus does not include 'failed'

The plan references failed as a WorkflowRunStatus value here and in multiple other milestones (M6 Task 6.1, M9 Task 9.4). However, the current type in packages/shared/src/types/space.ts:304 only has:

| 'pending' | 'in_progress' | 'completed' | 'cancelled' | 'needs_attention'

The existing status machine maps failure scenarios to 'needs_attention'. Adding 'failed' is a cross-cutting type-system change that needs an explicit task, not just a subtask buried in M6.

Action: Add a dedicated task in M1 to extend WorkflowRunStatus with 'failed' and 'failureReason', and update workflow-run-status-machine.ts accordingly. All downstream milestones reference this status.


## Cross-Milestone Dependencies

- Milestone 1 (gate data store) is the foundation — M2 and M3 depend on it
- Milestone 2 (prompts) depends on M1 (agents need `read_gate`/`write_gate` instructions)
- Milestone 3 (V2 workflow) depends on M1 (new gate types must exist)
- Milestone 4 (worktree) can start in parallel with M2/M3
- Milestone 5 (QA) depends on M3 (V2 workflow template must exist)
- Milestone 6 (human gate UI) depends on M1 (gate data store) and M3 (V2 workflow with human gate)
- Milestone 7 (online test) depends on M5 and M6
- Milestone 8 (E2E test) depends on M6; can start in parallel with M7
- Milestone 9 (hardening) depends on M7 and M8

## V2 Workflow Seeding Strategy

- `CODING_WORKFLOW_V2` is seeded alongside existing workflows (additive, not replacing)
- Existing spaces are not affected (idempotent seeding)
- V2 gets `tag: 'default'` so workflow selector ranks it first for coding-type requests
- Existing `CODING_WORKFLOW` (V1) kept for backward compatibility
- **V1→V2 migration is out of scope**

## Worktree Strategy

- **One worktree per task** (shared by all agents in that task — planner, coder, reviewer, QA all work in the same worktree)
- **Short, human-readable folder names**: `alpha-3`, `nova-7`, `flux-2` — short adjective + dash + number (similar to Codex naming)
- The worktree name does NOT need to associate with session IDs — the DB links everything
- Folder name just needs to be unique and memorable
- Agents work sequentially in the task worktree, so no conflicts

## Total Estimated Task Count

~30 tasks across 9 milestones
Loading
Loading