diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/00-overview.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/00-overview.md new file mode 100644 index 000000000..73fd518ce --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/00-overview.md @@ -0,0 +1,356 @@ +# Space Feature: End-to-End Happy Path + +## Goal Summary + +Make the happy path for a single space with a single task using a single workflow work end-to-end: human converses with Space Agent, creates a task, Space Agent selects the default coding workflow, and the workflow runs through the full pipeline with proper gate enforcement, agent-to-agent messaging, and completion detection. + +**Scope constraints**: Single task, single space, single workflow run. No goals/missions involved. + +## Target Workflow Pipeline + +``` +Planning → [check: prUrl exists] → Plan Review (1 reviewer) → [check: approved] → Coding → [check: prUrl exists] → 3 Code Reviewers (parallel) → [check: ≥3 approve votes] → QA → Done +``` + +## Core Architecture: Channels + Gates + +**CRITICAL DESIGN DECISION**: The Space workflow uses a Channel + Gate model instead of a complex state machine. This is fundamentally simpler and more composable than tracking many states with complex transition rules. + +### Channels and Gates Are Separate Concepts + +- A **Channel** is a unidirectional pipe between two nodes: `{ from, to }`. Just a connection. No condition logic. +- A **Gate** is an optional filter attached to a channel: `{ condition, data }`. It controls when communication can flow through the channel. + +A channel **without** a gate is always open — messages flow freely. +A channel **with** a gate is filtered — communication only flows when the gate's condition passes. + +For **bidirectional** communication between two nodes, create TWO channels (one per direction), each with its own optional gate. 
+
+```typescript
+// Channel = just a pipe between nodes
+interface Channel {
+  id: string;
+  from: string; // source node ID
+  to: string; // target node ID
+  gateId?: string; // optional — if absent, channel is always open
+  isCyclic?: boolean; // for feedback loops
+}
+
+// Gate = independent filter entity, referenced by channels via gateId
+// A single gate can be shared by multiple channels (e.g., code-pr-gate shared by 3 reviewer channels)
+interface Gate {
+  id: string;
+  condition: GateCondition; // what to check — composable predicates
+  data: Record<string, unknown>; // persistent data store — agents read/write this
+  allowedWriterRoles: string[]; // who can write — ['planner'], ['reviewer'], etc.
+  description: string; // human-readable — "Write your PR URL here after creating the PR"
+  resetOnCycle: boolean; // whether data is cleared when a cyclic channel fires
+}
+```
+
+**Example — bidirectional planner ↔ reviewer**:
+- Channel A: `planner → reviewer`, Gate: `plan-pr-gate` (condition: `check: prUrl exists`) — reviewer can't start until planner creates a PR
+- Channel B: `reviewer → planner`, No gate — always open for feedback messages
+
+### Composable Conditions
+
+Instead of a type hierarchy, conditions are small pluggable predicates that check gate data:
+
+```typescript
+type GateCondition =
+  | { type: 'check'; field: string; op?: '==' | '!=' | 'exists'; value?: unknown } // check a single field
+  | { type: 'count'; field: string; matchValue: unknown; min: number } // count matching entries in a map
+  | { type: 'all'; conditions: GateCondition[] } // AND — all sub-conditions must pass
+  | { type: 'any'; conditions: GateCondition[] } // OR — at least one sub-condition must pass
+```
+
+**No `always` condition type** — a channel without a gate is implicitly always open.
Four condition types cover every gate behavior: + +| Gate use case | Condition | Example | +|---------------|-----------|---------| +| PR created | `{ type: 'check', field: 'prUrl', op: 'exists' }` | Passes when `data.prUrl` is truthy | +| Human approval | `{ type: 'check', field: 'approved', op: '==', value: true }` | Passes when `data.approved === true` | +| QA passed | `{ type: 'check', field: 'result', op: '==', value: 'passed' }` | Passes when `data.result === 'passed'` | +| QA failed (cyclic) | `{ type: 'check', field: 'result', op: '==', value: 'failed' }` | Passes when `data.result === 'failed'` | +| Review rejected (cyclic) | `{ type: 'check', field: 'result', op: '==', value: 'rejected' }` | Passes when `data.result === 'rejected'` | +| 3 reviewer votes | `{ type: 'count', field: 'votes', matchValue: 'approve', min: 3 }` | Passes when ≥3 entries in `data.votes` equal `'approve'` | +| Composite AND | `{ type: 'all', conditions: [...] }` | Passes when ALL sub-conditions pass | +| Composite OR | `{ type: 'any', conditions: [...] }` | Passes when ANY sub-condition passes | + +The `all`/`any` types are **recursive** — sub-conditions can themselves be `all`/`any`, enabling arbitrarily complex logic. For the current V2 workflow, simple `check`/`count` conditions suffice, but composite conditions enable future gates like "PR exists AND CI passes" in a single gate. + +**Why this is better**: Channels are simple pipes. Gates are optional filters. No class hierarchy, no separate evaluator per type. One `evaluate(gate)` function with a recursive switch on `condition.type`. Adding a new behavior = defining a new condition config, not a new class. A channel without a gate replaces the old `always` condition type. 
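The single recursive evaluator described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned implementation: the names `evaluateCondition` and `GateData` are hypothetical, an omitted `op` is assumed to mean `exists`, and `exists` is assumed to mean "neither null/undefined nor an empty string".

```typescript
// Illustrative sketch only. `GateData` and `evaluateCondition` are hypothetical
// names; the plan itself only calls for one evaluate function with a switch.
type GateData = Record<string, unknown>;

type GateCondition =
  | { type: 'check'; field: string; op?: '==' | '!=' | 'exists'; value?: unknown }
  | { type: 'count'; field: string; matchValue: unknown; min: number }
  | { type: 'all'; conditions: GateCondition[] }
  | { type: 'any'; conditions: GateCondition[] };

function evaluateCondition(condition: GateCondition, data: GateData): boolean {
  switch (condition.type) {
    case 'check': {
      const actual = data[condition.field];
      // Assumption: a missing `op` is treated as 'exists'.
      const op = condition.op ?? 'exists';
      // Assumption: "exists" means not null/undefined and not an empty string.
      if (op === 'exists') return actual != null && actual !== '';
      return op === '==' ? actual === condition.value : actual !== condition.value;
    }
    case 'count': {
      const { field, matchValue, min } = condition;
      const map = data[field];
      // A missing or non-object field counts as zero matching entries.
      const values: unknown[] =
        typeof map === 'object' && map !== null ? Object.values(map) : [];
      return values.filter((v) => v === matchValue).length >= min;
    }
    case 'all':
      // AND: every sub-condition must pass (an empty list passes).
      return condition.conditions.every((c) => evaluateCondition(c, data));
    case 'any':
      // OR: at least one sub-condition must pass (an empty list fails).
      return condition.conditions.some((c) => evaluateCondition(c, data));
  }
}

// Usage: a three-vote quorum evaluated against two approvals and one rejection.
const quorum: GateCondition = { type: 'count', field: 'votes', matchValue: 'approve', min: 3 };
const twoApprovals: GateData = {
  votes: {
    'reviewer-1-node': 'approve',
    'reviewer-2-node': 'approve',
    'reviewer-3-node': 'reject',
  },
};
console.log(evaluateCondition(quorum, twoApprovals)); // false: only 2 entries equal 'approve'
```

Because `all` and `any` simply recurse into the same function, a composite such as "PR exists AND quorum reached" needs no extra code path, only a nested condition config.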
+
+### Structured `send_message` Data
+
+The `send_message` MCP tool carries structured data alongside natural language text:
+
+```typescript
+{
+  text: string, // natural language message
+  data?: Record<string, unknown> // structured data (extensible)
+}
+```
+
+Use cases:
+- Reviewer sends feedback: `{ text: 'Approved with suggestions', data: { reviewUrl, gateData: { 'review-votes-gate': { votes: { 'reviewer-1-node': 'approve' } } } } }`
+- Planner sends plan: `{ text: 'Plan ready for review', data: { planDocPath, prUrl } }`
+- Gate data updates can be embedded in message `data` and applied on delivery through the channel
+
+### Gate Data Store
+
+Gates persist their data to SQLite in a dedicated `gate_data` table (keyed by `runId + gateId`), separate from channel and gate definitions. This separation ensures: (a) gate data changes frequently during a run while definitions are static, (b) gate data is per-run while definitions are per-workflow, and (c) atomic reads/writes without deserializing a JSON blob.
+
+**Gate data examples**:
+- PR gate: `{ prUrl: 'https://github.com/...', prNumber: 123, branch: 'feat/xyz' }`
+- Human approval gate: `{ approved: true, approvedBy: 'user123', approvedAt: '2025-...' }`
+- Vote gate: `{ votes: { 'reviewer-1-node': 'approve', 'reviewer-2-node': 'approve', 'reviewer-3-node': 'approve' } }`
+- QA result gate: `{ result: 'passed', summary: '...' }`
+
+### Gate and Channel Discovery
+
+Agents discover the workflow topology via MCP tools and injected context:
+1. **`list_channels` MCP tool**: Returns all channels for the current workflow run with their IDs, from/to nodes, and whether they have a gate attached.
+2. **`list_gates` MCP tool**: Returns all gates with their IDs, conditions, descriptions, and current data.
+3. **`read_gate` / `write_gate` MCP tools**: Read from and write to gate data stores.
+4. 
**Workflow context injection**: When a node agent is spawned, the `TaskAgentManager` injects a `workflowContext` section into the agent's task message containing: upstream/downstream channel and gate IDs with human-readable descriptions. + +### Gate Write Permissions + +Each gate has an `allowedWriterRoles` list (persisted in the gate definition): +- `plan-pr-gate`: `['planner']` +- `plan-approval-gate`: `['human']` (written via RPC, not MCP tool) +- `code-pr-gate`: `['coder']` +- `review-votes-gate`: `['reviewer']` +- `review-reject-gate`: `['reviewer']` +- `qa-result-gate`: `['qa']` +- `qa-fail-gate`: `['qa']` + +When an unauthorized agent calls `write_gate`, the tool returns an error: `"Permission denied: role '{role}' cannot write to gate '{gateId}'"`. The authorization check uses the agent's `nodeRole` from the MCP server config. + +### WorkflowRunStatus Strategy + +The current `WorkflowRunStatus` type is: `'pending' | 'in_progress' | 'completed' | 'cancelled' | 'needs_attention'`. Rather than adding `'failed'`, **all failure scenarios use the existing `'needs_attention'` status** with a structured `failureReason` field added to `SpaceWorkflowRun`: + +```typescript +failureReason?: 'humanRejected' | 'maxIterationsReached' | 'nodeTimeout' | 'agentCrash'; +``` + +This avoids a cross-cutting type change that would affect the status machine, repository, RPC handlers, and all consumers. + +### Identification and Navigation + +#### Numeric Task IDs + +Tasks use **auto-incrementing numeric IDs** instead of UUIDs, scoped per space: + +```typescript +// SpaceTask table +interface SpaceTask { + id: string; // UUID — internal primary key for FK references (workflow runs, worktrees, etc.) + taskNumber: number; // space-scoped numeric ID — human-facing (auto-assigned via MAX+1) + spaceId: string; // UUID of the parent space + title: string; + // ... 
other fields +} +``` + +- **Human-friendly**: "task 5" instead of "task 550e8400-e29b-41d4-a716-446655440000" +- **Space-scoped**: uniqueness within a space is sufficient. Cross-space refs use `spaceId + taskNumber`. +- **Monotonically increasing**: auto-assigned via `MAX(task_number) + 1` per space. Numbers are NOT contiguous — gaps may appear when tasks are deleted (e.g., 1, 2, 5 if tasks 3 and 4 were deleted). Like GitHub issue numbers. +- **Easy to reference**: in UI, agent prompts, logs, messages, and GitHub-style references (`neokai-dev#5`) + +#### Space Slugs + +Spaces have a **slug** column as an alternative, human-readable identifier: + +```typescript +interface Space { + id: string; // UUID — primary key (unchanged) + slug: string; // UNIQUE indexed column — auto-generated from name, editable + name: string; + // ... other fields +} +``` + +- **UUID remains primary key** internally — no migration risk +- **Slug is a unique indexed column** — alternative lookup for navigation +- **Auto-generated from space name** on creation using the same slugification rules as worktree slugs (lowercase, hyphens, max 60 chars, collision suffix) +- **Editable** by the user after creation +- **Navigation**: `/space/neokai-dev` instead of `/space/{uuid}` + +#### Reference Hierarchy + +This creates a clean, GitHub-inspired reference model: + +``` +neokai-dev → space (slug) +neokai-dev#5 → task (space slug + numeric ID) +space/add-dark-mode-support → worktree branch (task-title slug) +``` + +### Why This Is Simpler + +Instead of a state machine with many states and complex transition rules, we have: + +1. **Nodes** execute agents (one at a time or in parallel) +2. **Channels** are unidirectional pipes between nodes (bidirectional = two channels) +3. **Gates** are optional filters on channels — all the same type, all the same API +4. 
The workflow "state" is just: which nodes are active + what data is in each gate + +Adding new behaviors = adding new gates with new condition configs, not new gate classes or state transitions. Making a channel always open = just remove the gate. + +## Current State Analysis + +### What Already Exists (Working Infrastructure) + +1. **Space data model**: `Space`, `SpaceTask`, `SpaceWorkflow`, `SpaceWorkflowRun`, `SpaceAgent` types in `packages/shared/src/types/space.ts` — fully defined with channels, gates, multi-agent nodes. + +2. **Space CRUD**: `SpaceManager`, `SpaceAgentManager`, `SpaceWorkflowManager`, `SpaceTaskManager` — all backed by SQLite repos with reactive DB notifications. + +3. **Built-in workflows**: `CODING_WORKFLOW` (Plan -> Code -> Verify -> Done with human gate), `RESEARCH_WORKFLOW`, `REVIEW_ONLY_WORKFLOW` in `packages/daemon/src/lib/space/workflows/built-in-workflows.ts`. Seeded at space creation time. + +4. **Preset agents**: Coder, General, Planner, Reviewer — seeded via `seedPresetAgents()` at space creation. + +5. **Channel routing**: `ChannelRouter` with gate evaluation (`always`, `human`, `condition`, `task_result`), `ChannelResolver` for channel topology, `ChannelGateEvaluator`. + +6. **Agent-centric messaging**: Node agents use `send_message` (channel-validated), `report_done`, `list_peers`, `list_reachable_agents` via MCP tools. + +7. **Task Agent**: Session-level orchestrator (`TaskAgentManager`) that spawns sub-sessions per workflow node, monitors completion via `CompletionDetector`, handles lazy node activation. + +8. **Custom agent factory**: `createCustomAgentInit()` builds `AgentSessionInit` from `SpaceAgent` config with proper system prompts, tools, and role-based defaults. + +9. **Space Runtime**: `SpaceRuntime` with tick loop, executor map, rehydration, completion detection, and notification sink. + +10. 
**Space chat agent**: Conversational coordinator in `packages/daemon/src/lib/space/agents/space-chat-agent.ts` that can `start_workflow_run`, `create_standalone_task`, `suggest_workflow`, `list_workflows`, etc. + +11. **E2E tests**: Space creation, workflow visual editor, multi-agent editor, export/import, agent-centric workflow tests. + +12. **Online tests**: `task-agent-lifecycle.test.ts`, `space-agent-coordination.test.ts`. + +### What Needs to Be Built / Fixed + +1. **Separate Channel + Gate architecture**: The existing implementation couples gates into channels. Refactor so channels are simple pipes and gates are independent entities optionally attached to channels. A channel without a gate is always open. + +2. **Composable conditions**: Replace the current per-type evaluator logic with four condition types (`check`, `count`, `all`, `any`) that cover all workflow behaviors including AND/OR composition. No `always` type — a channel without a gate is implicitly always open. + +3. **Extended workflow template**: Create `CODING_WORKFLOW_V2` matching the target pipeline with gates configured via conditions. + +4. **Node agent prompt specialization**: Node agents need proper system prompts with git workflow, PR creation, review posting, gate data writing. + +5. **Parallel reviewer support**: The workflow needs 3 reviewer nodes that run in parallel, with a vote-counting gate requiring all 3 to approve before QA runs. + +6. **QA agent step**: Verification agent that checks test coverage, CI status, and PR mergeability. + +7. **Approval gate UI with canvas visualization**: Live workflow visualization on a canvas. Clicking an approval gate (`plan-approval-gate`) opens an artifacts view showing all changes in the worktree. + +8. **Worktree isolation (one per task)**: Currently no worktree isolation exists. 
Need ONE worktree per task (shared by all agents in that task), with folder/branch names derived from the task title via slugification (e.g., task "Add dark mode support" → folder `add-dark-mode-support`, branch `space/add-dark-mode-support`).
+
+9. **Channel and Gate MCP tools**: Agents need `list_channels`, `list_gates`, `read_gate`, `write_gate` MCP tools to discover the workflow topology and interact with gate data stores. `send_message` needs a structured `data` field alongside text.
+
+10. **End-to-end integration testing**: No single test exercises the full pipeline.
+
+11. **Numeric task IDs**: Replace UUID-based task IDs with auto-incrementing numeric IDs (space-scoped). Human-friendly: "task 5" instead of "task 550e8400-e29b...".
+
+12. **Space slugs**: Add slug column to spaces for human-readable navigation (`/space/neokai-dev` instead of `/space/{uuid}`). Enables GitHub-style task references: `neokai-dev#5`.
+
+## High-Level Approach
+
+**Phase 1 — Unified Gate architecture and workflow template** (Milestones 1-4):
+- Implement separated Channel + Gate architecture with composable conditions and data store
+- Enhance node agent prompts with gate interaction instructions
+- Create extended CODING_WORKFLOW_V2 with the full pipeline
+- Implement worktree isolation (one per task, short names)
+
+**Phase 2 — QA, approval gate UI, and completion** (Milestones 5-6):
+- Add QA node to the pipeline
+- Build approval gate canvas UI with artifacts view and diff rendering
+- Wire completion flow so Task Agent reports final status
+- Implement conversation-to-task entry point
+
+**Phase 3 — End-to-end testing and hardening** (Milestones 7-9):
+- Online integration tests with dev proxy
+- E2E Playwright test exercising the full UI flow
+- Bug fixes and hardening
+
+## Milestones
+
+1. 
**Core architecture — Channels, Gates, and data model** — Implement channels as simple pipes and gates as independent entities with persistent data stores, four condition types (`check`, `count`, `all`/`any` for AND/OR composition), MCP tools (`list_channels`/`list_gates`/`read_gate`/`write_gate`, structured `send_message`), channel router integration, numeric task IDs (space-scoped auto-increment), and space slugs (human-readable navigation) + +2. **Enhanced node agent prompts** — Add git/PR/review-specific system prompts for planner, coder, reviewer, and QA agents, including gate data interaction instructions + +3. **Extended coding workflow (V2)** — Create CODING_WORKFLOW_V2 with the full pipeline using separated channels and gates with composable conditions + +4. **Worktree isolation (one per task)** — Implement single worktree per task with task-title-derived slug names (e.g., `add-dark-mode-support`), shared by all agents in the task + +5. **QA agent node** — Add QA as the verification step before Done, with QA→Code feedback loop + +6. **Approval gate canvas UI** — Build live workflow canvas visualization with clickable approval gates (`plan-approval-gate`) that show artifacts view with file diffs (GitHub Actions-style but with human-in-the-loop) + +7. **Online integration test** — Exercise the full happy path with dev proxy, broken into focused per-component sub-tests + +8. **E2E test** — Playwright test exercising the full UI flow from space chat through task creation and workflow execution + +9. 
**Bug fixes and hardening** — Fix issues discovered during testing; add error handling and edge case coverage + +## Final Workflow Graph + +``` +Planning ──[check: prUrl exists]──► Plan Review (1 reviewer) ──[check: approved]──► Coding ──[check: prUrl exists]──► Reviewer 1 ─┐ + ▲ Reviewer 2 ─┼─[count: votes.approve ≥ 3]──► QA ──[check: result == passed]──► Done + │ Reviewer 3 ─┘ │ + │ │ + └──────────── [check: result == failed, cyclic] ────────────────────────────────────────────────┘ + │ │ + └── [check: result == rejected, cyclic]┘ +``` + +**Gate data flow**: +- Planner writes `{ prUrl, prNumber, branch }` → `plan-pr-gate` condition `check: prUrl exists` passes +- Plan reviewer reads plan PR from `plan-pr-gate` data +- Human clicks approve → `plan-approval-gate` data gets `{ approved: true }` → condition `check: approved == true` passes +- Coder writes `{ prUrl, prNumber, branch }` → `code-pr-gate` condition `check: prUrl exists` passes +- Each reviewer writes `{ votes: { [nodeId]: 'approve' | 'reject' } }` → `review-votes-gate` condition `count: votes.approve >= 3` passes when quorum met +- QA reads PR from `code-pr-gate` data, writes `{ result: 'passed' | 'failed', summary: '...' }` → `qa-result-gate` + +**All cyclic channels route back to Coding, never to Planning.** This ensures: +- Code-level issues (review feedback, QA failures) are fixed by the Coder directly without re-planning +- The approval gate (`plan-approval-gate`) only fires once (Plan Review → Coding), not on every iteration +- The Coder can iterate on feedback from both reviewers and QA independently + +**Iteration cap**: `maxIterations` is a global counter on the workflow run, incremented each time ANY cyclic channel is traversed. When the cap is reached, the workflow transitions to `needs_attention` with `failureReason: 'maxIterationsReached'`. + +**Gate data reset on cycles**: When a cyclic channel fires, gates with `resetOnCycle: true` have their data cleared to `{}`. 
This ensures reviewers must re-vote from scratch after the Coder fixes issues. Gates with `resetOnCycle: false` (like `code-pr-gate`) preserve their data across cycles. See M1 Task 1.4 for implementation. + +## Cross-Milestone Dependencies + +- Milestone 1 (channels + gates) is the foundation — M2 and M3 depend on it +- Milestone 2 (prompts) depends on M1 (agents need gate MCP tools) AND M3 (prompts reference specific gate IDs). **M2 should be implemented after M3.** +- Milestone 3 (V2 workflow) depends on M1 (unified gate must exist) +- Milestone 4 (worktree) can start in parallel with M2/M3 +- Milestone 5 (QA) depends on M3 (V2 workflow template must exist) +- Milestone 6 (approval gate UI) depends on M1 (gate data store) and M3 (V2 workflow with `plan-approval-gate`) +- Milestone 7 (online test) depends on M5 and M6 +- Milestone 8 (E2E test) depends on M6; can start in parallel with M7 +- Milestone 9 (hardening) depends on M7 and M8 + +## V2 Workflow Seeding Strategy + +- `CODING_WORKFLOW_V2` is seeded alongside existing workflows (additive, not replacing) +- Existing spaces are not affected (idempotent seeding) +- V2 gets `tag: 'default'` so workflow selector ranks it first for coding-type requests +- Existing `CODING_WORKFLOW` (V1) kept for backward compatibility +- **V1→V2 migration is out of scope** + +## Worktree Strategy + +- **One worktree per task** (shared by all agents in that task — planner, coder, reviewer, QA all work in the same worktree) +- **Task-title-derived slug names**: The worktree folder name is a slugified version of the task title (e.g., task "Add dark mode support" → `add-dark-mode-support`). This is self-documenting — you can see which worktree belongs to which task at a glance. 
+- Slugification: lowercase, hyphens for spaces/special chars, max 60 chars, collision suffix (`-2`, `-3`) if needed +- Agents work sequentially in the task worktree, so no conflicts +- **Branch naming**: `space/{slug}` (e.g., `space/add-dark-mode-support`) — same slug as the folder name, making worktrees and branches easy to correlate +- **Cleanup timing**: Worktrees are kept until the PR is merged or the task is explicitly deleted by the human. A TTL-based reaper (default: 7 days after workflow completion) cleans up stale worktrees. Immediate cleanup only on task cancellation. + +## Total Estimated Task Count + +~37 tasks across 9 milestones diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/01-gate-data-store.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/01-gate-data-store.md new file mode 100644 index 000000000..9848b131f --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/01-gate-data-store.md @@ -0,0 +1,305 @@ +# Milestone 1: Core Architecture — Channels, Gates, and Data Model + +## Goal and Scope + +Implement the core Channel + Gate architecture and foundational data model changes. Channels are simple unidirectional pipes; gates are independent entities with persistent data stores and composable conditions. No class hierarchy of gate types — one Gate concept, four condition types (including `all`/`any` composition), one set of MCP tools. Also adds numeric task IDs and space slugs for human-friendly identification. This is the foundation that all other milestones build on. + +## Architecture + +### Channels and Gates Are Separate Concepts + +A **Channel** is a unidirectional pipe between two nodes. A **Gate** is an independent filter entity referenced by channels via `gateId`. A single gate can be shared by multiple channels (e.g., `code-pr-gate` is shared by 3 reviewer channels). 
+
+```typescript
+// In packages/shared/src/types/space.ts
+
+// Channel = just a pipe between nodes
+interface Channel {
+  id: string;
+  from: string; // source node ID
+  to: string; // target node ID
+  gateId?: string; // optional — references a Gate by ID. If absent, channel is always open.
+  isCyclic?: boolean; // for feedback loops
+}
+
+// Gate = independent filter entity, referenced by channels via gateId
+// No back-reference to channel — a gate can be shared by multiple channels
+interface Gate {
+  id: string; // e.g., 'plan-pr-gate', 'review-votes-gate'
+  condition: GateCondition; // composable predicate — NOT a class hierarchy
+  data: Record<string, unknown>; // persistent data store (SQLite)
+  allowedWriterRoles: string[]; // who can write — ['planner'], ['reviewer'], etc.
+  description: string; // human-readable — injected into agent task messages
+  resetOnCycle: boolean; // whether data resets when a cyclic channel fires
+}
+
+// Four condition types cover ALL gate behaviors (including composition)
+// No 'always' type — a channel without a gate is implicitly always open
+type GateCondition =
+  | { type: 'check'; field: string; op?: '==' | '!=' | 'exists'; value?: unknown }
+  | { type: 'count'; field: string; matchValue: unknown; min: number }
+  | { type: 'all'; conditions: GateCondition[] } // AND — all must pass
+  | { type: 'any'; conditions: GateCondition[] } // OR — at least one must pass
+```
+
+### Bidirectional Communication
+
+Each channel is unidirectional. 
For bidirectional flow, create TWO channels (one per direction), each with its own optional gate:
+
+```
+planner ──[plan-pr-gate]──► reviewer   (gated: reviewer can't start until PR exists)
+planner ◄─────────────────── reviewer   (no gate: feedback flows freely)
+```
+
+### Structured `send_message` Data
+
+`send_message` carries structured data alongside natural language text:
+
+```typescript
+{
+  text: string, // natural language message
+  data?: Record<string, unknown> // structured data (extensible)
+}
+```
+
+Gate data updates can be embedded in message `data` and applied on delivery through the channel.
+
+### Condition Evaluation
+
+One `evaluate(gate)` function with a switch on `condition.type`:
+
+- **`check`**: Checks a single field in `gate.data`.
+  - `op: 'exists'` (default if no `op`): `data[field] != null && data[field] !== ''`
+  - `op: '=='`: `data[field] === value`
+  - `op: '!='`: `data[field] !== value`
+- **`count`**: Counts entries in a map field that match a value.
+  - `Object.values(data[field] || {}).filter(v => v === matchValue).length >= min`
+- **`all`**: AND composition — `conditions.every(c => evaluate(c, gate.data))`.
+  - Empty `conditions` array returns `true` (vacuous truth).
+- **`any`**: OR composition — `conditions.some(c => evaluate(c, gate.data))`.
+  - Empty `conditions` array returns `false`.
+
+For channels without a gate: no evaluation needed — always open.
+
+### How Each Workflow Gate Maps to Conditions
+
+| Gate ID | Referenced by Channel(s) | Condition Config | Passes when... |
+|---------|------------------------|-----------------|----------------|
+| `plan-pr-gate` | `ch-plan-to-review` | `{ type: 'check', field: 'prUrl' }` | Planner writes PR URL |
+| `plan-approval-gate` | `ch-review-to-coding` | `{ type: 'check', field: 'approved', op: '==', value: true }` | Human approves |
+| `code-pr-gate` | `ch-coding-to-rev1`, `ch-coding-to-rev2`, `ch-coding-to-rev3` (shared) | `{ type: 'check', field: 'prUrl' }` | Coder writes PR URL |
+| `review-votes-gate` | `ch-rev1-to-qa`, `ch-rev2-to-qa`, `ch-rev3-to-qa` (shared) | `{ type: 'count', field: 'votes', matchValue: 'approve', min: 3 }` | ≥3 reviewers approve |
+| `review-reject-gate` | `ch-rev-to-coding` | `{ type: 'check', field: 'result', op: '==', value: 'rejected' }` | Any reviewer rejects |
+| `qa-result-gate` | `ch-qa-to-done` | `{ type: 'check', field: 'result', op: '==', value: 'passed' }` | QA passes |
+| `qa-fail-gate` | `ch-qa-to-coding` | `{ type: 'check', field: 'result', op: '==', value: 'failed' }` | QA fails |
+
+**Note**: These are all the same Gate entity with different condition configs. No `PRGate`, `AggregateGate`, `HumanGate` classes. Channels without gates (e.g., feedback channels) are always open.
+
+## Tasks
+
+### Task 1.1: Implement Separated Channel + Gate Types and Data Store Schema
+
+**Description**: Replace the existing coupled gate/channel system in `packages/shared/src/types/space.ts` with separated `Channel` and `Gate` interfaces. Add the `gate_data` SQLite table for persistent data stores.
+
+**Subtasks**:
+1. Audit the existing gate types in `packages/shared/src/types/space.ts` — currently supports `always`, `human`, `condition`, `task_result` as separate types coupled into channels
+2. Define the `Channel` interface: `{ id, from, to, gateId?, isCyclic? }` — a simple unidirectional pipe. No condition logic. A channel without `gateId` is always open.
+3. Define the `Gate` interface: `{ id, condition: GateCondition, data, allowedWriterRoles, description, resetOnCycle }` — an independent entity referenced by channels via `gateId`. No back-reference to channel — a gate can be shared by multiple channels (e.g., `code-pr-gate` shared by 3 reviewer channels).
+4. Define the `GateCondition` discriminated union with four types: `check`, `count`, `all` (AND composition), `any` (OR composition). No `always` type — a channel without a gate is implicitly always open. The `all`/`any` types are recursive — `conditions` is `GateCondition[]`, enabling arbitrarily nested logic.
+5. Create a dedicated `gate_data` table in SQLite keyed by `(run_id, gate_id)` with a JSON `data` column. Rationale: (a) gate data changes frequently during a run while gate definitions are static, (b) gate data is per-run while gate definitions are per-workflow template, (c) separate table enables atomic reads/writes without JSON blob deserialization, (d) concurrent writes (e.g., 3 reviewers voting) benefit from row-level granularity.
+6. Add `allowedWriterRoles: string[]` to the gate definition schema (static, per-gate)
+7. Add `resetOnCycle: boolean` to the gate definition schema — controls whether data is cleared on cyclic channel traversal
+8. Update `send_message` MCP tool to accept structured data alongside text: `{ text: string, data?: Record<string, unknown> }`. Gate data updates can be embedded in message `data`.
+9. Add `failureReason` optional field to `SpaceWorkflowRun` interface: `failureReason?: 'humanRejected' | 'maxIterationsReached' | 'nodeTimeout' | 'agentCrash'`. All failure scenarios use existing `'needs_attention'` status with this field.
+10. Migrate existing gate definitions to the new separated format. Map each old gate type: 
Map each old gate type: + - `always` → remove gate, set channel's `gateId` to `undefined` (channel without gate = always open) + - `human` → gate with `{ type: 'check', field: 'approved', op: '==', value: true }` + - `condition` (shell expression) → gate with `{ type: 'check', field: 'result', op: '==', value: 'passed' }`. The old shell-expression-based condition evaluation is replaced by explicit field checks — agents write the condition result to the gate data store instead of the system evaluating a shell expression. + - `task_result` → gate with `{ type: 'check', field: 'result', op: '==', value: 'passed' }` (same mapping as `condition` — both checked a result value) + - Ensure backward compatibility: existing workflow runs in progress at migration time should continue to work. For in-flight runs, populate `gate_data` table from the old gate state. +11. Unit tests: type validation, schema creation, data persistence round-trip, gate_data table CRUD, channel-without-gate routing, backward-compatible migration + +**Acceptance Criteria**: +- Channels and gates are separate entities — channels are simple pipes, gates are optional filters +- A channel without a gate is always open (replaces old `always` condition type) +- Four condition types (`check`, `count`, `all`, `any`) cover all gate behaviors including composition +- `send_message` accepts structured `data` field +- Gate data persisted to SQLite `gate_data` table and survives daemon restart +- Existing definitions are migrated to separated format +- Unit tests verify persistence round-trip, gateless channels, and migration + +**Depends on**: nothing + +**Agent type**: coder + +--- + +### Task 1.2: Implement Unified Gate Evaluator + +**Description**: Implement a single `evaluate(gate)` function that handles all four condition types (including recursive `all`/`any`). Replace the existing per-type evaluator logic in `ChannelGateEvaluator`. 
For channels without a gate, no evaluation is needed — they are always open. + +**Subtasks**: +1. Create `evaluateGate(gate: Gate): boolean` function: + - Switch on `gate.condition.type` + - `check` → read `gate.data[field]`, apply op (`exists`, `==`, `!=`) + - `count` → read `gate.data[field]` as a map, count values matching `matchValue`, check `>= min` + - `all` → recursively evaluate all sub-conditions, return `true` only if ALL pass. Empty array → `true` (vacuous truth). + - `any` → recursively evaluate all sub-conditions, return `true` if ANY passes. Empty array → `false`. +2. Add `isChannelOpen(channel: Channel): boolean` helper: if `channel.gateId` is absent → return `true` (always open); otherwise → look up gate and call `evaluateGate()` +3. Refactor `ChannelGateEvaluator` to call `isChannelOpen()` instead of per-type logic +4. Ensure the evaluator reads from the gate's `data` store (from `gate_data` table), not from workflow run config +5. Handle edge cases: missing field → `check` with `exists` returns false; missing map field → `count` returns 0 +6. Remove the old per-type evaluator code paths (`human`, `pr`, `aggregate`, `task_result`, `always` as separate branches) +7. Unit tests for each condition type with various data states, including edge cases (null data, empty map, missing field) +8. Unit tests for composite conditions: `all` with mixed pass/fail sub-conditions, `any` with mixed pass/fail, nested `all`/`any`, empty arrays +9. 
Unit tests for gateless channels: verify `isChannelOpen()` returns `true` when no gate is attached + +**Acceptance Criteria**: +- Single `evaluateGate()` function handles all conditions including recursive `all`/`any` +- Channels without a gate are always open (no `always` condition type needed) +- No separate evaluator per gate type — one code path with a 4-way switch +- All existing gate behaviors continue to work (verified by backward-compat tests) +- Unit tests cover all condition types, edge cases, and gateless channels + +**Depends on**: Task 1.1 + +**Agent type**: coder + +--- + +### Task 1.3: Implement Channel and Gate MCP Tools + +**Description**: Create MCP tools that allow node agents to discover channels and gates, read from and write to gate data stores, and send structured messages. These tools are added to the `node-agent-tools` MCP server. All gates use the same tools — no type-specific APIs. + +**Subtasks**: +1. Add `list_channels` MCP tool to `node-agent-tools`: + - Parameters: none (uses the current workflow run context from the MCP server config) + - Returns: array of `{ channelId, from, to, gateId?, isCyclic }` for all channels in the run + - Agents call this at session start to understand the workflow topology +2. Add `list_gates` MCP tool to `node-agent-tools`: + - Parameters: none + - Returns: array of `{ gateId, condition, description, allowedWriterRoles, currentData }` for all gates in the run +3. Add `read_gate` MCP tool to `node-agent-tools`: + - Parameters: `{ gateId: string }` + - Returns: the gate's current `data` object from the `gate_data` table +4. Add `write_gate` MCP tool to `node-agent-tools`: + - Parameters: `{ gateId: string, data: Record }` (merge semantics — new keys added, existing keys updated) + - **Authorization check**: reads calling agent's `nodeRole` from MCP server config, compares against gate's `allowedWriterRoles`. 
Unauthorized → error: `"Permission denied: role '{role}' cannot write to gate '{gateId}'"` + - Persists updated data to `gate_data` table + - Triggers gate re-evaluation (may unblock the channel this gate is attached to) +5. Update `send_message` MCP tool to accept structured data: `{ text: string, data?: Record }`. The `data` field is extensible and can carry gate data updates, PR URLs, review metadata, etc. On delivery through a channel, if `data` contains gate-targeted updates, they are applied to the appropriate gate's data store. +6. Wire tools into `TaskAgentManager` with workflow run context (runId, channel definitions, gate definitions) +7. **Workflow context injection**: When `TaskAgentManager.spawnSubSession()` creates a node agent, inject `workflowContext` into the task message containing: upstream/downstream channel IDs and gate IDs, condition descriptions, and human-readable instructions (e.g., "code-pr-gate: write your PR URL here after creating the PR") +8. **Vote keys**: For gates using `count` condition (vote counting), use `nodeId` (not `agentId`) as the map key. Prevents collision if an agent is re-spawned after a crash. +9. 
Unit tests: list_channels, list_gates, read/write round-trip, permission enforcement, gate re-evaluation on write, structured send_message data delivery, vote key collision handling + +**Acceptance Criteria**: +- `list_channels` returns all channels; `list_gates` returns all gates — separate queries reflecting the separated architecture +- All gates use the same `read_gate`/`write_gate` tools — no type-specific APIs +- `send_message` accepts structured `data` alongside text +- Writing to a gate triggers re-evaluation (may unblock the attached channel) +- Permission model prevents unauthorized writes (clear error message) +- Workflow context injection provides channel and gate IDs in task message +- Unit tests verify all tool behaviors + +**Depends on**: Task 1.2 + +**Agent type**: coder + +--- + +### Task 1.4: Integrate Separated Channels + Gates with Channel Router + +**Description**: Update the `ChannelRouter` to use the separated channel/gate architecture. Channels without gates are always open. Channels with gates use `evaluateGate()` for routing decisions. Implement gate data reset on cyclic traversal using the `resetOnCycle` flag. + +**Subtasks**: +1. Update `ChannelRouter.deliverMessage()` to use `isChannelOpen(channel)`: if channel has no gate → always deliver; if channel has a gate → call `evaluateGate(gate)` using the gate's data store +2. Add `onGateDataChanged(gateId)` method that triggers re-evaluation of the channel the gate is attached to +3. When a gated channel transitions from blocked → open, activate the target node. Gateless channels activate the target node immediately. +4. Handle vote-counting gates: multiple agents write to the same gate. Each write triggers re-evaluation, but only the final vote meeting the `min` threshold unblocks the channel. +5. **Implement `resetOnCycle` behavior**: When the `ChannelRouter` traverses a cyclic channel, reset the `data` to `{}` for all downstream gates where `resetOnCycle === true`. 
Specifically in the V2 workflow: + - `review-votes-gate` (`resetOnCycle: true`) → resets to `{}` — all 3 reviewers must re-vote + - `review-reject-gate` (`resetOnCycle: true`) → resets to `{}` + - `qa-result-gate` (`resetOnCycle: true`) → resets to `{}` + - `qa-fail-gate` (`resetOnCycle: true`) → resets to `{}` + - `code-pr-gate` (`resetOnCycle: false`) → **preserved** (PR URL doesn't change) + - The reset is atomic with the cyclic traversal (same SQLite transaction) +6. Ensure gate data changes are persisted before evaluation (SQLite transactions for atomic read-evaluate-write) +7. Handle concurrent writes (e.g., 3 reviewers voting simultaneously): serialize via SQLite write lock, re-evaluate after each write +8. Unit tests: gateless channel always delivers, gated channel blocks until condition passes, gate transition triggers node activation, vote-counting gate with incremental writes, concurrent write handling, **resetOnCycle behavior** (verify data cleared for resetOnCycle:true, preserved for resetOnCycle:false) + +**Acceptance Criteria**: +- Channel router uses `isChannelOpen()` for all routing — gateless channels always open, gated channels use `evaluateGate()` +- Gate data changes trigger re-evaluation of the attached channel and potential node activation +- Vote-counting gates handle incremental writes correctly +- `resetOnCycle` flag controls which gates are cleared on cyclic traversal +- No race conditions (SQLite transactions) +- Concurrent writes serialized correctly +- Unit tests cover all scenarios including gateless channels and reset behavior + +**Depends on**: Task 1.3 + +**Agent type**: coder + +--- + +### Task 1.5: Implement Numeric Task IDs + +**Description**: Replace UUID-based task IDs with auto-incrementing numeric IDs, scoped per space. This makes tasks human-friendly ("task 5" instead of "task 550e8400-...") and enables GitHub-style references (`neokai-dev#5`). + +**Subtasks**: +1. 
Update the `SpaceTask` schema to carry a human-facing numeric ID alongside the existing UUID `id`. The numeric sequence is space-scoped (each space has its own counter), so it cannot be an `INTEGER PRIMARY KEY AUTOINCREMENT` column, whose counter is table-wide in SQLite. +2. **Implementation approach**: Add a `task_number` column (`INTEGER NOT NULL`) to the existing `space_tasks` table, with a `UNIQUE(space_id, task_number)` constraint. The UUID `id` column stays as the internal primary key for foreign key references (workflow runs, worktrees, etc.), while `task_number` is the human-facing numeric ID. This avoids a risky migration of all FK references from UUID to integer. +3. Auto-assign `task_number` on task creation: `SELECT COALESCE(MAX(task_number), 0) + 1 FROM space_tasks WHERE space_id = ?`. Run the `SELECT` and the `INSERT` in a single transaction to prevent race conditions. Note that `MAX + 1` reuses a number if the highest-numbered task in a space is deleted; if numbers must never be reused, keep a per-space counter instead. +4. Update `SpaceTask` TypeScript interface to include `taskNumber: number` +5. Update `SpaceTaskManager.createTask()` to auto-assign the numeric ID +6. Update all RPC handlers that return tasks to include `taskNumber` +7. Add `getTaskByNumber(spaceId, taskNumber)` lookup method to `SpaceTaskManager` +8. Update agent prompts and workflow context to use numeric task IDs (e.g., "Working on task #5") +9. Unit tests: auto-increment, space-scoped uniqueness, concurrent creation, lookup by number + +**Acceptance Criteria**: +- Tasks have auto-incrementing numeric IDs scoped per space +- Numeric IDs are monotonically increasing within a space (gaps may appear on deletion, like GitHub issue numbers) +- Agents and UI reference tasks by number ("task #5") +- UUID remains as internal primary key for FK references +- Lookup by `(spaceId, taskNumber)` works +- Unit tests verify auto-increment and uniqueness + +**Depends on**: nothing (parallel with other M1 tasks) + +**Agent type**: coder + +--- + +### Task 1.6: Implement Space Slugs + +**Description**: Add a `slug` column to spaces as an alternative, human-readable identifier for navigation. Auto-generated from the space name, editable by the user. + +**Subtasks**: +1.
Add `slug TEXT NOT NULL UNIQUE` column to the `spaces` table. Add a unique index for fast lookup. +2. Create the shared `slugify()` utility in `packages/daemon/src/lib/space/slug.ts`: + - `slugify(input: string, existingSlugs: string[]): string` — converts any input string to a valid slug, appending `-2`/`-3`/etc. if the base slug already exists + - Slugification rules: lowercase, replace spaces/non-alphanumeric with hyphens, collapse consecutive hyphens, strip leading/trailing hyphens, truncate to max 60 chars at word boundary, collision suffix (`-2`, `-3`) + - If input is empty or produces an empty slug, fall back to `unnamed-space` + - This utility is reused by M4 Task 4.1 (worktree naming) and any future slug needs +3. Auto-generate slug from space name on creation: `slugify(spaceName, existingSlugs)` +4. Add `updateSlug(spaceId, newSlug)` method to `SpaceManager` — validates uniqueness, format (lowercase, hyphens, max 60 chars) +5. Add `getSpaceBySlug(slug)` lookup method to `SpaceManager` +6. Update `space.create` RPC to auto-generate slug; add `space.updateSlug` RPC for user editing +7. Update frontend routing to support `/space/{slug}` alongside `/space/{uuid}` — try slug lookup first, fall back to UUID +8. Backfill slugs for existing spaces (migration): generate slug from `name` for each space, handle collisions with suffix +9. 
Unit tests: auto-generation, uniqueness, editable slug update, lookup by slug, backfill migration, collision handling + +**Acceptance Criteria**: +- Spaces have a unique slug column (e.g., `neokai-dev`) +- Slug auto-generated from name on creation +- Slug editable by user (with uniqueness validation) +- Navigation works via `/space/{slug}` +- Existing spaces get backfilled slugs +- GitHub-style task references possible: `neokai-dev#5` (space slug + task number) +- Unit tests verify all behaviors + +**Depends on**: nothing (parallel with other M1 tasks) + +**Agent type**: coder diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/02-enhanced-node-agent-prompts.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/02-enhanced-node-agent-prompts.md new file mode 100644 index 000000000..ba1448483 --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/02-enhanced-node-agent-prompts.md @@ -0,0 +1,107 @@ +# Milestone 2: Enhanced Node Agent Prompts + +## Goal and Scope + +Upgrade the system prompts for planner, coder, reviewer, and QA node agents in the Space system. Prompts must include git workflow, PR management, review posting, and, critically, instructions for interacting with gate data stores via the `list_gates`/`read_gate`/`write_gate` MCP tools. + +**Dependency note**: M2 depends on both M1 (unified gate with MCP tools) and M3 (V2 workflow template). Prompts reference specific gate IDs from the V2 workflow (e.g., `code-pr-gate`, `review-votes-gate`). **Implement M3 before M2** so the concrete gate IDs exist. The prompts use the gate IDs injected via workflow context (M1 Task 1.3 subtask 7) and reference them by the `description` field from the gate definitions. Since all gates use the same `read_gate`/`write_gate` tools, prompts don't need type-specific instructions; they only need to say "write data to gate X".
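
Because every gate is written through the same `write_gate` tool, the authorization and merge behavior that the prompts rely on can be sketched once. The block below is a minimal in-memory sketch, not the actual `node-agent-tools` implementation: `writeGate`, `GateDef`, `GateData`, and the `Map`-backed `gateStore` are hypothetical names standing in for the MCP handler and the `gate_data` table. The one-level merge of nested maps such as `votes` is an assumption; the plan only states "merge semantics" for top-level keys.

```typescript
// Hypothetical, in-memory stand-in for the `write_gate` MCP tool.
type GateData = Record<string, unknown>;

interface GateDef {
  id: string;
  allowedWriterRoles: string[];
}

// Stand-in for the `gate_data` table, keyed by `${runId}:${gateId}`.
const gateStore = new Map<string, GateData>();

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function writeGate(runId: string, gate: GateDef, role: string, patch: GateData): GateData {
  // Authorization: the caller's role must be listed on the gate definition.
  if (!gate.allowedWriterRoles.includes(role)) {
    throw new Error(`Permission denied: role '${role}' cannot write to gate '${gate.id}'`);
  }
  const key = `${runId}:${gate.id}`;
  const current = gateStore.get(key) ?? {};
  const next: GateData = { ...current };
  for (const [k, v] of Object.entries(patch)) {
    const prev = current[k];
    // Assumption: nested maps (e.g. `votes`) merge one level deep, so one
    // reviewer's vote does not overwrite another reviewer's vote.
    next[k] = isPlainObject(prev) && isPlainObject(v) ? { ...prev, ...v } : v;
  }
  gateStore.set(key, next);
  return next;
}
```

With a purely shallow merge, two reviewers each writing `{ votes: { [nodeId]: ... } }` would replace each other's `votes` map, which is why the sketch merges one level deeper. Whichever semantics the real tool adopts, the prompts should describe the same write shape.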
+ +**Gate discovery pattern**: All prompts include a standard preamble: "At session start, call `list_gates` to discover available gates and their IDs. Your task message also includes a `workflowContext` block with your upstream/downstream gate IDs." + +## Tasks + +### Task 2.1: Enhance Coder Node Agent System Prompt + +**Description**: Update `buildCustomAgentSystemPrompt()` in `packages/daemon/src/lib/space/agents/custom-agent.ts` to include full git workflow instructions, PR creation, and gate data writing — mirroring the Room system's `buildCoderSystemPrompt()`. + +**Subtasks**: +1. Read `packages/daemon/src/lib/room/agents/coder-agent.ts` (`buildCoderSystemPrompt()`) and identify all prompt sections +2. Add bypass markers section (RESEARCH_ONLY, VERIFICATION_COMPLETE, etc.) for role 'coder' +3. Add review feedback handling: how to fetch GitHub reviews, verify feedback, push fixes +4. Add PR creation flow with duplicate prevention (`gh pr list --head`) +5. **Add gate interaction instructions**: After creating a PR, the coder must call `write_gate` on `code-pr-gate` to write PR data (`{ prUrl, prNumber, branch }`). The gate's `check: prUrl exists` condition then passes, unblocking the reviewer channel. Same `write_gate` tool as every other gate — no type-specific API. +6. 
Add instructions for reading upstream gate data: the coder should call `read_gate` on `plan-pr-gate` to understand the plan before coding + +**Acceptance Criteria**: +- Coder agents produce the same quality of git/PR workflow as Room coder agents +- Coder writes PR data to gate after creating PR (triggers reviewer activation) +- Coder reads plan gate data to understand the plan +- Unit tests pass for updated prompt builder + +**Depends on**: Milestone 1 (gate MCP tools) and Milestone 3 (V2 workflow template with concrete gate IDs) + +**Agent type**: coder + +--- + +### Task 2.2: Enhance Planner Node Agent System Prompt + +**Description**: Create a specialized planner prompt that includes plan document creation, PR management, and gate data writing. + +**Subtasks**: +1. Add `buildPlannerNodeAgentPrompt()` in `custom-agent.ts` for role 'planner' +2. Include plan document creation instructions (explore codebase, write plan, create PR) +3. **Add gate interaction instructions**: After creating a plan PR, the planner must call `write_gate` on `plan-pr-gate` to write PR data (`{ prUrl, prNumber, branch }`). The gate's `check: prUrl exists` condition then passes, unblocking the plan review channel. +4. Add instructions for `send_message` to communicate with plan reviewers +5. Ensure the prompt works with `injectWorkflowContext` flag + +**Acceptance Criteria**: +- Planner creates plan documents on feature branches with PRs +- Planner writes PR data to `plan-pr-gate` (triggers plan review activation) +- Unit tests cover the new prompt builder + +**Depends on**: Milestone 1 (gate MCP tools) and Milestone 3 (V2 workflow template with concrete gate IDs) + +**Agent type**: coder + +--- + +### Task 2.3: Enhance Reviewer Node Agent System Prompt + +**Description**: Create a specialized reviewer prompt for posting PR reviews with severity classification and writing votes to the `review-votes-gate`. + +**Subtasks**: +1. Add `buildReviewerNodeAgentPrompt()` in `custom-agent.ts` for role 'reviewer' +2.
Include PR review process: read changed files, evaluate correctness/completeness/security +3. Add review posting via REST API (`GH_PAGER=cat gh api repos/{owner}/{repo}/pulls/{pr}/reviews`) +4. Add structured output format: `---REVIEW_POSTED---` block with URL, recommendation, severity counts +5. **Add gate interaction instructions**: Reviewer reads `code-pr-gate` (via `read_gate`) to find the PR URL, then after reviewing, writes its vote to `review-votes-gate` via `write_gate` using its **nodeId** as the vote key: `{ votes: { [nodeId]: 'approve' | 'reject' } }`. Using nodeId (not agentId) prevents collision on re-spawn. The gate's `count: votes.approve >= 3` condition evaluates after each write. A rejecting reviewer also writes `{ result: 'rejected', feedback: '...' }` to `review-reject-gate`, which gates the cyclic channel back to Coding (see Milestone 3). +6. When 3 reviewers all write 'approve', the `review-votes-gate` condition passes and QA is activated +7. **Add edge case guidance**: Instruct the reviewer to check current vote state via `read_gate` on `review-votes-gate` before voting. If re-spawned, check if already voted and update/confirm. + +**Acceptance Criteria**: +- Reviewer reads PR URL from gate data +- Reviewer posts proper PR reviews with severity classification +- Reviewer writes vote to `review-votes-gate` +- Unit tests cover the prompt builder + +**Depends on**: Milestone 1 (gate MCP tools) and Milestone 3 (V2 workflow template with concrete gate IDs) + +**Agent type**: coder + +--- + +### Task 2.4: Create QA Agent System Prompt + +**Description**: Build a specialized system prompt for the QA agent that checks test coverage, runs tests, verifies CI status, and writes results to `qa-result-gate`. + +**Subtasks**: +1. Add `buildQaNodeAgentPrompt()` in `custom-agent.ts` for role 'qa' +2. Include instructions for: + - Test command detection (package.json scripts, Makefile targets, fallback commands) + - Checking CI status via `gh pr checks` or `gh pr view --json statusCheckRollup` + - Verifying PR mergeability via `gh pr view --json mergeable,mergeStateStatus` + - Checking for merge conflicts +3.
**Add gate interaction instructions**: QA reads `code-pr-gate` to find the PR, then writes result to `qa-result-gate` via `write_gate({ result: 'passed' | 'failed', summary: '...' })`. The gate's `check: result == passed` condition evaluates after the write. On failure, QA also writes `{ result: 'failed' }` to `qa-fail-gate`, which gates the cyclic channel back to Coding (see Milestone 3). +4. Include structured output format for QA results +5. Add `gh` CLI auth verification instructions + +**Acceptance Criteria**: +- QA agent has comprehensive verification prompt +- QA reads PR URL from gate data +- QA writes result to `qa-result-gate` +- Unit tests cover the prompt builder + +**Depends on**: Milestone 1 (gate MCP tools) and Milestone 3 (V2 workflow template with concrete gate IDs) + +**Agent type**: coder diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/03-extended-coding-workflow.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/03-extended-coding-workflow.md new file mode 100644 index 000000000..87b035f2b --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/03-extended-coding-workflow.md @@ -0,0 +1,178 @@ +# Milestone 3: Extended Coding Workflow (V2) + +## Goal and Scope + +Create `CODING_WORKFLOW_V2` with the full pipeline using separated channels and gates with composable conditions. Channels are simple pipes; gates are optional filters attached to channels. This uses the Channel + Gate architecture from Milestone 1.
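
For orientation, the Milestone 1 evaluator that this workflow relies on can be sketched as follows. This is a minimal sketch under stated assumptions, not the planned implementation: `evaluateCondition` is a hypothetical helper, `isChannelOpen` takes the gate lookup as an extra `gates` parameter for self-containment, `data` is typed as `Record<string, unknown>`, an omitted `op` is treated as `exists`, and a `gateId` that resolves to no gate is treated as closed. The shorthand `count: votes.approve >= 3` is read here as `field: 'votes'`, `matchValue: 'approve'`, `min: 3`.

```typescript
type GateCondition =
  | { type: 'check'; field: string; op?: '==' | '!=' | 'exists'; value?: unknown }
  | { type: 'count'; field: string; matchValue: unknown; min: number }
  | { type: 'all'; conditions: GateCondition[] }
  | { type: 'any'; conditions: GateCondition[] };

interface Gate {
  id: string;
  condition: GateCondition;
  data: Record<string, unknown>; // assumed value type
}

interface Channel {
  id: string;
  from: string;
  to: string;
  gateId?: string;
}

// Hypothetical helper: evaluates one condition against a gate's data store.
function evaluateCondition(cond: GateCondition, data: Record<string, unknown>): boolean {
  switch (cond.type) {
    case 'check': {
      const actual = data[cond.field];
      const op = cond.op ?? 'exists'; // assumption: default operator
      if (op === 'exists') return actual !== undefined && actual !== null;
      return op === '==' ? actual === cond.value : actual !== cond.value;
    }
    case 'count': {
      const map = data[cond.field];
      // A missing or non-object field counts as zero matches.
      const matches =
        typeof map === 'object' && map !== null
          ? Object.values(map).filter((v) => v === cond.matchValue).length
          : 0;
      return matches >= cond.min;
    }
    case 'all': // empty array is vacuously true
      return cond.conditions.every((c) => evaluateCondition(c, data));
    case 'any': // empty array is false
      return cond.conditions.some((c) => evaluateCondition(c, data));
  }
}

function evaluateGate(gate: Gate): boolean {
  return evaluateCondition(gate.condition, gate.data);
}

// A channel without a gate is always open.
function isChannelOpen(channel: Channel, gates: Map<string, Gate>): boolean {
  if (!channel.gateId) return true;
  const gate = gates.get(channel.gateId);
  return gate ? evaluateGate(gate) : false; // assumption: unknown gate stays closed
}
```

Because the three reviewer channels share one `review-votes-gate`, each vote written to that gate changes the answer for all three channels at once, which is what lets QA activate only when the third approval arrives.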
+ +## Target Pipeline + +``` +Planning ──[check: prUrl]──► Plan Review ──[check: approved]──► Coding ──[check: prUrl]──► Reviewer 1 ─┐ + ▲ Reviewer 2 ─┼─[count: votes.approve ≥ 3]──► QA ──[check: result == passed]──► Done + │ Reviewer 3 ─┘ │ + │ │ + └── [check: result == rejected/failed, cyclic] ─────────────────────┘ +``` + +### Node Definitions + +| Node | Agent Role | Parallel | Description | +|------|-----------|----------|-------------| +| Planning | planner | no | Creates plan document, opens plan PR | +| Plan Review | reviewer | no (single for MVP) | Reviews the plan PR | +| Coding | coder | no | Implements the plan, opens code PR | +| Reviewer 1/2/3 | reviewer | yes (3 parallel) | Review code PR independently | +| QA | qa | no | Runs tests, checks CI, verifies mergeability | +| Done | - | no | Terminal node, Task Agent summarizes | + +### Channel Definitions + +Channels are simple unidirectional pipes. Gates are independent entities optionally attached to channels. A channel without a gate is always open. Channels that share the same gate instance are noted below. 
+ +| Channel ID | From → To | Gate ID | Cyclic | Description | +|------------|-----------|---------|--------|-------------| +| `ch-plan-to-review` | Planning → Plan Review | `plan-pr-gate` | no | Gated: planner writes `{ prUrl }` | +| `ch-review-to-coding` | Plan Review → Coding | `plan-approval-gate` | no | Gated: human approves plan | +| `ch-coding-to-rev1` | Coding → Reviewer 1 | `code-pr-gate` | no | **Shared gate**: all 3 reviewer channels | +| `ch-coding-to-rev2` | Coding → Reviewer 2 | `code-pr-gate` | no | (same gate instance) | +| `ch-coding-to-rev3` | Coding → Reviewer 3 | `code-pr-gate` | no | (same gate instance) | +| `ch-rev1-to-qa` | Reviewer 1 → QA | `review-votes-gate` | no | **Shared gate**: all 3 reviewers vote here | +| `ch-rev2-to-qa` | Reviewer 2 → QA | `review-votes-gate` | no | (same gate instance) | +| `ch-rev3-to-qa` | Reviewer 3 → QA | `review-votes-gate` | no | (same gate instance) | +| `ch-qa-to-done` | QA → Done | `qa-result-gate` | no | Gated: QA passes | +| `ch-qa-to-coding` | QA → Coding | `qa-fail-gate` | yes | Gated: QA fails, feedback to coder | +| `ch-rev-to-coding` | Reviewers → Coding | `review-reject-gate` | yes | Gated: any reviewer rejects | + +### Gate Definitions + +All gates are independent entities — they differ only in their `condition` config. A gate has no back-reference to channels; channels reference gates via `gateId`. A single gate can be shared by multiple channels. 
+ +| Gate ID | Condition | `resetOnCycle` | `allowedWriterRoles` | +|---------|-----------|----------------|---------------------| +| `plan-pr-gate` | `check: prUrl exists` | false | `['planner']` | +| `plan-approval-gate` | `check: approved == true` | false | `['human']` | +| `code-pr-gate` | `check: prUrl exists` | false | `['coder']` | +| `review-votes-gate` | `count: votes.approve >= 3` | true | `['reviewer']` | +| `review-reject-gate` | `check: result == rejected` | true | `['reviewer']` | +| `qa-result-gate` | `check: result == passed` | true | `['qa']` | +| `qa-fail-gate` | `check: result == failed` | true | `['qa']` | + +**Reject vs. votes gates**: `review-votes-gate` and `review-reject-gate` are **separate gate instances** with different conditions: +- `review-votes-gate`: condition `count: votes.approve >= 3`. Passes when all 3 approve. +- `review-reject-gate`: condition `check: result == rejected`. Any reviewer that rejects writes `{ result: 'rejected', feedback: '...' }` here, firing the cyclic channel back to Coding. +- A reviewer writes to BOTH gates: vote to `review-votes-gate`, and if rejecting, rejection to `review-reject-gate`. + +**Gate data reset on cycles**: Uses the `resetOnCycle` flag (see M1). Gates with `resetOnCycle: true` have their data cleared to `{}` when any cyclic channel fires. Gates with `resetOnCycle: false` (like `code-pr-gate`) preserve their data. This ensures reviewers must re-vote from scratch after a fix. + +### Iteration Cap + +- `maxIterations: 5` (higher than before because the pipeline is longer) +- Global counter per workflow run, incremented on each cyclic channel traversal +- When exhausted: workflow transitions to `needs_attention` with `failureReason: 'maxIterationsReached'` + +## Tasks + +### Task 3.1: Define CODING_WORKFLOW_V2 Template + +**Description**: Create the new workflow template in `built-in-workflows.ts` with all nodes, channels (as simple pipes), and gates (as independent entities attached to channels). 
+ +**Subtasks**: +1. Define node ID constants for all 8 nodes (Planning, Plan Review, Coding, Reviewer 1/2/3, QA, Done) +2. Define the Planning node with `agentId: 'planner'` +3. Define the Plan Review node with `agentId: 'reviewer'` +4. Define the Coding node with `agentId: 'coder'` +5. Define 3 Reviewer nodes with `agentId: 'reviewer'`, marked as parallel +6. Define the QA node with `agentId: 'qa'` +7. Define the Done node (terminal) +8. Define all channels per the Channel Definitions table — each channel is a simple pipe with `from`, `to`, optional `gateId`, and `isCyclic` flag +9. Define all gates per the Gate Definitions table — each gate is an independent entity with `condition` config (`check` or `count`), `allowedWriterRoles`, `resetOnCycle` flag, and `description`. Gates have no back-reference to channels — channels reference gates via `gateId`. Note: V2 uses only `check` and `count` conditions; `all`/`any` composites are available for future workflows. +10. Set `maxIterations: 5` on the workflow template +11. Mark cyclic channels with `isCyclic: true` + +**Acceptance Criteria**: +- Workflow template has 8 nodes with correct agent assignments +- Channels are simple pipes — no condition logic in channels +- Gates are independent entities attached to channels via `gateId` +- Channel and gate topology matches the specification exactly +- `check` conditions used for PR URL, approval, and result gates +- `count` condition used for vote-counting gate (`review-votes-gate`) +- 3 reviewer nodes are marked as parallel +- Cyclic channels are marked correctly +- `maxIterations: 5` is set +- Unit test validates the full template structure + +**Depends on**: Milestone 1 (separated channels + gates must exist) + +**Agent type**: coder + +--- + +### Task 3.2: Update Workflow Seeding + +**Description**: Update `seedBuiltInWorkflows` to seed `CODING_WORKFLOW_V2` alongside existing workflows. Add QA to preset agents. + +**Subtasks**: +1. 
Add QA to `PRESET_AGENTS` in `seed-agents.ts`: + - Role: `'qa'` + - Tools: `['Read', 'Bash', 'Grep', 'Glob']` (read-only + bash for running tests) + - Description: "QA agent. Verifies test coverage, CI pipeline status, and PR mergeability." +2. Update `seedBuiltInWorkflows` to also seed `CODING_WORKFLOW_V2` (additive, not replacing) +3. V2 gets `tag: 'default'` so workflow selector ranks it first +4. Existing `CODING_WORKFLOW` (V1) kept for backward compatibility +5. Verify idempotent seeding (no duplicates on re-seed) + +**Acceptance Criteria**: +- QA agent is seeded alongside Coder, General, Planner, Reviewer +- V2 workflow is seeded alongside V1 +- V2 has `tag: 'default'` +- Seeding is idempotent +- Unit tests validate seeding + +**Depends on**: Task 3.1 + +**Agent type**: coder + +--- + +### Task 3.3: Implement Parallel Node Execution + +**Description**: Update `TaskAgentManager` to support parallel node execution. When the Coding node completes and `code-pr-gate` opens, all 3 reviewer nodes should activate simultaneously. + +**Subtasks**: +1. Update `TaskAgentManager.activateNode()` to handle multiple target nodes from a single gate transition +2. When `code-pr-gate` passes (condition `check: prUrl exists`) with 3 downstream channels, spawn all 3 reviewer sessions simultaneously +3. Each reviewer session operates in the same task worktree (read-only for reviewers) +4. Track parallel node completion: each reviewer writes its vote to the shared `review-votes-gate` +5. The gate's `count: votes.approve >= 3` condition evaluates after each write — only activates QA when threshold is met +6. 
Unit tests: parallel activation, incremental voting, quorum detection + +**Acceptance Criteria**: +- 3 reviewer nodes activate simultaneously when `code-pr-gate` condition passes +- Each reviewer writes its vote independently to `review-votes-gate` +- QA activates only when `count` condition meets threshold (≥3 approve votes) +- Parallel sessions don't interfere with each other +- Unit tests cover parallel execution and voting + +**Depends on**: Task 3.1, Milestone 1 (channel + gate evaluator) + +**Agent type**: coder + +--- + +### Task 3.4: Update Space Chat Agent for V2 Workflow + +**Description**: Update the `suggest_workflow` logic to prefer `CODING_WORKFLOW_V2` as the default for coding tasks. + +**Subtasks**: +1. Verify V2's `tag: 'default'` makes it the top suggestion for coding tasks +2. If no selector logic exists, implement it in the Space chat agent's MCP tools +3. Ensure backward compatibility with existing spaces + +**Acceptance Criteria**: +- `suggest_workflow` returns V2 as top match for coding tasks +- Existing spaces unaffected +- Unit tests for workflow selection + +**Depends on**: Task 3.2 + +**Agent type**: coder diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/04-worktree-isolation.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/04-worktree-isolation.md new file mode 100644 index 000000000..4a4bdbf94 --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/04-worktree-isolation.md @@ -0,0 +1,126 @@ +# Milestone 4: Worktree Isolation (One Per Task) + +## Goal and Scope + +Implement git worktree isolation for Space tasks. Each task gets **one worktree** shared by all agents in that task (planner, coder, reviewer, QA all work in the same worktree). 
Worktree folder names and branch names are derived from the task title via slugification, making them self-documenting (e.g., task "Add dark mode support" → folder `add-dark-mode-support`, branch `space/add-dark-mode-support`). + +## Key Design Decisions + +1. **One worktree per task, not per agent**. Agents that write to the worktree (planner, coder) run sequentially, and the three parallel reviewers from M3 Task 3.3 treat it as read-only, so there are no conflicts. This is simpler to manage and matches the Room system's approach. + +2. **Task-title-based naming**. The worktree folder name and git branch name are both derived from the task title using a slugification function. The same slug is used for both, making it easy to identify which worktree/branch corresponds to which task. Example: task "Fix login timeout bug" → slug `fix-login-timeout-bug` → folder `.worktrees/fix-login-timeout-bug/` → branch `space/fix-login-timeout-bug`. + +3. **Slugification rules**: Lowercase, replace spaces/special chars with hyphens, collapse consecutive hyphens, strip leading/trailing hyphens, max 60 characters (truncate at word boundary), append `-2`/`-3`/etc. suffix if the slug already exists among active worktrees. + +4. **Worktree lifecycle**: Created when task workflow starts. **Not immediately cleaned up on completion**: the worktree is kept until the PR is merged or the task is explicitly deleted, since the human may want to review code locally or the Coder may need follow-up commits. A TTL-based reaper (default: 7 days after workflow completion) cleans up stale worktrees. Immediate cleanup only on task cancellation. + +## Tasks + +### Task 4.1: Worktree Slug Generation (Wraps Shared Slugify) + +**Description**: Create a worktree-specific slug generation wrapper that uses the shared `slugify()` utility from M1 Task 1.6 and adds worktree-specific behavior (empty title fallback to `task-{taskNumber}`). + +**Subtasks**: +1.
Create `packages/daemon/src/lib/space/worktree-slug.ts` that imports `slugify()` from `./slug.ts` (created in M1 Task 1.6) and wraps it with worktree-specific fallback behavior: + - `worktreeSlug(taskTitle: string, taskNumber: number, existingSlugs: string[]): string` + - Calls `slugify(taskTitle, existingSlugs)` for the core slugification + - If the title is empty or results in an empty slug after processing, falls back to `task-{taskNumber}` (the numeric task ID from M1 Task 1.5) +2. The returned slug is used for both the worktree folder name and the git branch name (prefixed with `space/`) +3. Unit tests: empty/whitespace-only titles produce `task-{taskNumber}` fallback, normal titles delegate to `slugify()`, collision handling with existing worktree slugs + +**Acceptance Criteria**: +- Reuses shared `slugify()` from M1 Task 1.6 (no duplication of slugification logic) +- Empty titles produce valid fallback slugs using `task-{taskNumber}` +- Unit tests verify worktree-specific behavior + +**Depends on**: M1 Task 1.5 (numeric task IDs for fallback), M1 Task 1.6 (shared slugify utility) + +**Agent type**: coder + +--- + +### Task 4.2: Implement Space Worktree Manager + +**Description**: Create a `SpaceWorktreeManager` that manages git worktrees for Space tasks. One worktree per task, created from the space's repository. + +**Subtasks**: +1. **Create a new `SpaceWorktreeManager`** in `packages/daemon/src/lib/space/`. The existing Room `WorktreeManager` uses `simple-git` with room-specific abstractions (`sessionId`, `WorktreeMetadata`, session group lifecycle). The Space system needs task-scoped worktrees with different lifecycle management (one per task, shared by all agents, TTL-based cleanup). Reuse `simple-git` directly (same dependency) but do NOT extend or modify the Room's `WorktreeManager` class. +2. 
`SpaceWorktreeManager` API: + - `createTaskWorktree(spaceId: string, taskId: string, taskTitle: string, baseBranch?: string): Promise<{ path: string, slug: string }>` — slugifies the task title, creates worktree, returns path and slug + - `removeTaskWorktree(spaceId: string, taskId: string): Promise` — cleans up + - `getTaskWorktreePath(spaceId: string, taskId: string): Promise` — looks up existing worktree for a task + - `listWorktrees(spaceId: string): Promise>` — lists all worktrees for a space + - `cleanupOrphaned(spaceId: string): Promise` — removes worktrees for completed/cancelled tasks +3. Worktree location: `{spaceWorkspacePath}/.worktrees/{slug}/` (e.g., `.worktrees/add-dark-mode-support/`) +4. Persist worktree ↔ task mapping in SQLite (table: `space_worktrees` with columns: `id`, `space_id`, `task_id`, `slug`, `path`, `created_at`) +5. Branch naming: `space/{slug}` (e.g., `space/add-dark-mode-support`) — the same slug used for the folder name, making it easy to correlate worktrees with branches +6. Unit tests: create, remove, lookup, list, orphan cleanup + +**Acceptance Criteria**: +- One worktree per task with task-title-derived slug name +- Worktree ↔ task mapping persisted in SQLite +- Create/remove/lookup/list/cleanup all work +- Unit tests cover lifecycle + +**Depends on**: Task 4.1 + +**Agent type**: coder + +--- + +### Task 4.3: Wire Worktree into TaskAgentManager + +**Description**: Update `TaskAgentManager.spawnSubSession()` to use the task's worktree instead of the raw space workspace path. + +**Subtasks**: +1. Before spawning the first node agent for a task, call `SpaceWorktreeManager.createTaskWorktree()` +2. Store the worktree path in the workflow run metadata +3. All subsequent `spawnSubSession()` calls for the same task use the same worktree path as `workspacePath` +4. **Do NOT immediately remove worktree on completion** — the PR may not be merged yet and the human may want to review locally. 
Instead, mark the worktree as `completed` in the `space_worktrees` table with a `completed_at` timestamp. +5. On workflow run cancellation, clean up the worktree immediately (no TTL — cancelled work is abandoned) +6. Implement TTL-based reaper: on daemon startup and periodically (every hour), remove worktrees where `completed_at` is older than 7 days (configurable). Also check if the associated PR has been merged — if so, clean up immediately regardless of TTL. +7. Store worktree path in the workflow run's gate data (so M6 `getGateArtifacts` RPC can find it for diff rendering) +8. On daemon restart, run `cleanupOrphaned()` to remove worktrees for cancelled/deleted tasks +9. Unit tests: worktree creation at run start, reuse across nodes, TTL-based cleanup, cancellation cleanup, orphan cleanup + +**Acceptance Criteria**: +- All node agents in a task share the same worktree +- Worktree is created at workflow run start +- Worktree is kept after completion (TTL-based cleanup, or immediate cleanup if PR merged) +- Cancellation cleans up worktree immediately +- Worktree path stored in workflow run metadata (for M6 artifacts RPC) +- Daemon restart cleans up orphaned worktrees +- Unit tests verify lifecycle including TTL reaper + +**Depends on**: Task 4.2 + +**Agent type**: coder + +--- + +### Task 4.4: Configure Feature Flags and Tool Access per Role + +**Description**: Node agents need specific feature flags and tool access based on their role. + +**Subtasks**: +1. Define feature flag profiles per role: + - `coder`: `rewind: false, worktree: false, coordinator: false, archive: false, sessionInfo: false` + - `reviewer`: same (tool restrictions handled by tool list, not feature flags) + - `planner`: same as coder + - `qa`: same as reviewer +2. 
Define tool access per role: + - `coder`: full tool access (Read, Write, Edit, Bash, Grep, Glob + MCP tools) + - `planner`: full tool access + - `reviewer`: read-only (Read, Bash, Grep, Glob — no Write/Edit) + - `qa`: read-only + bash for running tests (Read, Bash, Grep, Glob) +3. Apply in `TaskAgentManager.spawnSubSession()` and `createCustomAgentInit()` +4. Unit tests verify configuration per role + +**Acceptance Criteria**: +- Correct feature flags per role +- Reviewers and QA cannot use Write/Edit +- Unit tests verify configuration + +**Depends on**: nothing (parallel with other M4 tasks) + +**Agent type**: coder diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/05-qa-agent-node.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/05-qa-agent-node.md new file mode 100644 index 000000000..c75839ccf --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/05-qa-agent-node.md @@ -0,0 +1,97 @@ +# Milestone 5: QA Agent Node + +## Goal and Scope + +Wire the QA agent into the V2 workflow pipeline. QA sits after `review-votes-gate` (passes when `count: votes.approve >= 3`) and before Done. QA verifies test coverage, CI status, and PR mergeability. On failure, QA feeds back to Coding via a cyclic channel. + +## Feedback Topology + +``` +3 Reviewers ──[review-votes-gate: count >= 3]──► QA ──[qa-result-gate: pass]──► Done + │ + └──[qa-fail-gate: fail, cyclic]──► Coding +``` + +When QA fails, feedback goes **directly to Coding** (not through Review). After the Coder fixes, the full re-review cycle runs: Coding → 3 Reviewers → QA → Done. This ensures reviewers verify the fix. + +### Iteration Counter + +Both cyclic channels (Reviewer→Coding and QA→Coding) share the same global `maxIterations` counter. 
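A minimal sketch of the shared iteration counter described above, assuming one counter instance per workflow run. The class name `RunIterationCounter` and the method `tryCycle()` are hypothetical and not part of the plan; only `maxIterations` comes from the text.

```typescript
// Hypothetical sketch: a single counter per workflow run, consulted by every
// cyclic channel (Reviewer→Coding and QA→Coding) before it fires.
class RunIterationCounter {
  private iterations = 0;
  private readonly maxIterations: number;

  constructor(maxIterations: number) {
    this.maxIterations = maxIterations;
  }

  // Returns true and increments when another cycle is allowed;
  // returns false without incrementing once the cap has been reached.
  tryCycle(): boolean {
    if (this.iterations >= this.maxIterations) {
      return false;
    }
    this.iterations += 1;
    return true;
  }

  get count(): number {
    return this.iterations;
  }
}

// Both cyclic channels draw from the same budget.
const counter = new RunIterationCounter(2);
const reviewerCycle = counter.tryCycle(); // Reviewer→Coding fires
const qaCycle = counter.tryCycle();       // QA→Coding fires, same counter
const thirdCycle = counter.tryCycle();    // cap reached, cycle refused
```

When `tryCycle()` returns `false`, the run would presumably move to `needs_attention` with `failureReason: 'maxIterationsReached'`, but that wiring is outside this sketch.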
+ +## Tasks + +### Task 5.1: Wire QA into V2 Workflow + +**Description**: Ensure the QA node is correctly wired into the V2 workflow template (already defined in M3 Task 3.1) and that the QA→Coding feedback loop works. + +**Subtasks**: +1. Verify QA node exists in V2 template with correct agent assignment and tool access +2. Verify channels: `review-votes-gate`→QA (passes on `count: votes.approve >= 3`), QA→Done via `qa-result-gate` (passes on `check: result == passed`), QA→Coding via `qa-fail-gate` (passes on `check: result == failed`, cyclic) +3. Test the QA→Coding feedback loop: QA writes `{ result: 'failed', summary: '...' }` to `qa-fail-gate` → cyclic channel activates → Coding node re-activates +4. Verify the full re-review cycle after QA failure: Coding → 3 Reviewers → QA (all 3 reviewers must re-vote from scratch) +5. **Verify gate data reset via `resetOnCycle`**: When the QA→Coding cyclic channel fires, all gates with `resetOnCycle: true` have their data cleared to `{}` (M1 Task 1.4). Specifically: `review-votes-gate` (true) → `{}`, `qa-result-gate` (true) → `{}`, `review-reject-gate` (true) → `{}`, `qa-fail-gate` (true) → `{}`. `code-pr-gate` (`resetOnCycle: false`) is preserved. This ensures all 3 reviewers must re-vote from scratch. +6. Verify iteration counter increments on QA→Coding cycle +7. Unit tests for QA feedback loop + +**Acceptance Criteria**: +- QA node correctly wired in V2 pipeline +- QA failure feeds back to Coding via cyclic channel +- Full re-review cycle runs after Coder fixes QA issues +- Iteration counter is global across all cyclic channels +- Unit tests verify the feedback loop + +**Depends on**: Milestone 3 (V2 workflow template) + +**Agent type**: coder + +--- + +### Task 5.2: Implement Completion Flow + +**Description**: When QA passes and the Done node activates, the Task Agent produces a final summary for the human. + +**Subtasks**: +1. 
Verify `CompletionDetector` correctly detects when all nodes complete (QA passes → Done activates) +2. Ensure `SpaceRuntime` transitions the workflow run to `completed` status +3. Update the Task Agent prompt to produce a human-readable summary: + - What was implemented (from Coder's result) + - PR link and status (from `code-pr-gate` data) + - Review summary (from `review-votes-gate` data) + - QA verification status (from `qa-result-gate` data) + - Suggested next steps +4. Verify the Space chat agent surfaces the summary to the human +5. Unit tests for completion detection and summary generation + +**Acceptance Criteria**: +- Workflow run transitions to `completed` when QA passes +- Task Agent reads gate data to produce comprehensive summary +- Summary is surfaced in Space chat +- Unit tests verify completion flow + +**Depends on**: Task 5.1 + +**Agent type**: coder + +--- + +### Task 5.3: Space Chat Agent Task Creation from Conversation + +**Description**: Ensure the Space chat agent can create a task from conversation and start the V2 workflow. + +**Subtasks**: +1. Audit Space chat agent's intent recognition in `space-chat-agent.ts` +2. Verify: clear coding request → `create_standalone_task` → `start_workflow_run` with V2 +3. Verify: ambiguous request → ask for clarification (don't create task) +4. Verify task creation persists to DB +5. Verify workflow run starts: Planning node activates +6. 
Unit tests: task creation on clear request, no task on ambiguous request, correct workflow selection + +**Acceptance Criteria**: +- Clear coding request creates task and starts V2 workflow +- Ambiguous request triggers clarification +- Planning node activates on workflow start +- Unit tests cover decision logic + +**Depends on**: Task 5.1 (full pipeline must work) + +**Agent type**: coder diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/06-human-gate-canvas-ui.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/06-human-gate-canvas-ui.md new file mode 100644 index 000000000..937b27f9f --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/06-human-gate-canvas-ui.md @@ -0,0 +1,187 @@ +# Milestone 6: Approval Gate Canvas UI + +## Goal and Scope + +Build a live workflow canvas visualization where humans can see the running workflow instance, interact with approval gates, and review artifacts. The canvas is the primary way humans approve or reject at gates, with chat-based approval as a secondary path. It resembles the GitHub Actions workflow visualization, extended with human-in-the-loop approval gates.
+ +## UX Specification + +### Canvas Visualization + +The workflow runs as a live instance on a **canvas/visualization**: +- **Nodes** are the primary visual elements showing agent status (pending, active, completed, failed) +- **Channels** are rendered as directional lines/arrows between nodes (with arrowheads showing direction) +- **Gates** are visual elements rendered ON the channel line (not as separate nodes), like a valve on a pipe + - Gate states: blocked (red/gray lock icon), open (green check), waiting for human (amber pulsing) + - Channels without a gate show as plain directional arrows (always open) +- Active nodes pulse or animate to indicate work in progress +- Completed nodes show a checkmark with elapsed time +- Failed nodes show an error indicator +- **Gate editing (workflow template mode)**: User can drag a gate onto a channel line or click a "+" button on the line to add a gate. Removing a gate from a channel makes it always open. + +### Approval Gate Interaction + +When the workflow reaches an approval gate (`plan-approval-gate` with `check: approved == true`): +1. The gate indicator on the channel line highlights in the waiting-for-human state (amber, pulsing) +2. **Clicking the approval gate opens an artifacts view** showing all changes in the worktree +3. The artifacts view lists all changed files (like a PR diff view) +4. **Clicking individual changes renders the file or code diff** (side-by-side or unified diff view) +5. Approve/Reject buttons are prominently displayed in the artifacts view +6.
The human can also approve via chat as a secondary mechanism + +### Artifacts View Detail + +The artifacts view is essentially an embedded PR review interface: +- File tree showing all changed files (added, modified, deleted) +- Click a file → shows the diff (syntax highlighted, line numbers) +- Summary section: number of files changed, lines added/removed +- Context from the agent: what was done and why (read from gate data) +- For `plan-pr-gate`: shows the plan document diff +- For `code-pr-gate`: shows the code changes diff + +## Tasks + +### Task 6.1: Implement Approval Gate Backend + +**Description**: Implement the backend for the approval gate (`plan-approval-gate`): blocking, state persistence, approval RPC, and notification. + +**Subtasks**: +1. Implement `spaceWorkflowRun.approveGate` RPC handler: + - Accepts `{ runId, gateId, decision: 'approve' | 'reject' }` + - Writes to gate data store: `{ approved: true, approvedBy, approvedAt }` or `{ rejected: true, rejectedBy, reason }` + - **Idempotency**: If the gate is already approved/rejected (e.g., two humans click simultaneously), return success without re-writing. Use optimistic locking: read current gate data, check if already decided, only write if still `{ waiting: true }`. + - Triggers gate re-evaluation (which unblocks the channel) +2. Implement `spaceWorkflowRun.getGateArtifacts` RPC handler: + - Accepts `{ runId, gateId }` + - **Worktree path lookup**: Queries the `space_worktrees` table (M4) using the workflow run's task ID to find the worktree path. Falls back to workflow run metadata if the worktree table lookup fails. + - Returns: list of changed files in the task worktree, git diff summary, gate context data + - Uses `git diff` and `git status` in the task worktree to get changes + - **Guard**: If the worktree no longer exists (cleaned up), return an error with the PR URL from gate data so the human can review on GitHub instead +3. 
Implement `spaceWorkflowRun.getFileDiff` RPC handler: + - Accepts `{ runId, gateId, filePath }` + - Returns: unified diff for the specified file +4. Add workflow run status: when approval gate blocks → run stays `in_progress` but gate data shows `{ waiting: true }` + - No need for a new `waiting_for_approval` status — the gate data IS the state. The workflow run is `in_progress`, and the gate's data tells you it's waiting. +5. Handle rejection: gate data gets `{ rejected: true, reason }`. The workflow run transitions to `needs_attention` with `failureReason: 'humanRejected'`. (Uses existing `needs_attention` status, not a new `failed` status — see overview WorkflowRunStatus Strategy.) +6. Add `failureReason` field to `SpaceWorkflowRun` interface (if not already added in M1 Task 1.1): `failureReason?: 'humanRejected' | 'maxIterationsReached' | 'nodeTimeout' | 'agentCrash'` +7. Implement post-rejection recovery via `spaceWorkflowRun.restart` RPC +8. Ensure gate data (including waiting/approved/rejected state) persists across daemon restart +9. 
Unit tests: approval, rejection, concurrent approval idempotency, artifacts retrieval (with and without worktree), file diff, restart, persistence + +**Acceptance Criteria**: +- Approval gate blocks workflow and gate data shows `{ waiting: true }` +- Approval writes to gate data and unblocks downstream channel +- Concurrent approvals are idempotent (no double-write) +- Rejection transitions run to `needs_attention` with `failureReason: 'humanRejected'` +- Artifacts RPC returns changed files and diffs from task worktree (or error + PR URL if worktree gone) +- State persists across daemon restart +- Unit tests cover all flows including concurrent approval edge case + +**Depends on**: Milestone 1 (gate data store), Milestone 4 (worktree for diff access) + +**Agent type**: coder + +--- + +### Task 6.2: Implement Workflow Canvas Component + +**Description**: Create the live workflow canvas visualization that shows the running workflow instance with node statuses and gate states. + +**Subtasks**: +1. Create `packages/web/src/components/space/WorkflowCanvas.tsx`: + - Renders the workflow graph as a visual canvas with three visual element types: + - **Nodes**: boxes showing name, agent role, status (pending/active/completed/failed), elapsed time + - **Channels**: directional lines/arrows between nodes (with arrowheads). Plain lines for gateless channels. + - **Gates**: visual elements rendered ON the channel line (icon/indicator on the line, not a separate node). Gate states: blocked (lock icon, gray), open (check, green), waiting for human (amber, pulsing). + - Active nodes have animation/pulse effect +2. Subscribe to `workflow_run_status_changed` and `gate_data_changed` live queries for real-time updates +3. On initial load, query current workflow run state, channels, and gate data to render correct initial state +4. Layout algorithm: horizontal pipeline layout (left to right), with parallel nodes stacked vertically +5. 
Handle the 3 parallel reviewer nodes: show them stacked vertically, sharing the same `review-votes-gate` indicator on their converging channel to QA +6. **Mode switching**: The canvas has two modes, determined by context: + - **Runtime mode** (default when a workflow run is active): Read-only visualization. Shows live node statuses, gate states, and progress. Gates show open/closed/waiting indicators. No editing allowed. + - **Template mode** (when viewing/editing a workflow template with no active run): Editable. Allows adding/removing nodes, channels, and gates. + - The mode is determined automatically: if the Space has an active workflow run → runtime mode; if no active run and the user navigates to the workflow editor → template mode. No explicit toggle button needed — the context determines the mode. +7. **Gate editing in template mode**: When in template mode, allow: + - Click "+" button on any channel line to add a new gate + - Drag a gate from a palette onto a channel line + - Click a gate indicator to edit its condition config + - Remove a gate from a channel (making it always open) +8. 
Style with Tailwind CSS, consistent with existing Space UI + +**Acceptance Criteria**: +- Canvas renders nodes, channels (directional arrows), and gates (on channel lines) as distinct visual elements +- Gates render ON the channel line, not as separate nodes +- Gateless channels render as plain arrows (always open) +- Real-time updates as nodes activate/complete and gates open/close +- Approval gates are visually distinct (amber, pulsing) +- Parallel reviewer nodes displayed correctly +- Runtime mode is read-only; template mode allows editing +- Mode determined automatically by context (active run → runtime, no run → template) +- Gate editing works in template mode (add/remove/configure gates on channels) +- Works on initial load (not just live updates) + +**Depends on**: Task 6.1 (backend RPCs for state) + +**Agent type**: coder + +--- + +### Task 6.3: Implement Artifacts View and Diff Rendering + +**Description**: Build the artifacts view that opens when a human clicks a gate on the canvas. Shows changed files and renders diffs. + +**Subtasks**: +1. Create `packages/web/src/components/space/GateArtifactsView.tsx`: + - Opens as a panel/overlay when approval gate is clicked on canvas + - Calls `spaceWorkflowRun.getGateArtifacts` RPC to get changed files + - Shows file tree: added (green), modified (yellow), deleted (red) + - Shows summary: N files changed, +X / -Y lines + - Shows gate context: what agent did and why (from gate data) +2. Create `packages/web/src/components/space/FileDiffView.tsx`: + - Clicking a file in the artifacts view opens the diff + - Calls `spaceWorkflowRun.getFileDiff` RPC + - Renders unified diff with syntax highlighting and line numbers + - Supports scroll through long diffs +3. 
Add Approve / Reject buttons at the top of the artifacts view: + - "Approve" calls `spaceWorkflowRun.approveGate` with `decision: 'approve'` + - "Reject" calls `spaceWorkflowRun.approveGate` with `decision: 'reject'` + - After action, close the artifacts view and update canvas +4. Also support chat-based approval as secondary mechanism (parse "approve"/"reject" in Space chat) +5. Style: clean diff view similar to GitHub PR review interface + +**Acceptance Criteria**: +- Clicking approval gate on canvas opens artifacts view +- Changed files listed with add/modify/delete indicators +- Clicking a file shows syntax-highlighted diff +- Approve/Reject buttons work and update workflow state +- Chat-based approval also works as fallback +- Vitest tests for approval parsing logic + +**Depends on**: Task 6.1 (backend RPCs), Task 6.2 (canvas component) + +**Agent type**: coder + +--- + +### Task 6.4: Integrate Canvas into Space View + +**Description**: Wire the workflow canvas into the existing Space UI, replacing or augmenting the current workflow view. + +**Subtasks**: +1. Add the `WorkflowCanvas` component to the Space view when a workflow run is active +2. Show the canvas below the Space chat (or in a split view / tab) +3. When no workflow run is active, show the workflow template editor (existing behavior) +4. When a workflow run completes, show the final state (all nodes completed) with summary +5. Ensure the canvas works alongside the Space chat (both visible) +6. 
Handle responsive layout for different screen sizes + +**Acceptance Criteria**: +- Workflow canvas appears when a run is active +- Canvas and chat work together (both visible) +- Canvas shows final state on completion +- Responsive layout works + +**Depends on**: Task 6.2, Task 6.3 + +**Agent type**: coder diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/07-online-integration-test.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/07-online-integration-test.md new file mode 100644 index 000000000..7f8a78401 --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/07-online-integration-test.md @@ -0,0 +1,160 @@ +# Milestone 7: Online Integration Test + +## Goal and Scope + +Exercise the full happy path with the dev proxy (mocked SDK). Tests are broken into focused sub-tests per workflow stage with shared helpers. + +**Testing strategy**: These are **gate-level integration tests**, not full agent execution tests. Each test uses `mockAgentDone()` and `writeGateData()` helpers to simulate agent completion and gate writes directly, then verifies the gate evaluation, channel routing, and node activation logic. This keeps tests fast (no real LLM calls, no agent session startup) while testing the actual Gate + Channel architecture end-to-end. The dev proxy is used only for the agent session lifecycle (spawn/kill), not for full conversation turns. 
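To make concrete what "gate evaluation" means in these gate-level tests, here is a minimal in-memory sketch of evaluating the `GateCondition` predicates from the overview against gate data. The function name `evaluateCondition`, the default operator when `op` is omitted, and the shape of the data passed in are assumptions of this sketch, not the plan's implementation.

```typescript
// Condition shapes copied from the overview's GateCondition type.
type GateCondition =
  | { type: 'check'; field: string; op?: '==' | '!=' | 'exists'; value?: unknown }
  | { type: 'count'; field: string; matchValue: unknown; min: number }
  | { type: 'all'; conditions: GateCondition[] }
  | { type: 'any'; conditions: GateCondition[] };

type GateData = Record<string, unknown>;

// Hypothetical evaluator: returns true when the gate should let traffic through.
function evaluateCondition(cond: GateCondition, data: GateData): boolean {
  switch (cond.type) {
    case 'check': {
      const actual = data[cond.field];
      const op = cond.op ?? 'exists'; // assumption: 'exists' when op is omitted
      if (op === 'exists') return actual !== undefined && actual !== null;
      if (op === '==') return actual === cond.value;
      return actual !== cond.value;
    }
    case 'count': {
      // Assumption: the field holds a map such as { reviewerA: 'approve', ... }.
      const entries = data[cond.field];
      if (typeof entries !== 'object' || entries === null) return false;
      const matches = Object.values(entries).filter((v) => v === cond.matchValue).length;
      return matches >= cond.min;
    }
    case 'all':
      return cond.conditions.every((c) => evaluateCondition(c, data));
    case 'any':
      return cond.conditions.some((c) => evaluateCondition(c, data));
  }
}

// A PR gate stays blocked until the planner writes a PR URL.
const prGate: GateCondition = { type: 'check', field: 'prUrl', op: 'exists' };
const blockedBeforeWrite = evaluateCondition(prGate, {});
const openAfterWrite = evaluateCondition(prGate, { prUrl: 'https://example.invalid/pr/1' });
```

A test built on `writeGateData()` would follow the same pattern: write data, then assert that the downstream node activates only once the condition evaluates to true.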
+ +## Test File Structure + +``` +packages/daemon/tests/online/space/ + helpers/ + space-test-helpers.ts # Shared helpers + space-happy-path-plan-to-approve.test.ts # Planning → plan-pr-gate → Plan Review → plan-approval-gate → approve + space-happy-path-code-review.test.ts # Coding → code-pr-gate → 3 Reviewers → review-votes-gate + space-happy-path-qa-completion.test.ts # QA → Done (pass and fail loops) + space-happy-path-full-pipeline.test.ts # Full end-to-end + space-edge-cases.test.ts # Iteration cap, cancellation, concurrent tasks +``` + +## Shared Helpers + +`space-test-helpers.ts` provides: +- `createTestSpace(config)` — creates a Space with preset agents and V2 workflow +- `startWorkflowRun(spaceId, taskId)` — starts a workflow run +- `mockAgentDone(runId, nodeId, result)` — simulates agent completion +- `writeGateData(runId, gateId, data)` — writes data to a gate +- `readGateData(runId, gateId)` — reads gate data +- `approveGate(runId, gateId)` — approves an approval gate +- `rejectGate(runId, gateId)` — rejects an approval gate +- `waitForNodeStatus(runId, nodeId, status)` — waits for node status +- `waitForRunStatus(runId, status)` — waits for run status +- `getGateArtifacts(runId, gateId)` — gets artifacts for a gate + +## Tasks + +### Task 7.1: Test Helpers and Plan-to-Approve Flow + +**Description**: Create shared helpers and test Planning → `plan-pr-gate` → Plan Review → `plan-approval-gate` → Approve. + +**Subtasks**: +1. Create `space-test-helpers.ts` with all shared helpers +2. Write `space-happy-path-plan-to-approve.test.ts`: + a. Create Space with V2 workflow + b. Create task → start workflow run + c. Verify Planning node activates + d. Simulate Planner completion + write PR data to `plan-pr-gate` + e. Verify `plan-pr-gate` opens → Plan Review node activates + f. Simulate Plan Review completion + g. Verify `plan-approval-gate` blocks (gate data shows `{ waiting: true }`) + h. Approve via `approveGate()` helper + i. 
Verify Coding node activates + j. Test rejection: reject → verify `needs_attention` status with `failureReason: 'humanRejected'` + +**Acceptance Criteria**: +- Shared helpers work with dev proxy +- Plan → approve flow test passes +- `plan-pr-gate` correctly blocks until PR data is written +- `plan-approval-gate` correctly blocks and unblocks +- Rejection flow works + +**Depends on**: Milestone 5 (full pipeline), Milestone 6 (approval gate backend) + +**Agent type**: coder + +--- + +### Task 7.2: Test Code Review with Parallel Reviewers + +**Description**: Test Coding → `code-pr-gate` → 3 Reviewers (parallel) → `review-votes-gate`. + +**Subtasks**: +1. Write `space-happy-path-code-review.test.ts`: + a. Start from approved plan (reuse helpers) + b. Simulate Coder completion + write PR data to `code-pr-gate` + c. Verify all 3 Reviewer nodes activate simultaneously + d. Test happy path: all 3 reviewers approve → QA activates + e. Test partial approval: 2 of 3 approve → `review-votes-gate` stays blocked (count < 3) + f. Test rejection: any reviewer rejects → feedback to Coder, cyclic channel fires + g. Test iteration counter increments on reviewer reject cycle + +**Acceptance Criteria**: +- 3 reviewers activate in parallel +- `review-votes-gate` requires all 3 approvals (`count: votes.approve >= 3`) +- Partial approval doesn't unblock +- Rejection cycles back to Coding +- Iteration counter works + +**Depends on**: Task 7.1 + +**Agent type**: coder + +--- + +### Task 7.3: Test QA-Completion Flow + +**Description**: Test QA → Done (pass) and QA → Coding (fail) flows. + +**Subtasks**: +1. Write `space-happy-path-qa-completion.test.ts`: + a. Start from all 3 reviewers approved (reuse helpers) + b. Test happy path: QA passes → Done → workflow completes + c. Test QA failure: QA fails → Coding re-activates → full re-review cycle + d. Verify re-review cycle: Coding → 3 Reviewers → QA (all 3 must re-vote) + e. 
Verify iteration counter on QA→Coding cycle + +**Acceptance Criteria**: +- QA pass → Done flow works +- QA failure → full re-review cycle works +- Iteration counter is global +- Completion notification emitted + +**Depends on**: Task 7.2 + +**Agent type**: coder + +--- + +### Task 7.4: Full Pipeline End-to-End Test + +**Description**: Single end-to-end test: task creation → completion. + +**Subtasks**: +1. Write `space-happy-path-full-pipeline.test.ts`: + a. Create Space → task → workflow run + b. Planning → `plan-pr-gate` → Plan Review → `plan-approval-gate` approve → Coding → `code-pr-gate` → 3 Reviewers approve → QA pass → Done + c. Verify completion summary +2. Test failure-and-recovery path with one reviewer rejection + one QA failure + +**Acceptance Criteria**: +- Full happy path test passes +- Failure-and-recovery test passes +- Relies on shared helpers (concise) + +**Depends on**: Task 7.3 + +**Agent type**: coder + +--- + +### Task 7.5: Edge Case Tests + +**Description**: Test edge cases. + +**Subtasks**: +1. Write `space-edge-cases.test.ts`: + a. Concurrent tasks: separate worktrees, separate iteration counters + b. Cancellation: agents cleaned up, worktree removed + c. Agent crash: Task Agent detects failure, run transitions to `needs_attention` with `failureReason: 'agentCrash'` + d. Approval gate persistence: `waiting` state in gate data survives daemon restart + e. **Vote gate partial + restart**: (1) write 2 approve votes to `review-votes-gate`, (2) verify gate still blocked (`count: votes.approve >= 3` not met), (3) restart daemon, (4) verify gate data persisted (2 votes present in `gate_data` table), (5) write 3rd approve vote, (6) verify gate passes and QA activates. This proves gate data survives restart via the `gate_data` SQLite table. 
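The vote gate partial + restart case (e) can be modelled without a daemon. The sketch below uses a `Map` of serialized JSON as a stand-in for the `gate_data` SQLite table and treats "restart" as constructing a fresh store over the same rows. `GateStore`, `approveCount`, the key format, and the vote map shape are all hypothetical.

```typescript
// Stand-in for the `gate_data` table: serialized JSON keyed by run and gate.
type PersistedRows = Map<string, string>;

class GateStore {
  private readonly rows: PersistedRows;

  constructor(rows: PersistedRows) {
    this.rows = rows;
  }

  write(runId: string, gateId: string, data: Record<string, unknown>): void {
    this.rows.set(`${runId}:${gateId}`, JSON.stringify(data));
  }

  read(runId: string, gateId: string): Record<string, unknown> {
    const raw = this.rows.get(`${runId}:${gateId}`);
    return raw === undefined ? {} : JSON.parse(raw);
  }
}

// Counts 'approve' entries in an assumed { votes: { reviewerId: vote } } shape.
function approveCount(data: Record<string, unknown>): number {
  const votes = data['votes'];
  if (typeof votes !== 'object' || votes === null) return 0;
  return Object.values(votes).filter((v) => v === 'approve').length;
}

const disk: PersistedRows = new Map(); // survives the simulated restart
let store = new GateStore(disk);

// (1)-(2) two approvals written, gate still blocked
store.write('run-1', 'review-votes-gate', { votes: { r1: 'approve', r2: 'approve' } });
const blockedBeforeRestart = approveCount(store.read('run-1', 'review-votes-gate')) >= 3;

// (3)-(4) "restart": new in-memory object over the same persisted rows
store = new GateStore(disk);
const votesAfterRestart = approveCount(store.read('run-1', 'review-votes-gate'));

// (5)-(6) third approval arrives, gate passes
const current = store.read('run-1', 'review-votes-gate');
store.write('run-1', 'review-votes-gate', {
  votes: { ...(current['votes'] as Record<string, unknown>), r3: 'approve' },
});
const openAfterThirdVote = approveCount(store.read('run-1', 'review-votes-gate')) >= 3;
```

The real test would replace the `Map` with the daemon's SQLite database and the second constructor call with an actual daemon restart; the assertions stay the same.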
+ +**Acceptance Criteria**: +- All edge cases pass +- Gate data persists across restarts + +**Depends on**: Task 7.1 (helpers; parallel with 7.2-7.4) + +**Agent type**: coder diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/08-e2e-test.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/08-e2e-test.md new file mode 100644 index 000000000..604004fd0 --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/08-e2e-test.md @@ -0,0 +1,97 @@ +# Milestone 8: E2E Test + +## Goal and Scope + +Create Playwright E2E tests that exercise the full UI flow including the workflow canvas visualization, approval gate artifacts view, and diff rendering. + +**Testing strategy**: E2E tests use real agent execution via dev proxy (mocked LLM responses). To manage timing, tests wait for **visible UI state changes** on the canvas (e.g., node status indicators, gate highlights) rather than polling internal state. Each "wait for X to complete" step uses `page.waitForSelector()` on the canvas node's status indicator. The dev proxy returns fast, deterministic responses so tests complete in seconds, not minutes. + +## Tasks + +### Task 8.1: Space Happy Path E2E Test + +**Description**: Create `packages/e2e/tests/features/space-happy-path-pipeline.e2e.ts` exercising the full UI flow. + +**Subtasks**: +1. Set up test infrastructure: + - Navigate to Spaces view, create new Space + - Verify preset agents seeded (Coder, General, Planner, Reviewer, QA) + - Verify V2 workflow seeded +2. Test flow: + a. Open Space chat + b. Type a task request + c. Verify Space Agent creates task and starts workflow + d. **Verify workflow canvas appears** with Planning node active + e. Wait for planner to complete + f. **Verify approval gate highlights on canvas** (amber pulsing) + g. **Click the approval gate on canvas** → verify artifacts view opens + h. 
**Verify artifacts view shows plan PR changes** (file list, diff summary) + i. **Click a file** → verify diff renders with syntax highlighting + j. **Click "Approve" button** in artifacts view + k. Verify canvas updates: approval gate opens, Coding node activates + l. Wait for coder to complete + m. Verify 3 Reviewer nodes activate on canvas (parallel) + n. Wait for reviewers to complete + o. Verify QA node activates + p. Wait for QA to complete + q. Verify canvas shows all nodes completed + r. Verify completion summary in Space chat + +**Acceptance Criteria**: +- E2E test exercises the full happy path through canvas UI +- Approval gate interaction via canvas + artifacts view works +- Diff rendering is visible and correct +- Parallel reviewer nodes visible on canvas +- Completion summary displayed + +**Depends on**: Milestone 6 (canvas UI), Milestone 5 (pipeline) + +**Agent type**: coder + +--- + +### Task 8.2: E2E Test for Reviewer Feedback Loop + +**Description**: E2E test where a reviewer rejects and the coder fixes. + +**Subtasks**: +1. Proceed through pipeline to reviewer phase +2. **Wait for `review-votes-gate` vote display** on canvas (use `page.waitForSelector` on vote indicator elements). Verify one reviewer rejects (visible rejection indicator on canvas) +3. **Wait for Coding node status change** to "active" on canvas. Verify Coding node re-activates. +4. **Wait for `review-votes-gate` to show reset** (votes cleared after cyclic traversal via `resetOnCycle`) +5. Wait for coder fix + re-review (wait for all 3 reviewer node statuses to change to "completed") +6. Verify all 3 reviewers approve (`review-votes-gate` shows 3/3) +7. 
Verify flow continues to QA and completion + +**Acceptance Criteria**: +- Reviewer rejection visible on canvas +- Coder re-activation visible +- `review-votes-gate` vote display works +- Final completion achieved + +**Depends on**: Task 8.1 + +**Agent type**: coder + +--- + +### Task 8.3: E2E Test for Approval Gate Rejection + +**Description**: E2E test for human rejection via artifacts view. + +**Subtasks**: +1. Proceed to approval gate (`plan-approval-gate`) +2. Click approval gate on canvas → artifacts view opens +3. Click "Reject" button +4. Verify workflow run transitions to `needs_attention` state (visible on canvas as error/attention indicator) +5. Verify confirmation message in chat +6. Verify space remains usable after rejection + +**Acceptance Criteria**: +- Rejection via artifacts view works +- Canvas shows needs_attention/error state +- Space remains usable + +**Depends on**: Task 8.1 + +**Agent type**: coder diff --git a/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/09-bug-fixes-and-hardening.md b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/09-bug-fixes-and-hardening.md new file mode 100644 index 000000000..77fb01e84 --- /dev/null +++ b/docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/09-bug-fixes-and-hardening.md @@ -0,0 +1,92 @@ +# Milestone 9: Bug Fixes and Hardening + +## Goal and Scope + +Fix issues discovered during integration and E2E testing. Add robust error handling and edge case coverage. + +## Tasks + +### Task 9.1: Bug Triage and Prioritization + +**Description**: After M7 and M8 complete, create a concrete bug list. + +**Subtasks**: +1. Collect all test failures from M7 and M8 +2. Create triage document at `docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/bug-triage.md` +3. Group by area: gate routing, approval gate UI, worktree isolation, canvas, agent prompts +4. 
Prioritize: P0/P1/P2 + +**Acceptance Criteria**: +- Triage document with all bugs and priorities +- Clear scope for remaining tasks + +**Depends on**: Task 7.4, Task 8.2 + +**Agent type**: general + +--- + +### Task 9.2: Fix Integration Test Bugs + +**Description**: Fix P0/P1 bugs from online integration tests. + +**Subtasks**: +1. Reproduce each P0/P1 bug +2. Fix root cause (not the test) +3. Add unit test covering the bug +4. Verify integration test passes + +**Acceptance Criteria**: +- All P0/P1 integration bugs fixed with unit tests +- Integration tests pass + +**Depends on**: Task 9.1 + +**Agent type**: coder + +--- + +### Task 9.3: Fix E2E Test Bugs + +**Description**: Fix P0/P1 bugs from E2E tests. + +**Subtasks**: +1. Reproduce each P0/P1 bug +2. Fix root cause +3. Add regression test +4. Verify E2E tests pass + +**Acceptance Criteria**: +- All P0/P1 E2E bugs fixed +- E2E tests pass + +**Depends on**: Task 9.1 + +**Agent type**: coder + +--- + +### Task 9.4: Error Handling and Edge Case Hardening + +**Description**: Add robust error handling for common failure modes. + +**Subtasks**: +1. **Agent session crash handling**: Task Agent detects crash, transitions to `needs_attention` with `failureReason: 'agentCrash'`, notifies human +2. **Network errors**: Retry with exponential backoff for `gh` CLI commands (max 3, 5s/10s/20s) +3. **Rate limit handling**: Wait and retry using `Retry-After` header +4. **Timeout enforcement**: Per-node configurable timeouts (30min coder, 15min reviewer/QA, 20min planner) +5. **Cancellation cleanup**: Kill sessions, remove worktree, transition to `cancelled`, notify human +6. **Gate data corruption recovery**: If gate data is malformed (fails JSON parse or schema validation), reset data to `{}` and log error. 
Since all gates use the unified Gate entity, the reset value is the same for every gate: the gate's condition re-evaluates against the empty data store and fails until agents write fresh data (e.g., `check: prUrl exists` fails, and `count: votes.approve >= 3` counts 0 matching votes and fails). The one exception is human-approval gates, where the reset also sets `{ waiting: true }` to re-show the approval UI. +7. **Structured error messages**: All failures produce human-readable messages in Space chat + +**Acceptance Criteria**: +- Crashes produce clear failure status +- Network errors retry before failing +- Timeouts enforced per-node +- Cancellation cleans up all resources +- Human-readable error messages +- Unit tests for each scenario + +**Depends on**: Task 9.2, Task 9.3 + +**Agent type**: coder
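
The gate data corruption recovery in subtask 6 of Task 9.4 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `loadGateData`, `evaluate`, and `isPlainObject` are hypothetical names, the plain-object check stands in for real schema validation, the default `op` of `exists` is assumed, and the `votes` map with `'approve'` values is an assumed data shape. Only the `GateCondition` type is taken from the overview.

```typescript
// GateCondition mirrors the overview; every other name below is hypothetical.
type GateCondition =
  | { type: 'check'; field: string; op?: '==' | '!=' | 'exists'; value?: unknown }
  | { type: 'count'; field: string; matchValue: unknown; min: number }
  | { type: 'all'; conditions: GateCondition[] }
  | { type: 'any'; conditions: GateCondition[] };

type GateData = Record<string, unknown>;

// Stand-in for schema validation (assumption): gate data must be a plain JSON object.
function isPlainObject(value: unknown): value is GateData {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Parse persisted gate data. On malformed input, log the error and fall back
// to the reset value: {} for ordinary gates, { waiting: true } for
// human-approval gates so the approval UI is shown again.
function loadGateData(raw: string, isHumanApprovalGate: boolean): GateData {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (isPlainObject(parsed)) return parsed;
    console.error('Gate data is not a JSON object; resetting');
  } catch (err) {
    console.error('Gate data failed JSON parse; resetting', err);
  }
  return isHumanApprovalGate ? { waiting: true } : {};
}

// Evaluate a composable condition against gate data.
function evaluate(condition: GateCondition, data: GateData): boolean {
  switch (condition.type) {
    case 'check': {
      const actual = data[condition.field];
      const op = condition.op ?? 'exists'; // assumed default operator
      if (op === 'exists') return actual !== undefined && actual !== null;
      if (op === '==') return actual === condition.value;
      return actual !== condition.value;
    }
    case 'count': {
      // A missing or malformed map counts as zero matching entries.
      const entries = data[condition.field];
      const map = isPlainObject(entries) ? entries : {};
      const matches = Object.values(map).filter((v) => v === condition.matchValue).length;
      return matches >= condition.min;
    }
    case 'all':
      return condition.conditions.every((c) => evaluate(c, data));
    case 'any':
      return condition.conditions.some((c) => evaluate(c, data));
  }
}

// Usage: corrupted data resets to {}, so both example gates stay closed.
const recovered = loadGateData('{not valid json', false);
const prGateOpen = evaluate({ type: 'check', field: 'prUrl', op: 'exists' }, recovered);
const votesGateOpen = evaluate(
  { type: 'count', field: 'votes', matchValue: 'approve', min: 3 },
  recovered,
);
console.log({ recovered, prGateOpen, votesGateOpen });
```

Falling back to an empty store rather than attempting a partial repair keeps recovery predictable: a gate closes after a reset and reopens only once agents (or the human, for approval gates) write valid data again.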