Multi Instance Architecture Tasks
Input: Design documents from .speckit/features/109-multi-instance-architecture/
Prerequisites: plan.md (required), spec.md (required), research.md, data-model.md, contracts/api-changes.yaml, quickstart.md
Tests: Per Constitution III (Test-Alongside), unit and integration test tasks are included in each user story phase. Tests MUST be written during implementation in the same PR/commit.
Organization: Tasks are grouped by user story to enable independent implementation and testing of each story.
- [P]: Can run in parallel (different files, no dependencies)
- [Story]: Which user story this task belongs to (e.g., US1, US2, US3, US4)
- Include exact file paths in descriptions
Purpose: Branch initialization and project structure preparation
- T001 Verify branch `109-multi-instance-architecture` is checked out and up to date with main
- T002 Add `.claude/Agent Brain/` to `.gitignore` in the repository root (runtime.json, lock, pid files should not be tracked)
- T003 [P] Create empty module files for new server modules: `agent-brain-server/agent_brain_server/runtime.py`, `agent-brain-server/agent_brain_server/locking.py`, `agent-brain-server/agent_brain_server/storage_paths.py`, `agent-brain-server/agent_brain_server/project_root.py`
- T004 [P] Create CLI commands directory structure: `agent-brain-cli/agent_brain_cli/commands/__init__.py`
Purpose: Make all storage paths configurable so the server can run with any state directory. This is Plan Phase 1 and MUST complete before any user story work.
- T005 Add `state_dir` parameter and `DOC_SERVE_STATE_DIR` environment variable to `agent-brain-server/agent_brain_server/config/settings.py`: default to `None` (current behavior) for backward compatibility; when set, all storage paths resolve relative to it
- T006 Modify `VectorStoreManager.__init__` in `agent-brain-server/agent_brain_server/storage/vector_store.py` to accept an absolute `persist_dir` parameter instead of reading from global settings
- T007 Modify `BM25IndexManager.__init__` in `agent-brain-server/agent_brain_server/indexing/bm25_index.py` to accept an absolute `persist_dir` parameter instead of reading from global settings
- T008 Modify `IndexingService.__init__` in `agent-brain-server/agent_brain_server/services/indexing_service.py` to accept injected `VectorStoreManager` and `BM25IndexManager` dependencies instead of importing global singletons
- T009 Remove global singleton factory functions (`get_vector_store`, `get_bm25_manager`, `get_indexing_service`) from their respective modules and replace with FastAPI `app.state`-based initialization in `agent-brain-server/agent_brain_server/api/main.py` lifespan handler
- T010 Update all router files that depend on singletons (`agent-brain-server/agent_brain_server/api/routers/health.py`, `index.py`, `query.py`) to retrieve services from `request.app.state` instead of importing singletons
- T011 Verify existing tests pass with refactored dependency injection: run `cd agent-brain-server && poetry run pytest`
- T053 [P] Implement configuration loading in `agent-brain-server/agent_brain_server/config/settings.py`: add `load_project_config(state_dir: Path) -> ProjectConfig` that reads `config.json` from the state directory and merges with environment variables and built-in defaults per FR-009 precedence chain (CLI flags > env vars > project config > global config > defaults). CLI flag overrides are applied at call site.
- T054 Update FastAPI-generated OpenAPI spec to reflect new/modified endpoints: verify `/docs` endpoint exposes health_response_v2 (mode, instance_id, project_id, active_projects fields), index_request_v2 and query_request_v2 (optional project_id), and new `/projects/{project_id}/status` endpoint. Ensure OpenAPI schema is auto-generated from Pydantic models per Constitution II (OpenAPI-First).
Checkpoint: Foundation ready — server runs with configurable state_dir, config precedence chain, no global singletons, and updated OpenAPI spec. All existing tests pass. User story implementation can now begin.
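The FR-009 precedence chain named in T053 can be illustrated with a layered merge. This is a minimal sketch, not the planned `load_project_config` implementation: the function name `merge_config` and the convention that `None` means "unset" are assumptions made for illustration.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping


def merge_config(
    cli_flags: Mapping[str, Any],
    env_vars: Mapping[str, Any],
    project_config: Mapping[str, Any],
    global_config: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge config layers; layers listed first win over later ones.

    Hypothetical helper: order follows the chain
    CLI flags > env vars > project config > global config > defaults.
    """
    layers = [cli_flags, env_vars, project_config, global_config, defaults]
    # Drop None values so an unset flag does not mask a lower-priority layer
    # (assumption: None means "not provided").
    cleaned = [{k: v for k, v in layer.items() if v is not None} for layer in layers]
    # ChainMap looks keys up in the first mapping that contains them.
    return dict(ChainMap(*cleaned))
```

Keeping the merge as a pure function of five mappings makes the precedence order easy to unit test without touching the environment or the filesystem.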
Goal: Full per-project lifecycle with discovery, locking, crash recovery, and CLI commands. A developer can start, discover, and stop a dedicated Agent Brain instance for any project.
Independent Test: Start Agent Brain from a project root, verify it binds to a unique port, confirm runtime.json is written to <repo>/.claude/Agent Brain/, query the server from a nested subdirectory, stop the server, verify all runtime artifacts are cleaned up.
- T055 [P] [US1] Write unit tests for `project_root.py` in `agent-brain-server/tests/unit/test_project_root.py`: test git root resolution, cwd fallback, symlink resolution
- T056 [P] [US1] Write unit tests for `runtime.py` in `agent-brain-server/tests/unit/test_runtime.py`: test RuntimeState model validation, write/read/delete cycle, stale PID detection
- T057 [P] [US1] Write unit tests for `locking.py` in `agent-brain-server/tests/unit/test_locking.py`: test lock acquisition, release, stale detection, cleanup
- T058 [P] [US1] Write unit tests for `storage_paths.py` in `agent-brain-server/tests/unit/test_storage_paths.py`: test state dir resolution, directory creation, path determinism
- T059 [US1] Write integration test for per-project lifecycle in `agent-brain-server/tests/integration/test_lifecycle.py`: test start/stop cycle, runtime.json creation/deletion, lock acquisition/release, port binding
- T012 [P] [US1] Implement `project_root.py` in `agent-brain-server/agent_brain_server/project_root.py`: basic project root resolution using `git rev-parse --show-toplevel` (5s timeout) with fallback to `Path.cwd().resolve()`. Always resolve symlinks. Export `resolve_project_root(start_path: Path) -> Path`. Note: US2 (T026-T027) adds full fallback chain (.claude/ marker, pyproject.toml walk-up, edge cases).
- T013 [P] [US1] Implement `storage_paths.py` in `agent-brain-server/agent_brain_server/storage_paths.py`: path resolution for per-project state directory. Given a project root, return `<root>/.claude/Agent Brain/` and subdirectories (`data/`, `logs/`, `data/llamaindex/`, `data/chroma_db/`, `data/bm25_index/`). Create directories if they don't exist. Export `resolve_state_dir(project_root: Path) -> Path` and `resolve_storage_paths(state_dir: Path) -> dict`.
- T014 [P] [US1] Implement `runtime.py` in `agent-brain-server/agent_brain_server/runtime.py`: Pydantic `RuntimeState` model matching `runtime_json_project` schema from `contracts/api-changes.yaml` (fields: `schema_version`, `mode`, `project_root`, `instance_id`, `base_url`, `bind_host`, `port`, `pid`, `started_at`). Include `write_runtime(state_dir: Path, state: RuntimeState)`, `read_runtime(state_dir: Path) -> Optional[RuntimeState]`, `delete_runtime(state_dir: Path)`, and `validate_runtime(state: RuntimeState) -> bool` (checks PID alive + health endpoint).
- T015 [P] [US1] Implement `locking.py` in `agent-brain-server/agent_brain_server/locking.py`: `fcntl.flock()`-based exclusive lock on `Agent Brain.lock` with separate `Agent Brain.pid` file. Export `acquire_lock(state_dir: Path) -> bool` (non-blocking, returns False if held), `release_lock(state_dir: Path)`, `read_pid(state_dir: Path) -> Optional[int]`, `is_stale(state_dir: Path) -> bool` (check PID alive via `os.kill(pid, 0)`), `cleanup_stale(state_dir: Path)`.
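The locking contract described in T015 could look roughly like the following. This is a sketch under stated assumptions, not the project's `locking.py`: the file names `LOCK_NAME` and `PID_NAME` are placeholders, the module-level `_open_locks` registry is one possible way to keep the file handle alive, and `cleanup_stale` is omitted. It relies on `fcntl`, so it is POSIX-only.

```python
import fcntl
import os
from pathlib import Path
from typing import IO, Dict, Optional

LOCK_NAME = "server.lock"  # placeholder file name (assumption)
PID_NAME = "server.pid"    # placeholder file name (assumption)

# Open handles must stay referenced, otherwise the lock is dropped on close.
_open_locks: Dict[Path, IO[str]] = {}


def acquire_lock(state_dir: Path) -> bool:
    """Try to take an exclusive, non-blocking lock; False if already held."""
    lock_path = state_dir / LOCK_NAME
    handle = open(lock_path, "a+")
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return False
    _open_locks[lock_path] = handle
    (state_dir / PID_NAME).write_text(str(os.getpid()))
    return True


def release_lock(state_dir: Path) -> None:
    """Release the lock held by this process and remove the PID file."""
    handle = _open_locks.pop(state_dir / LOCK_NAME, None)
    if handle is not None:
        fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
        handle.close()
    (state_dir / PID_NAME).unlink(missing_ok=True)


def read_pid(state_dir: Path) -> Optional[int]:
    """Return the recorded PID, or None if missing or unreadable."""
    try:
        return int((state_dir / PID_NAME).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def is_stale(state_dir: Path) -> bool:
    """True if a PID is recorded but no such process exists."""
    pid = read_pid(state_dir)
    if pid is None:
        return False  # nothing recorded, so nothing to be stale
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # process exists but belongs to another user
    return False
```

Because `flock` locks belong to the open file description, the kernel releases them when the owning process dies, which is what makes crash recovery on the next start possible.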
- T016 [US1] Modify `agent-brain-server/agent_brain_server/api/main.py` to support port 0 binding: change uvicorn config to accept `port=0`, after server starts read actual port from bound socket, write `runtime.json` via `runtime.py`, register shutdown hook to call `delete_runtime()` and `release_lock()`
- T017 [US1] Modify health endpoint in `agent-brain-server/agent_brain_server/api/routers/health.py` to include `mode` field (`"project"` or `"shared"`), `instance_id`, and `project_id` in response per `health_response_v2` schema from `contracts/api-changes.yaml`
- T018 [P] [US1] Implement `start` command in `agent-brain-cli/agent_brain_cli/commands/start.py`: resolve project root, check for existing lock/runtime, detect and clean stale state, spawn server subprocess (daemonized), wait for health endpoint readiness, print base URL
- T019 [P] [US1] Implement `stop` command in `agent-brain-cli/agent_brain_cli/commands/stop.py`: read `runtime.json`, send SIGTERM to PID, wait for process exit, verify cleanup of runtime artifacts
- T020 [P] [US1] Implement `status` command in `agent-brain-cli/agent_brain_cli/commands/status.py`: resolve project root from cwd, read `runtime.json`, validate health endpoint, report address and indexing status (or "not running")
- T021 [P] [US1] Implement `list` command in `agent-brain-cli/agent_brain_cli/commands/list_cmd.py`: maintain a registry file at `~/.Agent Brain/registry.json` (list of known project state directories, updated by `start`/`stop`/`init`). Scan registry entries for `runtime.json` files, validate each via health check, report table of running instances with project name, URL, mode, and PID. Fall back to scanning `~/.Agent Brain/projects/` for shared mode instances.
- T022 [P] [US1] Implement `init` command in `agent-brain-cli/agent_brain_cli/commands/init.py`: resolve project root, create `.claude/Agent Brain/` directory, write `config.json` with defaults per `config_json` schema from `contracts/api-changes.yaml`
- T023 [US1] Register all new commands in `agent-brain-cli/agent_brain_cli/cli.py`: add `start`, `stop`, `status` (update existing), `list`, and `init` as Click commands/groups
- T024 [US1] Add startup integration in `agent-brain-server/agent_brain_server/api/main.py` lifespan. On startup: resolve project root → resolve state dir → acquire lock (fail if held and not stale) → bind port 0 → resolve storage paths → initialize services with resolved paths → write runtime.json. On shutdown: delete runtime.json → release lock → delete PID file.
- T025 [US1] Verify per-project lifecycle end-to-end: start server via CLI, confirm `runtime.json` written with actual port, run status from subdirectory, stop server, confirm all artifacts cleaned up
Checkpoint: Per-project mode is fully functional. A developer can init → start → status → stop for any project. Two projects can run concurrently on different ports. Crashed instances recover on next start. This is the MVP.
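Two projects can run concurrently on different ports because each server binds to port 0 and lets the operating system pick a free port (T016). The sketch below shows only that mechanism with the standard `socket` module; the helper name `bind_ephemeral` is hypothetical, and how the bound socket or port is handed to uvicorn is left out.

```python
import socket
from typing import Tuple


def bind_ephemeral(host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Bind to port 0 and report the port the OS actually assigned.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen()
    port = sock.getsockname()[1]  # the real port, known only after bind
    return sock, port
```

The port read from `getsockname()` is the value that would end up in the `port` and `base_url` fields of `runtime.json`.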
Goal: Reliable canonical project root resolution ensures the same project always maps to the same state directory regardless of access path, symlinks, or current subdirectory.
Independent Test: Navigate to various subdirectories and symlinked paths within a project, call resolve_project_root(), verify it always returns the same canonical path.
Note: The core `project_root.py` module was created in T012 as part of US1 (it's a shared prerequisite). This phase adds edge-case handling and fallback strategies, and ensures isolation guarantees.
- T060 [P] [US2] Write unit tests for enhanced project root resolution in `agent-brain-server/tests/unit/test_project_root.py`: test symlink resolution, git submodule handling, monorepo subdirectories, missing git binary fallback, .claude/ marker walk-up, pyproject.toml walk-up, cwd fallback
- T061 [US2] Write integration test for state isolation in `agent-brain-server/tests/integration/test_concurrent.py`: test that two different project roots produce completely separate ChromaDB and BM25 storage directories with no cross-contamination
- T026 [P] [US2] Enhance `resolve_project_root()` in `agent-brain-server/agent_brain_server/project_root.py` to handle edge cases: symlink resolution (`Path.resolve()`), git submodules (use outermost repo root), monorepo subdirectories (still use git root), missing git binary (fall through gracefully), and 5-second timeout on `git rev-parse`
- T027 [P] [US2] Add `.claude/` marker walk-up detection to `project_root.py`: if git resolution fails, walk parent directories looking for `.claude/` directory, then `pyproject.toml`, then fall back to `cwd.resolve()`
- T028 [US2] Validate state isolation in `storage_paths.py`: given the same canonical project root, always return the same state directory; given different roots, always return different state directories. Add assertion that `state_dir` is an absolute path.
- T029 [US2] Update `start` and `status` CLI commands (`agent-brain-cli/agent_brain_cli/commands/start.py`, `status.py`) to pass resolved project root through the full chain (CLI → server startup), ensuring consistent resolution
Checkpoint: Project root resolution is robust across symlinks, subdirectories, non-git projects, and missing tools. The same project always maps to the same state regardless of access path.
Goal: A single long-running process serves multiple projects with per-project index isolation. Power users can reduce resource overhead while maintaining full isolation.
Independent Test: Start a shared daemon, register two projects, verify isolated indexes per project, query each independently, confirm discovery pointers are written into each project's state directory.
- T062 [P] [US3] Write unit tests for SharedConfig and shared RuntimeState models in `agent-brain-server/tests/unit/test_runtime.py`: test shared config read/write, discovery pointer schema, project_id generation
- T063 [US3] Write integration test for shared daemon isolation in `agent-brain-server/tests/integration/test_concurrent.py`: test that two projects registered with the shared daemon have isolated indexes, and that querying one does not return results from the other
- T030 [P] [US3] Add `SharedConfig` Pydantic model to `agent-brain-server/agent_brain_server/runtime.py` per `shared_config.json` schema from data-model.md. Fields: `bind_host`, `port` (default 45123), `embedding_model`, `chunk_size`, `chunk_overlap`, `max_concurrent_indexing`. Read/write from `~/.Agent Brain/shared_config.json`.
- T031 [P] [US3] Add shared-mode `RuntimeState` variant to `agent-brain-server/agent_brain_server/runtime.py` per `runtime_json_shared` schema. Fields: `schema_version`, `mode="shared"`, `project_root`, `project_id`, `base_url`. This is the discovery pointer written into each project's state directory.
- T032 [US3] Implement project ID generation in `agent-brain-server/agent_brain_server/project_root.py`: `generate_project_id(project_root: Path) -> str` returns `p_` + first 8 chars of SHA-256 hash of canonical path string. Deterministic and filesystem-safe.
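A minimal sketch of the ID scheme in T032, assuming the "first 8 chars" refers to the hexadecimal digest and that the canonical path is encoded as UTF-8 (neither detail is stated in the task):

```python
import hashlib
from pathlib import Path


def generate_project_id(project_root: Path) -> str:
    """Return 'p_' plus the first 8 hex chars of SHA-256 of the canonical path."""
    canonical = str(project_root.resolve())  # resolve symlinks for stability
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()  # hex + UTF-8 are assumptions
    return "p_" + digest[:8]
```

The result contains only `p`, `_`, and hex digits, so it is safe to use as a directory name under the shared daemon's projects directory.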
- T033 [US3] Modify `agent-brain-server/agent_brain_server/api/routers/index.py` to accept optional `project_id` field in request body per `index_request_v2` schema: required in shared mode, ignored in project mode. Route indexing to project-specific storage.
- T034 [US3] Modify `agent-brain-server/agent_brain_server/api/routers/query.py` to accept optional `project_id` field in request body per `query_request_v2` schema: required in shared mode, ignored in project mode. Route queries to project-specific indexes.
- T035 [US3] Add new endpoint `GET /projects/{project_id}/status` in `agent-brain-server/agent_brain_server/api/routers/health.py` per `project_status` schema from `contracts/api-changes.yaml`. Returns project-specific indexing status.
- T036 [US3] Implement per-project storage under shared daemon in `agent-brain-server/agent_brain_server/storage_paths.py`: `resolve_shared_project_dir(project_id: str) -> Path` returns `~/.Agent Brain/projects/<project_id>/data/`. Create directories on first use.
- T037 [US3] Modify `app.state` initialization in `agent-brain-server/agent_brain_server/api/main.py` to support per-project service instances in shared mode: use `Dict[str, ServiceBundle]` keyed by `project_id` instead of single instances
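One way to picture the `Dict[str, ServiceBundle]` arrangement from T036-T037 is a small registry that creates a bundle, and its data directory, the first time a project ID is seen. Everything here is illustrative: `ServiceBundle` is reduced to two fields, `ProjectServices` is a hypothetical wrapper, and the real bundle would hold the vector store, BM25, and indexing service instances.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict


@dataclass
class ServiceBundle:
    """Stand-in for the per-project services (assumption: reduced fields)."""
    project_id: str
    data_dir: Path


class ProjectServices:
    """Hypothetical registry mapping project_id to its ServiceBundle."""

    def __init__(self, projects_dir: Path) -> None:
        self._projects_dir = projects_dir
        self._bundles: Dict[str, ServiceBundle] = {}

    def get(self, project_id: str) -> ServiceBundle:
        """Return the bundle for project_id, creating it on first use."""
        bundle = self._bundles.get(project_id)
        if bundle is None:
            # Layout follows <projects_dir>/<project_id>/data/
            data_dir = self._projects_dir / project_id / "data"
            data_dir.mkdir(parents=True, exist_ok=True)
            bundle = ServiceBundle(project_id=project_id, data_dir=data_dir)
            self._bundles[project_id] = bundle
        return bundle
```

Since each bundle owns a distinct directory derived from the project ID, index isolation follows from the path layout rather than from per-request filtering.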
- T038 [US3] Add `--mode shared` flag to `start` command in `agent-brain-cli/agent_brain_cli/commands/start.py`: starts shared daemon binding to configured port (default 45123), writes `runtime.json` to `~/.Agent Brain/`
- T039 [US3] Add `--mode shared` flag to `init` command in `agent-brain-cli/agent_brain_cli/commands/init.py`: creates `~/.Agent Brain/shared_config.json` with defaults
- T040 [US3] Update `status` and `list` commands to detect and display shared mode instances: show `(shared)` label and `active_projects` count
- T041 [US3] Implement discovery pointer writing: when a project registers with a shared daemon, write a shared-mode `runtime.json` pointer into `<project_root>/.claude/Agent Brain/runtime.json` containing the daemon's `base_url` and the project's `project_id`
- T042 [US3] Update health endpoint in `agent-brain-server/agent_brain_server/api/routers/health.py` for shared mode: include `active_projects` count in response per `health_response_v2` schema
Checkpoint: Shared daemon mode is functional. Multiple projects share one process with isolated indexes. Discovery pointers let agents find the daemon from any project directory.
Goal: Skills and agents can programmatically discover a running Agent Brain instance for the current project, or auto-start one if none is running.
Independent Test: From a Python context, call discover_or_start(), verify it finds a running instance or starts one, confirm the returned base URL is valid and responds to health checks.
- T064 [P] [US4] Write unit tests for discovery client in `agent-brain-server/tests/unit/test_discovery.py`: test discover() with valid/stale/missing runtime.json, test discover_or_start() with mock subprocess, test stale cleanup path
- T043 [P] [US4] Create discovery client module at `agent-brain-server/agent_brain_server/discovery.py`: export `discover(project_root: Path) -> Optional[RuntimeState]` that reads `runtime.json`, validates health endpoint, returns state or None
- T044 [P] [US4] Add `discover_or_start(project_root: Path) -> RuntimeState` to `agent-brain-server/agent_brain_server/discovery.py`: calls `discover()` first; if None, spawns server subprocess via the same logic as CLI `start`, waits for health readiness, returns RuntimeState
- T045 [US4] Handle stale discovery in `discovery.py`: if `runtime.json` exists but health check fails, call `cleanup_stale()` from `locking.py`, then either return None (discover-only) or auto-start (discover_or_start)
- T046 [US4] Update `agent-brain-skill/Agent Brain/SKILL.md` to document the discovery contract: explain how skills should use `discover_or_start()` to connect, include Python code examples from `quickstart.md` "For Skill Authors" section
Checkpoint: Skills can reliably discover or auto-start Agent Brain for any project. The discovery contract is documented for skill authors.
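The discover-only path (T043, T045) can be sketched as below. This is not the planned `discovery.py`: it takes the state directory directly instead of a project root, uses a two-field dataclass in place of the Pydantic `RuntimeState`, and receives the health check as a callable (a hypothetical parameter) so the example needs no HTTP client. Stale cleanup is reduced to deleting `runtime.json`.

```python
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class RuntimeState:
    """Reduced stand-in for the RuntimeState model (assumption)."""
    base_url: str
    pid: int


def pid_alive(pid: int) -> bool:
    """Check process existence with signal 0 (POSIX)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def discover(
    state_dir: Path, health_check: Callable[[str], bool]
) -> Optional[RuntimeState]:
    """Return the running instance described by runtime.json, else None."""
    runtime_path = state_dir / "runtime.json"
    try:
        raw = json.loads(runtime_path.read_text())
        state = RuntimeState(base_url=raw["base_url"], pid=int(raw["pid"]))
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return None  # missing or malformed runtime.json
    if pid_alive(state.pid) and health_check(state.base_url):
        return state
    # Stale pointer: remove it so the next start begins from a clean state.
    runtime_path.unlink(missing_ok=True)
    return None
```

A `discover_or_start` wrapper would call this first and spawn the server only when it returns `None`.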
Purpose: Improvements that affect multiple user stories
- T047 [P] Update `CLAUDE.md` and `.claude/CLAUDE.md` with new CLI commands and multi-instance architecture overview
- T048 [P] Add structured logging to all new modules using Python `logging` with JSON format and correlation IDs per Constitution IV (Observability): log to per-project `<state_dir>/logs/` directory. Include request correlation IDs for tracing. Add metrics hooks for startup time, discovery latency, and per-instance resource usage.
- T049 [P] Update `agent-brain-server/pyproject.toml` and `agent-brain-cli/pyproject.toml` if any new dependencies are needed (verify: `fcntl` is stdlib, no new external deps expected)
- T050 Run `task before-push` from repository root to verify formatting, linting, type checking, and all tests pass
- T051 Run quickstart.md verification checklist: confirm all 7 items from quickstart.md pass
- T052 Update `.speckit/features/109-multi-instance-architecture/spec.md` status from "Draft" to "Implemented"
- Setup (Phase 1): No dependencies — can start immediately
- Foundational (Phase 2): Depends on Setup — BLOCKS all user stories
- US1 (Phase 3): Depends on Foundational — delivers MVP
- US2 (Phase 4): Depends on Foundational; enhances T012 from US1
- US3 (Phase 5): Depends on Foundational; extends runtime.py and storage from US1
- US4 (Phase 6): Depends on Foundational; uses runtime.py from US1
- Polish (Phase 7): Depends on all desired user stories being complete
- US1 (P1): Can start after Phase 2 — no dependencies on other stories. This is the MVP.
- US2 (P2): Can start after Phase 2. Enhances `project_root.py` created in US1 but is independently testable. Can run in parallel with US1 if T012 is completed first.
- US3 (P3): Can start after Phase 2. Extends runtime.py and storage from US1 but adds its own models and routing. Recommend completing US1 first.
- US4 (P4): Can start after Phase 2. Creates discovery client using runtime.py from US1. Recommend completing US1 first.
- Models/modules before services
- Services before endpoints/CLI
- Core implementation before integration
- Story complete before moving to next priority
- Phase 1: T003 and T004 can run in parallel
- Phase 2: T006, T007, T008 can run in parallel (different files); T009 depends on T006-T008; T010 depends on T009
- Phase 3 (US1): T012, T013, T014, T015 can all run in parallel (4 independent new modules); T018-T022 can run in parallel (5 independent CLI commands); T016-T017 depend on T012-T015
- Phase 4 (US2): T026, T027 can run in parallel
- Phase 5 (US3): T030, T031 can run in parallel; T033, T034 can run in parallel
- Phase 6 (US4): T043, T044 can run in parallel
```bash
# Launch all 4 core modules in parallel (no dependencies between them):
Task: "T012 [P] [US1] Implement project_root.py"
Task: "T013 [P] [US1] Implement storage_paths.py"
Task: "T014 [P] [US1] Implement runtime.py"
Task: "T015 [P] [US1] Implement locking.py"

# After core modules complete, launch 5 CLI commands in parallel:
Task: "T018 [P] [US1] Implement start command"
Task: "T019 [P] [US1] Implement stop command"
Task: "T020 [P] [US1] Implement status command"
Task: "T021 [P] [US1] Implement list command"
Task: "T022 [P] [US1] Implement init command"
```

- Complete Phase 1: Setup (T001-T004)
- Complete Phase 2: Foundational, State Decoupling (T005-T011)
- Complete Phase 3: User Story 1, Per-Project Lifecycle (T012-T025)
- STOP and VALIDATE: Test per-project lifecycle end-to-end
- Run `task before-push` to verify quality gate
- Setup + Foundational → Server runs with configurable state_dir
- Add US1 → Per-project mode works (`init → start → status → stop`). This is the MVP.
- Add US2 → Robust project root resolution (symlinks, non-git projects)
- Add US3 → Shared daemon mode for power users
- Add US4 → Agent/skill discovery integration
- Each story adds value without breaking previous stories
With multiple developers after Foundational phase:
- Developer A: US1 (Per-Project Lifecycle) — MVP path
- Developer B: US2 (Project Root Discovery) — can start after T012 lands
- Developer C: US3 (Shared Daemon) — recommend waiting for US1 completion
- Developer D: US4 (Agent Integration) — recommend waiting for US1 completion
- [P] tasks = different files, no dependencies on incomplete tasks in same phase
- [USn] label maps task to specific user story for traceability
- Each user story should be independently completable and testable
- Commit after each task or logical group
- Stop at any checkpoint to validate story independently
- Test tasks included per Constitution III (Test-Alongside): T055-T064
- `storage.py` in plan.md renamed to `storage_paths.py` to avoid collision with existing `storage/` directory
- T053 adds config precedence chain per FR-009; T054 ensures OpenAPI compliance per Constitution II
- T021 uses `~/.Agent Brain/registry.json` for instance discovery (resolves U1)