# Product Roadmap

Version: 1.2.0 | Last Updated: 2026-02-01 | Status: Active
## Overview

Agent Brain is a local-first RAG (Retrieval-Augmented Generation) service that indexes documentation and source code, providing intelligent semantic search via API for CLI tools and Claude integration. The core principles are:
- Privacy First: Runs entirely on your machine with disk persistence
- High Retrieval Quality: Vector + keyword + hybrid search strategies
- Operational Ergonomics: CLI-first management experience
- Future-Proof Flexibility: Built on LlamaIndex abstractions
- Phased Delivery: Incremental value with each release
## Phase Summary

| Phase | Name | Spec ID | Status | Priority | Transport |
|---|---|---|---|---|---|
| 1 | Core Document RAG | 001-005 | COMPLETED | - | HTTP |
| 2 | BM25 & Hybrid Retrieval | 100 | COMPLETED | P1 | HTTP |
| 3 | Source Code Ingestion | 101 | COMPLETED | P2 | HTTP |
| 3.1 | Multi-Instance Architecture | 109 | COMPLETED | P1 | HTTP |
| 3.2 | C# Code Indexing | 110 | COMPLETED | P2 | HTTP |
| 3.3 | Skill Instance Discovery | 111 | COMPLETED | P2 | HTTP |
| 3.4 | Agent Brain Naming | 112 | COMPLETED | P1 | HTTP |
| 3.5 | GraphRAG Integration | 113 | COMPLETED | P2 | HTTP |
| 3.6 | Agent Brain Plugin | 114 | COMPLETED | P2 | HTTP |
| 4 | UDS & Claude Plugin Evolution | 102 | Future | P3 | HTTP + UDS |
| 5 | Pluggable Model Providers | 103 | Next | P3 | HTTP + UDS |
| 6 | PostgreSQL/AlloyDB Backend | 104 | Future | P4 | HTTP + UDS |
| 7 | AWS Bedrock Provider | 105 | Future | P4 | HTTP + UDS |
| 8 | Google Vertex AI Provider | 106 | Future | P4 | HTTP + UDS |
## Phase 1: Core Document RAG

Status: COMPLETED
Spec Directory: specs/001-005
- Document Ingestion: PDF + Markdown (.md, .mdx) support
- Context-Enriched Chunking: Section/heading-aware chunking with Claude summarization
- Vector Search: Chroma vector database with OpenAI text-embedding-3-large
- Persistent Storage: Disk-based persistence across restarts
- REST API: `/query`, `/index`, `/health` endpoints
- CLI Tool: `agent-brain` with status, query, index, reset commands
- Claude Skill: Basic integration for conversational document search
- FastAPI + Uvicorn server
- LlamaIndex for document processing
- ChromaDB for vector storage
- OpenAI embeddings + Claude Haiku summarization
## Phase 2: BM25 & Hybrid Retrieval

Status: COMPLETED
Spec Directory: specs/100-bm25-hybrid-retrieval
Transport: HTTP only
Add classic keyword search (BM25) alongside vector search, with hybrid retrieval combining both strategies using Reciprocal Rank Fusion (RRF).
- BM25 Retriever: Classic full-text keyword search
- Hybrid Retrieval: Combined vector + BM25 with configurable fusion
- Retrieval Mode Selection: API parameter for `vector`, `bm25`, or `hybrid` (default)
- Alpha Tuning: Configurable weight between vector and keyword scores
- Enhanced Scoring Metadata: Detailed score breakdowns in responses
`POST /query`:

```jsonc
{
  "query": "search text",
  "mode": "hybrid",   // "vector" | "bm25" | "hybrid"
  "alpha": 0.5,       // fusion weight (0 = BM25 only, 1 = vector only)
  "top_k": 10
}
```
- Better precision for exact term matching (function names, error codes)
- Improved recall through combined retrieval strategies
- User control over search behavior
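The fusion step described above can be sketched in a few lines of Python. This is an illustrative implementation of Reciprocal Rank Fusion with an alpha weight blending the two rank lists, not the service's actual code; the function name and the `k` constant are assumptions (k=60 is the value from the original RRF paper).

```python
def rrf_fuse(vector_ranked, bm25_ranked, alpha=0.5, k=60):
    """Fuse two ranked lists of doc IDs with weighted Reciprocal Rank Fusion.

    alpha=0 -> BM25 only, alpha=1 -> vector only. k dampens the impact
    of very high ranks so a single list cannot dominate the fusion.
    """
    scores = {}
    for rank, doc_id in enumerate(vector_ranked):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    for rank, doc_id in enumerate(bm25_ranked):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"], alpha=0.5)
print(fused[0])  # "b" ranks first: it scores well in both lists
```

Because RRF works on ranks rather than raw scores, the vector and BM25 scores never need to be normalized against each other, which is why it is a common default for hybrid retrieval.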
## Phase 3: Source Code Ingestion

Status: COMPLETED
Spec Directory: specs/101-code-ingestion
Transport: HTTP
Enable indexing of source code alongside documentation for unified corpus searches. This is critical for the book generation and corpus use cases.
- Code Ingestion via CodeSplitter:
  - Python (.py)
  - TypeScript/JavaScript (.ts, .tsx, .js, .jsx)
- Unified Indexing: Vector + BM25 across documents and code
- Code Summaries: SummaryExtractor generates natural language descriptions per code chunk
- Extended Filters: `source_type` (doc vs code) and `language` filters in queries
- AST-Aware Chunking: Preserves function/class boundaries
`POST /index`:

```jsonc
{
  "folder_path": "/path/to/project",
  "include_code": true,
  "languages": ["python", "typescript"]
}
```

`POST /query`:

```jsonc
{
  "query": "authentication handler",
  "source_type": "code",   // "doc" | "code" | "all"
  "language": "python"
}
```
- Single search across documentation and implementation
- Code context improves answer quality for technical queries
- Enables corpus-based book/tutorial generation
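The actual chunker is built on LlamaIndex's CodeSplitter; as a rough illustration of what AST-aware chunking means, here is a minimal sketch using Python's standard `ast` module to split a file at top-level function and class boundaries (the function name and approach are illustrative, not the production pipeline):

```python
import ast

def chunk_python_source(source: str) -> list[str]:
    """Split Python source into chunks at top-level function/class boundaries."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive (Python 3.8+)
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks

code = '''
def login(user):
    return check(user)

class Session:
    def refresh(self):
        pass
'''
chunks = chunk_python_source(code)
print(len(chunks))  # 2: the function and the class, each kept whole
```

The payoff over fixed-size chunking is that a retrieved chunk is always a complete, syntactically meaningful unit, which improves both embedding quality and the usefulness of search hits.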
## Phase 3.1: Multi-Instance Architecture

Status: COMPLETED
Spec Directory: .speckit/features/109-multi-instance-architecture/
Transport: HTTP
Enable running multiple concurrent Agent Brain instances with per-project isolation, automatic port allocation, and runtime discovery for agent integration.
- Per-Project Isolation: Each project gets its own server instance with isolated indexes
- Auto-Port Allocation: OS-assigned ports prevent conflicts between projects
- State Directory: Project state stored in `.claude/doc-serve/`
- Runtime Discovery: `runtime.json` enables agents and skills to find running instances
- Lock File Protocol: Prevents double-start of instances
- Project Root Resolution: Consistent detection via git or marker files
```shell
agent-brain init            # Initialize project for Agent Brain
agent-brain start --daemon  # Start server with auto-port
agent-brain stop            # Stop the server
agent-brain list            # List all running instances
```

- Work on multiple projects simultaneously
- No port conflicts between projects
- State travels with the project (can be version-controlled)
- Agents can automatically discover running servers
## Phase 3.2: C# Code Indexing

Status: COMPLETED
Spec Directory: .speckit/features/110-csharp-code-indexing/
Transport: HTTP
Add C# language support to the code ingestion pipeline with AST-aware parsing.
- File Extensions: `.cs` and `.csx` (C# scripts)
- AST-Aware Chunking: Classes, methods, interfaces, properties, enums
- XML Documentation: Extracts `/// <summary>` comments as metadata
- Language Filter: Query with `--languages csharp`
- Full .NET ecosystem support
- Semantic search across C# codebases
- Rich metadata extraction for better search quality
## Phase 3.3: Skill Instance Discovery

Status: COMPLETED
Spec Directory: .speckit/features/111-skill-instance-discovery/
Transport: HTTP
Update the Agent Brain Claude Code skill to leverage multi-instance architecture for automatic server discovery and lifecycle management.
- Auto-Initialization: Skill automatically runs `agent-brain init` when needed
- Server Discovery: Reads `runtime.json` to find running instances
- Auto-Start: Starts server automatically if no instance is running
- Status Reporting: Reports port, mode, instance ID, document count
- Cross-Agent Sharing: Multiple agents share the same server instance
- Zero-configuration skill usage
- No manual server management required
- Seamless multi-agent workflows
## Phase 3.4: Agent Brain Naming

Status: COMPLETED
Spec Directory: .speckit/features/112-agent-brain-naming/
Transport: HTTP
Unify branding across all packages from "doc-serve" to "agent-brain" for consistent identity and discoverability.
- Package Renaming: `doc-serve-rag` → `agent-brain-rag`, `doc-serve-cli` → `agent-brain-cli`
- Command Renaming: `doc-serve` CLI → `agent-brain` CLI
- Backward Compatibility: Legacy aliases maintained for transition period
- Documentation Updates: All references updated to new naming
- Consistent branding across ecosystem
- Improved discoverability
- Better alignment with AI agent workflows
## Phase 3.5: GraphRAG Integration

Status: COMPLETED
Spec Directory: .speckit/features/113-graphrag-integration/
Transport: HTTP
Add knowledge graph extraction alongside vector search for enhanced entity-centric retrieval and relationship queries.
- Entity Extraction: Named entities, concepts, and relationships extracted from documents
- Property Graph Store: LlamaIndex PropertyGraphStore with configurable backends
- Graph-Enhanced Retrieval: Entity disambiguation and relationship traversal
- Hybrid Mode: Combines graph context with vector similarity for richer results
- Query Modes: `vector`, `graph`, `hybrid` (default)
`POST /query`:

```jsonc
{
  "query": "search text",
  "mode": "hybrid",   // "vector" | "graph" | "hybrid"
  "include_graph": true,
  "top_k": 10
}
```
- Better handling of entity-centric queries
- Relationship discovery across documents
- Improved context for complex questions
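The graph side can be pictured as a triple store that vector hits are expanded against. A toy sketch follows; the real feature uses LlamaIndex's PropertyGraphStore, so every name below is illustrative.

```python
from collections import defaultdict

class TinyGraphStore:
    """Toy (subject, relation, object) triple store with one-hop traversal."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def neighbors(self, entity):
        """Return (relation, object) pairs directly connected to an entity."""
        return self.edges.get(entity, [])

graph = TinyGraphStore()
graph.add("AuthService", "calls", "TokenValidator")
graph.add("AuthService", "defined_in", "auth/service.py")

# A vector hit mentioning "AuthService" can be enriched with graph context
print(graph.neighbors("AuthService"))
```

In hybrid mode, this kind of relationship expansion is what lets an entity-centric query ("what calls TokenValidator?") succeed even when no single chunk states the answer verbatim.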
## Phase 3.6: Agent Brain Plugin

Status: COMPLETED
Spec Directory: .speckit/features/114-agent-brain-plugin/
Transport: HTTP
Claude Code plugin providing commands, agents, and skills for seamless Agent Brain integration within Claude Code workflows.
- Slash Commands:
  - `/agent-brain:search` - Semantic search across indexed content
  - `/agent-brain:status` - Check server status and document count
  - `/agent-brain:index` - Index documents or code
- Specialized Agents:
  - `agent-brain-researcher` - Multi-step research with citations
  - `agent-brain-indexer` - Automated indexing workflows
- Skill Integration: Updated skill with auto-discovery and lifecycle management
```shell
# Install plugin
claude plugins add agent-brain-plugin

# Or from local path
claude plugins add ./agent-brain-plugin
```

- Native Claude Code integration
- Autonomous research capabilities
- Simplified server lifecycle management
## Phase 4: UDS & Claude Plugin Evolution

Status: Future
Spec Directory: specs/102-uds-claude-plugin
Transport: HTTP + optional UDS
Add Unix Domain Socket (UDS) transport for lower-latency local communication and evolve the Claude plugin with richer capabilities.
- UDS Transport: Optional Unix Domain Socket alongside HTTP
- Rich Slash Commands:
  - `/search` - General semantic search
  - `/doc` - Documentation-only search
  - `/code` - Code-only search
- Server Lifecycle Management: Plugin auto-starts/stops server
- Multi-Step Research Agent: Break down complex questions
- ~10-50x latency improvement for local queries via UDS
- More intuitive Claude interaction patterns
- Autonomous research capabilities
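To make the UDS idea concrete, here is a minimal Unix Domain Socket round trip using Python's standard `socket` module. It is illustrative only: this roadmap does not specify the wire format, and the socket path is invented.

```python
import os
import socket
import tempfile
import threading

# A filesystem path replaces host:port addressing; the name is made up here.
sock_path = os.path.join(tempfile.mkdtemp(), "agent-brain.sock")
ready = threading.Event()

def serve_once():
    """Bind a UDS listener and echo one request back."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)
        srv.listen(1)
        ready.set()  # signal the client that the socket is accepting
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))  # echo the payload

server = threading.Thread(target=serve_once)
server.start()
ready.wait()

# Client side: connects by path, skipping TCP handshake and port allocation,
# which is where the local-latency win comes from.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
    client.connect(sock_path)
    client.sendall(b'{"query": "ping"}')
    reply = client.recv(1024)
server.join()
print(reply)  # b'{"query": "ping"}'
```

Keeping HTTP as the primary transport and UDS as an opt-in means remote tooling keeps working unchanged while co-located agents get the faster path.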
## Phase 5: Pluggable Model Providers

Status: Next
Spec Directory: specs/103-pluggable-providers
Transport: HTTP + UDS
Full configuration-driven model selection using LlamaIndex abstractions. No code changes required to switch providers.
Embedding Providers:

| Provider | Models | Notes |
|---|---|---|
| OpenAI | text-embedding-3-small/large, ada-002 | Default |
| Ollama | nomic-embed-text, bge, etc. | Local, offline |
| Cohere | embed-english-v3, embed-multilingual-v3 | Via API |
Note: Grok and Gemini currently lack public embedding APIs and are not supported for embeddings.
Summarization/LLM Providers:

| Provider | Models | Notes |
|---|---|---|
| Anthropic | Claude Haiku, Sonnet, Opus | Default |
| OpenAI | GPT-4o, GPT-4o-mini | Via API |
| Gemini | Flash, Pro | Via API |
| Grok | - | Via OpenAI-compatible endpoint |
| Ollama | Llama 3, Mistral, Qwen | Local, offline |
```yaml
# config.yaml
embedding:
  provider: ollama
  model: nomic-embed-text
  params:
    base_url: http://localhost:11434

summarization:
  provider: anthropic
  model: claude-3-5-sonnet-20241022
  params:
    api_key_env: ANTHROPIC_API_KEY
```

- Run completely offline with Ollama
- Cost optimization with different providers
- Enterprise flexibility
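Configuration-driven selection usually reduces to a registry keyed by the config's `provider` field. A minimal sketch, assuming a parsed config dict: the factory outputs are plain strings here so the example stays dependency-free, whereas the real service would construct LlamaIndex embedding/LLM objects.

```python
# Registry mapping provider names to factory callables. The string outputs
# stand in for real LlamaIndex objects in this dependency-free sketch.
EMBEDDING_PROVIDERS = {
    "openai": lambda cfg: f"OpenAIEmbedding(model={cfg['model']})",
    "ollama": lambda cfg: (
        f"OllamaEmbedding(model={cfg['model']}, "
        f"base_url={cfg['params']['base_url']})"
    ),
}

def build_embedder(config: dict) -> str:
    """Instantiate the embedding backend named in config - no code changes needed."""
    try:
        factory = EMBEDDING_PROVIDERS[config["provider"]]
    except KeyError:
        raise ValueError(f"Unknown embedding provider: {config['provider']}")
    return factory(config)

# Parsed form of a config.yaml like the one shown earlier
config = {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "params": {"base_url": "http://localhost:11434"},
}
print(build_embedder(config))
```

Swapping providers then means editing one YAML key, and adding a provider means registering one factory, which is exactly the "no code changes to switch" property this phase promises.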
## Phase 6: PostgreSQL/AlloyDB Backend

Status: Future
Spec Directory: specs/104-postgresql-backend
Transport: HTTP + UDS
Optional configuration-driven switch to PostgreSQL (local) or AlloyDB (managed/cloud) as persistent storage backend.
- pgvector Extension: High-performance vector similarity search
- AlloyDB ScaNN Indexes: Google Cloud's optimized vector indexing
- Native Full-Text Search: PostgreSQL tsvector/tsquery (BM25-like)
- Hybrid Retrieval: `hybrid_search=True` on PGVectorStore
- JSONB Metadata: Rich metadata with GIN indexes
- Superior scalability for very large corpora
- Built-in replication/backup
- Transactional consistency
- Mature PostgreSQL full-text engine replaces custom BM25
- Install `llama-index-vector-stores-postgres`
- Update config.yaml with PostgreSQL connection details
- Re-index documents (one-time migration)
## Phase 7: AWS Bedrock Provider

Status: Future
Spec Directory: specs/105-aws-bedrock
Transport: HTTP + UDS
Add full support for AWS Bedrock as a pluggable provider for both embeddings and summarization/completion LLMs.
Embeddings:
- Amazon Titan Embed Text v1/v2
- Cohere Embed English/Multilingual v3
Summarization/LLM:
- Claude (via Bedrock)
- Titan Text
- Meta Llama
- Mistral
- Cohere Command
```yaml
embedding:
  provider: bedrock
  model: amazon.titan-embed-text-v2
  params:
    region: us-east-1

summarization:
  provider: bedrock
  model: anthropic.claude-3-sonnet
```

- Default AWS credentials chain
- Profile-based authentication
- Explicit keys/region configuration
- Enterprise-grade security/compliance
- Access to high-performance AWS-hosted models
- Cost optimization for AWS users
## Phase 8: Google Vertex AI Provider

Status: Future
Spec Directory: specs/106-vertex-ai
Transport: HTTP + UDS
Add full support for Google Vertex AI as a pluggable provider for embeddings and summarization/completion LLMs.
Embeddings:
- textembedding-gecko
- multimodalembedding
Summarization/LLM:
- Gemini 1.5 Flash/Pro
- gemini-1.0-pro
```yaml
embedding:
  provider: vertex
  model: textembedding-gecko@003
  params:
    project: my-gcp-project
    location: us-central1

summarization:
  provider: vertex
  model: gemini-1.5-flash
```

- Service account JSON
- Application Default Credentials (ADC)
- Explicit project/location
- Integration with Google Cloud ecosystem
- Strong multimodal capabilities with Gemini
- Enterprise features for GCP users
## Use Cases

Agent Brain enables creating searchable, AI-queryable corpora from large documentation sets, codebases, or book collections. This is a key differentiator for technical content development.
### AWS CDK Development

Problem: AWS CDK has extensive documentation across multiple languages and services. Developers need quick access to specific patterns and configurations.
Solution with Agent Brain:
```shell
# Index AWS CDK documentation
agent-brain index ~/aws-cdk-docs/

# Index AWS CDK Python library source (Phase 3)
agent-brain index ~/aws-cdk-python/src/ --include-code

# Query during development
agent-brain query "S3 bucket with lifecycle rules and versioning"
```

Sample Queries:
- "How to create a Lambda function with VPC access?"
- "DynamoDB table with global secondary index pattern"
- "Cross-stack references best practices"
- "EventBridge rule for S3 object creation"
Book Generation Application:
- Index AWS CDK source + comprehensive PDF guide
- Create corpus for writing AWS CDK tutorials
- Claude can cite specific documentation and code examples
### Claude Application Development

Problem: Building AI applications requires referencing Claude API docs, SDK documentation, and best practices frequently.
Solution with Agent Brain:
```shell
# Index Claude documentation
agent-brain index ~/anthropic-docs/

# Index Claude SDK source (Phase 3)
agent-brain index ~/claude-sdk/src/ --include-code

# Query via Claude skill
"How do I implement streaming responses with the Python SDK?"
```

Sample Queries:
- "Claude tool use best practices"
- "Prompt caching implementation"
- "Error handling for rate limits"
- "Vision API usage patterns"
Skill Generation Application:
- Index Claude Code documentation
- Create corpus for writing Claude Code skills
- Skills can reference authoritative source material
### Internal Documentation Search

Problem: Large organizations have extensive internal documentation that teams need to search effectively for onboarding and reference.
Solution with Agent Brain:
```shell
# Index internal documentation
agent-brain index ~/company-docs/architecture/
agent-brain index ~/company-docs/onboarding/
agent-brain index ~/company-docs/api-guides/

# Index internal code (Phase 3)
agent-brain index ~/company-monorepo/libs/ --include-code

# Team members query via Claude skill
"What is our authentication flow for mobile apps?"
```

Benefits:
- New team members find answers faster
- Reduced dependency on tribal knowledge
- Consistent answers across the organization
### Open Source Contribution

Problem: Contributing to large open source projects requires understanding existing patterns, conventions, and implementation details.
Solution with Agent Brain:
```shell
# Index project documentation
agent-brain index ~/kubernetes/docs/

# Index project source code (Phase 3)
agent-brain index ~/kubernetes/pkg/ --include-code

# Query for contribution patterns
agent-brain query "How are admission controllers implemented?"
```

Sample Queries:
- "Pod scheduling algorithm"
- "Custom resource definition validation"
- "Controller reconciliation loop pattern"
Benefits:
- Faster ramp-up for contributors
- Understand conventions before submitting PRs
- Find similar implementations to reference
## Recommendations

- Phase 2 (BM25 + Hybrid Retrieval) - Completed; delivered immediate retrieval quality improvements
- Phase 3 (Code Ingestion) - Completed; enables unified corpus searches
- Phase 5 for Local-Only - Run completely offline with Ollama
- Phase 6 for Scale - PostgreSQL backend for large corpora
Model provider flexibility (Phases 5, 7, 8) and PostgreSQL backend (Phase 6) are cleanly deferred until the core retrieval enhancements are stable.
## Spec Numbering

- 001-099: Phase 1 specs (existing, completed)
- 100-199: Roadmap phases 2-8 (future development)
- 200+: Reserved for future expansion
See docs/roadmaps/spec-mapping.md for full phase-to-spec mapping.
## Related Documentation

- Design-Architecture-Overview
- Design-Query-Architecture
- Design-Storage-Architecture
- Design-Class-Diagrams
- GraphRAG-Guide
- Agent-Skill-Hybrid-Search-Guide
- Agent-Skill-Graph-Search-Guide
- Agent-Skill-Vector-Search-Guide
- Agent-Skill-BM25-Search-Guide
- Pluggable-Providers-Spec
- GraphRAG-Integration-Spec
- Agent-Brain-Plugin-Spec
- Multi-Instance-Architecture-Spec