-
Notifications
You must be signed in to change notification settings - Fork 16
Pluggable Providers Spec
Feature Branch: 103-pluggable-providers
Created: 2025-12-18
Status: Draft
Input: Product Roadmap Phase 5
A user wants to use a different embedding model provider than the default OpenAI to reduce costs or use local models.
Why this priority: Core capability for provider flexibility. Many users want Ollama for local/offline operation.
Independent Test: Configure Ollama embeddings in config.yaml, index documents, verify embeddings work.
Acceptance Scenarios:
-
Given config.yaml with
embedding.provider: ollama, When server starts, Then it uses Ollama for embeddings -
Given config.yaml with
embedding.provider: openai, When server starts, Then it uses OpenAI (current default) -
Given config.yaml with
embedding.provider: cohere, When server starts, Then it uses Cohere embeddings - Given invalid provider specified, When server starts, Then it fails with descriptive error message
A user wants to use a different LLM for summarization/extraction than the default Claude Haiku.
Why this priority: Summarization is used for context injection. Users may prefer different models for cost or quality.
Independent Test: Configure GPT-4o for summarization, index documents, verify summaries are generated.
Acceptance Scenarios:
-
Given config with
summarization.provider: openai, When indexing, Then GPT-4o generates summaries -
Given config with
summarization.provider: ollama, When indexing, Then local Llama model generates summaries -
Given config with
summarization.provider: gemini, When indexing, Then Gemini generates summaries -
Given config with
summarization.provider: grok, When indexing, Then Grok generates summaries via OpenAI-compatible endpoint
Users configure providers via a simple YAML configuration file without code changes.
Why this priority: Configuration-driven is the goal. No code changes should be required.
Independent Test: Create config.yaml, start server, verify configured providers are used.
Acceptance Scenarios:
- Given config.yaml in project root, When server starts, Then it loads and applies configuration
- Given config.yaml with all settings, When validated, Then all required fields are present
-
Given environment variable
DOC_SERVE_CONFIG, When set, Then server loads config from that path - Given no config.yaml exists, When server starts, Then it uses sensible defaults (current behavior)
A user wants to run doc-serve entirely locally without any external API calls for privacy reasons.
Why this priority: Key value proposition for privacy-conscious users. Enables air-gapped operation.
Independent Test: Configure Ollama for both embeddings and summarization, disconnect from internet, verify full functionality.
Acceptance Scenarios:
- Given Ollama configured for embeddings and summarization, When offline, Then indexing works
- Given Ollama configured, When querying, Then no external API calls are made
- Given Ollama models downloaded, When first query after restart, Then response time < 5 seconds
- Given local-only configuration, When API keys not set, Then server starts without errors
API keys for providers are managed securely via environment variables referenced in config.
Why this priority: Security best practice. Keys should not be in config files.
Independent Test: Configure provider with api_key_env: OPENAI_API_KEY, verify key is read from environment.
Acceptance Scenarios:
-
Given config with
params.api_key_env: OPENAI_API_KEY, When server starts, Then it reads key from environment - Given environment variable not set, When server starts with that provider, Then it fails with clear error
- Given multiple providers configured, When each has different key env vars, Then all are read correctly
- Given Ollama provider (no key needed), When configured, Then no API key validation occurs
- What happens when switching providers with existing index? (Re-indexing required, warn user)
- How does system handle model not found in provider? (Descriptive error with available models)
- What happens when provider API is temporarily unavailable? (Retry with backoff, fail gracefully)
- How does system handle embedding dimension mismatch? (Error during indexing, not silently corrupt index)
- What happens when mixing providers across restarts? (Detect mismatch, require re-index)
-
FR-001: System MUST support configuration via
config.yamlfile - FR-002: System MUST support OpenAI, Ollama, and Cohere embedding providers
- FR-003: System MUST support Anthropic, OpenAI, Gemini, Grok, and Ollama summarization providers
- FR-004: Configuration MUST NOT require code changes to switch providers
- FR-005: API keys MUST be read from environment variables, not stored in config
- FR-006: System MUST validate provider configuration on startup
- FR-007: System MUST detect embedding dimension mismatches and prevent index corruption
- FR-008: System MUST log which providers are in use on startup
- FR-009: System MUST support Ollama for completely offline operation
- FR-010: Provider switch with existing index MUST require explicit re-indexing
Embeddings:
| Provider | Models | Authentication |
|---|---|---|
| OpenAI | text-embedding-3-small/large, ada-002 | API key |
| Ollama | nomic-embed-text, bge, mxbai-embed-large | None (local) |
| Cohere | embed-english-v3, embed-multilingual-v3 | API key |
Note: Grok and Gemini do NOT provide public embedding APIs.
Summarization/LLM:
| Provider | Models | Authentication |
|---|---|---|
| Anthropic | Claude 4.5 Haiku, Sonnet 4.5, Opus 4.5 | API key |
| OpenAI | GPT-5, GPT-5-mini | API key |
| Gemini | gemini-3-flash, gemini-3-pro | API key |
| Grok | grok-4, grok-4-fast | API key (OpenAI-compatible) |
| Ollama | llama4:scout, mistral-small3.2, qwen3-coder, gemma3 | None (local) |
- ProviderConfig: Configuration for embedding or summarization provider
- EmbeddingProvider: Abstract interface for embedding generation
- SummarizationProvider: Abstract interface for text summarization
- ConfigLoader: Reads and validates config.yaml
- ProviderFactory: Creates provider instances from configuration
- SC-001: Switching providers requires only config.yaml change, no code changes
- SC-002: Ollama enables fully offline operation for both embeddings and summarization
- SC-003: All Phase 1-4 functionality remains working with any supported provider
- SC-004: Provider configuration is validated on startup with clear error messages
- SC-005: API keys are never logged or exposed in error messages
- SC-006: Documentation includes setup guides for each supported provider
- Design-Architecture-Overview
- Design-Query-Architecture
- Design-Storage-Architecture
- Design-Class-Diagrams
- GraphRAG-Guide
- Agent-Skill-Hybrid-Search-Guide
- Agent-Skill-Graph-Search-Guide
- Agent-Skill-Vector-Search-Guide
- Agent-Skill-BM25-Search-Guide
Search
Server
Setup
- Pluggable-Providers-Spec
- GraphRAG-Integration-Spec
- Agent-Brain-Plugin-Spec
- Multi-Instance-Architecture-Spec