Pluggable Providers Spec

Feature Specification: Pluggable Model Providers

Feature Branch: 103-pluggable-providers
Created: 2025-12-18
Status: Draft
Input: Product Roadmap Phase 5

User Scenarios & Testing

User Story 1 - Switch Embedding Provider (Priority: P1)

A user wants to use an embedding provider other than the default (OpenAI), either to reduce costs or to run local models.

Why this priority: Core capability for provider flexibility. Many users want Ollama for local/offline operation.

Independent Test: Configure Ollama embeddings in config.yaml, index documents, verify embeddings work.

Acceptance Scenarios:

Given config.yaml with embedding.provider: ollama, When server starts, Then it uses Ollama for embeddings
Given config.yaml with embedding.provider: openai, When server starts, Then it uses OpenAI (current default)
Given config.yaml with embedding.provider: cohere, When server starts, Then it uses Cohere embeddings
Given an invalid provider is specified, When server starts, Then it fails with a descriptive error message
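
The scenarios above can be read as a small startup check. The following is a minimal sketch, not the project's implementation: the function and exception names are hypothetical, and the config is assumed to be already parsed from config.yaml into a plain mapping.

```python
from typing import Mapping

# Assumption: the three embedding providers named in this spec.
SUPPORTED_EMBEDDING_PROVIDERS = ("openai", "ollama", "cohere")
DEFAULT_EMBEDDING_PROVIDER = "openai"  # the spec calls OpenAI the current default


class InvalidProviderError(ValueError):
    """Hypothetical error raised at startup for an unsupported provider."""


def resolve_embedding_provider(config: Mapping) -> str:
    """Return the configured embedding provider name or raise a descriptive error."""
    section = config.get("embedding") or {}
    name = section.get("provider", DEFAULT_EMBEDDING_PROVIDER)
    if name not in SUPPORTED_EMBEDDING_PROVIDERS:
        supported = ", ".join(SUPPORTED_EMBEDDING_PROVIDERS)
        raise InvalidProviderError(
            f"Unsupported embedding provider {name!r}; supported providers: {supported}"
        )
    return name
```

Listing the supported names in the error message is one way to make the failure "descriptive"; the spec does not prescribe the wording.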

User Story 2 - Switch Summarization LLM (Priority: P1)

A user wants to use a different LLM for summarization/extraction than the default Claude Haiku.

Why this priority: Summarization is used for context injection. Users may prefer different models for cost or quality.

Independent Test: Configure GPT-4o for summarization, index documents, verify summaries are generated.

Acceptance Scenarios:

Given config with summarization.provider: openai, When indexing, Then GPT-4o generates summaries
Given config with summarization.provider: ollama, When indexing, Then local Llama model generates summaries
Given config with summarization.provider: gemini, When indexing, Then Gemini generates summaries
Given config with summarization.provider: grok, When indexing, Then Grok generates summaries via OpenAI-compatible endpoint

User Story 3 - Configuration via YAML File (Priority: P1)

Users configure providers via a simple YAML configuration file without code changes.

Why this priority: Configuration-driven provider selection is the goal. Switching providers should require no code changes.

Independent Test: Create config.yaml, start server, verify configured providers are used.

Acceptance Scenarios:

Given config.yaml in project root, When server starts, Then it loads and applies configuration
Given config.yaml with all settings, When validated, Then all required fields are present
Given environment variable DOC_SERVE_CONFIG, When set, Then server loads config from that path
Given no config.yaml exists, When server starts, Then it uses sensible defaults (current behavior)
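
One possible lookup order for the configuration file is sketched below. The precedence (DOC_SERVE_CONFIG first, then config.yaml in the project root, then built-in defaults) is an assumption, since the spec lists the sources but not their order; the function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CONFIG_NAME = "config.yaml"


def resolve_config_path(
    project_root: Path, env: Mapping[str, str] = os.environ
) -> Optional[Path]:
    """Return the config file to load, or None to fall back to defaults.

    Assumed precedence: DOC_SERVE_CONFIG overrides the project-root file.
    """
    override = env.get("DOC_SERVE_CONFIG")
    if override:
        return Path(override)
    candidate = project_root / DEFAULT_CONFIG_NAME
    # None signals "no config found": the caller keeps current default behavior.
    return candidate if candidate.is_file() else None
```

Passing the environment as a parameter keeps the function easy to test without mutating the real process environment.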

User Story 4 - Run Completely Offline with Ollama (Priority: P2)

A user wants to run doc-serve entirely locally without any external API calls for privacy reasons.

Why this priority: Key value proposition for privacy-conscious users. Enables air-gapped operation.

Independent Test: Configure Ollama for both embeddings and summarization, disconnect from internet, verify full functionality.

Acceptance Scenarios:

Given Ollama configured for embeddings and summarization, When offline, Then indexing works
Given Ollama configured, When querying, Then no external API calls are made
Given Ollama models are downloaded, When the first query is issued after a restart, Then the response time is under 5 seconds
Given local-only configuration, When API keys not set, Then server starts without errors

User Story 5 - API Key Management via Environment (Priority: P2)

API keys for providers are managed securely via environment variables referenced in config.

Why this priority: Security best practice. Keys should not be in config files.

Independent Test: Configure provider with api_key_env: OPENAI_API_KEY, verify key is read from environment.

Acceptance Scenarios:

Given config with params.api_key_env: OPENAI_API_KEY, When server starts, Then it reads key from environment
Given environment variable not set, When server starts with that provider, Then it fails with clear error
Given multiple providers configured, When each has different key env vars, Then all are read correctly
Given Ollama provider (no key needed), When configured, Then no API key validation occurs
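
The key-resolution behavior in these scenarios might look like the sketch below. It assumes a parsed provider section shaped like {"provider": ..., "params": {"api_key_env": ...}}; the function name, exception name, and the set of keyless providers are illustrative, not part of the spec.

```python
import os
from typing import Mapping, Optional

# Assumption: Ollama is the only provider in this spec that needs no key.
KEYLESS_PROVIDERS = frozenset({"ollama"})


class MissingApiKeyError(RuntimeError):
    """Hypothetical startup error for a missing or unconfigured API key."""


def resolve_api_key(
    provider_cfg: Mapping, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Read the provider's API key from the environment variable it names."""
    provider = provider_cfg["provider"]
    if provider in KEYLESS_PROVIDERS:
        return None  # no key validation for local providers
    var_name = (provider_cfg.get("params") or {}).get("api_key_env")
    if not var_name:
        raise MissingApiKeyError(
            f"Provider {provider!r} requires params.api_key_env to be set"
        )
    key = env.get(var_name)
    if not key:
        # The message names the variable only; the key value is never included.
        raise MissingApiKeyError(
            f"Environment variable {var_name} is not set (required by provider {provider!r})"
        )
    return key
```

Each provider section carries its own api_key_env, so multiple providers with different variables resolve independently.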

Edge Cases

What happens when switching providers with existing index? (Re-indexing required, warn user)
How does system handle model not found in provider? (Descriptive error with available models)
What happens when provider API is temporarily unavailable? (Retry with backoff, fail gracefully)
How does system handle embedding dimension mismatch? (Error during indexing, not silently corrupt index)
What happens when mixing providers across restarts? (Detect mismatch, require re-index)
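
For the "provider temporarily unavailable" case, retry with exponential backoff could be sketched as follows. The attempt count, base delay, and the choice to treat ConnectionError as the transient failure are all assumptions; a real provider client would raise its own exception types.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ProviderUnavailableError(RuntimeError):
    """Hypothetical error raised once all retry attempts are exhausted."""


def call_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on ConnectionError with delays of base_delay * 2**n."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Fail gracefully with a single, clear error instead of the raw exception.
    raise ProviderUnavailableError(
        f"Provider unavailable after {attempts} attempts"
    ) from last_error
```

The sleep function is injectable so that tests can record delays instead of waiting.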

Requirements

Functional Requirements

FR-001: System MUST support configuration via config.yaml file
FR-002: System MUST support OpenAI, Ollama, and Cohere embedding providers
FR-003: System MUST support Anthropic, OpenAI, Gemini, Grok, and Ollama summarization providers
FR-004: Configuration MUST NOT require code changes to switch providers
FR-005: API keys MUST be read from environment variables, not stored in config
FR-006: System MUST validate provider configuration on startup
FR-007: System MUST detect embedding dimension mismatches and prevent index corruption
FR-008: System MUST log which providers are in use on startup
FR-009: System MUST support Ollama for completely offline operation
FR-010: Provider switch with existing index MUST require explicit re-indexing
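
FR-007 and FR-010 imply that the index records which embedding setup produced it. A minimal sketch of such a check follows, assuming the index stores provider, model, and vector dimension alongside the data; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMetadata:
    """Assumed metadata persisted with the index at indexing time."""
    provider: str
    model: str
    dimension: int


class ReindexRequiredError(RuntimeError):
    """Hypothetical error: the current config does not match the stored index."""


def check_index_compatibility(stored: IndexMetadata, current: IndexMetadata) -> None:
    """Raise if the configured embedding setup differs from the one that built the index."""
    problems = []
    if stored.provider != current.provider:
        problems.append(f"provider {stored.provider!r} -> {current.provider!r}")
    if stored.model != current.model:
        problems.append(f"model {stored.model!r} -> {current.model!r}")
    if stored.dimension != current.dimension:
        problems.append(f"dimension {stored.dimension} -> {current.dimension}")
    if problems:
        raise ReindexRequiredError(
            "Existing index is incompatible with the current embedding config ("
            + "; ".join(problems)
            + "). Re-index explicitly to continue."
        )
```

Running this check at startup, before any write, would catch a provider change across restarts without touching the index.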

Supported Providers

Embeddings:

Provider	Models	Authentication
OpenAI	text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002	API key
Ollama	nomic-embed-text, bge, mxbai-embed-large	None (local)
Cohere	embed-english-v3.0, embed-multilingual-v3.0	API key

Note: Grok and Gemini are NOT supported as embedding providers in this specification.

Summarization/LLM:

Provider	Models	Authentication
Anthropic	Claude Haiku 4.5, Claude Sonnet 4.5, Claude Opus 4.5	API key
OpenAI	GPT-5, GPT-5-mini	API key
Gemini	gemini-3-flash, gemini-3-pro	API key
Grok	grok-4, grok-4-fast	API key (OpenAI-compatible)
Ollama	llama4:scout, mistral-small3.2, qwen3-coder, gemma3	None (local)

Key Entities

ProviderConfig: Configuration for embedding or summarization provider
EmbeddingProvider: Abstract interface for embedding generation
SummarizationProvider: Abstract interface for text summarization
ConfigLoader: Reads and validates config.yaml
ProviderFactory: Creates provider instances from configuration
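
To make the relationship between these entities concrete, here is a rough sketch. Only the entity names come from this spec; the method names (embed, summarize, register_embedding, create_embedding), the fields of ProviderConfig, and the toy LengthEmbedding class are illustrative assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ProviderConfig:
    provider: str
    model: str
    params: Dict[str, str] = field(default_factory=dict)


class EmbeddingProvider(ABC):
    @abstractmethod
    def embed(self, texts: List[str]) -> List[List[float]]:
        """Return one vector per input text."""


class SummarizationProvider(ABC):
    @abstractmethod
    def summarize(self, text: str) -> str:
        """Return a summary of the input text."""


class ProviderFactory:
    """Maps provider names from config to builder callables."""

    def __init__(self) -> None:
        self._embedding_builders: Dict[
            str, Callable[[ProviderConfig], EmbeddingProvider]
        ] = {}

    def register_embedding(
        self, name: str, builder: Callable[[ProviderConfig], EmbeddingProvider]
    ) -> None:
        self._embedding_builders[name] = builder

    def create_embedding(self, config: ProviderConfig) -> EmbeddingProvider:
        try:
            builder = self._embedding_builders[config.provider]
        except KeyError:
            raise ValueError(
                f"No embedding provider registered as {config.provider!r}"
            ) from None
        return builder(config)


class LengthEmbedding(EmbeddingProvider):
    """Toy provider for illustration only: embeds each text as its length."""

    def __init__(self, config: ProviderConfig) -> None:
        self.config = config

    def embed(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t))] for t in texts]
```

With a registry like this, adding a provider means registering one more builder, which is consistent with the goal that switching providers needs only a config change.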

Success Criteria

Measurable Outcomes

SC-001: Switching providers requires only config.yaml change, no code changes
SC-002: Ollama enables fully offline operation for both embeddings and summarization
SC-003: All Phase 1-4 functionality remains working with any supported provider
SC-004: Provider configuration is validated on startup with clear error messages
SC-005: API keys are never logged or exposed in error messages
SC-006: Documentation includes setup guides for each supported provider
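
SC-005 can be approached by scrubbing known secret values from any text before it is logged or raised. The helper below is a simple sketch under that assumption; its name and the "***" placeholder are hypothetical, and it only covers secrets the caller passes in explicitly.

```python
from typing import Iterable


def redact_secrets(message: str, secrets: Iterable[str]) -> str:
    """Replace every occurrence of each known secret value with a placeholder."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            message = message.replace(secret, "***")
    return message
```

For example, an upstream error that echoes a request header would be passed through this helper with the resolved API keys before being surfaced to the user.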
