diff --git a/changes/unreleased/Added-20260422-004204.yaml b/changes/unreleased/Added-20260422-004204.yaml
new file mode 100644
index 00000000..4b8ad287
--- /dev/null
+++ b/changes/unreleased/Added-20260422-004204.yaml
@@ -0,0 +1,3 @@
+kind: Added
+body: Add RAG service with hybrid vector + LLM search
+time: 2026-04-22T00:42:04.283582+05:30
diff --git a/docs/services/index.md b/docs/services/index.md
index d428f3f4..3adb2489 100644
--- a/docs/services/index.md
+++ b/docs/services/index.md
@@ -9,7 +9,8 @@ following service types:
 - The [pgEdge Postgres MCP Server](mcp.md) connects AI agents and
   LLM-powered applications to your database.
 - The [pgEdge RAG Server](rag.md) enables retrieval-augmented generation
-  workflows using your database as a knowledge store.
+  workflows using your database as a knowledge store, returning
+  LLM-synthesized answers grounded in your data.
 - [PostgREST](postgrest.md) automatically generates a REST API from
   your PostgreSQL schema, making your data accessible over HTTP without
   writing backend code.
diff --git a/docs/services/rag.md b/docs/services/rag.md
new file mode 100644
index 00000000..55e439eb
--- /dev/null
+++ b/docs/services/rag.md
@@ -0,0 +1,1099 @@
+# pgEdge RAG Server
+
+The RAG (Retrieval-Augmented Generation) service runs an intelligent
+query server alongside your database. The service uses vector and
+keyword search to retrieve relevant document chunks from PostgreSQL
+and synthesizes LLM-generated answers based on the retrieved context.
+For more information, see the
+[pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server)
+project.
+
+## Overview
+
+The Control Plane provisions a RAG service container on each specified
+host. The service connects to the database using an existing user
+specified in the `connect_as` field, which must be defined in
+`database_users`, and automatically embeds that user's credentials in
+the service configuration. Client applications submit natural language
+queries to the service, which performs hybrid vector and keyword search
+against document tables and returns LLM-synthesized answers with source
+citations.
+
+See [Managing Services](managing.md) for instructions on adding,
+updating, and removing services. The sections below cover RAG-specific
+configuration.
+
+## Database Prerequisites
+
+Before deploying a RAG service, your PostgreSQL database must have the
+following items configured:
+
+- The pgvector extension must be installed and enabled.
+- The database must have document tables with text and vector columns.
+- An HNSW index on vector columns enables fast similarity search.
+- A GIN index on text columns enables keyword search (BM25).
+
+The Control Plane can automatically provision all of these during
+database creation using the `scripts.post_database_create` hook. See
+[Preparing the Database](#preparing-the-database) for a complete
+example. Alternatively, you can provision these manually after
+database creation.
+
+## Configuration Reference
+
+All configuration fields are provided in the `config` object of the
+service spec.
+
+### Service Connection
+
+The `connect_as` field at the service level specifies which database
+user the RAG service authenticates as. This user must already be
+defined in the `database_users` array when creating the database. The
+Control Plane automatically embeds that user's credentials in the
+service configuration.
+
+The following example shows the `connect_as` field in the service
+spec:
+
+```json
+{
+  "service_id": "rag",
+  "service_type": "rag",
+  "connect_as": "app_read_only",
+  "config": { ... }
+}
+```
+
+In this example, `app_read_only` must be defined in `database_users`:
+
+```json
+{
+  "username": "app_read_only",
+  "password": "your_password",
+  "attributes": ["LOGIN"]
+}
+```
+
+### Pipeline Configuration
+
+The `pipelines` array (required) defines one or more RAG workflows.
+Each pipeline specifies which tables to search, which embedding
+provider to use, and which LLM to use to generate answers.
+
+The following table describes the pipeline configuration fields:
+
+| Field | Type | Description |
+|---|---|---|
+| `pipelines[].name` | string | Required. Pipeline identifier used in query URLs. Lowercase alphanumeric, hyphens, and underscores. Must not start with a hyphen. |
+| `pipelines[].description` | string | Optional. Human-readable pipeline description. |
+| `pipelines[].tables[]` | array | Required. Array of table specifications. See [Table Configuration](#table-configuration). |
+| `pipelines[].embedding_llm` | object | Required. Embedding provider config. See [Embedding Configuration](#embedding-configuration). |
+| `pipelines[].rag_llm` | object | Required. LLM provider config. See [LLM Configuration](#llm-configuration). |
+| `pipelines[].token_budget` | integer | Optional. Max tokens for context documents sent to the LLM. |
+| `pipelines[].top_n` | integer | Optional. Number of documents to retrieve per query. |
+| `pipelines[].system_prompt` | string | Optional. Custom system prompt prepended to every LLM request for this pipeline. |
+| `pipelines[].search` | object | Optional. Search behavior settings. See [Search Configuration](#search-configuration). |
+
+### Embedding Configuration
+
+The `embedding_llm` object configures the embedding provider used to
+vectorize each incoming query. The embedding vector is then used for
+similarity search against stored document vectors. All required fields
+must be set; `api_key` is not required for `ollama`.
+
+The following table describes the embedding configuration fields:
+
+| Field | Type | Description |
+|---|---|---|
+| `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `ollama`. |
+| `model` | string | Required. The embedding model name (e.g., `text-embedding-3-small`, `voyage-3`, `nomic-embed-text`). |
+| `api_key` | string | API key for the provider. Required for `openai` and `voyage`. Not used for `ollama`. |
+| `base_url` | string | Optional. Custom base URL for the provider API. Required for `ollama` - set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). |
+
+### LLM Configuration
+
+The `rag_llm` object configures the LLM provider used to synthesize
+the final answer from retrieved documents. `api_key` is required for
+all providers except `ollama`.
+
+The following table describes the LLM configuration fields:
+
+| Field | Type | Description |
+|---|---|---|
+| `provider` | string | Required. The LLM provider. One of: `anthropic`, `openai`, `ollama`. |
+| `model` | string | Required. The model name (e.g., `claude-sonnet-4-5`, `gpt-4o`, `llama3.2`). |
+| `api_key` | string | API key for the provider. Required for `anthropic` and `openai`. Not used for `ollama`. |
+| `base_url` | string | Optional. Custom base URL for API gateway routing. Required for `ollama` - set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). |
+
+!!! note
+    If `embedding_llm` and `rag_llm` share the same provider and both
+    specify an `api_key`, the values must be identical. The pgEdge RAG
+    Server maintains one key slot per provider and cannot reconcile
+    two different values.
+
+### Table Configuration
+
+Each table in a pipeline specifies how to access document text and
+embeddings. The following table describes the table configuration
+fields:
+
+| Field | Type | Description |
+|---|---|---|
+| `table` | string | Required. The table or view name containing documents. |
+| `text_column` | string | Required. Column name containing the document text. |
+| `vector_column` | string | Required. Column name containing the embedding vectors. |
+| `id_column` | string | Optional. Column name for document IDs. Defaults to the table's primary key. Required for views. |
+
+### Search Configuration
+
+The `search` object tunes how documents are retrieved before being
+passed to the LLM. The following table describes the search
+configuration fields:
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `hybrid_enabled` | boolean | `true` | Enable hybrid search combining vector similarity and BM25 keyword matching. Set to `false` for vector-only search. |
+| `vector_weight` | float | `0.5` | Weight for vector search versus BM25 (0.0-1.0). Higher values prioritize semantic relevance. |
+
+### Defaults Configuration
+
+The optional `defaults` object sets fallback values applied to any
+pipeline that does not specify its own `token_budget` or `top_n`. The
+following table describes the defaults configuration fields:
+
+| Field | Type | Description |
+|---|---|---|
+| `defaults.token_budget` | integer | Default max tokens for context documents. Must be a positive integer. |
+| `defaults.top_n` | integer | Default number of documents to retrieve. Must be a positive integer. |
+
+## Preparing the Database
+
+Before deploying a RAG service, you must prepare your PostgreSQL
+database with pgvector, document tables, and indexes. The Control
+Plane automatically executes these during database creation when you
+include them in the `scripts.post_database_create` array in your
+database specification.
+
+### Required Schema
+
+Include the following SQL statements in `scripts.post_database_create`
+to automatically initialize the database schema during creation:
+
+```sql
+-- Enable pgvector extension
+CREATE EXTENSION IF NOT EXISTS vector;
+
+-- Create documents table with embeddings
+CREATE TABLE IF NOT EXISTS documents_content_chunks (
+    id BIGSERIAL PRIMARY KEY,
+    content TEXT NOT NULL,
+    embedding vector(1536),
+    title TEXT,
+    source TEXT,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+-- HNSW index for vector similarity search
+CREATE INDEX IF NOT EXISTS documents_embedding_idx
+    ON documents_content_chunks USING hnsw (embedding vector_cosine_ops);
+
+-- GIN index for keyword search (BM25)
+CREATE INDEX IF NOT EXISTS documents_content_idx
+    ON documents_content_chunks USING gin (to_tsvector('english', content));
+```
+
+These statements are included as individual entries in the
+`scripts.post_database_create` array (see examples below).
+
+### Vector Dimensions
+
+Adjust the `vector(N)` dimension to match your embedding model. The
+following table shows common models and their vector dimensions:
+
+| Provider | Model | Dimensions |
+|----------|-------|-----------|
+| OpenAI | `text-embedding-3-small` | 1536 |
+| OpenAI | `text-embedding-3-large` | 3072 |
+| Voyage AI | `voyage-3` / `voyage-3-large` | 1024 |
+| Ollama | `nomic-embed-text` | 768 |
+| Ollama | Other models | Check model documentation |
+
+## Examples
+
+The following examples show how to configure the RAG service for
+common use cases. The first example includes the complete
+`scripts.post_database_create` setup to automatically provision the
+database schema (pgvector extension, tables, and indexes) using
+`vector(1536)` for OpenAI embeddings. Subsequent examples focus on
+service configuration variations and omit the schema setup for brevity.
+If you use a different embedding model, adjust the `vector(N)` dimension
+in your schema to match - for example, `vector(1024)` for `voyage-3` or
+`vector(768)` for `nomic-embed-text`.
+
+### Minimal (OpenAI + Anthropic)
+
+In the following example, a `curl` command provisions a RAG service
+that uses OpenAI for embeddings and Anthropic Claude to generate answers:
+
+=== "curl"
+
+    ```sh
+    curl -X POST http://host-1:3000/v1/databases \
+        -H 'Content-Type: application/json' \
+        --data '{
+            "id": "knowledge-base",
+            "spec": {
+                "database_name": "knowledge_base",
+                "database_users": [
+                    {
+                        "username": "admin",
+                        "password": "admin_password",
+                        "db_owner": true,
+                        "attributes": ["SUPERUSER", "LOGIN"]
+                    }
+                ],
+                "port": 5432,
+                "nodes": [
+                    { "name": "n1", "host_ids": ["host-1"] }
+                ],
+                "scripts": {
+                    "post_database_create": [
+                        "CREATE EXTENSION IF NOT EXISTS vector",
+                        "CREATE TABLE IF NOT EXISTS documents_content_chunks (id BIGSERIAL PRIMARY KEY, content TEXT NOT NULL, embedding vector(1536), title TEXT, source TEXT)",
+                        "CREATE INDEX ON documents_content_chunks USING hnsw (embedding vector_cosine_ops)",
+                        "CREATE INDEX ON documents_content_chunks USING gin (to_tsvector('\''english'\'', content))"
+                    ]
+                },
+                "services": [
+                    {
+                        "service_id": "rag",
+                        "service_type": "rag",
+                        "version": "latest",
+                        "host_ids": ["host-1"],
+                        "port": 9200,
+                        "connect_as": "admin",
+                        "config": {
+                            "pipelines": [
+                                {
+                                    "name": "default",
+                                    "description": "Main RAG pipeline",
+                                    "tables": [
+                                        {
+                                            "table": "documents_content_chunks",
+                                            "text_column": "content",
+                                            "vector_column": "embedding"
+                                        }
+                                    ],
+                                    "embedding_llm": {
+                                        "provider": "openai",
+                                        "model": "text-embedding-3-small",
+                                        "api_key": "sk-..."
+                                    },
+                                    "rag_llm": {
+                                        "provider": "anthropic",
+                                        "model": "claude-sonnet-4-5",
+                                        "api_key": "sk-ant-..."
+                                    },
+                                    "token_budget": 4000,
+                                    "top_n": 10
+                                }
+                            ]
+                        }
+                    }
+                ]
+            }
+        }'
+    ```
+
+### OpenAI End-to-End
+
+In the following example, OpenAI is used for both embeddings and to generate
+answers:
+
+=== "curl"
+
+    ```sh
+    curl -X POST http://host-1:3000/v1/databases \
+        -H 'Content-Type: application/json' \
+        --data '{
+            "id": "knowledge-base",
+            "spec": {
+                "database_name": "knowledge_base",
+                "database_users": [
+                    {
+                        "username": "admin",
+                        "password": "admin_password",
+                        "db_owner": true,
+                        "attributes": ["SUPERUSER", "LOGIN"]
+                    }
+                ],
+                "nodes": [
+                    { "name": "n1", "host_ids": ["host-1"] }
+                ],
+                "services": [
+                    {
+                        "service_id": "rag",
+                        "service_type": "rag",
+                        "version": "latest",
+                        "host_ids": ["host-1"],
+                        "port": 9200,
+                        "connect_as": "admin",
+                        "config": {
+                            "pipelines": [
+                                {
+                                    "name": "default",
+                                    "tables": [
+                                        {
+                                            "table": "documents_content_chunks",
+                                            "text_column": "content",
+                                            "vector_column": "embedding"
+                                        }
+                                    ],
+                                    "embedding_llm": {
+                                        "provider": "openai",
+                                        "model": "text-embedding-3-small",
+                                        "api_key": "sk-..."
+                                    },
+                                    "rag_llm": {
+                                        "provider": "openai",
+                                        "model": "gpt-4o",
+                                        "api_key": "sk-..."
+                                    }
+                                }
+                            ]
+                        }
+                    }
+                ]
+            }
+        }'
+    ```
+
+### Voyage AI with Vector-Only Search
+
+In the following example, Voyage AI is used for embeddings and the
+service is configured for vector-only search (disabling BM25 keyword
+matching):
+
+=== "curl"
+
+    ```sh
+    curl -X POST http://host-1:3000/v1/databases \
+        -H 'Content-Type: application/json' \
+        --data '{
+            "id": "knowledge-base",
+            "spec": {
+                "database_name": "knowledge_base",
+                "database_users": [
+                    {
+                        "username": "admin",
+                        "password": "admin_password",
+                        "db_owner": true,
+                        "attributes": ["SUPERUSER", "LOGIN"]
+                    }
+                ],
+                "nodes": [
+                    { "name": "n1", "host_ids": ["host-1"] }
+                ],
+                "services": [
+                    {
+                        "service_id": "rag",
+                        "service_type": "rag",
+                        "version": "latest",
+                        "host_ids": ["host-1"],
+                        "port": 9200,
+                        "connect_as": "admin",
+                        "config": {
+                            "pipelines": [
+                                {
+                                    "name": "default",
+                                    "tables": [
+                                        {
+                                            "table": "documents_content_chunks",
+                                            "text_column": "content",
+                                            "vector_column": "embedding"
+                                        }
+                                    ],
+                                    "embedding_llm": {
+                                        "provider": "voyage",
+                                        "model": "voyage-3",
+                                        "api_key": "pa-..."
+                                    },
+                                    "rag_llm": {
+                                        "provider": "anthropic",
+                                        "model": "claude-sonnet-4-5",
+                                        "api_key": "sk-ant-..."
+                                    },
+                                    "search": {
+                                        "hybrid_enabled": false
+                                    }
+                                }
+                            ]
+                        }
+                    }
+                ]
+            }
+        }'
+    ```
+
+### Ollama (Self-Hosted)
+
+In the following example, the RAG service uses a self-hosted Ollama
+server for both embeddings and answer generation. No API key is
+required; the Ollama server URL is provided via `base_url`:
+
+=== "curl"
+
+    ```sh
+    curl -X POST http://host-1:3000/v1/databases \
+        -H 'Content-Type: application/json' \
+        --data '{
+            "id": "knowledge-base",
+            "spec": {
+                "database_name": "knowledge_base",
+                "database_users": [
+                    {
+                        "username": "admin",
+                        "password": "admin_password",
+                        "db_owner": true,
+                        "attributes": ["SUPERUSER", "LOGIN"]
+                    }
+                ],
+                "nodes": [
+                    { "name": "n1", "host_ids": ["host-1"] }
+                ],
+                "services": [
+                    {
+                        "service_id": "rag",
+                        "service_type": "rag",
+                        "version": "latest",
+                        "host_ids": ["host-1"],
+                        "port": 9200,
+                        "connect_as": "admin",
+                        "config": {
+                            "pipelines": [
+                                {
+                                    "name": "default",
+                                    "tables": [
+                                        {
+                                            "table": "documents_content_chunks",
+                                            "text_column": "content",
+                                            "vector_column": "embedding"
+                                        }
+                                    ],
+                                    "embedding_llm": {
+                                        "provider": "ollama",
+                                        "model": "nomic-embed-text",
+                                        "base_url": "http://ollama-host:11434"
+                                    },
+                                    "rag_llm": {
+                                        "provider": "ollama",
+                                        "model": "llama3.2",
+                                        "base_url": "http://ollama-host:11434"
+                                    }
+                                }
+                            ]
+                        }
+                    }
+                ]
+            }
+        }'
+    ```
+
+### Multiple Pipelines with Shared Defaults
+
+In the following example, two pipelines share default values for
+`token_budget` and `top_n`, set with the `defaults` properties:
+
+
+=== "curl"
+
+    ```sh
+    curl -X POST http://host-1:3000/v1/databases \
+        -H 'Content-Type: application/json' \
+        --data '{
+            "id": "knowledge-base",
+            "spec": {
+                "database_name": "knowledge_base",
+                "database_users": [
+                    {
+                        "username": "admin",
+                        "password": "admin_password",
+                        "db_owner": true,
+                        "attributes": ["SUPERUSER", "LOGIN"]
+                    }
+                ],
+                "nodes": [
+                    { "name": "n1", "host_ids": ["host-1"] }
+                ],
+                "services": [
+                    {
+                        "service_id": "rag",
+                        "service_type": "rag",
+                        "version": "latest",
+                        "host_ids": ["host-1"],
+                        "port": 9200,
+                        "connect_as": "admin",
+                        "config": {
+                            "defaults": {
+                                "token_budget": 4000,
+                                "top_n": 10
+                            },
+                            "pipelines": [
+                                {
+                                    "name": "docs",
+                                    "description": "Product documentation",
+                                    "tables": [
+                                        {
+                                            "table": "doc_chunks",
+                                            "text_column": "content",
+                                            "vector_column": "embedding"
+                                        }
+                                    ],
+                                    "embedding_llm": {
+                                        "provider": "openai",
+                                        "model": "text-embedding-3-small",
+                                        "api_key": "sk-..."
+                                    },
+                                    "rag_llm": {
+                                        "provider": "anthropic",
+                                        "model": "claude-sonnet-4-5",
+                                        "api_key": "sk-ant-..."
+                                    }
+                                },
+                                {
+                                    "name": "support",
+                                    "description": "Support ticket history",
+                                    "tables": [
+                                        {
+                                            "table": "ticket_chunks",
+                                            "text_column": "body",
+                                            "vector_column": "embedding"
+                                        }
+                                    ],
+                                    "embedding_llm": {
+                                        "provider": "openai",
+                                        "model": "text-embedding-3-small",
+                                        "api_key": "sk-..."
+                                    },
+                                    "rag_llm": {
+                                        "provider": "anthropic",
+                                        "model": "claude-sonnet-4-5",
+                                        "api_key": "sk-ant-..."
+                                    },
+                                    "top_n": 5
+                                }
+                            ]
+                        }
+                    }
+                ]
+            }
+        }'
+    ```
+
+## Deployment Guide
+
+This section shows the complete flow from database creation to a
+working pipeline query.
+
+### Step 1 - Create the Database
+
+Include `scripts.post_database_create` to automatically provision the
+pgvector schema during database creation. This avoids any manual setup
+after deployment. Use a fixed `port` value for the RAG service so the
+URL stays stable across container restarts.
+
+=== "curl"
+
+    ```sh
+    curl -X POST http://host-1:3000/v1/databases \
+        -H 'Content-Type: application/json' \
+        --data '{
+            "id": "knowledge-base",
+            "spec": {
+                "database_name": "knowledge_base",
+                "database_users": [
+                    {
+                        "username": "admin",
+                        "password": "admin_password",
+                        "db_owner": true,
+                        "attributes": ["SUPERUSER", "LOGIN"]
+                    },
+                    {
+                        "username": "app_read_only",
+                        "password": "readonly_password",
+                        "attributes": ["LOGIN"]
+                    }
+                ],
+                "port": 5432,
+                "nodes": [
+                    { "name": "n1", "host_ids": ["host-1"] }
+                ],
+                "scripts": {
+                    "post_database_create": [
+                        "CREATE EXTENSION IF NOT EXISTS vector",
+                        "CREATE TABLE IF NOT EXISTS documents_content_chunks (id BIGSERIAL PRIMARY KEY, content TEXT NOT NULL, embedding vector(1536), title TEXT, source TEXT)",
+                        "CREATE INDEX ON documents_content_chunks USING hnsw (embedding vector_cosine_ops)",
+                        "CREATE INDEX ON documents_content_chunks USING gin (to_tsvector('\''english'\'', content))",
+                        "GRANT SELECT ON documents_content_chunks TO app_read_only"
+                    ]
+                },
+                "services": [
+                    {
+                        "service_id": "rag",
+                        "service_type": "rag",
+                        "version": "latest",
+                        "host_ids": ["host-1"],
+                        "port": 9200,
+                        "connect_as": "app_read_only",
+                        "config": {
+                            "pipelines": [
+                                {
+                                    "name": "default",
+                                    "description": "Main RAG pipeline",
+                                    "tables": [
+                                        {
+                                            "table": "documents_content_chunks",
+                                            "text_column": "content",
+                                            "vector_column": "embedding"
+                                        }
+                                    ],
+                                    "embedding_llm": {
+                                        "provider": "openai",
+                                        "model": "text-embedding-3-small",
+                                        "api_key": "sk-..."
+                                    },
+                                    "rag_llm": {
+                                        "provider": "anthropic",
+                                        "model": "claude-sonnet-4-5",
+                                        "api_key": "sk-ant-..."
+                                    },
+                                    "token_budget": 4000,
+                                    "top_n": 10
+                                }
+                            ]
+                        }
+                    }
+                ]
+            }
+        }'
+    ```
+
+### Step 2 - Check the Database and Service Status
+
+Run the following command after approximately 60-90 seconds to check
+that the database is ready and the RAG service is running:
+
+=== "curl"
+
+    ```sh
+    curl -s http://host-1:3000/v1/databases/knowledge-base
+    ```
+
+In the response, look for the following items:
+
+- The `state: "available"` field at the top level confirms that the
+  database is provisioned and healthy.
+- The `service_ready: true` field inside `service_instances[].status`
+  confirms that the RAG container is up and accepting requests.
+
+```text
+{
+  state: "available"
+  instances: [
+    {
+      state: "available"
+      postgres: {
+        patroni_state: "running"
+        role: "primary"
+      }
+    }
+  ]
+  service_instances: [
+    {
+      state: "running"
+      status: {
+        service_ready: true
+        ports: [
+          {
+            container_port: 8080
+            host_port: 9200
+            name: "tcp"
+          }
+        ]
+        last_health_at: "2026-04-22T10:00:00Z"
+      }
+    }
+  ]
+}
+```
+
+The `host_port` value is the port to use when querying the RAG
+service. If you used a fixed `port: 9200` in the service spec, the
+host port will always be `9200`.
+
+!!! tip
+    Use a fixed `port` value (e.g. `9200`) in the service spec rather
+    than `port: 0`. When `port: 0` is used, Docker assigns a random
+    host port that changes each time the RAG container is replaced
+    (e.g. after an API key update), requiring you to look up the new
+    port each time.
+
+### Step 3 - Load Documents
+
+The RAG service needs documents with embeddings in the database before
+it can answer queries. The following Python script generates embeddings
+using OpenAI and inserts them into `documents_content_chunks`:
+
+```python
+#!/usr/bin/env python3
+import psycopg2
+from psycopg2.extras import execute_values
+from openai import OpenAI
+import os
+
+client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
+conn = psycopg2.connect(
+    host=os.environ.get("DB_HOST", "host-1"),
+    port=int(os.environ.get("DB_PORT", "5432")),
+    user=os.environ.get("DB_USER", "admin"),
+    password=os.environ.get("DB_PASSWORD", "admin_password"),
+    database=os.environ.get("DB_NAME", "knowledge_base"),
+)
+cur = conn.cursor()
+
+documents = [
+    {"title": "My Doc", "content": "Full document text goes here...", "source": "docs"},
+]
+
+def chunk_text(text, size=500, overlap=50):
+    return [text[i:i+size] for i in range(0, len(text), size-overlap) if text[i:i+size].strip()]
+
+for doc in documents:
+    chunks = chunk_text(doc["content"])
+    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
+    embeddings = [item.embedding for item in resp.data]
+    execute_values(cur,
+        "INSERT INTO documents_content_chunks (content, embedding, title, source) VALUES %s",
+        [(c, e, doc["title"], doc["source"]) for c, e in zip(chunks, embeddings)],
+    )
+    conn.commit()
+    print(f"Loaded {len(chunks)} chunks from '{doc['title']}'")
+
+cur.close()
+conn.close()
+```
+
+Install the dependencies and run the script with the following
+commands:
+
+```bash
+pip install psycopg2-binary openai
+export OPENAI_API_KEY="sk-..."
+export DB_HOST="host-1"
+export DB_USER="admin"
+export DB_PASSWORD="admin_password"
+export DB_NAME="knowledge_base"
+python3 load_documents.py
+```
+
+To verify that documents were inserted, run the following query:
+
+```bash
+psql "postgresql://admin:admin_password@host-1:5432/knowledge_base" \
+  -c "SELECT COUNT(*), COUNT(embedding) FROM documents_content_chunks;"
+```
+
+### Step 4 - Query the Pipeline
+
+Send a query to the RAG service using the following command:
+
+```bash
+curl -X POST http://host-1:9200/v1/pipelines/default \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "How does multi-active replication work?",
+    "include_sources": true
+  }'
+```
+
+A successful response looks like this:
+
+```json
+{
+    "answer": "Multi-active replication allows multiple PostgreSQL nodes to accept writes simultaneously...",
+    "sources": [
+        {"id": "5", "content": "...", "score": 0.00820},
+        {"id": "1", "content": "...", "score": 0.00806}
+    ],
+    "tokens_used": 1243
+}
+```
+
+`sources` is only populated when `include_sources: true` is set in
+the request.
+
+### Step 5 - Update the Service Config
+
+To update the service (for example, to rotate an API key or change
+the LLM model), submit a `POST /v1/databases/{id}` with the complete
+updated spec. The update endpoint requires all fields - include
+`database_name`, `nodes`, `database_users`, and the full `services`
+array:
+
+=== "curl"
+
+    ```sh
+    curl -X POST http://host-1:3000/v1/databases/knowledge-base \
+        -H 'Content-Type: application/json' \
+        --data '{
+            "spec": {
+                "database_name": "knowledge_base",
+                "port": 5432,
+                "nodes": [
+                    { "name": "n1", "host_ids": ["host-1"] }
+                ],
+                "database_users": [
+                    {
+                        "username": "admin",
+                        "password": "admin_password",
+                        "db_owner": true,
+                        "attributes": ["SUPERUSER", "LOGIN"]
+                    },
+                    {
+                        "username": "app_read_only",
+                        "password": "readonly_password",
+                        "attributes": ["LOGIN"]
+                    }
+                ],
+                "services": [
+                    {
+                        "service_id": "rag",
+                        "service_type": "rag",
+                        "version": "latest",
+                        "host_ids": ["host-1"],
+                        "port": 9200,
+                        "connect_as": "app_read_only",
+                        "config": {
+                            "pipelines": [
+                                {
+                                    "name": "default",
+                                    "tables": [
+                                        {
+                                            "table": "documents_content_chunks",
+                                            "text_column": "content",
+                                            "vector_column": "embedding"
+                                        }
+                                    ],
+                                    "embedding_llm": {
+                                        "provider": "openai",
+                                        "model": "text-embedding-3-small",
+                                        "api_key": "sk-..."
+                                    },
+                                    "rag_llm": {
+                                        "provider": "anthropic",
+                                        "model": "claude-sonnet-4-5",
+                                        "api_key": "sk-ant-NEW-KEY"
+                                    },
+                                    "token_budget": 4000,
+                                    "top_n": 10
+                                }
+                            ]
+                        }
+                    }
+                ]
+            }
+        }'
+    ```
+
+The RAG service container is replaced with the new configuration.
+Poll the database status until `state` is `"available"` and
+`service_ready` is `true` before sending queries.
+
+## Querying the RAG Service
+
+Once the service is running, submit queries to retrieve answers based
+on your documents.
+
+### List Available Pipelines
+
+To list all configured pipelines, send the following request:
+
+=== "curl"
+
+    ```bash
+    curl http://host-1:9200/v1/pipelines
+    ```
+
+### Query a Pipeline
+
+To submit a query to a pipeline, send a POST request with the query
+text:
+
+=== "curl"
+
+    ```bash
+    curl -X POST http://host-1:9200/v1/pipelines/default \
+      -H "Content-Type: application/json" \
+      -d '{
+        "query": "How does RAG improve LLM responses?",
+        "include_sources": true
+      }'
+    ```
+
+### Request Fields
+
+The following table describes the query request fields:
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `query` | string | - | Required. The natural language question to answer. |
+| `include_sources` | boolean | `false` | Return the source documents used to generate the answer. |
+| `top_n` | integer | - | Override the pipeline's `top_n` for this request. |
+| `stream` | boolean | `false` | Stream the answer as Server-Sent Events. |
+
+### Response Format
+
+A successful query response looks like this:
+
+```json
+{
+    "answer": "RAG (Retrieval-Augmented Generation) improves LLM responses by retrieving relevant documents from your database before generating answers. This grounds the LLM in your specific data, reducing hallucinations and improving accuracy...",
+    "sources": [
+        {
+            "id": "42",
+            "content": "The RAG service enables retrieval-augmented generation workflows...",
+            "score": 0.00820
+        }
+    ],
+    "tokens_used": 1243
+}
+```
+
+`sources` is only populated when `include_sources` is `true` in the
+request.
+
+The RAG service's hybrid search combines two complementary techniques,
+merged using Reciprocal Rank Fusion (RRF):
+
+- Vector similarity search retrieves documents semantically similar to
+  the query using cosine distance on embeddings.
+- BM25 keyword search retrieves documents with exact keyword matches
+  using TF-IDF scoring.
+
+This combination ensures the LLM receives context that is both
+semantically relevant and keyword-relevant. Documents appearing in
+both result sets receive higher scores, naturally prioritizing
+highly-relevant results.
+
+### Token Budget
+
+The `token_budget` field controls how much context is sent to the LLM.
+The service ranks documents and packs them in order until the budget
+is exhausted. The final document is truncated at a sentence boundary.
+Increase the budget to send more context, or decrease it to reduce
+LLM costs.
+
+## Troubleshooting
+
+The following sections describe common issues and how to resolve them.
+
+### About Automated Scripts
+
+The `scripts.post_database_create` field executes SQL automatically
+during database creation. The following details apply:
+
+| Property | Details |
+|---|---|
+| Execution timing | Scripts run once, immediately after Spock is initialized. |
+| Transactional | All statements execute within a single transaction. |
+| No re-execution | If you update the database spec later, scripts are not re-run. |
+| Constraints | Some SQL commands are not allowed within transactions, including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, `CREATE DATABASE`, and `DROP DATABASE`. |
+
+If a script fails during database creation, you can use
+`update-database` to retry after fixing the problematic statement.
+
+### Service Fails to Start
+
+To diagnose a service that fails to start, check database
+connectivity and user permissions.
+
+To verify that the database is accessible, run the following command:
+
+```bash
+psql -h host-1 -U admin -d knowledge_base -c "SELECT 1"
+```
+
+To verify that the service user (`app_read_only`) exists and has table
+access, run the following query:
+
+```sql
+\du+ app_read_only
+\dt documents_content_chunks
+```
+
+### Poor Query Results
+
+To diagnose poor query results, verify that documents are loaded and
+embeddings are present.
+
+To check document counts and embedding coverage, run the following
+queries:
+
+```sql
+SELECT COUNT(*) FROM documents_content_chunks;
+
+SELECT COUNT(*) FROM documents_content_chunks WHERE embedding IS NOT NULL;
+```
+
+To find documents similar to a test query embedding, run the following
+query:
+
+```sql
+SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) as similarity
+FROM documents_content_chunks
+ORDER BY similarity DESC
+LIMIT 5;
+```
+
+Start with factual, keyword-based questions before complex analytical
+questions to verify that the pipeline is working correctly.
+
+### Empty Context Window
+
+If the RAG service returns limited context, the token budget may be
+exhausted. Increase the budget in the pipeline configuration:
+
+```json
+"token_budget": 8000
+```
+
+Alternatively, store smaller, more focused document chunks to fit more
+context within the budget.
+
+## Responsibility Summary
+
+The following table summarizes which tasks are handled by the Control
+Plane and which are your responsibility:
+
+| Step | Who | How |
+|---|---|---|
+| Provision schema (pgvector, tables, indexes) | Control Plane | `scripts.post_database_create` in database spec |
+| Deploy RAG container | Control Plane | Automatic on `POST /v1/databases` |
+| Inject database credentials | Control Plane | Automatic via `connect_as` field |
+| Health monitoring and restart | Control Plane | Automatic |
+| Generate embeddings | You | Call OpenAI / Voyage / Ollama API |
+| Load documents into table | You | `INSERT` using psycopg2 or any Postgres client |
+| Submit queries | Your application | `POST /v1/pipelines/{name}` on the RAG service |
+
+## Next Steps
+
+The following resources provide more information on related topics:
+
+- The [Managing Services](managing.md) guide describes how to add,
+  update, and remove services.
+- The [pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server)
+  repository contains the pgEdge RAG Server source code.
+- The [pgEdge RAG Server Documentation](https://docs.pgedge.com/pgedge-rag-server/)
+  covers the pgEdge RAG Server API and configuration in detail.
+- The [pgvector Documentation](https://github.com/pgvector/pgvector)
+  explains how to install and use the pgvector extension.