7 changes: 1 addition & 6 deletions .hydra_config/config.yaml
@@ -4,6 +4,7 @@ defaults:
- retriever: ${oc.env:RETRIEVER_TYPE, single} # single # multiQuery # hyde
- rag: ChatBotRag
- websearch: ${oc.env:WEBSEARCH_PROVIDER, staan}
- reranker: ${oc.env:RERANKER_PROVIDER, infinity}

llm_params: &llm_params
temperature: 0.1
@@ -49,12 +50,6 @@ rdb:
password: ${oc.env:POSTGRES_PASSWORD, root_password}
default_file_quota: ${oc.decode:${oc.env:DEFAULT_FILE_QUOTA, -1}}

reranker:
enable: ${oc.decode:${oc.env:RERANKER_ENABLED, true}}
model_name: ${oc.env:RERANKER_MODEL, Alibaba-NLP/gte-multilingual-reranker-base}
top_k: ${oc.decode:${oc.env:RERANKER_TOP_K, 10}} # Number of documents to return after reranking. Upgrade for better results if your llm has a wider context window.
base_url: ${oc.env:RERANKER_BASE_URL, http://reranker:${oc.env:RERANKER_PORT, 7997}}

map_reduce:
# Number of documents to process in the initial mapping phase
initial_batch_size: ${oc.decode:${oc.env:MAP_REDUCE_INITIAL_BATCH_SIZE, 10}}
6 changes: 6 additions & 0 deletions .hydra_config/reranker/base.yaml
@@ -0,0 +1,6 @@
provider: ""
model_name: ${oc.env:RERANKER_MODEL, Alibaba-NLP/gte-multilingual-reranker-base}
top_k: ${oc.decode:${oc.env:RERANKER_TOP_K, 10}} # Number of documents to return after reranking. Increase for better results if your LLM has a wider context window.
base_url: ${oc.env:RERANKER_BASE_URL, http://reranker:${oc.env:RERANKER_PORT, 7997}}
semaphore: ${oc.decode:${oc.env:RERANKER_SEMAPHORE, 40}} # Number of concurrent reranking operations. Adjust based on your server capacity.
enabled: ${oc.decode:${oc.env:RERANKER_ENABLED, true}}
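The `${oc.env:VAR, default}` and `${oc.decode:...}` interpolations used throughout these files are OmegaConf resolvers: `oc.env` falls back to the default when the variable is unset, and `oc.decode` parses the resulting string into a typed value (`"true"` → bool, `"10"` → int). A stdlib-only sketch of the same semantics (helper names are illustrative, not part of OmegaConf):

```python
import json
import os


def env_with_default(var: str, default: str) -> str:
    # Mirrors ${oc.env:VAR, default}: use the env var if set, else the default.
    return os.environ.get(var, default)


def decode(value: str):
    # Mirrors ${oc.decode:...}: parse the string into bool/int/float when possible,
    # otherwise return it unchanged.
    try:
        return json.loads(value)
    except (json.JSONDecodeError, TypeError):
        return value


top_k = decode(env_with_default("RERANKER_TOP_K", "10"))        # -> 10 when unset
enabled = decode(env_with_default("RERANKER_ENABLED", "true"))  # -> True when unset
```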
5 changes: 5 additions & 0 deletions .hydra_config/reranker/infinity.yaml
@@ -0,0 +1,5 @@
defaults:
- base

provider: infinity
base_url: ${oc.env:RERANKER_BASE_URL, http://reranker:${oc.env:RERANKER_PORT, 7997}}
6 changes: 6 additions & 0 deletions .hydra_config/reranker/openai.yaml
@@ -0,0 +1,6 @@
defaults:
- base

provider: openai
api_key: ${oc.env:RERANKER_API_KEY, "EMPTY"}
base_url: ${oc.env:RERANKER_BASE_URL, http://reranker:${oc.env:RERANKER_PORT, 8000}}
2 changes: 1 addition & 1 deletion docker-compose.yaml
@@ -1,7 +1,7 @@
include:
- vdb/milvus.yaml
- ${CHAINLIT_DATALAYER_COMPOSE:-extern/dummy.yaml}
- extern/infinity.yaml
- extern/reranker/${RERANKER_PROVIDER:-infinity}.yaml
- ${TRANSCRIBER_COMPOSE:-extern/dummy.yaml}

x-openrag: &openrag_template
20 changes: 13 additions & 7 deletions docs/content/docs/documentation/env_vars.md
@@ -229,19 +229,25 @@ The retriever fetches relevant documents from the vector database based on query

### Reranker Configuration

The reranker enhances search quality by re-scoring and reordering retrieved documents according to their relevance to the user's query. Currently, the system uses [Infinity server](https://github.com/michaelfeil/infinity) for reranking functionality.

:::info[Future Improvements]
The current Infinity server interface is not OpenAI-compatible, which limits integration flexibility. We plan to improve this by supporting OpenAI-compatible reranker interfaces in future releases.
:::
The reranker enhances search quality by re-scoring and reordering retrieved documents according to their relevance to the user's query. Two providers are supported: **Infinity** (default) and **OpenAI-compatible** endpoints.

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `RERANKER_ENABLED` | `bool` | true | Enable or disable the reranking mechanism |
| `RERANKER_PROVIDER` | `str` | `infinity` | Reranker backend to use. Accepted values: `infinity`, `openai` |
| `RERANKER_MODEL` | `str` | Alibaba-NLP/gte-multilingual-reranker-base | Model used for reranking documents.|
| `RERANKER_TOP_K` | `int` | 5 | Number of top documents to return after reranking. Increase to 8 for better results if your LLM has a wider context window |
| `RERANKER_TOP_K` | `int` | 10 | Number of top documents to return after reranking. Increase for better results if your LLM has a wider context window |
| `RERANKER_BASE_URL` | `str` | `http://reranker:7997` | Base URL of the reranker service |
| `RERANKER_PORT` | `int` | 7997 | Port on which the reranker service listens |
| `RERANKER_PORT` | `int` | 7997 | Port on which the reranker service listens (used as fallback in `RERANKER_BASE_URL` when no URL is set) |
| `RERANKER_API_KEY` | `str` | `EMPTY` | API key for the reranker service. Required when using the `openai` provider |
| `RERANKER_SEMAPHORE` | `int` | 40 | Maximum number of concurrent reranking requests. Adjust based on your server capacity |

#### Reranker Providers

| Provider | `RERANKER_PROVIDER` value | Description |
|----------|--------------------------|-------------|
| **Infinity** | `infinity` | Uses the [Infinity server](https://github.com/michaelfeil/infinity) via its native client. Default port: `7997` |
| **OpenAI-compatible** | `openai` | Uses any OpenAI-compatible reranker endpoint (e.g. vLLM, LiteLLM, TEI). Default port: `8000` |
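Putting these together, switching the stack to an OpenAI-compatible backend might look like the following `.env` fragment (values are illustrative):

```shell
# Illustrative .env fragment for an OpenAI-compatible reranker backend
RERANKER_ENABLED=true
RERANKER_PROVIDER=openai
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
RERANKER_TOP_K=10
RERANKER_API_KEY=EMPTY
RERANKER_BASE_URL=http://reranker:8000
```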

## Extra
### Prompts
File renamed without changes.
61 changes: 61 additions & 0 deletions extern/reranker/openai.yaml
@@ -0,0 +1,61 @@
x-vllm-env: &vllm_env
HUGGING_FACE_HUB_TOKEN:
VLLM_SLEEP_WHEN_IDLE: 1 # Avoid 100% CPU usage when idle

x-reranker: &reranker_template
networks:
default:
aliases:
- reranker
# restart: on-failure
environment:
- HUGGING_FACE_HUB_TOKEN
- VLLM_SLEEP_WHEN_IDLE=1 # Avoid 100% CPU usage when idle
ipc: "host"
volumes:
- ${VLLM_CACHE:-/root/.cache/huggingface}:/root/.cache/huggingface
command: >
--model ${RERANKER_MODEL:-BAAI/bge-reranker-v2-m3}
--trust-remote-code
--gpu_memory_utilization 0.3
Comment on lines +17 to +20

⚠️ Potential issue | 🟠 Major

RERANKER_PORT override breaks service connectivity when overridden.

The openai.yaml vLLM service lacks a --port flag in its commands (lines 17-20 and 56-59), so it always listens on port 8000 internally. However, the port mapping on line 28 (${RERANKER_PORT:-8003}:8000) changes when RERANKER_PORT is set to a non-default value. When overridden, the port mapping remaps the container's port 8000 to the host, but the Hydra configuration constructs URLs using the overridden port number, causing a mismatch. Compare this to the infinity.yaml reranker variant, which correctly passes --port ${RERANKER_PORT:-7997} to the service.

Add the --port flag to both reranker command definitions and update the port mapping to maintain port alignment across the stack.

Proposed fix
   command: >
     --model ${RERANKER_MODEL:-BAAI/bge-reranker-v2-m3}
     --trust-remote-code
     --gpu_memory_utilization 0.3
+    --port ${RERANKER_PORT:-8000}
   ports:
-      - ${RERANKER_PORT:-8003}:8000
+      - ${RERANKER_PORT:-8003}:${RERANKER_PORT:-8000}

Apply the same change to the reranker-cpu command block (lines 56-59).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extern/reranker/openai.yaml` around lines 17 - 20, The service listens on
container port 8000 but the Docker port mapping uses ${RERANKER_PORT:-8003},
causing mismatch when RERANKER_PORT is overridden; update the OpenAI vLLM
service command blocks in openai.yaml to include a --port flag using the same
variable (e.g., add --port ${RERANKER_PORT:-8003}) in both the main reranker
command and the reranker-cpu command so the container binds to the same
overridable port used in the mapping and Hydra URLs, ensuring port alignment
across RERANKER_PORT, the command lines, and the port mapping.

healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 20s
timeout: 5s
retries: 4
start_period: 90s
ports:
- ${RERANKER_PORT:-8003}:8000

services:
reranker-gpu:
<<: *reranker_template
image: vllm/vllm-openai:v0.17.1
environment:
<<: *vllm_env
NVIDIA_VISIBLE_DEVICES: all
NVIDIA_DRIVER_CAPABILITIES: compute,utility
runtime: nvidia
profiles:
- ""
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]

reranker-cpu:
<<: *reranker_template
image: vllm/vllm-openai-cpu:v0.17.1
deploy: {}
environment:
<<: *vllm_env
VLLM_CPU_KVCACHE_SPACE: 8
command: >
--model ${RERANKER_MODEL:-BAAI/bge-reranker-v2-m3}
--trust-remote-code
--dtype float32
profiles:
- "cpu"
1 change: 1 addition & 0 deletions openrag/app_front.py
@@ -125,6 +125,7 @@ async def chat_profile(current_user: cl.User):
name=m.id,
markdown_description=description_template.format(name=m.id, partition=partition),
icon="/public/favicon.svg",
default=m.id == "openrag-all",
)
)
return chat_profiles
8 changes: 4 additions & 4 deletions openrag/components/pipeline.py
@@ -20,7 +20,7 @@

from .llm import LLM
from .map_reduce import RAGMapReduce
from .reranker import Reranker
from .reranker import BaseReranker, RerankerFactory
from .retriever import BaseRetriever, RetrieverFactory
from .utils import SOURCE_SEPARATOR

@@ -49,9 +49,9 @@ def __init__(self) -> None:
self.retriever: BaseRetriever = RetrieverFactory.create_retriever(config=config)

# reranker
self.reranker_enabled = config.reranker["enable"]
self.reranker = Reranker(logger, config)
logger.debug("Reranker", enabled=self.reranker_enabled)
self.reranker_enabled = config.reranker.get("enabled", True)
self.reranker: BaseReranker = RerankerFactory.get_reranker(config)
logger.debug("Reranker", enabled=self.reranker_enabled, provider=config.reranker.provider)
self.reranker_top_k = config.reranker["top_k"]

async def retrieve_docs(
82 changes: 0 additions & 82 deletions openrag/components/reranker.py

This file was deleted.

20 changes: 20 additions & 0 deletions openrag/components/reranker/__init__.py
@@ -0,0 +1,20 @@
from .base import BaseReranker
from .infinity import InfinityReranker
from .openai import OpenAIReranker

RERANKER_MAPPING = {
"infinity": InfinityReranker,
"openai": OpenAIReranker,
}


class RerankerFactory:
@staticmethod
def get_reranker(config: dict) -> BaseReranker:
provider = config.reranker.get("provider")
reranker_class = RERANKER_MAPPING.get(provider)

if not reranker_class:
raise ValueError(f"Unsupported reranker provider: {provider}")

return reranker_class(config)
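The factory resolves the `provider` key against `RERANKER_MAPPING` and instantiates the matching class, failing fast on unknown providers. The dispatch logic can be exercised in isolation with stub classes (a sketch, not the real rerankers):

```python
class BaseReranker:
    """Minimal stand-in for the real base class."""

    def __init__(self, config: dict):
        self.config = config


class InfinityReranker(BaseReranker):
    pass


class OpenAIReranker(BaseReranker):
    pass


RERANKER_MAPPING = {"infinity": InfinityReranker, "openai": OpenAIReranker}


def get_reranker(config: dict) -> BaseReranker:
    # Unknown providers raise a clear error instead of failing later at call time.
    provider = config.get("provider")
    reranker_class = RERANKER_MAPPING.get(provider)
    if reranker_class is None:
        raise ValueError(f"Unsupported reranker provider: {provider}")
    return reranker_class(config)


reranker = get_reranker({"provider": "infinity"})
```

Registering a new backend is then a one-line change to the mapping plus a subclass, with no edits to the dispatch code.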
37 changes: 37 additions & 0 deletions openrag/components/reranker/base.py
@@ -0,0 +1,37 @@
from langchain_core.documents.base import Document


class BaseReranker:
async def rerank(self, query: str, documents: list[Document], top_k: int | None = None) -> list[Document]:
"""Rerank a list of documents based on a query and an optional top_k parameter"""
raise NotImplementedError("Rerank method must be implemented by subclasses")

@staticmethod
def rrf_reranking(doc_lists: list[list], k: int = 60) -> list[Document]:
"""Reciprocal_rank_fusion that takes multiple lists of ranked documents
and an optional parameter k used in the RRF formula
RRF formula: \\sum_{i=1}^{n} \\frac{1}{k + rank_i}
where rank_i is the rank of the document in the i-th list and n is the number of lists.

k small: High sensitivity to top ranks
k large: More balanced sensitivity across ranks
k = 60 a common and balanced choice in practice.
"""

if len(doc_lists) == 1:
return doc_lists[0]

# Initialize a dictionary to hold fused scores for each unique document
fused_scores = {}

for doc_list in doc_lists:
doc_list: list[Document]
for rank, doc in enumerate(doc_list, start=1):
doc_id = doc.metadata.get("_id")

score, d = fused_scores.get(doc_id, (0, doc))
fused_scores[doc_id] = (score + 1 / (rank + k), d)

# sort the docs
reranked_docs = [doc for _, doc in sorted(fused_scores.values(), key=lambda x: x[0], reverse=True)]
return reranked_docs
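The fusion loop above can be checked by hand on two toy rankings (plain string IDs instead of `Document` objects):

```python
# Two toy rankings of the same three documents, best first.
list_a = ["d1", "d2", "d3"]
list_b = ["d3", "d1", "d2"]

k = 60  # balanced sensitivity across ranks, as in the docstring above
scores: dict[str, float] = {}
for ranking in (list_a, list_b):
    for rank, doc_id in enumerate(ranking, start=1):
        # Each list contributes 1 / (k + rank) for this document's rank in it.
        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)

# d1: 1/61 + 1/62, d2: 1/62 + 1/63, d3: 1/63 + 1/61
fused = sorted(scores, key=scores.get, reverse=True)  # ['d1', 'd3', 'd2']
```

Note that `d3` (ranked 3rd then 1st) edges out `d2` (2nd then 3rd): RRF rewards at least one very high rank more than two middling ones.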
52 changes: 52 additions & 0 deletions openrag/components/reranker/infinity.py
@@ -0,0 +1,52 @@
import asyncio

from infinity_client import Client
from infinity_client.api.default import rerank
from infinity_client.models import RerankInput, ReRankResult
from langchain_core.documents.base import Document
from utils.logger import get_logger

⚠️ Potential issue | 🟡 Minor

Use absolute import from openrag/ directory.

Per coding guidelines, imports should use absolute paths from the openrag/ directory.

-from utils.logger import get_logger
+from openrag.utils.logger import get_logger
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openrag/components/reranker/infinity.py` at line 7, Change the relative-style
import to an absolute import from the openrag package: replace the current
import of get_logger in infinity.py with an absolute import that references
openrag.utils.logger (e.g., import get_logger from openrag.utils.logger) so the
symbol get_logger is imported via the project's top-level package name per
coding guidelines.


from .base import BaseReranker

logger = get_logger()


class InfinityReranker(BaseReranker):
def __init__(self, config):
self.model_name = config.reranker["model_name"]
self.client = Client(base_url=config.reranker["base_url"])
semaphore = config.reranker.get("semaphore", 40)
self.semaphore = asyncio.Semaphore(semaphore)
logger.debug("Reranker initialized", model_name=self.model_name)

async def rerank(self, query: str, documents: list[Document], top_k: int | None = None) -> list[Document]:
async with self.semaphore:
logger.debug("Reranking documents", documents_count=len(documents), top_k=top_k)
top_k = min(top_k, len(documents)) if top_k is not None else len(documents)
rerank_input = RerankInput.from_dict(
{
"model": self.model_name,
"query": query,
"documents": [doc.page_content for doc in documents],
"top_n": top_k,
"return_documents": True,
"raw_scores": True, # Normalized score between 0 and 1
}
)
try:
rerank_result: ReRankResult = await rerank.asyncio(client=self.client, body=rerank_input)
output = []
for rerank_res in rerank_result.results:
doc = documents[rerank_res.index]
doc.metadata["relevance_score"] = rerank_res.relevance_score
output.append(doc)
return output

except Exception as e:
logger.error(
"Reranking failed",
error=str(e),
model_name=self.model_name,
documents_count=len(documents),
)
raise
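The `OpenAIReranker` counterpart is not shown in this diff. For reference, vLLM-style servers expose a Cohere/Jina-compatible rerank endpoint; a stdlib sketch of such a call follows — the `/rerank` path, payload field names, and response handling are assumptions about the backend, not code from this PR:

```python
import json
from urllib import request


def build_rerank_payload(model: str, query: str, documents: list[str], top_n: int) -> dict:
    # Cohere/Jina-style rerank request body (field names are an assumption).
    return {"model": model, "query": query, "documents": documents, "top_n": top_n}


def rerank_via_http(base_url: str, api_key: str, payload: dict) -> dict:
    # POST the payload to an assumed /rerank endpoint on the backend.
    req = request.Request(
        f"{base_url}/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_rerank_payload("BAAI/bge-reranker-v2-m3", "example query", ["doc a", "doc b"], 2)
```

Only the payload construction is exercised here; the HTTP call itself depends on a running backend.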