1 change: 1 addition & 0 deletions .hydra_config/config.yaml
@@ -40,6 +40,7 @@ vectordb:
collection_name: ${oc.env:VDB_COLLECTION_NAME, vdb_test}
hybrid_search: ${oc.env:VDB_HYBRID_SEARCH, true}
enable: true
schema_version: 1 # Increment when the collection schema changes and a migration is required

rdb:
host: ${oc.env:POSTGRES_HOST, rdb}
18 changes: 17 additions & 1 deletion docs/content/docs/documentation/API.mdx
@@ -78,6 +78,18 @@ Upload a new file to a specific partition for indexing.
- `201 Created`: Returns task status URL
- `409 Conflict`: File already exists in partition

##### Temporal Filtering
OpenRAG supports temporal filtering to retrieve documents from specific time periods.
Clients can include the temporal field below in a file's metadata so that the search endpoints can filter on it.

* `created_at`: date the file was created, in ISO 8601 format

:::info
`created_at` is provided by the client in the metadata of the file during upload.
This is a first iteration — additional temporal fields (e.g. `updated_at`) may be added in future releases as needed.
:::
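
As an illustration, a client could produce a `created_at` value as follows before uploading. This is only a sketch: the helper name `to_created_at` and the shape of the metadata dictionary are assumptions, and how the metadata is attached to the upload request is not shown here.

```python
from datetime import datetime, timezone


def to_created_at(dt: datetime) -> str:
    """Return an ISO 8601 string with an explicit UTC offset.

    Hypothetical helper, not part of OpenRAG. Naive datetimes are assumed
    to be in UTC so that the resulting value is never ambiguous.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.isoformat(timespec="seconds")


# Example metadata payload (the dictionary shape is an assumption).
metadata = {"created_at": to_created_at(datetime(2025, 1, 3, 9, 30))}
print(metadata)
```

Always emitting an explicit offset avoids depending on how a timestamp without a time zone would be interpreted downstream.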


##### Upload files while modeling relations between them

OpenRAG supports document relationships to enable context-aware retrieval.
@@ -202,6 +214,8 @@ Perform semantic search across specified partitions.
| `include_related` (optional) | boolean | `false` | Include chunks from files with same `relationship_id` |
| `include_ancestors` (optional) | boolean | `false` | Include chunks from ancestor files (via `parent_id` chain) |
| `related_limit` (optional) | integer | 20 | Max related/ancestor chunks to fetch per result (used when `include_related` or `include_ancestors` is true) |
| `filter` (optional) | string | None | Milvus filter expression string for additional filtering. Supports comparison, range, and logical operators. |
| `filter_params` (optional) | string (JSON) | None | JSON-encoded dictionary of parameter values for templated filters (URL-encode the JSON). |

**Responses:**
- `200 OK`: JSON list of document links (HATEOAS format)
@@ -223,6 +237,8 @@ Search within a specific partition only.
| `include_related` (optional) | boolean | `false` | Include chunks from files with same `relationship_id` |
| `include_ancestors` (optional) | boolean | `false` | Include chunks from ancestor files (via `parent_id` chain) |
| `related_limit` (optional) | integer | 20 | Max related/ancestor chunks to fetch per result (used when `include_related` or `include_ancestors` is true) |
| `filter` (optional) | string | None | Milvus filter expression string for additional filtering. Supports comparison, range, and logical operators. |
| `filter_params` (optional) | string (JSON) | None | JSON-encoded dictionary of parameter values for templated filters (URL-encode the JSON). |

**Response:** Same as multi-partition search

@@ -233,7 +249,7 @@ GET /search/partition/{partition}/file/{file_id}

Search within a particular file in a partition.

- **Query Parameters:** Same as partition search
+ **Query Parameters:** Same as partition search, including `filter` and `filter_params`.
**Response:** Same as other search endpoints

---
176 changes: 176 additions & 0 deletions docs/content/docs/documentation/milvus_migration.md
@@ -0,0 +1,176 @@
---
title: Milvus Migrations
---

# Milvus Upgrade
OpenRAG has been upgraded from Milvus **2.5.4** to **2.6.11** to leverage the enhancements introduced in the latest releases, particularly the new temporal querying capabilities added in version **2.6.6+**.

## What's New in 2.6.x

Milvus 2.6.6+ introduced the **`TIMESTAMPTZ`** field type, which enables:

- **Comparison and range filtering** using standard operators (`==`, `!=`, `<`, `>`, etc.)
- **Interval arithmetic** — add or subtract durations (days, hours, minutes) directly in filter expressions
- **Time-based indexing** for faster temporal queries
- **Combined filtering** — pair timestamp conditions with vector similarity search

**Example — basic comparison:**
```python
# `client` is a connected MilvusClient; `tsz` is a TIMESTAMPTZ field.
expr = "tsz != ISO '2025-01-03T00:00:00+08:00'"
results = client.query(
    collection_name,
    filter=expr,
    output_fields=["id", "tsz"],
    limit=10,
)
```

**Example — interval arithmetic:**
```python
# Rows whose timestamp plus one day is later than the given instant.
expr = "tsz + INTERVAL 'P1D' > ISO '2025-01-03T00:00:00+08:00'"
results = client.query(
    collection_name,
    filter=expr,
    output_fields=["id", "tsz"],
    limit=10,
)
```

> `INTERVAL` values follow [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations) syntax:
> * `P1D` = 1 day
> * `PT3H` = 3 hours
> * `P2DT6H` = 2 days and 6 hours.
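
The duration syntax above can be generated programmatically. The following sketch uses plain Python with no Milvus dependency; `iso_duration` and `interval_filter` are hypothetical helper names, not part of OpenRAG or pymilvus.

```python
def iso_duration(days: int = 0, hours: int = 0, minutes: int = 0) -> str:
    """Build an ISO 8601 duration such as 'P2DT6H' from non-negative parts."""
    if min(days, hours, minutes) < 0:
        raise ValueError("duration components must be non-negative")
    date_part = f"{days}D" if days else ""
    time_part = (f"{hours}H" if hours else "") + (f"{minutes}M" if minutes else "")
    if not date_part and not time_part:
        return "P0D"  # zero-length duration
    # The 'T' separator is only emitted when there is a time component.
    return "P" + date_part + ("T" + time_part if time_part else "")


def interval_filter(field: str, duration: str, iso_timestamp: str) -> str:
    """Filter expression: `field` shifted by `duration` is later than `iso_timestamp`."""
    return f"{field} + INTERVAL '{duration}' > ISO '{iso_timestamp}'"


print(iso_duration(days=1))           # P1D
print(iso_duration(hours=3))          # PT3H
print(iso_duration(days=2, hours=6))  # P2DT6H
print(interval_filter("tsz", iso_duration(days=1), "2025-01-03T00:00:00+08:00"))
```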

## Milvus Version Upgrade Steps
:::danger[Before running Milvus Version Migration]
Perform these steps on a deployment that still runs an OpenRAG version **prior to 1.1.6** (Milvus 2.5.4), before switching to the newest version of OpenRAG.
:::

> For the full official reference, see the [Milvus upgrade guide](https://milvus.io/docs/upgrade_milvus_standalone-docker.md#Upgrade-process).

### Step 1 — Upgrade to Milvus 2.5.16 first

Milvus requires an intermediate upgrade to **v2.5.16** before jumping to 2.6.x.

Edit `vdb/milvus.yaml` and set the Milvus image tag:

```diff lang=yaml
// vdb/milvus.yaml
milvus:
- image: milvusdb/milvus:v2.5.4
+ image: milvusdb/milvus:v2.5.16 # Intermediate upgrade to Milvus 2.5.16
```

Then restart the stack:

```bash
docker compose down
docker compose up -d --build milvus
```

Wait for all services to be healthy before continuing.
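
Waiting for Milvus can be scripted instead of checked by hand. The sketch below polls a health endpoint until it answers; the URL is an assumption (the default health endpoint of a Milvus standalone container, reachable only if port 9091 is published on the host), and both function names are hypothetical.

```python
import http.client
import time
import urllib.request
from typing import Callable


def milvus_is_healthy(url: str = "http://localhost:9091/healthz") -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or HTTP error: not healthy yet.
        return False


def wait_until(probe: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call `probe` until it returns True or `attempts` is exhausted."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Typical use: wait_until(milvus_is_healthy) returns True once Milvus is up.
```

Passing the probe as a parameter keeps the retry loop independent of how health is checked, so it can also wrap a different check (for example, one based on `docker compose ps`).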

### Step 2 — Upgrade to Milvus 2.6.11

Update `vdb/milvus.yaml` with the target versions (MinIO must also be updated for compatibility):

```diff lang=yaml
// vdb/milvus.yaml
minio:
- image: minio/minio:RELEASE.2023-03-20T20-16-18Z
+ image: minio/minio:RELEASE.2024-12-18T13-15-44Z

...
milvus:
- image: milvusdb/milvus:v2.5.16
+ image: milvusdb/milvus:v2.6.11
```

### Step 3 — Stop all services

```bash
docker compose down
```

Verify that all containers are stopped before proceeding (the command should print nothing):

```bash
docker ps | grep milvus
```

### Step 4 — Start with the new image

```bash
docker compose up -d
```

Once healthy, confirm the running version:

```bash
docker inspect milvus-standalone --format '{{ .Config.Image }}'
# Expected: milvusdb/milvus:v2.6.11
```
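
The version constraints from this guide (an intermediate stop at v2.5.16, and `TIMESTAMPTZ` support from 2.6.6) can be checked against an image name such as the one printed above. This is a sketch in plain Python; the function names are hypothetical.

```python
import re

MIN_INTERMEDIATE = (2, 5, 16)  # required stop before any 2.6.x upgrade
MIN_TIMESTAMPTZ = (2, 6, 6)    # first release with the TIMESTAMPTZ type


def parse_image_tag(image: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from e.g. 'milvusdb/milvus:v2.6.11'."""
    match = re.search(r":v?(\d+)\.(\d+)\.(\d+)$", image)
    if match is None:
        raise ValueError(f"no semantic version tag in {image!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def needs_intermediate_upgrade(image: str) -> bool:
    """True if the image is older than v2.5.16 and must move there first."""
    return parse_image_tag(image) < MIN_INTERMEDIATE


def supports_timestamptz(image: str) -> bool:
    """True if the image is recent enough for TIMESTAMPTZ fields."""
    return parse_image_tag(image) >= MIN_TIMESTAMPTZ


print(needs_intermediate_upgrade("milvusdb/milvus:v2.5.4"))
print(supports_timestamptz("milvusdb/milvus:v2.6.11"))
```

Versions are compared as integer tuples rather than strings, since a string comparison would order `2.5.4` after `2.5.16`.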

You can now switch to the newest release of OpenRAG.

## Schema Migration — Add Temporal Fields

:::info
This migration adds a `TIMESTAMPTZ` field, `created_at`, and a `STL_SORT` index on it to an existing collection.

Existing documents will have `null` for that field; new documents will have it populated at index time.
:::

:::danger[OpenRAG must be stopped]
Stop the OpenRAG application before running this migration.
:::

### Step 1 — Start only the Milvus container

```bash
docker compose up -d milvus
```

Wait until Milvus is healthy:

```bash
docker compose ps milvus
```

### Step 2 — Dry-run (inspect, no changes)

```bash
docker compose run --no-deps --rm --build --entrypoint "" openrag \
uv run python scripts/migrations/milvus/1.add_temporal_fields.py --dry-run
```

Review the output to confirm which fields and indexes are missing.

### Step 3 — Apply the migration

```bash
docker compose run --no-deps --rm --build --entrypoint "" openrag \
uv run python scripts/migrations/milvus/1.add_temporal_fields.py
```

The script will:
1. Add any missing `TIMESTAMPTZ` fields (nullable)
2. Create `STL_SORT` indexes for each field
3. Stamp the collection with `schema_version=1` so OpenRAG no longer reports a migration error on startup
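
The "only add what is missing" behaviour listed above can be illustrated in plain Python. This is not the migration script itself: `plan_migration` and its inputs are hypothetical, the field names other than `created_at` are placeholders, and a real script would read the existing fields and indexes from Milvus.

```python
REQUIRED_FIELDS = ["created_at"]  # TIMESTAMPTZ fields added by this migration
TARGET_SCHEMA_VERSION = 1


def plan_migration(existing_fields: set[str], indexed_fields: set[str]) -> dict:
    """Return which fields and indexes still have to be created."""
    return {
        "add_fields": [f for f in REQUIRED_FIELDS if f not in existing_fields],
        "create_indexes": [f for f in REQUIRED_FIELDS if f not in indexed_fields],
        "schema_version": TARGET_SCHEMA_VERSION,
    }


# A collection created before the migration: field and index both missing.
print(plan_migration({"id", "text", "vector"}, set()))

# Re-running after a successful migration plans no further changes.
print(plan_migration({"id", "text", "vector", "created_at"}, {"created_at"}))
```

Computing the plan separately from applying it is also what makes a dry-run mode straightforward: the plan is printed and nothing is executed.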

### Step 4 — Restart OpenRAG

```bash
docker compose up --build -d
```

### Rollback

Milvus does not yet support dropping fields. The rollback only removes the indexes and resets the version stamp — the fields remain in the schema but are unused:

```bash
docker compose run --no-deps --rm --build --entrypoint "" openrag \
uv run python scripts/migrations/milvus/1.add_temporal_fields.py --downgrade
```

To fully remove the fields you would need to recreate the collection from scratch.
138 changes: 138 additions & 0 deletions docs/content/docs/documentation/temporality.md
@@ -0,0 +1,138 @@
---
title: Temporality
---

# Milvus representation

* As a scalar field

Scalar fields store primitive, structured values (commonly referred to as metadata) such as numbers, strings, or dates.

They let you narrow search results by specific attributes, such as limiting documents to a particular category or a defined **time range**.

* Set `nullable=True` on a `TIMESTAMPTZ` field to allow missing values.
* Set a default timestamp with the `default_value` attribute, in ISO 8601 format.

* Format: all temporal fields are stored as ISO 8601 timestamps.

* **Automatic date extraction**

# Operation
## Add a TIMESTAMPTZ field that allows null values
* `schema.add_field("tsz", DataType.TIMESTAMPTZ, nullable=True)`
* You can specify a default timestamp value using the **`default_value`** attribute in **ISO 8601 format**.


## Filtering operations

Requires Milvus 2.6.6 or later.

* **`TIMESTAMPTZ`** supports scalar comparisons, interval arithmetic, and extraction of time components.

* **Comparison and filtering**: All filtering and ordering operations are performed in UTC, ensuring consistent and predictable results across different time zones.

* Query with timestamp filtering
* Use comparison operators such as `==`, `!=`, `<`, `>`, `<=`, `>=`. For a full list of the arithmetic operators available in Milvus, refer to [Arithmetic Operators](https://milvus.io/docs/basic-operators.md#Arithmetic-Operators)

* Timestamp filtering

```python
expr = "tsz != ISO '2025-01-03T00:00:00+08:00'"

results = client.query(
    collection_name=collection_name,
    filter=expr,
    output_fields=["id", "tsz"],
    limit=10,
)

print("Query result: ", results)
```
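
To see what the UTC-based comparison noted above means in practice, the following sketch uses only Python's standard library (it does not call Milvus): two ISO 8601 strings with different offsets can denote the same instant.

```python
from datetime import datetime, timezone

a = datetime.fromisoformat("2025-01-03T00:00:00+08:00")
b = datetime.fromisoformat("2025-01-02T16:00:00+00:00")

# Both strings denote the same instant, so the aware datetimes compare equal.
print(a == b)

# Normalizing to UTC makes that explicit.
print(a.astimezone(timezone.utc).isoformat())
```

A filter written with a `+08:00` literal should therefore select the same rows as the equivalent literal written in UTC.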

* Interval operations
* You can perform arithmetic on TIMESTAMPTZ fields using INTERVAL values in the ISO 8601 duration format. This allows you to add or subtract durations, such as days, hours, or minutes, from a timestamp when filtering data.

```python
expr = "tsz + INTERVAL 'P0D' != ISO '2025-01-03T00:00:00+08:00'"

results = client.query(
    collection_name,
    filter=expr,
    output_fields=["id", "tsz"],
    limit=10,
)

print("Query result: ", results)
```

* **`INTERVAL`** values follow the **ISO 8601 duration** syntax. For example:
* `P1D` → 1 day
* `PT3H` → 3 hours
* `P2DT6H` → 2 days and 6 hours

* You can use **`INTERVAL`** arithmetic directly in filter expressions, such as:
* `tsz + INTERVAL 'P3D'` → adds 3 days
* `tsz - INTERVAL 'PT2H'` → subtracts 2 hours

* Search with timestamp filtering
* You can combine **`TIMESTAMPTZ`** filtering with vector similarity search to narrow results by both time and similarity.
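
A combined search of this kind might look like the sketch below. It assumes a connected `pymilvus.MilvusClient` is passed in and that the collection has a `created_at` `TIMESTAMPTZ` field; `search_since` is a hypothetical helper, and only the `collection_name`, `data`, `filter`, `limit`, and `output_fields` arguments of `MilvusClient.search` are used.

```python
from typing import Any, Protocol


class SearchClient(Protocol):
    """The subset of pymilvus.MilvusClient used below."""

    def search(self, collection_name: str, data: list, **kwargs: Any) -> Any: ...


def search_since(
    client: SearchClient,
    collection_name: str,
    query_vector: list[float],
    since_iso: str,
    top_k: int = 5,
) -> Any:
    """Vector search restricted to rows created at or after `since_iso`."""
    return client.search(
        collection_name=collection_name,
        data=[query_vector],  # a single query vector
        filter=f"created_at >= ISO '{since_iso}'",
        limit=top_k,
        output_fields=["created_at"],
    )
```

The timestamp filter is passed alongside the query vector in the same call, so results are narrowed by both time and similarity.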



--------

* Migration from Milvus v2.5.4 to v2.6.11
* `TIMESTAMPTZ` requires Milvus 2.6.6 or later

* The migration follows the upgrade guide for Milvus Standalone: https://milvus.io/docs/upgrade_milvus_standalone-docker.md
* `You must upgrade to v2.5.16 or later before upgrading to v2.6.11.`

* Steps for upgrading: https://milvus.io/docs/upgrade_milvus_standalone-docker.md#Upgrade-process

* Issue: after moving from Milvus 2.5.4 to 2.6.11 following https://milvus.io/docs/upgrade_milvus_standalone-docker.md, collections created in 2.5.4 could not be loaded; the load ran indefinitely.

* https://github.com/milvus-io/milvus/issues/43295

* https://www.perplexity.ai/search/i-ve-moved-from-milvs-2-5-4-to-CDHCle5hQl.qsUa_nw4WHQ




* Done successfully

-----

* Set `datatype=DataType.TIMESTAMPTZ` for the `created_at` field

* Search
* search_params for search https://milvus.io/api-reference/pymilvus/v2.6.x/MilvusClient/Vector/search.md#Request-syntax
* param via AnnSearchRequest: https://milvus.io/api-reference/pymilvus/v2.6.x/MilvusClient/Vector/hybrid_search.md#Request-Syntax


-----

* It finally worked after following the migration steps

* Logical operators
* Logical operators are used to combine multiple conditions into a more complex filter expression. These include AND, OR, and NOT.

* Range operators
* https://milvus.io/docs/basic-operators.md#Range-operators
* Supported Range Operators:
* IN: Used to match values within a specific set or range.
* LIKE: Used to match a pattern (mostly for text fields). Milvus allows you to build an NGRAM index on VARCHAR or JSON fields to accelerate text queries. For details, refer to [NGRAM](https://milvus.io/docs/ngram.md).
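
As an illustration of combining these operators, the sketch below assembles a filter expression from a time range and an `in` condition. It only builds strings: the helper names are hypothetical and `file_id` is an assumed field name used for the example.

```python
def time_range(field: str, start_iso: str, end_iso: str) -> str:
    """Half-open range: start <= field < end."""
    return f"({field} >= ISO '{start_iso}' and {field} < ISO '{end_iso}')"


def all_of(*conditions: str) -> str:
    """Join conditions with the logical operator `and`."""
    return " and ".join(conditions)


january = time_range(
    "created_at", "2025-01-01T00:00:00+00:00", "2025-02-01T00:00:00+00:00"
)
expr = all_of(january, 'file_id in ["a.pdf", "b.pdf"]')
print(expr)
```

A half-open range avoids counting a timestamp that falls exactly on a boundary in two adjacent windows.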


## Time

Time fields

* datetime
* modified_at
* created_at (added)
* indexed_at


# Reorder
2 changes: 1 addition & 1 deletion extern/indexer-ui
7 changes: 4 additions & 3 deletions openrag/components/indexer/indexer.py
@@ -224,19 +224,20 @@ async def asearch(
self,
query: str,
top_k: int = 5,
- similarity_threshold: float = 0.80,
+ similarity_threshold: float = 0.60,
partition: str | list[str] | None = None,
- filter: dict | None = None,
+ filter: str | None = None,
+ filter_params: dict | None = None,
) -> list[Document]:
partition_list = self._check_partition_list(partition)
- filter = filter or {}
vectordb = ray.get_actor("Vectordb", namespace="openrag")
return await vectordb.async_search.remote(
query=query,
partition=partition_list,
top_k=top_k,
similarity_threshold=similarity_threshold,
filter=filter,
+ filter_params=filter_params,
)

def _check_partition_str(self, partition: str | None) -> str: