BREAKING CHANGE Step 2/3: Adding temporal fields #259
Open: Ahmath-Gadji wants to merge 3 commits into dev from feat/add_temporal_fields
Commits:
- f2a49ce BREAKING CHANGE: Upgrading Milvus from 2.5.4 to 2.6.11. (Ahmath-Gadji)
- d42cc14 feat(api+vdb): enable advanced filtering in search endpoints and add … (Ahmath-Gadji)
- 0f36f9c BREAKING CHANGE: Adding a temporal field and migration script for milvus (Ahmath-Gadji)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,176 @@ | ||
| --- | ||
| title: Milvus Migrations | ||
| --- | ||
|
|
||
| # Milvus Upgrade | ||
| OpenRAG has been upgraded from Milvus **2.5.4** to **2.6.11** to leverage the enhancements introduced in the latest releases, particularly the new temporal querying capabilities added in version **2.6.6+**. | ||
|
|
||
| ## What's New in 2.6.x | ||
|
|
||
| Milvus 2.6.6+ introduced the **`TIMESTAMPTZ`** field type, which enables: | ||
|
|
||
| - **Comparison and range filtering** using standard operators (`=`, `!=`, `<`, `>`, etc.) | ||
| - **Interval arithmetic** — add or subtract durations (days, hours, minutes) directly in filter expressions | ||
| - **Time-based indexing** for faster temporal queries | ||
| - **Combined filtering** — pair timestamp conditions with vector similarity search | ||
|
|
||
| **Example — basic comparison:** | ||
| ```python | ||
| expr = "tsz != ISO '2025-01-03T00:00:00+08:00'" | ||
| results = client.query( | ||
| collection_name, | ||
| filter=expr, | ||
| output_fields=["id", "tsz"], | ||
| limit=10 | ||
| ) | ||
| ``` | ||
|
|
||
| **Example — interval arithmetic:** | ||
| ```python | ||
| expr = "tsz + INTERVAL 'P1D' > ISO '2025-01-03T00:00:00+08:00'" | ||
| results = client.query( | ||
| collection_name, | ||
| filter=expr, | ||
| output_fields=["id", "tsz"], | ||
| limit=10 | ||
| ) | ||
| ``` | ||
|
|
||
| > `INTERVAL` values follow [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations) syntax: | ||
| > * `P1D` = 1 day | ||
| > * `PT3H` = 3 hours | ||
| > * `P2DT6H` = 2 days and 6 hours. | ||
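
The duration strings above can also be generated rather than written by hand. The following is a minimal sketch, assuming durations are held as Python `timedelta` values; `to_iso_duration` and `shifted_after_filter` are hypothetical helper names, not part of OpenRAG or pymilvus.

```python
from datetime import timedelta


def to_iso_duration(delta: timedelta) -> str:
    """Format a non-negative timedelta as an ISO 8601 duration (days, hours, minutes, seconds)."""
    total = int(delta.total_seconds())  # sub-second precision is dropped in this sketch
    if total < 0:
        raise ValueError("negative durations are not handled in this sketch")
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    if not date_part and not time_part:
        return "P0D"  # zero-length duration
    return "P" + date_part + ("T" + time_part if time_part else "")


def shifted_after_filter(field: str, delta: timedelta, iso_timestamp: str) -> str:
    """Build a filter of the form: field + INTERVAL '<duration>' > ISO '<timestamp>'."""
    return f"{field} + INTERVAL '{to_iso_duration(delta)}' > ISO '{iso_timestamp}'"
```

For instance, `shifted_after_filter("tsz", timedelta(days=1), "2025-01-03T00:00:00+08:00")` reproduces the interval expression used in the example above.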
|
|
||
| ## Milvus Version Upgrade Steps | ||
| :::danger[Before running Milvus Version Migration] | ||
| These steps must be performed on a deployment running OpenRAG **prior to version 1.1.6** (Milvus 2.5.4) before switching to the newest version of OpenRAG. | ||
| ::: | ||
|
|
||
| > For the full official reference, see the [Milvus upgrade guide](https://milvus.io/docs/upgrade_milvus_standalone-docker.md#Upgrade-process). | ||
|
|
||
| ### Step 1 — Upgrade to Milvus 2.5.16 first | ||
|
|
||
| Milvus requires an intermediate upgrade to **v2.5.16** before jumping to 2.6.x. | ||
|
|
||
| Edit `vdb/milvus.yaml` and set the Milvus image tag: | ||
|
|
||
| ```diff lang=yaml | ||
| // vdb/milvus.yaml | ||
| milvus: | ||
| - image: milvusdb/milvus:v2.5.4 | ||
| + image: milvusdb/milvus:v2.5.16 # Migrate to milvus 2.5.16 | ||
| ``` | ||
|
|
||
| Then restart the stack: | ||
|
|
||
| ```bash | ||
| docker compose down | ||
| docker compose up --build milvus -d | ||
| ``` | ||
|
|
||
| Wait for all services to be healthy before continuing. | ||
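
Waiting for health can be scripted instead of checked by eye. The sketch below assumes the Milvus container is named `milvus-standalone` (the name used later in this guide) and that its compose definition declares a Docker healthcheck; `container_health` and `wait_until_healthy` are hypothetical helpers.

```python
import subprocess
import time
from typing import Callable


def container_health(name: str) -> str:
    """Return the Docker health status of a container ('healthy', 'starting', ...)."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", name],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def wait_until_healthy(
    check: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns 'healthy' or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check() == "healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example call (assumed container name):
# wait_until_healthy(lambda: container_health("milvus-standalone"))
```

The status check is passed in as a callable so the polling loop can be exercised without a running Docker daemon.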
|
|
||
| ### Step 2 — Upgrade to Milvus 2.6.11 | ||
|
|
||
| Update `vdb/milvus.yaml` with the target versions (MinIO must also be updated for compatibility): | ||
|
|
||
| ```diff lang=yaml | ||
| // vdb/milvus.yaml | ||
| minio: | ||
| - image: minio/minio:RELEASE.2023-03-20T20-16-18Z | ||
| + image: minio/minio:RELEASE.2024-12-18T13-15-44Z | ||
|
|
||
| ... | ||
| milvus: | ||
| - image: milvusdb/milvus:v2.5.16 | ||
| + image: milvusdb/milvus:v2.6.11 | ||
| ``` | ||
|
|
||
| ### Step 3 — Stop all services | ||
|
|
||
| ```bash | ||
| docker compose down | ||
| ``` | ||
|
|
||
| Verify that all containers are stopped before proceeding; the command below should print nothing: | ||
|
|
||
| ```bash | ||
| docker ps | grep milvus | ||
| ``` | ||
|
|
||
| ### Step 4 — Start with the new image | ||
|
|
||
| ```bash | ||
| docker compose up -d | ||
| ``` | ||
|
|
||
| Once healthy, confirm the running version: | ||
|
|
||
| ```bash | ||
| docker inspect milvus-standalone --format '{{ .Config.Image }}' | ||
| # Expected: milvusdb/milvus:v2.6.11 | ||
| ``` | ||
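
The version check can be automated on the image string printed by `docker inspect`. This is a minimal sketch, assuming the image tag has the form `vX.Y.Z`; `image_version` and `supports_timestamptz` are hypothetical helpers, and the 2.6.6 threshold is the one stated earlier in this guide.

```python
import re


def image_version(image: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from an image reference like 'milvusdb/milvus:v2.6.11'."""
    match = re.search(r":v?(\d+)\.(\d+)\.(\d+)$", image.strip())
    if match is None:
        raise ValueError(f"no semantic version tag in {image!r}")
    return tuple(int(part) for part in match.groups())


def supports_timestamptz(image: str) -> bool:
    """True when the image version is at least 2.6.6, where TIMESTAMPTZ became available."""
    return image_version(image) >= (2, 6, 6)
```

Comparing integer tuples avoids the string-comparison trap where `"2.6.11"` would sort before `"2.6.6"`.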
|
|
||
| You can now switch to the newest release of OpenRAG. | ||
|
|
||
| ## Schema Migration — Add Temporal Fields | ||
|
|
||
| :::info | ||
| This migration adds a `TIMESTAMPTZ` field, `created_at`, and a `STL_SORT` index on it to an existing collection. | ||
|
|
||
| Existing documents will have `null` for that field; new documents will have it populated at index time. | ||
| ::: | ||
|
|
||
| :::danger[OpenRAG must be stopped] | ||
| Stop the OpenRAG application before running this migration. | ||
| ::: | ||
|
|
||
| ### Step 1 — Start only the Milvus container | ||
|
|
||
| ```bash | ||
| docker compose up -d milvus | ||
| ``` | ||
|
|
||
| Wait until Milvus is healthy: | ||
|
|
||
| ```bash | ||
| docker compose ps milvus | ||
| ``` | ||
|
|
||
| ### Step 2 — Dry-run (inspect, no changes) | ||
|
|
||
| ```bash | ||
| docker compose run --no-deps --rm --build --entrypoint "" openrag \ | ||
| uv run python scripts/migrations/milvus/1.add_temporal_fields.py --dry-run | ||
| ``` | ||
|
|
||
| Review the output to confirm which fields and indexes are missing. | ||
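
To illustrate what a dry-run report conceptually contains, here is a minimal sketch of the comparison between what a collection has and what the migration requires. It is not the actual migration script: `plan_migration` and `REQUIRED_FIELDS` are hypothetical names, and the inputs are assumed to be plain lists of field names gathered beforehand.

```python
from typing import Iterable

# The temporal field named in this guide; treated here as the only required one.
REQUIRED_FIELDS = ("created_at",)


def plan_migration(
    existing_fields: Iterable[str],
    indexed_fields: Iterable[str],
    required: Iterable[str] = REQUIRED_FIELDS,
) -> dict[str, list[str]]:
    """Report which required fields and indexes are absent, without changing anything."""
    existing = set(existing_fields)
    indexed = set(indexed_fields)
    required = list(required)
    return {
        "missing_fields": [name for name in required if name not in existing],
        "missing_indexes": [name for name in required if name not in indexed],
    }
```

An empty result for both keys would mean there is nothing left to apply.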
|
|
||
| ### Step 3 — Apply the migration | ||
|
|
||
| ```bash | ||
| docker compose run --no-deps --rm --build --entrypoint "" openrag \ | ||
| uv run python scripts/migrations/milvus/1.add_temporal_fields.py | ||
| ``` | ||
|
|
||
| The script will: | ||
| 1. Add any missing `TIMESTAMPTZ` fields (nullable) | ||
| 2. Create `STL_SORT` indexes for each field | ||
| 3. Stamp the collection with `schema_version=1` so OpenRAG no longer reports a migration error on startup | ||
|
|
||
| ### Step 4 — Restart OpenRAG | ||
|
|
||
| ```bash | ||
| docker compose up --build -d | ||
| ``` | ||
|
|
||
| ### Rollback | ||
|
|
||
| Milvus does not yet support dropping fields. The rollback only removes the indexes and resets the version stamp — the fields remain in the schema but are unused: | ||
|
|
||
| ```bash | ||
| docker compose run --no-deps --rm --build --entrypoint "" openrag \ | ||
| uv run python scripts/migrations/milvus/1.add_temporal_fields.py --downgrade | ||
| ``` | ||
|
|
||
| To fully remove the fields you would need to recreate the collection from scratch. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,138 @@ | ||
| --- | ||
| title: Temporality | ||
| --- | ||
|
|
||
| # Milvus representation | ||
|
|
||
| * As scalar field | ||
|
|
||
| Scalar fields store primitive, structured values—commonly referred to as metadata—such as numbers, strings, or dates. | ||
|
|
||
| They allow you to narrow search results based on specific attributes, like limiting documents to a particular category or a defined **time range**. | ||
|
|
||
| * You can set `nullable=True` for `TIMESTAMPTZ` fields to allow missing values. | ||
| * You can specify a default timestamp value using the `default_value` attribute in ISO 8601 format. | ||
|
|
||
| * format: timestamp (ISO 8601 format) | ||
| * All temporal fields are stored in ISO 8601 format | ||
|
|
||
| * **Automatic date extraction** | ||
|
|
||
| # Operation | ||
| ## Add a TIMESTAMPTZ field that allows null values | ||
| * `schema.add_field("tsz", DataType.TIMESTAMPTZ, nullable=True)` | ||
| * You can specify a default timestamp value using the **`default_value`** attribute in **`ISO 8601` format**. | ||
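
Since values and defaults are expressed in ISO 8601, it helps to format timestamps with an explicit UTC offset. The sketch below uses only the Python standard library; `to_timestamptz_literal` and `to_utc` are hypothetical helper names, and rejecting naive datetimes is a design choice of this sketch rather than a documented Milvus requirement.

```python
from datetime import datetime, timezone


def _require_aware(moment: datetime) -> None:
    """Reject datetimes without timezone information."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("naive datetime: attach a timezone before formatting")


def to_timestamptz_literal(moment: datetime) -> str:
    """Format an aware datetime as ISO 8601 with its UTC offset, e.g. 2025-01-03T00:00:00+08:00."""
    _require_aware(moment)
    return moment.isoformat(timespec="seconds")


def to_utc(moment: datetime) -> datetime:
    """Convert an aware datetime to the same instant expressed in UTC."""
    _require_aware(moment)
    return moment.astimezone(timezone.utc)
```

Two aware datetimes that denote the same instant compare equal in Python regardless of their offsets, which matches the UTC-based comparison described in the filtering notes.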
|
|
||
|
|
||
| ## Filtering operations | ||
|
|
||
| Requires Milvus 2.6.6 or later. | ||
|
|
||
| * **`TIMESTAMPTZ`** supports scalar comparisons, interval arithmetic, and extraction of time components. | ||
|
|
||
| * **Comparison and filtering**: All filtering and ordering operations are performed in UTC, ensuring consistent and predictable results across different time zones. | ||
|
|
||
| * Query with timestamp filtering | ||
| * Use comparison operators such as `==`, `!=`, `<`, `>`, `<=`, `>=`. For the full list of operators available in Milvus, refer to [Basic Operators](https://milvus.io/docs/basic-operators.md) | ||
|
|
||
| * timestamp filtering | ||
|
|
||
| ```python | ||
| expr = "tsz != ISO '2025-01-03T00:00:00+08:00'" | ||
|
|
||
| results = client.query( | ||
| collection_name=collection_name, | ||
| filter=expr, | ||
| output_fields=["id", "tsz"], | ||
| limit=10 | ||
| ) | ||
|
|
||
| print("Query result: ", results) | ||
| ``` | ||
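
A common variant of the comparison above is a time-range filter. The following is a minimal sketch that combines two comparisons with `AND`, using the same `ISO '...'` literal syntax as the example; `time_range_filter` is a hypothetical helper, and the half-open range (start inclusive, end exclusive) is a choice made for this sketch.

```python
from datetime import datetime


def time_range_filter(field: str, start_iso: str, end_iso: str) -> str:
    """Build 'field >= ISO start AND field < ISO end' after validating both timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("timestamps must carry a UTC offset")
    if start >= end:
        raise ValueError("start must be earlier than end")
    return f"{field} >= ISO '{start_iso}' AND {field} < ISO '{end_iso}'"
```

The returned string can be passed as the `filter` argument of `client.query`, as in the example above.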
|
|
||
| * Interval operations | ||
| * You can perform arithmetic on TIMESTAMPTZ fields using INTERVAL values in the ISO 8601 duration format. This allows you to add or subtract durations, such as days, hours, or minutes, from a timestamp when filtering data. | ||
|
|
||
| ```python | ||
| expr = "tsz + INTERVAL 'P0D' != ISO '2025-01-03T00:00:00+08:00'" | ||
|
|
||
| results = client.query( | ||
| collection_name, | ||
| filter=expr, | ||
| output_fields=["id", "tsz"], | ||
| limit=10 | ||
| ) | ||
|
|
||
| print("Query result: ", results) | ||
| ``` | ||
|
|
||
| * **`INTERVAL`** values follow the **`ISO 8601` duration** syntax. For example: | ||
| * P1D → 1 day | ||
| * PT3H → 3 hours | ||
| * P2DT6H → 2 days and 6 hours | ||
|
|
||
| * You can use **`INTERVAL`** arithmetic directly in filter expressions, such as: | ||
| * tsz + INTERVAL 'P3D' → Adds 3 days | ||
| * tsz - INTERVAL 'PT2H' → Subtracts 2 hours | ||
|
|
||
| * Search with timestamp filtering | ||
| * You can combine **`TIMESTAMPTZ`** filtering with vector similarity search to narrow results by both time and similarity. | ||
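
As a rough illustration of combining both, the sketch below passes a timestamp filter to a vector search. It assumes `client` is a connected pymilvus `MilvusClient`, that the collection has `id` and `created_at` fields, and that `query_vector` matches the collection's embedding dimension; `search_recent` is a hypothetical wrapper, not an OpenRAG function.

```python
from typing import Any, Sequence


def search_recent(
    client: Any,
    collection_name: str,
    query_vector: Sequence[float],
    since_iso: str,
    field: str = "created_at",
    limit: int = 10,
) -> Any:
    """Run a vector search restricted to rows whose timestamp is at or after since_iso."""
    expr = f"{field} >= ISO '{since_iso}'"
    return client.search(
        collection_name=collection_name,
        data=[list(query_vector)],  # one query vector
        filter=expr,
        limit=limit,
        output_fields=["id", field],
    )
```

The filter narrows the candidate set by time, and the similarity ranking is then computed over the rows that pass it.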
|
|
||
|
|
||
|
|
||
| -------- | ||
|
|
||
| * Migration from Milvus v2.5.4 to v2.6.11 | ||
| * TIMESTAMPTZ is compatible with Milvus 2.6.6+ | ||
|
|
||
| * Migration according to the release notes for Milvus Standalone: https://milvus.io/docs/upgrade_milvus_standalone-docker.md | ||
| * `You must upgrade to v2.5.16 or later before upgrading to v2.6.11.` | ||
|
|
||
| * Steps for upgrading: https://milvus.io/docs/upgrade_milvus_standalone-docker.md#Upgrade-process | ||
|
|
||
| * Issue: after moving from Milvus 2.5.4 to 2.6.11 following https://milvus.io/docs/upgrade_milvus_standalone-docker.md, collections created in 2.5.4 can't be loaded; the load runs forever. | ||
|
|
||
| * https://github.com/milvus-io/milvus/issues/43295 | ||
|
|
||
| * https://www.perplexity.ai/search/i-ve-moved-from-milvs-2-5-4-to-CDHCle5hQl.qsUa_nw4WHQ | ||
|
|
||
|
|
||
|
|
||
|
|
||
| * Done successfully | ||
|
|
||
| ----- | ||
|
|
||
| * Set `datatype=DataType.TIMESTAMPTZ` for the `created_at` field | ||
|
|
||
| * Search | ||
| * search_params for search https://milvus.io/api-reference/pymilvus/v2.6.x/MilvusClient/Vector/search.md#Request-syntax | ||
| * param via AnnSearchRequest: https://milvus.io/api-reference/pymilvus/v2.6.x/MilvusClient/Vector/hybrid_search.md#Request-Syntax | ||
|
|
||
|
|
||
| ----- | ||
|
|
||
| * Finally, I managed to make it work by following the migration steps | ||
|
|
||
| * Logical operators | ||
| * Logical operators are used to combine multiple conditions into a more complex filter expression. These include AND, OR, and NOT. | ||
|
|
||
| * Range operators | ||
| * https://milvus.io/docs/basic-operators.md#Range-operators | ||
| * Supported Range Operators: | ||
| * IN: Used to match values within a specific set or range. | ||
| * LIKE: Used to match a pattern (mostly for text fields). Milvus allows you to build an NGRAM index on VARCHAR or JSON fields to accelerate text queries. For details, refer to [NGRAM](https://milvus.io/docs/ngram.md). | ||
|
|
||
|
|
||
| ## Time | ||
|
|
||
| Time fields | ||
|
|
||
| * datetime | ||
| * modified_at | ||
| * created_at | ||
| ==> Added | ||
| * indexed_at | ||
|
|
||
|
|
||
| # Reorder |