test(relationships): add test for standalone file yielding no extra c… by Ahmath-Gadji · Pull Request #289 · linagora/openrag

Ahmath-Gadji · 2026-03-17T14:23:46Z

Fix: exclude standalone files from relationship expansion

Ensures files without a relationship_id are not incorrectly expanded with related or ancestor chunks.

Changes:

- `get_file_ancestors` in `utils.py` now filters out rows where `relationship_id` is `None`.
- Added the `TestStandaloneFileNoExpansion` test class, covering: no related files returned, only the file itself as ancestor, and correct storage of a falsy `relationship_id`.
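
To illustrate the intended behaviour for related-file lookups, here is a minimal sketch. It is not the project's code: `FileRecord` and `related_files` are hypothetical names used only for this example, and the real lookup presumably runs against the database rather than a Python list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileRecord:
    """Hypothetical stand-in for a stored file row."""
    file_id: str
    relationship_id: Optional[str] = None


def related_files(target: FileRecord, files: List[FileRecord]) -> List[FileRecord]:
    """Return the other files that share the target's relationship_id.

    A falsy relationship_id (None or "") means the file is standalone,
    so nothing is returned. Without this guard, two standalone files
    would match each other because None == None.
    """
    if not target.relationship_id:
        return []
    return [
        f
        for f in files
        if f.file_id != target.file_id
        and f.relationship_id == target.relationship_id
    ]


files = [
    FileRecord("a.pdf", "thread-1"),
    FileRecord("b.pdf", "thread-1"),
    FileRecord("solo.pdf"),
    FileRecord("other.pdf"),
]
```

With this data, looking up `solo.pdf` yields no related files even though `other.pdf` also has no `relationship_id`, while `a.pdf` still finds `b.pdf`.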

Summary by CodeRabbit

Bug Fixes
- Fixed handling of standalone files in relationship queries to exclude files without relationships.
Tests
- Added tests for relationship handling and data model serialization to ensure reliability.

coderabbitai · 2026-03-17T14:23:54Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 06f84aa2-f907-4d42-ad81-cdd74f9fcf90

📥 Commits

Reviewing files that changed from the base of the PR and between a2920bf and 5cfd4f5.

📒 Files selected for processing (2)

openrag/components/indexer/vectordb/utils.py
openrag/components/test_relationships.py

📝 Walkthrough

The changes fix a behavior in get_file_ancestors to filter out entries lacking a relationship_id, ensuring only files part of a relationship are included. Corresponding tests verify that standalone files without relationships do not produce extra related chunks and that the to_dict method includes relationship fields.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Relationship Filter Fix `openrag/components/indexer/vectordb/utils.py` | Updated `get_file_ancestors` to filter out entries without a `relationship_id`, ensuring only related files are included in results. |
| Test Coverage for Standalone Files `openrag/components/test_relationships.py` | Added `TestStandaloneFileNoExpansion` class with a test validating that files without a `relationship_id` do not expand to extra related chunks; added `test_to_dict_includes_relationship_fields` test to verify relationship fields are serialized correctly. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

fix

Poem

🐰 A lone file hops without a tie,
No relationship to multiply,
We filter clean, keep ancestors true,
Each connection now shines through! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: adding a test for standalone files that should not produce extra chunks. The truncated title ('test(relationships): add test for standalone file yielding no extra c…') is concise and directly related to the PR's primary objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

openrag/components/indexer/vectordb/utils.py (1)

571-582: ⚠️ Potential issue | 🔴 Critical

Preserve the target row for standalone files in get_file_ancestors().

At line 581, filtering only on `row.relationship_id is not None` drops the depth-0 target row when the file is standalone, so this method returns `[]` instead of returning the file itself.

💡 Proposed fix

             return [
                 {
                     "file_id": row.file_id,
                     "partition": row.partition_name,
                     "parent_id": row.parent_id,
                     "relationship_id": row.relationship_id,
                     "depth": row.depth,
                     **(row.file_metadata or {}),
                 }
                 for row in result
-                if row.relationship_id is not None  # Only include files that are part of a relationship
+                if row.depth == 0 or row.relationship_id is not None
             ]
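
The difference between the two conditions can be reproduced in isolation. The sketch below is a simplified stand-in, not the code in `utils.py`: `format_rows`, `original_filter` and `proposed_filter` are hypothetical names, and the rows are plain `SimpleNamespace` objects rather than database results.

```python
from types import SimpleNamespace


def format_rows(rows, keep):
    """Build the result dicts for every row accepted by the `keep` predicate."""
    return [
        {
            "file_id": row.file_id,
            "partition": row.partition_name,
            "parent_id": row.parent_id,
            "relationship_id": row.relationship_id,
            "depth": row.depth,
            **(row.file_metadata or {}),
        }
        for row in rows
        if keep(row)
    ]


def original_filter(row):
    # Condition as submitted in the PR: drops every row without a relationship.
    return row.relationship_id is not None


def proposed_filter(row):
    # Condition suggested in the review: always keep the depth-0 target row.
    return row.depth == 0 or row.relationship_id is not None


# A standalone file: it is its own depth-0 row and has no relationship_id.
standalone = SimpleNamespace(
    file_id="solo.pdf",
    partition_name="default",
    parent_id=None,
    relationship_id=None,
    depth=0,
    file_metadata=None,
)

dropped = format_rows([standalone], original_filter)
kept = format_rows([standalone], proposed_filter)
```

Here `dropped` is an empty list, while `kept` contains the single entry for `solo.pdf`, which matches the "only the file itself as ancestor" behaviour described in the PR.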

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@openrag/components/indexer/vectordb/utils.py` around lines 571 - 582,
get_file_ancestors() currently filters out rows with relationship_id == None
which drops the depth-0 target row for standalone files; update the list
comprehension filter to include the target row by allowing rows where
row.relationship_id is not None OR row.depth == 0 (i.e., change the condition to
"if row.relationship_id is not None or row.depth == 0") so standalone files at
depth 0 are preserved in the returned list.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openrag/components/test_relationships.py`:
- Around line 587-624: The test
test_no_extra_chunks_for_file_without_relationship_id relies on
PartitionFileManagerHelper.get_file_ancestors which does not implement the new
ancestor filtering added in openrag/components/indexer/vectordb/utils.py; update
the test to exercise the same production logic by calling the actual function
that contains the new filter (or by invoking the production helper in
openrag/components/indexer/vectordb/utils.py that performs ancestor filtering)
instead of the helper stub, or modify
PartitionFileManagerHelper.get_file_ancestors to replicate the exact filter
logic used in the vectordb utils so the test validates the real ancestor
behavior introduced in this PR.
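
One way to follow this suggestion, sketched under assumptions: the filter is extracted into a single predicate that both the production query and the test helper call, so the test cannot drift from the real behaviour. `keep_ancestor_row` and `FakePartitionFileManager` are hypothetical names for illustration; the actual `PartitionFileManagerHelper` is not shown on this page and may look quite different.

```python
def keep_ancestor_row(depth, relationship_id):
    """Hypothetical shared predicate: keep the target row, or any related row."""
    return depth == 0 or relationship_id is not None


class FakePartitionFileManager:
    """Hypothetical in-memory test helper that reuses the shared predicate."""

    def __init__(self, files):
        # Maps file_id -> {"parent_id": ..., "relationship_id": ...}
        self.files = files

    def get_file_ancestors(self, file_id):
        chain, depth, current, seen = [], 0, file_id, set()
        # Walk up the parent chain, guarding against cycles and missing ids.
        while current in self.files and current not in seen:
            seen.add(current)
            record = self.files[current]
            if keep_ancestor_row(depth, record["relationship_id"]):
                chain.append({"file_id": current, "depth": depth, **record})
            current = record["parent_id"]
            depth += 1
        return chain


def test_standalone_file_is_its_only_ancestor():
    manager = FakePartitionFileManager(
        {"solo.pdf": {"parent_id": None, "relationship_id": None}}
    )
    ancestors = manager.get_file_ancestors("solo.pdf")
    assert [row["file_id"] for row in ancestors] == ["solo.pdf"]


def test_related_chain_is_preserved():
    manager = FakePartitionFileManager(
        {
            "reply.eml": {"parent_id": "root.eml", "relationship_id": "thread-1"},
            "root.eml": {"parent_id": None, "relationship_id": "thread-1"},
        }
    )
    ancestors = manager.get_file_ancestors("reply.eml")
    assert [row["file_id"] for row in ancestors] == ["reply.eml", "root.eml"]
```

The test functions use plain `assert` statements, so they can be collected by pytest or called directly.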

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 717f6ad6-a889-4384-879f-3075c0a57df2

📥 Commits

Reviewing files that changed from the base of the PR and between 5f05d37 and a2920bf.

📒 Files selected for processing (2)

openrag/components/indexer/vectordb/utils.py
openrag/components/test_relationships.py

Ahmath-Gadji marked this pull request as ready for review March 17, 2026 14:45

coderabbitai bot added the fix Fix issue label Mar 17, 2026

coderabbitai bot reviewed Mar 17, 2026

test(relationships): add test for standalone file yielding no extra c…

5cfd4f5

…hunks on expansion

Ahmath-Gadji force-pushed the fix/get_files_by_relationship branch from a2920bf to 5cfd4f5 Compare March 17, 2026 15:05

Ahmath-Gadji wants to merge 1 commit into dev from fix/get_files_by_relationship
