
test(relationships): add test for standalone file yielding no extra c…#289

Open
Ahmath-Gadji wants to merge 1 commit into dev from fix/get_files_by_relationship

Conversation

Ahmath-Gadji (Collaborator) commented Mar 17, 2026

Fix: exclude standalone files from relationship expansion

Ensures files without a relationship_id are not incorrectly expanded with related or ancestor chunks.

Changes:

  • get_file_ancestors in utils.py now filters out rows where relationship_id is None
  • Added TestStandaloneFileNoExpansion test class covering: no related files returned, only the file itself as ancestor, and correct storage of falsy relationship_id
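The PR does not show the schema or the query involved, so the following is only a minimal in-memory sketch of why a missing relationship_id needs an explicit guard. The `FileRecord` type, the `related_files` function, and the sample data are hypothetical and are not taken from the repository.

```python
from typing import Optional, TypedDict


class FileRecord(TypedDict):
    # Hypothetical record shape, for illustration only.
    file_id: str
    relationship_id: Optional[str]


def related_files(files: list[FileRecord], file_id: str) -> list[FileRecord]:
    """Return the other files that share the target's relationship_id."""
    target = next((f for f in files if f["file_id"] == file_id), None)
    # Guard: a standalone file (relationship_id is None) has no related files.
    # Without this check, standalone files would match each other,
    # because None == None evaluates to True in Python.
    if target is None or target["relationship_id"] is None:
        return []
    return [
        f
        for f in files
        if f["relationship_id"] == target["relationship_id"]
        and f["file_id"] != file_id
    ]


files: list[FileRecord] = [
    {"file_id": "a", "relationship_id": "thread-1"},
    {"file_id": "b", "relationship_id": "thread-1"},
    {"file_id": "solo-1", "relationship_id": None},
    {"file_id": "solo-2", "relationship_id": None},
]
```

In this sketch, looking up "a" yields only "b", while looking up "solo-1" yields an empty list instead of pulling in "solo-2".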

Summary by CodeRabbit

  • Bug Fixes

    • Fixed handling of standalone files in relationship queries to exclude files without relationships.
  • Tests

    • Added tests for relationship handling and data model serialization to ensure reliability.

coderabbitai bot commented Mar 17, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 06f84aa2-f907-4d42-ad81-cdd74f9fcf90

📥 Commits

Reviewing files that changed from the base of the PR and between a2920bf and 5cfd4f5.

📒 Files selected for processing (2)
  • openrag/components/indexer/vectordb/utils.py
  • openrag/components/test_relationships.py
📝 Walkthrough

The change updates get_file_ancestors to filter out entries that lack a relationship_id, so that only files that are part of a relationship are included. The accompanying tests verify that standalone files without relationships do not produce extra related chunks and that the to_dict method includes the relationship fields.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Relationship Filter Fix: openrag/components/indexer/vectordb/utils.py | Updated get_file_ancestors to filter out entries without a relationship_id, ensuring only related files are included in results. |
| Test Coverage for Standalone Files: openrag/components/test_relationships.py | Added TestStandaloneFileNoExpansion class with a test validating that files without relationship_id do not expand to extra related chunks; added test_to_dict_includes_relationship_fields test to verify relationship fields are serialized correctly. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

fix

Poem

🐰 A lone file hops without a tie,
No relationship to multiply,
We filter clean, keep ancestors true,
Each connection now shines through! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: adding a test for standalone files that should not produce extra chunks. The truncated title ('test(relationships): add test for standalone file yielding no extra c…') is concise and directly related to the PR's primary objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@Ahmath-Gadji Ahmath-Gadji marked this pull request as ready for review March 17, 2026 14:45
@coderabbitai coderabbitai bot added the fix Fix issue label Mar 17, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
openrag/components/indexer/vectordb/utils.py (1)

571-582: ⚠️ Potential issue | 🔴 Critical

Preserve the target row for standalone files in get_file_ancestors().

At Line 581, filtering only on row.relationship_id is not None drops the depth-0 target row when the file is standalone, so this method returns [] instead of returning the file itself.

💡 Proposed fix
             return [
                 {
                     "file_id": row.file_id,
                     "partition": row.partition_name,
                     "parent_id": row.parent_id,
                     "relationship_id": row.relationship_id,
                     "depth": row.depth,
                     **(row.file_metadata or {}),
                 }
                 for row in result
-                if row.relationship_id is not None  # Only include files that are part of a relationship
+                if row.depth == 0 or row.relationship_id is not None
             ]
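To make the difference between the two filter conditions concrete, here is a small self-contained sketch. It assumes rows expose the attributes used in the diff above (file_id, partition_name, parent_id, relationship_id, depth, file_metadata). The `Row` namedtuple, the `to_dicts` helper, and the sample rows are hypothetical stand-ins for the real query result, which this page does not show.

```python
from collections import namedtuple

# Hypothetical stand-in for a row returned by the ancestors query.
Row = namedtuple(
    "Row",
    "file_id partition_name parent_id relationship_id depth file_metadata",
)


def strict_filter(row):
    # Condition currently in the PR: drops every row without a relationship_id.
    return row.relationship_id is not None


def proposed_filter(row):
    # Condition suggested in the review: always keep the depth-0 target row.
    return row.depth == 0 or row.relationship_id is not None


def to_dicts(rows, keep):
    """Build the result list the same way as the comprehension in the diff."""
    return [
        {
            "file_id": row.file_id,
            "partition": row.partition_name,
            "parent_id": row.parent_id,
            "relationship_id": row.relationship_id,
            "depth": row.depth,
            **(row.file_metadata or {}),
        }
        for row in rows
        if keep(row)
    ]


# A standalone file: only the target row, with no relationship_id.
standalone = [Row("solo", "p1", None, None, 0, None)]

# A file that belongs to a relationship, plus its parent.
chain = [
    Row("child", "p1", "parent", "rel-1", 0, {"title": "reply"}),
    Row("parent", "p1", None, "rel-1", 1, None),
]
```

With these sample rows, `to_dicts(standalone, strict_filter)` is empty, whereas `to_dicts(standalone, proposed_filter)` contains the single "solo" entry. Both filters keep the two rows of `chain`.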
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openrag/components/indexer/vectordb/utils.py` around lines 571 - 582,
get_file_ancestors() currently filters out rows with relationship_id == None
which drops the depth-0 target row for standalone files; update the list
comprehension filter to include the target row by allowing rows where
row.relationship_id is not None OR row.depth == 0 (i.e., change the condition to
"if row.relationship_id is not None or row.depth == 0") so standalone files at
depth 0 are preserved in the returned list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openrag/components/test_relationships.py`:
- Around line 587-624: The test
test_no_extra_chunks_for_file_without_relationship_id relies on
PartitionFileManagerHelper.get_file_ancestors which does not implement the new
ancestor filtering added in openrag/components/indexer/vectordb/utils.py; update
the test to exercise the same production logic by calling the actual function
that contains the new filter (or by invoking the production helper in
openrag/components/indexer/vectordb/utils.py that performs ancestor filtering)
instead of the helper stub, or modify
PartitionFileManagerHelper.get_file_ancestors to replicate the exact filter
logic used in the vectordb utils so the test validates the real ancestor
behavior introduced in this PR.
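One possible way to address this comment is to have the test helper and the production code share a single filter predicate, so the test cannot drift from the real behavior. The sketch below is an assumption, not the repository's code: `keep_ancestor_row`, `FakeFileManager`, and the record layout are hypothetical, and the predicate follows the condition proposed in the outside-diff comment.

```python
from types import SimpleNamespace


def keep_ancestor_row(row) -> bool:
    """Hypothetical shared predicate, standing in for the production filter."""
    return row.depth == 0 or row.relationship_id is not None


class FakeFileManager:
    """Hypothetical in-memory stand-in for the test helper."""

    def __init__(self, records):
        # records maps file_id -> {"file_id", "parent_id", "relationship_id"}
        self._records = records

    def get_file_ancestors(self, file_id):
        chain, depth, seen = [], 0, set()
        current = self._records.get(file_id)
        # Walk the parent chain upwards, guarding against cycles.
        while current is not None and current["file_id"] not in seen:
            seen.add(current["file_id"])
            row = SimpleNamespace(depth=depth, **current)
            if keep_ancestor_row(row):  # same predicate as production
                chain.append(vars(row))
            current = self._records.get(current["parent_id"])
            depth += 1
        return chain


def test_standalone_file_returns_only_itself():
    manager = FakeFileManager(
        {"solo": {"file_id": "solo", "parent_id": None, "relationship_id": None}}
    )
    ancestors = manager.get_file_ancestors("solo")
    assert [a["file_id"] for a in ancestors] == ["solo"]


def test_related_file_returns_full_chain():
    manager = FakeFileManager(
        {
            "child": {"file_id": "child", "parent_id": "root", "relationship_id": "r1"},
            "root": {"file_id": "root", "parent_id": None, "relationship_id": "r1"},
        }
    )
    ancestors = manager.get_file_ancestors("child")
    assert [a["file_id"] for a in ancestors] == ["child", "root"]
```

The design choice here is that the helper owns only the traversal, while the decision about which rows to keep lives in one function that both the helper and the production path would import.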


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 717f6ad6-a889-4384-879f-3075c0a57df2

📥 Commits

Reviewing files that changed from the base of the PR and between 5f05d37 and a2920bf.

📒 Files selected for processing (2)
  • openrag/components/indexer/vectordb/utils.py
  • openrag/components/test_relationships.py

@Ahmath-Gadji Ahmath-Gadji force-pushed the fix/get_files_by_relationship branch from a2920bf to 5cfd4f5 Compare March 17, 2026 15:05
