Skip to content

Support parent-scoped asset counts and assets listing for glossary terms#26801

Merged
sonika-shah merged 6 commits intomainfrom
fix-3241
Mar 31, 2026
Merged

Support parent-scoped asset counts and assets listing for glossary terms#26801
sonika-shah merged 6 commits intomainfrom
fix-3241

Conversation

@sonika-shah
Copy link
Copy Markdown
Collaborator

@sonika-shah sonika-shah commented Mar 26, 2026

Describe your changes:

Fixes #3241

  • Added optional parent query parameter for glossary term asset endpoints: GET /assets/counts,
  • Enables efficient retrieval of aggregated asset counts for a specific glossary or glossary term hierarchy without fetching all terms globally

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Copilot AI review requested due to automatic review settings March 26, 2026 12:05
@sonika-shah sonika-shah added safe to test Add this label to run secure Github workflows on PRs backend labels Mar 26, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an optional parent scope filter to glossary-term asset count and asset listing APIs, enabling callers (e.g., UI) to request results limited to a specific glossary or glossary-term subtree.

Changes:

  • Add parent query parameter to /v1/glossaryTerms/assets/counts, /v1/glossaryTerms/{id}/assets, and /v1/glossaryTerms/name/{fqn}/assets.
  • Extend GlossaryTermRepository methods to accept parent and apply parent-scope filtering.
  • Add integration tests covering the new parent query parameter on these endpoints.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/resources/glossary/GlossaryTermResource.java Exposes the new parent query parameter on glossary term asset count/listing endpoints.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryTermRepository.java Implements parent-scope filtering for asset counts and asset listing queries.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/GlossaryTermResourceIT.java Adds integration tests intended to validate parent-filtered asset counts and asset listing behavior.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 26, 2026

OpenMetadata Service New-Code Coverage

FAIL. Required changed-line coverage: 90.00% overall and per touched production file.

  • Overall executable changed lines: 2/28 covered (7.14%)
  • Missed executable changed lines: 26
  • Non-executable changed lines ignored by JaCoCo: 35
  • Changed production files: 4

Files below threshold:

  • openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryTermRepository.java: 0/13 covered (0.00%), uncovered lines 177, 178, 179, 182, 183, 184, 190, 200, 212, 213, 214, 215, +1 more
  • openmetadata-service/src/main/java/org/openmetadata/service/search/InheritedFieldEntitySearch.java: 0/9 covered (0.00%), uncovered lines 249, 250, 251, 252, 253, 254, 255, 256, 257
  • openmetadata-service/src/main/java/org/openmetadata/service/resources/glossary/GlossaryTermResource.java: 0/3 covered (0.00%), uncovered lines 447, 883, 931
  • openmetadata-service/src/main/java/org/openmetadata/service/search/QueryFilterBuilder.java: 2/3 covered (66.67%), uncovered lines 139
File Covered Missed Executable Non-exec Coverage Uncovered lines
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryTermRepository.java 0 13 13 7 0.00% 177, 178, 179, 182, 183, 184, 190, 200, 212, 213, 214, 215, +1 more
openmetadata-service/src/main/java/org/openmetadata/service/search/InheritedFieldEntitySearch.java 0 9 9 4 0.00% 249, 250, 251, 252, 253, 254, 255, 256, 257
openmetadata-service/src/main/java/org/openmetadata/service/resources/glossary/GlossaryTermResource.java 0 3 3 22 0.00% 447, 883, 931
openmetadata-service/src/main/java/org/openmetadata/service/search/QueryFilterBuilder.java 2 1 3 2 66.67% 139

Only changed executable lines under openmetadata-service/src/main/java are counted. Test files, comments, imports, and non-executable lines are excluded.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 26, 2026

🟡 Playwright Results — all passed (27 flaky)

✅ 3434 passed · ❌ 0 failed · 🟡 27 flaky · ⏭️ 223 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 452 0 3 2
🟡 Shard 2 613 0 1 32
🟡 Shard 3 612 0 7 27
🟡 Shard 4 617 0 7 47
🟡 Shard 5 586 0 1 67
🟡 Shard 6 554 0 8 48
🟡 27 flaky test(s) (passed on retry)
  • Features/DataAssetRulesDisabled.spec.ts › Verify the Pipeline entity item action after rules disabled (shard 1, 1 retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/IncidentManager.spec.ts › Resolving incident & re-run pipeline (shard 3, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Features/QueryEntity.spec.ts › Query Entity (shard 3, 1 retry)
  • Flow/CustomizeWidgets.spec.ts › Activity Feed Widget (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should not display soft deleted assets in search suggestions (shard 3, 1 retry)
  • Flow/IngestionBot.spec.ts › Ingestion bot should be able to access domain specific domain (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Spreadsheet (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Add and update Security and SLA tabs (shard 4, 1 retry)
  • Pages/DomainAdvanced.spec.ts › Remove multiple assets from domain at once (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Follow/unfollow subdomain and create nested sub domain (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Multiple consecutive domain renames preserve all associations (shard 4, 1 retry)
  • Pages/DomainUIInteractions.spec.ts › Add owner to data product via UI (shard 4, 1 retry)
  • Pages/EntityDataConsumer.spec.ts › Glossary Term Add, Update and Remove (shard 5, 1 retry)
  • Pages/Glossary.spec.ts › Add and Remove Assets (shard 6, 2 retries)
  • Pages/Glossary.spec.ts › Create glossary with all optional fields (tags, owners, reviewers, domain) (shard 6, 1 retry)
  • Pages/InputOutputPorts.spec.ts › Lineage section collapse/expand (shard 6, 1 retry)
  • Pages/InputOutputPorts.spec.ts › Multiple sections can be collapsed independently (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)
  • VersionPages/EntityVersionPages.spec.ts › Directory (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot AI review requested due to automatic review settings March 31, 2026 06:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings March 31, 2026 07:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Mar 31, 2026

Code Review 👍 Approved with suggestions 3 resolved / 4 findings

Adds parent-scoped asset counts and assets listing for glossary terms with performance optimizations including Elasticsearch query pushdown and reduced deserialization. Consider consolidating the redundant getCertificationClassification() call to avoid duplicate processing per entity read.

💡 Performance: getCertificationClassification() called redundantly per entity read

The getCertificationClassification() method is called separately from getCertification(), getTags(), and getTagsByPrefix() during a single entity read operation. While SettingsCache is memory-cached (3-min TTL), the repeated cache lookups, string operations, and AssetCertificationSettings deserialization are unnecessary overhead that compounds in bulk/list operations. Consider caching the result in a local variable at the operation level or passing it as a parameter.

Suggested fix
// Option: add a thread-local or pass certClassification as parameter
// For example, in getTags():
private List<TagLabel> filterCertTags(List<TagLabel> tags, String certClassification) {
  if (certClassification != null && tags != null) {
    tags = new ArrayList<>(tags);
    tags.removeIf(tag -> certClassification.equals(
        FullyQualifiedName.getParentFQN(tag.getTagFQN())));
  }
  return tags;
}
✅ 3 resolved
Performance: getAllGlossaryTermsWithAssetsCount loads all terms before filtering

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryTermRepository.java:211-218 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryTermRepository.java:170-176
In getAllGlossaryTermsWithAssetsCount, listAll() at line 212 fetches every glossary term in the system into memory, and only then applies the parent filter at line 214-218. When a parent is provided, this loads potentially thousands of terms only to discard most of them. Since each surviving term also triggers an individual search query (line 224-235), the combination of full table scan + N search calls can be expensive.

Consider pushing the parent filter into the listAll query (e.g., via a ListFilter that filters by FQN prefix) so only matching terms are loaded, reducing both memory usage and the number of search queries.

Bug: Parent filter silently ignored when term is not under parent

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryTermRepository.java:178-184
The old code had a guard clause: if parent is provided but the term's FQN is not a child of parent, return an empty result list. This guard was removed in the refactor. Now when parent is provided but FullyQualifiedName.isParent(fqn, parent) is false, useParentScope is false and the code falls through to forGlossaryTerm(fqn, ...), returning the term's assets as if no parent filter was specified at all.

This is a behavioral regression — callers that pass a parent parameter expect it to act as a scope filter, but now an unrelated term's assets are returned instead of an empty list.

Performance: Parent-scoped path deserializes full entity JSON unnecessarily

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryTermRepository.java:204-210
When parent is provided, getNestedTerms(parent) returns the complete JSON blob for every matching glossary term, and JsonUtils.readObjects(...) fully deserializes all fields. However, only fullyQualifiedName is used in the subsequent loop. In contrast, the no-parent path uses listAll(getFields("fullyQualifiedName"), ...) which is more selective.

For large glossaries with many nested terms, this means deserializing potentially large JSON documents (with references, tags, descriptions, etc.) just to extract the FQN. Consider adding a lighter DAO query that selects only the FQN column, or reuse the listAll approach with an appropriate ListFilter that scopes to the parent FQN.

🤖 Prompt for agents
Code Review: Adds parent-scoped asset counts and assets listing for glossary terms with performance optimizations including Elasticsearch query pushdown and reduced deserialization. Consider consolidating the redundant getCertificationClassification() call to avoid duplicate processing per entity read.

1. 💡 Performance: getCertificationClassification() called redundantly per entity read

   The `getCertificationClassification()` method is called separately from `getCertification()`, `getTags()`, and `getTagsByPrefix()` during a single entity read operation. While `SettingsCache` is memory-cached (3-min TTL), the repeated cache lookups, string operations, and `AssetCertificationSettings` deserialization are unnecessary overhead that compounds in bulk/list operations. Consider caching the result in a local variable at the operation level or passing it as a parameter.

   Suggested fix:
   // Option: add a thread-local or pass certClassification as parameter
   // For example, in getTags():
   private List<TagLabel> filterCertTags(List<TagLabel> tags, String certClassification) {
     if (certClassification != null && tags != null) {
       tags = new ArrayList<>(tags);
       tags.removeIf(tag -> certClassification.equals(
           FullyQualifiedName.getParentFQN(tag.getTagFQN())));
     }
     return tags;
   }

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonika-shah
Copy link
Copy Markdown
Collaborator Author

@gitar-bot can you review the backend changes again #26801 (comment)

@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Mar 31, 2026

@gitar-bot can you review the backend changes again #26801 (comment)

Here's my review of the latest backend changes:

GlossaryTermRepository.javagetAllGlossaryTermsWithAssetsCount(String parent)

The refactor correctly addresses the previous concerns:

  • When parent is provided, getNestedTerms(parent) is called and only the fullyQualifiedName field is extracted via JsonUtils.readTree(json).path("fullyQualifiedName").asText(). This avoids full GlossaryTerm deserialization — good.
  • When parent is null/empty, listAll(getFields("fullyQualifiedName"), ...) is used — consistent with the previous unfiltered path.
  • The loop now operates on String fqn instead of GlossaryTerm objects — clean and correct.

One remaining concern:

getNestedTerms may return empty strings silently. If JsonUtils.readTree(json).path("fullyQualifiedName") is missing or null in a record, asText() returns "" (empty string). The resulting empty key in glossaryTermAssetCounts could produce a spurious Elasticsearch query. Consider filtering with .filter(fqn -> !fqn.isEmpty()) after the .map(...) call.


GlossaryTermRepository.javagetGlossaryTermAssets

The change here is a minor cosmetic refactor (extract fqn variable to reduce repetition in the log call). No behavioral change. Looks fine.

Note: the earlier finding about the missing parent-filter guard in getGlossaryTermAssets (where an out-of-scope term would return assets instead of an empty list) was addressed in a prior iteration via Elasticsearch-side filtering, so that concern is resolved.


GlossaryTermResource.java

Clean — @QueryParam("parent") is correctly wired through with a descriptive @Parameter annotation. No issues.


DefaultInheritedFieldEntitySearchTest.java

The added assertion assertFalse(glossary.isSupportsHierarchy()) is valid and correctly documents the expected behavior. No issues.


GlossaryTermResourceIT.java

Test coverage is solid overall. A few observations:

  1. get_assetsCountsWithParentFilter verifies that filtering by glossary FQN includes only that glossary's terms — good. However, it only asserts that keys exist in the map, not their values. This is acceptable for an integration smoke test, but be aware it won't catch count miscalculations.

  2. get_assetsCountsWithNoParent only asserts assertNotNull — effectively just a "doesn't crash" test. Low value but not harmful.

  3. getAssetCounts, getTermAssetsById, and getTermAssetsByName helpers directly call raw HTTP endpoints. This is fine for integration tests where the typed SDK doesn't expose these endpoints yet.


Summary: The changes are in good shape. The only actionable item is adding a .filter(fqn -> !fqn.isEmpty()) guard after the .map(...) in the parent-scoped branch of getAllGlossaryTermsWithAssetsCount to handle potential missing FQN fields in the raw JSON.

@sonarqubecloud
Copy link
Copy Markdown

@sonika-shah sonika-shah enabled auto-merge (squash) March 31, 2026 11:48
@sonika-shah sonika-shah merged commit dde03d1 into main Mar 31, 2026
51 checks passed
@sonika-shah sonika-shah deleted the fix-3241 branch March 31, 2026 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants