Skip to content

[#9736][followup] feat(iceberg-rest): Skip full table load on ETag match via SupportsMetadataLocation#10536

Open
laserninja wants to merge 1 commit intoapache:mainfrom
laserninja:freshness-aware-loading-iceberg-rest-followup
Open

[#9736][followup] feat(iceberg-rest): Skip full table load on ETag match via SupportsMetadataLocation#10536
laserninja wants to merge 1 commit intoapache:mainfrom
laserninja:freshness-aware-loading-iceberg-rest-followup

Conversation

@laserninja
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Optimize the ETag-based freshness check in loadTable by leveraging SupportsMetadataLocation to resolve the metadata file location cheaply without loading full table metadata. When the client sends If-None-Match and the catalog supports it, the server can return 304 Not Modified without the cost of a full loadTable call.

Also fixes ETag consistency between createTable/updateTable and loadTable: ETags from create and update now include the default snapshots value ("all"), matching the default loadTable ETag. This allows clients to reuse ETags across endpoints as specified by the Iceberg REST spec.

Why are the changes needed?

  1. Performance: The original implementation always performs a full loadTable before comparing ETags. For read-heavy workloads where clients already have fresh metadata (ETag matches), this full load is wasted. By using SupportsMetadataLocation (already implemented for JDBC and Hive catalogs), we can compare ETags via a lightweight metadata location query and skip the full load entirely.

  2. ETag consistency (as reported by @FANNG1 in [#9736] feat(iceberg-rest): Support freshness-aware table loading with ETag #10498): createTable returned an ETag derived from metadataLocation only, while loadTable derived its ETag from metadataLocation + snapshots. For the default snapshots=all path, these values differed, so a client that reuses the ETag from create would never get 304 Not Modified on a subsequent unchanged loadTable.

Follow-up to: #10498

Does this PR introduce any user-facing change?

No user-facing API changes. The ETag values may differ from the previous implementation due to the consistency fix, but this is transparent to clients — they will simply get fresh ETags on the next request.

How was this patch tested?

  • Added testCreateTableETagMatchesLoadTableETag: Verifies that the ETag from createTable matches the default loadTable ETag, and that it produces 304 on a subsequent conditional load.
  • Added testUpdateTableETagMatchesLoadTableETag: Same verification for updateTable.
  • All existing ETag tests continue to pass (8 tests covering ETag presence, 304 matching, ETag changes after update, consistency, and snapshot-dependent ETags).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes Iceberg REST loadTable ETag freshness checks by introducing a lightweight metadata-location lookup path (via SupportsMetadataLocation) and aligns ETag generation across createTable/updateTable with the default loadTable snapshots behavior to make ETags reusable across endpoints.

Changes:

  • Add a fast-path in loadTable to compute/compare ETags using only the metadata file location (skipping full table metadata load when possible).
  • Make createTable/updateTable ETags consistent with default loadTable ETag by incorporating the default snapshots=all.
  • Add tests asserting ETag consistency and reusability across create/update and load.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
iceberg/iceberg-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestIcebergTableOperations.java Adds tests ensuring create/update ETags match default loadTable ETag and yield 304 with If-None-Match.
iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/rest/IcebergTableOperations.java Adds fast-path ETag check using metadata location; centralizes default snapshots constant; adjusts create/update ETag defaulting.
iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/dispatcher/IcebergTableOperationExecutor.java Implements getTableMetadataLocation by delegating to the catalog wrapper.
iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/dispatcher/IcebergTableOperationDispatcher.java Extends dispatcher API with optional getTableMetadataLocation(...) (nullable).
iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/dispatcher/IcebergTableHookDispatcher.java Pass-through implementation of getTableMetadataLocation(...).
iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/dispatcher/IcebergTableEventDispatcher.java Pass-through implementation of getTableMetadataLocation(...).
iceberg/iceberg-common/src/main/java/org/apache/gravitino/iceberg/common/ops/IcebergCatalogWrapper.java Adds wrapper-level getTableMetadataLocation(...) backed by SupportsMetadataLocation.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 25, 2026

Code Coverage Report

Overall Project 64.94% -0.07% 🟢
Files changed 59.5% 🔴

Module Coverage
aliyun 1.73% 🔴
api 47.14% 🟢
authorization-common 85.96% 🟢
aws 1.1% 🔴
azure 2.6% 🔴
catalog-common 10.0% 🔴
catalog-fileset 80.02% 🟢
catalog-hive 80.98% 🟢
catalog-jdbc-clickhouse 79.06% 🟢
catalog-jdbc-common 42.89% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 45.07% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 87.25% 🟢
catalog-lakehouse-paimon 77.71% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.83% 🟢
common 49.42% 🟢
core 81.0% -0.39% 🟢
filesystem-hadoop3 76.97% 🟢
flink 38.86% 🔴
flink-runtime 0.0% 🔴
gcp 14.2% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 45.82% 🟢
iceberg-common 50.0% -10.08% 🟢
iceberg-rest-server 66.67% +3.62% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 23.88% 🔴
lance-rest-server 57.84% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.62% 🟢
server-common 70.14% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 31.62% 🔴
Files
Module File Coverage
core OperationType.java 100.0% 🟢
GravitinoEnv.java 10.98% 🔴
iceberg-common IcebergCatalogWrapper.java 0.0% 🔴
iceberg-rest-server IcebergTableOperationExecutor.java 95.59% 🟢
IcebergTableEventDispatcher.java 95.12% 🟢
IcebergTableOperations.java 80.58% 🟢
IcebergTableHookDispatcher.java 71.88% 🟢
IcebergTableOperationDispatcher.java 0.0% 🔴

@laserninja laserninja force-pushed the freshness-aware-loading-iceberg-rest-followup branch from ca9583a to d1c676f Compare March 25, 2026 03:35
@IcebergAuthorizationMetadata(type = RequestType.LOAD_TABLE) @Encoded() @PathParam("table")
String table,
@DefaultValue("all") @QueryParam("snapshots") String snapshots,
@DefaultValue(DEFAULT_SNAPSHOTS) @QueryParam("snapshots") String snapshots,
Copy link
Copy Markdown
Contributor

@roryqi roryqi Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point:
We can do it in the next pull request.
If the snapshots is all, we should return all the snapshots.
If the snapshots is refs, we should return all the snapshots is used in the references.
We should get the refs by loadTableResponse.tableMetadata().refs() and only return the snapshots which existed in the refs().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, will keep it for next one.

@FANNG1
Copy link
Copy Markdown
Contributor

FANNG1 commented Mar 25, 2026

Maybe another option is to push the ETag / conditional-load logic down into the loadTable path itself, instead of exposing a separate getTableMetadataLocation(...) interface.

For example, we could extend loadTable to support a conditional request and return either 304 or LoadTableResponse. That way:

  • we do not need to expose a storage/detail-oriented interface just for HTTP caching
  • the fast path can still reuse the existing loadTable event/hook chain
  • the ETag logic stays encapsulated in one place

This feels a bit cleaner to me than adding a separate metadata-location lookup API.

@laserninja laserninja force-pushed the freshness-aware-loading-iceberg-rest-followup branch from d1c676f to 74c38c6 Compare March 25, 2026 15:49
@laserninja
Copy link
Copy Markdown
Contributor Author

@FANNG1 Thanks for the suggestion! I considered this approach but went with a separate getTableMetadataLocation for a few reasons:

  1. Separation of concerns: ETag/conditional-load is an HTTP caching concern. Keeping it in the REST layer (IcebergTableOperations) avoids leaking HTTP semantics into the catalog abstraction.
  2. Minimal contract change: loadTable has a well-defined return type (LoadTableResponse). Adding conditional-load support would require changing the return type to a wrapper/union type, which touches many more callers.
  3. SupportsMetadataLocation already exists: We're exposing an existing catalog capability, not inventing a new abstraction. The interface is already implemented by JDBC/Hive/Memory catalogs.
  4. The fast path is optional: getTableMetadataLocation returns Optional.empty() for unsupported catalogs, and the full loadTable fallback always works.
    That said, if you feel strongly about encapsulating it within loadTable, I'm happy to refactor. Would appreciate your thoughts on how you'd see the return type working (e.g., Optional<LoadTableResponse>, or a new ConditionalLoadResult wrapper)?

@roryqi
Copy link
Copy Markdown
Contributor

roryqi commented Mar 26, 2026

LGTM, I am ok to add the new method getMetadataLocation. Let us wait for @FANNG1 suggestion.

new IcebergRequestContext(httpServletRequest(), catalogName, isCredentialVending);

// Fast path: if client sent If-None-Match, try to resolve ETag without full table load
if (ifNoneMatch != null && !ifNoneMatch.isEmpty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could u use StringUtils.isNotBlank instead?

@FANNG1
Copy link
Copy Markdown
Contributor

FANNG1 commented Mar 26, 2026

Makes sense. I agree that HTTP-layer behavior should not be pushed all the way down to the lower layer.

My main concern with the current approach is not getTableMetadataLocation(...) itself, but that the fast path returns before going through the normal loadTable chain, which means it may bypass existing event/audit hooks.

If we want to encapsulate this better, I think we probably need a better loadTable result model first, for example a ConditionalLoadResult that can represent either not modified or loaded. That would let us keep the conditional-load logic on the loadTable path while still preserving the existing event/hook semantics.

@jerryshao @roryqi WDYT?

@roryqi
Copy link
Copy Markdown
Contributor

roryqi commented Mar 26, 2026

Makes sense. I agree that HTTP-layer behavior should not be pushed all the way down to the lower layer.

My main concern with the current approach is not getTableMetadataLocation(...) itself, but that the fast path returns before going through the normal loadTable chain, which means it may bypass existing event/audit hooks.

If we want to encapsulate this better, I think we probably need a better loadTable result model first, for example a ConditionalLoadResult that can represent either not modified or loaded. That would let us keep the conditional-load logic on the loadTable path while still preserving the existing event/hook semantics.

@jerryshao @roryqi WDYT?

It's ok that we don't call the event or hook, because we don't call the method loadTable.
If requests failed because of request parameters, we won't trigger the event and hook.
I feel that Etag is also a kind of check mechism to avoid calling underlying api.

…Tag match via SupportsMetadataLocation

Optimize the ETag-based freshness check in loadTable by leveraging
SupportsMetadataLocation to resolve the metadata file location cheaply
without loading full table metadata. When the client sends If-None-Match
and the catalog supports it, the server can return 304 Not Modified
without the cost of a full loadTable call.

Also fixes ETag consistency between create/update and loadTable endpoints:
ETags from createTable and updateTable now include the default snapshots
value ("all"), matching the default loadTable ETag. This allows clients
to reuse ETags across endpoints as specified by the Iceberg REST spec.

Changes:
- Add getTableMetadataLocation() to IcebergCatalogWrapper, dispatcher
  interface, and all dispatcher implementations
- Use fast path in loadTable when If-None-Match is present
- Fix ETag consistency: create/update use DEFAULT_SNAPSHOTS in hash
- Add tests verifying create/update ETags match default loadTable ETags
@laserninja laserninja force-pushed the freshness-aware-loading-iceberg-rest-followup branch from 74c38c6 to d35bc0f Compare March 26, 2026 19:32
@laserninja
Copy link
Copy Markdown
Contributor Author

Done. Will wait for @jerryshao's input

@roryqi roryqi requested a review from jerryshao March 28, 2026 03:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants