[#9736][followup] feat(iceberg-rest): Skip full table load on ETag match via SupportsMetadataLocation#10536
Pull request overview
This PR optimizes Iceberg REST loadTable ETag freshness checks by introducing a lightweight metadata-location lookup path (via SupportsMetadataLocation) and aligns ETag generation across createTable/updateTable with the default loadTable snapshots behavior to make ETags reusable across endpoints.
Changes:
- Add a fast-path in loadTable to compute/compare ETags using only the metadata file location (skipping full table metadata load when possible).
- Make createTable/updateTable ETags consistent with the default loadTable ETag by incorporating the default snapshots=all.
- Add tests asserting ETag consistency and reusability across create/update and load.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| iceberg/iceberg-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestIcebergTableOperations.java | Adds tests ensuring create/update ETags match default loadTable ETag and yield 304 with If-None-Match. |
| iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/rest/IcebergTableOperations.java | Adds fast-path ETag check using metadata location; centralizes default snapshots constant; adjusts create/update ETag defaulting. |
| iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/dispatcher/IcebergTableOperationExecutor.java | Implements getTableMetadataLocation by delegating to the catalog wrapper. |
| iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/dispatcher/IcebergTableOperationDispatcher.java | Extends dispatcher API with optional getTableMetadataLocation(...) (nullable). |
| iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/dispatcher/IcebergTableHookDispatcher.java | Pass-through implementation of getTableMetadataLocation(...). |
| iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/dispatcher/IcebergTableEventDispatcher.java | Pass-through implementation of getTableMetadataLocation(...). |
| iceberg/iceberg-common/src/main/java/org/apache/gravitino/iceberg/common/ops/IcebergCatalogWrapper.java | Adds wrapper-level getTableMetadataLocation(...) backed by SupportsMetadataLocation. |
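
The layering summarized in this table might look roughly like the sketch below. All type names here (`LocationDispatcher`, `ExecutorDispatcher`, `PassThroughDispatcher`) are hypothetical simplifications rather than the actual Gravitino classes, and a plain map stands in for the catalog wrapper; only the idea of a nullable `getTableMetadataLocation` that decorators pass through comes from the summary.

```java
import java.util.Map;

/** Hypothetical, simplified dispatcher API: null means "cannot be resolved cheaply". */
interface LocationDispatcher {
  String getTableMetadataLocation(String namespace, String table);
}

/** Hypothetical executor: delegates to a catalog-like lookup (a map stands in for the wrapper). */
final class ExecutorDispatcher implements LocationDispatcher {
  private final Map<String, String> locations;

  ExecutorDispatcher(Map<String, String> locations) {
    this.locations = locations;
  }

  @Override
  public String getTableMetadataLocation(String namespace, String table) {
    // Returns null when the table is unknown or the lookup is unsupported.
    return locations.get(namespace + "." + table);
  }
}

/** Hypothetical decorator (in the spirit of the hook/event dispatchers): pure pass-through. */
final class PassThroughDispatcher implements LocationDispatcher {
  private final LocationDispatcher delegate;

  PassThroughDispatcher(LocationDispatcher delegate) {
    this.delegate = delegate;
  }

  @Override
  public String getTableMetadataLocation(String namespace, String table) {
    // No hooks or events are triggered for this lightweight lookup.
    return delegate.getTableMetadataLocation(namespace, table);
  }
}
```

Because the method may return null, callers are expected to fall back to the regular full load when no location is available.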
  @IcebergAuthorizationMetadata(type = RequestType.LOAD_TABLE) @Encoded() @PathParam("table")
      String table,
- @DefaultValue("all") @QueryParam("snapshots") String snapshots,
+ @DefaultValue(DEFAULT_SNAPSHOTS) @QueryParam("snapshots") String snapshots,
Another point, which we can handle in the next pull request:
If snapshots is all, we should return all the snapshots.
If snapshots is refs, we should return only the snapshots that are used in the references.
We should get the refs via loadTableResponse.tableMetadata().refs() and return only the snapshots that exist in refs().
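
A minimal sketch of the filtering suggested here, under simplifying assumptions: snapshots are represented only by their ids, and the refs map goes from ref name to snapshot id instead of using Iceberg's own types. The class and method names are hypothetical and not part of this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class SnapshotFilterSketch {
  /**
   * Hypothetical helper: picks which snapshot ids to return for a given "snapshots" value.
   * allIds stands in for the table's snapshots, refToSnapshotId for the table's refs.
   */
  static List<Long> selectSnapshotIds(
      String snapshotsParam, List<Long> allIds, Map<String, Long> refToSnapshotId) {
    if (!"refs".equalsIgnoreCase(snapshotsParam)) {
      // "all" (the default): return every snapshot.
      return allIds;
    }
    // "refs": keep only snapshots that some branch or tag points at.
    Set<Long> referenced = Set.copyOf(refToSnapshotId.values());
    return allIds.stream().filter(referenced::contains).collect(Collectors.toList());
  }
}
```

With snapshot ids 1, 2, 3 and refs pointing at 1 and 3, a `refs` request would keep only 1 and 3, while `all` would return all three.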
sounds good, will keep it for next one.
|
Maybe another option is to push the ETag / conditional-load logic down into the For example, we could extend
This feels a bit cleaner to me than adding a separate metadata-location lookup API. |
|
@FANNG1 Thanks for the suggestion! I considered this approach but went with a separate
|
|
LGTM, I am ok to add the new method |
new IcebergRequestContext(httpServletRequest(), catalogName, isCredentialVending);

// Fast path: if client sent If-None-Match, try to resolve ETag without full table load
if (ifNoneMatch != null && !ifNoneMatch.isEmpty()) {
Could you use StringUtils.isNotBlank instead?
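
To illustrate the difference behind this suggestion: the sketch below compares the null-or-empty check from the diff with a local helper that mirrors the semantics of `StringUtils.isNotBlank` from Apache Commons Lang. The helper is written out by hand only to keep the example dependency-free; the class name is hypothetical.

```java
final class BlankCheckSketch {
  /** The check used in the diff: passes for whitespace-only values. */
  static boolean notNullOrEmpty(String s) {
    return s != null && !s.isEmpty();
  }

  /** Local stand-in for isNotBlank: true only if at least one non-whitespace char exists. */
  static boolean isNotBlank(String s) {
    return s != null && s.chars().anyMatch(c -> !Character.isWhitespace(c));
  }

  public static void main(String[] args) {
    String header = "   "; // a whitespace-only If-None-Match value
    System.out.println(notNullOrEmpty(header)); // true: fast path would be attempted
    System.out.println(isNotBlank(header)); // false: fast path would be skipped
  }
}
```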
|
Makes sense. I agree that HTTP-layer behavior should not be pushed all the way down to the lower layer. My main concern with the current approach is not If we want to encapsulate this better, I think we probably need a better @jerryshao @roryqi WDYT? |
It's ok that we don't call the event or hook, because we don't call the method |
…Tag match via SupportsMetadataLocation
Optimize the ETag-based freshness check in loadTable by leveraging
SupportsMetadataLocation to resolve the metadata file location cheaply
without loading full table metadata. When the client sends If-None-Match
and the catalog supports it, the server can return 304 Not Modified
without the cost of a full loadTable call.
Also fixes ETag consistency between create/update and loadTable endpoints:
ETags from createTable and updateTable now include the default snapshots
value ("all"), matching the default loadTable ETag. This allows clients
to reuse ETags across endpoints as specified by the Iceberg REST spec.
Changes:
- Add getTableMetadataLocation() to IcebergCatalogWrapper, dispatcher
interface, and all dispatcher implementations
- Use fast path in loadTable when If-None-Match is present
- Fix ETag consistency: create/update use DEFAULT_SNAPSHOTS in hash
- Add tests verifying create/update ETags match default loadTable ETags
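
The fast path described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the class and method names are hypothetical, the ETag is assumed to be a quoted SHA-256 hex digest of the metadata location plus the snapshots value, a nullable string stands in for the cheap metadata-location lookup, and a `Supplier` stands in for the expensive full table load.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Supplier;

final class FastPathSketch {
  /** Assumed ETag shape: quoted SHA-256 hex of "location|snapshots". */
  static String etag(String metadataLocation, String snapshots) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash =
          digest.digest((metadataLocation + "|" + snapshots).getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return "\"" + hex + "\"";
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
    }
  }

  /**
   * Returns the HTTP status a conditional loadTable would produce.
   * cheapLocation is null when the catalog cannot resolve the location cheaply.
   */
  static int loadTable(
      String ifNoneMatch, String snapshots, String cheapLocation, Supplier<String> fullLoad) {
    boolean conditional = ifNoneMatch != null && !ifNoneMatch.isEmpty();
    // Fast path: compare ETags using only the cheaply resolved metadata location.
    if (conditional && cheapLocation != null && ifNoneMatch.equals(etag(cheapLocation, snapshots))) {
      return 304;
    }
    // Slow path: full load, then the same ETag comparison as before.
    String loadedLocation = fullLoad.get();
    if (conditional && ifNoneMatch.equals(etag(loadedLocation, snapshots))) {
      return 304;
    }
    return 200;
  }
}
```

In this sketch the full load is skipped only when the cheap lookup succeeds and the ETags match; every other case falls through to the original behavior.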
|
Done. Will wait for @jerryshao's input |
What changes were proposed in this pull request?
Optimize the ETag-based freshness check in loadTable by leveraging SupportsMetadataLocation to resolve the metadata file location cheaply without loading full table metadata. When the client sends If-None-Match and the catalog supports it, the server can return 304 Not Modified without the cost of a full loadTable call.
Also fixes ETag consistency between createTable/updateTable and loadTable: ETags from create and update now include the default snapshots value ("all"), matching the default loadTable ETag. This allows clients to reuse ETags across endpoints as specified by the Iceberg REST spec.
Why are the changes needed?
Performance: The original implementation always performs a full loadTable before comparing ETags. For read-heavy workloads where clients already have fresh metadata (ETag matches), this full load is wasted. By using SupportsMetadataLocation (already implemented for JDBC and Hive catalogs), we can compare ETags via a lightweight metadata location query and skip the full load entirely.
ETag consistency (as reported by @FANNG1 in [#9736] feat(iceberg-rest): Support freshness-aware table loading with ETag #10498): createTable returned an ETag derived from metadataLocation only, while loadTable derived its ETag from metadataLocation + snapshots. For the default snapshots=all path, these values differed, so a client that reuses the ETag from create would never get 304 Not Modified on a subsequent unchanged loadTable.
Follow-up to: #10498
Does this PR introduce any user-facing change?
No user-facing API changes. The ETag values may differ from the previous implementation due to the consistency fix, but this is transparent to clients: they will simply get fresh ETags on the next request.
How was this patch tested?
- testCreateTableETagMatchesLoadTableETag: Verifies that the ETag from createTable matches the default loadTable ETag, and that it produces 304 on a subsequent conditional load.
- testUpdateTableETagMatchesLoadTableETag: Same verification for updateTable.
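
The consistency issue described above can be illustrated with a small sketch. To keep it readable, the real hash is replaced by plain string concatenation, which is purely an assumption for illustration; the class and method names are hypothetical as well.

```java
final class EtagConsistencySketch {
  static final String DEFAULT_SNAPSHOTS = "all";

  // Stand-in for the real hash: concatenation of both inputs, quoted like an ETag.
  static String etag(String metadataLocation, String snapshots) {
    return "\"" + metadataLocation + "#" + snapshots + "\"";
  }

  // Before the fix (per the description): create derived the ETag from the location only.
  static String createEtagBefore(String metadataLocation) {
    return "\"" + metadataLocation + "\"";
  }

  // After the fix: create/update include the default snapshots value.
  static String createEtagAfter(String metadataLocation) {
    return etag(metadataLocation, DEFAULT_SNAPSHOTS);
  }

  // loadTable always included the snapshots value, defaulting to "all".
  static String loadEtag(String metadataLocation, String snapshotsParam) {
    String snapshots = snapshotsParam == null ? DEFAULT_SNAPSHOTS : snapshotsParam;
    return etag(metadataLocation, snapshots);
  }
}
```

Under these assumptions, an ETag returned by create after the fix equals the ETag of a default loadTable on the unchanged table, so a conditional load can answer 304; the pre-fix value can never match.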