Migrate Iceberg catalog #1692
@@ -0,0 +1,126 @@
= Migrate Iceberg Catalogs
:description: Switch the Iceberg catalog backend for an existing Redpanda cluster without losing untranslated topic data.

// tag::single-source[]
:page-topic-type: how-to
:page-categories: Iceberg, Migration
:personas: ops_admin, streaming_developer
:learning-objective-1: Verify that a target Iceberg catalog supports your existing schemas and partition specs
:learning-objective-2: Pause Iceberg translation and drain pending commits without losing untranslated data
:learning-objective-3: Apply new catalog configuration and resume translation safely

Use this procedure when you move from the filesystem-based `object_storage` catalog to a managed REST catalog, or when you change between REST catalogs. It switches your cluster from one Iceberg catalog backend to another without losing untranslated topic data.
Contributor: This first line seems abrupt, like it came out of some other context. The second sentence seems like a better opening sentence. Use the procedures in this topic when moving from the filesystem-based

The procedure temporarily sets retention on each Iceberg topic to infinite so that untranslated data is not deleted, pauses Iceberg translation per topic, lets pending commits drain to the old catalog, applies the new catalog configuration, and restarts the cluster. You then resume translation and restore the original retention settings.

After reading this page, you will be able to:

* [ ] {learning-objective-1}
* [ ] {learning-objective-2}
* [ ] {learning-objective-3}

[IMPORTANT]
====
Do not change config_ref:iceberg_catalog_type,true,properties/cluster-properties[`iceberg_catalog_type`] or any other catalog cluster property in place without following this procedure. In-flight commits and untranslated data can be lost or stuck if the catalog changes mid-translation.
====

== Prerequisites

* Iceberg topics enabled and running on your Redpanda cluster.
* Network connectivity from all brokers to the new catalog endpoint.
* Credentials configured for the new catalog (REST endpoint, authentication mode, secret or token). For configuration guidance for each catalog type, see xref:manage:iceberg/use-iceberg-catalogs.adoc[].
* The new catalog must support the current schema and partition spec of every Iceberg topic. See <<verify-catalog-compatibility,Verify catalog compatibility>>.

== Verify catalog compatibility

Before starting the migration, verify that the new catalog can host every Iceberg topic's table with its existing schema and partition spec. If the new catalog rejects a table's schema or partition spec after the migration, already-translated Parquet files fail to commit and translation stalls in a state that is difficult to recover from.

The simplest validation is to manually create a test table in the new catalog with the same schema and partition spec as one of your Iceberg topics, and to repeat the check for each distinct schema and partition spec you use. If the create call fails, fix the partition spec or schema before migrating. Delete the test table after validation.
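
For a REST catalog, you can run this check against the Iceberg REST catalog API, which creates a table with `POST .../namespaces/{namespace}/tables` and drops it with `DELETE .../namespaces/{namespace}/tables/{table}`. The following minimal sketch is not a supported tool. It assumes bearer-token authentication with the token in `TOKEN` (for `oauth2` mode, a token you have already obtained from the OAuth server), a single-level namespace, and a JSON file you write yourself containing an Iceberg REST `CreateTableRequest` with the table name, schema, and partition spec copied from one of your Iceberg topics. The function names and the `CURL` override are hypothetical, and `--fail-with-body` requires curl 7.76 or later.

[,bash]
----
#!/usr/bin/env bash
# Hypothetical helper: create, then drop, a throwaway table through the Iceberg
# REST catalog API to check that the new catalog accepts a schema and partition spec.
# Assumes bearer-token auth (TOKEN) and a single-level namespace.
set -euo pipefail

# Build a table URL. $1 = base URL including /v1 and any prefix,
# $2 = namespace, $3 = table name (omit for the create-table endpoint).
rest_table_url() {
  local base="${1%/}" ns="$2" table="${3:-}"
  if [ -n "$table" ]; then
    printf '%s/namespaces/%s/tables/%s\n' "$base" "$ns" "$table"
  else
    printf '%s/namespaces/%s/tables\n' "$base" "$ns"
  fi
}

# $1 = base URL, $2 = namespace, $3 = table name used in the request body,
# $4 = JSON file containing the CreateTableRequest.
# Returns non-zero if the create call fails; drops the test table only after a successful create.
check_table_compat() {
  local curl_cmd="${CURL:-curl}"
  "$curl_cmd" --fail-with-body -sS -X POST "$(rest_table_url "$1" "$2")" \
    -H "Authorization: Bearer ${TOKEN:?set TOKEN to a catalog access token}" \
    -H 'Content-Type: application/json' \
    --data "@$4" || return 1
  "$curl_cmd" --fail-with-body -sS -X DELETE "$(rest_table_url "$1" "$2" "$3")" \
    -H "Authorization: Bearer ${TOKEN}"
}
----

For example, `TOKEN=<token> check_table_compat <catalog-url>/v1 <namespace> <test-table> create-table.json`. If your catalog uses a URL prefix, include it in the base URL after `/v1`.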

[CAUTION]
====
Redpanda's default partition spec for Iceberg topics partitions on a nested field, which AWS Glue does not support. If you migrate to AWS Glue, you must change the partition spec to a Glue-compatible form before starting the migration procedure.
====

== Run the migration

. Save the current `retention.ms` and `retention.bytes` values for every Iceberg topic, then set both to `-1` (infinite retention):
+
[,bash]
----
rpk topic alter-config <topic-name> --set retention.ms=-1 --set retention.bytes=-1
----
+
Pausing Iceberg translation releases the topic's retention anchor on the log. Without infinite retention, the cluster could delete untranslated data before the migration completes.
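+
To script this step across many topics, the following minimal sketch records each topic's current `retention.ms` and `retention.bytes`, together with their configuration source, before setting both to `-1`. It assumes that `rpk topic describe <topic-name> -c` prints one `KEY VALUE SOURCE` row per property, and that you pass the Iceberg topic names as arguments. The function names, the `STATE_FILE` path, and the `RPK` override (a hook for substituting a stub) are hypothetical.
+
[,bash]
----
#!/usr/bin/env bash
# Hypothetical helper: record retention settings, then set infinite retention.
# Assumes `rpk topic describe <topic> -c` prints KEY VALUE SOURCE rows.
set -euo pipefail

STATE_FILE="${STATE_FILE:-iceberg-migration-state.txt}"

# Print "<value> <source>" for config key $2 of topic $1 (first matching row).
describe_config() {
  "${RPK:-rpk}" topic describe "$1" -c |
    awk -v k="$2" '$1 == k && !seen { print $2, $3; seen = 1 }'
}

# For each topic argument: append "<topic> <key> <value> <source>" lines to
# STATE_FILE, then set retention.ms and retention.bytes to -1.
# Stops without changing a topic if its current values cannot be read.
save_and_set_infinite_retention() {
  local topic key saved
  for topic in "$@"; do
    for key in retention.ms retention.bytes; do
      saved="$(describe_config "$topic" "$key")" || return 1
      [ -n "$saved" ] || { echo "no $key row for $topic; stopping" >&2; return 1; }
      printf '%s %s %s\n' "$topic" "$key" "$saved" >> "$STATE_FILE"
    done
    "${RPK:-rpk}" topic alter-config "$topic" --set retention.ms=-1 --set retention.bytes=-1
  done
}
----
+
For example, `save_and_set_infinite_retention <topic-1> <topic-2>` appends one `<topic> <key> <value> <source>` line per property to the state file. Run it only once per topic: a second run would record the `-1` values instead of the originals.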

. Pause Iceberg translation on every Iceberg topic by setting `redpanda.iceberg.mode` to `disabled`. Save each topic's previous mode value so you can restore it later.
+
[,bash]
----
rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=disabled
----
+
Setting the mode to `disabled` stops new translation while letting already-translated data finish committing to the old catalog. For more about Iceberg modes, see xref:manage:iceberg/specify-iceberg-schema.adoc[].
+
NOTE: Do not change config_ref:iceberg_enabled,true,properties/cluster-properties[`iceberg_enabled`]. The Iceberg integration must remain enabled at the cluster level so that pending commits can drain to the old catalog.
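+
The following minimal sketch scripts this step: it records each topic's current `redpanda.iceberg.mode` and then sets the mode to `disabled`. It assumes that `rpk topic describe <topic-name> -c` prints one `KEY VALUE SOURCE` row per property and that you pass the Iceberg topic names as arguments. The function name, the `STATE_FILE` path, and the `RPK` override are hypothetical.
+
[,bash]
----
#!/usr/bin/env bash
# Hypothetical helper: record each topic's Iceberg mode, then pause translation.
# Assumes `rpk topic describe <topic> -c` prints KEY VALUE SOURCE rows.
set -euo pipefail

STATE_FILE="${STATE_FILE:-iceberg-migration-state.txt}"

# For each topic argument: append "<topic> redpanda.iceberg.mode <mode>" to
# STATE_FILE, then set redpanda.iceberg.mode=disabled.
save_and_disable_iceberg() {
  local topic mode
  for topic in "$@"; do
    mode="$("${RPK:-rpk}" topic describe "$topic" -c |
      awk '$1 == "redpanda.iceberg.mode" && !seen { print $2; seen = 1 }')" || return 1
    if [ -z "$mode" ]; then
      echo "no redpanda.iceberg.mode row for $topic; leaving it unchanged" >&2
      continue
    fi
    printf '%s redpanda.iceberg.mode %s\n' "$topic" "$mode" >> "$STATE_FILE"
    "${RPK:-rpk}" topic alter-config "$topic" --set redpanda.iceberg.mode=disabled
  done
}
----
+
If the mode row is missing from the `describe` output, the sketch reports the topic and leaves it unchanged rather than pausing translation without a record of the previous mode.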

. Wait for pending commits to drain. Monitor the `redpanda_iceberg_pending_commit_lag` metric until it reaches `0` for every Iceberg topic-partition.
+
This metric reports the number of offsets pending a commit to the Iceberg catalog. While it is non-zero, Redpanda is still flushing translated data to the old catalog. The translation lag metric, `redpanda_iceberg_pending_translation_lag`, can remain non-zero during this step: it counts new records that the cluster has not yet translated, and those records accumulate while translation is paused.
Contributor: "Which is expected" has an ambiguous antecedent: it could refer to the metric being non-zero, the untranslated records, or the whole situation
+
[TIP]
====
If you scrape Prometheus, the following expression returns `0` only when every Iceberg-topic partition has fully drained:

[,promql]
----
sum(redpanda_iceberg_pending_commit_lag)
----
====
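+
If you do not scrape Prometheus, you can poll the brokers directly. The following minimal sketch sums every `redpanda_iceberg_pending_commit_lag` sample it finds and waits until the total is `0`. It assumes that each broker serves Prometheus-format metrics over plain HTTP at `http://<broker>:9644/public_metrics`; adjust the scheme, port, or path for your deployment. The function names and the `POLL_SECONDS` and `CURL` variables are hypothetical.
+
[,bash]
----
#!/usr/bin/env bash
# Hypothetical helper: wait until pending Iceberg commits have drained.
# Assumes each broker serves Prometheus text metrics at :9644/public_metrics.
set -euo pipefail

# Sum every sample of metric $1 in Prometheus text format read from stdin.
sum_metric() {
  awk -v m="$1" '
    /^#/ { next }                             # skip HELP and TYPE lines
    {
      name = $0; sub(/[{ ].*/, "", name)      # metric name before labels or value
      if (name != m) next
      rest = $0
      if (index(rest, "{")) { sub(/^[^}]*[}]/, "", rest) } else { sub(/^[^ ]+/, "", rest) }
      split(rest, f, " "); total += f[1]      # first field after the labels is the value
    }
    END { print total + 0 }'
}

# Print the summed pending commit lag across the brokers given as arguments.
total_commit_lag() {
  local broker n sum=0
  for broker in "$@"; do
    n="$("${CURL:-curl}" -fsS "http://${broker}:9644/public_metrics" |
      sum_metric redpanda_iceberg_pending_commit_lag)" || return 1
    sum=$((sum + n))
  done
  echo "$sum"
}

# Poll until the summed lag is 0; a failed scrape returns an error instead of counting as 0.
wait_for_commit_drain() {
  local lag
  [ "$#" -gt 0 ] || { echo "usage: wait_for_commit_drain BROKER..." >&2; return 2; }
  while :; do
    lag="$(total_commit_lag "$@")" || return 1
    [ "$lag" -eq 0 ] && break
    echo "pending commit lag: $lag; waiting" >&2
    sleep "${POLL_SECONDS:-30}"
  done
  echo "pending commit lag: 0 on all listed brokers"
}
----
+
For example, `wait_for_commit_drain <broker-1> <broker-2> <broker-3>` returns once the summed lag is `0`. List every broker so that partitions hosted on any broker are counted.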

. Apply the new catalog configuration. For example, to switch from `object_storage` to a REST catalog, update the catalog cluster properties:
+
[,bash]
----
rpk cluster config set iceberg_catalog_type rest
rpk cluster config set iceberg_rest_catalog_endpoint <endpoint-url>
rpk cluster config set iceberg_rest_catalog_authentication_mode oauth2
# Set additional credential properties for your chosen authentication mode.
----
+
For full guidance on setting catalog cluster properties, see xref:manage:iceberg/use-iceberg-catalogs.adoc#rest[Connect to a REST catalog] and the individual xref:manage:iceberg/rest-catalog/index.adoc[REST catalog integration pages].
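+
Before you restart, you can read back what you set. The following minimal sketch prints each property you name with `rpk cluster config get`, then runs `rpk cluster config status`, which reports for each broker whether a restart is needed for settings to take effect. The function name and the `RPK` override are hypothetical.
+
[,bash]
----
#!/usr/bin/env bash
# Hypothetical helper: read back catalog properties before restarting.
set -euo pipefail

# Print "<key>=<value>" for each property name given, then the cluster config status.
show_catalog_config() {
  local key value
  for key in "$@"; do
    value="$("${RPK:-rpk}" cluster config get "$key")" || return 1
    printf '%s=%s\n' "$key" "$value"
  done
  "${RPK:-rpk}" cluster config status
}
----
+
For example: `show_catalog_config iceberg_catalog_type iceberg_rest_catalog_endpoint iceberg_rest_catalog_authentication_mode`.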

. Restart all brokers in the cluster. The catalog cluster properties take effect only after a restart, and every broker must restart so that it initializes with the new catalog and its authentication settings. Restart only after `redpanda_iceberg_pending_commit_lag` has reached `0` for every Iceberg topic-partition, and before you resume translation.
+
ifndef::env-cloud[]
For instructions, see xref:manage:cluster-maintenance/rolling-restart.adoc[].
endif::[]
ifdef::env-cloud[]
Coordinate the restart with Redpanda Support. Redpanda Cloud users currently cannot start a cluster restart themselves.
Contributor: Can Cloud customers not initiate a rolling restart themselves? If they can, this guidance may be overly restrictive. If they can't, is "Coordinate with Redpanda Support" the right instruction, or is there a specific support process to reference? I see you included @wzzzrd86 here--good--as we should always check with support before placing guidance in the docs to contact support.
Approved to direct them to Support here, cloud customers currently cannot kick off a cloud restart
endif::[]
Contributor (Author): @wdberkeley could you please confirm if the rolling restart is the correct guidance for Self-managed, and that this restart isn't self-serve for Cloud users?
Yes, a rolling restart of the whole cluster started after commit lag has dropped to zero for all topic-partitions and before re-enabling translation is the right move. The restart is required on all brokers so that they initialize with the correct catalog and the correct catalog authentication. I'm not sure about how Cloud users do restarts... is it really not self-serve for them? I think #help-cloud could confirm quickly.
Contributor (Author): @wzzzrd86 Could you please confirm that we're good to include this restart guidance for Cloud users in this doc?

. After the cluster comes up, verify the connection to the new catalog: check the broker logs for successful catalog requests and confirm that there are no authentication errors.

. Resume Iceberg translation by restoring `redpanda.iceberg.mode` on every Iceberg topic to its previous value:
+
[,bash]
----
rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=<previous-mode>
----
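+
If you recorded the previous modes as `<topic> redpanda.iceberg.mode <mode>` lines in a state file (a hypothetical format), the following minimal sketch restores them. It skips lines for other properties, so the same file can also hold retention records. The function name and the `STATE_FILE` and `RPK` variables are hypothetical.
+
[,bash]
----
#!/usr/bin/env bash
# Hypothetical helper: restore each topic's saved Iceberg mode.
# Reads "<topic> redpanda.iceberg.mode <mode>" lines from STATE_FILE.
set -euo pipefail

restore_iceberg_modes() {
  local topic key value rest
  while read -r topic key value rest; do
    [ "$key" = "redpanda.iceberg.mode" ] || continue   # skip other properties
    # </dev/null keeps the command from consuming the state file on stdin.
    "${RPK:-rpk}" topic alter-config "$topic" --set "redpanda.iceberg.mode=${value}" < /dev/null
  done < "${STATE_FILE:?set STATE_FILE to the file with the saved modes}"
}
----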

. Restore `retention.ms` and `retention.bytes` on every Iceberg topic to the values you saved before pausing translation.
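+
If you recorded the retention settings as `<topic> <key> <value> <source>` lines in a state file (a hypothetical format, where `<source>` is the `SOURCE` column from `rpk topic describe <topic-name> -c`), the following minimal sketch restores them. As a design choice, it puts a value back with `--set` only if the topic had its own override (`DYNAMIC_TOPIC_CONFIG`, or no recorded source). For an inherited value, it removes the temporary `-1` override with `--delete`, so the topic inherits the default again instead of pinning the current default as a topic-level override. The function name and the `STATE_FILE` and `RPK` variables are hypothetical.
+
[,bash]
----
#!/usr/bin/env bash
# Hypothetical helper: restore saved retention settings.
# Reads "<topic> <key> <value> [<source>]" lines from STATE_FILE.
set -euo pipefail

restore_retention() {
  local topic key value source
  while read -r topic key value source; do
    case "$key" in
      retention.ms|retention.bytes) ;;
      *) continue ;;                                  # skip other properties
    esac
    if [ -z "$source" ] || [ "$source" = "DYNAMIC_TOPIC_CONFIG" ]; then
      # The topic had its own override: put the saved value back.
      "${RPK:-rpk}" topic alter-config "$topic" --set "${key}=${value}" < /dev/null
    else
      # The value was inherited: remove the -1 override so the topic inherits again.
      "${RPK:-rpk}" topic alter-config "$topic" --delete "$key" < /dev/null
    fi
  done < "${STATE_FILE:?set STATE_FILE to the file with the saved values}"
}
----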
Contributor: Referencing a step by number breaks if the order changes. The style guide discourages numbered step references in body text for the same reason it discourages step numbers in headings.

== Verify the migration

After the migration completes, confirm that new data is reaching the new catalog:

* Query an Iceberg table in your query engine using the new catalog, and confirm that row counts continue to increase as producers write new records to the topic.
* Check broker logs for any commit failures referencing the new catalog. Repeated failures often indicate a schema or partition spec mismatch. See <<troubleshooting,Troubleshooting>> for details.

== Troubleshooting

* Pending commits stuck after restart: A schema or partition spec mismatch between the original tables and the new catalog is the most common cause. See <<verify-catalog-compatibility,Verify catalog compatibility>>. If you cannot resolve the mismatch, contact https://support.redpanda.com/hc/en-us/requests/new[Redpanda Support^].
* Authentication errors against the new REST catalog: Verify that the credential cluster properties (for example, `iceberg_rest_catalog_client_id`, `iceberg_rest_catalog_client_secret`, `iceberg_rest_catalog_token`) match what the new catalog expects. For OAuth, also check `iceberg_rest_catalog_oauth2_server_uri`.
* Translation does not resume after restoring `redpanda.iceberg.mode`: Check that `redpanda_iceberg_pending_translation_lag` is increasing as new records are produced. If it remains `0`, the cluster is not translating new records. Verify that your producer is still writing to the topic and that the topic's mode value is one of `key_value`, `value_schema_id_prefix`, or `value_schema_latest`.
// end::single-source[]

@@ -20,7 +20,7 @@ For production deployments, Redpanda recommends <<rest,using an external REST ca

In either case, you use the catalog to load, query, or refresh the Iceberg table as you produce to the Redpanda topic. See the documentation for your query engine or Iceberg-compatible tool for specific guidance on adding the Iceberg tables to your data warehouse or lakehouse using the catalog.

- After you have selected a catalog type at the cluster level and xref:{about-iceberg-doc}#enable-iceberg-integration[enabled the Iceberg integration] for a topic, you cannot switch to another catalog type.
+ To switch to a different catalog type after you have xref:{about-iceberg-doc}#enable-iceberg-integration[enabled the Iceberg integration] for a topic, see xref:manage:iceberg/migrate-iceberg-catalog.adoc[].

[[rest]]
== Connect to a REST catalog
commit before merge