charts/redpanda: support schema_registry_client SASL credentials via secretRef #1520
david-yu wants to merge 10 commits into
|
@david-yu FYI I just rebased my feature branch onto main and pushed. |
|
@AldoFusterTurpin Thanks I will pull in your changes. |
cff7eb9 to
b7ac273
Compare
|
@AldoFusterTurpin Are you using the Operator or only the Helm chart? If only Helm chart I assume you are simply creating users using rpk? |
(Sorry for the delay, timezone difference 🙂) We are using the redpanda-operator. The topics were created using the Topic CR and the users using the User CR. Before opening this PR I opened an issue describing some problems I faced, in case you want to check the exact config. Thank you! |
End-to-end on local kind: bug found and fixed (commit cb8c8c0)
After bumping
The chart-rendered artifacts looked right at render-time:
But the runtime

    schema_registry_client:
      brokers:
        - address: rp-0.rp.redpanda.svc.cluster.local.
          port: 9093
      broker_tls:
        truststore_file: /etc/tls/certs/default/ca.crt
        enabled: true

No
Root cause
The chart has two init containers:
The fixups in
Why the SR test on a single-broker cluster still appeared to "work": Redpanda runs SR in-process with the broker, so SR's writes to
Why the PR's existing tests missed it: all four (
Fix (commit
|
|
Will mark as ready for review and close the original PR. |
…secretRef

The V2 Helm chart had no way to configure SASL credentials for the schema registry's internal Kafka client without storing plaintext in the ConfigMap. This adds config.schema_registry_client.saslSecretRef (a reference to a Kubernetes Secret) which injects credentials at pod start via the existing redpanda.yaml.fixups mechanism.
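A minimal Go sketch of the fixup entries this commit describes. The field names, env var names, and JSON shape are taken from the expected output quoted in the PR description; `Fixup`, `schemaRegistryClientFixups`, and `renderFixups` are hypothetical names for illustration, not the chart's actual functions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Fixup mirrors one redpanda.yaml.fixups entry: a config field plus the CEL
// expression that is evaluated to produce its value.
type Fixup struct {
	Field string `json:"field"`
	CEL   string `json:"cel"`
}

// schemaRegistryClientFixups is a hypothetical helper (an assumption, not the
// chart's real code). It returns an empty list unless SASL is enabled and a
// Secret name is set.
func schemaRegistryClientFixups(saslEnabled bool, secretName, mechanism string) []Fixup {
	if !saslEnabled || secretName == "" {
		return []Fixup{}
	}
	return []Fixup{
		{Field: "schema_registry_client.scram_username", CEL: `envString("SCHEMA_REGISTRY_CLIENT_USERNAME")`},
		{Field: "schema_registry_client.scram_password", CEL: `envString("SCHEMA_REGISTRY_CLIENT_PASSWORD")`},
		// The mechanism is a CEL string literal, so it is quoted inside the expression.
		{Field: "schema_registry_client.sasl_mechanism", CEL: fmt.Sprintf("%q", mechanism)},
	}
}

// renderFixups serializes the entries as compact JSON, the form stored under
// the redpanda.yaml.fixups key.
func renderFixups(fixups []Fixup) (string, error) {
	out, err := json.Marshal(fixups)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	rendered, err := renderFixups(schemaRegistryClientFixups(true, "schema-registry-sasl", "SCRAM-SHA-512"))
	if err != nil {
		panic(err)
	}
	fmt.Println(rendered)
}
```

Only env var names appear in the rendered JSON, never the credential values, which is the property the feature depends on. With SASL disabled the helper yields `[]`, matching the empty entry that a later commit adds to the golden file.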
Split into two functions instead of returning both values as both fields are never used from the same caller
…tRef

Add the missing charts/redpanda `Added` changelog entry and update the lifecycle TestV2ResourceClient golden file: every rendered ConfigMap now carries an empty redpanda.yaml.fixups: '[]' entry, since the configurator looks for this file unconditionally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…figurator

End-to-end testing on a local kind cluster revealed that the chart's standalone install never applied the redpanda.yaml.fixups entries this PR adds: the redpanda-configurator init container runs a bash script generated by SecretConfigurator() that uses `rpk redpanda config set` to patch specific listener fields, but does not read redpanda.yaml.fixups or apply any CEL expressions. The Go-based configurator at operator/cmd/configurator/configurator.go does process fixups, but it is only invoked from the V1 operator's Cluster CR flow, not from the chart's helm-install path.

In single-broker clusters this defect is invisible: the schema registry is in-process with the broker and its writes to _schemas go through internal Seastar calls, never exercising the schema_registry_client SASL configuration. In multi-broker clusters where SR routes to a non-local leader, the missing scram_username/scram_password causes the SR's Kafka client to fail the SASL handshake.

Fix: in SecretConfigurator, after the existing advertised-listener patches, when auth.sasl is enabled and config.schema_registry_client.saslSecretRef is set, emit three additional `rpk redpanda config set` lines that read the SCHEMA_REGISTRY_CLIENT_USERNAME / _PASSWORD env vars (already projected from the named Secret in statefulset.go) and patch schema_registry_client.scram_username / .scram_password / .sasl_mechanism in /etc/redpanda/redpanda.yaml. Wrap them in `set +x` / `set -x` so the password is not echoed to the init container's stdout.

Tests:
- Extend TestTemplate/sasl-schema-registry-client-secret-ref to assert the rendered redpanda-configurator Secret's configurator.sh contains both the env var references and the field paths.
- Extend TestTemplate/sasl-disabled-secret-ref-ignored to assert the configurator.sh does NOT contain the SR scram patch when SASL is off.

The redpanda.yaml.fixups entries remain in the ConfigMap unchanged; they still get applied on the V1 operator path. The new bash lines mirror their effect for the standalone-chart and V2-operator path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
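The fix described in this commit can be sketched roughly as below. This is an illustration under stated assumptions, not the code in SecretConfigurator: `srClientPatchLines` is a hypothetical helper, and the exact placement of the `--config` flag on `rpk redpanda config set` is assumed rather than copied from the chart.

```go
package main

import (
	"fmt"
	"strings"
)

// Path of the file being patched, as named in the commit message.
const redpandaYAML = "/etc/redpanda/redpanda.yaml"

// srClientPatchLines is a hypothetical stand-in for the extra bash lines the
// commit appends to configurator.sh. It returns nothing unless SASL is
// enabled and a saslSecretRef name is set.
func srClientPatchLines(saslEnabled bool, secretName, mechanism string) []string {
	if !saslEnabled || secretName == "" {
		return nil
	}
	// Assumption: the --config flag position mirrors how the script patches
	// other fields; the real script may differ.
	set := func(key, value string) string {
		return fmt.Sprintf("rpk redpanda config --config %s set %s %s", redpandaYAML, key, value)
	}
	return []string{
		"set +x", // stop shell tracing so the password is not echoed to stdout
		set("schema_registry_client.scram_username", `"${SCHEMA_REGISTRY_CLIENT_USERNAME}"`),
		set("schema_registry_client.scram_password", `"${SCHEMA_REGISTRY_CLIENT_PASSWORD}"`),
		set("schema_registry_client.sasl_mechanism", mechanism),
		"set -x", // resume tracing for the rest of the script
	}
}

func main() {
	fmt.Println(strings.Join(srClientPatchLines(true, "schema-registry-sasl", "SCRAM-SHA-512"), "\n"))
}
```

The generated script text contains only `${...}` references, so the Secret that holds configurator.sh never carries the credential itself; the values are resolved from the init container's environment at pod start.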
Cosmetic-only cleanup of existing subtests in TestIntegrationChart:
- order `env := h.Namespaced(t)` before `ctx := testutil.Context(t)` in mtls-using-cert-manager, mtls-using-self-created-certificates, admin api auth required, and admin api auth required - pre-existing secret, so all subtests in this test follow the same setup pattern
- drop the redundant `Namespace: env.Namespace()` from helm.InstallOptions in those subtests; the namespaced env already pins the install namespace, so passing it again was a no-op

No behavior change.
…retRef

Adds the "schema registry client - pre-existing secret" subtest to TestIntegrationChart, exercising the new config.schemaRegistryClient.saslSecretRef path end-to-end against a real cluster:
- creates a basic-auth Secret holding the SR client username/password (kubernetes.io/basic-auth keys) and a separate `users` Secret with a users.txt entry for a SCRAM-SHA-512 superuser
- installs the chart with auth.sasl.enabled, auth.sasl.secretRef -> users, and config.schemaRegistryClient.saslSecretRef pointing at the basic-auth Secret, with 3 replicas to also surface multi-broker startup paths
- after install, creates the referenced SCRAM user via the admin API and grants the minimum ACLs Schema Registry needs (topic _schemas and group prefix schema-registry) so the SR client can actually authenticate
- verifies the Schema Registry listener works by registering and reading back a schema (schemaRegistryListenerTest), proving the fixups + bash configurator path produces a redpanda.yaml that lets SR talk to the SASL-enabled Kafka API

Two small test helpers are added in client_test.go to keep the new subtest readable:
- Client.CreateSASLUser: creates a SCRAM user via the admin API using redpanda.DefaultSASLMechanism, matching what the chart currently hard-codes for the SR client
- Client.CreateACL: runs `rpk security acl create` inside broker-0 so ACL setup uses the same tooling operators would use in practice

This is intentionally an integration test (not a TestTemplate assertion) because the SR-SASL feature relies on the bash configurator consuming redpanda.yaml.fixups at pod start; template tests only prove the fixup file and env vars render, not that the effective /etc/redpanda/redpanda.yaml ends up with valid SR credentials.
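The two ACL grants mentioned above (topic `_schemas`, group prefix `schema-registry`) could be assembled along these lines. This is a hedged sketch: `aclCreateArgs` and `schemaRegistryACLs` are hypothetical helpers, not the `Client.CreateACL` helper from client_test.go, and `--operation all` is a placeholder because the commit message does not list the exact operations granted.

```go
package main

import "fmt"

// aclCreateArgs builds an argv for `rpk security acl create`. resourceFlag is
// "--topic" or "--group"; patternType is "literal" or "prefixed".
func aclCreateArgs(principal, resourceFlag, name, patternType string) []string {
	return []string{
		"rpk", "security", "acl", "create",
		"--allow-principal", "User:" + principal,
		// Assumption: the commit does not name the operations, so "all" is a placeholder.
		"--operation", "all",
		resourceFlag, name,
		"--resource-pattern-type", patternType,
	}
}

// schemaRegistryACLs returns one command per resource: an exact match on the
// _schemas topic and a prefix match on consumer groups named schema-registry*.
func schemaRegistryACLs(principal string) [][]string {
	return [][]string{
		aclCreateArgs(principal, "--topic", "_schemas", "literal"),
		aclCreateArgs(principal, "--group", "schema-registry", "prefixed"),
	}
}

func main() {
	for _, argv := range schemaRegistryACLs("schema-registry-client") {
		fmt.Println(argv)
	}
}
```

Two separate commands are used because the pattern type applies to every resource named in a single invocation, and the topic needs an exact match while the group needs a prefix match.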
cb8c8c0 to
eb8142d
Compare
RafalKorepta
left a comment
I feel like RedpandaYamlFixupFile is obsolete given the rest of this change. The bash script with rpk commands already sets schema_registry_client.scram_username and schema_registry_client.scram_password.
I personally would love to see an init container that expands not only the bootstrap file but the redpanda config too. We don't have that feature in the operator subcommand, but as the comment suggests, operator V1 has it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@david-yu Thank you for testing it and applying the bug fix for one of the paths. This is my first contribution so I was not aware that there were 2 ways of applying that configuration, that's why I missed the "The chart's helm-install path uses the bash configurator instead, so the fixups sit unused.", sorry about that. Thanks! |
|
@david-yu @RafalKorepta Could you please check whether this comment makes sense? I am facing some unexpected problems in the redpanda cluster and I am not sure if the operator config could be related. I would really appreciate it. |
|
@david-yu I have just opened another PR that could help us with the connections churn we are seeing in our cluster: I would really appreciate if you can take a look when you can. Thank you! |
Summary
Re-creation of #1503 (by @AldoFusterTurpin) onto current `main`, with all original commits preserved and a missing changelog entry + lifecycle golden file update added on top.

The V2 Helm chart had no way to configure SASL credentials for the schema registry's internal Kafka client (`schema_registry_client`) without storing plaintext passwords in the ConfigMap. This is a regression versus the V1 operator, which already supports this via `operator/pkg/resources/configuration.go:488-563`.

This PR adds a `config.schema_registry_client.saslSecretRef` field that references a Kubernetes Secret containing the SASL username and password. Credentials are injected at pod start using the existing `redpanda.yaml.fixups` mechanism (the same pattern the V1 operator uses), so they never appear in plaintext in the ConfigMap or Helm release history.

How it works

1. Set `config.schema_registry_client.saslSecretRef.name=<secret-name>`, where the Secret has keys `username` and `password`.
2. The chart adds a `redpanda.yaml.fixups` entry to the ConfigMap telling the configurator init container to patch `schema_registry_client.scram_username`, `schema_registry_client.scram_password`, and `schema_registry_client.sasl_mechanism`.
3. The configurator init container receives the credentials as env vars via `secretKeyRef`, applies them to `redpanda.yaml` before writing it to the shared volume, and the Redpanda container starts with credentials in place.

The field names written to `redpanda.yaml` (`scram_username`, `scram_password`, `sasl_mechanism`) are the documented Redpanda broker properties: https://docs.redpanda.com/current/reference/properties/broker-properties/#schema-registry-client

Step-by-step example (Operator + `User` CRD)

This walkthrough assumes the Redpanda Operator is installed and that you manage clusters via the `Redpanda` CR and SCRAM users via the `User` CR (`cluster.redpanda.com/v1alpha2`). All commands are namespace-scoped to `redpanda`; adjust as needed.

The headline benefit of the operator + `User` CR path is that the SCRAM credential and the configurator-injected env var are populated from the same Kubernetes Secret. The operator watches the Secret, so rotation is a one-step Secret update: no `rpk security user update`, no race window.

1. Install the operator

Confirm the CRDs landed:
2. Create the namespace and bootstrap superuser secret

    kubectl create namespace redpanda
    kubectl -n redpanda create secret generic redpanda-bootstrap-user \
      --from-literal=password='REPLACE_WITH_STRONG_PASSWORD'

3. Create the schema registry's SASL user secret
This Secret is the single source of truth for the schema-registry-client SCRAM credential. The new chart field reads it, and the `User` CR in step 5 reads it. The keys must be exactly `username` and `password` (matching `corev1.BasicAuthUsernameKey` / `corev1.BasicAuthPasswordKey`):

4. Apply the `Redpanda` CR

The operator takes `spec.clusterSpec` and feeds it into the chart: same field shape as a Helm `values.yaml`, including the new `config.schema_registry_client.saslSecretRef`.

    kubectl apply -f redpanda.yaml
    kubectl -n redpanda wait redpanda/redpanda \
      --for=condition=Ready --timeout=10m

5. Apply the `User` CR

This is what actually registers the `schema-registry-client` username/password as a SCRAM credential inside the cluster: the operator reconciles it by reading the password out of the same `schema-registry-sasl` Secret the chart's `saslSecretRef` points at.

    kubectl apply -f user-schema-registry.yaml
    kubectl -n redpanda wait user/schema-registry-client \
      --for=condition=Ready --timeout=2m

There is no chicken-and-egg problem: the schema registry retries SASL handshakes against its local Kafka API until the operator finishes reconciling the user. If the cluster comes up before the `User` is Ready, the schema registry just retries until it can authenticate.

6. Verify nothing is in plaintext

The ConfigMap should contain only CEL fixup expressions, never the actual password:
Expected output:

    [
      {"field":"schema_registry_client.scram_username","cel":"envString(\"SCHEMA_REGISTRY_CLIENT_USERNAME\")"},
      {"field":"schema_registry_client.scram_password","cel":"envString(\"SCHEMA_REGISTRY_CLIENT_PASSWORD\")"},
      {"field":"schema_registry_client.sasl_mechanism","cel":"\"SCRAM-SHA-512\""}
    ]

And the configurator init container should source the values from `secretKeyRef`, not as inline literals:

Expected output:
7. Create the demo topic with a `Topic` CR

Since the operator is already running, we use the `Topic` CR instead of `rpk topic create`. This keeps every cluster-state object (the cluster, the SCRAM user, and now the topic) declared the same way and reconciled by the operator.

    kubectl apply -f topic-demo.yaml
    kubectl -n redpanda wait topic/demo \
      --for=condition=Ready --timeout=2m

8. End-to-end smoke test with `rpk` and the Schema Registry

The Redpanda image bundles `rpk`, and the pods already have `RPK_USER` / `RPK_PASS` / `RPK_SASL_MECHANISM` env vars wired to the bootstrap user, so we can exec into a pod and exercise both Kafka SASL and the Schema Registry without copying any password to our workstation.

Test 1: Kafka SASL (proves the bootstrap user can produce/consume against the operator-managed topic):
If SASL is wired correctly you'll see the message echoed back. A `SASL_AUTHENTICATION_FAILED` here means the bootstrap user secret is wrong, which is unrelated to this PR.

Test 2: Schema Registry against the cluster (proves the new `saslSecretRef` works):

This is the actual feature under test. The schema registry's internal Kafka client must authenticate as `schema-registry-client` to read/write its `_schemas` topic. Every command below succeeds only if the credentials from `schema-registry-sasl` were correctly injected at pod start.

Test 3: Produce a schema-bound record with `rpk`:

If you see `SASL_AUTHENTICATION_FAILED` or `unable to fetch schema` from `rpk`, double-check that the `User` CR has reconciled:

    kubectl -n redpanda get user schema-registry-client -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'

It should report `"status":"True"`. If not, inspect the operator's logs:

    kubectl -n redpanda-system logs deploy/redpanda-operator | grep schema-registry-client

9. Rotation
The `User` CR watches the same Secret the chart's `saslSecretRef` points at, so password rotation collapses to two non-racy steps:

The operator typically completes the SCRAM credential update inside a second or two; the pod rollout takes longer than that, so by the time the first restarted pod tries to authenticate, the cluster already has the new credential. No `rpk security user update`, no race window.

How the previous Helm chart fell short
- The `config.schema_registry_client` block (in `values.yaml` and the `SchemaRegistryClient` Go struct at `charts/redpanda/values.go:1195`) only exposed tuning knobs: retries, batch sizes, consumer timeouts. There was no field for `scram_username` / `scram_password` / `sasl_mechanism`, and no slot for a `secretRef`.
- The workaround was `config.node` (free-form key/value), which gets serialized verbatim into the ConfigMap's `redpanda.yaml`. Setting credentials that way embeds the cleartext password into the ConfigMap, the Helm release history (Secret of type `helm.sh/release.v1`), and any `helm get manifest` output.
- In the V1 operator, `operator/pkg/resources/configuration.go:488` references the schema-registry SASL user secret and emits `AddFixup` calls plus `EnsureInitEnv` env vars on the configurator init container, the same pattern this PR brings to the V2 chart. Users moving from the V1 operator to the V2 chart hit a real regression: clusters with `auth.sasl.enabled=true` saw the schema registry fail to authenticate to its own Kafka API unless they were willing to bake the password into version control.

Authorship
Five commits authored by @AldoFusterTurpin are preserved (cherry-picked from #1503). A single follow-up commit by me adds the missing `charts/redpanda` changelog entry and regenerates the lifecycle `TestV2ResourceClient` golden file (every rendered ConfigMap now carries an empty `redpanda.yaml.fixups: '[]'` entry because the configurator looks for that file unconditionally).

Test plan

- `TestSASLFixups` (unit): asserts fixup field names and CEL expressions
- `TestSASLEnvVars` (unit): asserts env var names and `secretKeyRef` shape
- `TestTemplate/sasl-schema-registry-client-secret-ref`: verifies env vars in the configurator init container and fixup entries in the ConfigMap when SASL is enabled
- `TestTemplate/sasl-disabled-secret-ref-ignored`: verifies nothing is injected when `auth.sasl.enabled=false`
- `TestV2ResourceClient` lifecycle golden tests pass with the regenerated golden file
- `helm lint --strict` passes with the feature enabled
- `go test ./charts/redpanda/` and `go test ./operator/internal/lifecycle/` pass locally

🤖 Generated with Claude Code
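For readers who want a concrete picture of the env var shape that `TestSASLEnvVars` checks, here is a stdlib-only sketch. The `EnvVar` and `SecretKeySelector` types are simplified local stand-ins for the Kubernetes `corev1` types (the real ones nest `secretKeyRef` under `valueFrom`), and `schemaRegistryClientEnv` is a hypothetical name; only the env var names and the `username` / `password` keys come from this PR.

```go
package main

import "fmt"

// SecretKeySelector is a simplified stand-in for corev1.SecretKeySelector.
type SecretKeySelector struct {
	Name string // Secret name
	Key  string // key inside the Secret
}

// EnvVar is a simplified stand-in for corev1.EnvVar; the real type nests the
// selector under ValueFrom.
type EnvVar struct {
	Name         string
	SecretKeyRef *SecretKeySelector
}

const (
	basicAuthUsernameKey = "username" // same value as corev1.BasicAuthUsernameKey
	basicAuthPasswordKey = "password" // same value as corev1.BasicAuthPasswordKey
)

// schemaRegistryClientEnv is a hypothetical helper: it projects the two
// credentials from the referenced Secret, or nothing when SASL is off.
func schemaRegistryClientEnv(saslEnabled bool, secretName string) []EnvVar {
	if !saslEnabled || secretName == "" {
		return nil
	}
	return []EnvVar{
		{Name: "SCHEMA_REGISTRY_CLIENT_USERNAME", SecretKeyRef: &SecretKeySelector{Name: secretName, Key: basicAuthUsernameKey}},
		{Name: "SCHEMA_REGISTRY_CLIENT_PASSWORD", SecretKeyRef: &SecretKeySelector{Name: secretName, Key: basicAuthPasswordKey}},
	}
}

func main() {
	for _, e := range schemaRegistryClientEnv(true, "schema-registry-sasl") {
		fmt.Printf("%s <- secret %s, key %s\n", e.Name, e.SecretKeyRef.Name, e.SecretKeyRef.Key)
	}
}
```

Because each env var carries a reference rather than a literal value, the rendered StatefulSet holds no credential, which is what the plaintext check in step 6 of the walkthrough relies on.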