Preserve runconfig-checksum on pod template overrides#5149
Open
deploymentForMCPServer merged user-supplied
PodTemplateMetadataOverrides.Annotations onto the wrong base
map: it passed deploymentAnnotations (the Deployment-level
overrides, typically empty) instead of
deploymentTemplateAnnotations (which already had the
runconfig-checksum stamped on it). Because
ctrlutil.MergeAnnotations is "first-arg-wins on conflict", the
checksum was silently dropped from the pod template and any
Deployment-level annotations leaked onto pods.
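The effect of choosing the wrong base map can be sketched in a few lines of Go. This is a minimal illustration, not the operator's code: mergeAnnotations is a hypothetical stand-in that only mimics the first-arg-wins behaviour described above (the real ctrlutil.MergeAnnotations signature is not shown in this PR), and the annotation keys and values are illustrative.

```go
package main

import "fmt"

// mergeAnnotations is a hypothetical stand-in for ctrlutil.MergeAnnotations.
// It returns a new map holding every key from both inputs; when a key is
// present in both, the value from base (the first argument) wins.
func mergeAnnotations(base, overrides map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(overrides))
	for k, v := range overrides {
		merged[k] = v
	}
	for k, v := range base {
		merged[k] = v // base is written last, so it wins on conflict
	}
	return merged
}

func main() {
	// Illustrative values only.
	deploymentAnnotations := map[string]string{"team": "platform"}
	deploymentTemplateAnnotations := map[string]string{"runconfig-checksum": "abc123"}
	userOverrides := map[string]string{"vault.hashicorp.com/agent-inject": "true"}

	// Before the fix: the Deployment-level map is the base, so the checksum
	// never enters the result and "team" leaks onto the pod template.
	buggy := mergeAnnotations(deploymentAnnotations, userOverrides)

	// After the fix: the pod-template map (already carrying the checksum)
	// is the base.
	fixed := mergeAnnotations(deploymentTemplateAnnotations, userOverrides)

	fmt.Println("buggy:", buggy)
	fmt.Println("fixed:", fixed)
}
```

In the buggy result the checksum key is absent and the Deployment-level key is present; in the fixed result the checksum and the user override both survive and nothing else appears.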
Two user-visible symptoms followed:
1. Proxy pods stopped rolling on RunConfig changes whenever
PodTemplateMetadataOverrides.Annotations was set. Without
the checksum on the pod template, Kubernetes had no signal
to recreate pods, so telemetry / inline-OIDC / external-
auth-ref edits landed in the ConfigMap on disk but never
reached the running proxy. Vault Agent users were the
canonical population affected, since vault.hashicorp.com/*
keys live in this override field.
2. The drift-checker deploymentNeedsUpdate built the expected
pod-template annotations correctly, so it disagreed with
the constructor on every reconcile. The operator hot-looped
on r.Update without converging.
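The second symptom can be modelled as a small simulation. This is a sketch under stated assumptions: buildPodTemplateAnnotations, needsUpdate and countUpdates are hypothetical simplifications of deploymentForMCPServer, deploymentNeedsUpdate and the reconcile loop, and mergeAnnotations again only mimics first-arg-wins semantics.

```go
package main

import "fmt"

// mergeAnnotations is a hypothetical stand-in: base (first argument) wins.
func mergeAnnotations(base, overrides map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(overrides))
	for k, v := range overrides {
		merged[k] = v
	}
	for k, v := range base {
		merged[k] = v
	}
	return merged
}

// sameAnnotations reports whether two maps hold identical key/value pairs.
func sameAnnotations(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// buildPodTemplateAnnotations models the constructor. With buggyBase set it
// merges onto the Deployment-level map, as the code did before the fix.
func buildPodTemplateAnnotations(deploymentAnns, templateAnns, overrides map[string]string, buggyBase bool) map[string]string {
	if buggyBase {
		return mergeAnnotations(deploymentAnns, overrides)
	}
	return mergeAnnotations(templateAnns, overrides)
}

// needsUpdate models the comparator, which always derives the expected
// annotations from the pod-template base.
func needsUpdate(live, templateAnns, overrides map[string]string) bool {
	return !sameAnnotations(live, mergeAnnotations(templateAnns, overrides))
}

// countUpdates simulates up to maxRounds reconciles and returns how many
// updates would have been issued.
func countUpdates(buggyBase bool, maxRounds int) int {
	deploymentAnns := map[string]string{}
	templateAnns := map[string]string{"runconfig-checksum": "abc123"}
	overrides := map[string]string{"vault.hashicorp.com/agent-inject": "true"}

	live := buildPodTemplateAnnotations(deploymentAnns, templateAnns, overrides, buggyBase)
	updates := 0
	for i := 0; i < maxRounds; i++ {
		if !needsUpdate(live, templateAnns, overrides) {
			break // converged
		}
		// Rebuilding with the same constructor reproduces the same drift.
		live = buildPodTemplateAnnotations(deploymentAnns, templateAnns, overrides, buggyBase)
		updates++
	}
	return updates
}

func main() {
	fmt.Println("updates with buggy base:", countUpdates(true, 5))
	fmt.Println("updates with fixed base:", countUpdates(false, 5))
}
```

With the buggy base every simulated round issues an update (5 of 5 here); with the fixed base the comparator agrees with the constructor immediately and no update is issued.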
The fix flips one variable so the merge starts from
deploymentTemplateAnnotations. Add three layers of test
coverage in mcpserver_resource_overrides_test.go:
- an expectedPodTemplateAnns column on the table-driven
resource-overrides test so every case asserts on
deployment.Spec.Template.Annotations
- a focused regression test pinning the no-leakage contract
(checksum survives, user override survives, no extras)
- a parity test asserting deploymentNeedsUpdate reports no
drift immediately after deploymentForMCPServer with the
same checksum and overrides
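The shape of the table-driven assertion can be sketched as follows. This is not the test code from mcpserver_resource_overrides_test.go: it avoids the testing package so it runs standalone, and overrideCase, podTemplateAnnotations and annotationProblems are hypothetical names. Only the expectedPodTemplateAnns column name comes from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeAnnotations is a hypothetical stand-in: base (first argument) wins.
func mergeAnnotations(base, overrides map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(overrides))
	for k, v := range overrides {
		merged[k] = v
	}
	for k, v := range base {
		merged[k] = v
	}
	return merged
}

// overrideCase is one row of the table.
type overrideCase struct {
	name                    string
	deploymentAnns          map[string]string
	podTemplateOverrides    map[string]string
	expectedPodTemplateAnns map[string]string
}

// podTemplateAnnotations models the fixed constructor: the Deployment-level
// annotations are deliberately ignored for the pod template.
func podTemplateAnnotations(_ map[string]string, overrides map[string]string, checksum string) map[string]string {
	base := map[string]string{"runconfig-checksum": checksum}
	return mergeAnnotations(base, overrides)
}

// annotationProblems lists missing, mismatched and unexpected keys, so one
// check catches both a dropped checksum and a leaked annotation.
func annotationProblems(got, want map[string]string) []string {
	var problems []string
	for k, v := range want {
		gv, ok := got[k]
		switch {
		case !ok:
			problems = append(problems, "missing "+k)
		case gv != v:
			problems = append(problems, "wrong value for "+k)
		}
	}
	for k := range got {
		if _, ok := want[k]; !ok {
			problems = append(problems, "unexpected "+k)
		}
	}
	sort.Strings(problems)
	return problems
}

func main() {
	const checksum = "abc123"
	cases := []overrideCase{
		{
			name:                    "no overrides",
			expectedPodTemplateAnns: map[string]string{"runconfig-checksum": checksum},
		},
		{
			name:                 "vault override with deployment-level annotation",
			deploymentAnns:       map[string]string{"team": "platform"},
			podTemplateOverrides: map[string]string{"vault.hashicorp.com/agent-inject": "true"},
			expectedPodTemplateAnns: map[string]string{
				"runconfig-checksum":               checksum,
				"vault.hashicorp.com/agent-inject": "true",
			},
		},
	}
	for _, tc := range cases {
		got := podTemplateAnnotations(tc.deploymentAnns, tc.podTemplateOverrides, checksum)
		fmt.Printf("%s: %d problem(s)\n", tc.name, len(annotationProblems(got, tc.expectedPodTemplateAnns)))
	}
}
```

Comparing against the full expected map, rather than checking only that the checksum key exists, is what pins the "no extras" half of the contract.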
Why this is safe:
- The comparator deploymentNeedsUpdate in the same file was
already merging in this order, and the sibling controller
MCPRemoteProxyReconciler.buildPodTemplateMetadata uses
this exact pattern. The fix brings the buggy constructor
in line with both.
- Kubernetes convention treats Deployment-level and
pod-template annotations as separate scopes; the CRD
exposes them as separate user-facing fields
(ProxyDeployment.Annotations vs
ProxyDeployment.PodTemplateMetadataOverrides.Annotations).
No documentation, example (including
examples/operator/vault/mcpserver-github-with-vault.yaml),
or test in this repository places sidecar-injection
annotations at the deployment level, so no documented
user is regressed.
- MergeAnnotations keeps the first-argument map winning on
conflict, so a user cannot accidentally overwrite the
runconfig-checksum from their override map.
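The last point can be checked with a small conflict case. As before, mergeAnnotations is a hypothetical stand-in that only reproduces the first-arg-wins rule stated in this PR, and the keys and values are illustrative.

```go
package main

import "fmt"

// mergeAnnotations is a hypothetical stand-in: base (first argument) wins.
func mergeAnnotations(base, overrides map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(overrides))
	for k, v := range overrides {
		merged[k] = v
	}
	for k, v := range base {
		merged[k] = v
	}
	return merged
}

func main() {
	base := map[string]string{"runconfig-checksum": "abc123"}
	// A user override that collides with the operator-managed key.
	overrides := map[string]string{
		"runconfig-checksum": "user-supplied",
		"example.com/owner":  "team-a",
	}

	merged := mergeAnnotations(base, overrides)
	fmt.Println(merged["runconfig-checksum"]) // prints "abc123"
	fmt.Println(merged["example.com/owner"])  // prints "team-a"
}
```

The operator-stamped checksum wins the collision, while the non-conflicting user key passes through unchanged.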
Fixes #5148
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5149      +/-   ##
==========================================
+ Coverage   67.53%   67.58%   +0.04%
==========================================
  Files         601      601
  Lines       61093    61093
==========================================
+ Hits        41262    41288      +26
+ Misses      16714    16686      -28
- Partials     3117     3119       +2
ChrisJBurns
approved these changes
May 1, 2026
Summary
When a user populates Spec.ResourceOverrides.ProxyDeployment.PodTemplateMetadataOverrides.Annotations on an MCPServer (e.g. for Vault Agent injection), deploymentForMCPServer was passing the deployment-level annotations map (typically empty) instead of the pod-template annotations map (which already carried the runconfig-checksum) as the precedence base when merging in the user overrides. Because ctrlutil.MergeAnnotations is first-arg-wins, the checksum was silently dropped from the pod template, breaking the rolling-restart-on-RunConfig-change mechanism for any setup using pod-template overrides. The same asymmetry caused the drift-checker to disagree with the constructor on every reconcile, leaving the operator in a perpetual r.Update loop.

This change flips one variable so the merge starts from the pod-template annotations map. The MCPRemoteProxy sibling controller already merged correctly; this aligns MCPServer with it and with its own drift-checker. Three layers of regression tests are added so the bug class cannot return.

Fixes #5148
Type of change
Test plan
- task test: cmd/thv-operator/controllers is green, including the new tests. There is one pre-existing, unrelated failure in pkg/workloads/manager_test.go (TestDefaultManager_ListWorkloadsInGroup/workloads_with_empty_group_names) that reproduces with this PR's diff reverted, so it is not introduced here.
- task lint-fix: 0 issues.
API Compatibility
v1beta1 API.
Changes
- cmd/thv-operator/controllers/mcpserver_controller.go: in deploymentForMCPServer, pass deploymentTemplateAnnotations (which already carries the runconfig-checksum) instead of deploymentAnnotations (deployment-level, typically empty) as the precedence base for MergeAnnotations.
- cmd/thv-operator/controllers/mcpserver_resource_overrides_test.go: add an expectedPodTemplateAnns column to the table-driven test; add TestDeploymentForMCPServer_PodTemplateOverridesPreserveRunConfigChecksum (focused regression) and TestDeploymentNeedsUpdate_StableAfterBuildWithPodTemplateOverrides (constructor/comparator parity guardrail).
Does this introduce a user-facing change?
Yes, bug fix:
- Users with PodTemplateMetadataOverrides.Annotations set will now correctly see proxy pods roll on RunConfig changes (telemetry, inline OIDC, external auth refs). Previously these edits silently failed to reach the running proxy.
- ProxyDeployment.Annotations no longer leak into the pod template. No documentation, example, or test in this repository placed sidecar-injection annotations (Vault Agent, Linkerd, Istio) at the deployment level, so no documented user pattern is regressed. Users who need the same key on both levels can set both fields independently.
Implementation plan
Approved implementation plan (AI-assisted)
Plan summary:
- Change deploymentForMCPServer to align with the existing comparator and the MCPRemoteProxy sibling pattern.
- Add an expectedPodTemplateAnns column so every table-driven case asserts on deployment.Spec.Template.Annotations. Catches missing-checksum and deployment-level-leak regressions in one place.
- Add a parity test that builds with deploymentForMCPServer and immediately calls deploymentNeedsUpdate with the same checksum, asserting no drift. Long-term guardrail against the asymmetry that caused the perpetual r.Update loop.

Each step was implemented separately and reviewed in parallel by go-expert and kubernetes-go-expert agents (Opus). Reviewers verified:
- MergeAnnotations first-arg-wins semantics through the constructor and the comparator.
- Consistency with the MCPRemoteProxy sibling and with Kubernetes convention (Deployment-level vs pod-template annotations are intentionally separate scopes).
- examples/operator/vault/mcpserver-github-with-vault.yaml and docs/arch/04-secrets-management.md both place or recommend Vault annotations under podTemplateMetadataOverrides.annotations, the field unaffected by the precedence-base swap.

The four implementation commits were squashed for merge.
Special notes for reviewers
The fix is one token; the bulk of the diff is regression coverage. The drift-checker deploymentNeedsUpdate (~line 1797 in the same file) was already merging in the correct order; the constructor was the outlier. The MCPRemoteProxyReconciler.buildPodTemplateMetadata sibling at cmd/thv-operator/controllers/mcpremoteproxy_deployment.go:328 is the existing correct model.

Optional follow-up out of scope for this PR:
- ProxyDeploymentOverrides.PodTemplateMetadataOverrides in cmd/thv-operator/api/v1beta1/mcpserver_types.go:373 has no godoc comment (the equivalent in embeddingserver_types.go:155 does). Adding a comment that explicitly directs users to put sidecar-injection annotations here, not in the embedded Annotations field, would close a documentation gap surfaced by the analysis.

Generated with Claude Code