Add StorageVersionMigrator controller to clean up storedVersions #5011

Draft

ChrisJBurns wants to merge 9 commits into main from worktree-storage-version-controler

Conversation

@ChrisJBurns (Collaborator) commented Apr 22, 2026

Summary

Adds a new StorageVersionMigrator controller to the ToolHive operator that watches opted-in toolhive.stacklok.dev CRDs and reconciles each one's status.storedVersions down to the current storage version, so a future release can drop deprecated API versions (e.g. v1alpha1) from spec.versions without orphaning etcd objects.

Closes #4969

Medium level
  • Controller — cmd/thv-operator/controllers/storageversionmigrator_controller.go. Watches apiextensions.k8s.io/v1/CustomResourceDefinition (metadata-only), filtered by opt-in label + group. Per-CRD: reads spec.versions to find the storage version, lists every CR via a paginated unstructured list, then for each CR does a plain Get + Update of the unmodified live object. The apiserver re-encodes the request body at the current storage version and byte-compares against etcd; when the CR was stored at a different apiVersion (the migration scenario) the comparison fails and the write proceeds, re-encoding the object. When the CR is already at the current storage version, the comparison matches and the apiserver harmlessly elides the write — there was nothing to migrate. This matches upstream kube-storage-version-migrator's mechanism. Once every CR has been processed, patches status.storedVersions via optimistic-lock merge. 1h migration cache (with periodic background GC) dedupes Get+Update RPCs across reconciles.
  • Opt-in label — every v1beta1 root type carries a +kubebuilder:metadata:labels=toolhive.stacklok.dev/auto-migrate-storage-version=true marker, baked into the generated CRD YAML by task operator-manifests. New CRDs opt in by adding the same marker; CRDs that shouldn't be migrated omit it. A CI test (TestV1beta1TypesMarkerCoverage) fails the build if a root type is added without either this marker or an explicit +thv:storage-version-migrator:exclude sibling. (A minimal marker sketch follows this list.)
  • Feature flag — operator.features.storageVersionMigrator in the helm chart (default true) wires to ENABLE_STORAGE_VERSION_MIGRATOR following the existing ENABLE_SERVER / ENABLE_REGISTRY / ENABLE_VMCP pattern. Admins who prefer to run kube-storage-version-migrator externally can disable it.
  • Conflict-safe trim — if any per-CR re-store hits an IsConflict (concurrent writer), the controller refuses to trim storedVersions on that pass and returns a sentinel error so the next reconcile retries cleanly. The post-conflict state of the affected object is unverified, so trimming would be reasoning from an unproven premise.
  • Tests — envtest suite at cmd/thv-operator/test-integration/storageversionmigrator/ with 8 scenarios: noop-when-clean, happy-path-trims-storedVersions, re-encodes CRs that are stored at a prior storage version (the load-bearing test that proves the migration mechanism actually rewrites etcd — installs a CRD with v1alpha1 storage, creates a CR, flips storage to v1beta1, asserts the CR's resourceVersion bumped after reconcile), partial-failure leaves storedVersions untouched, conflict-during-restore leaves storedVersions untouched, skips-foreign-CRDs, skips-unlabelled-CRDs, pagination (continue-token loop verified via a counting APIReader wrapper). Each test installs its own CRD fixture and calls Reconcile directly — no background manager, no timing races.
  • Docs — docs/operator/storage-version-migration.md covers the actual write mechanism, webhook interaction, the opt-in label contract, feature-flag opt-out, the admin per-CRD escape hatch, and coordination with the future v1alpha1-removal release.
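
For reference, a minimal sketch of what the opt-in marker looks like on a root type. MCPServer is one of the labelled kinds; the spec/status structs below are stubs for illustration, not the real definitions from the respective *_types.go file:

```go
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Stub spec/status types for illustration only; the real definitions live in
// the corresponding *_types.go file.
type MCPServerSpec struct{}
type MCPServerStatus struct{}

// +kubebuilder:object:root=true
// +kubebuilder:storageversion
// +kubebuilder:metadata:labels=toolhive.stacklok.dev/auto-migrate-storage-version=true
type MCPServer struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MCPServerSpec   `json:"spec,omitempty"`
	Status MCPServerStatus `json:"status,omitempty"`
}
```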
Low level
| File | Change |
| --- | --- |
| cmd/thv-operator/controllers/storageversionmigrator_controller.go | Reconciler, SetupWithManager, migration cache + periodic GC, conflict-safe trim, RBAC markers |
| cmd/thv-operator/controllers/marker_coverage_test.go | Scans api/v1beta1/*_types.go and fails if a root type lacks either the migrate marker or an explicit exclude marker |
| cmd/thv-operator/test-integration/storageversionmigrator/suite_test.go | envtest bootstrap (no manager; direct Reconcile calls) |
| cmd/thv-operator/test-integration/storageversionmigrator/controller_test.go | 8 ginkgo scenarios incl. cross-version re-encode empirical proof |
| cmd/thv-operator/main.go | Register apiextensions/v1 scheme, add featureStorageVersionMigrator const, setupStorageVersionMigrator() wiring |
| cmd/thv-operator/api/v1beta1/*_types.go (12 files) | Add +kubebuilder:metadata:labels=... marker to each root type |
| deploy/charts/operator-crds/files/crds/*.yaml + templates/*.yaml (12 × 2) | Regenerated CRDs — label now present in each CRD's metadata.labels |
| deploy/charts/operator/templates/clusterrole/role.yaml | Regenerated RBAC — adds customresourcedefinitions get/list/watch + /status update;patch + toolhive.stacklok.dev/* get/list/update + */status update |
| deploy/charts/operator/values.yaml | New operator.features.storageVersionMigrator: true |
| deploy/charts/operator/templates/deployment.yaml | New ENABLE_STORAGE_VERSION_MIGRATOR env var |
| deploy/charts/operator/README.md | Regenerated via task helm-docs |
| docs/operator/storage-version-migration.md | New user-facing docs |

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation only

Test plan

  • task lint-fix — 0 issues
  • task build — clean
  • Operator unit tests (go test $(go list ./cmd/thv-operator/... | grep -v /test-integration)) — all green
  • Envtest suite — 8/8 pass (task operator-test-integration path, or directly with KUBEBUILDER_ASSETS=$(setup-envtest use 1.31.0 -p path) go test ./cmd/thv-operator/test-integration/storageversionmigrator/)
  • Marker coverage test — passes against all 12 current v1beta1 root types
  • task operator-manifests + task helm-docs produce no drift against committed state
  • helm template deploy/charts/operator renders the new env var correctly

Special notes for reviewers

The write mechanism in this PR has been through three iterations during review — final state described below; the journey is captured in the inline comment threads.

The current implementation is plain Get + Update of the unmodified CR — exactly what upstream kube-storage-version-migrator does. The Kubernetes apiserver's storage layer re-encodes the request body at the current storage version, then byte-compares against etcd. When the CR was originally stored at an older apiVersion (the actual migration scenario), the encoded apiVersion stamp differs from etcd's, the comparison fails, and the write proceeds — re-encoding the object. When the CR is already at the current storage version, the bytes match and the apiserver elides the write harmlessly. This is the upstream maintainers' explicit reasoning, captured at kube-storage-version-migrator#65.
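
As a sketch of that write path (restoreOne matches the name used elsewhere in this thread, but the body here is illustrative rather than the committed code):

```go
package controllers

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// restoreOne re-writes a single CR unchanged. The apiserver re-encodes the
// body at the current storage version and byte-compares against etcd: a CR
// stored at an older apiVersion differs and gets rewritten; an up-to-date CR
// matches and the write is elided.
func restoreOne(ctx context.Context, c client.Client, gvk schema.GroupVersionKind, key client.ObjectKey) error {
	live := &unstructured.Unstructured{}
	live.SetGroupVersionKind(gvk)
	if err := c.Get(ctx, key, live); err != nil {
		return err // NotFound/Conflict are classified by the caller
	}
	// No mutation: resourceVersion is carried over from the Get, so a
	// concurrent writer surfaces as IsConflict rather than a silent overwrite.
	return c.Update(ctx, live)
}
```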

A new envtest case (re-encodes CRs that are stored at a prior storage version) proves the migration mechanism empirically: install a CRD with v1alpha1 as storage, create a CR (etcd writes apiVersion: v1alpha1 bytes), flip storage to v1beta1, run the reconciler, and assert the CR's resourceVersion bumped. Observed bump 202→204; the elision sanity check on a same-storage-version CR shows RV stays at 204 as expected.

PR size: ~450 production lines across 15 files, slightly over the 400-line / 10-file convention. 12 of those files are 1-line kubebuilder marker additions in api/v1beta1/*_types.go; the substantive Go work is 3 files (controller, main.go wiring, helm values/deployment). I did not split because the controller + opt-in labels + marker-coverage CI test are one cohesive feature — splitting would mean landing unused labels or an unused controller mid-stack.

Why this PR must ship at least one release before v1alpha1 is removed: the migrator needs time to walk every cluster's storedVersions during the deprecation window. Shipping it in the same release as the version removal provides zero benefit because clusters jumping straight to that release won't have run the migrator yet. Documented in docs/operator/storage-version-migration.md.

Follow-ups not in scope for this PR:

  • Optional chainsaw smoke test against a real Kind cluster (envtest covers the same behaviour at the apiserver level; chainsaw would add coverage for the helm chart wiring and admission-webhook compatibility specifically).
  • Audit existing operator webhooks (MCPServer, MCPGroup, VirtualMCPServer) for compatibility with migration-induced Updates. Migration writes go through normal admission flow on objects that need re-encoding; webhooks should already tolerate metadata-stable Updates but worth empirical validation in a kind cluster.

🤖 Generated with Claude Code

ChrisJBurns and others added 7 commits April 22, 2026 13:25
Prepares the operator manager for an upcoming controller that watches
CustomResourceDefinition objects. No runtime behaviour change yet.

Part of #4969.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the kubebuilder marker

    +kubebuilder:metadata:labels=toolhive.stacklok.dev/auto-migrate-storage-version=true

to every v1beta1 root type and regenerates the CRD manifests. The label
is the opt-in signal for an upcoming StorageVersionMigrator controller
that will reconcile each CRD's status.storedVersions down to the current
storage version. List types are intentionally not labelled — the label
applies to the CRD itself, which is keyed on the root type.

Part of #4969.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces a new operator controller that watches opted-in
toolhive.stacklok.dev CRDs and reconciles each one's status.storedVersions
down to the current storage version, so a future release can drop
deprecated versions (e.g. v1alpha1) from spec.versions without orphaning
etcd objects.

Mechanism: for each CR of a managed kind, issue a metadata-only
Server-Side Apply against the /status subresource with a dedicated
field manager (thv-storage-version-migrator). SSA on /status bypasses
the admission chain (ToolHive has validating webhooks on several root
types) while still forcing the API server to re-encode each object at
the current storage version. Once every CR succeeds, the CRD's
status.storedVersions is patched to [<currentStorageVersion>] with an
optimistic-lock merge.

Scoping: CRDs must carry both the auto-migrate label (set via kubebuilder
marker in a previous commit) and a name under toolhive.stacklok.dev.
Wired into main.go behind the ENABLE_STORAGE_VERSION_MIGRATOR feature
flag (default on). Helm chart surface + envtest coverage land in
subsequent commits.

Part of #4969.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seven table-driven scenarios exercising the reconciler against a real
kube-apiserver + etcd via envtest:

  - noop when storedVersions is already [storageVersion]
  - happy path: [v1alpha1,v1beta1] -> [v1beta1] with CRs re-stored
  - foreign-group CRDs are skipped
  - unlabelled toolhive CRDs are skipped
  - fallback to main-resource SSA when /status is absent
  - pagination across the continue-token loop (verified via a counting
    APIReader wrapper since metadata-only SSAs leave no managedFields
    fingerprint to assert on)
  - partial failure leaves storedVersions untouched (failure injected
    via a client wrapper that fails Apply for one specific CR key)

Each test builds its own CRD fixture and calls Reconcile directly; the
suite does not start a manager so tests are deterministic and free from
races with a background controller. Runs via `task operator-test-integration`
alongside the existing virtualmcp suite.

Part of #4969.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Guards against the main footgun of the opt-in-label design for
storage-version migration: if a new CRD root type is added to
cmd/thv-operator/api/v1beta1/ without the migrate-marker, the
StorageVersionMigrator controller silently excludes it. The problem
surfaces only when a future release tries to drop a deprecated version
— at which point it is far too late to fix.

The test scans every root type (marker block contains both
+kubebuilder:object:root=true and +kubebuilder:storageversion) and
fails unless the block also contains either the migrate marker or an
explicit +thv:storage-version-migrator:exclude sibling marker.

Part of #4969.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds operator.features.storageVersionMigrator (default true) plus the
ENABLE_STORAGE_VERSION_MIGRATOR env var wiring in the deployment
template, mirroring the existing server/registry/virtualMCP pattern.
Helm docs regenerated.

Admins who run kube-storage-version-migrator externally can disable
the in-operator controller by setting this flag to false.

Part of #4969.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers what the controller does, the opt-in label contract, how to
disable the controller operator-wide, the per-CRD escape hatch and its
interaction with GitOps, how the migrator interacts with future
version-removal releases, and the required RBAC.

Part of #4969.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jhrozek (Contributor) left a comment

A few things I'd like to flag. The two bigger ones are about the SSA body being empty and about swallowed conflicts before we trim storedVersions, curious whether those are intentional. The rest are smaller.

patch.SetUID(original.GetUID()) // UID mismatch → typed conflict on races
patch.SetResourceVersion(original.GetResourceVersion()) // RV mismatch → typed conflict on races

applyConfig := client.ApplyConfigurationFromUnstructured(patch)
@jhrozek (Contributor):

The SSA body here only carries name/namespace/uid/resourceVersion, no fields inside spec or status. On /status SSA that's effectively an empty apply, which the apiserver typically short-circuits without rewriting etcd. If that's what's happening then we trim storedVersions while the underlying bytes haven't actually been re-encoded, which is the exact footgun this controller is meant to prevent.

For reference, kube-storage-version-migrator does an Update with the full re-decoded object precisely to force the storage write. Another option is to include a single managed field here (e.g. an annotation like toolhive.stacklok.dev/last-restored-at) so the apply is non-empty.

Worth checking: is the metadata-only SSA actually forcing a re-encode in practice? The existing envtest asserts storedVersions converges but can't distinguish "bytes rewritten" from "storedVersions lied".

@ChrisJBurns (Collaborator, Author):

Empirically confirmed — wrote a probe that runs the exact metadata-only /status SSA against envtest and snapshots resourceVersion before/after. Result: rvBefore=201 rvAfter=201 changed=false. The apiserver short-circuits the empty apply, etcd is never re-encoded, and the controller's correctness claim was hollow.

Fixed in dad30c9 by switching to Get + Update on the main resource, mutating the toolhive.stacklok.dev/storage-version-migrated-at annotation to force a real diff. This matches the upstream kube-storage-version-migrator pattern. Plain Status().Update(ctx, live) was tried first and is also insufficient — Update with no diff is elided too, so the annotation toggle is load-bearing.

Trade-off: the Update goes through the main resource so admission webhooks see the write. Acknowledged in the controller docstring and docs/operator/storage-version-migration.md. Webhooks that need to ignore migration-only writes can branch on the annotation in the diff. The original /status webhook-bypass benefit was always notional since the SSA wasn't writing etcd at all.

The HappyPath envtest now snapshots each CR's resourceVersion before reconcile and asserts it has bumped — empirical proof the bug is fixed.

if err := r.restoreOne(ctx, gvk, u, hasStatusSubresource); err != nil {
if apierrors.IsNotFound(err) || apierrors.IsConflict(err) {
logger.V(1).Info("skip CR — deleted or updated elsewhere",
"object", client.ObjectKeyFromObject(u), "err", err)
@jhrozek (Contributor):

IsNotFound is fine to swallow, but I'm not sure IsConflict is. The comment says "storage is already fresh", but a concurrent writer using the deprecated API version (still served during the deprecation window) produces a write encoded at the old version. If our SSA races with that writer and comes back Conflict, the CR is still stored at the old version, we silently skip it, and then patchStoredVersions trims storedVersions to [storageVersion]. That leaves storedVersions claiming we're clean while there's still an old-version-encoded object in etcd.

Options: retry the apply on Conflict (bounded), or track a "swallowed-conflict" flag and refuse to trim storedVersions on the pass where any occurred. Either way I don't think we should treat Conflict the same as NotFound here.

@ChrisJBurns (Collaborator, Author):

Agreed — fixed in dad30c9. restoreCRs now tracks per-pass conflict count in a function-local counter. IsConflict is logged at V(1) and skipped per-CR (still treated as transient), but if any conflicts occurred and no other errors did, the function returns a sentinel errMigrationRetriedDueToConflicts. Reconcile propagates it without calling patchStoredVersions, so storedVersions stays at [v1alpha1, v1beta1] until a conflict-free pass succeeds.

Added a new test ConflictDuringRestoreLeavesStoredVersionsUntouched that injects an apierrors.NewConflict(...) for one CR via a wrapping client, asserts the sentinel error is returned, asserts storedVersions is untouched, then drops the injection and verifies the next reconcile completes cleanly.

IsNotFound is still a clean swallow with no error — the object is gone, it can't be at the old version anymore.
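
A condensed sketch of that error classification, assuming the per-CR errors are collected into a slice (the PR tracks a function-local counter inside restoreCRs instead; classifyRestoreErrors is a hypothetical helper):

```go
package controllers

import (
	"errors"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

var errMigrationRetriedDueToConflicts = errors.New(
	"storage version migration hit conflicts; storedVersions left untouched")

func classifyRestoreErrors(errs []error) error {
	conflicts := 0
	for _, err := range errs {
		switch {
		case apierrors.IsNotFound(err):
			// Object is gone; it cannot be stored at the old version anymore.
		case apierrors.IsConflict(err):
			conflicts++ // transient, but the post-conflict state is unverified
		default:
			return fmt.Errorf("restore failed: %w", err)
		}
	}
	if conflicts > 0 {
		// Refuse to trim storedVersions this pass; the next reconcile retries.
		return errMigrationRetriedDueToConflicts
	}
	return nil
}
```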

}
}

func (c *migrationCache) has(crdName string, uid apitypes.UID, resourceVersion string) bool {
@jhrozek (Contributor):

The cache only evicts expired entries when has() is called with the same key. Once a CR is deleted from the cluster, its UID never comes up in a list again, so its entry stays resident until the operator restarts. For an operator that runs for weeks with CR churn (ephemeral MCPServers, GitOps recreations) the map grows unbounded; each entry is small but never released.

Is this an accepted tradeoff given the 1h TTL and typical restart cadence? If so, happy to drop this. Otherwise a background sweep via mgr.Add(RunnableFunc(...)) or an expirable LRU would bound it.

@ChrisJBurns (Collaborator, Author):

Right, the lazy eviction never frees entries for deleted CRs. Fixed in dad30c9 by adding a periodic background sweep: a manager.RunnableFunc registered in SetupWithManager ticks every 10m (configurable via the new CacheGCInterval field) and calls cache.gc() to evict expired entries unconditionally. Ticker is Stop()'ed via defer; the runnable returns on ctx.Done() so it shuts down cleanly with the manager.

Went with hand-rolled GC over k8s.io/utils/lru to keep the public API surface small. Easy to swap later if we want capacity-bounding too.
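
A sketch of that wiring, with the cache reduced to a stub (registerCacheGC is a hypothetical helper; the PR registers the runnable inside SetupWithManager):

```go
package controllers

import (
	"context"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// Stubbed cache: the real one keys entries on CRD name + UID + resourceVersion
// with a 1h TTL.
type migrationCache struct{}

func (c *migrationCache) gc() { /* evict expired entries unconditionally */ }

// registerCacheGC sweeps the cache every interval until the manager stops.
func registerCacheGC(mgr manager.Manager, cache *migrationCache, interval time.Duration) error {
	return mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return nil // shut down cleanly with the manager
			case <-ticker.C:
				cache.gc()
			}
		}
	}))
}
```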

servedCount++
}
}
return servedCount > 1
@jhrozek (Contributor):

I think this branch produces the opposite of what the doc comment describes. When storedVersions is already [storageVersion] and more than one version is still served (the common long-lived state during a multi-release deprecation window), this returns true, which forces a full list pass on every CRD event. The migrationCache suppresses the per-CR SSAs but the list call itself still fires, and a restart clears the cache.

If we want to re-scan periodically during deprecation as a defensive measure, worth stating so and acknowledging the cost. If not, returning len(stored) != 1 || stored[0] != storageVersion on its own should be sufficient; new writes at the old version would append to storedVersions and re-trigger migration via the first branch.

@ChrisJBurns (Collaborator, Author):

Good catch — you're right that under None-strategy with identical schemas, normal writers cannot reintroduce stale versions to storedVersions, so the defensive re-scan has no scenario to defend against. Simplified in dad30c9 to your suggested form: return len(stored) != 1 || stored[0] != storageVersion. Doc comment updated to be honest about the (lack of) defensive value.
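
For clarity, the simplified check as a standalone function (signature illustrative):

```go
// isMigrationNeeded reports whether the CRD still needs a migration pass:
// anything other than exactly [storageVersion] does.
func isMigrationNeeded(stored []string, storageVersion string) bool {
	return len(stored) != 1 || stored[0] != storageVersion
}
```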


//+kubebuilder:rbac:groups=apiextensions.k8s.io,resources=customresourcedefinitions,verbs=get;list;watch
//+kubebuilder:rbac:groups=apiextensions.k8s.io,resources=customresourcedefinitions/status,verbs=update;patch
//+kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=*,verbs=get;list;patch
@jhrozek (Contributor):

Minor, just wanted to surface this for sign-off. The wildcard grants get/list/patch across every current and future kind in the toolhive group, not just the opted-in ones. I get that the opted-in set isn't known at codegen time, so the alternative (explicit per-kind rules) is awkward. Probably fine, but worth a brief comment on the marker noting the intentional wildcard plus the isManagedCRD gate as defense-in-depth.

@ChrisJBurns (Collaborator, Author):

Added the comment in dad30c9 — explicit note above the kubebuilder marker block explaining the wildcard is intentional (opt-in set isn't known at codegen time), with isManagedCRD documented as the runtime gate forming defence in depth.

While I was there: the verbs themselves changed too — patch → update for both the main resource and /status to match the new write path (Update + annotation toggle, see your other comment). RBAC regenerated via task operator-manifests.
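
Illustratively, the post-fix marker block might read as follows; the comment wording is paraphrased, and the toolhive verbs mirror the regenerated role described in the PR summary rather than the committed file:

```go
// The toolhive.stacklok.dev wildcard is intentional: the opted-in CRD set is
// not knowable at codegen time. isManagedCRD gates at runtime, so the wildcard
// plus that check forms defence in depth.
//+kubebuilder:rbac:groups=apiextensions.k8s.io,resources=customresourcedefinitions,verbs=get;list;watch
//+kubebuilder:rbac:groups=apiextensions.k8s.io,resources=customresourcedefinitions/status,verbs=update;patch
//+kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=*,verbs=get;list;update
//+kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=*/status,verbs=update
```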

// Per-kind escape hatch: remove the label from the CRD (emergency only — will
// be re-applied by GitOps / helm upgrade).
type StorageVersionMigratorReconciler struct {
client.Client // cached reads for CRs (unused in this reconciler, kept for kubebuilder convention)
@jhrozek (Contributor):

Nit: the comment says "unused in this reconciler, kept for kubebuilder convention", but the field is used on lines 280 (r.Client.Status().Apply), 285 (r.Apply), and 301 (r.Client.Status().Patch). A maintainer reading the comment might assume they can drop it. Maybe change to something like "used for SSA writes and status patch; reads go through APIReader".

@ChrisJBurns (Collaborator, Author):

Yep, embarrassing comment drift — the field is used at three call sites. Fixed in dad30c9:

client.Client                      // used for /status writes (CR re-stores, CRD storedVersions patch); reads go through APIReader

(Slightly tweaked from your suggestion to mention CRD storedVersions, since the reconciler also uses Client.Status().Patch on the CRD itself.)

var _ = AfterSuite(func() {
By("tearing down envtest")
cancel()
time.Sleep(100 * time.Millisecond)
@jhrozek (Contributor):

The suite doesn't start a controller manager per the docstring, so I'm not sure what this sleep is waiting for. cancel() doesn't block and there are no background goroutines that need to drain. If it turns out envtest.Stop() flakes without it, there's usually a proper fix (draining a specific channel) rather than a fixed delay. Otherwise it can probably just go.

@ChrisJBurns (Collaborator, Author):

Right, dead code copy-pasted from virtualmcp/suite_test.go (which actually has a manager goroutine). Removed in dad30c9 along with the now-unused time import.

ChrisJBurns and others added 2 commits April 29, 2026 15:02
…n toggle

Empirical verification (probe test against envtest) confirmed the
reviewer's hypothesis: metadata-only Server-Side Apply on /status is
short-circuited by the apiserver — resourceVersion does not bump and
etcd is not re-encoded. The "force a re-encode" claim the original
implementation rested on was hollow. This commit replaces the SSA path
with the kube-storage-version-migrator pattern: Get the live object,
toggle a timestamp annotation to force a real diff, then Update.

Trade-off acknowledged in the controller docstring and user docs: the
Update goes through the main resource so admission webhooks see it.
Webhooks that need to ignore migration-only writes can branch on the
presence of toolhive.stacklok.dev/storage-version-migrated-at in the
diff. The original /status webhook-bypass benefit was always notional —
the SSA wasn't writing etcd at all.

Other fixes from the same review pass:

- Track per-pass IsConflict count in restoreCRs; if any conflicts were
  swallowed and no other errors occurred, return a sentinel error so
  Reconcile leaves storedVersions untouched (the post-conflict state of
  the affected object is unverified). Next reconcile retries cleanly.
  New ConflictDuringRestoreLeavesStoredVersionsUntouched test covers
  this.
- Bound migrationCache memory: register a manager.RunnableFunc that
  walks the cache every 10m and evicts expired entries. Prior code only
  evicted lazily on key re-query, which leaked entries for deleted CRs.
- Simplify isMigrationNeeded to a single line. The servedCount > 1
  branch was forcing list passes during deprecation windows for no
  actual benefit (None conversion + identical schemas means writers
  cannot reintroduce stale storedVersions).
- Document the wildcard CR RBAC as defence-in-depth alongside the
  isManagedCRD runtime gate.
- Fix the stale "(unused in this reconciler)" comment on the embedded
  Client field — it is used for CR Updates and the CRD status patch.
- Drop a 100ms time.Sleep in AfterSuite that was dead code copied from
  virtualmcp's suite (which has a manager goroutine to drain; we don't).

HappyPath now snapshots each CR's resourceVersion before reconcile and
asserts it has bumped after — empirical proof the underlying re-encode
actually happened. All 8 envtest cases pass; lint clean.

Refs PR #5011 review by @jhrozek.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The annotation toggle was a workaround for a misdiagnosed problem.
Earlier in this branch we observed empty SSAs being elided and
generalised it to "all unmodified writes elide" — and added a
toolhive.stacklok.dev/storage-version-migrated-at annotation toggle
to force a real diff per write. A second-opinion investigation
pointed out the elision check happens AFTER storage encoding: the
apiserver re-encodes the request body at the current storage version,
then byte-compares against etcd. For the actual migration scenario
(CR stamped at an older apiVersion in etcd, storage version now newer)
the encoded apiVersion differs, the comparison fails, and the write
proceeds. Verified empirically: a CR stored at v1alpha1 with storage
flipped to v1beta1 has plain Get+Update bump RV from 202 to 204; a CR
already at v1beta1 has plain Get+Update elide cleanly. Both behaviours
are correct — the elision is exactly the no-migration-needed case.

Changes:
  - restoreOne: plain Get + Update of the unmodified live object. No
    annotation, no timestamp, no MigrationTimestampAnnotation constant.
  - Controller docstring rewritten to describe the actual mechanism
    (encode-then-compare elision) and cite ksvm#65 for posterity.
  - HappyPath envtest loosened: per-CR resourceVersion bump assertions
    are dropped (already-clean CRs correctly elide; checking RV bumps
    on them was an artefact of the annotation-induced writes). The
    storedVersions convergence assertion remains as the contract test.
  - New cross-version envtest "re-encodes CRs that are stored at a
    prior storage version" exercises the real migration scenario:
    install CRD with v1alpha1 storage, create CR, flip storage to
    v1beta1, run reconciler, assert storedVersions trims and the CR's
    RV bumped (proving the apiserver actually rewrote etcd).
  - User docs updated to drop annotation references and describe the
    real mechanism. Reference implementation citation switched from
    cluster-api/crdmigrator to kube-storage-version-migrator (the
    actual upstream SIG tool we share semantics with).
  - FallsBackWhenNoStatusSubresource test removed: the SSA path it
    exercised no longer exists. Plain Update on a CRD without /status
    is just a normal Update; covered by the happy-path spec.

8/8 envtest cases pass. Lint clean. No RBAC drift (verbs already
correct).

Refs PR #5011 review by @jhrozek.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ChrisJBurns (Collaborator, Author) commented:

Design correction — withdrawing the "upstream is broken" claim

I owe a correction on the design pivot from dad30c90. That commit message and the inline replies on this PR both claimed kube-storage-version-migrator upstream and Cluster-API's crdmigrator were silently broken, based on probes that showed plain Update of an unmodified CR was elided by the apiserver. That conclusion was wrong — I was testing the wrong scenario.

The corrected understanding (verified against ksvm#65 and confirmed empirically against our envtest in ab03d5e1):

  • The apiserver's storage layer elision happens after the storage encoder runs, comparing freshly-encoded bytes against etcd bytes.
  • The encoded form includes the storage version's apiVersion stamp.
  • When etcd holds bytes stamped at an older version (the actual migration scenario) and the controller writes via the current storage version, the encoded bytes differ → write proceeds → RV bumps → etcd re-encoded.
  • When etcd is already at the current storage version (no migration needed), the bytes match → write elides — and that's correct, not a bug.

My earlier probes only tested the second case (CR already at storage version), saw the elision, and overgeneralised. The probe in ab03d5e1's new envtest case exercises the actual cross-version migration scenario and confirms plain Update bumps RV correctly.

Practical consequence for this PR: the annotation toggle (toolhive.stacklok.dev/storage-version-migrated-at) added in dad30c90 was unnecessary. ab03d5e1 removes it — the controller now does plain Get + Update, which matches upstream kube-storage-version-migrator exactly. CRs no longer carry the persistent migration annotation that earlier review surfaced as a concern.

Apologies to the upstream maintainers of kube-storage-version-migrator and cluster-api/crdmigrator for the mischaracterisation. Both projects' mechanisms are correct for the migration scenario they target.
