From 16189913403298301599b387e9f39e8f2e92686e Mon Sep 17 00:00:00 2001 From: Dan Winship Date: Sat, 22 Nov 2025 13:02:53 -0500 Subject: [PATCH] KEP-5922: Conformance Tests for Out-of-Tree Networking Features --- .../5922-networking-conformance/README.md | 479 ++++++++++++++++++ .../5922-networking-conformance/kep.yaml | 19 + 2 files changed, 498 insertions(+) create mode 100644 keps/sig-network/5922-networking-conformance/README.md create mode 100644 keps/sig-network/5922-networking-conformance/kep.yaml diff --git a/keps/sig-network/5922-networking-conformance/README.md b/keps/sig-network/5922-networking-conformance/README.md new file mode 100644 index 000000000000..7656d64afccc --- /dev/null +++ b/keps/sig-network/5922-networking-conformance/README.md @@ -0,0 +1,479 @@ +# KEP-5922: Conformance Tests for Out-of-Tree Networking Features + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories](#user-stories) + - [Story 1: Conformance tests for out-of-tree features](#story-1-conformance-tests-for-out-of-tree-features) + - [Story 2: Conformance tests for existing-but-untested functionality](#story-2-conformance-tests-for-existing-but-untested-functionality) + - [Story 3: Rolling out new features that may confuse existing components](#story-3-rolling-out-new-features-that-may-confuse-existing-components) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Implementation details](#implementation-details) + - [Test Plan](#test-plan) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- 
[Alternatives](#alternatives) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) within one minor version of promotion to GA +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +Networking is unusual among Kubernetes features in that while it is +required for conformance, much of it is implemented outside of +`kubernetes/kubernetes`, by people who are not 
always Kubernetes +developers, on schedules that are not always in sync with the +Kubernetes release cycle. + +This makes it problematic to add new conformance requirements for +Kubernetes networking, since in many cases the conformance test won't +just be validating code that we already implemented in-tree; it will +be immediately imposing a requirement on third parties to have +implemented the feature in their own code before the next Kubernetes +release. + +This KEP proposes a process for formally declaring that an e2e test +will become a conformance test in a specific future release, so that +third-party networking implementations will know they are required to +implement that behavior, and will have a reasonable amount of time to +do so. + +(In theory the process here is not specific to SIG Network, but AFAIK +SIG Node is the only other SIG that has a component that is required +for conformance but which is externally developed with multiple +implementations (container runtimes) and they already have their own +rule for dealing with that. However, the [third user +story](#story-3-rolling-out-new-features-that-may-confuse-existing-components) +below might be applicable to another SIG at some point.) + +## Motivation + +According to [the CNCF's Conformance page], "Users want consistency +when interacting with any installation of Kubernetes". It seems clear +that SIG Network is _not_ delivering this: + + + +|KEP |GA in|Implemented by? |Status | +|----------------------------------------------|-----|----------------------------------------------|:---------------:| +|[KEP-2447] `service-proxy-name` label |1.14 |Most service proxies |:neutral_face: | +|[KEP-614] `SCTPSupport` |1.20 |Some pod networks, some service proxies |:fearful: | +|[KEP-752] `EndpointSliceProxying` |1.21 |All service proxies? |:smile: | +|[KEP-563] `IPv6DualStack` |1.23 |Most pod networks, most service proxies? 
|:neutral_face: | +|[KEP-1138] IPv6 single-stack |1.23 |Most pod networks, most service proxies? |:neutral_face: | +|[KEP-2365] `IngressClassNamespacedParams` |1.23 |(unknown%) ingress controllers |:thinking: | +|[KEP-2079] `NetworkPolicyEndPort` |1.25 |Most NetworkPolicy implementations |:neutral_face: | +|[KEP-1435] `MixedProtocolLBSVC` |1.26 |Few cloud load balancers |:rage: | +|[KEP-2086] `ServiceInternalTrafficPolicy` |1.26 |Some service proxies |:fearful: | +|[KEP-1669] `ProxyTerminatingEndpoints` |1.28 |Most service proxies |:neutral_face: | +|[KEP-3836] `KubeProxyDrainingTerminatingNodes`|1.31 |Few service proxies, few cloud load balancers?|:rage: | +|[KEP-1860] `LoadBalancerIPMode` |1.32 |Few service proxies, some cloud load balancers|:fearful: | +|[KEP-1880] `MultiCIDRServiceAllocator` |1.33 |??? |:exploding_head: | +|[KEP-2433] `TopologyAwareHints` |1.33 |Some service proxies |:fearful: | +|[KEP-4444] `ServiceTrafficDistribution` |1.33 |Some service proxies |:fearful: | +|[KEP-3015] `PreferSameTrafficDistribution` |1.35 |Few service proxies |:rage: | + +[the CNCF's Conformance page]: https://www.cncf.io/training/certification/software-conformance/#benefits + +[KEP-563]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/563-dual-stack +[KEP-614]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/614-SCTP-support +[KEP-752]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/0752-endpointslices +[KEP-1138]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/1138-ipv6 +[KEP-1435]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1435-mixed-protocol-lb +[KEP-1669]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/1669-proxy-terminating-endpoints +[KEP-1860]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/1860-kube-proxy-IP-node-binding +[KEP-1880]: 
https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1880-multiple-service-cidrs +[KEP-2079]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2079-network-policy-port-range +[KEP-2086]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2086-service-internal-traffic-policy +[KEP-2365]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2365-ingressclass-namespaced-params +[KEP-2433]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2433-topology-aware-hints +[KEP-2447]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2447-Make-kube-proxy-service-abstraction-optional +[KEP-3015]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/3015-prefer-same-node +[KEP-3836]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3836-kube-proxy-improved-ingress-connectivity-reliability +[KEP-4444]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/4444-service-traffic-distribution + +### Goals + +- Agree on a process for adding new conformance tests for behavior + which is implemented by out-of-tree components. + +- Make any necessary changes to the Kubernetes e2e framework, and to + sonobuoy and hydrophone, to support the new process. + +- Update the conformance documentation (both internal and CNCF) to + explain the new process. + +### Non-Goals + +- Actually promoting any new e2e tests to conformance; that will be + handled independently of the KEP. + +## Proposal + +### User Stories + +#### Story 1: Conformance tests for out-of-tree features + +As a SIG Network KEP author, I want users to be able to use the +feature that I developed, in all conforming Kubernetes clusters. 
+ +#### Story 2: Conformance tests for existing-but-untested functionality + +As a SIG Network Lead, I want to promote the e2e test `"should support +named targetPorts that resolve to different ports on different +endpoints"` to conformance ([kubernetes #132954]), since this has been +documented as an important feature of Services since [the earliest +online version of our docs]. However, I don't want to abruptly break +conformance in clusters using certain out-of-tree service proxy +implementations that are known to currently fail that test. + +[kubernetes #132954]: https://github.com/kubernetes/kubernetes/issues/132954 +[the earliest online version of our docs]: https://github.com/kubernetes/website/blob/c8dd8b8831db5cd7c862ac5631c4414a53ac021c/docs/user-guide/services/index.md + +#### Story 3: Rolling out new features that may confuse existing components + +As a SIG Network Lead, I want users in all clusters to be able to use +the `ServiceCIDR` API ([KEP-1880]) without it breaking third-party +networking components in their cluster. + +(Being able to extend the service CIDR range in a running cluster has +long been a requested feature, and it is now _theoretically_ possible. +However, some existing external networking components don't handle the +service CIDRs being changed after cluster install time (since this was +previously not possible), and would end up mis-routing traffic if new +service CIDRs were added later. While these components can be fixed to +take the `ServiceCIDR` API into account, the `ServiceCIDR` API itself +has no way of determining whether a given cluster includes components +that are incompatible with it. For now, we document how cluster +operators can disable the `ServiceCIDR` API via +`ValidatingAdmissionPolicy` if they know it won't work in their +cluster; it would be good if we could require components to eventually +support it.) 
+ +### Risks and Mitigations + +The entire KEP is about reducing risk: + + - Creating a well-defined process for adding new conformance + requirements for third-party networking components reduces the + risk that third-party implementers will be caught off guard and + not have enough time to implement the necessary features. + + - Having a mandatory lag time between announcing the new conformance + tests and having them actually become required reduces the risk + that we will accidentally introduce new conformance requirements + that are impossible for some third parties to implement (like the + old `timeoutSeconds` parameter for service session affinity, which + we had to demote from conformance after realizing it was too + specific to the kube-proxy `iptables` implementation ([kubernetes + #112806])). + + - Making it less risky to add networking conformance tests means SIG + Network is likely to add more of them in the future, which will + increase compatibility between various Kubernetes environments, + and decrease risk to users when migrating between different + providers. + +[kubernetes #112806]: https://github.com/kubernetes/kubernetes/pull/112806 + +## Design Details + +All conformance tests are labelled with the version of Kubernetes in +which the test first became part of conformance. I propose that we +allow adding conformance tests that are tagged with *future* release +numbers. This would be used to indicate that, while the test is not +required for conformance in the current release, it is intended to +become a conformance requirement in the indicated future release. + +For purposes of Kubernetes CI, these "future conformance" tests would +be treated no differently from "present-day conformance" tests: all +Kubernetes CI jobs that run `[Conformance]` tests would begin running +them immediately, and the job would fail if the test failed. 
+ +However, for people doing conformance testing of Kubernetes +distributions, failures in the "future conformance" tests would merely +result in warnings in the conformance test results, not failures. The +warnings should be obvious to the user, and should indicate in which +release the test is intended to become required for conformance. + +There are no explicit requirements for promotion to "future +conformance" beyond the usual [conformance test requirements]. +However, the fact that the test would already have to be able to pass +all existing conformance CI jobs would imply that: + + - To promote a pod networking-related feature or behavior to "future + conformance", it would have to already be implemented correctly by + both `kindnet` and "GKE Dataplane v1". + + - To promote a service proxying feature or behavior to "future + conformance", it would have to already be implemented correctly by + `kube-proxy` (specifically, the `iptables` mode of `kube-proxy`, + for the moment). + + - To promote a service DNS feature or behavior to "future + conformance", it would have to already be implemented correctly by + `CoreDNS`. + +``` +<<[UNRESOLVED] kube-dns? >> + +I previously thought we still depended on kube-dns, but from +https://github.com/kubernetes/kubernetes/pull/137553, it seems we +might already not? + +<<[/UNRESOLVED]>> +``` + +(NetworkPolicy, cloud load balancers, Ingress, and Gateway are +considered optional features, and are not covered by conformance.) + +The rules for picking a version for future conformance will be: + + - E2e tests of externally-implemented features associated with KEPs, + which go through the alpha → beta → GA cycle, can become + conformance requirements no sooner than: + + - 2 years after the e2e test is first merged to k/k (presumably + behind an Alpha feature gate). + + - 1 year after the KEP for the feature becomes `status: + implemented`. 
+ + - New e2e tests of externally-implemented features/behaviors not + associated with KEPs, or pre-existing e2e tests of + externally-implemented features/behaviors, can become conformance + requirements no sooner than: + + - 1 year after the e2e test for the feature/behavior is first + merged to k/k. + + - 1 year after the test is tagged for future conformance. + + - No new tests for externally-implemented features/behaviors shall + become conformance requirements until at least 1 year after *this* + KEP becomes `status: implemented`. + +(The requirements are stated in terms of years, but as proposed below, +would be implemented in terms of release versions. If the release +cadence changes in the future, it may be necessary to adjust the +target releases of current future conformance tests.) + +If necessary, a test that was marked for "future conformance" could be +demoted back to non-conformance before the release where it would have +become required. + +[conformance test requirements]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md#conformance-test-requirements + +### Implementation details + +Tests for future conformance should use `framework.ConformanceIt()`, +but should include the additional decorator +`framework.WithConformanceVersion(version)` with a future Kubernetes +version: + +```golang + framework.ConformanceIt("blah blah blah", framework.WithConformanceVersion("1.39"), ... +``` + +The question is what effect this should actually have on the test +name/labels... + +Sonobuoy and hydrophone both filter the e2e tests to those containing +exactly the string `[Conformance]`, so we need to avoid adding the +exact string `[Conformance]` to future-conformance tests. 
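The effect of that exact-string filter can be sketched in Go. This is only an illustration under stated assumptions: the regular expression mirrors the `\[Conformance\]` focus described in this KEP, while `selectedByFocus` and the sample test names are hypothetical and are not taken from sonobuoy, hydrophone, or ginkgo.

```go
package main

import (
	"fmt"
	"regexp"
)

// conformanceFocus mirrors the focus expression `\[Conformance\]`.
// The brackets are escaped, so only the literal text "[Conformance]"
// matches.
var conformanceFocus = regexp.MustCompile(`\[Conformance\]`)

// selectedByFocus is a hypothetical helper: it reports whether a test
// name would be picked up by a runner that focuses on the exact
// string "[Conformance]".
func selectedByFocus(name string) bool {
	return conformanceFocus.MatchString(name)
}

func main() {
	// Illustrative test names only; none of these are real e2e tests.
	names := []string{
		"[Conformance] blah blah blah",
		"[Conformance:1.39] blah blah blah",
		"[FutureConformance] blah blah blah",
	}
	for _, name := range names {
		fmt.Printf("%-40s selected=%v\n", name, selectedByFocus(name))
	}
}
```

Only the first name contains the literal `[Conformance]`, so any spelling that puts other characters inside the brackets stays invisible to a runner using this focus.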
+ +My first thought was to use `[Conformance:1.39]` for future +conformance, which works to make sonobuoy/hydrophone not see the +tests, but it turns out that ginkgo treats `[Conformance]` and +`[Conformance:...]` as essentially two unrelated labels, so matching +both groups in our own conformance tests would require a label filter +something like `'Conformance || !Conformance: isEmpty'`, which is no +simpler than just matching something more obvious like `'Conformance +|| FutureConformance'`. + +Ginkgo's new versioning constraints allow us to label tests with +versions and then select by version, but if you run without a +`--sem-ver-filter` (as sonobuoy and hydrophone do currently), then you +get all tests, so we can't *just* do `[Conformance]` plus a separate +version constraint. + +I think the best approach is to have +`framework.WithConformanceVersion(version)` just do +`ginkgo.ComponentSemVerCondition("Conformance", ">="+version)`. Then, +`framework.ConformanceIt` can scan for this constraint in its +arguments, and output the label `[FutureConformance]` rather than +`[Conformance]` if it is found (and refers to a version in the +future). + +So the test case above, running against 1.36, would show up as: + +``` + [FutureConformance] [Conformance: [>= 1.39]] blah blah blah +``` + +(and thus be ignored by sonobuoy and hydrophone) but running against +1.39 would show up as: + +``` + [Conformance] [Conformance: [>= 1.39]] blah blah blah +``` + +Our own CI conformance tests would be changed to run both +`Conformance` and `FutureConformance` tests, and would ignore the +version constraints. (Note that `[Conformance: [>= 1.39]]` is not a +label and can't be included or excluded via a label filter.) + +Sonobuoy and hydrophone would be updated to (a) have an option to run +`FutureConformance` tests, and (b) allow passing a target version and +converting that into an appropriate `-ginkgo.sem-ver-filter`, to allow +users to test against some or all future conformance tests. 
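The labelling rule proposed above (`[FutureConformance]` while the version under test is older than the conformance version, `[Conformance]` otherwise) can be sketched as standalone Go. This is a minimal sketch, not the e2e framework implementation: `parseVersion` and `conformanceLabel` are hypothetical helpers, versions are assumed to be plain `MAJOR.MINOR` strings, and nothing here calls ginkgo.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion is a hypothetical helper that splits a "MAJOR.MINOR"
// string such as "1.39" into its two integer components.
func parseVersion(v string) (major, minor int, err error) {
	parts := strings.SplitN(v, ".", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("expected MAJOR.MINOR, got %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// conformanceLabel is a hypothetical stand-in for the decision described
// in the text: it returns "[FutureConformance]" if the version under
// test (current) is older than the version in which the test becomes
// required (required), and "[Conformance]" otherwise.
func conformanceLabel(required, current string) (string, error) {
	rMajor, rMinor, err := parseVersion(required)
	if err != nil {
		return "", err
	}
	cMajor, cMinor, err := parseVersion(current)
	if err != nil {
		return "", err
	}
	if cMajor < rMajor || (cMajor == rMajor && cMinor < rMinor) {
		return "[FutureConformance]", nil
	}
	return "[Conformance]", nil
}

func main() {
	// Reproduce the two cases from the text for a test tagged "1.39".
	for _, current := range []string{"1.36", "1.39"} {
		label, err := conformanceLabel("1.39", current)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s [Conformance: [>= 1.39]] blah blah blah\n", current, label)
	}
}
```

Comparing major and minor numbers separately avoids the string-comparison trap where `"1.9"` would sort after `"1.39"`.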
+ +``` +<<[UNRESOLVED] >> + +Should we keep the `[Conformance: [>= 1.39]]` around in the output +even after 1.39, or should `WithConformanceVersion` turn into a no-op +at that point? Alternatively, should we retroactively label old +conformance tests with `WithConformanceVersion` so that they also get +labeled that way? (This could be done automatically based on the existing +conformance metadata.) + +<<[/UNRESOLVED]>> +``` + +### Test Plan + +This is proposing a change to testing itself. Other than perhaps some +unit tests of the changes to `conformance.yaml` generation, there is +unlikely to be any automated testing associated with it. Instead, if +there are any changes to our e2e infrastructure, we will need to just +manually confirm that they have the expected result. + +### Graduation Criteria + +Not really applicable; the new "future conformance" feature would be +GA as soon as it was fully implemented. Additionally, the primary +changes are to the conformance-testing process, *not* to Kubernetes +itself, so they can be added (or reverted) outside of the release +cycle. + +### Upgrade / Downgrade Strategy + +N/A + +### Version Skew Strategy + +The skew-able components here are: + + - The `e2e.test` binary + - sonobuoy / hydrophone + - The official ["How to submit conformance results"] instructions + +The documentation says to run `sonobuoy run +--mode=certified-conformance` or `hydrophone --conformance`. Both of +those commands specifically set a ginkgo focus of "`\[Conformance\]`", +which will continue to match only the "present-day conformance" tests, +even when an older sonobuoy/hydrophone (or older set of instructions) +is used with a newer `e2e.test` binary. + +["How to submit conformance results"]: https://github.com/cncf/k8s-conformance/blob/master/instructions.md + +## Production Readiness Review Questionnaire + +N/A: the KEP does not describe a change to the runtime behavior of +Kubernetes. 
+ +## Implementation History + +- 2026-02-15: Initial proposal +- 2026-03-07: Updated for comments + +## Drawbacks + +Although it would be possible to abuse the process proposed here, it +seems like "being able to add new networking conformance requirements +in a way that is friendly to third party implementations" is strictly +better than "not being able to add new networking conformance +requirements in a way that is friendly to third party +implementations". + +## Alternatives + +The obvious alternatives are (a) never add new conformance +requirements, and (b) add new conformance requirements whenever we +want to, without worrying about third party implementations. Neither +of these is a good alternative. (One could argue that all out-of-tree +networking features added since conformance was first defined in 1.9 +are inherently "optional" and thus not subject to conformance, but +that does not match the way that we document those APIs.) + +We could implement the same general idea as proposed here, but with no +formal infrastructure, by just having a rule like "if you want to +promote a test of an out-of-tree feature to conformance, you have to +write a blog post about it on the developer blog 1 year before you do +it so everyone will know". That would be simpler, but I don't think it +would be better. + +For features implemented by container runtimes, SIG Node uses the rule +that a required out-of-tree feature can be depended on once both +containerd and cri-o implement it. I don't think we could adopt a rule +like that for networking components. There are many more third-party +networking components than there are container runtimes (including +some important platform-specific ones in the "long tail"), and it +would probably not be either statistically or politically valid to try +to bless a specific small group of networking implementations as +"first among equals" in the way that SIG Node has blessed the two +major container runtimes. 
diff --git a/keps/sig-network/5922-networking-conformance/kep.yaml b/keps/sig-network/5922-networking-conformance/kep.yaml new file mode 100644 index 000000000000..494fdfb7f37c --- /dev/null +++ b/keps/sig-network/5922-networking-conformance/kep.yaml @@ -0,0 +1,19 @@ +title: Conformance Tests for Out-of-Tree Networking Features +kep-number: 5922 +authors: + - "@danwinship" +owning-sig: sig-network +participating-sigs: + - sig-architecture + - sig-testing + - sig-docs +status: provisional +creation-date: 2026-02-12 +reviewers: + - "@aojea" + - "@thockin" + - TBD +approvers: + - "@aojea" + - "@thockin" + - TBD