
KEP-5922: Conformance Tests for Out-of-Tree Networking Features#5923

Open
danwinship wants to merge 1 commit into kubernetes:master from danwinship:future-conformance

Conversation

@danwinship
Contributor

Networking is unusual among Kubernetes features in that while it is
required for conformance, much of it is implemented outside of
`kubernetes/kubernetes`, by people who are not always Kubernetes
developers, on schedules that are not always in sync with the
Kubernetes release cycle.

This makes it problematic to add new conformance requirements for
Kubernetes networking, since in many cases the conformance test won't
just be validating code that we already implemented in-tree, it will
be immediately imposing a requirement on third parties to have
implemented the feature in their own code before the next Kubernetes
release.

This KEP proposes a process for formally declaring that an e2e test
will become a conformance test in a specific future release, so that
third-party networking implementations will know they are required to
implement that behavior, and will have a reasonable amount of time to
do so.

/sig network architecture testing docs
/assign @aojea @thockin

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/docs Categorizes an issue or PR as relevant to SIG Docs. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 16, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 16, 2026
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 16, 2026
|[KEP-2433] `TopologyAwareHints` |1.33 |Some service proxies |:fearful: |
|[KEP-4444] `ServiceTrafficDistribution` |1.33 |Some service proxies |:fearful: |
|[KEP-3015] `PreferSameTrafficDistribution` |1.35 |Few service proxies |:rage: |

Contributor Author


- To promote a service DNS feature or behavior to "future
conformance", it would have to already be implemented correctly by
both `kube-dns` and `CoreDNS`.
Member

kube-dns may not be a good example; it does not use EndpointSlices (kubernetes/dns#504), so dual-stack is not handled correctly, IIRC.

@danwinship (Contributor Author), Feb 16, 2026

It's not an "example", it's a clarification of a requirement that already exists but was previously not explicitly stated: the existing GKE-based conformance jobs use kube-dns (right?), so new conformance tests must be able to pass in a cluster using kube-dns.

kubernetes/kubernetes#132019 was supposed to have made EndpointSlice support explicitly required for conformance... I guess it made EndpointSlice-based proxying a conformance requirement, but not EndpointSlice-based service DNS...

I'll need to revisit this as part of KEP-4974... (Is Google planning to ditch kube-dns? I thought I heard something about that...)

Member

> I'll need to revisit this as part of KEP-4974... (Is Google planning to ditch kube-dns? I thought I heard something about that...)

We should/must; @marqc should know better.

Contributor Author

I've added an item about this to this week's SIG Network agenda

conformance", it would have to already be implemented correctly by
both `kube-dns` and `CoreDNS`.

(NetworkPolicy, cloud load balancers, Ingress, and Gateway are
Member

I wonder if we should revisit NetworkPolicy, but this is just a drive-by comment, unrelated to the KEP itself.

@danwinship (Contributor Author), Feb 16, 2026

I went back and forth on whether to say something here, but yes, this does pave the way for making NP a requirement. (Though as Tim pointed out, if we think "NPv2" is coming maybe we should just wait for that...)

@aojea
Member

aojea commented Feb 16, 2026

/cc @BenTheElder @pohly

for the testing framework and mechanics

/cc @dims @johnbelamaric

for sig arch conformance

It is worth having this discussion, at least to have a reasonable path to unblock this deadlock of third parties not investing in being conformant, and us trying to be good guys by not breaking them.

@dims
Member

dims commented Feb 16, 2026

> It is worth having this discussion, at least to have a reasonable path to unblock this deadlock of third parties not investing in being conformant, and us trying to be good guys by not breaking them.

yes please


However, for people doing conformance testing of Kubernetes
distributions, failures in the "future conformance" tests would merely
result in warnings in the conformance test results, not failures. The
Contributor

I am not sure how to achieve the "merely result in warnings" part. Once a test runs, any failure is recorded as a failure in the JUnit result.

Maybe we can do some post-processing (we already implement our own JUnit reporting) and turn "failed" into "warning" in https://pkg.go.dev/github.com/onsi/ginkgo/v2@v2.28.1/reporters#JUnitFailure for tests which are marked as "are allowed to fail".

But the overall test suite result then will still be a failure. Hydrophone and Sonobuoy may have to be adapted to report this differently.

Contributor Author

I was originally imagining that, in the same way you can recover from a panic in Go, there might be some way to catch a ginkgo.Fail(). Then we could turn it into a Skip() instead. (There shouldn't normally be any skipped tests in the conformance results, so that would be something the testing helpers could recognize as being specific to this case.) But I couldn't find anything in the ginkgo docs suggesting anything like that would be possible...

A higher-effort version of that would be to have the test case itself Skip itself on failure, but that would imply it couldn't use a lot of helper functions (like, most of gomega), and there'd be the risk that we'd accidentally end up actually failing in some edge case.

Maybe "run the test suite twice" really is the best approach. I guess rather than splitting it into "all present+future tests" and "only present tests", the split could be "only present tests" and "only future tests", and the instructions for the actual conformance results would be the same as they are now, but then we'd suggest you could also run the future conformance tests separately to confirm your future conformance...

Contributor

> I was originally imagining that, in the same way you can recover from a panic in Go, there might be some way to catch a ginkgo.Fail()

That indeed does not work. As soon as ginkgo.Fail is called, the test is marked as failed. You can catch the special panic and continue the test, but there's no way to intercept the actual failure. The post-processing that I mentioned could have the same effect, though.

@BenTheElder (Member), Mar 9, 2026

> Maybe we can do some post-processing (we already implement our own JUnit reporting) and turn "failed" into "warning" in https://pkg.go.dev/github.com/onsi/ginkgo/v2@v2.28.1/reporters#JUnitFailure for tests which are marked as "are allowed to fail".

Oof, that seems confusing and problematic.

Maybe:

- It just fails
- We work with the CNCF before rolling out any of these to update the tooling used to enforce passing conformance to accept overall suite failure and specifically consider the tests, ignoring tests that are [FutureConformance] or something

People are typically running these tests under some wrapper, and often not inspecting the JUnit output at all.
I think a "failure" will be much more visible, and encourage actually fixing things.
But if we don't want to strictly require it to pass yet, then we can just have the conformance program permit those failures when a special tag is present.

I doubt any existing pipeline would surface a "warning", including our own.

Member

IOW: we treat this like any other test today: included in or excluded from the run. Those preparing for the future can run [(Future?)Conformance], those running current conformance can run [Conformance]

The tricky thing is getting buy-in to actually bother running these and preparing for them, otherwise this is essentially no different from just ... delaying promotion of an otherwise untagged test, as we sometimes do today.


This may require some combination of changes to:

- The k/k e2e framework
Contributor

I recently worked with the Ginkgo maintainers on supporting labeling tests with version numbers for arbitrary components: https://pkg.go.dev/github.com/onsi/ginkgo/v2#ComponentSemVerConstraint

A test could be tagged as ComponentSemVerConstraint("KubernetesConformance", ">=1.57").

Then `ginkgo --sem-ver-filter="KubernetesConformance=1.56"` does not include this test. `ginkgo --sem-ver-filter="KubernetesConformance=1.57"` does.

I believe (to be verified!) that the test also gets excluded when --sem-ver-filter is not used. This might not be what we want.

Contributor Author

If you use no --sem-ver-filter then it just ignores the version constraints and runs everything.
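A toy illustration of that filtering semantics (the real logic lives in ginkgo's ComponentSemVerConstraint handling; this sketch assumes only ">=" constraints and simple major.minor versions):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVer parses "MAJOR.MINOR" into comparable ints. Toy only:
// real semver parsing in ginkgo handles full version strings.
func parseVer(v string) (int, int) {
	parts := strings.SplitN(v, ".", 2)
	major, _ := strconv.Atoi(parts[0])
	minor, _ := strconv.Atoi(parts[1])
	return major, minor
}

// included reports whether a test carrying a ">=X.Y" constraint runs
// under the given filter version. An empty filter matches the
// behavior described above: no --sem-ver-filter runs everything.
func included(constraint, filter string) bool {
	if filter == "" {
		return true
	}
	want := strings.TrimPrefix(constraint, ">=")
	wMaj, wMin := parseVer(want)
	fMaj, fMin := parseVer(filter)
	return fMaj > wMaj || (fMaj == wMaj && fMin >= wMin)
}

func main() {
	fmt.Println(included(">=1.57", "1.56")) // false: not yet required
	fmt.Println(included(">=1.57", "1.57")) // true: now in conformance
	fmt.Println(included(">=1.57", ""))     // true: no filter runs all
}
```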

Given that, the _simplest_ approach would just be to tell people to
run the full present-and-future conformance suite first, and if it
passes, submit those results, but if it fails, re-run just the present
conformance suite, and submit the results of that.
Contributor

Maybe this "present" vs. "future" could be achieved with --sem-ver-filter.

@danwinship danwinship force-pushed the future-conformance branch 2 times, most recently from 6039cac to ae79957, March 9, 2026 14:03
@danwinship danwinship force-pushed the future-conformance branch from ae79957 to 1618991, March 9, 2026 14:03
@danwinship (Contributor Author)

@pohly updated with a clearer plan...


- To promote a pod networking-related feature or behavior to "future
conformance", it would have to already be implemented correctly by
both `kindnet` and "GKE Dataplane v1".
@BenTheElder (Member), Mar 9, 2026

We don't run CI with GKE AFAIK, is "GKE Dataplane v1" a reference to kube-up.sh?

Also, when we say kindnet, do we mean sigs.k8s.io/kindnet, or that which runs in a kind cluster OOTB which we use to run some CI with conformance tests project-wide (the former has cloud bits that the latter does not, their implementations have diverged)

x-ref #4224

@danwinship (Contributor Author), Mar 10, 2026

So again as I commented to Antonio in the discussion about kube-dns above, the text here is not trying to create new constraints. It is trying (and apparently failing) to describe a constraint that already exists, namely, "you can't even merge a PR adding a new conformance test unless the test passes in all of the always_run: true, optional: false k/k CI jobs".

By "GKE Dataplane v1", I meant "whatever networking we run in GCE CI jobs"... I thought "GKE Dataplane v1" was a correct way of describing that... In particular, to the best of my knowledge, pod networking in GCE CI jobs uses a CNI plugin called "kubenet" that is a descendant of the "kubenet" plugin that used to exist in-tree, but which no longer exists anywhere within the Kubernetes project.

Likewise, by "kindnet", I mean "whatever we run in the kind CI jobs"


Tests for future conformance should use `framework.ConformanceIt()`,
but should include the additional decorator
`framework.WithConformanceVersion(version)` with a future Kubernetes
Member

Do we expect this to interact with #4330 at all?

Contributor Author

AIUI, we don't support running the e2e tests against a cluster using compatibility/emulated versions. If you want to test against a cluster being run with --emulated-version=1.28 you have to run 1.28's e2e test binary. Right?

<<[UNRESOLVED] kube-dns? >>

I previously thought we still depended on kube-dns, but from
https://github.com/kubernetes/kubernetes/pull/137553, it seems we
@BenTheElder (Member), Mar 9, 2026

I don't think we do as a project.

I'm not sure if any ecosystem offerings still do, or if that's even reasonable to block on ... I lean towards: ~no, no.

Contributor Author

I was specifically worried about CI clusters, since kube-up.sh, etc, still defaults to kube-dns if CLUSTER_DNS_CORE_DNS is unset. But I missed the fact that cluster/gce/config-default.sh sets that to true, so you get CoreDNS in any job that doesn't explicitly set it to false. So it seems that all of our CI jobs use CoreDNS except for a few sig-network ones which are intentional kube-dns variants of other jobs.

As for the wider ecosystem, SIG Network is deprecating it (kubernetes/kubernetes#137556) after confirming that there is not enough usage of it for it to be reasonable to block on.



7 participants