Simplify hasHealthyEndpoint logic and fix edge cases with reused IPs#2908

Merged
oribon merged 2 commits into metallb:main from tarabrind:fix/eps-with-the-same-ips
Mar 15, 2026

Conversation

@tarabrind
Contributor

@tarabrind tarabrind commented Dec 23, 2025

Is this a BUG FIX or a FEATURE ?:

/kind bug

What this PR does / why we need it:

In environments like KubeVirt, virtual machines migrating between nodes can lead to multiple endpoints sharing the same IP address. For example, during migration, an old pod might stay in a Completed state on the source node while a new one is Running on the destination node, both sharing the same IP.

The previous implementation of hasHealthyEndpoint in bgp_controller.go was problematic because it tracked readiness via an intermediate map keyed by the endpoint's IP address (ready := map[string]bool{}). This led to redundant logic (readiness belongs to the Endpoint, not the IP) and potential fragility when IPs are reused across different pod lifecycles.

Real-world Scenario:

Here is an example of such a state from a live cluster:

Pods:

# kubectl get pods -o wide
NAME                                READY   STATUS      RESTARTS   AGE    IP             NODE           NOMINATED NODE   READINESS GATES
virt-launcher-ubuntu-vm-one-pp6sq   0/1     Completed   0          4d2h   10.88.10.7     virtlab-ap-1   <none>           1/1
virt-launcher-ubuntu-vm-one-sjxcw   1/1     Running     0          34s    10.88.10.7     virtlab-ap-2   <none>           1/1

EndpointSlice entries:

# kubectl get endpointslices.discovery.k8s.io vm-ssh-xk6g6 -o yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.88.10.7
  conditions:
    ready: false
    serving: false
    terminating: false
  nodeName: virtlab-ap-1
  targetRef:
    kind: Pod
    name: virt-launcher-ubuntu-vm-one-pp6sq
    namespace: default
    uid: 938eea92-6b6c-4df2-9edb-452aa20d727b
- addresses:
  - 10.88.10.7
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: virtlab-ap-2
  targetRef:
    kind: Pod
    name: virt-launcher-ubuntu-vm-one-sjxcw
    namespace: default
    uid: 10df3525-1eab-460b-b881-3187899de323
kind: EndpointSlice
...

This PR refactors hasHealthyEndpoint to use a simplified map-based approach that groups readiness by IP address. This ensures that:

  • Health is evaluated correctly even when multiple Endpoints share the same IP (KubeVirt scenario).
  • If at least one endpoint for a given IP is healthy, the IP is considered ready.
  • The TestBGPSpeakerEPSlices case "Endpoint on our node has some unready ports" asserts the correct behavior; it previously expected no announcement if ANY endpoint for an IP was unready, which is wrong for the KubeVirt scenario.
  • The logic is cleaner than the original version while being more robust against IP reuse than a simple "any-ready" check.
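The "ready wins" rule from the list above can be sketched in isolation. This is a minimal illustration, not the MetalLB code: the `endpoint` type and the `readyByIP` helper are hypothetical stand-ins, and node filtering is left out.

```go
package main

import "fmt"

// endpoint is a simplified stand-in for discovery.Endpoint
// (assumption: only the fields needed for this illustration are modeled).
type endpoint struct {
	addresses []string
	ready     bool
}

// readyByIP groups readiness per IP address. Once an IP is marked ready,
// a later unready endpoint with the same IP cannot override it.
func readyByIP(eps []endpoint) map[string]bool {
	ready := map[string]bool{}
	for _, ep := range eps {
		for _, addr := range ep.addresses {
			if ready[addr] {
				continue
			}
			ready[addr] = ep.ready
		}
	}
	return ready
}

func main() {
	// Mirrors the EndpointSlice above: same IP, one unready and one ready endpoint.
	eps := []endpoint{
		{addresses: []string{"10.88.10.7"}, ready: false},
		{addresses: []string{"10.88.10.7"}, ready: true},
	}
	fmt.Println(readyByIP(eps)["10.88.10.7"]) // true
}
```

The result does not depend on the order in which the two endpoints are visited, which is the property the KubeVirt scenario needs.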

Special notes for your reviewer:

While this refactoring aligns the BGP controller's intent with the Layer2 controller's activeEndpointExists, I intentionally kept a simplified map-based approach instead of a pure early return. This was done to:

  1. Maintain compatibility with the existing BGP test suite (specifically TestBGPSpeakerEPSlices), which expects per-IP readiness evaluation.
  2. Ensure that "Ready" endpoints correctly shadow "Unready" ones for the same IP (fixing the KubeVirt migration issue) while keeping the code structure familiar to the original implementation.

Release note:

Simplify hasHealthyEndpoint logic in BGP controller to correctly handle endpoint health when IP addresses are reused (e.g., during KubeVirt migrations).

@tarabrind tarabrind marked this pull request as ready for review December 23, 2025 16:37
@tarabrind tarabrind marked this pull request as draft December 25, 2025 14:16
@tarabrind tarabrind force-pushed the fix/eps-with-the-same-ips branch from 20f507f to cac19a5 Compare December 25, 2025 14:20
@tarabrind
Contributor Author

I've pushed a second commit to fix the CI failures. The TestBGPSpeakerEPSlices suite had a case that expected no announcement if any endpoint for an IP was unready. I've updated the logic to ensure a healthy endpoint takes precedence (crucial for KubeVirt migrations) and adjusted the test case accordingly. PR description is updated with details.

@tarabrind tarabrind marked this pull request as ready for review December 25, 2025 14:31
@tarabrind
Contributor Author

Hi @fedepaol, @oribon! Gentle ping on this one. Let me know if you need anything else from my side.

@oribon
Member

oribon commented Jan 27, 2026

sorry for the late reply!
I validated (together with AI) that what you propose aligns with k8s:
https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/#duplicate-endpoints
which points to the EndpointSliceCache code within kube-proxy, which says:

		// On the other hand, there maybe also be two *different* Endpoints (i.e.,
		// with different targetRefs) that point to the same IP, if the pod
		// network reuses the IP from a terminating pod before the Pod object is
		// fully deleted. In this case we want to prefer the running pod over the
		// terminating one. (If there are multiple non-terminating pods with the
		// same podIP, then the result is undefined.)

I haven't gone over the exact changes, but ccing @fedepaol to also give this a look as he has done most (or all) of the work regarding how we treat epslices
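The preference described in the quoted comment can be sketched as follows. This is only an illustration of the stated rule, not kube-proxy's implementation; the `endpoint` type and the `preferRunning` helper are hypothetical.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified record of one EndpointSlice entry.
type endpoint struct {
	ip          string
	podName     string
	terminating bool
}

// preferRunning keeps one endpoint per IP. A non-terminating endpoint
// replaces a terminating one that was recorded earlier for the same IP.
func preferRunning(eps []endpoint) map[string]endpoint {
	byIP := map[string]endpoint{}
	for _, ep := range eps {
		prev, seen := byIP[ep.ip]
		if !seen || (prev.terminating && !ep.terminating) {
			byIP[ep.ip] = ep
		}
	}
	return byIP
}

func main() {
	eps := []endpoint{
		{ip: "10.88.10.7", podName: "old-pod", terminating: true},
		{ip: "10.88.10.7", podName: "new-pod", terminating: false},
	}
	fmt.Println(preferRunning(eps)["10.88.10.7"].podName) // new-pod
}
```

As in the quoted comment, the sketch makes no promise about which endpoint is kept when several non-terminating endpoints share an IP; here the first one seen simply stays.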

// Only set true if nothing else has expressed an
// opinion. This means that false will take precedence
// if there's any unready ports for a given endpoint.
if epslices.EndpointCanServe(ep.Conditions) {
Member

this is hard to follow compared to returning early. The semantic here is, if at least one endpoint with this address is ready, then let's consider it ready.

So, I'd flip the logic to

if ready[addr] { // continue: with multiple endpoints sharing the same address, one ready endpoint is enough
   continue
}

ready[addr] = epslices.EndpointCanServe(ep.Conditions)

This is way more explicit and easy to understand imo

Contributor Author

Thanks for the feedback! I've refactored hasHealthyEndpoint to correctly handle reused IPs as suggested.

@fedepaol
Member

I'd add a unit test for this scenario.
Also, please describe why you are doing this change in the commit message.
Thanks!

@tarabrind
Contributor Author

I'd add a unit test for this scenario. Also, please describe why you are doing this change in the commit message. Thanks!

I also added a new unit test TestHasHealthyEndpoint covering these edge cases (duplicate IPs with mixed readiness states).

@tarabrind tarabrind requested a review from fedepaol February 5, 2026 13:11
Copilot AI review requested due to automatic review settings March 2, 2026 15:06
@tarabrind tarabrind force-pushed the fix/eps-with-the-same-ips branch from c173d0a to eb29b7b Compare March 2, 2026 15:06
Copilot AI left a comment

Pull request overview

Refactors the BGP controller’s endpoint health evaluation to correctly handle EndpointSlice scenarios where multiple endpoints share the same IP (e.g., during KubeVirt VM migrations), and updates/extends tests to reflect the corrected behavior.

Changes:

  • Updates hasHealthyEndpoint to treat an IP as healthy if any endpoint for that IP can serve (so ready endpoints “win” over unready ones for the same address).
  • Fixes the expected advertisement behavior in TestBGPSpeakerEPSlices for the reused-IP/unready+ready mix case.
  • Adds a focused unit test suite for hasHealthyEndpoint covering duplicate-IP combinations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File                             Description
speaker/bgp_controller.go        Adjusts endpoint health evaluation to avoid false negatives when the same endpoint IP appears with mixed readiness.
speaker/bgp_controller_test.go   Updates an existing EPSlices test expectation and adds a dedicated hasHealthyEndpoint unit test matrix.
Comments suppressed due to low confidence (1)

speaker/bgp_controller.go:161

  • hasHealthyEndpoint returns a boolean (“any healthy endpoint exists”), so the per-address ready map plus the final loop are redundant and add extra allocation/complexity. Consider simplifying to an early return on the first endpoint that can serve after node filtering (or accumulate into a single boolean), which preserves the reused-IP behavior.
func hasHealthyEndpoint(eps []discovery.EndpointSlice, filterNode func(*string) bool) bool {
	ready := map[string]bool{}
	for _, slice := range eps {
		for _, ep := range slice.Endpoints {
			node := ep.NodeName
			if filterNode(node) {
				continue
			}
			for _, addr := range ep.Addresses {
				if ready[addr] {
					continue
				}
				ready[addr] = epslices.EndpointCanServe(ep.Conditions)
			}
		}
	}

	for _, r := range ready {
		if r {
			return true
		}
	}
	return false
}

Comment on lines +142 to +151
if ready[addr] {
	continue
}
ready[addr] = epslices.EndpointCanServe(ep.Conditions)
Copilot AI Mar 2, 2026

The filterNode predicate is used as a skip function (if filterNode(node) { continue }), but the naming/commenting makes it read like it selects nodes to include. Consider renaming the parameter (e.g. skipNode/filterOutNode) or inverting the predicate for clarity so future edits don’t accidentally flip the logic.

Contributor Author

This is existing code and outside the scope of this PR, but I can fix it if maintainers prefer.
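To make the skip semantics discussed in this thread concrete, here is a small sketch of a predicate with that shape. The `skipNodeFor` constructor is hypothetical and the exact condition MetalLB uses is an assumption; the point is only that returning true means "ignore this endpoint".

```go
package main

import "fmt"

// skipNodeFor builds a predicate that returns true for endpoints that
// should be skipped: those without a node name or on a node other than myNode.
// Hypothetical helper, named after the rename suggested in the review.
func skipNodeFor(myNode string) func(*string) bool {
	return func(nodeName *string) bool {
		return nodeName == nil || *nodeName != myNode
	}
}

func main() {
	skip := skipNodeFor("virtlab-ap-2")
	other, mine := "virtlab-ap-1", "virtlab-ap-2"
	fmt.Println(skip(&other), skip(&mine), skip(nil)) // true false true
}
```

With a name like skipNode, the call site `if skipNode(node) { continue }` reads the same way the predicate behaves.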

@tarabrind
Contributor Author

tarabrind commented Mar 2, 2026

"Hello @oribon! I've updated PR (fixed merge conflicts). It’s been open for a while, just wanted to make sure it hasn't fallen off the radar. Would appreciate a review when possible!"

@oribon
Member

oribon commented Mar 3, 2026

@tarabrind can you please squash the commits?

@tarabrind tarabrind force-pushed the fix/eps-with-the-same-ips branch from eb29b7b to 55bd809 Compare March 5, 2026 07:27
@tarabrind
Contributor Author

@tarabrind can you please squash the commits?

Done! I've squashed the commits and force-pushed the changes.


// hasHealthyEndpoint returns true if this node has at least one healthy endpoint.
// It only checks nodes matching the given filterNode function.
func hasHealthyEndpoint(eps []discovery.EndpointSlice, filterNode func(*string) bool) bool {
Member

nit: sorry if I'm missing something, but can this function be simplified further to:

func hasHealthyEndpoint(eps []discovery.EndpointSlice, filterNode func(*string) bool) bool {
    for _, slice := range eps {
        for _, ep := range slice.Endpoints {
            node := ep.NodeName
            if filterNode(node) {
                continue
            }
            if epslices.EndpointCanServe(ep.Conditions) {
                return true
            }
        }
    }
    return false
}

not sure I understand why the ready map is needed with this new behavior

Contributor Author

not sure I understand why the ready map is needed with this new behavior

Thanks for the suggestion! That makes perfect sense.
I used a version very similar to your code snippet. The only minor change I made was adding the check len(ep.Addresses) > 0:

if len(ep.Addresses) > 0 && epslices.EndpointCanServe(ep.Conditions) {
    return true
}

In the previous implementation, the for _, addr := range ep.Addresses loop implicitly skipped all endpoints that had not yet been assigned IP addresses. By adding len(ep.Addresses) > 0, we ensure that we maintain exactly the same behavior and do not accidentally treat an endpoint without IP addresses as a valid target.

I squashed the commits.
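For readers following the thread, the early-return variant with the extra address check might look roughly like the sketch below. It is self-contained for illustration: `conditions`, `endpoint`, `endpointSlice`, and `canServe` are simplified stand-ins (assumptions) for the discovery/v1 types and for epslices.EndpointCanServe, not the merged code.

```go
package main

import "fmt"

// conditions, endpoint and endpointSlice are simplified stand-ins for the
// discovery/v1 types (assumption: only the fields used here are modeled).
type conditions struct{ serving bool }

type endpoint struct {
	nodeName  *string
	addresses []string
	conds     conditions
}

type endpointSlice struct{ endpoints []endpoint }

// canServe is a hypothetical stand-in for epslices.EndpointCanServe.
func canServe(c conditions) bool { return c.serving }

// hasHealthyEndpoint returns true as soon as one endpoint that is not
// filtered out has at least one address and can serve.
func hasHealthyEndpoint(eps []endpointSlice, filterNode func(*string) bool) bool {
	for _, slice := range eps {
		for _, ep := range slice.endpoints {
			if filterNode(ep.nodeName) {
				continue
			}
			if len(ep.addresses) > 0 && canServe(ep.conds) {
				return true
			}
		}
	}
	return false
}

func main() {
	node := "virtlab-ap-2"
	keepAll := func(*string) bool { return false } // never skip a node

	// Same IP twice: one endpoint cannot serve, the other can.
	mixed := []endpointSlice{{endpoints: []endpoint{
		{nodeName: &node, addresses: []string{"10.88.10.7"}, conds: conditions{serving: false}},
		{nodeName: &node, addresses: []string{"10.88.10.7"}, conds: conditions{serving: true}},
	}}}
	// An endpoint that can serve but has no addresses yet.
	noAddr := []endpointSlice{{endpoints: []endpoint{
		{nodeName: &node, conds: conditions{serving: true}},
	}}}

	fmt.Println(hasHealthyEndpoint(mixed, keepAll))  // true
	fmt.Println(hasHealthyEndpoint(noAddr, keepAll)) // false
}
```

The second case is the one the `len(ep.Addresses) > 0` check exists for: without it, an endpoint with no addresses would count as healthy.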

Signed-off-by: Denis Tarabrin <denis.tarabrin@flant.com>
@tarabrind tarabrind force-pushed the fix/eps-with-the-same-ips branch from 55bd809 to 22f0810 Compare March 11, 2026 14:41
@tarabrind tarabrind requested a review from oribon March 11, 2026 14:43
@oribon oribon enabled auto-merge March 15, 2026 11:12
@oribon oribon added this pull request to the merge queue Mar 15, 2026
@oribon
Member

oribon commented Mar 15, 2026

lgtm, thanks!

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 15, 2026
@oribon oribon added this pull request to the merge queue Mar 15, 2026
Merged via the queue into metallb:main with commit 324b662 Mar 15, 2026
30 checks passed

4 participants