Skip to content

A115: disable priority LB child policy retention cache#541

Open
apolcyn wants to merge 9 commits intogrpc:masterfrom
apolcyn:remove_cache
Open

A115: disable priority LB child policy retention cache#541
apolcyn wants to merge 9 commits intogrpc:masterfrom
apolcyn:remove_cache

Conversation

@apolcyn
Copy link
Contributor

@apolcyn apolcyn commented Mar 17, 2026

@apolcyn apolcyn changed the title Disable priority LB child cache A56 update: disable priority LB child cache Mar 17, 2026
@apolcyn apolcyn changed the title A56 update: disable priority LB child cache A56 update: disable priority LB child policy retention cache Mar 17, 2026
@markdroth markdroth changed the title A56 update: disable priority LB child policy retention cache A115: disable priority LB child policy retention cache Mar 18, 2026
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up!

Please let me know if you have any questions.

[[AA, BB, CC], [DD]] => [priority-0-3 priority-0-4]
[[BB, CC], [AA, DD]] => [priority-0-0 priority-0-1]
[[AA, BB, CC], [DD, EE]] => [priority-0-1 priority-0-2]
[[BB, CC], [AA, DD]] => [priority-0-2 priority-0-3]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the algorithm should be sticking with child numbers 1 and 2 in this last step.

I've sent you grpc/grpc#41896 to add tests demonstrating the algorithm's behavior in C-core.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm the original text is what Go was doing (i tagged you on an internal doc to illustrate)

but maybe that was specific to Go

Changed to your original suggestion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case you have now in the gRFC, in the third update, the algorithm will use child number 3 for [AA, BB] and reuse child number 2 for [CC, DD], because it was already using that number for [CC] in the second update.

If what you're seeing in Go matches what you currently have in the doc, then I suspect Go's implementation of this algorithm isn't quite right. @easwars

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a look and get back asap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the actual algorithm described in the gRFC and I think this is where the algorithm was originally described: https://docs.google.com/document/d/13L2YVTGI2OrLpeCAPbgI4qEs0Oz02P7X0DKhk6Y9LMk/edit?tab=t.0#heading=h.r3twn1e6r8ct

Based on my reading of this section and my reading of C++, Java and Go, none of the languages actually implement the algorithm correctly. Specifically, none of the languages seem to implement this sentence:

  • If there is at least one locality in the priority that was present in the same priority in the old config, copy the name from the same priority in the old config.

All languages seem to use a greedy approach where the moment they find a locality (from the new config) being present in the old config, they end up using that name. They don't actually check for localities in the same priority.

And I do see that in C++, when a map contains locality names as keys, they are ordered using the following: https://github.com/grpc/grpc/blob/2c62d7b7a927db2572b50277ba56eb4a754c43b7/src/core/xds/xds_client/xds_locality.h#L38. And I think this probably helps because in most of our examples, we seem to use very ordered locality names like "L0", "L1" ... or "aa", "bb". In Go, map key ordering is not deterministic.

I do have the changes ready in Go to actually implement the algorithm specified at the top of this comment. But, before sending them out, wanted to check here if there was a reason why none of the languages ended up implementing the logic for looking the locality in the same priority.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up making the change in the name generation logic to give preference to localities in the same priority in the old config with the assumption that if we really don't want to be doing that, I can always revert the change back.

grpc/grpc-go#8997

Copy link
Member

@markdroth markdroth Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, unfortunately, I didn't write a gRFC for that refactoring, just the internal doc. I've regretted that several times since then, since we've had to patch holes in several subsequent gRFCs as a result (and this PR is yet another example of that). And if I had done it as a gRFC, that would likely have prompted me to be more detailed about the intended algorithm here.

In any case, you're right, it looks like C-core isn't implementing that bullet correctly. But I'm not sure it actually matters that much, because regardless of whether or not we do that, I think there will be cases where the algorithm will do the wrong thing, since it doesn't know anything about which localities are actually reachable or which ones have actually been created by the priority policy. I'm not sure that the order of localities really matters at all -- although I would at least argue for deterministic behavior, because that will be much easier to debug.

(As an aside, it's worth noting that Java and Go also do not correctly implement the "all gRPC implementations must ensure that subchannels for locality 1 are retained through this transition instead of being destroyed and recreated" part, because that requires subchannel pooling. Only C-core and Node do this correctly.)

BTW, why are Go's child names of the form "priority-{X}-{Y}"? I see that Y is the child number here, but what is X? If X is the priority, then that should not actually be part of the name at all, because we want a specific child to be able to move between priorities. In C-core, the child name is just "child{N}", which is what the doc specifies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, why are Go's child names of the form "priority-{X}-{Y}"? I see that Y is the child number here, but what is X?

Yes, X is the priority. That is also unfortunate how it ended up being implemented. I'll clean it up soonish.

because that requires subchannel pooling. Only C-core and Node do this correctly

Yes, yes, yes :)

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we (all of us) have a plan for cluster_manager? Are we going to make a separate gRFC for that?

@markdroth
Copy link
Member

Do we (all of us) have a plan for cluster_manager? Are we going to make a separate gRFC for that?

I don't think we need any changes there at this point. Arguably, the caching there is a gray area, because I don't think any gRFC ever specified that we would actually have caching there, but all of our implementations actually do. But on the other hand, I don't think the caching in that policy is actually causing any problems right now. So my inclination is to wait until we have a problem before changing that.

@apolcyn apolcyn marked this pull request as ready for review March 19, 2026 19:15
[[AA, BB, CC], [DD]] => [priority-0-3 priority-0-4]
[[BB, CC], [AA, DD]] => [priority-0-0 priority-0-1]
[[AA, BB, CC], [DD, EE]] => [priority-0-1 priority-0-2]
[[BB, CC], [AA, DD]] => [priority-0-2 priority-0-3]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case you have now in the gRFC, in the third update, the algorithm will use child number 3 for [AA, BB] and reuse child number 2 for [CC, DD], because it was already using that number for [CC] in the second update.

If what you're seeing in Go matches what you currently have in the doc, then I suspect Go's implementation of this algorithm isn't quite right. @easwars

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants