
A102: xDS GrpcService Support#510

Open
markdroth wants to merge 15 commits into grpc:master from markdroth:xds_grpc_service

Conversation

@markdroth (Member)

No description provided.

@markdroth markdroth requested review from dfawley and ejona86 August 22, 2025 00:01
@markdroth (Member Author)

CC @sergiitk

for this right now but could add it later if needed.
- `channel_args`: Ignored. Not supportable across languages in gRPC.
- [`timeout`](https://github.com/envoyproxy/envoy/blob/7ebdf6da0a49240778fd6fed42670157fde371db/api/envoy/config/core/v3/grpc_service.proto#L308):
If set, this will be used to set the deadline on RPCs sent to the
Contributor

Do we have any recommendations for a default value for this timeout, when unset?

Member Author

If unset, the RPC will not have any deadline set, just like if you use the normal gRPC client API.

Contributor

I remember this coming up at some point, but my memory is vague. Do we plan to cap this timeout at the timeout value specified on the client RPC, or do we want this to be completely independent? And what if the client RPC is cancelled or times out while this is still ongoing?

Member Author

I don't remember discussing this before, but these are good questions.

Capping this based on the deadline of the data plane RPC feels a little clunky to me, for two reasons:

  • From an xDS perspective, it isn't really clear what the data plane RPC's deadline is. Envoy doesn't actually have a single deadline timer the way that gRPC does; instead, it has a variety of different timeout knobs, so it's much less clear which one(s) would be used for this kind of propagation. So this behavior would probably be specific to gRPC.
  • Even in gRPC, we could not do this kind of propagation in every case, because there are cases like RLQS where the side-channel RPC is a long-running stream, not directly associated with individual data plane RPCs. But we could say that we could do this on a case-by-case basis, so it would work for things like ext_authz and ext_proc, even though we wouldn't do it for RLQS.

I think for now, let's not try to do this deadline propagation. But I am open to adding it later if we need it for some reason.

The second question is comparatively easier to answer, but the answer is different in each case, similar to what I described in the second bullet above. In a case like RLQS, where the side-channel call is not associated with any one individual data plane RPC, the termination of the data plane RPC does not affect the side-channel call in any way. However, in a case like ext_authz or ext_proc, where the side-channel RPC is one-to-one with the data plane RPC, we would absolutely terminate the side-channel RPC when the data plane RPC fails. But I don't think we need to note that here, because that should be a facet of the ext_authz and ext_proc designs, not of this design.
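The case distinction above (cancel the side-channel RPC only when it is one-to-one with a data plane RPC, as in ext_authz/ext_proc, but not for shared long-running streams like RLQS) can be sketched in a few lines. This is a minimal pure-Python model; the class and function names are hypothetical, not gRPC API.

```python
# Toy model of the per-case termination rule: a side-channel call is cancelled
# on data-plane RPC termination only when the two are one-to-one associated.
class SideChannelCall:
    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True

def on_data_plane_rpc_terminated(call: SideChannelCall, one_to_one: bool) -> None:
    if one_to_one:
        # ext_authz / ext_proc style: the side call tracks one data-plane RPC.
        call.cancel()
    # RLQS-style shared stream: unaffected by any single data-plane RPC.

authz_call = SideChannelCall()    # one-to-one with a data plane RPC
rlqs_stream = SideChannelCall()   # shared long-running stream
on_data_plane_rpc_terminated(authz_call, one_to_one=True)
on_data_plane_rpc_terminated(rlqs_stream, one_to_one=False)
assert authz_call.cancelled and not rlqs_stream.cancelled
```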

- `envoy.extensions.grpc_service.channel_credentials.local.v3.LocalCredentials`
- `envoy.extensions.grpc_service.channel_credentials.tls.v3.TlsCredentials`:
In this message:
- `root_certificate_provider`: Required. References certificate
Member

FYI: supporting the certificate providers here will be a bit hard in Java, because XdsChannelCredentials can delay the initial loading of credentials, but TlsChannelCredentials can't. We'll need to figure out a solution. (But it's not like there are any sensible alternatives, so we just have to figure something out. We'll probably create a new ChannelCredentials type specific to Netty for this.)

documentation](https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#google.protobuf.Duration),
and it must have a positive value.
- [`initial_metadata`](https://github.com/envoyproxy/envoy/blob/7ebdf6da0a49240778fd6fed42670157fde371db/api/envoy/config/core/v3/grpc_service.proto#L315):
If present, specifies headers to be added to RPCs sent to the side-channel
Member

Did we talk to security team about this feature? I thought setting header configuration in general was restricted to trusted sources.

Member Author

I don't think we ever explicitly decided that anything that sets headers is security-sensitive (although we have certainly talked about that idea several times). But in any case, I'm happy to get input on this from @matthewstevenson88 and @gtcooke94.

ctiller pushed a commit to grpc/grpc that referenced this pull request Feb 4, 2026
As per gRFC A102 (grpc/proposal#510).

Closes #41253

PiperOrigin-RevId: 865432750
@dfawley (Member) left a comment

Just a couple small nits & a question whose answer could potentially be explained better in the doc instead of just a response to the comment. But otherwise LGTM.

Comment on lines +38 to +41
[A77]: https://github.com/grpc/proposal/pull/414
[A81]: A81-xds-authority-rewriting.md
[A92]: https://github.com/grpc/proposal/pull/481
[A93]: https://github.com/grpc/proposal/pull/484
Member

All these PR links will want to be updated later. :(

// List of channel creds. Client will stop at the first type it
// supports. This field is required and must contain at least one
// channel creds type that the client supports.
"channel_creds": [
Member

Is this explicitly only transport credentials, or is it allowed to be transport+call credentials together? IIUC Java bundles the two together into its ChannelCredentials. I'm not sure about C++.

In Go we don't call anything "channel credentials" -- we have TransportCredentials (non per-call creds), PerRPCCredentials (call creds) and a credentials.Bundle which supports both kinds in one object similar to (IIUC) Java's ChannelCredentials.

@markdroth (Member Author), Feb 12, 2026

The way this works in C++ (and I think Java is similar), there are only two types of creds objects, transport creds (called "channel creds" on the client side and "server creds" on the server side) and call creds. There is a special channel creds type called composite credentials that allows you to attach a call creds object to a channel creds object. But note that composite creds is not a third type of object; it's actually just a channel creds implementation that contains both a channel creds object and a call creds object.

This field is intended to configure channel creds objects. It's conceivable that a given channel creds type actually returns a composite creds object here, but that's not the normal case. (We do actually wind up using a composite creds object here for GoogleDefaultCreds, but that's not something we're doing specially here -- it just works out that way because the underlying C++ GoogleDefaultCreds API happens to generate a composite channel creds object.)

In most cases, I would expect the configuration to specify the channel creds and call creds separately. The gRPC implementation will most likely implement that by creating a composite creds object to bundle the channel creds and call creds together, but that's an implementation detail.
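The object relationship described above -- composite creds is not a third kind of credential, just a channel-creds implementation that bundles a channel creds and a call creds -- can be modeled in a few lines. This is a hypothetical pure-Python sketch, not the real gRPC API.

```python
# Toy model of the credential object relationships described above.
class ChannelCredentials:
    """Base type for transport-level (channel) credentials."""

class CallCredentials:
    """Base type for per-call credentials (e.g., a token attached to each RPC)."""

class CompositeChannelCredentials(ChannelCredentials):
    """Not a third kind of object: still a ChannelCredentials, but one that
    carries an attached CallCredentials alongside the wrapped channel creds."""
    def __init__(self, channel_creds: ChannelCredentials,
                 call_creds: CallCredentials):
        self.channel_creds = channel_creds
        self.call_creds = call_creds

tls = ChannelCredentials()
token = CallCredentials()
composite = CompositeChannelCredentials(tls, token)

# Usable anywhere a plain channel creds object is expected:
assert isinstance(composite, ChannelCredentials)
```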

Member Author

Forgot to mention: I don't feel the need to explicitly spell out the above explanation in this gRFC, because the fields here are basically exactly the same thing as the existing fields that we already use to specify the xDS server itself, as per gRFCs A27 and A97.

Member

The implication is that Go needs to use the creds Bundle type here.

We probably should have somewhere in our documentation where we explain what a "channel credential" is / is capable of, but I agree this isn't the doc for it.

Member Author

I think that whatever type Go is already using for this field in the xDS server specification, it would also use here. The parsing code for the two fields should be identical.

@ejona86 (Member) left a comment

I have comments, but my biggest concern is still the security impact of arbitrary headers. It seems there's not been any discussion on that front.

for this right now but could add it later if needed.
- `channel_args`: Ignored. Not supportable across languages in gRPC.
- [`timeout`](https://github.com/envoyproxy/envoy/blob/7ebdf6da0a49240778fd6fed42670157fde371db/api/envoy/config/core/v3/grpc_service.proto#L308):
If set, this will be used to set the deadline on RPCs sent to the
Member

How are we envisioning this timeout would be implemented? We can't use service config, because that's necessary to use retries. So this would be an interceptor, or manually specified by each GrpcService user?

Member Author

I was expecting that it will be up to each individual caller to implement this. For example, for the ext_authz or ext_proc filters, the parsed form of the GrpcService proto will be part of the parsed filter config. The filter can use that data when creating both the channel and the call.

That having been said, I don't object if implementations want to provide some sort of common library for this that can be used by any filter that uses GrpcService.
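A filter using the parsed GrpcService config to set the deadline and initial metadata on each side-channel call, as described above, might look roughly like this. All names here (ParsedGrpcService, make_call_options) are hypothetical illustrations, not an actual gRPC API.

```python
# Hypothetical sketch: a filter applies the parsed GrpcService config
# (timeout, initial_metadata) when starting each side-channel RPC.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ParsedGrpcService:
    target_uri: str
    timeout_seconds: Optional[float] = None        # from GrpcService.timeout
    initial_metadata: List[Tuple[str, str]] = field(default_factory=list)

def make_call_options(cfg: ParsedGrpcService, now: float) -> dict:
    """Per-RPC options a filter would compute when starting a side-channel call."""
    opts: dict = {"metadata": list(cfg.initial_metadata)}
    if cfg.timeout_seconds is not None:
        # Deadline is set only if the timeout field was present in the config;
        # otherwise the RPC has no deadline, like the normal gRPC client API.
        opts["deadline"] = now + cfg.timeout_seconds
    return opts

cfg = ParsedGrpcService("dns:///authz.example.com:443",
                        timeout_seconds=5.0,
                        initial_metadata=[("x-env", "prod")])
opts = make_call_options(cfg, now=1000.0)
assert opts["deadline"] == 1005.0
```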

Member

I was expecting that it will be up to each individual caller to implement this.

I had thought everything in this would be generic, as that's how it is written. If it is per-caller, that means we'll need to say in every gRFC that uses this one, "and you must implement the specialized parts of GrpcService." So we're definitely going to want to be able to enumerate such responsibilities. And if there's things that are supposed to be done by each caller, that needs to be called out in the design, lest I claim I implemented this feature without doing anything by just saying it'll be done in the filters.

That having been said, I don't object if implementations want to provide some sort of common library for this that can be used by any filter that uses GrpcService.

I'd like that, but I also suspect you've not been thinking about that too much. In particular, that gets more complicated when combined with extra restrictions added from ext_authz/ext_proc:

We do not want to recreate this channel on LDS updates, unless the
target URI or channel credentials changes. The ext_authz filter will
use the filter state retention mechanism described in [A83] to retain
the channel across updates.

When getting a config update, we need to reuse the Channel, but not whatever infrastructure is surrounding the Channel to provide these other features. It seems that needs to be called out in this design, as it is a constraint this subsystem will need to support. Java and Go could do this with interceptors (or interceptor-like things), but that doesn't seem possible in C++. Again, just implementing what is written in this gRFC I would have assumed C++ would have a utility that consumes parsed GrpcService and produces a Channel, but apparently such an implementation would not actually satisfy the goals.

Member Author

Thanks for calling this out! I've added a section describing the parsed form of the proto, which should hopefully clarify the intent here.

Basically, I am proposing the following division of responsibilities here:

  • The GrpcService parsing code handles all of the interaction with the bootstrap file -- it looks at the trusted_xds_server bit and decides whether to use the target and credentials in the GrpcService proto or whether to look at the allowed_grpc_services map. Either way, if the proto passes validation, the parsed form will contain the target URI and credentials. It will also include the timeout and initial metadata.
  • The component that uses that parsed form (e.g., ext_authz or ext_proc) is responsible for creating the channel using the information described in the parsed form. If it gets a config update that changes the target URI or credentials in the parsed form, then it will be responsible for recreating that channel. Interacting with the filter state retention mechanism will also be the responsibility of this component. This component is also responsible for using the timeout and initial metadata when it creates a call on the channel.

With this approach, the security-sensitive part is implemented just once, not separately in each filter. It's true that individual filters need to actually use that info to create the channel and to create calls on that channel, but that isn't really security-sensitive and doesn't seem like a lot of code, so I don't feel the need to mandate it to be shared between filters -- I could imagine some implementations finding it convenient to have some common code here, and I could imagine others where this logic is too deeply integrated into the filter's functionality to make that worthwhile.

With regard to filter state retention, I specifically do not want to build that in here, because there are going to be cases where GrpcService is used that are not actually part of a filter. As an example, note that the lrs_server field in CDS may actually contain a GrpcService proto, and we could decide to support that at some point if we needed to. I think that interaction is better left in the individual filters that need it (although again, I don't object if implementations wind up with some common code for that).
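The channel-recreation rule in the second bullet above -- recreate the side channel only when the target URI or channel credentials change, never for timeout or metadata changes -- reduces to a simple comparison. A minimal sketch, with hypothetical names:

```python
# Sketch of the recreation decision: only target URI and channel creds are
# channel-level parameters; timeout and initial metadata apply per-call and
# never force channel recreation.
from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelParams:
    target_uri: str
    channel_creds: str   # stand-in for the parsed credentials config

def should_recreate_channel(old: ChannelParams, new: ChannelParams) -> bool:
    return old != new

a = ChannelParams("dns:///authz.example.com:443", "tls")
b = ChannelParams("dns:///authz.example.com:443", "tls")      # same params
c = ChannelParams("dns:///authz.example.com:443", "insecure")  # creds changed
assert not should_recreate_channel(a, b)   # config update: reuse the channel
assert should_recreate_channel(a, c)       # creds changed: recreate
```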

Member

The GrpcService parsing code...

Shared parsing code. Sounds great.

The component that uses that parsed form (e.g., ext_authz or ext_proc) is responsible for creating the channel using the information described in the parsed form.

That is not said anywhere in the text. And while that may be obvious in C, it is not obvious in Java, because we can set all that stuff in an interceptor. With the current text, I'd expect there to be shared code to support all fields that this gRFC claims to support.

With regard to filter state retention, I specifically do not want to build that in here,

I wasn't saying to do filter state retention here. I was saying that this was written ignoring filter state retention, even though certain obvious solutions are not permitted because it would be incompatible with your filter state retention design.

Even now, it would seem I could implement channel creation via a Channel createChannel(XdsGrpcService) that is shared across the filters and zero responsibility for filters to set deadline/headers. But in truth, that would be useless for the filters (without a more complex API), because it'd re-create the Channel with changes.

Member Author

Okay. I was thinking that we didn't need to explicitly state whether something uses shared code or not if we're not mandating it either way, but I've added text to clarify that implementations have this choice.

FWIW, note that even in C-core, we could write a filter that imposes the per-RPC settings automatically. However, I'm just not sure it's worth the bother, since it's really trivial for each component to do the right thing. But we'll see how this turns out as we get further into our implementation.


- parameters to use when creating the side channel:
- target URI
- channel credentials
- call credentials
Member

Doesn't this have to be in the "parameters to use when creating an RPC," since changes to the call credentials aren't supposed to re-create the channel?

Member Author

I think it's fine if changes to call creds cause us to recreate the channel. I don't expect that will happen very often.

I've added text to clarify that implementations can apply call creds either at channel creation time or on a per-call basis.

