
Add spiffe:svid selector #6865

Open
kfox1111 wants to merge 2 commits into spiffe:main from kfox1111:svid-selector

Conversation

@kfox1111
Contributor

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality
When you set up a node alias, it is difficult to match the workload without carefully reading the documentation of the particular node attestor to see which selectors are unique. This is very hard to automate.

Description of change
Allow easy automation by making the node SVID always available as a selector.


Signed-off-by: Kevin Fox <Kevin.Fox@pnnl.gov>
@sorindumitru
Collaborator

If you know the node SPIFFE ID, can you not just create the registration entry with the parent ID set to the node SPIFFE ID?

@kfox1111
Contributor Author

> If you know the node SPIFFE ID, can you not just create the registration entry with the parent ID set to the node SPIFFE ID?

No, because a node alias needs its parent ID to be spiffe://<trustdomain>/spire/server.
Workloads have parent IDs that are node SPIFFE IDs.

@sorindumitru
Collaborator

@kfox1111 could you share an example of how this would be used?

@kfox1111
Contributor Author

kfox1111 commented Apr 13, 2026

The exact use case is complicated, but reasonable. I'll try to lay it out.

Say you want to use a stronger node attestor than the k8s one. For example, you want to attest with a TPM and are running the nodes as VMs in the cloud. It is very hard to determine beforehand what the TPM pub hash is going to be, let alone have a helper app understand that a node SPIFFE ID of spiffe://<trustdomain>/spiffe/attestor/tpm/:xxx maps to the selector tpm:pub_hash:xxx.

For spire-controller-manager to map a pod (workload) to the parent ID of the node it is on, it only has access to the Kubernetes Node object the pod is running on. In addition, many properties of the Node object are directly controllable by the node itself, so they are untrustworthy.

While the parentID setting is templated in spire-controller-manager, the limited set of data available to the template means it's best to set it to something simple and trustworthy, like spiffe://{{ .TrustDomain }}/spire/controller-manager/{{ .Node.MetaData.UUID}}
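The constraint above can be illustrated with Go's text/template package. This is a minimal sketch. The TemplateData, Node, and NodeMetaData types are hypothetical stand-ins that only mirror the field names used in the template. They are not spire-controller-manager's actual template data. What it shows is that the rendered parent ID depends only on the trust domain and the node UUID.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical types: they only mirror the field names used in the template
// ({{ .TrustDomain }}, {{ .Node.MetaData.UUID }}) and are not
// spire-controller-manager's real template data.
type NodeMetaData struct {
	Name string
	UUID string
}

type Node struct {
	MetaData NodeMetaData
}

type TemplateData struct {
	TrustDomain string
	Node        Node
}

const parentIDTemplate = "spiffe://{{ .TrustDomain }}/spire/controller-manager/{{ .Node.MetaData.UUID }}"

// renderParentID fills in the parent ID template using only the trust domain
// and the node UUID, leaving out node-controlled fields such as labels.
func renderParentID(data TemplateData) (string, error) {
	tmpl, err := template.New("parentID").Parse(parentIDTemplate)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	id, err := renderParentID(TemplateData{
		TrustDomain: "example.org",
		Node: Node{MetaData: NodeMetaData{
			Name: "worker-1",                             // hypothetical node name
			UUID: "6a1f0c3e-0000-0000-0000-000000000001", // hypothetical node UID
		}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
	// spiffe://example.org/spire/controller-manager/6a1f0c3e-0000-0000-0000-000000000001
}
```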

So how do we automatically generate a link between the entries spire-controller-manager needs as input and the SPIFFE IDs nodes are going to get, which are not known in advance?

Node aliases can really help with this.

We write a helper client/server pair. It works like this:

  1. The client runs on each node with a kubelet. It connects to the server using the node SVID and passes a current Kubernetes PSAT for its DaemonSet/service account.
  2. The server accepts connections secured with valid SPIFFE SVIDs. It checks that the PSAT is valid for the current Kubernetes cluster and comes from the right service account/DaemonSet, and extracts the node name from the token. It then creates or updates a ClusterStaticEntry of the form:
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterStaticEntry
metadata:
  name: node-{{ .Node.MetaData.UUID }}
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: {{ .Node.MetaData.Name }}
    uid: {{ .Node.MetaData.UUID }}
    blockOwnerDeletion: true
    controller: false
spec:
  parentID: spiffe://{{ .TrustDomain }}/spire/server
  spiffeID: spiffe://{{ .TrustDomain }}/spire/controller-manager/{{ .Node.MetaData.UUID}}
  selectors:
  - svid:path:<path from node's real svid>

spire-controller-manager syncs these entries to the SPIRE server automatically, so the helper service does not need direct access to the SPIRE server, which increases security. Validating policies can also be written to further restrict what it can do.

Because it uses svid:path, it works no matter which node attestor is used on that particular node. The node ID does not need to be discovered beforehand, and no special knowledge is needed of how to map a particular node SVID to a set of selectors that works for that type of node attestor.

Now you have a valid trust path from a pod all the way to a specific node attestor on the node it is running on, without human intervention, so things like autoscaling Kubernetes cluster nodes running in AWS and secured with TPMs "just work".

@sorindumitru
Collaborator

sorindumitru commented Apr 14, 2026

Would it be possible to get the same effect by making it possible to specify the parent_id property on the ClusterSPIFFEID template and adding a nodeSelector field/filter to limit its use to a specific node?

Seems somewhat cleaner and you effectively get what you have here.

I'm not opposed to having this, but I want to make sure it's actually needed.

@kfox1111
Contributor Author

> Would it be possible to get the same effect by making it possible to specify the parent_id property on the ClusterSPIFFEID template and adding a nodeSelector field/filter to limit its use to a specific node?
>
> Seems somewhat cleaner and you effectively get what you have here.
>
> I'm not opposed to having this, but I want to make sure it's actually needed.

No. A ClusterSPIFFEID is already bound to a node, because a pod is bound to a node by design. A node selector doesn't help. And we don't want to be dynamically creating ClusterSPIFFEID objects as nodes come and go either. They are static templates for dynamic pods.

But even assuming it would, a node selector wouldn't change the problem: spire-controller-manager does not know what the node SPIFFE ID is, and once the node SPIFFE ID is known, it's unclear how to map it to the node attestor's particular quirky selectors.

Using a ClusterSPIFFEID lets us node-alias the needed information for the first problem, but without a clear selector to target, it would need a heuristic ball of code to parse SPIFFE IDs and guesstimate the needed selector. I'd really prefer not to guesstimate in a security system, hence this patch.

Besides, I think this could probably be useful outside of k8s as well.

@sorindumitru
Collaborator

> But even assuming it would, a node selector wouldn't change the problem: spire-controller-manager does not know what the node SPIFFE ID is, and once the node SPIFFE ID is known, it's unclear how to map it to the node attestor's particular quirky selectors.
>
> Using a ClusterSPIFFEID lets us node-alias the needed information for the first problem, but without a clear selector to target, it would need a heuristic ball of code to parse SPIFFE IDs and guesstimate the needed selector. I'd really prefer not to guesstimate in a security system, hence this patch.

As I see it, you need something to dynamically create the static entry CRD anyway, so whether it does that or creates a ClusterSPIFFEID, it's really the same. You could add a different CRD, NodeSPIFFEID, which might be clearer.

Regarding needing to know the SPIFFE ID, don't you also have that problem in your current proposal? You need to be able to figure out that selector from somewhere.

@kfox1111
Contributor Author

> As I see it, you need something to dynamically create the static entry CRD anyway, so whether it does that or creates a ClusterSPIFFEID, it's really the same. You could add a different CRD, NodeSPIFFEID, which might be clearer.

It's not creating an object per pod, though. It's creating an object per node to match up with the per-pod object, so it's not really the same.

A different CRD like NodeSPIFFEIDMap might work for this case, but it would require a whole other reconciler in spire-controller-manager. I'm not opposed to coding that up if the spiffe:svid selector is unacceptable, though.

But it's a lot of work, and I think this proposal solves a bigger issue: how do you easily make a node alias without intricate knowledge of the inner workings of the node attestor's selectors? My k8s example is just the immediate use case I'm trying to solve, but that question shouldn't go unanswered.

> Regarding needing to know the SPIFFE ID, don't you also have that problem in your current proposal? You need to be able to figure out that selector from somewhere.

The proposed workflow solves that piece.

The remaining problem is how to map a node SPIFFE ID to a selector reliably, which this patch does. Any node SPIFFE ID can be mapped 1:1 to the selector spiffe:path:<path from node spiffeid>, giving reliable aliasing without special agent knowledge.

@sorindumitru
Collaborator

> It's not creating an object per pod, though. It's creating an object per node to match up with the per-pod object, so it's not really the same.

What I'm proposing is also an object per node. The ClusterSPIFFEID object will specify:

  • To use a specific SPIFFE ID template for all pods on a specific node
  • To use a specific parent_id for those registration entries, instead of what it uses now.

> The remaining problem is how to map a node SPIFFE ID to a selector reliably, which this patch does. Any node SPIFFE ID can be mapped 1:1 to the selector spiffe:path:<path from node spiffeid>, giving reliable aliasing without special agent knowledge.

I'm not sure you'd need/want to do this. You can just use parent_id directly.

@kfox1111
Contributor Author

kfox1111 commented Apr 14, 2026

I think we are too far into the weeds of the particular implementation of a particular use case wanting the feature. I can solve that multiple ways.

I think the only question right now is this: is this PR a reasonable feature for SPIRE, yes or no? If it is, I'm going to make use of it right away, since it's my preferred solution. If it gets blocked, I will implement around its absence. It will be more painful, but I can manage.

But everyone can manage without this feature too. It's just a lot more work for everyone who needs to make node aliases.

Do we fix this usability issue?

@kfox1111
Contributor Author

OK. I had a conversation on Slack and have some more details that I think explain what led to the confusion.

SPIRE has taken several shortcuts and accumulated limitations over time that have led to some strangeness, and this is not well documented. So let's lay some of it out here, and maybe that will help clarify things.

Nodes can have aliases.

On the CLI, this is done with the -node flag, like:
spire-server entry create -node -spiffeID spiffe://example.org/some/node -selector tpm:pub_hash:123456789abcdefghijklmnop

This is actually shorthand for:
spire-server entry create -parentID spiffe://example.org/spire/server -spiffeID spiffe://example.org/some/node -selector tpm:pub_hash:123456789abcdefghijklmnop

The records in the database are the same and follow the latter format. Node aliases built with the Kubernetes ClusterStaticEntry look similar:

  parentID: spiffe://{{ .TrustDomain }}/spire/server
  spiffeID: spiffe://{{ .TrustDomain }}/spire/controller-manager/{{ .Node.MetaData.UUID}}
  selectors:
  - tpm:pub_hash:123456789abcdefghijklmnop

You can hang workloads off of each other, but you can't skip the selector (it is not optional), and it will generate X.509 certificates when they are not needed. If you give it a dummy selector, there is a nonzero risk that it might accidentally match some day, or its behavior is undefined in the documentation. So this has not been as good a solution as node aliases, which are pure links in the chain without real identities.

Most of my entries for a node look like:

spire-server entry create -node -spiffeID spiffe://example.org/some/node -selector tpm:pub_hash:1b5bbe2e96054f7bc34ebe7ba9a4a9eac5611c6879285ceff6094fa556af485c

And then for workloads:

spire-server entry create -parentID spiffe://example.org/some/node -spiffeID spiffe://example.org/workload/foo -selector systemd:id:foo.unit

This allows me to easily update just the node alias if I ever need to swap out the TPM on that node, or change the attestor from one to another (like http_challenge -> tpm), without having to change all the workload entries on it. This does require you to understand the odd format of node selectors, though.
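The chain described above can be modelled in a few lines of Go. This is a simplified, hypothetical model, not SPIRE's actual matching code. An entry whose parent is the server ID acts as a node alias when all of its (non-empty) selectors are among the agent's node selectors. Workload entries whose parent ID is the agent's own ID or one of its aliases are then in scope. The agent ID below is made up.

```go
package main

import "fmt"

// Entry is a simplified registration entry. This is a conceptual model of how
// node aliases link workloads to agents, not SPIRE's actual matching code.
type Entry struct {
	ParentID  string
	SPIFFEID  string
	Selectors []string
}

const serverID = "spiffe://example.org/spire/server"

// hasAll reports whether every selector in want is present in have.
func hasAll(have, want []string) bool {
	set := make(map[string]bool, len(have))
	for _, s := range have {
		set[s] = true
	}
	for _, s := range want {
		if !set[s] {
			return false
		}
	}
	return true
}

// parentsFor returns the agent's own SPIFFE ID plus every node alias whose
// (non-empty) selectors are all among the agent's node selectors.
func parentsFor(agentID string, nodeSelectors []string, entries []Entry) map[string]bool {
	parents := map[string]bool{agentID: true}
	for _, e := range entries {
		if e.ParentID == serverID && len(e.Selectors) > 0 && hasAll(nodeSelectors, e.Selectors) {
			parents[e.SPIFFEID] = true
		}
	}
	return parents
}

// workloadIDs lists the workload entries hanging off any of those parents.
func workloadIDs(parents map[string]bool, entries []Entry) []string {
	var ids []string
	for _, e := range entries {
		if e.ParentID != serverID && parents[e.ParentID] {
			ids = append(ids, e.SPIFFEID)
		}
	}
	return ids
}

func main() {
	const tpmSelector = "tpm:pub_hash:1b5bbe2e96054f7bc34ebe7ba9a4a9eac5611c6879285ceff6094fa556af485c"
	entries := []Entry{
		// Node alias: swapping the TPM only means editing this selector.
		{ParentID: serverID, SPIFFEID: "spiffe://example.org/some/node", Selectors: []string{tpmSelector}},
		// Workload entry: its parent is the alias, not the agent's own ID.
		{ParentID: "spiffe://example.org/some/node", SPIFFEID: "spiffe://example.org/workload/foo",
			Selectors: []string{"systemd:id:foo.unit"}},
	}
	// Hypothetical agent ID and the node selectors its attestor reported.
	agentID := "spiffe://example.org/spire/agent/example-node"
	fmt.Println(workloadIDs(parentsFor(agentID, []string{tpmSelector}, entries), entries))
	// [spiffe://example.org/workload/foo]
}
```

In this model, replacing the TPM only changes the alias entry's selector. The workload entry, whose parent ID is spiffe://example.org/some/node, stays as it is. With this PR, the node SVID would also be available as a selector. The alias could then target that selector instead of an attestor-specific one.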

The main thing this shortcut has done, though, is that a node alias's parentID is consumed as a flag marking the entry as a node alias. Otherwise, it could naturally hold the node's SPIFFE ID in that field and skip the requirement for node selectors.

So another way to really solve this would be to add a new entry flag marking the entry as a node alias. If it is true, the parentID would actually be the node SPIFFE ID, and the requirement for selectors would be removed.
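One way the proposed flag could be validated is sketched below in Go. ProposedEntry, its NodeAlias field, and validate are hypothetical. The sketch encodes the two rules stated above: a flagged alias's parentID is the node SPIFFE ID, and it needs no selectors. It adds one extra check chosen only for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ProposedEntry is hypothetical: it adds an explicit NodeAlias flag so that
// ParentID no longer has to double as the "this is a node alias" marker.
type ProposedEntry struct {
	ParentID  string
	SPIFFEID  string
	Selectors []string
	NodeAlias bool
}

// validate sketches the proposed rules. Rejecting the server ID as the parent
// of a flagged alias is a choice made for this sketch only.
func validate(e ProposedEntry, serverID string) error {
	if e.ParentID == "" || e.SPIFFEID == "" {
		return errors.New("parent ID and SPIFFE ID are required")
	}
	if e.NodeAlias {
		// ParentID is the node's own SPIFFE ID, so the alias is a pure link
		// and selectors are optional.
		if e.ParentID == serverID {
			return errors.New("a flagged node alias names the node, not the server")
		}
		return nil
	}
	if len(e.Selectors) == 0 {
		return errors.New("selectors are required unless the entry is a node alias")
	}
	return nil
}

func main() {
	const serverID = "spiffe://example.org/spire/server"
	alias := ProposedEntry{
		ParentID:  "spiffe://example.org/spire/agent/example-node", // hypothetical node SPIFFE ID
		SPIFFEID:  "spiffe://example.org/some/node",
		NodeAlias: true,
	}
	fmt.Println(validate(alias, serverID)) // <nil>
}
```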


2 participants