
AEP-7571: Pod-level resources support in VPA#8586

Open
iamzili wants to merge 22 commits into kubernetes:master from iamzili:support-pod-level-resources-v2

Conversation

@iamzili
Member

@iamzili iamzili commented Sep 29, 2025

What type of PR is this?

/kind documentation
/kind feature
/area vertical-pod-autoscaler

What this PR does / why we need it:

Autoscaling Enhancement Proposal (AEP) for pod-level resources support in VPA.

Related ticket from which this AEP originated: Issue

More details about pod-level resources can be found here:

I'd love to hear your thoughts on this feature.

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/feature Categorizes issue or PR as related to a new feature. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 29, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iamzili
Once this PR has been reviewed and has the lgtm label, please assign adrianmoisey for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 29, 2025
@k8s-ci-robot
Contributor

Hi @iamzili. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 29, 2025

## Summary

Starting with Kubernetes 1.34, CPU and memory `resources` can be specified for Pods at the pod level, in addition to the existing container-level `resources` specifications. For example:
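A minimal sketch of what such a Pod spec could look like (illustrative values; `spec.resources` is the pod-level stanza):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  # Pod-level resources (Kubernetes >= 1.34, PodLevelResources feature gate)
  resources:
    requests:
      cpu: "1"
      memory: 100Mi
    limits:
      cpu: "2"
      memory: 200Mi
  containers:
    - name: c1
      image: registry.k8s.io/pause
      # Container-level resources may still be set alongside the pod-level stanza
      resources:
        requests:
          cpu: 500m
```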
Member

It may be worth linking the KEP here

Member Author

I'm linking the KEP and the official blog post a little further down: here


This section describes how VPA reacts based on where resources are defined (pod level, container level or both).

Before this KEP, the recommender computes recommendations only at the container level, and VPA applies changes only to container-level fields. With this proposal, the recommender also computes pod-level recommendations in addition to container-level ones. Pod-level recommendations are derived from per-container usage and recommendations, typically by aggregating container recommendations. Container-level policy still influences pod-level output: setting `mode: Off` in `spec.resourcePolicy.containerPolicies` excludes a container from recommendations, and `minAllowed`/`maxAllowed` bounds continue to apply.
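The aggregation step described above can be sketched as follows (a hypothetical helper using plain int64 milli-units; the actual recommender operates on `resource.Quantity` values and applies `minAllowed`/`maxAllowed` bounds afterwards):

```go
package main

import "fmt"

// ContainerRecommendation is a simplified stand-in for the per-container
// recommendation the recommender already produces.
type ContainerRecommendation struct {
	Name   string
	Target map[string]int64 // e.g. "cpu" in millicores, "memory" in bytes
}

// aggregatePodRecommendation sums per-container targets into a pod-level
// recommendation, skipping containers whose containerPolicies mode is "Off".
func aggregatePodRecommendation(recs []ContainerRecommendation, off map[string]bool) map[string]int64 {
	pod := map[string]int64{}
	for _, r := range recs {
		if off[r.Name] { // mode: Off excludes the container from recommendations
			continue
		}
		for res, v := range r.Target {
			pod[res] += v
		}
	}
	return pod
}

func main() {
	recs := []ContainerRecommendation{
		{Name: "c1", Target: map[string]int64{"cpu": 500, "memory": 100 << 20}},
		{Name: "sidecar", Target: map[string]int64{"cpu": 100, "memory": 50 << 20}},
	}
	pod := aggregatePodRecommendation(recs, map[string]bool{"sidecar": true})
	fmt.Println(pod["cpu"], pod["memory"]) // sidecar excluded from the sum
}
```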
Member

@adrianmoisey adrianmoisey Oct 6, 2025

Just want to sanity check this a little.

> typically by aggregating container recommendations

From what I can tell, the metric that metric-server provides is per-container.

So the idea is to leave the recommender as is, making per-container recommendations based on its per-container metric, and let the updater/admission-controller use an aggregated value for the Pod resources.

Is my understanding here right?

Member Author

Partially, since the recommender will calculate the pod-level recommendations (from your comment, it seems that the updater/admission controller would do that). My plan is to continue relying on the current approach for collecting and aggregating container-level metrics, as well as for generating per-container recommendations.

The difference introduced by this AEP is that if a pod-level resources stanza is defined at the workload API level, the recommender will also calculate pod-level recommendations, which are simply the sum of the container recommendations. The pod-level recommendations will be stored in the status.recommendation.podRecommendation stanza of the VPA object (new!).

The updater and the admission controller will read from status.recommendation.podRecommendation (and of course from status.recommendation.containerRecommendations) to perform their actions - the updater will evict pods or perform in-place container-level updates, while the admission controller will modify pod specs on the fly.

Member

> Partially, since the recommender will calculate the pod-level recommendations (from your comment, it seems that the updater/admission controller would do that).

I was just making an assumption. If I'm hearing you right, you want the recommender to create the pod recommendation, and store it in the VPA resource, which makes more sense than my assumption.

> The difference introduced by this AEP is that if a pod-level resources stanza is defined at the workload API level, the recommender will also calculate pod-level recommendations, which are simply the sum of the container recommendations. The pod-level recommendations will be stored in the status.recommendation.podRecommendation stanza of the VPA object (new!).

> The updater and the admission controller will read from status.recommendation.podRecommendation (and of course from status.recommendation.containerRecommendations) to perform their actions - the updater will evict pods or perform in-place container-level updates, while the admission controller will modify pod specs on the fly.

Makes sense!

Comment on lines +201 to +207
- Extend the VPA object:
  1. Add a new `spec.resourcePolicy.podPolicies` stanza. This stanza is user-modifiable and allows setting constraints for pod-level recommendations:
     - `controlledResources`: Specifies which resource types are recommended (and possibly applied). Valid values are `cpu`, `memory`, or both. If not specified, both resource types are controlled by VPA.
     - `controlledValues`: Specifies which resource values are controlled. Valid values are `RequestsAndLimits` and `RequestsOnly`. The default is `RequestsAndLimits`.
     - `minAllowed`: Specifies the minimum resources that will be recommended for the Pod. The default is no minimum.
     - `maxAllowed`: Specifies the maximum resources that will be recommended for the Pod. The default is no maximum. To ensure per-container recommendations do not exceed the Pod's defined maximum, apply the formula to adjust the recommendations for containers proposed by @omerap12 (see [discussion](https://github.com/kubernetes/autoscaler/issues/7147#issuecomment-2515296024)). This field takes precedence over the global Pod maximum set by the new flags (see "Global Pod maximums").
  2. Add a new `status.recommendation.podRecommendation` stanza. This field is not user-modifiable; it is populated by the VPA recommender and stores the Pod-level recommendations. The updater and admission controller use this stanza to read Pod-level recommendations. The updater may evict Pods to apply the recommendation; the admission controller applies the recommendation when the Pod is recreated.
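One possible Go shape for these additions might look like the following (field names, json tags, and the stand-in types are illustrative only, not the final API; `ResourceList` stands in for `corev1.ResourceList`):

```go
package main

// Illustrative stand-ins for the corev1 types used by the real API.
type ResourceName string
type ResourceList map[ResourceName]string

type ControlledValues string

const (
	ControlledValuesRequestsAndLimits ControlledValues = "RequestsAndLimits"
	ControlledValuesRequestsOnly      ControlledValues = "RequestsOnly"
)

// PodPolicy is a hypothetical shape for spec.resourcePolicy.podPolicies,
// mirroring the existing container-level policy fields.
type PodPolicy struct {
	ControlledResources *[]ResourceName   `json:"controlledResources,omitempty"`
	ControlledValues    *ControlledValues `json:"controlledValues,omitempty"`
	MinAllowed          ResourceList      `json:"minAllowed,omitempty"`
	MaxAllowed          ResourceList      `json:"maxAllowed,omitempty"`
}

// PodRecommendation is a hypothetical shape for
// status.recommendation.podRecommendation, populated by the recommender.
type PodRecommendation struct {
	Target     ResourceList `json:"target,omitempty"`
	LowerBound ResourceList `json:"lowerBound,omitempty"`
	UpperBound ResourceList `json:"upperBound,omitempty"`
}

func main() {
	_ = PodPolicy{MaxAllowed: ResourceList{"cpu": "2", "memory": "1Gi"}}
}
```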
Member

Would it be possible to have an example Go Type here?


## Proposal

- Add a new feature flag named `PodLevelResources`. Because this proposal introduces new code paths across all three VPA components, this flag will be added to each component.
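A minimal sketch of how a component might consult such a gate (the gate name comes from this proposal; the registration mechanism shown is hypothetical, not the VPA's actual feature-gate plumbing):

```go
package main

import "fmt"

// Hypothetical gate map: each VPA component would register the gate,
// disabled by default while the feature is alpha.
var defaultGates = map[string]bool{
	"PodLevelResources": false, // flipped to true once the feature matures
}

// podLevelEnabled guards the new pod-level code paths.
func podLevelEnabled(gates map[string]bool) bool {
	return gates["PodLevelResources"]
}

func main() {
	fmt.Println(podLevelEnabled(defaultGates)) // false by default
}
```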
Member

Is this a feature flag to assist with GAing the feature, or is it a flag to enable/disable the feature?

Member Author

My intention is to use the flag to enable or disable the feature. In other words, the feature should be disabled by default at first, and once the feature matures, it can be enabled by default starting from a specific VPA version.

Could you please clarify what you mean by using the flag for GAing the feature?

Member

The normal pattern for Kubernetes is to use a feature gate to introduce a new feature. Normally it works like this across many releases:

  1. First release - add a feature gate as alpha - defaulted to off
  2. Second release - promote to beta - default to on
  3. Third release - promote to GA - locked to on
  4. A few releases later (3 I think) - remove feature gate logic completely

This is mostly for the kubernetes components to handle roll forward/back gracefully.
I think the main thing it protects is if a user starts using the feature in the beta mode, if they roll back 1 release, that feature would continue to work (ie: the APIs would be valid) since the logic exists in the alpha mode.

Member Author

Thanks for the explanation - I appreciate it! Based on your comment, the feature flags (there will be a new one for each component) will serve both purposes, i.e. GAing and enabling/disabling the feature.

Member

Right, the point of feature gates in Kubernetes is to eventually remove them. Enabling/disabling the feature should be driven by the API.


For workloads that define only pod-level resources, VPA will control resources at the pod level. At the time of writing, [in-place pod-level resource resizing](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize) is not available for pod-level fields, so applying pod-level recommendations requires evicting Pods.

When [in-place pod-level resource resizing](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize) becomes available, VPA should attempt to apply pod-level recommendations in place first and fall back to eviction if in-place updates fail, mirroring the current `InPlaceOrRecreate` behavior used for container-level updates.
Contributor

Because this AEP has a dependency on the functionality described in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize, can we restate the language as if https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize is already implemented, and then add a note that we won't approve this AEP until post-1.35 (when in-place resizing of pod-level resources has been implemented)?

Member

I thought that this AEP isn't dependent on that feature; it's calling out that we can't do in-place resizing until that KEP is ready.

Member

Let's remove this section, there is no connection between the current AEP and the in-place feature.
This AEP should focus on pod level resources only.

Member Author

@iamzili iamzili Oct 8, 2025

As @omerap12 suggested, I removed the parts mentioning In-Place Pod-Level Resources Resize and kept only a note stating that we should leverage it once it becomes available.

Feel free to resolve the conversation if applicable.

Member

@omerap12 omerap12 left a comment

Really, thanks for the hard work here Erik!
In my opinion, we should choose option 2 as the default (control both pod-level and initially set container-level resources) here.
I left a couple of notes throughout the proposal.
Can we please remove the in-place feature from this AEP?
This AEP should focus only on pod-level resources, so cons like "Applying both pod-level and container-level recommendations requires eviction because is not yet available" are redundant.

- `controlledResources`: Specifies which resource types are recommended (and possibly applied). Valid values are `cpu`, `memory`, or both. If not specified, both resource types are controlled by VPA.
- `controlledValues`: Specifies which resource values are controlled. Valid values are `RequestsAndLimits` and `RequestsOnly`. The default is `RequestsAndLimits`.
- `minAllowed`: Specifies the minimum resources that will be recommended for the Pod. The default is no minimum.
- `maxAllowed`: Specifies the maximum resources that will be recommended for the Pod. The default is no maximum. To ensure per-container recommendations do not exceed the Pod's defined maximum, apply the formula to adjust the recommendations for containers proposed by @omerap12 (see [discussion](https://github.com/kubernetes/autoscaler/issues/7147#issuecomment-2515296024)). This field takes precedence over the global Pod maximum set by the new flags (see "Global Pod maximums").
Member

Thanks for catching that! (I forgot I wrote that, TBH) :)

Member

My formula should be correct, but what happens if, after normalization of the container[i] resources, we get a value that is smaller/bigger than minAllowed/maxAllowed?
I thought we could do something like this:

- If adjusted[i] < container.minAllowed[i]: set to minAllowed[i]
- If adjusted[i] > container.maxAllowed[i]: set to maxAllowed[i]

And then we need to re-check pod limits after the container policy adjustments (since the total might be bigger).
If we are still exceeding pod limits - what do we want to do here?
cc @adrianmoisey

Sorry if I wasn't clear enough :)

Member Author

@iamzili iamzili Oct 8, 2025

An individual container limit can't be larger than the pod-level limit, but the aggregated container-level limits can exceed the pod-level limit - Ref.

So, when a new pod-level recommendation is calculated and the limit is set proportionally at the pod level, we also need to check the container-level limits. If a container-level limit is greater than the pod-level limit, it should be set to the same value as the pod-level limit, and the calculated container-level recommendation should be reduced proportionally as well to maintain the original request to limit ratio (similar to how it works when a LimitRange API object is in place).
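The proportional adjustment described here could be sketched as follows (a hypothetical helper in milli-units; the real implementation would work with `resource.Quantity` and the existing limit-ratio utilities):

```go
package main

import "fmt"

// capToPodLimit caps a container's recommended limit at the pod-level limit
// and scales the request down proportionally to preserve the original
// request:limit ratio, mirroring how VPA handles LimitRange constraints.
func capToPodLimit(request, limit, podLimit int64) (int64, int64) {
	if limit == 0 || limit <= podLimit {
		return request, limit // nothing to adjust
	}
	// Scale the request by podLimit/limit to keep the request:limit ratio.
	scaledRequest := request * podLimit / limit
	return scaledRequest, podLimit
}

func main() {
	req, lim := capToPodLimit(500, 2000, 1000) // pod-level limit: 1000m
	fmt.Println(req, lim)                      // request halved along with the limit
}
```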

Member

Yup. precisely!


### Test Plan

TODO
Member

In order for this AEP to be merged, this has to be filled in (I know it's a WIP, but just a reminder) :)

Member Author

@iamzili iamzili Oct 16, 2025

The missing test plan has been addressed.

Co-authored-by: Adrian Moisey <adrian@changeover.za.net>
@iamzili
Member Author

iamzili commented Oct 7, 2025

> Really, thanks for the hard work here Erik! In my opinion, we should choose option 2 as the default (control both pod-level and initially set container-level resources) here. I left a couple of notes throughout the proposal. Can we please remove the in-place feature from this AEP? This AEP should focus only on pod-level resources, so cons like "Applying both pod-level and container-level recommendations requires eviction because is not yet available" are redundant.

I would also prefer option 2 (control both pod-level and initially set container-level resources). BTW when do you think a decision will be made to go with this option? We will need to update the AEP to reflect the chosen approach. Once the decision is final, I also plan to add more details.

Furthermore, why are you suggesting that the pod-level resources in-place resize related parts should be removed from this AEP? Since this AEP focuses on the pod-level resources stanza, how it can be mutated (or not) seems relevant from the VPA's perspective.

@omerap12
Member

omerap12 commented Oct 8, 2025

> > Really, thanks for the hard work here Erik! In my opinion, we should choose option 2 as the default (control both pod-level and initially set container-level resources) here. I left a couple of notes throughout the proposal. Can we please remove the in-place feature from this AEP? This AEP should focus only on pod-level resources, so cons like "Applying both pod-level and container-level recommendations requires eviction because is not yet available" are redundant.

> I would also prefer option 2 (control both pod-level and initially set container-level resources). BTW when do you think a decision will be made to go with this option? We will need to update the AEP to reflect the chosen approach. Once the decision is final, I also plan to add more details.

> Furthermore, why are you suggesting that the pod-level resources in-place resize related parts should be removed from this AEP? Since this AEP focuses on the pod-level resources stanza, how it can be mutated (or not) seems relevant from the VPA's perspective.

I see your point - it’s not completely independent, since once Kubernetes supports in-place updates for pod-level resources, the VPA will likely extend that support as well (similar to what we already do for container-level in-place updates).

But, the main scope of this AEP is to define how we provide recommendations for pod-level resources. The actual application of those recommendations - whether in-place or through eviction - is more of an implementation detail and doesn’t directly affect the design decisions in this proposal.
We can add a short note about the current state of in-place updates for pod-level resources (KEP-5419) and mention that future VPA enhancements will align once that functionality is available.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 10, 2025
@jackfrancis
Contributor

/release-note-none

@k8s-ci-robot
Contributor

@jackfrancis: you can only set the release note label to release-note-none if the release-note block in the PR body text is empty or "none".

Details

In response to this:

/release-note-none

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jackfrancis
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 10, 2025
@omerap12
Member

pinging this again, since VPA support is a blocker for Pod Level Resources GA.
/retest
/cc @adrianmoisey @iamzili

@k8s-ci-robot
Contributor

@omerap12: GitHub didn't allow me to request PR reviews from the following users: iamzili.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

pinging this again, since VPA support is a blocker for Pod Level Resources GA.
/retest
/cc @adrianmoisey @iamzili

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@iamzili iamzili marked this pull request as draft January 23, 2026 13:21
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 23, 2026
@iamzili iamzili force-pushed the support-pod-level-resources-v2 branch from d711b01 to 62314bb Compare January 25, 2026 15:43
@iamzili iamzili marked this pull request as ready for review January 25, 2026 15:44
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 25, 2026
@iamzili
Member Author

iamzili commented Jan 25, 2026

FYI – I have finished updating the AEP, it is ready for review. Thanks in advance!

cc @omerap12 @adrianmoisey

Member

@omerap12 omerap12 left a comment

Thanks for working on this! These are my initial thoughts.

Comment on lines +442 to +445
1. Extend the [GetUpdatePriority](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/updater/priority/priority_processor.go#L45) method to also evaluate pod-level recommendations. The updated method verifies whether pod-level recommendations fall outside the recommended range and calculates the pod-level `resourceDiff`. These checks occur only when container-level recommendations do not set `OutsideRecommendedRange` to true and the container-level `resourceDiff` remains below the threshold.
2. `[DOESN'T CHANGE]` When the updater adds a Pod to the `UpdatePriorityCalculator`, it marks the Pod for eviction or in-place update based on the VPA mode:
   1. `[DOESN'T CHANGE]` If the updater evicts the Pod, control passes to the admission controller, and the updater proceeds to the next Pod in the list.
   2. If the updater selects the Pod for an in-place update, it applies the pod-level and container-level recommendations directly to the running Pod using the in-place mechanism, based on the presence of resource requests in the Pod spec. The algorithm follows the approach proposed in the admission controller subsection - see [Patch Generation Algorithm](#patch-generation-algorithm). If the in-place update fails, the updater falls back to the eviction path.
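The pod-level `resourceDiff` mentioned in point 1 could be computed along these lines (a hypothetical sketch mirroring the existing per-container calculation, which sums the relative distance between request and recommendation over resources):

```go
package main

import (
	"fmt"
	"math"
)

// podResourceDiff returns the relative distance between the Pod's current
// pod-level requests and the pod-level recommendation, summed over
// resources - analogous to the container-level resourceDiff.
func podResourceDiff(current, recommended map[string]float64) float64 {
	var diff float64
	for res, rec := range recommended {
		cur, ok := current[res]
		if !ok || cur == 0 {
			continue // no current request to compare against
		}
		diff += math.Abs(rec-cur) / cur
	}
	return diff
}

func main() {
	cur := map[string]float64{"cpu": 1000, "memory": 100}
	rec := map[string]float64{"cpu": 1500, "memory": 50}
	fmt.Println(podResourceDiff(cur, rec)) // 0.5 + 0.5 = 1
}
```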
Member

So we are not changing anything here, right?
To my understanding, this means we calculate the target recommendation for all containers in a pod and then check if the total is different from the current pod resources (by resourceDiff and more).

Maybe we can remove the long paragraph and just keep this one simple sentence?


This behavior is also described in the [Pod-level Resource Spec KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2837-pod-level-resource-spec/README.md#admission-controller), which suggests placing Pods in namespaces without any Container LimitRange objects. Therefore this AEP proposes that creation of Pods with Pod-level resources in namespaces containing Container LimitRange objects should be rejected. This validation is further detailed in the [Validation section](#dynamic-validation).

In summary, Pods with Pod-level resources should not be validated against Container LimitRange objects.
Member

So are we saying we would not support that? (It's fine with me as long as it is documented.)

Member Author

@iamzili iamzili Jan 26, 2026

Yes, at this stage (and this may change in the future if the LimitRanger admission controller updates its behavior for pods with a pod-level resources stanza) we should skip the Pod in both the updater and the admission controller when:

  1. the Pod template defines pod-level resources, and
  2. the Pod is deployed into a namespace that contains container-scoped LimitRange objects.

However, I don't agree with the current wording in the AEP, as we cannot simply reject Pod creation. Instead, I propose updating the AEP to state that in this situation the updater and the admission controller will skip managing the Pod and emit a log message explaining why (in both components), advising the user to move the Pod to a namespace without container-level LimitRange objects.

- containerName: "c2"
  mode: "Off"
- containerName: "c3"
  mode: "Auto"
Member

Since Auto is deprecated, I would like it not to appear in AEPs and such. Let's switch it to something else.

Member Author

This is not the VPA mode you are referring to; it is the mode defined under containerPolicies.


Member Author

@iamzili iamzili Jan 26, 2026

But I mention the Auto VPA mode under the Goals section.

Comment on lines +570 to +592
The following example illustrates a more appropriate configuration. The user chooses not to calculate recommendations for the `sidecar` container. As a result, VPA excludes the `sidecar` container from pod-level recommendation calculations. VPA also omits container-level recommendations for this container from the VPA status stanza:

```yaml
# Target Pod has three containers (c1, c2, and sidecar).
# Valid VPA object
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: workload1
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: workload1
  updatePolicy:
    updateMode: 'Recreate'
  resourcePolicy:
    containerPolicies:
      - containerName: "sidecar"
        mode: "Off"
```

Member

Can we merge this example with the one above? I am afraid this document is becoming too long.


`PodLevelResources` (i.e. the new VPA flag) is supported starting from Kubernetes v1.34, where the beta version of the [PodLevelResources](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#PodLevelResources) feature gate is enabled by default. In this version, all VPA modes are supported except `InPlaceOrRecreate`.

To use the `InPlaceOrRecreate` VPA mode, Kubernetes v1.35 or later is required, and the alpha feature gate [InPlacePodLevelResourcesVerticalScaling](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#InPlacePodLevelResourcesVerticalScaling) must be enabled (introduced in [In-Place Pod-Level Resources Resizing](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize)). If this feature gate is disabled or the user runs an earlier Kubernetes version, in-place updates for pod-level resources will fail, and the updater will fall back to eviction.
Member

Why would the updater not fall back to eviction? How is it different from containers when we fall back to eviction?

Member Author

I'm not sure I fully understand the comment from above:

  • what I'm proposing is that at this line, if err is non-nil, the pod is evicted when InPlaceOrRecreate mode is used - in other words "we fallback to eviction"
  • what do you mean by "containers when we fall back to eviction"?

Comment on lines +181 to +190
##### Option 1: VPA Controls Only Pod-Level Resources

With this option, VPA manages only the pod-level resources stanza. To follow this approach, the initially defined container-level resources for `ide` must be removed so that changes in usage are reflected only in pod-level recommendations.

**Pros**:
* VPA does not need to track which container-level resources were initially set.
* Enables shared headroom across containers in the same Pod. With container-only limits, a sidecar (`tool1` or `tool2`) or the main workload (`ide` container) hitting its own CPU limit could get throttled even if other containers in the Pod have idle CPU. Pod-level resources allow a container experiencing a spike to access idle resources from others, optimizing overall utilization.

**Cons**:
* A downside of this approach is that the most important container (`ide`) may be recreated without container-level resources, leading to an `oom_score_adj` that matches the other sidecars in the Pod; as a result, the OOM killer may target all containers more evenly under node memory pressure. For details on how `oom_score_adj` is computed when pod-level resources are present, see the [KEP section](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2837-pod-level-resource-spec/README.md#oom-score-adjustment) on OOM score adjustment.
Member

Since this is not the selected option, can we just mention the option and briefly explain why it was not chosen?

Member

@omerap12
Member

/cc @ndixita

@k8s-ci-robot k8s-ci-robot requested a review from ndixita January 25, 2026 17:05
```go
labelSetKey labelSetKey
// Containers that belong to the Pod, keyed by the container name.
Containers map[string]*ContainerState
// Current Pod-level requests (!NEW)
```
Member

Spacing here is a little weird

Comment on lines +181 to +190
Member

- Sidecars, such as logging agents or mesh proxies (like `tool1` or `tool2`), that don't use container-level limits can borrow idle CPU from other containers in the pod when they experience a spike in usage. Pod-level resources allow a container experiencing a spike to access idle resources from others, optimizing overall utilization.

**Cons**:
- Existing VPA users may find the behavior surprising because VPA does not control all container-level resources stanzas - only those initially set.
Member

Hmmm, interesting con.
I believe this may go against the existing API though:

```go
// Controls how the autoscaler computes recommended resources.
// The resource policy may be used to set constraints on the recommendations
// for individual containers.
// If any individual containers need to be excluded from getting the VPA recommendations, then
// it must be disabled explicitly by setting mode to "Off" under containerPolicies.
// If not specified, the autoscaler computes recommended resources for all containers in the pod,
// without additional constraints.
// +optional
ResourcePolicy *PodResourcePolicy `json:"resourcePolicy,omitempty" protobuf:"bytes,3,opt,name=resourcePolicy"`
```

The current default is to control all containers, regardless of whether their resources are set.

Member Author (@iamzili, Jan 27, 2026)

Yes, but the current default behavior - controlling all container-level resources - does not make sense when pod-level resources are present. For example, this does not look correct, and it is not what users expect (what is the benefit of using pod-level resources here?):

from this:

```yaml
kind: Pod
apiVersion: v1
metadata:
  namespace: default
  name: mypod
spec:
  resources:
    requests:
      cpu: 200m
      memory: "200Mi"
  containers:
  - name: c1
    image: registry.k8s.io/pause:3.1
  - name: c2
    image: registry.k8s.io/pause:3.1
```

this should not become:

```yaml
kind: Pod
apiVersion: v1
metadata:
  namespace: default
  name: mypod
spec:
  resources:
    requests:
      cpu: 200m
      memory: "200Mi"
  containers:
  - name: c1
    image: registry.k8s.io/pause:3.1
    resources:
      requests:
        cpu: 180m
        memory: "180Mi"
  - name: c2
    image: registry.k8s.io/pause:3.1
    resources:
      requests:
        cpu: 20m
        memory: "20Mi"
```
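What I would expect instead, assuming VPA manages only the pod-level stanza in this case, is a sketch like the following (the resource values are purely illustrative, not actual recommender output):

```yaml
kind: Pod
apiVersion: v1
metadata:
  namespace: default
  name: mypod
spec:
  resources:
    requests:
      cpu: 250m        # updated by VPA (illustrative value)
      memory: "250Mi"  # updated by VPA (illustrative value)
  containers:
  # Container-level requests stay unset, so the containers keep
  # sharing the pod-level budget instead of being individually split.
  - name: c1
    image: registry.k8s.io/pause:3.1
  - name: c2
    image: registry.k8s.io/pause:3.1
```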

Member

> Yes, but the current default behavior - controlling all container-level resources - does not make sense when pod-level resources are present. For example, this does not look correct, and it is not what users expect (what is the benefit of using pod-level resources here?):

Right, I agree that this is strange, but it's what the user asked for (using the existing defaults set for container resources, and opting in to pod-level resources).

I think either way, if the behaviour is surprising, we should try to change the API so that the behaviour isn't surprising anymore.

What if we had a way to select if you wanted container, pod or both controlled?

Maybe something living inside spec.updatePolicy?


When a workload is created without any resources defined at either the pod or container level, there are two options:

##### [Selected] Option 1: VPA controls only the container-level resources
Member

This seems to contradict the defaults set in the API:

https://github.com/kubernetes/autoscaler/pull/8586/changes#diff-3fe0ec0541bbb21c23bf40a53dcae34f461a2db784c094f4ad86d9a4f4a86d2fR235-R237

  1. Add a new spec.resourcePolicy.podPolicies stanza. This stanza is user-modifiable and allows setting constraints for pod-level recommendations:
    - controlledResources: Specifies which resource types are recommended (and possibly applied). Valid values are cpu, memory, or both. If not specified, both resource types are controlled by VPA.

My opinion is this:
Extend the API to include a "podPolicies", which has a "controlledResources" that defaults to nothing.

If a user wants to opt-in to pod level resources, they need to set spec.resourcePolicy.podPolicies.controlledResources to both memory and cpu.

The awkward part here is that container resources are enabled by default and pod resources are disabled by default. We could chat to someone in api-machinery about how we can safely roll this out with both pod and container being set, without breaking backwards compatibility.
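As an illustrative sketch of that opt-in (note that the `podPolicies` stanza and its `controlledResources` field are proposals under discussion here, not an existing API), a VPA object opting into pod-level recommendations might look like:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # hypothetical target workload
  resourcePolicy:
    # Proposed stanza: absent (the default) means VPA does not
    # manage pod-level resources, preserving today's behavior.
    podPolicies:
      controlledResources: ["cpu", "memory"]
```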

Member Author (@iamzili, Jan 29, 2026)

Actually, this makes sense!

So you are suggesting that the pod-level and container-level resources present in the pod spec would still be relevant, but only because of the request-to-limit ratio. When a user wants VPA to manage pod-level resources, they need to specify this in the new podPolicies.controlledResources stanza (which defaults to nothing). Is my understanding correct?

Member

Yup!

iamzili and others added 2 commits January 27, 2026 20:48
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 14, 2026

iamzili commented Mar 14, 2026

@adrianmoisey and @omerap12 I'm requesting a new review for this AEP.

I refactored this AEP based on the ongoing code implementation PR, which is 95% ready. I removed a lot of material from the documentation, so it is not as dense as it was before; furthermore, I reworked the whole mechanism as well.

Thanks in advance!
