
KEP-5732: Remove desiredCount from the TAS KEP #5949

Merged
k8s-ci-robot merged 2 commits into kubernetes:master from 44past4:remove-desired-count
Mar 16, 2026

Conversation

@44past4 (Contributor) commented Mar 9, 2026

  • One-line PR description: Removes the desiredCount field planned for the Basic and Gang PodGroup scheduling policies from the scope of the TAS KEP.

  • Issue link: KEP-5732: Topology-aware workload scheduling #5732

  • Other comments:

    • After some discussion, there is still no clarity about the expected semantics of the desiredCount field, which was planned to be added to the Basic and Gang scheduling policies on PodGroup. Because of this, desiredCount will be removed from the alpha scope of the TAS KEP.

/sig scheduling

k8s-ci-robot added the sig/scheduling label (Categorizes an issue or PR as relevant to SIG Scheduling) Mar 9, 2026
github-project-automation bot moved this to Needs Triage in SIG Scheduling Mar 9, 2026
k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) and kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory) labels Mar 9, 2026
k8s-ci-robot requested review from dom4ha and macsko March 9, 2026 10:23
k8s-ci-robot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files) Mar 9, 2026
```go
// BasicSchedulingPolicy indicates that standard Kubernetes
// scheduling behavior should be used.
type BasicSchedulingPolicy struct {
```
@mm4tt (Contributor) commented Mar 9, 2026

We know that TAS for non-gangs is tricky due to edge cases (e.g., the scheduler making placement decisions based on a partial view of the pods). I treat this as an (unsuccessful) attempt to fix the problem. While removing it now makes sense until we have a more thorough design, I think we shouldn't just drop the context entirely.

Instead of simply removing it, should we add a short paragraph to the KEP outlining the current TAS limitations for basic policies, noting that this will be addressed in future releases? We could also mention scheduling gates as a currently available (albeit imperfect) mitigation.

The 'Phase 2' algorithm section might be a good place for this. We could expand on point '3. If all pods fit, the Placement is marked Feasible' to explain why this works perfectly for gangs (waiting for MinCount in pre-enqueue), but might behave inconsistently for basic policies (where the scheduler's decision depends on the number of observed pods).
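To make the asymmetry described above concrete, here is a minimal, purely illustrative Go sketch (the function names and signatures are hypothetical, not the actual kube-scheduler API): a gang policy can gate enqueueing on an explicit MinCount, so the feasibility simulation always sees the whole group, while a basic policy has no such signal and the scheduler can only act on the pods it has observed so far.

```go
package main

import "fmt"

// gangPreEnqueueReady sketches a pre-enqueue check for a gang policy:
// pods are held back until at least minCount members of the PodGroup
// have been observed, so placement is decided with a complete view.
func gangPreEnqueueReady(observedPods, minCount int) bool {
	return observedPods >= minCount
}

// basicObservedPods sketches the basic-policy situation: there is no
// count to wait for, so the scheduler simulates only the pods it
// currently observes, and the decision depends on how many pods
// happened to exist at that moment.
func basicObservedPods(observedPods int) int {
	return observedPods // partial view; more pods may arrive later
}

func main() {
	fmt.Println(gangPreEnqueueReady(2, 4)) // false: gang waits, only 2 of 4 observed
	fmt.Println(gangPreEnqueueReady(4, 4)) // true: gang proceeds
	fmt.Println(basicObservedPods(2))      // 2: basic schedules with whatever it sees
}
```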

@dom4ha (Member) replied:

> Instead of simply removing it, should we add a short paragraph to the KEP outlining the current TAS limitations for basic policies, noting that this will be addressed in future releases? We could also mention scheduling gates as a currently available (albeit imperfect) mitigation.

+1, but I think only the problem of "not-ready" pods is worth addressing as part of TAS, as it makes the feature work sub-optimally.

Addressing the desired future number of pods is rather an independent extension to TAS and IMHO deserves a separate KEP.

@44past4 (Contributor, Author) commented Mar 12, 2026

I have updated the Phase 2 algorithm description with a point explaining the limitations of using TAS with the Basic Scheduling Policy. Please take a look.

44past4 force-pushed the remove-desired-count branch from 7bf0cc3 to 54508a8 on March 9, 2026 10:36

@dom4ha (Member) commented Mar 9, 2026

FTR, dropped PR with the discussion: kubernetes/kubernetes#137137

Add warning about using the Basic Scheduling Policy with TAS.

```
- **Potential Optimization:** Pre-filtering can check aggregate resources
  requested by PodGroup Pods before running the full simulation.

- **Basic Scheduling Policy Handling:** The current algorithm may exhibit
```
A reviewer (Member) commented:

The same applies to Gang Policy if more pods than minCount are created.

@44past4 (Contributor, Author) commented Mar 13, 2026

Yes, it is true that we have a similar problem with the Gang Policy, but there we have explicit information on minCount coming from the user, and it is their conscious decision to set minCount lower than the total number of pods. So in the context of the Gang Policy, TAS adheres to explicit user intent and makes sure it is fulfilled. With the Basic scheduling policy, the user may expect some best-effort behaviour from TAS which in many cases will simply not hold, so this disclaimer makes sense to me for the Basic Policy but is not necessary for the Gang Policy.
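The distinction drawn in this reply can be sketched in a few lines of Go. These types are simplified stand-ins, not the actual PodGroup API: the point is only that a Gang policy carries an explicit, user-provided MinCount that TAS can honor, while a Basic policy (with desiredCount removed by this PR) carries no group-size signal at all.

```go
package main

import "fmt"

// GangPolicy is a simplified stand-in: the user states explicit intent
// via MinCount, so scheduling fewer pods than the total is a conscious
// user choice that TAS can honor.
type GangPolicy struct {
	MinCount int
}

// BasicPolicy is a simplified stand-in: after this PR it carries no
// count field, so TAS cannot infer the intended group size.
type BasicPolicy struct{}

// explicitIntent reports whether a policy gives TAS an explicit group
// size to honor, and if so, what it is.
func explicitIntent(policy interface{}) (int, bool) {
	if g, ok := policy.(GangPolicy); ok {
		return g.MinCount, true
	}
	return 0, false
}

func main() {
	if n, ok := explicitIntent(GangPolicy{MinCount: 4}); ok {
		fmt.Println("gang: honor explicit MinCount =", n)
	}
	if _, ok := explicitIntent(BasicPolicy{}); !ok {
		fmt.Println("basic: no explicit group size; best effort only")
	}
}
```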

@wojtek-t (Member) commented:

/lgtm

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Mar 12, 2026
@dom4ha (Member) commented Mar 16, 2026

/approve
/lgtm

@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 44past4, dom4ha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) Mar 16, 2026
k8s-ci-robot merged commit b87f851 into kubernetes:master Mar 16, 2026
4 checks passed
github-project-automation bot moved this from Needs Triage to Done in SIG Scheduling Mar 16, 2026
k8s-ci-robot added this to the v1.36 milestone Mar 16, 2026
44past4 deleted the remove-desired-count branch March 16, 2026 08:36

Labels

approved · cncf-cla: yes · kind/kep · lgtm · sig/scheduling · size/M

Projects

Status: Done

5 participants