diff --git a/keps/sig-scheduling/5732-topology-aware-workload-scheduling/README.md b/keps/sig-scheduling/5732-topology-aware-workload-scheduling/README.md
index 4775199b74b3..7e4a417f5238 100644
--- a/keps/sig-scheduling/5732-topology-aware-workload-scheduling/README.md
+++ b/keps/sig-scheduling/5732-topology-aware-workload-scheduling/README.md
@@ -14,7 +14,6 @@
   - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
   - [Workload API Changes](#workload-api-changes)
-    - [Basic and Gang Policy Extension](#basic-and-gang-policy-extension)
   - [Scheduling Framework Extensions](#scheduling-framework-extensions)
     - [1. Data Structures](#1-data-structures)
     - [2. New Plugin Interfaces](#2-new-plugin-interfaces)
@@ -281,78 +280,6 @@ will be defined in a separate KEP:
 Note: For the initial alpha scope, only a single TopologyConstraint will be
 supported.
 
-#### Basic and Gang Policy Extension
-
-In the first alpha version of the Workload API, the `Basic` policy was a no-op.
-We propose extending the `Basic` and `Gang` policies to accept a `desiredCount`
-field. This field serves as a scheduler hint to improve placement decisions
-without imposing hard scheduling constraints.
-
-This feature will be gated behind a separate feature gate
-(`PodGroupDesiredCount`) to decouple it from the core Gang Scheduling
-and Topology Aware Scheduling features.
-
-**1. Basic Policy Update**
-
-We introduce `desiredCount` to the `Basic` policy to allow users to signal the
-expected group size for optimization purposes.
-
-```go
-// BasicSchedulingPolicy indicates that standard Kubernetes
-// scheduling behavior should be used.
-type BasicSchedulingPolicy struct {
-	// DesiredCount is the expected number of pods that will belong to this
-	// PodGroup. This field is a hint to the scheduler to help it make better
-	// placement decisions for the group as a whole.
-	//
-	// Unlike gang's minCount, this field does not block scheduling. If the number
-	// of available pods is less than desiredCount, the scheduler can still attempt
-	// to schedule the available pods, but will optimistically try to select a
-	// placement that can accommodate the future pods.
-	//
-	// +optional
-	DesiredCount *int32
-}
-```
-
-**2. Gang Policy Update**
-
-We similarly extend the `Gang` policy. While `minCount` provides a hard constraint
-for admission, `desiredCount` provides a soft target for placement optimization.
-
-```go
-// GangSchedulingPolicy defines the parameters for gang scheduling.
-type GangSchedulingPolicy struct {
-	// MinCount is the minimum number of pods that must be schedulable or scheduled
-	// at the same time for the scheduler to admit the entire group.
-	// It must be a positive integer.
-	//
-	// +required
-	MinCount int32
-
-	// DesiredCount is the expected number of pods that will belong to this
-	// PodGroup. This field is a hint to the scheduler to help it make better
-	// placement decisions for the group as a whole.
-	//
-	// Unlike gang's minCount, this field does not block scheduling. If the number
-	// of available pods is less than desiredCount but at least minCount, the scheduler
-	// can still attempt to schedule the available pods, but will optimistically try
-	// to select a placement that can accommodate the future pods.
-	//
-	// When provided desiredCount must be greater or equal to minCount.
-	//
-	// +optional
-	DesiredCount *int32
-}
-```
-
-Those fields allow users to express their "true" workloads more easily and enables
-the scheduler to optimize the placement of such pod groups by taking the desired state
-into account. Ideally, the scheduler should prefer placements that can accommodate
-the full `desiredCount`, even if not all pods are created yet. When `desiredCount`
-is specified, the scheduler can delay scheduling the first Pod it sees for a short
-amount of time in order to wait for more Pods to be observed.
-
 ### Scheduling Framework Extensions
 
 The scheduler framework requires new plugin interfaces to handle "Placements". A
@@ -518,6 +445,14 @@ The algorithm proceeds in three main phases for a given PodGroup.
 - **Potential Optimization:** Pre-filtering can check aggregate resources
   requested by PodGroup Pods before running the full simulation.
 
+- **Basic Scheduling Policy Handling:** The current algorithm may exhibit
+  inconsistent behavior when used with the PodGroup Basic Scheduling Policy.
+  Because the scheduler may only observe a subset of pods when scheduling
+  a PodGroup, placement feasibility is only validated for those specific
+  pods rather than the entire group. This limitation may be addressed in
+  future releases; currently, scheduling gates may be used as a partial
+  mitigation.
+
 - **Heterogeneous PodGroup Handling**: Sequential processing will be used
   initially. Pods are processed sequentially; if any fail, the placement is
   rejected.
@@ -702,10 +637,6 @@ kube-scheduler instance being a leader).
   - Components depending on the feature gate:
     - kube-apiserver
     - kube-scheduler
-  - Feature gate name: PodGroupDesiredCount
-  - Components depending on the feature gate:
-    - kube-apiserver
-    - kube-scheduler
 - [ ] Other
   - Describe the mechanism:
   - Will enabling / disabling the feature require downtime of the control
diff --git a/keps/sig-scheduling/5732-topology-aware-workload-scheduling/kep.yaml b/keps/sig-scheduling/5732-topology-aware-workload-scheduling/kep.yaml
index e03b6f6c3bdf..e3d2c528668b 100644
--- a/keps/sig-scheduling/5732-topology-aware-workload-scheduling/kep.yaml
+++ b/keps/sig-scheduling/5732-topology-aware-workload-scheduling/kep.yaml
@@ -38,10 +38,6 @@ milestone:
 # List the feature gate name and the components for which it must be enabled
 feature-gates:
   - name: TopologyAwareWorkloadScheduling
-    components:
-      - kube-apiserver
-      - kube-scheduler
-  - name: PodGroupDesiredCount
     components:
       - kube-apiserver
       - kube-scheduler
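
The limitation described in the added **Basic Scheduling Policy Handling** bullet can be illustrated with a minimal Go sketch. It is not the kube-scheduler implementation. `Domain`, `fits`, and `feasibleDomains` are hypothetical names. Feasibility is also reduced here to comparing summed CPU requests against the free CPU of a topology domain, whereas the KEP's algorithm simulates placement pod by pod.

```go
package main

import "fmt"

// Domain is a hypothetical topology domain (for example, a rack) with a
// fixed amount of free CPU, in millicores.
type Domain struct {
	Name    string
	FreeCPU int64
}

// fits reports whether the summed CPU requests of pods fit into d.
// This is a simplified stand-in for the real placement simulation, which
// also considers other resources and per-node packing.
func fits(d Domain, podCPU []int64) bool {
	var total int64
	for _, c := range podCPU {
		total += c
	}
	return total <= d.FreeCPU
}

// feasibleDomains returns the names of the domains that can host all of podCPU.
func feasibleDomains(domains []Domain, podCPU []int64) []string {
	var names []string
	for _, d := range domains {
		if fits(d, podCPU) {
			names = append(names, d.Name)
		}
	}
	return names
}

func main() {
	domains := []Domain{
		{Name: "rack-a", FreeCPU: 4000},
		{Name: "rack-b", FreeCPU: 8000},
	}

	// The full PodGroup has six pods requesting 1 CPU each. Under the Basic
	// policy the scheduler may only have observed the first three of them.
	fullGroup := []int64{1000, 1000, 1000, 1000, 1000, 1000}
	observed := fullGroup[:3]

	fmt.Println("feasible for observed pods:", feasibleDomains(domains, observed))
	fmt.Println("feasible for full group:", feasibleDomains(domains, fullGroup))
}
```

With three of the six pods observed, both racks pass the check, so a placement on `rack-a` could be accepted even though it cannot host the remaining three pods. Only `rack-b` fits the full group. Holding pods back with scheduling gates until the whole group exists, which is the partial mitigation mentioned in the bullet, lets the check see all six pods.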