Add performance benchmark for the CA RunOnce control loop #9237
Choraden wants to merge 9 commits into kubernetes:master
Conversation
Hi @Choraden. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Once the patch is verified, the new status will be reflected by the corresponding label. I understand the commands that are listed here.

Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/uncc aleksandra-malinowska vadasambar

Keeping as draft until #9099 is merged.
@Choraden: GitHub didn't allow me to request PR reviews from the following users: pmendelski. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details
In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Sharing results:
Force-pushed bd996c2 to d79c751
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Choraden
The full list of commands accepted by this bot can be found here.

Details
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Force-pushed d79c751 to ce198a3
}

// WithMinSize sets the minimum size of the node group.
func WithMinSize(min int) NodeGroupOption {
nit: A single WithSizeConfig(min, max int) option makes more sense to me; I think you'd always want to set both ends of the limits if you're setting them at all (unless you want to rely on the default being 0, but it seems better to force callers to be explicit).
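The suggested combined option could look roughly like this. This is an illustrative sketch only: the `NodeGroup` fields and `NodeGroupOption` shape below are simplified stand-ins for the testprovider types, not the PR's actual code.

```go
package main

import "fmt"

// Simplified stand-ins for the testprovider types discussed in the review.
type NodeGroup struct {
	minSize int
	maxSize int
}

type NodeGroupOption func(*NodeGroup)

// WithSizeConfig sets both ends of the size limits at once, forcing callers
// to be explicit about min and max rather than relying on zero-value defaults.
func WithSizeConfig(min, max int) NodeGroupOption {
	return func(ng *NodeGroup) {
		ng.minSize = min
		ng.maxSize = max
	}
}

func main() {
	ng := &NodeGroup{}
	for _, opt := range []NodeGroupOption{WithSizeConfig(0, 100)} {
		opt(ng)
	}
	fmt.Println(ng.minSize, ng.maxSize) // 0 100
}
```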
}

// WithAutoscalingKubeClients allows injecting autoscaling kube clients.
func (b *AutoscalerBuilder) WithAutoscalingKubeClients(kubeClients *cacontext.AutoscalingKubeClients) *AutoscalerBuilder {
AutoscalingKubeClients is normally built based on the KubeClient and InformerFactory, both of which are already injectable (and as far as I can see they need to be injected). Why do we need to make AutoscalingKubeClients directly injectable too? It seems like we'd always want to have the AutoscalingKubeClients field in sync with the other two. Otherwise we'd get different behavior for components that use the client/informers directly than for components that use AutoscalingKubeClients.
Ah okay, after reading some more I see why: we want to change the log recorder. Could you add a warning to the method comment that this is not needed for most use-cases, and that if you're using it, it has to be in sync with the objects provided in WithKubeClient() and WithInformerFactory()?
Added a comment to the method.
klog.SetOutput(io.Discard)
ctrl.SetLogger(klog.Background())

// Disable automatic Garbage Collection during the timed portion of the benchmark
How did the results of the benchmark differ without this part? If we have a performance regression caused by frequent and quickly discarded memory allocations, it seems like GC would be a considerable portion of the regression. It seems that disabling GC would mask such regressions.
The hard part about benchmarking GC'ed apps is that you have little control over when the GC happens, which makes the results unstable and high-variance. As the comments say, I decided to disable the GC to stabilize the benchmark. Since the benchmark also measures allocations, it should be fine. It might underestimate GC pressure sometimes, but that's the price of result stability.
I also added a -gc flag to control that behavior, e.g. when someone wants to profile the benchmark with GC enabled.
// run executes the benchmark for a given scenario. It handles environment stabilization,
// profiling, and repeated execution of the RunOnce loop.
func run(b *testing.B, s scenario) {
nit: I think run() being a method on scenario would be more idiomatic in Go.
runtime.GC()

if f != nil && i == 0 {
	if err := pprof.StartCPUProfile(f); err != nil {
Have you evaluated having the profile capture the sum of all RunOnce executions instead of just the first loop? It'd be nice to have the reasoning for only capturing one loop in a comment.
The benchmark usually runs only 1 loop due to its heavy load. I decided to keep it this way for simplicity.
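The profile-first-iteration-only pattern under discussion can be shown in a self-contained form. The temp-file handling and iteration count below are illustrative assumptions; only the `i == 0` guard mirrors the diff context above.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

func main() {
	// Illustrative: write the CPU profile to a temp file; the real
	// benchmark takes the path from a -profile-cpu flag instead.
	f, err := os.CreateTemp("", "cpu-*.prof")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	const iterations = 3
	for i := 0; i < iterations; i++ {
		// Capture a CPU profile only during the first iteration; in
		// practice the benchmark usually runs a single heavy loop anyway.
		if f != nil && i == 0 {
			if err := pprof.StartCPUProfile(f); err != nil {
				panic(err)
			}
		}
		// ... RunOnce-equivalent work would happen here ...
		if i == 0 {
			pprof.StopCPUProfile()
		}
	}
	fmt.Println("profiled first iteration only")
}
```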
// fastTaintingKubeClient tracks nodes that were artificially tainted in-memory to bypass
// the overhead of full API server round-trips and complex object tracking in the fake client.
type fastTaintingKubeClient struct {
What was the difference in results between this and the normal tainting logic?
I'm surprised that fake K8s client logic (and the fake CloudProvider logic for that matter) would be significant enough to impact the overall results. There's a fair bit of complexity in this benchmark to deal with that, I'm wondering if we can avoid it somehow (e.g. grab the profile from all loops to address variance; increase the parameters of the scenario so that binpacking is more expensive and dominates the whole loop more).
The fake K8s client takes a suspiciously large share of the CPU profile.
I'm not sure it correctly represents the actual update logic, which would be just an API call; instead, it tries to simulate the outer world.
Moreover, it introduces some non-deterministic behaviour in the allocations:

                     │ bench_no_fast_tainting.txt │
                     │            B/op            │
RunOnceScaleDown-24          1.239Gi ± 19%

                     │ bench_no_fast_tainting.txt │
                     │          allocs/op         │
RunOnceScaleDown-24          11.15M ± 8%
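The core idea of fastTaintingKubeClient can be reduced to a small in-memory tracker. This sketch omits the client-go reactor wiring entirely; the type name, methods, and structure below are simplified stand-ins, not the PR's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// fastTainter records which nodes received the ToBeDeleted taint without
// persisting anything through a fake API server. Illustrative stand-in
// for the fastTaintingKubeClient discussed above.
type fastTainter struct {
	mu      sync.Mutex
	tainted map[string]bool
}

func newFastTainter() *fastTainter {
	return &fastTainter{tainted: make(map[string]bool)}
}

// TaintNode mimics intercepting a node-update call and recording the taint
// in memory, bypassing fake-client object tracking.
func (t *fastTainter) TaintNode(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.tainted[name] = true
}

// IsTainted lets scale-down logic check for nodes marked for deletion.
func (t *fastTainter) IsTainted(name string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.tainted[name]
}

func main() {
	ft := newFastTainter()
	ft.TaintNode("node-1")
	fmt.Println(ft.IsTainted("node-1"), ft.IsTainted("node-2")) // true false
}
```

In the real benchmark, reactors registered on the fake clientset intercept the node updates and feed a structure like this instead of the fake client's object store.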
	testprovider.WithMinSize(0),
)

if _, err := cluster.KubeClient.CoreV1().Namespaces().Create(context.Background(), &corev1.Namespace{
nit: Which part of RunOnce() depends on this? You normally get the default namespace automatically, so IMO it'd make sense to have this centralized as part of the BUT framework somehow.
It must have been my mistake. It's not needed; I removed that part completely.
cpu := int64(nodeCPU / podsPerNode)
mem := int64(nodeMem / podsPerNode)
pod := BuildTestPod(podName, cpu, mem, MarkUnschedulable())
if _, err := cluster.KubeClient.CoreV1().Pods("default").Create(context.Background(), pod, metav1.CreateOptions{}); err != nil {
I think the BUT framework gives you a more convenient way to manipulate the Pods and the Nodes at least, see https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/test/integration/synctest/template_test.go
Replaced with K8s.AddPod from the framework.
func BenchmarkRunOnceScaleUp(b *testing.B) {
	s := scenario{
		setup: setupScaleUp(200),
200 Nodes * 50 PodsPerNode is not a very complex scenario. Maybe this is why the actual bottleneck logic isn't dominating the whole result and we need to make other parts more synthetic?
Have you evaluated higher values here? How did the results differ?
I have run the benchmark with 1000 Nodes * 100 Pods (as nodes are usually limited to running ~100 pods) and other scenarios, but the results and profiles did not surface any new findings. I decided to keep the scenario minimal enough to enable a benchmark presubmit job.
// This is for benchmarks that will only evaluate target size and
// want to avoid the overhead (and CPU profile noise) of adding
// nodes to the internal state of the fake cloud provider.
func (n *NodeGroup) NoOpIncreaseSize(delta int) error {
Ugh, it's pretty ugly to have an exported method specific to just a single benchmark here in the common CloudProvider fake. I don't see a good way around it though... If we're keeping it, WDYT about changing it slightly to SetTargetSize()/SetTargetSizeOnly()? In this form it seems like it could be useful for other tests more than in the current one.
@mtrqq @GaetanoMar96 @kawych Maybe you have a better idea here?
I found the existing DecreaseTargetSize method, which only modifies the target size. It can be used as NoOpIncreaseSize by calling it with a negative delta. We should be fine with it for now, and it eliminates the need to extend the test package interface.
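The negative-delta trick described above is easy to show on a toy type. This is a deliberately minimal sketch: the struct and method body are illustrative, not the fake cloud provider's actual NodeGroup.

```go
package main

import "fmt"

// Toy NodeGroup illustrating the trick: a DecreaseTargetSize that only
// touches the recorded target size emulates a no-op increase when called
// with a negative delta. Simplified stand-in, not the real fake provider.
type NodeGroup struct {
	targetSize int
}

// DecreaseTargetSize adjusts only the target size; no fake node objects
// are created or deleted, so nothing extra shows up in the CPU profile.
func (n *NodeGroup) DecreaseTargetSize(delta int) {
	n.targetSize -= delta
}

func main() {
	ng := &NodeGroup{targetSize: 10}
	// "Increase" by 5 without touching node state: pass -5.
	ng.DecreaseTargetSize(-5)
	fmt.Println(ng.targetSize) // 15
}
```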
Updated MustCreateManager in the integration test package to accept testing.TB instead of *testing.T. This allows the helper to be used within both standard tests and performance benchmarks (which use *testing.B). This change is a prerequisite for introducing performance benchmarking for the RunOnce control loop.
Force-pushed ce198a3 to abe3907
Rebased on master.
This commit adds a new benchmarking suite in core/bench to evaluate the performance of the primary Cluster Autoscaler control loop (RunOnce). These benchmarks simulate large-scale cluster operations using a mock Kubernetes API and cloud provider, allowing for comparative analysis and detection of performance regressions.
Introduced a -profile-cpu flag to the RunOnce benchmarking suite. When specified, the benchmark will capture a CPU profile during the first execution of the RunOnce loop and write it to the provided file path.
Disable Garbage Collection during RunOnce benchmarks to ensure stable and reproducible results. This prioritizes consistency over absolute performance metrics, allowing for a generic way to calculate performance differences between patches and providing a clean CPU profile for the RunOnce loop.
Introduce a no-op event recorder in RunOnce benchmarks to prevent event dropping and potential performance side-effects. This change also extends AutoscalerBuilder to support injecting custom AutoscalingKubeClients, allowing for better control over the environment in performance-sensitive tests.
Introduce fastScaleUpCloudProvider and fastScaleUpNodeGroup in benchmarks to avoid the overhead of simulating real node creation in the fake cloud provider. This significantly reduces noise in CPU profiles when benchmarking the core autoscaling logic, as it avoids unnecessary node object management in the fake provider. Added NoOpIncreaseSize to the fake NodeGroup to support this faster scale-up simulation.
This change introduces fastTaintingKubeClient, which uses reactors to track and inject ToBeDeleted taints on nodes during the benchmark. This allows the scale-down logic to correctly identify nodes that have been marked for deletion by the autoscaler without relying on standard fake client persistence for these taints. This simply removes the fake client from the CPU profile.
Force-pushed abe3907 to 2c85ed6
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
While working on #9022, it became clear that a standardized benchmark is necessary to quantify performance gains and prevent potential regressions in the core logic.
Leveraging the ongoing refactor of the autoscaler building logic in #9099, this PR introduces an initial draft of a benchmark specifically for the RunOnce function. This provides a controlled environment to measure the impact of architectural changes on the main execution loop.
The initial benchmark version at #9199 was difficult to stabilize and reason about, so we decided to simplify it to a single RunOnce call, simulating a "cold start" of the CA.
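The "cold start" shape of the benchmark can be sketched with the standard testing package. This is an assumed structure, not the PR's code: `runOnce` is a placeholder for the real CA loop, and the per-iteration setup is only hinted at in comments.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// runOnce stands in for the real Cluster Autoscaler RunOnce loop.
func runOnce() {
	time.Sleep(time.Millisecond) // placeholder for real work
}

func main() {
	// testing.Benchmark lets us run a benchmark outside `go test`.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			b.StopTimer()
			// Per-iteration setup would rebuild fresh cluster state here,
			// so every RunOnce sees a cold cache (the "cold start").
			b.StartTimer()
			runOnce()
		}
	})
	fmt.Println(res.N > 0)
}
```

Pausing the timer around setup keeps the measured time focused on the RunOnce call itself.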
Which issue(s) this PR fixes:
Relates to #9022
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: