67 Commits

Author SHA1 Message Date
Patrick Ohly
d85b91f343 scheduler-perf: measure workload runtime and relabel workloads
The goal is to only label workloads as "performance" which actually run long
enough to provide useful metrics. The throughput collector samples once per
second, so a workload should run at least 5, better 10 seconds to get at least
a minimal amount of samples for the percentile calculation.

For benchstat analysis of runs with sufficient repetitions to get statistically
meaningful results, each workload shouldn't run more than one minute, otherwise
before/after analysis becomes too slow.

The labels were chosen based on benchmark runs on a reasonably fast desktop. To
know how long each workload takes, a new "runtime_seconds" benchmark result
gets added.
2023-05-15 14:33:40 +02:00
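A minimal sketch of how an extra "runtime_seconds" result could be reported from a Go benchmark, assuming the standard `testing.B.ReportMetric` call is used for it. `runWorkload` and `benchmarkWorkload` are hypothetical stand-ins, not names from the commit:

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// runWorkload is a hypothetical stand-in for one scheduler_perf workload.
func runWorkload() {
	time.Sleep(10 * time.Millisecond)
}

// benchmarkWorkload measures the wall-clock time of each workload run and
// reports the average as an additional benchmark result.
func benchmarkWorkload(b *testing.B) {
	var total time.Duration
	for i := 0; i < b.N; i++ {
		start := time.Now()
		runWorkload()
		total += time.Since(start)
	}
	// The unit string becomes the column name in the benchmark output,
	// which is what benchstat later groups results by.
	b.ReportMetric(total.Seconds()/float64(b.N), "runtime_seconds")
}

func main() {
	// testing.Benchmark allows running a benchmark outside of "go test".
	result := testing.Benchmark(benchmarkWorkload)
	fmt.Printf("runtime_seconds: %.3f\n", result.Extra["runtime_seconds"])
}
```

With a value like this in the output, workloads that finish in under 5 seconds or take longer than a minute can be spotted and relabeled.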
Kubernetes Prow Robot
fb93000eb5
Merge pull request #117468 from HirazawaUi/replace-test-deprecated-ioutil
Replace the deprecated ioutil methods in the test directory
2023-05-03 12:02:32 -07:00
Kubernetes Prow Robot
aece6838e8
Merge pull request #117232 from pohly/scheduler-perf-code-cleanups
scheduler_perf: code cleanups
2023-05-03 09:54:13 -07:00
Kubernetes Prow Robot
b4c6a70927
Merge pull request #117230 from pohly/scheduler-perf-throughput
scheduler_perf: update throughputCollector
2023-04-29 12:12:17 -07:00
Patrick Ohly
b3e0bc8864 scheduler_perf: let the test decide which informers are needed
This will change when adding dynamic resource allocation test cases. Instead of
changing mustSetupScheduler and StartScheduler for that, let's return the
informer factory and create informers as needed in the test.
2023-04-27 15:31:40 +02:00
Patrick Ohly
969d28b12b scheduler_perf: refactor common code 2023-04-27 15:31:37 +02:00
Kubernetes Prow Robot
dd62a53e1a
Merge pull request #117196 from pohly/scheduler-perf-labels
scheduler_perf: support test case selection via labels
2023-04-26 14:26:14 -07:00
Patrick Ohly
550d4c0074 scheduler_perf: support test case selection via labels
Entire test cases and workloads can have labels attached to them. The union of
these must match the label filter, which works as in GitHub. By default, the
benchmark runs the tests that are labeled "performance", which is the same as
before.
2023-04-26 21:01:31 +02:00
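The label matching described in 550d4c0074 might look roughly like the sketch below. The filter syntax is an assumption made for illustration: a comma-separated list in which a bare or `+`-prefixed label must be present and a `-`-prefixed label must be absent. The function name `enabled` is likewise hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// enabled reports whether the combined labels of a test case and a workload
// satisfy the filter. Assumed syntax: "a,+b,-c" means a and b must be
// present and c must be absent. An empty filter matches everything.
func enabled(filter string, labels ...string) bool {
	have := make(map[string]bool, len(labels))
	for _, l := range labels {
		have[l] = true
	}
	for _, term := range strings.Split(filter, ",") {
		term = strings.TrimSpace(term)
		if term == "" {
			continue
		}
		mustHave := true
		switch term[0] {
		case '+':
			term = term[1:]
		case '-':
			mustHave = false
			term = term[1:]
		}
		if have[term] != mustHave {
			return false
		}
	}
	return true
}

func main() {
	testCaseLabels := []string{"performance"}
	workloadLabels := []string{"fast"}
	// The union of test case and workload labels is what gets matched.
	all := append(testCaseLabels, workloadLabels...)
	fmt.Println(enabled("performance", all...))
	fmt.Println(enabled("performance,-fast", all...))
}
```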
Patrick Ohly
78b8af9fed scheduler_perf: update throughputCollector
The previous solution had some shortcomings:

- It was based on the assumption that the goroutine gets woken up at regular
  intervals. This is not actually guaranteed. Now the code keeps track of the
  actual start and end of an interval and verifies that assumption.

- If no pod was scheduled (unlikely, but could happen), then
  "0 pods/s" got recorded. In such a case, the metric was always either
  zero or >= 1. A better solution is to extend the interval
  until some pod gets scheduled. With the larger time interval
  it is then possible to also track, for example, 0.5 pods/s.
2023-04-26 08:11:50 +02:00
HirazawaUi
a8b808ee6c Replace the deprecated ioutil methods in the test directory 2023-04-18 21:51:10 +08:00
Kubernetes Prow Robot
aa026a6b30
Merge pull request #117202 from pohly/scheduler-perf-zero-count
scheduler perf: allow creating 0 items
2023-04-11 21:18:20 -07:00
Kubernetes Prow Robot
69b59b9d42
Merge pull request #117199 from pohly/scheduler-perf-race-fix
scheduler_perf: fix race condition
2023-04-11 21:18:05 -07:00
Patrick Ohly
aa73f06e56 scheduler perf: allow creating 0 items
It makes sense to define a test where, depending on the parameters, some
operation creates zero pods, namespaces or nodes. The validation previously
didn't allow that because of how it was implemented, although the underlying
code works fine with zero as count.
2023-04-11 09:59:16 +02:00
Patrick Ohly
49bbf7c268 scheduler_perf: fix race condition
collector.collect got called without ensuring that collector.run had
terminated, so it could have happened that collector.run adds another sample
while collector.collect is reading them.
2023-04-11 09:46:34 +02:00
Patrick Ohly
a869a89825 scheduler perf: remove cleanup func
b.Cleanup may as well get called inside the function instead
of leaving that to the caller.
2023-04-11 09:43:45 +02:00
Patrick Ohly
cc4bcd1d8e scheduler_perf: report data items as benchmark results
This replaces the pretty useless us/op metric (useless because it includes
setup and teardown times) with the same values that also get stored in the JSON
file.

The main advantage is that benchstat can be used to analyze and compare
results.
2023-02-28 23:08:23 +01:00
Patrick Ohly
961129c5f1 scheduler_perf: add logging flags
This enables testing of different real production configurations (JSON
vs. text, different log levels, contextual logging).
2023-02-28 23:08:17 +01:00
Kante Yin
3d0894fabf
Fix failure(context canceled) in scheduler_perf benchmark (#114843)
* Fix failure in scheduler_perf benchmark

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Fail fatally on errors when cleaning up nodes in scheduler perf tests

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Use derived context to better organize the code

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Change log level to 2 in scheduler perf-test

Signed-off-by: Kante Yin <kerthcet@gmail.com>

---------

Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-01-30 16:21:00 -08:00
Patrick Ohly
2f6c4f5eab e2e: use Ginkgo context
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
2022-12-16 20:14:04 +01:00
kerthcet
d6ffb47832 Replace klog with benchmark log in scheduler_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-09 09:11:55 +08:00
Kubernetes Prow Robot
73f6b96f0a
Merge pull request #113615 from kerthcet/feat/add-benchmark-tests
Add nodeInclusionPolicy benchmark tests to scheduler_perf
2022-11-07 09:18:28 -08:00
kerthcet
bc15aca26d Refactor SchedulerConfigFile
Rename to SchedulerConfigPath and make it a pointer
to be consistent with other fields

Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-05 00:30:34 +08:00
kerthcet
48f2c9ec20 Add benchmark tests for nodeInclusionPolicy
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-05 00:13:43 +08:00
kerthcet
cfc53ee524 Refactor code and annotations for readability
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-01 17:44:45 +08:00
kerthcet
21e8a69a22 Use operationCode instead of string directly
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-01 17:01:22 +08:00
Davanum Srinivas
a9593d634c
Generate and format files
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-07-26 13:14:05 -04:00
Wojciech Tyczyński
5b042f0bf4 Remove RunAnAPIServer from integration tests 2022-07-25 17:52:31 +02:00
Kensei Nakada
b0d47cb380
scheduler_perf: allow users to specify default pod and node specs (#101799)
* scheduler_perf: default pod and node spec

* Fix: un-support DefaultNodeTemplatePath
2022-06-29 11:44:07 -07:00
Kubernetes Prow Robot
629706e0fe
Merge pull request #109546 from sanposhiho/replace-metrics
Replace scheduler_e2e_scheduling_duration_seconds with scheduler_scheduling_attempt_duration_seconds in scheduler_perf
2022-05-04 01:29:22 -07:00
Kubernetes Prow Robot
f0cd3725d3
Merge pull request #101835 from sanposhiho/scheduler_perf/feature/op-sleep
scheduler_perf: create sleep operation
2022-05-03 17:17:11 -07:00
sanposhiho
b7b94b6b39 scheduler_perf: create sleep operation 2022-04-25 23:02:09 +00:00
sanposhiho
6e0da69632 Replace scheduler_e2e_scheduling_duration_seconds with scheduler_scheduling_attempt_duration_seconds in scheduler_perf 2022-04-20 00:48:12 +09:00
Kubernetes Prow Robot
546e4fa1ef
Merge pull request #107771 from sanposhiho/fix-tiny
make scheduler_perf stable
2022-03-04 17:22:52 -08:00
sanposhiho
4c3a1000c7 fix by gofmt 2022-02-25 00:23:01 +09:00
sanposhiho
1080c2d717 Make scheduler_perf stable 2022-02-24 01:29:38 +09:00
Kubernetes Prow Robot
21c0f6f6ff
Merge pull request #107677 from pohly/scheduler-integration-benchmark
scheduler integration benchmark improvements
2022-02-14 01:23:28 -08:00
Patrick Ohly
c62d7407c8 scheduler_perf: dump test data when writing it failed
Occasionally, writing as JSON failed because a NaN float couldn't be
encoded. The extended log message helps understand where that comes from, for
example:

F0120 20:24:45.515745  511835 scheduler_perf_test.go:540] BenchmarkPerfScheduling: unable to write measured data {Version:v1 DataItems:[{Data:map[Average:35.714285714285715 Perc50:2 Perc90:36 Perc95:412 Perc99:412] Unit:pods/s Labels:map[Metric:SchedulingThroughput Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2]} {Data:map[Average:27.863967530999993 Perc50:13.925925925925926 Perc90:30.06711409395973 Perc95:31.85682326621924 Perc99:704] Unit:ms Labels:map[Metric:scheduler_e2e_scheduling_duration_seconds Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2]} {Data:map[Average:11915.651577744 Perc50:15168.796680497926 Perc90:19417.759336099585 Perc95:19948.87966804979 Perc99:20373.77593360996] Unit:ms Labels:map[Metric:scheduler_pod_scheduling_duration_seconds Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2]} {Data:map[Average:1.1865832049999983 Perc50:0.7636363636363637 Perc90:2.891903719912473 Perc95:3.066958424507659 Perc99:5.333333333333334] Unit:ms Labels:map[Metric:scheduler_framework_extension_point_duration_seconds Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2 extension_point:Filter]} {Data:map[Average:NaN Perc50:NaN Perc90:NaN Perc95:NaN Perc99:NaN] Unit:ms Labels:map[Metric:scheduler_framework_extension_point_duration_seconds Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2 extension_point:Score]}]}: json: unsupported value: NaN
2022-02-07 08:59:19 +01:00
ahrtr
fe95aa614c io/ioutil has already been deprecated in golang 1.16, so replace all ioutil with io and os 2022-02-03 05:32:12 +08:00
sanposhiho
d8840405e2 Create the namespace for Pods to avoid namespace not-found error logs 2022-01-26 00:39:12 +09:00
sanposhiho
1318f74609 Fix: use Fatalf and list all unused params in one error 2021-09-09 07:34:30 +09:00
sanposhiho
6bf6e424a1 Fix: rename getParams→get 2021-09-09 07:11:36 +09:00
sanposhiho
24643c67d5 Fix: make struct un-exported 2021-09-09 07:10:37 +09:00
sanposhiho
cc846c9d33 Feature: check for unused template parameters 2021-09-02 01:46:34 +09:00
Dave Chen
58ab18bc1e Add the metric data for different extension points
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-08-23 13:43:48 +08:00
Abdullah Gharaibeh
6988653457 Added benchmarks for pod affinity namespaceselector 2021-04-23 14:14:38 -04:00
Kubernetes Prow Robot
6d130d3b97
Merge pull request #100557 from chendave/validation_cleanup
Validate plugin config for KubeSchedulerConfiguration
2021-04-14 18:20:01 -07:00
Dave Chen
c6e65079c7 Validate plugin config for KubeSchedulerConfiguration
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-04-14 09:30:20 +08:00
Nicolas Mitchell
0e994e9481 return error with non-unique workload name in scheduler_perf_test 2021-04-06 10:24:04 -04:00
Nicolas Mitchell
338b06fb69 validate test/workload names in validateTestCases 2021-04-04 14:18:39 -04:00
Kubernetes Prow Robot
c78b5497ae
Merge pull request #99638 from chendave/perf_config
Enable scheduler_perf to support scheduler config file
2021-03-16 14:49:03 -07:00