
85 Commits

Author SHA1 Message Date
Kensei Nakada
5310abe14a make scheduler_perf usable from other repositories 2023-12-01 12:43:08 +00:00
Kubernetes Prow Robot
5cc92713d1
Merge pull request #120335 from pohly/scheduler-perf-pod-name
scheduler_perf: show name of one pending pod in log
2023-10-02 06:32:45 -07:00
Junhao Zou
43c05e98ca
cleanup: Replace the deprecated NewMemCacheClient with memory.NewMemCacheClient 2023-09-08 11:57:46 +08:00
Patrick Ohly
c74d045c4b scheduler_perf: show name of one pending pod in error message
If pods get stuck, then giving the name of one makes it possible
to search for it in the log output. Without the name it's hard
to figure out which pods got stuck.
2023-09-04 09:54:26 +02:00
SataQiu
5524f1651a using wait.PollUntilContextTimeout instead of deprecated wait.Poll/PollWithContext/PollImmediate/PollImmediateWithContext methods for scheduler 2023-08-24 18:35:59 +08:00
Heba Elayoty
224087abfa
Add Pod Scheduling SLI Duration metric (#119049)
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-08-15 15:17:41 -07:00
Patrick Ohly
29e5771aa4 scheduler-perf: shorten "Name" label in metrics
Because the JSON file gets written at the end of the top-level benchmark, all
data items had `BenchmarkPerfScheduling/` as prefix in the `Name` label. This
is redundant and makes it harder to see the actual name. Now that common prefix
gets removed.
2023-07-03 21:15:16 +02:00
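A minimal sketch of the prefix removal described in the commit above, not the actual scheduler_perf code. The `dataItem` type, the `shortenNames` helper, and the example workload name are hypothetical; only the `Name` label and the `BenchmarkPerfScheduling/` prefix come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// dataItem is a hypothetical stand-in for one entry of the JSON result file.
type dataItem struct {
	Labels map[string]string
}

// shortenNames strips the common prefix from the "Name" label of every item.
// Items without a "Name" label are left untouched.
func shortenNames(items []dataItem, prefix string) {
	for _, item := range items {
		if name, ok := item.Labels["Name"]; ok {
			item.Labels["Name"] = strings.TrimPrefix(name, prefix)
		}
	}
}

func main() {
	// The workload name below is illustrative only.
	items := []dataItem{
		{Labels: map[string]string{"Name": "BenchmarkPerfScheduling/SchedulingBasic/500Nodes"}},
	}
	shortenNames(items, "BenchmarkPerfScheduling/")
	fmt.Println(items[0].Labels["Name"]) // prints "SchedulingBasic/500Nodes"
}
```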
Patrick Ohly
0d41d509d2 scheduler_perf: replace gomega.Eventually with wait.PollUntilContextTimeout
This is done for the sake of consistency. The failure message becomes less
useful.
2023-06-28 09:22:26 +02:00
Patrick Ohly
cecebe8ea2 scheduler_perf: add TestScheduling integration test
This runs workloads that are labeled as "integration-test". The apiserver and
scheduler are only started once per unique configuration, followed by each
workload using that configuration. This makes execution faster. In contrast to
benchmarking, we care less about starting with a clean slate for each test.
2023-06-28 09:22:25 +02:00
Patrick Ohly
dfd646e0a8 scheduler_perf: fix namespace deletion
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects,
  so we should run it to be on the safe side.
- Pods must be force-deleted because kubelet is not running.
- Finally, the namespace controller is needed to get rid of
  deleted namespaces.
2023-06-28 09:22:25 +02:00
Patrick Ohly
d9c16a1ced scheduler_perf: fix goroutine leak in runWorkload
This becomes relevant when doing more fine-grained leak checking.
2023-06-28 08:14:34 +02:00
Patrick Ohly
c91c578795 scheduler_perf: skip expensive cleanup during benchmarks
Each benchmark test case runs with a fresh etcd instance. Therefore it is not
necessary to delete objects after a run.

A future unit test might reuse etcd, therefore cleanup is optional.
2023-06-22 08:56:14 +02:00
Kubernetes Prow Robot
2057a48ee5
Merge pull request #114771 from sanposhiho/scheduling_perf_scheduler_scheduling_attempt_duration_seconds
feature(scheduler_perf): distinguish result in scheduler_scheduling_attempt_duration_seconds metric result
2023-06-07 06:18:13 -07:00
Kensei Nakada
a4ea058cc7 feature(scheduler_perf): distinguish result in scheduler_scheduling_attempt_duration_seconds metric result 2023-06-02 14:45:55 +00:00
Patrick Ohly
2db577a560 scheduler-perf: inject "benchmark" as name into JSON result filename
This is required because an empty name is no longer supported: the
perf-dashboard is run with --allow-parsers-matching-all-tests=false, which causes
perfdash to skip the current configuration for BenchmarkPerfResults as it does not
have a name set (4674704f45/perfdash/metrics-downloader.go (L165-L167)).

The perf-dash config needs to be updated accordingly.
2023-05-22 08:07:15 +02:00
Kubernetes Prow Robot
f84ff3d052
Merge pull request #117813 from pohly/scheduler-perf-test-runtime
scheduler-perf: measure workload runtime and relabel workloads
2023-05-15 12:19:18 -07:00
Patrick Ohly
d85b91f343 scheduler-perf: measure workload runtime and relabel workloads
The goal is to only label workloads as "performance" which actually run long
enough to provide useful metrics. The throughput collector samples once per
second, so a workload should run at least 5, better 10 seconds to get at least
a minimal amount of samples for the percentile calculation.

For benchstat analysis of runs with sufficient repetitions to get statistically
meaningful results, each workload shouldn't run more than one minute, otherwise
before/after analysis becomes too slow.

The labels were chosen based on benchmark runs on a reasonably fast desktop. To
know how long each workload takes, a new "runtime_seconds" benchmark result
gets added.
2023-05-15 14:33:40 +02:00
Kubernetes Prow Robot
8b33eaa0a7
Merge pull request #116207 from pohly/dra-scheduler-perf
scheduler_perf: dynamic resource allocation test cases
2023-05-10 10:58:59 -07:00
Patrick Ohly
034528a9f0 scheduler perf: add DynamicResourceAllocation test cases
The default scheduler configuration must be based on the v1 API where the
plugin is enabled by default. Then if (and only if) the
DynamicResourceAllocation feature gate for a test is set, the corresponding
API group also gets enabled.

The normal dynamic resource claim controller is started if needed to create
ResourceClaims from ResourceClaimTemplates.

Without the upcoming optimizations in the scheduler, scheduling with dynamic
resources is fairly slow. The new test cases take around 15 minutes wall clock
time on my desktop.
2023-05-04 13:08:06 +02:00
Kubernetes Prow Robot
fb93000eb5
Merge pull request #117468 from HirazawaUi/replace-test-deprecated-ioutil
Replace the deprecated ioutil methods in the test directory
2023-05-03 12:02:32 -07:00
Kubernetes Prow Robot
aece6838e8
Merge pull request #117232 from pohly/scheduler-perf-code-cleanups
scheduler_perf: code cleanups
2023-05-03 09:54:13 -07:00
Kubernetes Prow Robot
b4c6a70927
Merge pull request #117230 from pohly/scheduler-perf-throughput
scheduler_perf: update throughputCollector
2023-04-29 12:12:17 -07:00
Patrick Ohly
b3e0bc8864 scheduler_perf: let the test decide which informers are needed
This will change when adding dynamic resource allocation test cases. Instead of
changing mustSetupScheduler and StartScheduler for that, let's return the
informer factory and create informers as needed in the test.
2023-04-27 15:31:40 +02:00
Patrick Ohly
969d28b12b scheduler_perf: refactor common code 2023-04-27 15:31:37 +02:00
Kubernetes Prow Robot
dd62a53e1a
Merge pull request #117196 from pohly/scheduler-perf-labels
scheduler_perf: support test case selection via labels
2023-04-26 14:26:14 -07:00
Patrick Ohly
550d4c0074 scheduler_perf: support test case selection via labels
Entire test cases and workloads can have labels attached to them. The union of
these must match the label filter which works as in GitHub. The benchmark by
default runs the tests that are labeled "performance", which is the same as
before.
2023-04-26 21:01:31 +02:00
Patrick Ohly
78b8af9fed scheduler_perf: update throughputCollector
The previous solution had some shortcomings:

- It was based on the assumption that the goroutine gets woken up at regular
  intervals. This is not actually guaranteed. Now the code keeps track of the
  actual start and end of an interval and verifies that assumption.

- If no pod was scheduled (unlikely, but could happen), then
  "0 pods/s" got recorded. In such a case, the metric was always either
  zero or >= 1. A better solution is to extend the interval
  until some pod gets scheduled. With the larger time interval
  it is then possible to also track, for example, 0.5 pods/s.
2023-04-26 08:11:50 +02:00
HirazawaUi
a8b808ee6c Replace the deprecated ioutil methods in the test directory 2023-04-18 21:51:10 +08:00
Kubernetes Prow Robot
aa026a6b30
Merge pull request #117202 from pohly/scheduler-perf-zero-count
scheduler perf: allow creating 0 items
2023-04-11 21:18:20 -07:00
Kubernetes Prow Robot
69b59b9d42
Merge pull request #117199 from pohly/scheduler-perf-race-fix
scheduler_perf: fix race condition
2023-04-11 21:18:05 -07:00
Patrick Ohly
aa73f06e56 scheduler perf: allow creating 0 items
It makes sense to define a test where, depending on the parameters, some
operation creates zero pods, namespaces or nodes. The validation didn't allow
that previously due to the way it was implemented, although the underlying
code works fine with zero as count.
2023-04-11 09:59:16 +02:00
Patrick Ohly
49bbf7c268 scheduler_perf: fix race condition
collector.collect got called without ensuring that collector.run had
terminated, so it could have happened that collector.run adds another sample
while collector.collect is reading them.
2023-04-11 09:46:34 +02:00
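A minimal sketch of the fix pattern described in the commit above, assuming a simplified collector. The `collector` type here and the `collectSafely` helper are hypothetical and only mirror the `run`/`collect` names from the message. The point is that the caller cancels the context and then waits for `run` to return before reading the samples.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// collector is a hypothetical, simplified sample collector.
type collector struct {
	samples []int
}

// run appends incoming values until the context is cancelled.
func (c *collector) run(ctx context.Context, ticks <-chan int) {
	for {
		select {
		case <-ctx.Done():
			return
		case v := <-ticks:
			c.samples = append(c.samples, v)
		}
	}
}

// collect returns a copy of the samples. It must only be called after run
// has terminated, because samples is not protected by a lock.
func (c *collector) collect() []int {
	return append([]int(nil), c.samples...)
}

// collectSafely feeds values to a collector and reads the result only after
// the run goroutine has finished.
func collectSafely(values []int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	ticks := make(chan int) // unbuffered: each send is received by run
	c := &collector{}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		c.run(ctx, ticks)
	}()

	for _, v := range values {
		ticks <- v
	}
	cancel()
	wg.Wait() // without this, collect could race with a pending append
	return c.collect()
}

func main() {
	fmt.Println(collectSafely([]int{1, 2, 3})) // prints [1 2 3]
}
```

Running such code with `go run -race` is one way to check that the wait removes the data race.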
Patrick Ohly
a869a89825 scheduler perf: remove cleanup func
b.Cleanup may as well get called inside the function instead
of leaving that to the caller.
2023-04-11 09:43:45 +02:00
Patrick Ohly
cc4bcd1d8e scheduler_perf: report data items as benchmark results
This replaces the pretty useless us/op metric (useless because it includes
setup and teardown times) with the same values that also get stored in the JSON
file.

The main advantage is that benchstat can be used to analyze and compare
results.
2023-02-28 23:08:23 +01:00
Patrick Ohly
961129c5f1 scheduler_perf: add logging flags
This enables testing of different real production configurations (JSON
vs. text, different log levels, contextual logging).
2023-02-28 23:08:17 +01:00
Kante Yin
3d0894fabf
Fix failure(context canceled) in scheduler_perf benchmark (#114843)
* Fix failure in scheduler_perf benchmark

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Fatal when error in cleaning up nodes in scheduler perf tests

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Use derived context to better organize the code

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Change log level to 2 in scheduler perf-test

Signed-off-by: Kante Yin <kerthcet@gmail.com>

---------

Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-01-30 16:21:00 -08:00
Patrick Ohly
2f6c4f5eab e2e: use Ginkgo context
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
2022-12-16 20:14:04 +01:00
kerthcet
d6ffb47832 Replace klog with benchmark log in scheduler_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-09 09:11:55 +08:00
Kubernetes Prow Robot
73f6b96f0a
Merge pull request #113615 from kerthcet/feat/add-benchmark-tests
Add nodeInclusionPolicy benchmark tests to scheduler_perf
2022-11-07 09:18:28 -08:00
kerthcet
bc15aca26d Refactor SchedulerConfigFile
Rename to SchedulerConfigPath and make it a pointer
to be consistent with other fields

Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-05 00:30:34 +08:00
kerthcet
48f2c9ec20 Add benchmark tests for nodeInclusionPolicy
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-05 00:13:43 +08:00
kerthcet
cfc53ee524 Refactor code and annotations for readability
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-01 17:44:45 +08:00
kerthcet
21e8a69a22 Use operationCode instead of string directly
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-01 17:01:22 +08:00
Davanum Srinivas
a9593d634c
Generate and format files
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-07-26 13:14:05 -04:00
Wojciech Tyczyński
5b042f0bf4 Remove RunAnAPIServer from integration tests 2022-07-25 17:52:31 +02:00
Kensei Nakada
b0d47cb380
scheduler_perf: allow users to specify default pod and node specs (#101799)
* scheduler_perf: default pod and node spec

* Fix: un-support DefaultNodeTemplatePath
2022-06-29 11:44:07 -07:00
Kubernetes Prow Robot
629706e0fe
Merge pull request #109546 from sanposhiho/replace-metrics
Replace scheduler_e2e_scheduling_duration_seconds with scheduler_scheduling_attempt_duration_seconds in scheduler_perf
2022-05-04 01:29:22 -07:00
Kubernetes Prow Robot
f0cd3725d3
Merge pull request #101835 from sanposhiho/scheduler_perf/feature/op-sleep
scheduler_perf: create sleep operation
2022-05-03 17:17:11 -07:00
sanposhiho
b7b94b6b39 scheduler_perf: create sleep operation 2022-04-25 23:02:09 +00:00
sanposhiho
6e0da69632 Replace scheduler_e2e_scheduling_duration_seconds with scheduler_scheduling_attempt_duration_seconds in scheduler_perf 2022-04-20 00:48:12 +09:00