Commit Graph

309 Commits

Author SHA1 Message Date
Patrick Ohly
aa73f06e56 scheduler perf: allow creating 0 items
It makes sense to define a test where, depending on the parameters, some
operation creates zero pods, namespaces or nodes. The validation didn't allow
that previously because of the way it was implemented, although the underlying
code works fine with zero as count.
2023-04-11 09:59:16 +02:00
Patrick Ohly
49bbf7c268 scheduler_perf: fix race condition
collector.collect got called without ensuring that collector.run had
terminated, so it could have happened that collector.run adds another sample
while collector.collect is reading them.
2023-04-11 09:46:34 +02:00
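The race described in this commit can be illustrated with a minimal sketch. The type sampleCollector, its fields, and the helper runAndCollectMillis are hypothetical stand-ins, not the actual scheduler_perf code; the point is only that the reader waits for the sampling goroutine to return before touching the samples.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sampleCollector is a hypothetical stand-in for the benchmark's collector.
type sampleCollector struct {
	samples []float64
}

// run appends one sample per tick until stop is closed.
func (c *sampleCollector) run(stop <-chan struct{}) {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			c.samples = append(c.samples, 1.0)
		}
	}
}

// collect reads the samples; it is only safe once run has returned.
func (c *sampleCollector) collect() int {
	return len(c.samples)
}

// runAndCollectMillis samples for ms milliseconds, then waits for run
// to terminate before reading the samples.
func runAndCollectMillis(ms int) int {
	c := &sampleCollector{}
	stop := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		c.run(stop)
	}()
	time.Sleep(time.Duration(ms) * time.Millisecond)
	close(stop)
	wg.Wait() // without this wait, collect could overlap with a final append in run
	return c.collect()
}

func main() {
	fmt.Println("samples:", runAndCollectMillis(50))
}
```

Dropping the wg.Wait() call reproduces the kind of unsynchronized access that the Go race detector (go test -race) reports.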
Patrick Ohly
a869a89825 scheduler perf: remove cleanup func
b.Cleanup may as well get called inside the function instead
of leaving that to the caller.
2023-04-11 09:43:45 +02:00
sarab
8d18ae6fc2 Use the generic Set in scheduler 2023-04-09 11:34:17 +05:30
Wei Huang
c9bc2f98d0
fix: remove SchedulingMigratedInTreePVs feature gate in sched perf test 2023-03-08 08:34:44 -08:00
Patrick Ohly
cc4bcd1d8e scheduler_perf: report data items as benchmark results
This replaces the pretty useless us/op metric (useless because it includes
setup and teardown times) with the same values that also get stored in the JSON
file.

The main advantage is that benchstat can be used to analyze and compare
results.
2023-02-28 23:08:23 +01:00
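A rough sketch of how measured values can be surfaced as benchmark results with the standard testing package. The dataItem type, reportItems, and measuredExtra are hypothetical and much simpler than the real scheduler_perf data structures; only testing.B.ReportMetric and testing.Benchmark are real APIs.

```go
package main

import (
	"fmt"
	"testing"
)

// dataItem is a hypothetical, simplified measured result.
type dataItem struct {
	name  string
	unit  string
	value float64
}

// reportItems publishes each item as a custom benchmark metric.
// Metric units must not contain whitespace.
func reportItems(b *testing.B, items []dataItem) {
	for _, it := range items {
		b.ReportMetric(it.value, it.name+"_"+it.unit)
	}
	// Per the testing package documentation, setting ns/op to 0
	// suppresses that built-in metric. Whether the commit does this
	// is an assumption of this sketch.
	b.ReportMetric(0, "ns/op")
}

// measuredExtra runs a benchmark that only reports fixed example values
// and returns the custom metrics recorded in the result.
func measuredExtra() map[string]float64 {
	res := testing.Benchmark(func(b *testing.B) {
		// The real workload (setup, scheduling, teardown) would run here.
		reportItems(b, []dataItem{
			{name: "SchedulingThroughput", unit: "pods/s", value: 35.5},
		})
	})
	return res.Extra
}

func main() {
	fmt.Println(measuredExtra())
}
```

Because the values appear in the standard benchmark output format, two runs can be saved to files and compared with benchstat.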
Patrick Ohly
961129c5f1 scheduler_perf: add logging flags
This enables testing of different real production configurations (JSON
vs. text, different log levels, contextual logging).
2023-02-28 23:08:17 +01:00
Kante Yin
3d0894fabf
Fix failure(context canceled) in scheduler_perf benchmark (#114843)
* Fix failure in scheduler_perf benchmark

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Fatal when error in cleaning up nodes in scheduler perf tests

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Use derived context to better organize the codes

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Change log level to 2 in scheduler perf-test

Signed-off-by: Kante Yin <kerthcet@gmail.com>

---------

Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-01-30 16:21:00 -08:00
Kensei Nakada
e8092cc885 cleanup(scheduler_perf): remove all removed feature gates 2023-01-04 01:07:47 +00:00
Patrick Ohly
2f6c4f5eab e2e: use Ginkgo context
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
2022-12-16 20:14:04 +01:00
Mark Rossetti
534f052a8d
Updating pause image references to 3.9
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2022-11-14 10:24:54 -08:00
kerthcet
d6ffb47832 Replace klog with benchmark log in scheduler_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-09 09:11:55 +08:00
Kubernetes Prow Robot
73f6b96f0a
Merge pull request #113615 from kerthcet/feat/add-benchmark-tests
Add nodeInclusionPolicy benchmark tests to scheduler_perf
2022-11-07 09:18:28 -08:00
kerthcet
bc15aca26d Refactor SchedulerConfigFile
Rename to SchedulerConfigPath and make it a pointer
to be consistent with other fields

Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-05 00:30:34 +08:00
kerthcet
48f2c9ec20 Add benchmark tests for nodeInclusionPolicy
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-05 00:13:43 +08:00
kerthcet
cfc53ee524 Refactor code and annotations for readability
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-01 17:44:45 +08:00
kerthcet
21e8a69a22 Use operationCode instead of string directly
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-01 17:01:22 +08:00
Patrick Ohly
41619ace15 stop using deprecated klog flags
Some scripts and tools still relied on the deprecated flags, the ones
which are about to be removed.

This is intentionally not a complete removal of all those flags in the entire
repo. This would lead to much more code churn also in places where commands
still accept the flags because they use klog directly.
2022-09-04 21:02:43 +02:00
Davanum Srinivas
a9593d634c
Generate and format files
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-07-26 13:14:05 -04:00
Wojciech Tyczyński
5b042f0bf4 Remove RunAnAPIServer from integration tests 2022-07-25 17:52:31 +02:00
Mark Rossetti
40f3e624a6 Switching everything to use pause:3.8
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2022-07-21 14:53:15 -07:00
Kensei Nakada
b0d47cb380
scheduler_perf: allow users to specify default pod and node specs (#101799)
* scheduler_perf: default pod and node spec

* Fix: un-support DefaultNodeTemplatePath
2022-06-29 11:44:07 -07:00
Davanum Srinivas
50bea1dad8
Move from k8s.gcr.io to registry.k8s.io
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-05-31 10:16:53 -04:00
Kubernetes Prow Robot
570f1092f4
Merge pull request #109542 from sanposhiho/fix-test-case-scheduler-perf
scheduler_perf: Remove test cases for Preemption which always fail
2022-05-07 03:33:29 -07:00
Kubernetes Prow Robot
71df3e819b
Merge pull request #109545 from sanposhiho/fix-nun-on-scheduler_perf
Skip adding data to avoid "json: unsupported value: NaN" panic when data is NaN
2022-05-05 11:53:45 -07:00
Kensei Nakada
4af3c5efeb Skip adding data to avoid "json: unsupported value: NaN" panic when data is NaN 2022-05-05 16:11:22 +00:00
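The failure mode behind this commit is easy to reproduce: encoding/json refuses to encode NaN floats. The sketch below uses a hypothetical, trimmed-down dataItem type and helper names; it shows the error and one way to skip affected items before encoding.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// dataItem is a hypothetical, reduced form of a measured data item.
type dataItem struct {
	Data map[string]float64 `json:"data"`
	Unit string             `json:"unit"`
}

// hasNaN reports whether any value in data is NaN.
func hasNaN(data map[string]float64) bool {
	for _, v := range data {
		if math.IsNaN(v) {
			return true
		}
	}
	return false
}

// dropNaNItems returns only the items that can be encoded as JSON.
func dropNaNItems(items []dataItem) []dataItem {
	kept := make([]dataItem, 0, len(items))
	for _, it := range items {
		if !hasNaN(it.Data) {
			kept = append(kept, it)
		}
	}
	return kept
}

// encode marshals the items and returns the JSON text.
func encode(items []dataItem) (string, error) {
	out, err := json.Marshal(items)
	return string(out), err
}

// sampleItems returns one valid item and one item with NaN values.
func sampleItems() []dataItem {
	return []dataItem{
		{Data: map[string]float64{"Average": 35.7, "Perc99": 412}, Unit: "pods/s"},
		{Data: map[string]float64{"Average": math.NaN(), "Perc99": math.NaN()}, Unit: "ms"},
	}
}

func main() {
	_, err := encode(sampleItems())
	fmt.Println("unfiltered:", err)
	out, err := encode(dropNaNItems(sampleItems()))
	fmt.Println("filtered:", out, err)
}
```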
Kubernetes Prow Robot
3bef1692ef
Merge pull request #109696 from Huang-Wei/rm-sched-perf-legacy
Cleanup legacy scheduler perf tests
2022-05-04 02:35:43 -07:00
Kubernetes Prow Robot
629706e0fe
Merge pull request #109546 from sanposhiho/replace-metrics
Replace scheduler_e2e_scheduling_duration_seconds with scheduler_scheduling_attempt_duration_seconds in scheduler_perf
2022-05-04 01:29:22 -07:00
sanposhiho
1c2c20e6bd Change test cases for Preemption to create fewer Pods 2022-05-04 07:47:46 +00:00
Kubernetes Prow Robot
f0cd3725d3
Merge pull request #101835 from sanposhiho/scheduler_perf/feature/op-sleep
scheduler_perf: create sleep operation
2022-05-03 17:17:11 -07:00
Wei Huang
846ebf7814
Cleanup legacy scheduler perf tests 2022-04-27 09:57:17 -07:00
sanposhiho
b7b94b6b39 scheduler_perf: create sleep operation 2022-04-25 23:02:09 +00:00
sanposhiho
6e0da69632 Replace scheduler_e2e_scheduling_duration_seconds with scheduler_scheduling_attempt_duration_seconds in scheduler_perf 2022-04-20 00:48:12 +09:00
Davanum Srinivas
f7ad09c447
Switch to pause 3.7
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-03-29 15:36:38 -04:00
Kubernetes Prow Robot
546e4fa1ef
Merge pull request #107771 from sanposhiho/fix-tiny
make scheduler_perf stable
2022-03-04 17:22:52 -08:00
sanposhiho
4c3a1000c7 fix by gofmt 2022-02-25 00:23:01 +09:00
sanposhiho
1080c2d717 Make scheduler_perf stable 2022-02-24 01:29:38 +09:00
Kubernetes Prow Robot
21c0f6f6ff
Merge pull request #107677 from pohly/scheduler-integration-benchmark
scheduler integration benchmark improvements
2022-02-14 01:23:28 -08:00
Patrick Ohly
e1e84c8e5f scheduler_perf: run with -v=0 by default
This provides a mechanism for overriding the forced increase of the klog
verbosity to 4 when starting the apiserver and uses that for the scheduler_perf
benchmark. Other tests run as before.

A global variable was used because adding an explicit parameter to several
helper functions would have caused a lot of code churn (test ->
integration/util.StartApiserver ->
integration/framework.RunAnAPIServerUsingServer ->
integration/framework.startAPIServerOrDie).
2022-02-11 16:58:33 +01:00
Patrick Ohly
c62d7407c8 scheduler_perf: dump test data when writing it failed
Occasionally, writing as JSON failed because a NaN float couldn't be
encoded. The extended log message helps understand where that comes from, for
example:

F0120 20:24:45.515745  511835 scheduler_perf_test.go:540] BenchmarkPerfScheduling: unable to write measured data {Version:v1 DataItems:[{Data:map[Average:35.714285714285715 Perc50:2 Perc90:36 Perc95:412 Perc99:412] Unit:pods/s Labels:map[Metric:SchedulingThroughput Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2]} {Data:map[Average:27.863967530999993 Perc50:13.925925925925926 Perc90:30.06711409395973 Perc95:31.85682326621924 Perc99:704] Unit:ms Labels:map[Metric:scheduler_e2e_scheduling_duration_seconds Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2]} {Data:map[Average:11915.651577744 Perc50:15168.796680497926 Perc90:19417.759336099585 Perc95:19948.87966804979 Perc99:20373.77593360996] Unit:ms Labels:map[Metric:scheduler_pod_scheduling_duration_seconds Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2]} {Data:map[Average:1.1865832049999983 Perc50:0.7636363636363637 Perc90:2.891903719912473 Perc95:3.066958424507659 Perc99:5.333333333333334] Unit:ms Labels:map[Metric:scheduler_framework_extension_point_duration_seconds Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2 extension_point:Filter]} {Data:map[Average:NaN Perc50:NaN Perc90:NaN Perc95:NaN Perc99:NaN] Unit:ms Labels:map[Metric:scheduler_framework_extension_point_duration_seconds Name:BenchmarkPerfScheduling/PreemptionPVs/500Nodes/namespace-2 extension_point:Score]}]}: json: unsupported value: NaN
2022-02-07 08:59:19 +01:00
Patrick Ohly
8d44b819b3 scheduler_perf: avoid ambiguous test names
"-bench=PerfScheduling/Preemption/500Nodes" ran both the
PerfScheduling/Preemption/500Nodes and the
PerfScheduling/PreemptionPVs/500Nodes benchmark.

This can be avoided by choosing names where none is the prefix of another.
2022-02-07 08:59:19 +01:00
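The ambiguity comes from how go test interprets -bench: the pattern is split on slashes and each element is applied as an unanchored regular expression to the corresponding element of the benchmark name. The function below is a simplified, hypothetical re-implementation of that rule (it ignores slashes inside brackets) meant only to show why one name being a prefix of another causes both to run.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matches approximates go test's -bench matching: every pattern element
// must match the corresponding name element, without implicit anchoring.
func matches(pattern, name string) bool {
	pats := strings.Split(pattern, "/")
	parts := strings.Split(name, "/")
	for i, p := range pats {
		if i >= len(parts) {
			break
		}
		if !regexp.MustCompile(p).MatchString(parts[i]) {
			return false
		}
	}
	return true
}

func main() {
	names := []string{
		"BenchmarkPerfScheduling/Preemption/500Nodes",
		"BenchmarkPerfScheduling/PreemptionPVs/500Nodes",
	}
	for _, pattern := range []string{
		"PerfScheduling/Preemption/500Nodes",
		"PerfScheduling/^Preemption$/500Nodes",
	} {
		for _, name := range names {
			fmt.Printf("%-40s %-50s %v\n", pattern, name, matches(pattern, name))
		}
	}
}
```

Anchoring an element with ^ and $ is a workaround on the command line; choosing names where none is a prefix of another removes the need for it.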
Patrick Ohly
259a8ad0b7 test: allow controlling etcd log level
When running an integration test that measures performance, like for example
test/integration/scheduler_perf, running etcd with debug level output is
undesirable because it creates additional load on the system and isn't
realistic.

The default is still "debug", but ETCD_LOGLEVEL=warn can be used to override
that.
2022-02-07 08:59:19 +01:00
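A minimal sketch of the override described here, assuming the level is read from the environment with "debug" as the fallback. The function name etcdLogLevel and the treatment of an empty value as "unset" are assumptions of this sketch, not necessarily what the test framework does.

```go
package main

import (
	"fmt"
	"os"
)

// etcdLogLevel returns the log level to pass to etcd. getenv is injected
// so the lookup can be exercised without touching the real environment.
func etcdLogLevel(getenv func(string) string) string {
	if level := getenv("ETCD_LOGLEVEL"); level != "" {
		return level
	}
	return "debug" // default stated in the commit message
}

func main() {
	fmt.Println("etcd log level:", etcdLogLevel(os.Getenv))
}
```

With this shape, exporting ETCD_LOGLEVEL=warn before running a performance test lowers etcd's output without changing the default for other tests.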
ahrtr
fe95aa614c io/ioutil has already been deprecated in golang 1.16, so replace all ioutil with io and os 2022-02-03 05:32:12 +08:00
sanposhiho
d8840405e2 Create namespace for Pod not to occur error log of namespace not-found 2022-01-26 00:39:12 +09:00
Kubernetes Prow Robot
ca4af7a981
Merge pull request #104716 from sanposhiho/feature/scheduler_perf/unused-template-params
test/integration/scheduler_perf: check for unused template parameters
2022-01-10 16:21:16 -08:00
Davanum Srinivas
9405e9b55e
Check in OWNERS modified by update-yamlfmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-09 21:31:26 -05:00
Hanna Lee
07a883d8e6 Remove //lint:ignore pragmas that aren't being used anymore 2021-11-17 08:56:54 +01:00
Hanna Lee
c8fde197f5 Add more //nolint:staticcheck for failures caught in PR tests 2021-11-17 08:56:02 +01:00
kerthcet
75a255d2ed remove scheduler component config v1beta1
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-09-28 13:13:17 +08:00
Kubernetes Prow Robot
86d23cf441
Merge pull request #105206 from pohly/test-integration-help
test/integration: skip etcd startup for -help flag
2021-09-24 10:29:23 -07:00
Patrick Ohly
81b4a695b3 test/integration: skip etcd startup for -help flag
By parsing flags in the test's main function before starting etcd we bail out
early without ever starting etcd when the test was invoked with -help.

Otherwise etcd must be available, gets started and then hangs because
flag.Parse itself exits when called by testing.go. This bypasses the code in
EtcdMain which normally stops etcd.
2021-09-24 11:51:58 +02:00
Kubernetes Prow Robot
857d4c107c
Merge pull request #104808 from chendave/indent
Format json file with proper indentation
2021-09-21 19:14:00 -07:00
sanposhiho
1318f74609 Fix: use Fatalf and list all unused params in one error 2021-09-09 07:34:30 +09:00
sanposhiho
6bf6e424a1 Fix: rename getParams→get 2021-09-09 07:11:36 +09:00
sanposhiho
24643c67d5 Fix: make struct un-exported 2021-09-09 07:10:37 +09:00
Dave Chen
dda8090037 Format json file with proper indentation
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-09-07 16:14:34 +08:00
sanposhiho
cc846c9d33 Feature: check for unused template parameters 2021-09-02 01:46:34 +09:00
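One way such a check can work is to record every parameter lookup and report whatever was never read. The params type and its methods below are hypothetical and only loosely modeled on the commit titles in this series (an unexported struct with a get method, and a single error that lists all unused parameters).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// params wraps workload parameters and remembers which ones were read.
type params struct {
	values map[string]int
	used   map[string]bool
}

func newParams(values map[string]int) *params {
	return &params{values: values, used: map[string]bool{}}
}

// get returns the value for key and marks the key as used.
func (p *params) get(key string) (int, error) {
	v, ok := p.values[key]
	if !ok {
		return 0, fmt.Errorf("parameter %q is undefined", key)
	}
	p.used[key] = true
	return v, nil
}

// unused returns the sorted names of parameters that were never read.
func (p *params) unused() []string {
	var names []string
	for key := range p.values {
		if !p.used[key] {
			names = append(names, key)
		}
	}
	sort.Strings(names)
	return names
}

// check returns one error naming every unused parameter, or nil.
func (p *params) check() error {
	if names := p.unused(); len(names) > 0 {
		return fmt.Errorf("unused parameters: %s", strings.Join(names, ", "))
	}
	return nil
}

func main() {
	p := newParams(map[string]int{"initNodes": 500, "initPods": 100, "measurePod": 10})
	p.get("initNodes")
	p.get("initPods")
	fmt.Println(p.check())
}
```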
Claudiu Belu
18936d4785 updates pause image references
The pause:3.6 image has been published.

Also updates older / incorrect references.
2021-08-29 21:50:05 -07:00
Dave Chen
63b4710f38 Don't expose struct from prometheus client library 2021-08-27 22:21:24 +08:00
Dave Chen
58ab18bc1e Add the metric data for different extension points
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-08-23 13:43:48 +08:00
Kubernetes Prow Robot
4ab9c950d9
Merge pull request #102007 from vaibhav2107/perf-config
Update the typo in values of pods in performance-config.yaml
2021-08-12 13:59:50 -07:00
Wei Huang
55765f1b49
sched: support HistogramVec in scheduler performance test 2021-07-26 20:27:37 -07:00
Mengjiao Liu
4eab19ae7d Clean up the master term in test/integration comments 2021-06-18 16:31:05 +08:00
Sascha Grunert
b167fc24d7
Update pause image to v3.5
Update dependencies and the test images to use pause 3.5. We also
provide a changelog entry for the new container image version.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-05-25 09:04:46 +02:00
vaibhav
a1e56b4f6d Update the typo in values of pods in performance-config.yaml 2021-05-14 17:16:48 +05:30
Abdullah Gharaibeh
6988653457 Added benchmarks for pod affinity namespaceselector 2021-04-23 14:14:38 -04:00
Kubernetes Prow Robot
6d130d3b97
Merge pull request #100557 from chendave/validation_cleanup
Validate plugin config for KubeSchedulerConfiguration
2021-04-14 18:20:01 -07:00
Dave Chen
c6e65079c7 Validate plugin config for KubeSchedulerConfiguration
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-04-14 09:30:20 +08:00
Kubernetes Prow Robot
ed3e0d302f
Merge pull request #100644 from Huang-Wei/sched-fwk-config
Surface kube config in scheduler framework handle
2021-04-12 19:12:49 -07:00
Nicolas Mitchell
0e994e9481 return error with non-unique workload name in scheduler_perf_test 2021-04-06 10:24:04 -04:00
Nicolas Mitchell
338b06fb69 validate test/workload names in validateTestCases 2021-04-04 14:18:39 -04:00
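A sketch of name validation in this spirit. The testCase and workload types, the function validateNames, and the choice to require workload names to be unique within a test case (rather than globally) are assumptions for illustration.

```go
package main

import (
	"fmt"
)

// workload and testCase are hypothetical, reduced forms of the config types.
type workload struct {
	Name string
}

type testCase struct {
	Name      string
	Workloads []workload
}

// validateNames rejects empty names, duplicate test case names, and
// duplicate workload names inside one test case.
func validateNames(testCases []testCase) error {
	seenTests := map[string]bool{}
	for _, tc := range testCases {
		if tc.Name == "" {
			return fmt.Errorf("test case without a name")
		}
		if seenTests[tc.Name] {
			return fmt.Errorf("test case name %q is not unique", tc.Name)
		}
		seenTests[tc.Name] = true

		seenWorkloads := map[string]bool{}
		for _, w := range tc.Workloads {
			if w.Name == "" {
				return fmt.Errorf("%s: workload without a name", tc.Name)
			}
			if seenWorkloads[w.Name] {
				return fmt.Errorf("%s: workload name %q is not unique", tc.Name, w.Name)
			}
			seenWorkloads[w.Name] = true
		}
	}
	return nil
}

func main() {
	err := validateNames([]testCase{
		{Name: "SchedulingBasic", Workloads: []workload{{Name: "500Nodes"}, {Name: "500Nodes"}}},
	})
	fmt.Println(err)
}
```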
Wei Huang
e7f67b1a63
Surface kube config in scheduler framework handle 2021-03-30 11:54:59 -07:00
Kubernetes Prow Robot
c78b5497ae
Merge pull request #99638 from chendave/perf_config
Enable scheduler_perf to support scheduler config file
2021-03-16 14:49:03 -07:00
Dave Chen
d50c0aeb5f Enable scheduler_perf to support scheduler config file
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-03-16 23:13:40 +08:00
Wei Huang
68ff3168b8
sched: fix a bug that literal 'p99' is mapped to 95th-percentile 2021-03-12 12:03:12 -08:00
Wei Huang
b93b4a2c96
sched: fix a bug that metrics of init or collected pods are re-collected 2021-03-11 10:28:39 -08:00
Kubernetes Prow Robot
823fa75643
Merge pull request #98900 from Huang-Wei/churn-cluster-op
Introduce a churnOp to scheduler perf testing framework
2021-03-11 02:00:24 -08:00
Kubernetes Prow Robot
23af91b293
Merge pull request #97779 from tiloso/staticcheck-test-integration-gs
Fix staticcheck in test/integration/{garbagecollector,scheduler_perf}
2021-03-10 16:04:23 -08:00
Kubernetes Prow Robot
841cb4adc4
Merge pull request #99844 from minbaev/scheduler-test-perf-optimization
add if check for number of scheduled pods to be greater than 0
2021-03-08 20:47:37 -08:00
David Eads
b8194cf77c switch most e2e tests to storage/v1 over v1beta1 2021-03-08 13:04:24 -05:00
Alexander Minbaev
359116f525 add if check for number of scheduled pods to be greater than 0 2021-03-05 09:05:42 -06:00
Kubernetes Prow Robot
0d4924e371
Merge pull request #99439 from minbaev/fix-typos
Fix typo in util.go
2021-03-04 11:00:22 -08:00
Wei Huang
1e5878b910
Introduce a churnOp to scheduler perf testing framework
- support two modes: recreate and create
- use DynamicClient to create API objects
2021-03-03 06:51:53 -08:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
Alexander Minbaev
5b73122105 Fix typo in util.go 2021-02-25 00:54:37 -06:00
Wei Huang
983272ce6a
sched: create dataItemsDir during a performance test if not exist 2021-02-17 12:44:16 -08:00
tiloso
e1ceac0783 Fix staticcheck in test/integration/{scheduler_perf,garbagecollector} 2021-02-10 10:55:09 +01:00
Kubernetes Prow Robot
2b7c61b1bb
Merge pull request #98205 from pacoxu/build/pauses
update pause image to 3.4.1 and also update the change log
2021-02-08 18:20:58 -08:00
pacoxu
0c152cbbbe update pause to 3.4.1 for tests(e2e)
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-02-05 21:32:53 +08:00
Adhityaa Chandrasekar
b5808c6df9 scheduler_perf: remove implicit barrier at the end
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
2021-02-03 12:49:28 +00:00
Kubernetes Prow Robot
6d43e2b3bb
Merge pull request #96834 from chendave/fix_race
Add performance benchmark for the preemption with volume
2020-12-15 07:13:49 -08:00
Dave Chen
ebcca92771 Add performance benchmark for the preemption with volume
This will help to reveal the potential issues when the
volume is in place.

Signed-off-by: Dave Chen <dave.chen@arm.com>
2020-12-15 10:54:01 +08:00
Kubernetes Prow Robot
bd4d197b52
Merge pull request #96447 from chendave/bind_postfilter
Remove the deprecated metrics from scheduler
2020-12-14 06:31:28 -08:00
Dave Chen
5144e2ec78 Remove the deprecated metrics from scheduler
Deprecated metrics are removed; the suggested replacement is the Histogram
metrics obtained from the scheduler extension points.

Signed-off-by: Dave Chen <dave.chen@arm.com>
Co-authored-by: wawa0210 <xiaozhang0210@hotmail.com>
2020-12-14 11:31:50 +08:00
Kubernetes Prow Robot
65d57211e3
Merge pull request #97068 from chendave/selectors
Add constraint selector to pod template
2020-12-08 22:01:19 -08:00
Dave Chen
58142288a5 Add constraint selector to pod template
PodTopologySpread plugin will only count the existing pod when that
pod's label matches with `constraint.Selector`, which means all pods
could be scheduled to one topology zone when the constraint does not
have any selector defined.

Signed-off-by: Dave Chen <dave.chen@arm.com>
2020-12-04 18:04:26 +08:00
Tim Hockin
4068402459 Change trivial topology labels
In these cases the actual label key is incidental.
2020-11-12 11:21:37 -08:00
Wei Huang
267acdbe81
Update docs and fix redundant logic of scheduler perf 2020-11-06 23:45:09 -08:00
Tim Hockin
819ff9b087
Use topology labels instead of old beta names (#96033)
* Rename const for topology.../zone

* Rename const for topology.../region

* Rename const for failure-domain.../zone

* Rename const for failure-domain.../region

* Restore old names for compat
2020-11-05 20:26:50 -08:00
tangwz
518c502f54 scheduler_perf: use time.Ticker in throughput measurement 2020-09-19 09:36:17 +08:00