With a dynamic client and a REST mapper it is possible to load arbitrary YAML
files and create the objects defined in them. This is simpler than adding
specific Go code for each supported type.
Because the version now matters, the incorrect versions in the DRA YAMLs were
found and fixed.
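
A minimal sketch of that approach, assuming a dynamic client and a RESTMapper
are already available (the helper name createFromYAML and the exact wiring are
illustrative, not the actual test code):

    package loadutil // illustrative package name

    import (
        "context"
        "fmt"

        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "k8s.io/apimachinery/pkg/runtime/serializer/yaml"
        "k8s.io/client-go/dynamic"
    )

    // createFromYAML decodes one YAML document and creates the object defined by it.
    // The apiVersion in the file must be correct because it determines the REST mapping.
    func createFromYAML(ctx context.Context, client dynamic.Interface, mapper meta.RESTMapper, data []byte) error {
        obj := &unstructured.Unstructured{}
        _, gvk, err := yaml.NewDecodingSerializer(unstructured.UnstructuredJSONScheme).Decode(data, nil, obj)
        if err != nil {
            return fmt.Errorf("decode YAML: %v", err)
        }
        mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
        if err != nil {
            return fmt.Errorf("map %s to a resource: %v", gvk, err)
        }
        // Namespaced and cluster-scoped resources need different clients.
        var resource dynamic.ResourceInterface = client.Resource(mapping.Resource)
        if mapping.Scope.Name() == meta.RESTScopeNameNamespace {
            resource = client.Resource(mapping.Resource).Namespace(obj.GetNamespace())
        }
        _, err = resource.Create(ctx, obj, metav1.CreateOptions{})
        return err
    }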
ktesting.TContext combines several different interfaces. This makes the code
simpler because fewer parameters need to be passed around.
An intentional side effect is that the apiextensions client interface becomes
available, which makes it possible to use CRDs. This will be needed for future
DRA tests.
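
Conceptually, such a combined interface looks roughly like this (an
illustration of the idea, not the exact ktesting API):

    import (
        "context"
        "testing"

        apiextensionsclientset "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
        "k8s.io/apimachinery/pkg/api/meta"
        "k8s.io/client-go/dynamic"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    // A TContext-like interface: one parameter instead of a testing.TB, a
    // context.Context, and several client interfaces.
    type testContext interface {
        context.Context
        testing.TB

        RESTConfig() *rest.Config
        RESTMapper() meta.RESTMapper
        Client() kubernetes.Interface
        Dynamic() dynamic.Interface
        APIExtensions() apiextensionsclientset.Interface
    }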
Support for CRDs depends on starting the apiserver via
k8s.io/kubernetes/cmd/kube-apiserver/app/testing because only that enables the
CRD extensions. As discussed on Slack, the long-term goal is to replace the
in-tree StartTestServer with the one in staging, so this is going in the right
direction.
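
For reference, starting the apiserver that way looks roughly like this (flags
and options vary per test; this is a sketch):

    import (
        "testing"

        kubeapiservertesting "k8s.io/kubernetes/cmd/kube-apiserver/app/testing"
        "k8s.io/kubernetes/test/integration/framework"
    )

    func startAPIServer(t *testing.T) *kubeapiservertesting.TestServer {
        // StartTestServerOrDie brings up a kube-apiserver instance that includes
        // the apiextensions API group, which is what makes CRDs usable in tests.
        server := kubeapiservertesting.StartTestServerOrDie(t, nil, nil, framework.SharedEtcd())
        t.Cleanup(server.TearDownFn)
        return server
    }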
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
`leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
command line flag overrides the config, but only when set explicitly
(see the sketch after this list).
This makes it possible to pre-define complete driver setups where the
name is associated with certain resource availability. This will be
used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
via node labels. This will be used for testing cluster autoscaling.
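
The "only when set explicitly" part can be implemented by checking whether the
flag was actually changed on the command line, for example with pflag (a
sketch; the flag and function names are made up):

    import "github.com/spf13/pflag"

    // effectiveDriverName lets an explicitly set --driver-name flag win over the
    // name from the resource config file; a flag left at its default does not
    // override the config.
    func effectiveDriverName(flags *pflag.FlagSet, flagValue, configValue string) string {
        if flags.Changed("driver-name") {
            return flagValue
        }
        if configValue != "" {
            return configValue
        }
        return flagValue // fall back to the flag's default value
    }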
If pods get stuck, then giving the name of one makes it possible
to search for it in the log output. Without the name it's hard
to figure out which pods got stuck.
This helps when using -feature-gate=ContextualLogging=true and running the
SchedulingWithMultipleResourceClaims test case because then output from the two
driver instances is easy to distinguish.
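
With contextual logging, each driver instance can get its own named logger,
roughly like this (illustrative, not the actual test code):

    import (
        "context"

        "k8s.io/klog/v2"
    )

    // newDriverContext returns a context whose logger carries the driver name,
    // so all log output from that driver instance is easy to attribute.
    func newDriverContext(ctx context.Context, driverName string) context.Context {
        logger := klog.LoggerWithName(klog.FromContext(ctx), driverName)
        return klog.NewContext(ctx, logger)
    }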
The new test case covers pods with multiple claims from multiple drivers. This
leads to different behavior (scheduler waits for information from all drivers
instead of optimistically selecting one node right away) and to more concurrent
updates of the PodSchedulingContext objects.
The test case is currently not enabled for unit testing or integration
testing. It can be used manually with:
-bench=BenchmarkPerfScheduling/SchedulingWithMultipleResourceClaims/2000pods_100nodes
... -perf-scheduling-label-filter=
perfdash expects all data items to have the same set of labels. It then
renders a drop-down button for each label, listing all values found for that
label. Previously, data items that lacked a label didn't match any label
filter in perfdash and couldn't get selected, because perfdash doesn't have
"unset" in its drop-down menus.
To avoid that, scheduler-perf now collects all labels and then adds missing
labels with "not applicable" as value:
    {
      "data": {
        "Average": 939.7071223010004,
        "Perc50": 927.7987421383649,
        "Perc90": 2166.153846153846,
        "Perc95": 2363.076923076923,
        "Perc99": 2520.6153846153848
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_pod_scheduling_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "not applicable",
        "result": "not applicable"
      }
    },
    ...
    {
      "data": {
        "Average": 1.1172570650000004,
        "Perc50": 1.1418367346938776,
        "Perc90": 1.5500000000000003,
        "Perc95": 1.6410256410256412,
        "Perc99": 3.7333333333333334
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_framework_extension_point_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "Score",
        "result": "not applicable"
      }
    },
Because the JSON file gets written at the end of the top-level benchmark, all
data items had `BenchmarkPerfScheduling/` as prefix in the `Name` label. This
is redundant and makes it harder to see the actual name. Now that common prefix
gets removed.
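
A sketch of both post-processing steps (the dataItem type mirrors the JSON
above; names are illustrative, not the exact scheduler-perf code):

    import "strings"

    type dataItem struct {
        Data   map[string]float64 `json:"data"`
        Unit   string             `json:"unit"`
        Labels map[string]string  `json:"labels"`
    }

    // normalizeLabels gives every data item the same set of label keys, filling
    // missing ones with "not applicable", and strips the redundant benchmark
    // prefix (e.g. "BenchmarkPerfScheduling/") from the Name label.
    func normalizeLabels(items []dataItem, prefix string) {
        // Collect the union of all label keys.
        allKeys := map[string]bool{}
        for _, item := range items {
            for key := range item.Labels {
                allKeys[key] = true
            }
        }
        for i := range items {
            if items[i].Labels == nil {
                items[i].Labels = map[string]string{}
            }
            for key := range allKeys {
                if _, ok := items[i].Labels[key]; !ok {
                    items[i].Labels[key] = "not applicable"
                }
            }
            items[i].Labels["Name"] = strings.TrimPrefix(items[i].Labels["Name"], prefix)
        }
    }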
This runs workloads that are labeled as "integration-test". The apiserver and
scheduler are only started once per unique configuration, followed by each
workload using that configuration. This makes execution faster. In contrast to
benchmarking, we care less about starting with a clean slate for each test.
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects,
so we should run it to be on the safe side.
- Pods must be force-deleted because kubelet is not running (see the sketch
below).
- Finally, the namespace controller is needed to get rid of
deleted namespaces.
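
A sketch of the force-deletion step mentioned above (assuming a typed
clientset and the workload's namespace):

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // forceDeletePods removes all pods in the namespace immediately. Without a
    // kubelet nothing would confirm graceful termination, so the grace period
    // has to be zero.
    func forceDeletePods(ctx context.Context, client kubernetes.Interface, namespace string) error {
        gracePeriod := int64(0)
        return client.CoreV1().Pods(namespace).DeleteCollection(ctx,
            metav1.DeleteOptions{GracePeriodSeconds: &gracePeriod},
            metav1.ListOptions{})
    }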
Each benchmark test case runs with a fresh etcd instance, so it is not
necessary to delete objects after a run. A future unit test might reuse etcd,
which is why cleanup is optional.
This is required because an empty name is no longer supported: the
perf-dashboard is run with --allow-parsers-matching-all-tests=false, which
causes perfdash to skip the current configuration for BenchmarkPerfResults
because it does not have a name
set (4674704f45/perfdash/metrics-downloader.go (L165-L167)).
The perf-dash config needs to be updated accordingly.
The goal is to label only those workloads as "performance" which actually run
long enough to provide useful metrics. The throughput collector samples once
per second, so a workload should run for at least 5, preferably 10, seconds to
gather at least a minimal number of samples for the percentile calculation.
For benchstat analysis of runs with enough repetitions to get statistically
meaningful results, each workload shouldn't run for more than one minute;
otherwise before/after analysis becomes too slow.
The labels were chosen based on benchmark runs on a reasonably fast desktop. To
know how long each workload takes, a new "runtime_seconds" benchmark result
gets added.
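
One simple way to attach such a number to Go benchmark output is testing.B's
ReportMetric; shown here purely as an illustration of the idea, not
necessarily how the actual change records it:

    import (
        "testing"
        "time"
    )

    func runTimedWorkload(b *testing.B, run func()) {
        start := time.Now()
        run()
        // Report the wall-clock runtime so it is easy to see which workloads
        // run long enough to produce useful metrics.
        b.ReportMetric(time.Since(start).Seconds(), "runtime_seconds")
    }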
The default scheduler configuration must be based on the v1 API where the
plugin is enabled by default. Then if (and only if) the
DynamicResourceAllocation feature gate for a test is set, the corresponding
API group also gets enabled.
The normal dynamic resource claim controller is started if needed to create
ResourceClaims from ResourceClaimTemplates.
Without the upcoming optimizations in the scheduler, scheduling with dynamic
resources is fairly slow. The new test cases take around 15 minutes wall clock
time on my desktop.
This will change when adding dynamic resource allocation test cases. Instead of
changing mustSetupScheduler and StartScheduler for that, let's return the
informer factory and create informers as needed in the test.
Entire test cases and workloads can have labels attached to them. The union of
these must match the label filter which works as in GitHub. The benchmark by
default runs the tests that are labeled "performance", which is the same as
before.
The previous solution had some shortcomings:
- It was based on the assumption that the goroutine gets woken up at regular
intervals. This is not actually guaranteed. Now the code keeps track of the
actual start and end of an interval and verifies that assumption.
- If no pod was scheduled (unlikely, but it could happen), then
"0 pods/s" got recorded. With fixed one-second intervals the metric was
therefore always either zero or >= 1. A better solution is to extend the
interval until some pod gets scheduled. With the larger time interval
it is then possible to also record, for example, 0.5 pods/s.
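
A sketch of the improved sampling loop (names are illustrative, not the actual
collector code):

    import (
        "context"
        "time"
    )

    // sample is the throughput recorded for one measurement interval.
    type sample struct {
        podsPerSecond float64
    }

    // collectThroughput measures the actual elapsed time of each interval
    // instead of assuming the ticker fires exactly every period, and it extends
    // the interval while nothing got scheduled so that fractional values like
    // 0.5 pods/s can be recorded.
    func collectThroughput(ctx context.Context, countScheduledPods func() int, period time.Duration) []sample {
        var samples []sample
        lastCount := countScheduledPods()
        intervalStart := time.Now()
        ticker := time.NewTicker(period)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return samples
            case now := <-ticker.C:
                count := countScheduledPods()
                scheduled := count - lastCount
                if scheduled == 0 {
                    // Keep extending the current interval until some pod gets scheduled.
                    continue
                }
                // Use the real elapsed time, not the nominal period.
                elapsed := now.Sub(intervalStart).Seconds()
                samples = append(samples, sample{podsPerSecond: float64(scheduled) / elapsed})
                lastCount = count
                intervalStart = now
            }
        }
    }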
By generating the unique name in advance, the label can also be set to a
matching value directly in the Create request. This makes test startup in
test/integration/scheduler_perf a bit faster because the extra patching can be
avoided.
It also leads to a better label because previously, the unique label value
didn't match the node name. This is required for simulating dynamic resource
allocation, which relies on the label to track where an allocated claim is
available.
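
A sketch of the difference for node creation (the label key and name prefix
are illustrative):

    import (
        "context"
        "fmt"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/uuid"
        "k8s.io/client-go/kubernetes"
    )

    // createNode generates the unique node name up front so that the matching
    // label value can be set directly in the Create request, instead of using
    // GenerateName and patching the label afterwards.
    func createNode(ctx context.Context, client kubernetes.Interface) (*v1.Node, error) {
        name := fmt.Sprintf("scheduler-perf-%s", uuid.NewUUID())
        node := &v1.Node{
            ObjectMeta: metav1.ObjectMeta{
                Name:   name,
                Labels: map[string]string{"instance": name},
            },
        }
        return client.CoreV1().Nodes().Create(ctx, node, metav1.CreateOptions{})
    }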