Commit Graph

104 Commits

Author SHA1 Message Date
kerthcet
1ffa1e17cd Remove noisy log in scheduler_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-06-12 11:53:35 +08:00
Kensei Nakada
c72b688e12 support scheduler_plugin_execution_duration_seconds in scheduler_perf 2024-04-27 08:22:53 +00:00
Patrick Ohly
c46ae1b26a scheduler_perf: use ktesting.TContext + staging StartTestServer
ktesting.TContext combines several different interfaces. This makes the code
simpler because less parameters need to be passed around.

An intentional side effect is that the apiextensions client interface becomes
available, which makes it possible to use CRDs. This will be needed for future
DRA tests.

Support for CRDs depends on starting the apiserver via
k8s.io/kubernetes/cmd/kube-apiserver/app/testing because only that enables the
CRD extensions. As discussed on Slack, the long-term goal is to replace the
in-tree StartTestServer with the one in staging, so this is going in the right
direction.
2024-02-11 10:51:38 +01:00
Kensei Nakada
5310abe14a make scheduler_perf usable from other repositories 2023-12-01 12:43:08 +00:00
Patrick Ohly
c74d045c4b scheduler_perf: show name of one pending pod in error message
If pods get stuck, then giving the name of one makes it possible
to search for it in the log output. Without the name it's hard
to figure out which pods got stuck.
2023-09-04 09:54:26 +02:00
Patrick Ohly
6b01ece580 scheduler-perf: fix perfdash display problem
perfdash expects all data items to have the same set of labels.  It then
renders drop-down buttons for each label with all values found for each
label. Previously, data items that didn't have a label didn't match any label
filter in perfdash and couldn't get selected because perfdash doesn't have
"unset" in it's drop-down menus.

To avoid that, scheduler-perf now collects all labels and then adds missing
labels with "not applicable" as value:

    {
      "data": {
        "Average": 939.7071223010004,
        "Perc50": 927.7987421383649,
        "Perc90": 2166.153846153846,
        "Perc95": 2363.076923076923,
        "Perc99": 2520.6153846153848
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_pod_scheduling_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "not applicable",
        "result": "not applicable"
      }
    },
    ...
    {
      "data": {
        "Average": 1.1172570650000004,
        "Perc50": 1.1418367346938776,
        "Perc90": 1.5500000000000003,
        "Perc95": 1.6410256410256412,
        "Perc99": 3.7333333333333334
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_framework_extension_point_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "Score",
        "result": "not applicable"
      }
    },
2023-07-03 21:16:53 +02:00
Patrick Ohly
cecebe8ea2 scheduler_perf: add TestScheduling integration test
This runs workloads that are labeled as "integration-test". The apiserver and
scheduler are only started once per unique configuration, followed by each
workload using that configuration. This makes execution faster. In contrast to
benchmarking, we care less about starting with a clean slate for each test.
2023-06-28 09:22:25 +02:00
Patrick Ohly
dfd646e0a8 scheduler_perf: fix namespace deletion
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects,
  so we should run it to be on the safe side.
- Pods must be force-deleted because kubelet is not running.
- Finally, the namespace controller is needed to get rid of
  deleted namespaces.
2023-06-28 09:22:25 +02:00
kerthcet
0616d15712 Fix perf-test by increasing the error margin
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-05-17 12:14:06 +08:00
Kubernetes Prow Robot
8b33eaa0a7
Merge pull request #116207 from pohly/dra-scheduler-perf
scheduler_perf: dynamic resource allocation test cases
2023-05-10 10:58:59 -07:00
Patrick Ohly
034528a9f0 scheduler perf: add DynamicResourceAllocation test cases
The default scheduler configuration must be based on the v1 API where the
plugin is enabled by default. Then if (and only if) the
DynamicResourceAllocation feature gate for a test is set, the corresponding
API group also gets enabled.

The normal dynamic resource claim controller is started if needed to create
ResourceClaims from ResourceClaimTemplates.

Without the upcoming optimizations in the scheduler, scheduling with dynamic
resources is fairly slow. The new test cases take around 15 minutes wall clock
time on my desktop.
2023-05-04 13:08:06 +02:00
Kante Yin
a7035f5459 Pass Context to StartTestServer
Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-05-04 10:25:09 +08:00
Kubernetes Prow Robot
47f1bd9f80
Merge pull request #117649 from SataQiu/scheduler-remove-v1beta2-20230427
scheduler: remove deprecated v1beta2 KubeSchedulerConfiguration  component config
2023-05-03 09:54:41 -07:00
Kubernetes Prow Robot
aece6838e8
Merge pull request #117232 from pohly/scheduler-perf-code-cleanups
scheduler_perf: code cleanups
2023-05-03 09:54:13 -07:00
SataQiu
1f7c07f355 scheduler: remove deprecated v1beta2 KubeSchedulerConfiguration 2023-05-03 21:43:19 +08:00
Patrick Ohly
b3e0bc8864 scheduler_perf: let the test decide which informers are needed
This will change when adding dynamic resource allocation test cases. Instead of
changing mustSetupScheduler and StartScheduler for that, let's return the
informer factory and create informers as needed in the test.
2023-04-27 15:31:40 +02:00
Patrick Ohly
78b8af9fed scheduler_perf: update throughputCollector
The previous solution had some shortcomings:

- It was based on the assumption that the goroutine gets woken up at regular
  intervals. This is not actually guaranteed. Now the code keeps track of the
  actual start and end of an interval and verifies that assumption.

- If no pod was scheduled (unlikely, but could happen), then
  "0 pods/s" got recorded. In such a case, the metric was always either
  zero or >= 1. A better solution is to extend the interval
  until some pod gets scheduled. With the larger time interval
  it is then possible to also track, for example, 0.5 pods/s.
2023-04-26 08:11:50 +02:00
Kubernetes Prow Robot
2bfaaf21c1
Merge pull request #117197 from pohly/scheduler-perf-cleanup
scheduler perf: remove cleanup func
2023-04-11 21:17:57 -07:00
Patrick Ohly
a869a89825 scheduler perf: remove cleanup func
b.Cleanup may as well get called inside the function instead
of leaving that to the caller.
2023-04-11 09:43:45 +02:00
sarab
8d18ae6fc2 Use the generic Set in scheduler 2023-04-09 11:34:17 +05:30
Kante Yin
3d0894fabf
Fix failure(context canceled) in scheduler_perf benchmark (#114843)
* Fix failure in scheduler_perf benchmark

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Fatal when error in cleaning up nodes in scheduler perf tests

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Use derived context to better organize the codes

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Change log level to 2 in scheduler perf-test

Signed-off-by: Kante Yin <kerthcet@gmail.com>

---------

Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-01-30 16:21:00 -08:00
kerthcet
d6ffb47832 Replace klog with benchmark log in scheduler_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-11-09 09:11:55 +08:00
Wojciech Tyczyński
5b042f0bf4 Remove RunAnAPIServer from integration tests 2022-07-25 17:52:31 +02:00
Kensei Nakada
4af3c5efeb Skip adding data to avoid "json: unsupported value: NaN" panic when data is NaN 2022-05-05 16:11:22 +00:00
Kubernetes Prow Robot
21c0f6f6ff
Merge pull request #107677 from pohly/scheduler-integration-benchmark
scheduler integration benchmark improvements
2022-02-14 01:23:28 -08:00
Patrick Ohly
e1e84c8e5f scheduler_perf: run with -v=0 by default
This provides a mechanism for overriding the forced increase of the klog
verbosity to 4 when starting the apiserver and uses that for the scheduler_perf
benchmark. Other tests run as before.

A global variable was used because adding an explicit parameter to several
helper functions would have caused a lot of code churn (test ->
integration/util.StartApiserver ->
integration/framework.RunAnAPIServerUsingServer ->
integration/framework.startAPIServerOrDie).
2022-02-11 16:58:33 +01:00
ahrtr
fe95aa614c io/ioutil has already been deprecated in golang 1.16, so replace all ioutil with io and os 2022-02-03 05:32:12 +08:00
kerthcet
75a255d2ed remove scheduler component config v1beta1
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-09-28 13:13:17 +08:00
Dave Chen
dda8090037 Format json file with proper indentation
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-09-07 16:14:34 +08:00
Dave Chen
63b4710f38 Don't expose struct from prometheus client library 2021-08-27 22:21:24 +08:00
Dave Chen
58ab18bc1e Add the metric data for different extension points
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-08-23 13:43:48 +08:00
Wei Huang
55765f1b49
sched: support HistogramVec in scheduler performance test 2021-07-26 20:27:37 -07:00
Mengjiao Liu
4eab19ae7d Clean up the master term in test/integration comments 2021-06-18 16:31:05 +08:00
Wei Huang
e7f67b1a63
Surface kube config in scheduler framework handle 2021-03-30 11:54:59 -07:00
Kubernetes Prow Robot
c78b5497ae
Merge pull request #99638 from chendave/perf_config
Enable scheduler_perf to support scheduler config file
2021-03-16 14:49:03 -07:00
Dave Chen
d50c0aeb5f Enable scheduler_perf to support scheduler config file
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-03-16 23:13:40 +08:00
Wei Huang
68ff3168b8
sched: fix a bug that literal 'p99' is mapped to 95th-percentile 2021-03-12 12:03:12 -08:00
Wei Huang
b93b4a2c96
sched: fix a bug that metrics of init or collected pods are re-collected 2021-03-11 10:28:39 -08:00
Kubernetes Prow Robot
823fa75643
Merge pull request #98900 from Huang-Wei/churn-cluster-op
Introduce a churnOp to scheduler perf testing framework
2021-03-11 02:00:24 -08:00
Alexander Minbaev
359116f525 add if check for number of scheduled pods to be greater than 0 2021-03-05 09:05:42 -06:00
Wei Huang
1e5878b910
Introduce a churnOp to scheduler perf testing framework
- support two modes: recreate and create
- use DynmaicClient to create API objects
2021-03-03 06:51:53 -08:00
Alexander Minbaev
5b73122105 Fix typo in util.go 2021-02-25 00:54:37 -06:00
Wei Huang
983272ce6a
sched: create dataItemsDir during a performance test if not exist 2021-02-17 12:44:16 -08:00
tangwz
518c502f54 scheduler_perf: use time.Ticker in throughput measurement 2020-09-19 09:36:17 +08:00
Adhityaa Chandrasekar
71bc9ce9c2 scheduler_perf: refactor to allow arbitrary workloads
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
2020-09-17 19:22:20 +00:00
Dave Chen
ae735a1189 scheduler_perf: fix the nil pointer dereference
Signed-off-by: Dave Chen <dave.chen@arm.com>
2020-06-16 13:23:05 +08:00
Abdullah Gharaibeh
d650b57141 Added Preemption benchmark 2020-05-28 14:05:52 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Dave Chen
49283364bf Decouple yaml based integration test from legacy test
- Move utilities or constants out so that both of them should be able
to run independently.
- Rename the legacy test so that it can eventually be deleted when the
perf dash changes is done
2020-03-27 08:45:59 +08:00
Jan Chaloupka
5b3b4de972 scheduler_perf: do not override throughput labels
Throughput labels are currently initialized with a "Name" label.
So we need to append to the map instead of creating a new one.
2020-02-28 16:10:50 +01:00