332 Commits

Author SHA1 Message Date
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
2024-06-27 17:53:24 +02:00
Maciej Skoczeń
7532e74117 Don't fail on churn delete in scheduler_perf tests when context canceled 2024-06-19 08:08:13 +00:00
Maciej Skoczeń
05b2c14d64 Measure performance of scheduling when many gated pods 2024-06-18 12:39:21 +00:00
Maciej Skoczeń
c09440c691 Add possibility to delete pods at specified frequency in scheduler_perf tests 2024-06-18 09:40:50 +00:00
Kubernetes Prow Robot
5df8e15a84 Merge pull request #125562 from pohly/scheduler-perf-default-verbosity
scheduler_perf: fix setting default verbosity
2024-06-18 02:16:07 -07:00
Kubernetes Prow Robot
3b90ae4f58 Merge pull request #124548 from pohly/dra-scheduler-perf-structured-parameters
scheduler_perf: add DRA structured parameters test with shared claims
2024-06-18 02:15:58 -07:00
Patrick Ohly
381c28407e scheduler_perf: fix setting default verbosity
It needs to be set twice, once for ktesting+klog, once for
component-base/logs. The latter was not done before and thus quite a bit of log
output was produced with verbosity 0.
2024-06-18 08:44:16 +02:00
Patrick Ohly
d88a153086 scheduler_perf: add DRA structured parameters test with shared claims
Several pods sharing the same claim is not common, but can be useful and thus
should get tested.

Before, createPods and createAny operations were not able to do this because
each generated object was the same. What we need are different, predictable
names of the claims (from createAny) and different references to those in the
pods (from createPods). Now text/template processing, with the index number of
the pod or claim as input, is used to inject these varying fields. A
"div" function is needed to use the same claim in several different pods.

While at it, some existing test cases get cleaned up a bit (removal of
incorrect comments, adding comments for testing with queuing hints).
2024-06-17 10:13:22 +02:00
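The templating approach described in d88a153086 can be illustrated with a minimal sketch. This is not the scheduler_perf code: the template string, the `params` type, the `renderClaimRef` helper, and the `claim-N` naming pattern are all assumptions made for illustration. Only the use of `text/template`, an index as input, and a `div` function come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// claimRefTemplate is a hypothetical pod fragment. The claim name pattern
// is illustrative and not taken from the actual test YAML files.
const claimRefTemplate = `resourceClaimName: claim-{{div .Index 2}}`

// params is the assumed input handed to the template for each object.
type params struct {
	Index int
}

// renderClaimRef expands the template for the pod with the given index.
func renderClaimRef(index int) (string, error) {
	funcs := template.FuncMap{
		// div is integer division, so neighboring indices yield the same
		// quotient and therefore the same claim name.
		"div": func(a, b int) int { return a / b },
	}
	tmpl, err := template.New("pod").Funcs(funcs).Parse(claimRefTemplate)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, params{Index: index}); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// With a divisor of 2, pods 0 and 1 share one claim, pods 2 and 3 the next.
	for i := 0; i < 4; i++ {
		ref, err := renderClaimRef(i)
		if err != nil {
			panic(err)
		}
		fmt.Printf("pod-%d -> %s\n", i, ref)
	}
}
```

The divisor controls how many pods reference the same claim, which is presumably why a plain index substitution was not sufficient.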
Kubernetes Prow Robot
0fd6746b2a Merge pull request #125518 from pohly/scheduler-perf-cleanup-fix
scheduler_perf: shut down apiserver clients before apiserver
2024-06-16 10:03:29 -07:00
kerthcet
1ffa1e17cd Remove noisy log in scheduler_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-06-12 11:53:35 +08:00
Patrick Ohly
246e2aedf5 scheduler_perf: shut down apiserver clients before apiserver
The cancellation of the context happened after the cleanup of the apiserver, so
clients using that context were kept running. That wasn't the intent and causes
a slow shutdown because the apiserver delays its shutdown when it has active
clients.

The fix is to create a new cancellation context and to use that for the
clients. The automatic cancellation of it then happens before the apiserver
cleanup.
2024-06-05 11:00:46 +02:00
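The ordering problem fixed in 246e2aedf5 can be sketched in isolation. The `cleanups` type below is a stand-in that mimics the last-in, first-out behavior of `testing.T.Cleanup`, and the "apiserver" and "client" are placeholders. None of these names come from scheduler_perf itself.

```go
package main

import (
	"context"
	"fmt"
)

// cleanups mimics testing.T.Cleanup: functions run in reverse
// registration order.
type cleanups struct{ funcs []func() }

func (c *cleanups) add(f func()) { c.funcs = append(c.funcs, f) }

func (c *cleanups) run() {
	for i := len(c.funcs) - 1; i >= 0; i-- {
		c.funcs[i]()
	}
}

// shutdownOrder returns the sequence of shutdown events.
func shutdownOrder() []string {
	var events []string
	var c cleanups

	// The placeholder apiserver registers its cleanup first, so it runs last.
	c.add(func() { events = append(events, "apiserver stopped") })

	// Clients get their own cancellation context, created after the
	// apiserver was started.
	clientCtx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		<-clientCtx.Done() // the placeholder client runs until canceled
		events = append(events, "client stopped")
		close(done)
	}()

	// Registered later, therefore executed earlier than the apiserver cleanup.
	c.add(func() {
		cancel()
		<-done // wait until the client has really stopped
	})

	c.run()
	return events
}

func main() {
	for _, event := range shutdownOrder() {
		fmt.Println(event)
	}
}
```

Because the client context is canceled by a cleanup that was registered after the apiserver's, the clients are gone by the time the server shuts down and it has no reason to delay.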
Kensei Nakada
ef9e14db79 scheduler_perf: measure the degradation of daemonset scheduling 2024-06-05 02:36:31 +00:00
kerthcet
e678496c6e reorganize the scheduler_perf testcases
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-05-31 16:47:19 +08:00
Lubomir I. Ivanov
5e290ebc90 switch k/k to pause version 3.10 2024-05-24 10:02:51 +03:00
Kubernetes Prow Robot
ade0d2140a Merge pull request #124578 from sanposhiho/scheduler_perf_scheduler_plugin_execution_duration_seconds
support `scheduler_plugin_execution_duration_seconds` in scheduler_perf
2024-05-05 06:40:44 -07:00
Kensei Nakada
c72b688e12 support scheduler_plugin_execution_duration_seconds in scheduler_perf 2024-04-27 08:22:53 +00:00
Marek Siarkowicz
3ee8178768 Cleanup defer from SetFeatureGateDuringTest function call 2024-04-24 20:25:29 +02:00
Patrick Ohly
a0add8d2c7 dra api: NodeResourceModel -> ResourceModel
When renaming NodeResourceSlice to ResourceSlice, the embedded
[Node]ResourceModel also should have been renamed.
2024-03-14 18:07:36 +01:00
Patrick Ohly
0b6a0d686a dra api: rename NodeResourceSlice -> ResourceSlice
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.

The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
2024-03-07 22:22:55 +01:00
Patrick Ohly
4ed2b3eaeb scheduler_perf: test DRA with structured parameters 2024-03-07 22:21:58 +01:00
Kubernetes Prow Robot
55d1518126 Merge pull request #123588 from pohly/scheduler-perf-any-cleanup
scheduler_perf: automatically delete created objects
2024-03-04 04:49:12 -08:00
Patrick Ohly
eb6abf0462 scheduler_perf: automatically delete created objects
This is not relevant for namespaced objects, but matters for the cluster-scoped
ResourceClass during unit testing. This works right now because there is only
one such unit test, but will fail when adding a second one.

Instead of passing a boolean flag down into all functions where it might be
needed, it's now a context value.
2024-03-04 09:54:38 +01:00
Patrick Ohly
d6851ec735 scheduler_perf: fail when input YAML is invalid
The YAML files get decoded into an unstructured object, without validation, and
then sent to the apiserver with a generic client. The default behavior is to
issue a warning to the client, which gets logged by client-go. What we want
instead is an error that causes the test to fail in a clean way right at the
beginning.
2024-02-29 09:53:16 +01:00
Patrick Ohly
da0c9a93ae scheduler_perf: use dynamic client to create arbitrary objects
With a dynamic client and a rest mapper it is possible to load arbitrary YAML
files and create the object defined by it. This is simpler than adding specific
Go code for each supported type.

Because the version now matters, the incorrect versions in the DRA YAMLs were
found and fixed.
2024-02-11 10:51:38 +01:00
Patrick Ohly
c46ae1b26a scheduler_perf: use ktesting.TContext + staging StartTestServer
ktesting.TContext combines several different interfaces. This makes the code
simpler because fewer parameters need to be passed around.

An intentional side effect is that the apiextensions client interface becomes
available, which makes it possible to use CRDs. This will be needed for future
DRA tests.

Support for CRDs depends on starting the apiserver via
k8s.io/kubernetes/cmd/kube-apiserver/app/testing because only that enables the
CRD extensions. As discussed on Slack, the long-term goal is to replace the
in-tree StartTestServer with the one in staging, so this is going in the right
direction.
2024-02-11 10:51:38 +01:00
Kensei Nakada
f29d6970c9 doc(scheduler_perf): enrich the documentation 2024-01-15 08:50:08 +00:00
Kensei Nakada
74a6a4581f fix by linters 2023-12-02 09:58:34 +00:00
Kensei Nakada
5310abe14a make scheduler_perf usable from other repositories 2023-12-01 12:43:08 +00:00
Kubernetes Prow Robot
4b9e15e0fe Merge pull request #120873 from pohly/dra-e2e-test-driver-enhancements
e2e dra: enhance test driver
2023-10-10 13:32:55 +02:00
Kubernetes Prow Robot
44cfd556b3 Merge pull request #120339 from pohly/scheduler-perf-dra-driver-names
scheduler_perf: use different log names for different DRA drivers
2023-10-02 06:32:56 -07:00
Kubernetes Prow Robot
5cc92713d1 Merge pull request #120335 from pohly/scheduler-perf-pod-name
scheduler_perf: show name of one pending pod in log
2023-10-02 06:32:45 -07:00
Patrick Ohly
36146ad686 e2e dra: enhance test driver
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
  `leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
  command line flag overrides the config, but only when set explicitly.
  This makes it possible to pre-define complete driver setups where the
  name is associated with certain resource availability. This will be
  used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
  via node labels. This will be used for testing cluster autoscaling.
2023-09-25 19:50:33 +02:00
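The rule from 36146ad686 that the command line flag overrides the config "only when set explicitly" can be sketched with the standard library `flag` package. This is an assumption-laden illustration: the flag name `drivername`, its default value, the `resourceConfig` type, and `effectiveDriverName` are hypothetical, and the real test driver may use a different flag library.

```go
package main

import (
	"flag"
	"fmt"
)

// resourceConfig is a hypothetical stand-in for the driver's resource config.
type resourceConfig struct {
	DriverName string
}

// effectiveDriverName decides which driver name wins: an explicitly set
// flag, then the config, then the flag's default.
func effectiveDriverName(args []string, cfg resourceConfig) (string, error) {
	fs := flag.NewFlagSet("driver", flag.ContinueOnError)
	name := fs.String("drivername", "default.example.com", "name of the driver")
	if err := fs.Parse(args); err != nil {
		return "", err
	}

	// Visit only calls the function for flags that were actually set on
	// the command line, which distinguishes "explicit" from "default".
	explicit := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "drivername" {
			explicit = true
		}
	})

	if explicit {
		return *name, nil
	}
	if cfg.DriverName != "" {
		return cfg.DriverName, nil
	}
	return *name, nil
}

func main() {
	cfg := resourceConfig{DriverName: "from-config.example.com"}

	fromConfig, _ := effectiveDriverName(nil, cfg)
	fromFlag, _ := effectiveDriverName([]string{"-drivername=from-flag.example.com"}, cfg)

	fmt.Println(fromConfig)
	fmt.Println(fromFlag)
}
```

Checking whether the flag was set, instead of comparing its value against the default, keeps the behavior correct even when someone passes the default value on purpose.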
Junhao Zou
43c05e98ca cleanup: Replace the deprecated NewMemCacheClient with memory.NewMemCacheClient 2023-09-08 11:57:46 +08:00
Patrick Ohly
c74d045c4b scheduler_perf: show name of one pending pod in error message
If pods get stuck, then giving the name of one makes it possible
to search for it in the log output. Without the name it's hard
to figure out which pods got stuck.
2023-09-04 09:54:26 +02:00
Patrick Ohly
78f3b76390 scheduler_perf: use different log names for different DRA drivers
This helps when using -feature-gate=ContextualLogging=true and running the
SchedulingWithMultipleResourceClaims test case because then output from the two
driver instances is easy to distinguish.
2023-09-01 09:25:09 +02:00
Kubernetes Prow Robot
8428655308 Merge pull request #119963 from pohly/dra-scheduler-perf-multiple-claims
dra: scheduler_perf test case with multiple claims per pod
2023-08-29 00:25:34 -07:00
SataQiu
5524f1651a using wait.PollUntilContextTimeout instead of deprecated wait.Poll/PollWithContext/PollImmediate/PollImmediateWithContext methods for scheduler 2023-08-24 18:35:59 +08:00
Patrick Ohly
1e961af858 scheduler_perf: test case for DRA with multiple claims
The new test case covers pods with multiple claims from multiple drivers. This
leads to different behavior (scheduler waits for information from all drivers
instead of optimistically selecting one node right away) and to more concurrent
updates of the PodSchedulingContext objects.

The test case is currently not enabled for unit testing or integration
testing. It can be used manually with:

   -bench=BenchmarkPerfScheduling/SchedulingWithMultipleResourceClaims/2000pods_100nodes
   ... -perf-scheduling-label-filter=
2023-08-16 08:32:36 +02:00
Patrick Ohly
0331e98957 scheduler_perf: fix installing DRA test driver multiple times
The driver name configuration option was ignored, so a second driver
would have used the same name.
2023-08-16 08:32:36 +02:00
Heba Elayoty
224087abfa Add Pod Scheduling SLI Duration metric (#119049)
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-08-15 15:17:41 -07:00
Jordan Liggitt
a164005cc0 Fix non-test code relying on test-code 2023-07-24 11:37:57 -04:00
Patrick Ohly
6b01ece580 scheduler-perf: fix perfdash display problem
perfdash expects all data items to have the same set of labels.  It then
renders drop-down buttons for each label with all values found for each
label. Previously, data items that didn't have a label didn't match any label
filter in perfdash and couldn't get selected because perfdash doesn't have
"unset" in its drop-down menus.

To avoid that, scheduler-perf now collects all labels and then adds missing
labels with "not applicable" as value:

    {
      "data": {
        "Average": 939.7071223010004,
        "Perc50": 927.7987421383649,
        "Perc90": 2166.153846153846,
        "Perc95": 2363.076923076923,
        "Perc99": 2520.6153846153848
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_pod_scheduling_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "not applicable",
        "result": "not applicable"
      }
    },
    ...
    {
      "data": {
        "Average": 1.1172570650000004,
        "Perc50": 1.1418367346938776,
        "Perc90": 1.5500000000000003,
        "Perc95": 1.6410256410256412,
        "Perc99": 3.7333333333333334
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_framework_extension_point_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "Score",
        "result": "not applicable"
      }
    },
2023-07-03 21:16:53 +02:00
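The label normalization described in 6b01ece580 amounts to two passes over the data items: collect every label key, then fill in the gaps. The following is a minimal sketch of that idea, not the scheduler-perf implementation. The `DataItem` type is reduced to its labels and `fillMissingLabels` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// DataItem is a simplified stand-in that only carries the labels.
type DataItem struct {
	Labels map[string]string
}

// fillMissingLabels ensures that all items have the same set of label
// keys, using "not applicable" for keys an item did not have.
func fillMissingLabels(items []DataItem) {
	// First pass: collect the union of all label keys.
	all := map[string]struct{}{}
	for _, item := range items {
		for key := range item.Labels {
			all[key] = struct{}{}
		}
	}
	// Second pass: add the missing keys to each item.
	for i := range items {
		if items[i].Labels == nil {
			items[i].Labels = map[string]string{}
		}
		for key := range all {
			if _, ok := items[i].Labels[key]; !ok {
				items[i].Labels[key] = "not applicable"
			}
		}
	}
}

func main() {
	items := []DataItem{
		{Labels: map[string]string{"Metric": "scheduler_pod_scheduling_duration_seconds"}},
		{Labels: map[string]string{
			"Metric":          "scheduler_framework_extension_point_duration_seconds",
			"extension_point": "Score",
		}},
	}
	fillMissingLabels(items)

	for _, item := range items {
		keys := make([]string, 0, len(item.Labels))
		for key := range item.Labels {
			keys = append(keys, key)
		}
		sort.Strings(keys) // map iteration order is random, so sort for stable output
		for _, key := range keys {
			fmt.Printf("%s=%q ", key, item.Labels[key])
		}
		fmt.Println()
	}
}
```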
Patrick Ohly
29e5771aa4 scheduler-perf: shorten "Name" label in metrics
Because the JSON file gets written at the end of the top-level benchmark, all
data items had `BenchmarkPerfScheduling/` as prefix in the `Name` label. This
is redundant and makes it harder to see the actual name. Now that common prefix
gets removed.
2023-07-03 21:15:16 +02:00
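Removing the common prefix mentioned in 29e5771aa4 is a one-line operation in Go. The helper name `shortenName` is hypothetical, and the sketch only shows the string handling, not where scheduler-perf applies it.

```go
package main

import (
	"fmt"
	"strings"
)

// commonPrefix is the prefix that all data items shared because the JSON
// file is written at the end of the top-level benchmark.
const commonPrefix = "BenchmarkPerfScheduling/"

// shortenName drops the common prefix and leaves other names untouched.
func shortenName(name string) string {
	return strings.TrimPrefix(name, commonPrefix)
}

func main() {
	fmt.Println(shortenName("BenchmarkPerfScheduling/SchedulingBasic/5000Nodes/namespace-2"))
}
```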
Patrick Ohly
0d41d509d2 scheduler_perf: replace gomega.Eventually with wait.PollUntilContextTimeout
This is done for the sake of consistency. The failure message becomes less
useful.
2023-06-28 09:22:26 +02:00
Patrick Ohly
cecebe8ea2 scheduler_perf: add TestScheduling integration test
This runs workloads that are labeled as "integration-test". The apiserver and
scheduler are only started once per unique configuration, followed by each
workload using that configuration. This makes execution faster. In contrast to
benchmarking, we care less about starting with a clean slate for each test.
2023-06-28 09:22:25 +02:00
Patrick Ohly
dfd646e0a8 scheduler_perf: fix namespace deletion
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects,
  so we should run it to be on the safe side.
- Pods must be force-deleted because kubelet is not running.
- Finally, the namespace controller is needed to get rid of
  deleted namespaces.
2023-06-28 09:22:25 +02:00
Patrick Ohly
d9c16a1ced scheduler_perf: fix goroutine leak in runWorkload
This becomes relevant when doing more fine-grained leak checking.
2023-06-28 08:14:34 +02:00
Patrick Ohly
c91c578795 scheduler_perf: skip expensive cleanup during benchmarks
Each benchmark test case runs with a fresh etcd instance. Therefore it is not
necessary to delete objects after a run.

A future unit test might reuse etcd, therefore cleanup is optional.
2023-06-22 08:56:14 +02:00
Kubernetes Prow Robot
2057a48ee5 Merge pull request #114771 from sanposhiho/scheduling_perf_scheduler_scheduling_attempt_duration_seconds
feature(scheduler_perf): distinguish result in scheduler_scheduling_attempt_duration_seconds metric result
2023-06-07 06:18:13 -07:00
Kensei Nakada
a4ea058cc7 feature(scheduler_perf): distinguish result in scheduler_scheduling_attempt_duration_seconds metric result 2023-06-02 14:45:55 +00:00