With a dynamic client and a REST mapper it is possible to load arbitrary YAML
files and create the objects defined in them. This is simpler than adding
specific Go code for each supported type.
Because the version now matters, the incorrect versions in the DRA YAMLs were
found and fixed.
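
A minimal sketch of that approach, assuming a dynamic client and a RESTMapper
are already available (the helper name createFromYAML and the exact wiring are
illustrative, not the actual test code):

    package loadutil // illustrative package name

    import (
        "context"
        "fmt"

        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "k8s.io/apimachinery/pkg/runtime/serializer/yaml"
        "k8s.io/client-go/dynamic"
    )

    // createFromYAML decodes one YAML document and creates the object defined by it.
    // The apiVersion in the file must be correct because it determines the REST mapping.
    func createFromYAML(ctx context.Context, client dynamic.Interface, mapper meta.RESTMapper, data []byte) error {
        obj := &unstructured.Unstructured{}
        _, gvk, err := yaml.NewDecodingSerializer(unstructured.UnstructuredJSONScheme).Decode(data, nil, obj)
        if err != nil {
            return fmt.Errorf("decode YAML: %v", err)
        }
        mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
        if err != nil {
            return fmt.Errorf("map %s to a resource: %v", gvk, err)
        }
        // Namespaced and cluster-scoped resources need different clients.
        var resource dynamic.ResourceInterface = client.Resource(mapping.Resource)
        if mapping.Scope.Name() == meta.RESTScopeNameNamespace {
            resource = client.Resource(mapping.Resource).Namespace(obj.GetNamespace())
        }
        _, err = resource.Create(ctx, obj, metav1.CreateOptions{})
        return err
    }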
ktesting.TContext combines several different interfaces. This makes the code
simpler because fewer parameters need to be passed around.
An intentional side effect is that the apiextensions client interface becomes
available, which makes it possible to use CRDs. This will be needed for future
DRA tests.
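
Conceptually, such a combined interface looks roughly like this (an
illustration of the idea, not the exact ktesting API):

    import (
        "context"
        "testing"

        apiextensionsclientset "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
        "k8s.io/apimachinery/pkg/api/meta"
        "k8s.io/client-go/dynamic"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    // A TContext-like interface: one parameter instead of a testing.TB, a
    // context.Context, and several client interfaces.
    type testContext interface {
        context.Context
        testing.TB

        RESTConfig() *rest.Config
        RESTMapper() meta.RESTMapper
        Client() kubernetes.Interface
        Dynamic() dynamic.Interface
        APIExtensions() apiextensionsclientset.Interface
    }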
Support for CRDs depends on starting the apiserver via
k8s.io/kubernetes/cmd/kube-apiserver/app/testing because only that enables the
CRD extensions. As discussed on Slack, the long-term goal is to replace the
in-tree StartTestServer with the one in staging, so this is going in the right
direction.
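
For reference, starting the apiserver that way looks roughly like this (flags
and options vary per test; this is a sketch):

    import (
        "testing"

        kubeapiservertesting "k8s.io/kubernetes/cmd/kube-apiserver/app/testing"
        "k8s.io/kubernetes/test/integration/framework"
    )

    func startAPIServer(t *testing.T) *kubeapiservertesting.TestServer {
        // StartTestServerOrDie brings up a kube-apiserver instance that includes
        // the apiextensions API group, which is what makes CRDs usable in tests.
        server := kubeapiservertesting.StartTestServerOrDie(t, nil, nil, framework.SharedEtcd())
        t.Cleanup(server.TearDownFn)
        return server
    }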
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
`leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
command line flag overrides the config, but only when set explicitly
(see the sketch after this list).
This makes it possible to pre-define complete driver setups where the
name is associated with certain resource availability. This will be
used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
via node labels. This will be used for testing cluster autoscaling.
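
The "only when set explicitly" part can be implemented by checking whether the
flag was actually changed on the command line, for example with pflag (a
sketch; the flag and function names are made up):

    import "github.com/spf13/pflag"

    // effectiveDriverName lets an explicitly set --driver-name flag win over the
    // name from the resource config file; a flag left at its default does not
    // override the config.
    func effectiveDriverName(flags *pflag.FlagSet, flagValue, configValue string) string {
        if flags.Changed("driver-name") {
            return flagValue
        }
        if configValue != "" {
            return configValue
        }
        return flagValue // fall back to the flag's default value
    }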
If pods get stuck, then giving the name of one makes it possible
to search for it in the log output. Without the name it's hard
to figure out which pods got stuck.
This helps when using -feature-gate=ContextualLogging=true and running the
SchedulingWithMultipleResourceClaims test case because then output from the two
driver instances is easy to distinguish.
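
With contextual logging, each driver instance can get its own named logger,
roughly like this (illustrative, not the actual test code):

    import (
        "context"

        "k8s.io/klog/v2"
    )

    // newDriverContext returns a context whose logger carries the driver name,
    // so all log output from that driver instance is easy to attribute.
    func newDriverContext(ctx context.Context, driverName string) context.Context {
        logger := klog.LoggerWithName(klog.FromContext(ctx), driverName)
        return klog.NewContext(ctx, logger)
    }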
The new test case covers pods with multiple claims from multiple drivers. This
leads to different behavior (scheduler waits for information from all drivers
instead of optimistically selecting one node right away) and to more concurrent
updates of the PodSchedulingContext objects.
The test case is currently not enabled for unit testing or integration
testing. It can be used manually with:
-bench=BenchmarkPerfScheduling/SchedulingWithMultipleResourceClaims/2000pods_100nodes
... -perf-scheduling-label-filter=
perfdash expects all data items to have the same set of labels. It then
renders a drop-down button for each label, listing all values found for that
label. Previously, data items that lacked a label didn't match any label
filter in perfdash and couldn't get selected, because perfdash doesn't have
"unset" in its drop-down menus.
To avoid that, scheduler-perf now collects all labels and then adds missing
labels with "not applicable" as value:
    {
      "data": {
        "Average": 939.7071223010004,
        "Perc50": 927.7987421383649,
        "Perc90": 2166.153846153846,
        "Perc95": 2363.076923076923,
        "Perc99": 2520.6153846153848
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_pod_scheduling_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "not applicable",
        "result": "not applicable"
      }
    },
    ...
    {
      "data": {
        "Average": 1.1172570650000004,
        "Perc50": 1.1418367346938776,
        "Perc90": 1.5500000000000003,
        "Perc95": 1.6410256410256412,
        "Perc99": 3.7333333333333334
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_framework_extension_point_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "Score",
        "result": "not applicable"
      }
    },
Because the JSON file gets written at the end of the top-level benchmark, all
data items had `BenchmarkPerfScheduling/` as prefix in the `Name` label. This
is redundant and makes it harder to see the actual name. Now that common prefix
gets removed.
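
A sketch of both post-processing steps (the dataItem type mirrors the JSON
above; names are illustrative, not the exact scheduler-perf code):

    import "strings"

    type dataItem struct {
        Data   map[string]float64 `json:"data"`
        Unit   string             `json:"unit"`
        Labels map[string]string  `json:"labels"`
    }

    // normalizeLabels gives every data item the same set of label keys, filling
    // missing ones with "not applicable", and strips the redundant benchmark
    // prefix (e.g. "BenchmarkPerfScheduling/") from the Name label.
    func normalizeLabels(items []dataItem, prefix string) {
        // Collect the union of all label keys.
        allKeys := map[string]bool{}
        for _, item := range items {
            for key := range item.Labels {
                allKeys[key] = true
            }
        }
        for i := range items {
            if items[i].Labels == nil {
                items[i].Labels = map[string]string{}
            }
            for key := range allKeys {
                if _, ok := items[i].Labels[key]; !ok {
                    items[i].Labels[key] = "not applicable"
                }
            }
            items[i].Labels["Name"] = strings.TrimPrefix(items[i].Labels["Name"], prefix)
        }
    }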
This runs workloads that are labeled as "integration-test". The apiserver and
scheduler are only started once per unique configuration, followed by each
workload using that configuration. This makes execution faster. In contrast to
benchmarking, we care less about starting with a clean slate for each test.
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects,
so we should run it to be on the safe side.
- Pods must be force-deleted because kubelet is not running (see the sketch
below).
- Finally, the namespace controller is needed to get rid of
deleted namespaces.
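
A sketch of the force-deletion step mentioned above (assuming a typed
clientset and the workload's namespace):

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // forceDeletePods removes all pods in the namespace immediately. Without a
    // kubelet nothing would confirm graceful termination, so the grace period
    // has to be zero.
    func forceDeletePods(ctx context.Context, client kubernetes.Interface, namespace string) error {
        gracePeriod := int64(0)
        return client.CoreV1().Pods(namespace).DeleteCollection(ctx,
            metav1.DeleteOptions{GracePeriodSeconds: &gracePeriod},
            metav1.ListOptions{})
    }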
Each benchmark test case runs with a fresh etcd instance, so it is not
necessary to delete objects after a run. A future unit test might reuse etcd,
which is why cleanup is optional.
This is required because an empty name is no longer supported: the
perf-dashboard is run with --allow-parsers-matching-all-tests=false, which
causes perfdash to skip the current configuration for BenchmarkPerfResults
because it does not have a name
set (4674704f45/perfdash/metrics-downloader.go (L165-L167)).
The perf-dash config needs to be updated accordingly.
The goal is to label only those workloads as "performance" which actually run
long enough to provide useful metrics. The throughput collector samples once
per second, so a workload should run for at least 5, preferably 10, seconds to
gather at least a minimal number of samples for the percentile calculation.
For benchstat analysis of runs with enough repetitions to get statistically
meaningful results, each workload shouldn't run for more than one minute;
otherwise before/after analysis becomes too slow.
The labels were chosen based on benchmark runs on a reasonably fast desktop. To
know how long each workload takes, a new "runtime_seconds" benchmark result
gets added.
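
One simple way to attach such a number to Go benchmark output is testing.B's
ReportMetric; shown here purely as an illustration of the idea, not
necessarily how the actual change records it:

    import (
        "testing"
        "time"
    )

    func runTimedWorkload(b *testing.B, run func()) {
        start := time.Now()
        run()
        // Report the wall-clock runtime so it is easy to see which workloads
        // run long enough to produce useful metrics.
        b.ReportMetric(time.Since(start).Seconds(), "runtime_seconds")
    }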
The default scheduler configuration must be based on the v1 API where the
plugin is enabled by default. Then if (and only if) the
DynamicResourceAllocation feature gate for a test is set, the corresponding
API group also gets enabled.
The normal dynamic resource claim controller is started if needed to create
ResourceClaims from ResourceClaimTemplates.
Without the upcoming optimizations in the scheduler, scheduling with dynamic
resources is fairly slow. The new test cases take around 15 minutes wall clock
time on my desktop.
This will change when adding dynamic resource allocation test cases. Instead of
changing mustSetupScheduler and StartScheduler for that, let's return the
informer factory and create informers as needed in the test.
Entire test cases and workloads can have labels attached to them. The union of
these must match the label filter which works as in GitHub. The benchmark by
default runs the tests that are labeled "performance", which is the same as
before.
The previous solution had some shortcomings:
- It was based on the assumption that the goroutine gets woken up at regular
intervals. This is not actually guaranteed. Now the code keeps track of the
actual start and end of an interval and verifies that assumption.
- If no pod was scheduled (unlikely, but it could happen), then
"0 pods/s" got recorded. With fixed one-second intervals the metric was
therefore always either zero or >= 1. A better solution is to extend the
interval until some pod gets scheduled. With the larger time interval
it is then possible to also record, for example, 0.5 pods/s.
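
A sketch of the improved sampling loop (names are illustrative, not the actual
collector code):

    import (
        "context"
        "time"
    )

    // sample is the throughput recorded for one measurement interval.
    type sample struct {
        podsPerSecond float64
    }

    // collectThroughput measures the actual elapsed time of each interval
    // instead of assuming the ticker fires exactly every period, and it extends
    // the interval while nothing got scheduled so that fractional values like
    // 0.5 pods/s can be recorded.
    func collectThroughput(ctx context.Context, countScheduledPods func() int, period time.Duration) []sample {
        var samples []sample
        lastCount := countScheduledPods()
        intervalStart := time.Now()
        ticker := time.NewTicker(period)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return samples
            case now := <-ticker.C:
                count := countScheduledPods()
                scheduled := count - lastCount
                if scheduled == 0 {
                    // Keep extending the current interval until some pod gets scheduled.
                    continue
                }
                // Use the real elapsed time, not the nominal period.
                elapsed := now.Sub(intervalStart).Seconds()
                samples = append(samples, sample{podsPerSecond: float64(scheduled) / elapsed})
                lastCount = count
                intervalStart = now
            }
        }
    }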
By generating the unique name in advance, the label can also be set to a
matching value directly in the Create request. This makes test startup in
test/integration/scheduler_perf a bit faster because the extra patching can be
avoided.
It also leads to a better label because previously, the unique label value
didn't match the node name. This is required for simulating dynamic resource
allocation, which relies on the label to track where an allocated claim is
available.
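
A sketch of the difference for node creation (the label key and name prefix
are illustrative):

    import (
        "context"
        "fmt"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/uuid"
        "k8s.io/client-go/kubernetes"
    )

    // createNode generates the unique node name up front so that the matching
    // label value can be set directly in the Create request, instead of using
    // GenerateName and patching the label afterwards.
    func createNode(ctx context.Context, client kubernetes.Interface) (*v1.Node, error) {
        name := fmt.Sprintf("scheduler-perf-%s", uuid.NewUUID())
        node := &v1.Node{
            ObjectMeta: metav1.ObjectMeta{
                Name:   name,
                Labels: map[string]string{"instance": name},
            },
        }
        return client.CoreV1().Nodes().Create(ctx, node, metav1.CreateOptions{})
    }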