This makes the API nicer:
    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim
Previously, this was:
    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim
A longer-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.
This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
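The change in shape can be sketched in Go as follows. The types below are simplified stand-ins written for illustration, not the real k8s.io/api definitions, and `flatten` is a hypothetical helper that only shows how the old nested fields map onto the new flat ones.

```go
package main

import "fmt"

// oldClaimSource and oldPodResourceClaim mirror the previous layout, where
// the references were nested under "source". Simplified for illustration.
type oldClaimSource struct {
	ResourceClaimName         *string
	ResourceClaimTemplateName *string
}

type oldPodResourceClaim struct {
	Name   string
	Source oldClaimSource
}

// newPodResourceClaim mirrors the new layout, with the references directly
// in the list entry.
type newPodResourceClaim struct {
	Name                      string
	ResourceClaimName         *string
	ResourceClaimTemplateName *string
}

// flatten is a hypothetical conversion from the old to the new layout.
func flatten(old oldPodResourceClaim) newPodResourceClaim {
	return newPodResourceClaim{
		Name:                      old.Name,
		ResourceClaimName:         old.Source.ResourceClaimName,
		ResourceClaimTemplateName: old.Source.ResourceClaimTemplateName,
	}
}

func main() {
	tmpl := "test-inline-claim-template"
	old := oldPodResourceClaim{
		Name:   "with-template",
		Source: oldClaimSource{ResourceClaimTemplateName: &tmpl},
	}
	converted := flatten(old)
	fmt.Println(converted.Name, *converted.ResourceClaimTemplateName)
}
```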
It needs to be set twice, once for ktesting+klog, once for
component-base/logs. The latter was not done before and thus quite a bit of log
output was produced with verbosity 0.
Several pods sharing the same claim is not common, but can be useful and thus
should get tested.
Before, createPods and createAny operations were not able to do this because
each generated object was the same. What we need are different, predictable
names of the claims (from createAny) and different references to those in the
pods (from createPods). Now text/template processing, with the index number of
the pod or claim as input, is used to inject these varying fields. A
"div" function is needed to use the same claim in several different pods.
While at it, some existing test cases get cleaned up a bit (removal of
incorrect comments, addition of comments for testing with queuing hints).
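A minimal sketch of the templating idea, using only the standard text/template package. The `render` helper, the `.Index` field name and the template strings are assumptions made for this example; the actual scheduler_perf code may name and wire these differently.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render expands tmpl with the index of the generated object as input.
// The "div" function does integer division, so that several consecutive
// indices map to the same value.
func render(tmpl string, index int) (string, error) {
	funcs := template.FuncMap{
		"div": func(a, b int) (int, error) {
			if b == 0 {
				return 0, fmt.Errorf("division by zero")
			}
			return a / b, nil
		},
	}
	t, err := template.New("object").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, map[string]int{"Index": index}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Two pods share each claim: pods 0 and 1 reference claim-0,
	// pods 2 and 3 reference claim-1.
	for i := 0; i < 4; i++ {
		pod, err := render("pod-{{.Index}}", i)
		if err != nil {
			panic(err)
		}
		claim, err := render("claim-{{div .Index 2}}", i)
		if err != nil {
			panic(err)
		}
		fmt.Println(pod, "uses", claim)
	}
}
```

Plain `{{.Index}}` gives every object a distinct, predictable name, while `div` collapses neighboring pod indices onto one claim name.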
The cancellation of the context happened after the cleanup of the apiserver, so
clients using that context were kept running. That wasn't the intent and causes
a slow shutdown because the apiserver delays its shutdown when it has active
clients.
The fix is to create a new cancellation context and to use that for the
clients. The automatic cancellation of it then happens before the apiserver
cleanup.
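The ordering can be illustrated with a small simulation. It assumes that cleanup callbacks run in last-in-first-out order, as with testing.TB.Cleanup; the `cleanups` type and the event strings are hypothetical and only stand in for the real apiserver and client setup.

```go
package main

import (
	"context"
	"fmt"
)

// cleanups mimics the last-in-first-out order of testing.TB.Cleanup.
type cleanups struct {
	funcs []func()
}

func (c *cleanups) add(f func()) { c.funcs = append(c.funcs, f) }

func (c *cleanups) run() {
	for i := len(c.funcs) - 1; i >= 0; i-- {
		c.funcs[i]()
	}
}

// shutdownOrder registers the (simulated) apiserver cleanup first and then
// creates a separate cancellation context for clients. Because the client
// context is registered later, it is canceled earlier.
func shutdownOrder() []string {
	var order []string
	var c cleanups

	// Stand-in for the apiserver teardown.
	c.add(func() { order = append(order, "apiserver stopped") })

	// New context that is used only for the clients.
	clientCtx, cancel := context.WithCancel(context.Background())
	c.add(func() {
		cancel()
		<-clientCtx.Done() // closed immediately after cancel
		order = append(order, "clients canceled")
	})

	c.run()
	return order
}

func main() {
	for _, event := range shutdownOrder() {
		fmt.Println(event)
	}
}
```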
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.
The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
This is not relevant for namespaced objects, but matters for the cluster-scoped
ResourceClass during unit testing. This works right now because there is only
one such unit test, but will fail when adding a second one.
Instead of passing a boolean flag down into all functions where it might be
needed, it's now a context value.
The YAML files get decoded into an unstructured object, without validation, and
then sent to the apiserver with a generic client. The default behavior is to
issue a warning to the client, which gets logged by client-go. What we want
instead is an error that causes the test to fail in a clean way right at the
beginning.
With a dynamic client and a rest mapper it is possible to load arbitrary YAML
files and create the objects defined by them. This is simpler than adding
specific Go code for each supported type.
Because the version now matters, the incorrect versions in the DRA YAMLs were
found and fixed.
ktesting.TContext combines several different interfaces. This makes the code
simpler because fewer parameters need to be passed around.
An intentional side effect is that the apiextensions client interface becomes
available, which makes it possible to use CRDs. This will be needed for future
DRA tests.
Support for CRDs depends on starting the apiserver via
k8s.io/kubernetes/cmd/kube-apiserver/app/testing because only that enables the
CRD extensions. As discussed on Slack, the long-term goal is to replace the
in-tree StartTestServer with the one in staging, so this is going in the right
direction.
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
  `leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
  command line flag overrides the config, but only when set explicitly.
  This makes it possible to pre-define complete driver setups where the
  name is associated with certain resource availability. This will be
  used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
  via node labels. This will be used for testing cluster autoscaling.
If pods get stuck, then giving the name of one makes it possible
to search for it in the log output. Without the name it's hard
to figure out which pods got stuck.
This helps when using -feature-gate=ContextualLogging=true and running the
SchedulingWithMultipleResourceClaims test case because then output from the two
driver instances is easy to distinguish.
The new test case covers pods with multiple claims from multiple drivers. This
leads to different behavior (scheduler waits for information from all drivers
instead of optimistically selecting one node right away) and to more concurrent
updates of the PodSchedulingContext objects.
The test case is currently not enabled for unit testing or integration
testing. It can be used manually with:
    -bench=BenchmarkPerfScheduling/SchedulingWithMultipleResourceClaims/2000pods_100nodes
    ... -perf-scheduling-label-filter=
perfdash expects all data items to have the same set of labels. It then
renders drop-down buttons for each label with all values found for each
label. Previously, data items that didn't have a label didn't match any label
filter in perfdash and couldn't get selected because perfdash doesn't have
"unset" in its drop-down menus.
To avoid that, scheduler-perf now collects all labels and then adds missing
labels with "not applicable" as value:
    {
      "data": {
        "Average": 939.7071223010004,
        "Perc50": 927.7987421383649,
        "Perc90": 2166.153846153846,
        "Perc95": 2363.076923076923,
        "Perc99": 2520.6153846153848
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_pod_scheduling_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "not applicable",
        "result": "not applicable"
      }
    },
    ...
    {
      "data": {
        "Average": 1.1172570650000004,
        "Perc50": 1.1418367346938776,
        "Perc90": 1.5500000000000003,
        "Perc95": 1.6410256410256412,
        "Perc99": 3.7333333333333334
      },
      "unit": "ms",
      "labels": {
        "Metric": "scheduler_framework_extension_point_duration_seconds",
        "Name": "SchedulingBasic/5000Nodes/namespace-2",
        "extension_point": "Score",
        "result": "not applicable"
      }
    },
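The label handling described above can be sketched as a two-pass loop: first collect every label key, then fill in the gaps. The `DataItem` type here is a simplified stand-in modeled on the JSON output, and `fillMissingLabels` is a name chosen for this example, not necessarily what scheduler-perf uses.

```go
package main

import "fmt"

// DataItem is a simplified stand-in for one entry of the JSON output.
type DataItem struct {
	Data   map[string]float64 `json:"data"`
	Unit   string             `json:"unit"`
	Labels map[string]string  `json:"labels"`
}

// fillMissingLabels makes sure that all items have the same set of label
// keys. Labels that an item lacks are added with "not applicable" as value.
func fillMissingLabels(items []DataItem) {
	// First pass: collect the union of all label keys.
	keys := map[string]struct{}{}
	for _, item := range items {
		for key := range item.Labels {
			keys[key] = struct{}{}
		}
	}
	// Second pass: add the keys that are missing in each item.
	for i := range items {
		if items[i].Labels == nil {
			items[i].Labels = map[string]string{}
		}
		for key := range keys {
			if _, ok := items[i].Labels[key]; !ok {
				items[i].Labels[key] = "not applicable"
			}
		}
	}
}

func main() {
	items := []DataItem{
		{Unit: "ms", Labels: map[string]string{
			"Metric": "scheduler_pod_scheduling_duration_seconds",
		}},
		{Unit: "ms", Labels: map[string]string{
			"Metric":          "scheduler_framework_extension_point_duration_seconds",
			"extension_point": "Score",
		}},
	}
	fillMissingLabels(items)
	fmt.Println(items[0].Labels["extension_point"])
}
```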
Because the JSON file gets written at the end of the top-level benchmark, all
data items had `BenchmarkPerfScheduling/` as prefix in the `Name` label. This
is redundant and makes it harder to see the actual name. Now that common prefix
gets removed.
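A minimal sketch of the prefix removal, assuming it amounts to trimming a fixed string from the `Name` label. `stripBenchmarkPrefix` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// commonPrefix is the name of the top-level benchmark followed by a slash.
const commonPrefix = "BenchmarkPerfScheduling/"

// stripBenchmarkPrefix removes the common prefix from a Name label.
// Names without the prefix are returned unchanged.
func stripBenchmarkPrefix(name string) string {
	return strings.TrimPrefix(name, commonPrefix)
}

func main() {
	name := "BenchmarkPerfScheduling/SchedulingBasic/5000Nodes/namespace-2"
	fmt.Println(stripBenchmarkPrefix(name))
}
```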
This runs workloads that are labeled as "integration-test". The apiserver and
scheduler are only started once per unique configuration, followed by each
workload using that configuration. This makes execution faster. In contrast to
benchmarking, we care less about starting with a clean slate for each test.
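The "once per unique configuration" idea can be sketched as grouping the selected workloads by a configuration key. The `workload` type, its fields and `groupByConfig` are hypothetical simplifications; in particular, representing a configuration as a single string is an assumption made for this example.

```go
package main

import "fmt"

// workload is a hypothetical, simplified description of a test workload.
type workload struct {
	Name   string
	Config string // identifies the apiserver and scheduler configuration
	Labels []string
}

func hasLabel(w workload, label string) bool {
	for _, l := range w.Labels {
		if l == label {
			return true
		}
	}
	return false
}

// groupByConfig selects the workloads with the given label and groups their
// names by configuration. configs lists the configurations in the order in
// which they first appear.
func groupByConfig(workloads []workload, label string) (configs []string, byConfig map[string][]string) {
	byConfig = map[string][]string{}
	for _, w := range workloads {
		if !hasLabel(w, label) {
			continue
		}
		if _, seen := byConfig[w.Config]; !seen {
			configs = append(configs, w.Config)
		}
		byConfig[w.Config] = append(byConfig[w.Config], w.Name)
	}
	return configs, byConfig
}

func main() {
	workloads := []workload{
		{Name: "a", Config: "default", Labels: []string{"integration-test"}},
		{Name: "b", Config: "dra", Labels: []string{"integration-test"}},
		{Name: "c", Config: "default", Labels: []string{"integration-test"}},
		{Name: "d", Config: "default", Labels: []string{"performance"}},
	}
	configs, byConfig := groupByConfig(workloads, "integration-test")
	for _, config := range configs {
		// Here the apiserver and scheduler would be started once.
		fmt.Println("setup for config:", config)
		for _, name := range byConfig[config] {
			fmt.Println("  run workload:", name)
		}
	}
}
```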
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects,
  so we should run it to be on the safe side.
- Pods must be force-deleted because kubelet is not running.
- Finally, the namespace controller is needed to get rid of
  deleted namespaces.
Each benchmark test case runs with a fresh etcd instance. Therefore it is not
necessary to delete objects after a run.
A future unit test might reuse etcd, therefore cleanup is optional.