Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
`leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
command line flag overrides the config, but only when set explicitly.
This makes it possible to pre-define complete driver setups where the
name is associated with certain resource availability. This will be
used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
via node labels. This will be used for testing cluster autoscaling.
Analyzing the CPU profile of
go test -timeout=0 -count=5 -cpuprofile profile.out -bench=BenchmarkPerfScheduling/.*Claim.* -benchtime=1ns -run=xxx ./test/integration/scheduler_perf
showed that a significant amount of time was spent iterating over allocated
claims to determine how many were allocated per node. That "naive" approach was
taken to avoid maintaining a redundant data structure, but now that performance
measurements show that this comes at a cost, it's not "premature optimization"
anymore to introduce such a second field.
The average scheduling throughput in
SchedulingWithResourceClaimTemplate/2000pods_100nodes increases from 16.4
pods/s to 19.2 pods/s.
The recommendation and default in the controller helper code is to set
ReservedFor to the pod which triggered delayed allocation. However, this
is neither required nor enforced. Therefore we should also test the fallback
path were kube-scheduler itself adds the pod to ReservedFor.
When running as part of the scheduler_perf benchmark testing, we want to print
less information by default, so we should use V to limit verbosity
Pretty-printing doesn't belong into "application" code. I am moving that into
the ktesting formatting (https://github.com/kubernetes/kubernetes/pull/116180).
The check for "resources available on a node" must treat nodes that are not
listed as "no resources available". The previous logic only worked because all
nodes were listed during E2E testing. The upcoming integration testing is
covering additional scenarios and triggered this broken case.
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).
Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with
sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)
This may be unnecessary in some cases, but it's not wrong.
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).
Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with
sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)
This may be unnecessary in some cases, but it's not wrong.
The driver can be used manually against a cluster started with
local-up-cluster.sh and is also used for E2E testing. Because the tests proxy
connections from the nodes into the e2e.test binary and create/delete files via
the equivalent of "kubectl exec dd/rm", they can be run against arbitrary
clusters. Each test gets its own driver instance and resource class, therefore
they can run in parallel.