The goal is to label only those workloads as "performance" which actually run
long enough to provide useful metrics. The throughput collector samples once
per second, so a workload should run for at least 5, better 10, seconds to
produce at least a minimal number of samples for the percentile calculation.
For benchstat analysis of runs with enough repetitions to get statistically
meaningful results, each workload shouldn't run for more than one minute;
otherwise, before/after analysis becomes too slow.
The labels were chosen based on benchmark runs on a reasonably fast desktop. To
know how long each workload takes, a new "runtime_seconds" benchmark result
gets added.
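A minimal sketch of how such a result can be reported, assuming the standard Go
testing benchmark framework; reportRuntime and the runWorkload callback are
hypothetical names, not the actual helpers:

    package perf

    import (
        "testing"
        "time"
    )

    // reportRuntime measures the wall-clock time of one workload and attaches
    // it to the benchmark output as "runtime_seconds" via B.ReportMetric.
    func reportRuntime(b *testing.B, runWorkload func(b *testing.B)) {
        start := time.Now()
        runWorkload(b)
        b.ReportMetric(time.Since(start).Seconds(), "runtime_seconds")
    }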
Add two new metrics to monitor the client-go logic that
generates http.Transports for the clients:
- rest_client_transport_cache_entries is a gauge metric
with the number of existing entries in the internal cache
- rest_client_transport_create_calls_total is a counter
that increments each time a new transport is created, labeled
with the result of the operation needed to generate it: hit, miss
or uncacheable
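How the two metrics could be declared, shown here with plain
github.com/prometheus/client_golang for illustration; the actual client-go code
registers its metrics through its own metrics hooks, so the variable names and
registration mechanism below are assumptions:

    package transportmetrics

    import "github.com/prometheus/client_golang/prometheus"

    var (
        // Number of transport entries currently stored in the internal cache.
        transportCacheEntries = prometheus.NewGauge(prometheus.GaugeOpts{
            Name: "rest_client_transport_cache_entries",
            Help: "Number of transport entries in the internal cache.",
        })

        // Incremented on every transport create call, labeled with the result
        // of the cache lookup: hit, miss or uncacheable.
        transportCreateCalls = prometheus.NewCounterVec(prometheus.CounterOpts{
            Name: "rest_client_transport_create_calls_total",
            Help: "Number of transport create calls, by cache lookup result.",
        }, []string{"result"})
    )

    func init() {
        prometheus.MustRegister(transportCacheEntries, transportCreateCalls)
    }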
Change-Id: I2d8bde25281153d8f8e8faa249385edde3c1cb39
* update the serial number in the CA certificate to a valid non-zero number
* fix the existing problem (a SerialNumber of 0 in all certificates) as part of this PR, in a separate commit
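One common way to generate such a serial number, sketched with the standard
library; the helper name is illustrative and not necessarily what the commit
uses:

    package certs

    import (
        "crypto/rand"
        "math/big"
    )

    // newSerialNumber returns a random, positive, non-zero serial number
    // below 2^128, suitable for x509.Certificate.SerialNumber.
    func newSerialNumber() (*big.Int, error) {
        limit := new(big.Int).Lsh(big.NewInt(1), 128)
        serial, err := rand.Int(rand.Reader, limit)
        if err != nil {
            return nil, err
        }
        if serial.Sign() == 0 {
            // rand.Int may return 0; make sure the serial stays non-zero.
            serial.Add(serial, big.NewInt(1))
        }
        return serial, nil
    }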
The default scheduler configuration must be based on the v1 API, where the
plugin is enabled by default. Then, if (and only if) a test sets the
DynamicResourceAllocation feature gate, the corresponding API group also gets
enabled.
The normal dynamic resource claim controller is started if needed to create
ResourceClaims from ResourceClaimTemplates.
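A rough sketch of the assumed setup flow; all identifiers below are
hypothetical, and the exact version of the resource.k8s.io API group depends on
the Kubernetes release:

    package perf

    import "context"

    // setupDynamicResources only brings up the extra pieces when the test
    // enables the DynamicResourceAllocation feature gate: the API group and
    // the controller that creates ResourceClaims from ResourceClaimTemplates.
    // The scheduler plugin itself comes from the v1 configuration defaults.
    func setupDynamicResources(
        ctx context.Context,
        featureGates map[string]bool,
        enableAPIGroup func(group string),
        startClaimController func(ctx context.Context),
    ) {
        if !featureGates["DynamicResourceAllocation"] {
            return
        }
        enableAPIGroup("resource.k8s.io")
        startClaimController(ctx)
    }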
Without the upcoming optimizations in the scheduler, scheduling with dynamic
resources is fairly slow. The new test cases take around 15 minutes of
wall-clock time on my desktop.
This will change when adding dynamic resource allocation test cases. Instead of
changing mustSetupScheduler and StartScheduler for that, let's return the
informer factory and create informers as needed in the test.
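A small sketch of the intended pattern using client-go's fake clientset: the
setup helper returns the shared informer factory, and each test creates only
the informers it actually needs (the function names here are illustrative):

    package perf

    import (
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes/fake"
    )

    // newTestInformerFactory returns a factory instead of pre-wiring specific
    // informers, so a test can ask for exactly what it needs.
    func newTestInformerFactory() informers.SharedInformerFactory {
        client := fake.NewSimpleClientset()
        return informers.NewSharedInformerFactory(client, 0)
    }

    func runInformers(factory informers.SharedInformerFactory, stopCh <-chan struct{}) {
        // Example: a test that only needs the pod informer creates just that.
        _ = factory.Core().V1().Pods().Informer()
        factory.Start(stopCh)
        factory.WaitForCacheSync(stopCh)
    }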
Entire test cases and individual workloads can have labels attached to them.
The union of those labels must match the label filter, which works as in
GitHub. By default, the benchmark runs the tests that are labeled
"performance", which is the same as before.
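A sketch of the assumed matching semantics (the helper name and exact syntax
are assumptions, not the actual implementation): every entry in the filter
names a label that must be present, and an entry prefixed with "-" names a
label that must be absent.

    package perf

    import "strings"

    // matchLabelFilter reports whether the union of labels from a test case
    // and its workload satisfies the filter.
    func matchLabelFilter(labels, filter []string) bool {
        have := map[string]bool{}
        for _, l := range labels {
            have[l] = true
        }
        for _, f := range filter {
            if strings.HasPrefix(f, "-") {
                if have[strings.TrimPrefix(f, "-")] {
                    return false
                }
                continue
            }
            if !have[strings.TrimPrefix(f, "+")] {
                return false
            }
        }
        return true
    }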
The previous solution had some shortcomings:
- It was based on the assumption that the goroutine gets woken up at regular
intervals. This is not actually guaranteed. Now the code keeps track of the
actual start and end of an interval and verifies that assumption.
- If no pod was scheduled (unlikely, but possible), then "0 pods/s" got
recorded, so the metric was always either zero or >= 1 pods/s. A better
solution is to extend the interval until at least one pod gets scheduled;
with the larger time interval it then becomes possible to also track, for
example, 0.5 pods/s.
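A condensed sketch of that sampling approach (not the actual collector code):
sample roughly once per second, divide by the actually elapsed time, and skip
empty intervals so that the next non-empty sample covers a longer window.

    package perf

    import "time"

    // collectThroughput returns one pods/s sample per non-empty interval.
    // scheduledPods is a hypothetical callback returning the current number
    // of scheduled pods.
    func collectThroughput(scheduledPods func() int, stopCh <-chan struct{}) []float64 {
        var samples []float64
        lastSampleTime := time.Now()
        lastCount := scheduledPods()
        ticker := time.NewTicker(time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-stopCh:
                return samples
            case now := <-ticker.C:
                newPods := scheduledPods() - lastCount
                if newPods == 0 {
                    // Extend the interval instead of recording 0 pods/s; with
                    // a larger window, rates like 0.5 pods/s become possible.
                    continue
                }
                duration := now.Sub(lastSampleTime).Seconds()
                samples = append(samples, float64(newPods)/duration)
                lastSampleTime = now
                lastCount += newPods
            }
        }
    }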