use Extraconfig to configure the repair interval
and add an integration test for services finalizers, and
possible races with the services repair loop.
- Debian base used was older (v2.1.3) missing multiple fixed CVEs
- Minor update to distroless debian image name to explicitly point
to debian 10
- Debian base image now points to buster-1.9.0
The Topology Manager e2e tests wants to run on real multi-NUMA system
and want to consume real devices supported by device plugins; SRIOV
devices happen to be the most commonly available of such devices.
CI machines aren't multi NUMA nor expose SRIOV devices, so the biggest portion
of the tests will just skip, and we need to keep it like this until we
figure out how to enable these features.
However, some organizations can and want to run the testsuite on bare metal;
in this case, the current test will skip (not fail) with misconfigured
boxes, and this reports a misleading result. It will be much better to
fail if the test preconditions aren't met.
To satisfy both needs, we add an option, controlled by an environment
variable, to fail (not skip) if the machine on which the test run
doesn't meet the expectations (multi-NUMA, 4+ cores per NUMA cell,
expose SRIOV VFs).
We keep the old behaviour as default to keep being CI friendly.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Previously we would try to infer the `ipFamilyPolicy` from `clusterIPs`
and/or `ipFamilies`. That is too tricky. Now you MUST specify
`ipFamilyPolicy` as one of the dual-stack options in order to get a
dual-stack service.
In older versions of Kubernetes (at least pre-0.19, it's the earliest
this test will run unmodified on), Pods that depended on devices could be
restarted after the device plugin had been removed. Currently however,
this isn't possible, as during ContainerManager.GetResources(), we
attempt to DeviceManager.GetDeviceRunContainerOptions() which fails as
there's no cached endpoint information for the plugin type.
This commit therefore breaks apart the existing test into two:
- One active test that validates that assignments are maintained across
restarts
- One skipped test that validates the behaviour after GPUs have been
removed, in case we decide that this is a bug that should be fixed in
the future.
15m is enough for Cluster Autoscaler to remove empty nodes, so we need
to break them sooner than that. Instead, wait 15m after breaking them to
ensure Cluster Autoscaler will consider them as unready instead of still
starting.
The e2e test "should have Endpoints and EndpointSlices pointing to
the API Server Service" was veryfing the current endpoints
reconciler implementation on the apiservers, however, users may
disable the endpoint reconciler and create their own.
This e2e test is also a conformance test, so we should test the
behaviour and not the implementation details. The test verifies
that a kubernetes.default service exist, an endpoint and endpoint
slices object referencing that service exist and are equivalent.