Let's wait for the local node (aka the kubelet)
to be ready before querying podresources again,
to avoid false negatives.
Co-authored-by: Artyom Lukianov <alukiano@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
We need to make sure the system state is completely cleaned up
again before we transition from one test to another, to avoid
messing up the shared node state.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Since commit 42dd01aa3f the cpuRequest is expressed in millicores, hence
we need to translate it to exclusive CPUs
when verifying the resource allocation.
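The translation can be sketched as below. This is a minimal illustration,
not the code from the test suite: the function name `exclusiveCPUs` is
hypothetical, and it assumes the static CPU manager policy, under which
only integral requests (multiples of 1000m) get exclusive CPUs.

```go
package main

import "fmt"

// exclusiveCPUs returns how many exclusive CPUs a guaranteed container
// is expected to get for a CPU request expressed in millicores.
// Assumption: only integral requests (multiples of 1000m) translate to
// exclusive CPUs; any other request is served from the shared pool, so
// the function returns 0. The name is hypothetical.
func exclusiveCPUs(cpuRequestMilli int64) int64 {
	if cpuRequestMilli <= 0 || cpuRequestMilli%1000 != 0 {
		return 0
	}
	return cpuRequestMilli / 1000
}

func main() {
	for _, req := range []int64{500, 1000, 1500, 2000} {
		fmt.Printf("%dm -> %d exclusive CPUs\n", req, exclusiveCPUs(req))
	}
}
```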
Signed-off-by: Francesco Romani <fromani@redhat.com>
The intent is to make the code more readable, with no intended
changes in behaviour. Now it should be a bit more explicit
why the code is checking some values.
Signed-off-by: Francesco Romani <fromani@redhat.com>
* Clean up FeatureGate skippers
* Perform changes requested by review
* Make some more review-related changes
* Rename skipper functions to make code more readable
* Add utilfeature back in
Each e2e test knows whether it wants to restart a running kubelet or a
non-running kubelet. The vast majority of times, we want to
restart a running kubelet (e.g. to change config or to check
some properties hold across kubelet crashes/restarts), but sometimes
we stop the kubelet, do some actions and only then restart.
To accommodate both use cases, we just expose the `running` boolean
flag to the e2e tests.
Having `restartKubelet` explicitly restart a running kubelet
helps us to troubleshoot e2e failures in which the kubelet
was supposed to be running but was not; attempting a restart
in such cases only muddied the waters further, making the
troubleshooting and the eventual fix harder.
In the happy path, no change in behaviour is expected.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch changes cpuCount to cpuRequest in order to cater to cases
where guaranteed pods make non-integral CPU Requests.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
The Topology Manager e2e tests want to run on a real multi-NUMA system
and to consume real devices supported by device plugins; SRIOV
devices happen to be the most commonly available of such devices.
CI machines are not multi-NUMA and do not expose SRIOV devices, so most
of the tests will just skip, and we need to keep it like this until we
figure out how to enable these features.
However, some organizations can and want to run the testsuite on bare metal;
in this case, the current tests will skip (not fail) on misconfigured
boxes, which reports a misleading result. It is much better to
fail if the test preconditions aren't met.
To satisfy both needs, we add an option, controlled by an environment
variable, to fail (not skip) if the machine on which the tests run
doesn't meet the expectations (multi-NUMA, 4+ cores per NUMA cell,
exposed SRIOV VFs).
We keep the old behaviour as default to keep being CI friendly.
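The skip-or-fail decision can be sketched as follows. This is only an
illustration: the variable name `E2E_REQUIRE_TOPOLOGY_HW`, the `outcome`
type and the `decide` helper are hypothetical and do not come from the
patch; the real tests would call the framework's skip or fail helpers.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// requireEnvVar is a hypothetical variable name, used only in this sketch.
const requireEnvVar = "E2E_REQUIRE_TOPOLOGY_HW"

// outcome describes what a test should do once the preconditions are checked.
type outcome string

const (
	outcomeRun  outcome = "run"
	outcomeSkip outcome = "skip"
	outcomeFail outcome = "fail"
)

// decide picks the test outcome from the precondition check and the raw
// value of the environment variable. An unset or unparsable value keeps
// the CI-friendly default, which is to skip.
func decide(preconditionsMet bool, envValue string) outcome {
	if preconditionsMet {
		return outcomeRun
	}
	if strict, err := strconv.ParseBool(envValue); err == nil && strict {
		return outcomeFail
	}
	return outcomeSkip
}

func main() {
	// Pretend the machine is not multi-NUMA, so the preconditions are not met.
	fmt.Println(decide(false, os.Getenv(requireEnvVar)))
}
```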
Signed-off-by: Francesco Romani <fromani@redhat.com>
If a device plugin returns a device without topology, keep it internally
as NUMA node -1. This lets the podresources level avoid exporting NUMA
topology; otherwise the topology is exported with NUMA node ID 0,
which is not accurate.
It's impossible to reveal this bug just by tracing json.Marshal(resp)
in the podresource client, because the ID field of NUMANodes has the json
property omitempty, so ID=0 is shown as an empty NUMANode.
To reproduce it, it is better to iterate over the devices and just
trace dev.Topology.Nodes[0].ID.
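The omitempty effect can be reproduced in isolation. The types below are
simplified stand-ins written for this sketch, not the real podresources
API types; only the `omitempty` behaviour of encoding/json is the point.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// numaNode and topologyInfo are simplified, hypothetical stand-ins for
// the podresources types; the ID field carries omitempty as in the report.
type numaNode struct {
	ID int64 `json:"ID,omitempty"`
}

type topologyInfo struct {
	Nodes []*numaNode `json:"nodes,omitempty"`
}

// marshal returns the JSON form of t, or the error text if encoding fails.
func marshal(t topologyInfo) string {
	b, err := json.Marshal(t)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	zero := topologyInfo{Nodes: []*numaNode{{ID: 0}}}
	// The JSON trace drops the zero ID, so the node looks empty.
	fmt.Println(marshal(zero))
	// Reading the field directly still shows the reported NUMA node ID.
	fmt.Println(zero.Nodes[0].ID)
}
```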
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
Previously the code used to delete pods serially.
In this patch we factor out code to do that in parallel,
using goroutines.
This shaves some time off the e2e tm test run with no intended
changes in behaviour.
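The parallel deletion can be sketched as below. This is a minimal
illustration and not the factored-out code itself: `deleteFunc`,
`deletePodsInParallel` and `errSimulated` are hypothetical names, and the
callback stands in for whatever deletes one pod and waits for it to go.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSimulated is a hypothetical error used to exercise the failure path.
var errSimulated = errors.New("simulated deletion failure")

// deleteFunc is a hypothetical stand-in for the call that deletes one
// pod and waits until it is gone.
type deleteFunc func(name string) error

// deletePodsInParallel starts one goroutine per pod, waits for all of
// them to finish, and returns the failures keyed by pod name.
func deletePodsInParallel(names []string, del deleteFunc) map[string]error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex // guards errs, which is written from many goroutines
		errs = make(map[string]error)
	)
	for _, name := range names {
		wg.Add(1)
		go func(podName string) {
			defer wg.Done()
			if err := del(podName); err != nil {
				mu.Lock()
				errs[podName] = err
				mu.Unlock()
			}
		}(name)
	}
	wg.Wait()
	return errs
}

func main() {
	errs := deletePodsInParallel([]string{"pod-0", "pod-1", "pod-2"}, func(name string) error {
		if name == "pod-1" {
			return errSimulated
		}
		return nil
	})
	fmt.Println("failed deletions:", len(errs))
}
```

Waiting on the WaitGroup before returning keeps the cleanup semantics of
the serial version: the caller only proceeds once every deletion is done.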
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add feature gate to disable the GetAllocatableResources API.
The feature gate is in alpha stage, disabled by default.
Add e2e test to demonstrate the behaviour with feature gate disabled.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Speed up the cleanup after testcases by deleting pods in separate
goroutines.
The post-test cleanup stage must be done carefully since pods require
exclusive allocation, so the tests must take all the steps to clean up
properly and avoid polluting the environment, but
this has a negative effect on test duration (it takes longer).
Hence, we add safe speedups like doing pod deletions in parallel.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add e2e tests for the new GetAllocatableResources API.
The tests are added in the `podresources_test` suite
created previously in this series.
Signed-off-by: Francesco Romani <fromani@redhat.com>