Each e2e test knows it wants to restart a running kubelet or a
non-running kubelet. The vast majority of times, we want to
restart a running kubelet (e.g. to change config or to check
some properties hold across kubelet crashes/restarts), but sometimes
we stop the kubelet, do some actions and only then restart.
To accomodate both use cases, we just expose the `running` boolean
flag to the e2e tests.
Having the `restartKubelet` explicitly restarting a running kubelet
helps us to trobuleshoot e2e failures on which the kubelet
was supposed to be running, while it was not; attempting a restart
in such cases only murkied the waters further, making the
troubleshooting and the eventual fix harder.
In the happy path, no expected change in behaviour.
Signed-off-by: Francesco Romani <fromani@redhat.com>
In the `restartKubelet` helper, we use `exec.Command`, whose
return value is the output as the command, but as `[]byte`.
The way we logged the output of the command was as value, making
the output, meant to be human readable, unnecessarily hard to read.
We fix this annoying behaviour converting the output to string before
to log it out, making pretty obvious to understand the outcome of
the command.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch changes cpuCount to cpuRequest in order to cater to cases
where guaranteed pods make non-integral CPU Requests.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
apparmor is no longer found in Alpine edge/testing but in
edge/community, presumably in preparation for full-fledged inclusion in
3.15. If so, once that is released, BASEIMAGE can be updated again and
the explicit --repository flag to 'apk add' dropped.
Fixes: https://github.com/kubernetes/kubernetes/issues/105528
Once the node gets deleted, the nodelifecycle controller
is racing to update pod status and the pod deletion logic
is failing causing tests to flake. This commit moves
the testContext creation to within the test loop and deletes nodes,
namespace within the test loop. We don't explicitly call the node
deletion within the loop but the `testutils.CleanupTest(t, testCtx)`
call ensures that the namespace, nodes gets deleted.
While running tests in parallel, especially those with higher loads
than others, it might take some time for Pods to be Running, even more
so if the image has to be pulled as well.
The test [sig-node] Pods should delete a collection of pods [Conformance]
only waits for the for the pods to be scheduled before deleting them, and
expects them to be gone in 1 minute, which can flake because of the above
reasons. Note that the operations are in order, and kubelet runs them in
order, which means that the pod first has to enter the Running state
before attempting to delete it.
This commit waits for the Pods to enter the Running state first before
deleting the entire collection.
Co-Authored-By: Antonio Ojea <aojea@redhat.com>
This commit fixes the LocalStorageCapacityIsolationEviction test by
acknowledging that in its default configuration kubelet will no-longer
evict memory-backed volume pods as they cannot use more than their
assigned limit with SizeMemoryBackedVolumes enabled.
To account for the old behaviour, we also add a test that explicitly
disables the feature to test the behaviour of memory backed local
volumes in those scenarios. That test can be removed when/if the feature
gate is removed.
Currently the storage eviction tests fail for a few reasons:
- They re-enter storage exhaustion after pulling the images during
cleanup (increasing test storage reqs, and adding verification for
future diagnosis)
- They were timing out, as in practice it seems that eviction takes just
over 10 minutes on an n1-standard in many cases. I'm raising these to
15 to provide some padding.
This should ideally bring these tests to passing on CI, as they've now
passed locally for me several times with the remote GCE env.
Follow up work involves diagnosing why these take so long, and
restructuring them to be less finicky.