/metrics/resource/v1alpha1 was deprecated and moved to /metrics/resource.
Renames functions and matcher variables to remove v1alpha1.
Pod deletion was taking multiple minutes, so set GracePeriodSeconds to 0 (see the sketch after this list).
Commented out the restart loop during test pod startup.
Move ResourceMetricsAPI out of Orphans by giving it a NodeFeature tag.
API removed in 7b7c73b#88568.
Test created in 6051664#73946.
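
A minimal sketch of the zero-grace-period deletion, assuming a recent client-go where Delete takes a context (the helper name and variables are illustrative, not the test's actual code):

```go
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletePodImmediately deletes a pod without waiting out its termination
// grace period, so test teardown does not block for multiple minutes.
func deletePodImmediately(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	gracePeriod := int64(0)
	return c.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{
		GracePeriodSeconds: &gracePeriod,
	})
}
```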
Under e2e tests, it is possible that we restart the kubelet a number of
times within a short time frame. When that happens, systemd can fail the
service restart with the `Failed with result 'start-limit-hit'.` error.
To avoid this situation, the code resets the kubelet service start failures
on each call to the kubelet restart command.
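
A minimal sketch of the workaround, assuming the kubelet runs as a systemd unit named `kubelet` (the helper name is illustrative): clearing the unit's accumulated start failures with `systemctl reset-failed` before restarting keeps the start-limit counter from tripping.

```go
import (
	"fmt"
	"os/exec"
)

// restartKubelet clears the kubelet unit's start-failure state before
// restarting, so rapid successive restarts during e2e tests do not fail
// with "Failed with result 'start-limit-hit'".
func restartKubelet() error {
	// Reset the systemd start-limit/failure counters for the unit.
	if out, err := exec.Command("systemctl", "reset-failed", "kubelet").CombinedOutput(); err != nil {
		return fmt.Errorf("systemctl reset-failed: %v, output: %s", err, out)
	}
	if out, err := exec.Command("systemctl", "restart", "kubelet").CombinedOutput(); err != nil {
		return fmt.Errorf("systemctl restart: %v, output: %s", err, out)
	}
	return nil
}
```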
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
This drops testfiles.ReadOrDie and updates testfiles.Exists to return an
error, forcing the caller to decide whether to call framework.Fail or do
something else.
It makes for a slightly less friendly API, but also means the package is
decoupled from framework again, as per the comments at the top of the
file.
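
A sketch of what a call site looks like under the changed API, assuming `testfiles.Exists` now returns `(bool, error)` (the helper below is hypothetical):

```go
import (
	"k8s.io/kubernetes/test/e2e/framework"
	"k8s.io/kubernetes/test/e2e/framework/testfiles"
)

// mustHaveTestFile shows the caller making the decision that the
// testfiles package no longer makes for it: fail the test, skip it,
// or handle the error some other way.
func mustHaveTestFile(path string) {
	exists, err := testfiles.Exists(path)
	if err != nil {
		framework.Failf("error checking for test file %s: %v", path, err)
	}
	if !exists {
		framework.Failf("test file %s not found", path)
	}
}
```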
Part of the work to remove racist language, this name change also improves
the semantics of the variable name: it was not actually a list of
permissible images, but rather a list of images required by e2e_node tests
that are to be pre-pulled so that they are available before the e2e tests
run.
Worth noting that this list of images is union merged with another list
when setting up e2e_node tests, so there is the possibility of overlap.
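
To illustrate the union merge (the function and list names here are hypothetical, not the actual e2e_node setup code), overlapping entries collapse so each image is pre-pulled once:

```go
import "k8s.io/apimachinery/pkg/util/sets"

// mergeImageLists unions the images required by e2e_node tests with an
// additional list supplied at setup time; duplicates appear only once
// in the result, so overlap between the two lists is harmless.
func mergeImageLists(requiredForTests, extra []string) []string {
	return sets.NewString(requiredForTests...).Union(sets.NewString(extra...)).List()
}
```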
The node-kubelet-flaky e2e job that runs the
`Node Performance Testing [Serial] [Slow] [Flaky]` e2e tests has been
flaking because of inconsistencies in the CPU manager checkpoint file.
This seems to happen because the checkpoint file is deleted (which is
required in order to change the CPU manager policy used by these e2e
tests) right after the e2e test asserts that a pod no longer exists.
However, after a pod is deleted, the CPU manager may still be cleaning
up the resources used by the pod, which can result in the checkpoint
file being recreated.
Whenever this happened, the kubelet would panic when we subsequently
tried to change the CPU manager policy from "none" to "static" or vice
versa (this is done 4 times in these tests).
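
As a hedged sketch of one way to avoid this race (not necessarily the fix taken here; the state-file path is the kubelet default and the polling approach is an assumption), the test could make sure the checkpoint file stays gone before restarting the kubelet with the new policy:

```go
import (
	"os"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

const cpuManagerStateFile = "/var/lib/kubelet/cpu_manager_state"

// removeCheckpointAndWait deletes the CPU manager checkpoint and polls
// until it stays absent, so a late CPU manager cleanup from a just-deleted
// pod cannot recreate it between the delete and the kubelet restart.
func removeCheckpointAndWait() error {
	if err := os.Remove(cpuManagerStateFile); err != nil && !os.IsNotExist(err) {
		return err
	}
	return wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
		if _, err := os.Stat(cpuManagerStateFile); os.IsNotExist(err) {
			return true, nil
		}
		// The file reappeared; remove it again and keep polling.
		_ = os.Remove(cpuManagerStateFile)
		return false, nil
	})
}
```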
Signed-off-by: alejandrox1 <alarcj137@gmail.com>