This drops testfiles.ReadOrDie and updated testfiles.Exists to return an
error, forcing the caller to decide whether to call framework.Fail or do
something else.
It makes for a slightly less friendly API, but also means the package is
decoupled from framework again, as per the comments at the top of the
file
Part of work to remove racist language, this name change also improves on the
semantics of this variable name as it was not actually a list of permissible
images but rather a list of images that are required for e2e_node tests that
are to be pre-pulled so that they are available prior to running e2e tests.
Worth noting that this list of images is "union merged" with another list when
setting up e2e_node tests and as such there is the possibilty for overlap.
# Please enter the commit message for your changes. Lines starting
The node-kubelet-flaky e2e job that runs the the
`Node Performance Testing [Serial] [Slow] [Flaky]` e2e tests have been
flaking because of inconsistencies on the cpu manager checkpoint file.
This seems to be caused because the checkpoint file is deleted (which is
what needs to happen in order to change the CPU manager policy which is
used for these e2e tests) right after the e2e tests asserts that a pod
does not exist anymore.
However, after a pod is deleted, the CPU manager may still be cleaning
up the resources used by the pod which may result in the checkpoint file
being created.
Whenever this happened, the kubelet would panic if we then try to
subsequently change the CPU manager policy to "static" from "none" or
vice versa (this is done 4 times in these tests).
Signed-off-by: alejandrox1 <alarcj137@gmail.com>
e2e_node tests trigger OOM events on COS versions > 73-11636-0-0
possibly because of this change in the COS v.73-11636-0-0:
Made containerd run as a standalone systemd service
OOM killer usually kills cadvisor and e2e_node.test processes
causing node-kubelet-benchmark failures.
Decreasing amount of pods from 105 to 90 frees enough memory for
the test to succeed.
Lowering the amount of cpu allocated to this workload will set the
resources allocated to be similar to the other npb and tf workload in
this tests.
This will also allow to run all three workloads in a n1-standard-12 gcp
instance - which has 16 cpus and 60 GB.
Signed-off-by: alejandrox1 <alarcj137@gmail.com>
commit 43c56eb403 introduced a change
where CPUAccounting, CPUAccounting and TasksAccounting are enabled for
the systemd service.
It causes a regression on RHEL 7.8 where systemd-run doesn't allow to
set TasksAccounting.
Since Delegate= already enables all the controllers, it is superfluous
to specify them.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>