For reasons not yet understood, when the kubelet logs are sent to
journald, many log lines are consistently dropped as soon as the PLEG
is started.
If we log directly to a file, we don't have this problem. As a bonus,
if the tests crash, the kubelet logs will always be available since
they were already written; otherwise we normally wait until the end of
the test run to collect them from journald, meaning that we often end
up with empty logs.
Add e2e tests to cover the basic flows for the `full-pcpus-only` option:
a negative flow to ensure rejection with a proper error message, and a
positive flow to verify the actual cpu allocation.
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
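As a rough illustration of what the positive flow has to check, the sketch
below tests whether a set of allocated CPUs covers every hardware thread of
each physical core it touches. The function name `fullCoresOnly` and the
`siblings` map (CPU id to the ids of all threads on the same core) are
hypothetical and not taken from the actual test code.

```go
package main

import "fmt"

// fullCoresOnly reports whether the allocated CPUs include every sibling
// thread of each physical core they touch. siblings is a hypothetical
// lookup table: CPU id -> all CPU ids sharing that physical core.
func fullCoresOnly(allocated []int, siblings map[int][]int) bool {
	set := make(map[int]bool, len(allocated))
	for _, cpu := range allocated {
		set[cpu] = true
	}
	for _, cpu := range allocated {
		sibs, ok := siblings[cpu]
		if !ok {
			return false // unknown CPU: cannot prove alignment
		}
		for _, s := range sibs {
			if !set[s] {
				return false // a thread of this core was left out
			}
		}
	}
	return true
}

func main() {
	// Assumed topology: 2 cores with 2 threads each (0/2 and 1/3).
	siblings := map[int][]int{0: {0, 2}, 2: {0, 2}, 1: {1, 3}, 3: {1, 3}}
	fmt.Println(fullCoresOnly([]int{0, 2}, siblings))
	fmt.Println(fullCoresOnly([]int{0, 1}, siblings))
}
```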
The PR https://github.com/kubernetes/kubernetes/pull/100041 updated
node-problem-detector to v0.8.7, but unfortunately we did not also
update the image used in the e2e_node tests.
As a result, the tests were failing like this:
E2eNode Suite: [sig-node] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial] SystemLogMonitor should generate node condition and events for corresponding errors
_output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:301
Timed out after 60.000s.
Expected success, but got an error:
<*errors.errorString | 0xc0011f2600>: {
s: "expected total number of events was 4, actual events counted was 7\nEvents
This in turn was one of the contributing factors that made the
pull-kubernetes-node-kubelet-serial lane fail constantly.
This patch updates the image used in the tests, fixing the failure.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The CPUManager graduated to beta a while ago (k8s 1.10?)
so let's get rid of the obsolete Alpha tag on its e2e tests.
Signed-off-by: Francesco Romani <fromani@redhat.com>
- verify memory manager data returned by `GetAllocatableResources`
- verify pod container memory manager data
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
The apiserver and test suite in node e2e run under the sshd daemon,
which can limit the number of files they can open. Set a higher limit
to address the issues.
Signed-off-by: Odin Ugedal <odin@uged.al>
Node e2e tests that exceed the global timeout are sent SIGINT, resulting
in no artifacts or console output. This change ignores the first SIGINT;
since all child processes are being stopped by that SIGINT anyway, we
can clean up before exiting.
Make sure to use SIGKILL so that the service is killed in a dirty way.
If the container runtime uses "Restart=on-abnormal" in systemd, killing
it with SIGTERM will not restart the service, as the kill looks
intentional and clean. This is what cri-o uses by default.
The current test assumes that the test pod is deleted when the test
namespace is deleted. However, namespace deletion is an asynchronous
operation. The pod may still be running and holding hugepages
resources when the next test case creates another pod that requests
the same hugepages resources. This can cause the kubelet to fail the
test pod with this kind of error:
OutOfhugepages-2Mi: Node didn't have enough resource: hugepages-2Mi
requested: 6291456, used: 6291456, capacity: 10485760
Explicitly deleting the test pod should fix this issue.