Commit Graph

1980 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
15a60d1a19 Merge pull request #100180 from fromanirh/tm-e2e-fix-wait
e2e: TM: wait for SRIOV devices in pod scope tests
2021-06-23 11:42:10 -07:00
Kubernetes Prow Robot
af60bebde3 Merge pull request #97028 from knabben/e2e-restart-kubelet
Adding restart kubelet flag on e2e test
2021-06-22 21:00:09 -07:00
Kubernetes Prow Robot
2453f07e93 Merge pull request #102396 from odinuge/restart_test
Restart test: Kill container runtime with SIGKILL
2021-06-22 13:10:10 -07:00
Artyom Lukianov
681905706d e2e node: provide tests for memory manager pod resources metrics
- verify memory manager data returned by `GetAllocatableResources`
- verify pod container memory manager data

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-06-22 13:06:32 +03:00
Davanum Srinivas
7fcdbbef06 Switch to github.com/coreos/go-systemd/v22 and drop older package
- We use the new v22 module released on May 10
- We drop the unmaintained `github.com/coreos/pkg`

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-06-16 11:14:16 -04:00
Kubernetes Prow Robot
fa152d25d8 Merge pull request #102209 from odinuge/node-e2e-fix
Ignore first SIGINT in node-e2e tests
2021-06-15 11:31:23 -07:00
Kubernetes Prow Robot
4e7fc6df63 Merge pull request #100369 from wzshiming/fix/restart-dbus-for-graceful-node-shutdown
After DBus restarts, make GracefulNodeShutdown work again
2021-06-14 20:50:00 -07:00
Kubernetes Prow Robot
94707017e1 Merge pull request #102773 from bart0sh/PR0097-run_remote-report-error
run_remote: improve error reporting
2021-06-14 19:00:25 -07:00
Ed Bartosh
89284a1ba7 run_remote: improve error reporting
Included more info to the error message.
2021-06-10 14:34:05 +03:00
Giuseppe Scrivano
c98306a09e test: adjust summary test for cgroup v2
On cgroup v2 the reported metric is recursive: it includes all the
sub cgroups.

Adjust the test accordingly.

Closes: https://github.com/kubernetes/kubernetes/issues/99230

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-06-09 14:04:06 +02:00
Odin Ugedal
c0c9f1f318 Ignore first SIGINT in node-e2e tests
Node e2e tests exceeding the global timeout are sent SIGINT, resulting
in no artifacts or console output. This will ignore the first SIGINT,
and since all child processes are being stopped due to SIGINT, we can
clean up before exiting.
2021-06-09 10:12:05 +02:00
Odin Ugedal
2787e8c18c Kill container runtime with SIGKILL
Make sure to use SIGKILL so that the service is killed in a dirty way.
If the container runtime uses "Restart=on-abnormal" in systemd, killing
it with SIGTERM will not restart the service, as the kill looks intentional
and clean. This is used by cri-o by default.
2021-05-28 10:16:23 +02:00
Lennart Jern
507710b50f Update CNI plugins v0.9.1
ref: https://github.com/containernetworking/plugins/releases/tag/v0.9.1
Signed-off-by: Lennart Jern <lennart.jern@est.tech>
2021-05-26 11:02:04 +03:00
Ed Bartosh
38c56883f1 e2e: hugepages: delete test pod after the test
The current test assumes that the test pod is deleted when the test
namespace is deleted. However, namespace deletion is an asynchronous
operation. The pod may still be running and holding hugepages
resources when the next test case creates another pod that requests
the same hugepages resources. This can cause the kubelet to fail the test
pod with this kind of error:
  OutOfhugepages-2Mi: Node didn't have enough resource: hugepages-2Mi
  requested: 6291456, used: 6291456, capacity: 10485760

Explicitly deleting the test pod should fix this issue.
2021-05-25 17:09:55 +03:00
Shiming Zhang
990d0949c4 Add test, after restart dbus, should be able to gracefully shutdown
2021-05-19 10:06:06 +08:00
Jordan Liggitt
4b45d0d921 Revert "Merge pull request 101888 from kolyshkin/update-runc-rc94"
This reverts commit b1b06fe0a4, reversing
changes made to 382a33986b.
2021-05-18 09:13:47 -04:00
Kubernetes Prow Robot
4d4b530114 Merge pull request #101903 from cynepco3hahue/e2e_remote_kernel_args
e2e node: make possible to add additional kernel arguments
2021-05-17 13:39:59 -07:00
Kubernetes Prow Robot
b1b06fe0a4 Merge pull request #101888 from kolyshkin/update-runc-rc94
vendor: bump runc to rc94
2021-05-17 09:43:30 -07:00
Kubernetes Prow Robot
f35e587087 Merge pull request #99899 from hasheddan/update-node-e2e-note
Update dependencies in local node test runner
2021-05-13 03:54:25 -07:00
Artyom Lukianov
93ff47b05b e2e node: make possible to add additional kernel arguments
Add an option to configure additional kernel arguments during
the setup of e2e node environment.

The example:
cos-stable1:
  image_family: cos-89-lts # docker v19.03.6, deprecated after 2021-06-24
  project: cos-cloud
  metadata: "user-data<test/e2e_node/jenkins/cos-init-live-restore.yaml,gci-update-strategy=update_disabled"
  kernel_arguments:
    - "numa=fake=2"
  machine: n1-standard-4

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-05-11 13:49:32 +03:00
Kubernetes Prow Robot
0e13f93c26 Merge pull request #101461 from cynepco3hahue/fix_race_condition_under_memory_manager_test
e2e node: fix the race condition under the memory manager test
2021-05-10 18:51:36 -07:00
Giuseppe Scrivano
a460aaf41d test: adjust number of expected page faults
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-05-10 17:34:49 -07:00
Kubernetes Prow Robot
70481591b3 Merge pull request #98629 from wzshiming/fix-pull-image-url
Fix pull empty image URL
2021-05-05 20:21:15 -07:00
Shiming Zhang
91beb10aa4 Fix flake for GracefulNodeShutdown e2e
2021-04-29 10:52:00 +08:00
Artyom Lukianov
79dbdbb4c1 e2e node: fix the race condition under the memory manager test
Wait for kubelet to be healthy after the dynamic update
of the kubelet configuration.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-04-25 15:12:12 +03:00
wangyx1992
fda7421f24 cleanup: replace x.Sub(time.Now()) with time.Until(x) in e2e test
Signed-off-by: wangyx1992 <wang.yixiang@zte.com.cn>
2021-04-23 11:27:12 +08:00
Kubernetes Prow Robot
1b08dde41f Merge pull request #101191 from tanjing2020/container_manager_test
Agnhost image's process name is called agnhost, not test-webserver
2021-04-22 08:59:53 -07:00
Kubernetes Prow Robot
032007e007 Merge pull request #101312 from harche/ContainerLogPath_fix
Add SELinux security context to ContainerLogPath test
2021-04-21 09:31:17 -07:00
Harshal Patil
df13eebfd0 Add SELinux security context to ContainerLogPath test
Signed-off-by: Harshal Patil <harpatil@redhat.com>
2021-04-21 13:48:32 +05:30
Shiming Zhang
e08988ba16 Fix pull empty image URL
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-04-21 15:29:46 +08:00
Kubernetes Prow Robot
92aff21558 Merge pull request #95609 from fromanirh/tm-e2e-faster-delete
e2e: topology manager: use deletePodSync for faster delete
2021-04-20 04:14:33 -07:00
Shihang Zhang
925900317e allow multiple of --service-account-issuer
2021-04-19 09:54:11 -07:00
tanjing2020
fa3956844d The process name is called agnhost, not test-webserver, after the Agnhost image is started.
2021-04-16 18:05:00 +08:00
Kubernetes Prow Robot
dd72c4534c Merge pull request #97968 from saschagrunert/apparmor-host-check
Remove check for apparmor_parser in AppArmor host validation
2021-04-13 01:58:50 -07:00
Kubernetes Prow Robot
4959cd6339 Merge pull request #100671 from Niekvdplas/spelling-mistakes
Fixed several spelling mistakes
2021-04-09 05:19:45 -07:00
Kubernetes Prow Robot
e9d7247447 Merge pull request #99072 from cynepco3hahue/e2e_fix_memory_manager_tests
e2e: fix memory manager tests
2021-04-08 14:27:40 -07:00
Niekvdplas
fec272a7b2 Fixed several spelling mistakes
2021-03-30 23:02:09 +02:00
tanjing2020
d0882e69e2 Fix the wrong judgment of oom_score_adj
2021-03-24 16:13:20 +08:00
Francesco Romani
fc0955c26a e2e: topomgr: use deletePodSync for faster delete
Previously the code used to delete pods serially.
In this patch we factor out code to do that in parallel,
using goroutines.

This shaves some time in the e2e tm test run with no intended
changes in behaviour.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-18 13:28:16 +01:00
Francesco Romani
04f091790e e2e: TM: wait for SRIOV devices in pod scope tests
The Topology Manager e2e tests want to run on a real multi-NUMA system
and to consume real devices supported by device plugins; SRIOV
devices happen to be the most commonly available of such devices.

The tests need to wait for resource availability before actually
running, or they will fail with a false negative that is also relatively
hard to debug.

An optimization was added in commit 56106439cf to minimize the restarts,
speed up the execution and make a nasty, yet not fully understood, flake
with SRIOV device plugin much less likely.

Unfortunately the pod-scope tests were mistakenly left out.
This patch fixes that.
CI lanes did NOT fail (and will not fail) because the CI machines are
neither multi-NUMA nor expose SRIOV devices, so the relevant portion of
the test is simply skipped, avoiding the issue.

However, this resurfaces when running the testsuite on bare metal; this
is how we noticed.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-12 11:01:56 +01:00
Francesco Romani
d7a30e1b08 podresources: getallocatable: add feature gate
Add a feature gate to disable the GetAllocatableResources API.
The feature gate is at alpha stage, disabled by default.

Add e2e test to demonstrate the behaviour with feature gate disabled.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:14:56 +01:00
Francesco Romani
16d5ac3689 node: e2e: docs and fix for teardownSRIOVConfig
Document why teardownSRIOVPod has to wait for all the containers
to be gone before returning, and why this is important.

Additionally, change the code to wait for all the containers to be gone,
not just the first. This is both a little cleaner and a little safer,
even though it seems the current code caused no issues so far.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
adfff27279 node: e2e: run deleteSync in parallel
Speed up the cleanup after test cases by deleting pods in separate
goroutines.
The post-test cleanup stage must be done carefully since pods require
exclusive allocation, so the tests must take all the steps to clean up
properly and avoid polluting the environment, but
this has a negative effect on test duration (they take longer).

Hence, we add safe speedups like doing pod deletions in parallel.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
9c69db3f04 e2e: node: add tests for GetAllocatableResources
Add e2e tests for the new GetAllocatableResources API.
The tests are added in the `podresources_test` suite
created previously in this series.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
4e7434028c e2e: node: bootstrap podresources tests
Start e2e tests for the existing List() API.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:35 +01:00
Kubernetes Prow Robot
40a411a61a Merge pull request #99912 from dims/capture-logs-from-containerd-installation-service
Capture logs from containerd-installation service
2021-03-08 22:53:38 -08:00
Kubernetes Prow Robot
0df8c69731 Merge pull request #99960 from knabben/fix-runtime-config
Enabling runtime config on E2E node tests
2021-03-08 16:28:00 -08:00
Amim Knabben
0341e4c2f3 Enabling runtime config on E2E node tests
2021-03-08 15:45:06 -05:00
Kubernetes Prow Robot
eb4dafb7f1 Merge pull request #99651 from umohnani8/cri
Move CRIContainerLogRotation to GA
2021-03-08 12:07:20 -08:00
Artyom Lukianov
cff9ecd317 e2e: fix memory manager tests
Restart the kubelet and wait for hugepages resources to appear
under the node.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-03-08 15:03:28 +02:00