Commit Graph

2148 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
c3e6b66643 Merge pull request #106533 from haircommander/summary-page-fault-test
test: update major page fault values for summary test
2021-11-23 15:09:45 -08:00
Kubernetes Prow Robot
e31aafc4fd Merge pull request #106348 from endocrimes/dani/rm-gpu
e2e_node: unify device tests
2021-11-22 19:46:16 -08:00
Jonathan Lebon
3ebd93cd02 test-e2e-node: support pure SSH mode
Right now, `run_remote.go` only supports GCE instances. But actually
running the tests is completely independent of GCE and could work just
as well on any SSH-accessible machine.

This patch adds a new `--mode` switch, which defaults to `gce` for
backwards compatibility, but can be set to `ssh`. In that mode, the GCE
API is not used at all, and we simply connect to the hosts given via
`--hosts`.

This is still better than `run_local.go` because the latter mixes build
environment with test environment, which doesn't fit well with
container-optimized operating systems.

This is part of an effort to set up the e2e node tests on Fedora CoreOS
(see https://github.com/coreos/fedora-coreos-tracker/issues/990).

Patch best viewed with whitespace ignored.
2021-11-22 10:13:15 -05:00
Jonathan Lebon
591f4cdb77 run_remote.go: factor out prepareGceImages()
Mostly a pure code move. Only changed the `klog.Fatalf` to `fmt.Errorf`.
Prep for future patch.
2021-11-22 10:12:29 -05:00
Jonathan Lebon
032dbd2063 run_remote.go: move registerGceHostIP() call to testImage()
I.e. don't assume that `testHost` is called on a GCE host. Prep for
future patch.
2021-11-22 10:12:28 -05:00
Jonathan Lebon
36233b985b run_remote.go: factor out registerGceHostIP()
Prep for future patch.
2021-11-22 10:12:28 -05:00
Kubernetes Prow Robot
21d3acc787 Merge pull request #106544 from ehashman/fix-flake-restart
Deflake "Kubelet should correctly account for terminated pods after restart"
2021-11-20 00:04:59 -08:00
Elana Hashman
6ddf86d422 Set startTimeout back to 3m, restore wait loop at end of test 2021-11-19 11:30:43 -08:00
Elana Hashman
b4a8861af3 Tweak resource requests for Kubelet restart test 2021-11-18 14:57:22 -08:00
Peter Hunt
76df8acb80 test: update major page fault values for summary test
as well as use a variable instead of a constant

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-11-18 09:24:41 -05:00
Kubernetes Prow Robot
3b9bd229b2 Merge pull request #106493 from endocrimes/dani/endocrimes-test-reviewer
node e2e: endocrimes as reviewer
2021-11-17 22:41:01 -08:00
Kubernetes Prow Robot
d766ab88f7 Merge pull request #106501 from ehashman/cri-graduation-v1
Make CRI v1 the default and allow a fallback to v1alpha2
2021-11-17 19:57:01 -08:00
Kubernetes Prow Robot
91b7fb4dc9 Merge pull request #102915 from wzshiming/feat/graceful-shutdown-based-on-pod-priority
Graceful Node Shutdown Based On Pod Priority
2021-11-17 18:45:03 -08:00
Sascha Grunert
de37b9d293 Make CRI v1 the default and allow a fallback to v1alpha2
This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. Whether this fallback is used is determined automatically by
the kubelet.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-11-17 11:05:05 -08:00
Sergey Kanzhelev
b7affcced1 implement :grpc probe action 2021-11-17 17:31:23 +00:00
Danielle Lancashire
e60ad8ebc6 e2e_node: add endocrimes as reviewer 2021-11-17 15:35:35 +01:00
Shiming Zhang
7c656b55e4 Update shutdown cases 2021-11-17 11:47:12 +08:00
Shiming Zhang
df7e4c1a3d Add e2e for GracefulNodeShutdownBasedOnPodPriority 2021-11-17 11:47:12 +08:00
Elana Hashman
303b05cded Fix timeout flake in restart kubelet e2e 2021-11-15 13:42:58 -08:00
Kubernetes Prow Robot
159fcbb01e Merge pull request #106408 from cynepco3hahue/e2e_node_quota_isci_test_fix_panic_nil_pointer_exception
e2e_node: fix nil pointer exception under quota lsci test
2021-11-15 11:27:02 -08:00
Antonio Ojea
5eb584d1cb Node tests fixes (#106371)
* capture loop variable

* capture the loop variable and don't fail on not found errors

* capture loop variable

* Revert "Mark restart_test as flaky"

This reverts commit 990e9506de.

* skip e2e node restart test with dockershim

* Update test/e2e_node/restart_test.go

Co-authored-by: Mike Miranda <mikemp96@gmail.com>

* capture loop using index

Co-authored-by: Mike Miranda <mikemp96@gmail.com>
2021-11-14 19:54:47 -08:00
Artyom Lukianov
cf2f21dd3e e2e_node: fix nil pointer exception under quota lsci test
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-14 11:50:19 +02:00
Kubernetes Prow Robot
5f0a94b23c Merge pull request #104743 from gjkim42/ensure-pod-uniqueness
Ensure there is one running static pod with the same full name
2021-11-12 12:36:28 -08:00
Neha Lohia
fa1b6765d5 move pkg/util/node to component-helpers/node/util (#105347)
Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com>
2021-11-12 07:52:27 -08:00
Kubernetes Prow Robot
0c27c643a8 Merge pull request #106362 from ehashman/append-to-node-log
Append node e2e logs to file where possible
2021-11-11 14:16:08 -08:00
Elana Hashman
5401551d12 Append node e2e logs to file where possible
Functionality added in systemd 240:
1977d1477f/NEWS (L3919-L3921)
2021-11-11 11:16:51 -08:00
Mike Miranda
990e9506de Mark restart_test as flaky 2021-11-11 17:25:27 +00:00
Danielle Lancashire
03de802434 e2e_node: unify device tests
The device_plugin_tests have not run successfully in a very long time,
initially being marked flaky and then eventually becoming stale.

The gpu_device_plugin_tests have been used to test the same behaviour,
but are incredibly high maintenance due to external changes in behaviour
from GCP/Nvidia that we have no control over.

This commit takes the existing device plugin tests, makes them look more
like the GPU tests, and removes the cases that have been unsupported for
a long time (namely restarting containers while the plugin is
unavailable).

It also removes the GPU plugin tests, as we do not get more signal by
using real devices here.
2021-11-11 14:10:27 +01:00
Kubernetes Prow Robot
f3bf7e1ced Merge pull request #106298 from SergeyKanzhelev/fetchShareProcessTestFromOrphans
fish out ShareProcessNamespace from orphans tab
2021-11-10 10:20:02 -08:00
Kubernetes Prow Robot
ea2011d72a Merge pull request #106251 from cynepco3hahue/e2e_node_fix_hugepages
e2e_node: does not rely on Kubelet automatic restart service under hugepages tests
2021-11-10 04:31:26 -08:00
Sergey Kanzhelev
d3dd1499fc fish out ShareProcessNamespace from orphans tab 2021-11-10 07:25:02 +00:00
Kubernetes Prow Robot
b27c41f66d Merge pull request #106263 from endocrimes/dani/skip-dkc
e2e_node: Skip dynamic config tests when the feature is disabled
2021-11-09 15:30:53 -08:00
Francesco Romani
bf9bab5bc6 e2e: podresources: wait for local node ready again
Let's wait for the local node (aka the kubelet)
to be ready before querying podresources again,
to avoid false negatives.

Co-authored-by: Artyom Lukianov <alukiano@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-11-09 19:02:19 +01:00
Danielle Lancashire
caa701b7a3 e2e_node: Skip dynamic config tests when disabled
DKC is being removed and we don't want it to continue flaking the rest
of our tests. Let's skip them when DKC is disabled rather than failing
hard. This is more in line with our other E2Es, and reduces the
maintenance load in test-infra.
2021-11-09 13:40:18 +01:00
Francesco Romani
14105c09fb e2e: node: wait for kvm plugin removal
We need to make sure the system state is completely cleaned up
again, to avoid messing up the shared node state, before
we transition from one test to another.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-11-09 11:43:55 +01:00
Francesco Romani
4b46c3a0d2 e2e: node: podresources: fix exclusive cpus check
Since commit 42dd01aa3f the cpuRequest is in millicores, hence
we need to translate it to exclusive CPUs when verifying the
resource allocation.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-11-09 11:16:54 +01:00
Francesco Romani
a6e8f7530a e2e: node: podresources: add internal helpers
The intent is to make the code more readable, with no intended
changes in behaviour. Now it should be a bit more explicit
why the code is checking some values.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-11-09 11:16:54 +01:00
Artyom Lukianov
61fe924208 e2e_node: do not rely on Kubelet automatic restart service under hugepages tests
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-09 10:33:48 +02:00
Kubernetes Prow Robot
cda360c59f Merge pull request #104613 from ravisantoshgudimetla/reconcile-labels
[kubelet]: Reconcile OS and arch labels periodically
2021-11-08 14:15:19 -08:00
Artyom Lukianov
117141eee3 e2e_node: fix tests after Kubelet dynamic configuration removal
- CPU manager
- Memory Manager
- Topology Manager

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-08 09:42:24 +02:00
ravisantoshgudimetla
3af5d37be7 [node][e2e test]: Make sure reconcile labels is working fine 2021-11-06 19:21:58 -04:00
Kubernetes Prow Robot
adcd2feb5e Merge pull request #104153 from cynepco3hahue/e2e_node_provide_static_kubelet_config
e2e node: provide static kubelet config
2021-11-04 17:11:53 -07:00
Kubernetes Prow Robot
27d3a9ec57 Merge pull request #104481 from AlexeyPerevalov/E2eIsKubeletConfiguration
e2e_node: Properly check for DynamicKubeletConfig
2021-11-04 16:11:53 -07:00
Artyom Lukianov
50fdcdfc59 e2e_node: refactor code to use a single method to update the kubelet config
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-04 15:44:35 +02:00
Artyom Lukianov
ca35bdb403 e2e_node: remove DynamicKubeletConfig tests from serial lane
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-04 15:26:19 +02:00
Artyom Lukianov
b6211657bf e2e_node: drop usage of DynamicKubeletConfig
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-04 15:26:19 +02:00
Artyom Lukianov
a5ed6c824a e2e_node: provide methods to update kubelet config via file
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-04 15:26:19 +02:00
David Porter
ddd0d8a3da test: fixes for graceful node shutdown test
* Bump the pod status and node status update timeouts to avoid flakes
* Add a small delay after the dbus restart to ensure dbus has enough time
  to start up before the shutdown signal is sent
* Change the check for a pod being terminated by graceful shutdown.
  Previously, the pod phase was checked to see if it was `Failed` and the
  pod reason string matched. This logic needed to change after the 1.22
  graceful node shutdown change introduced in PR #102344, which changed the
  behavior to no longer put pods into a failed phase. Instead, the test now
  checks that the containers are not ready and that the pod status message
  and reason are set appropriately.

Signed-off-by: David Porter <david@porter.me>
2021-11-03 18:40:26 -07:00
Kubernetes Prow Robot
b489b03946 Merge pull request #105575 from endocrimes/dani/cleanup-launcher
Allow the e2e_node runner to receive a KubeletConfiguration rather than requiring flags
2021-11-02 18:00:10 -07:00
Kubernetes Prow Robot
359b722c19 Merge pull request #102882 from fromanirh/device-manager-checkpoints
devicemanager: checkpoint: support pre-1.20 data
2021-11-02 16:56:57 -07:00