In older versions of Kubernetes (at least pre-1.19, the earliest version
this test will run on unmodified), Pods that depended on devices could be
restarted after the device plugin had been removed. Currently, however,
this isn't possible: during ContainerManager.GetResources(), we attempt
to call DeviceManager.GetDeviceRunContainerOptions(), which fails because
there's no cached endpoint information for the plugin type.
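As a toy sketch of that lookup failure (the types and names below are
invented for illustration, not the kubelet's actual device manager
code): cached endpoint information is keyed by resource name, so once
the plugin is gone there is nothing left to look up.

```go
package main

import "fmt"

// deviceManager is a made-up stand-in for the kubelet's device manager: it only
// models the cache of registered plugin endpoints, keyed by resource name.
type deviceManager struct {
	endpoints map[string]struct{} // resource name -> registered plugin endpoint
}

// runContainerOptions mimics the GetDeviceRunContainerOptions-style lookup:
// with no cached endpoint for the resource, the call can only fail.
func (m *deviceManager) runContainerOptions(resource string) error {
	if _, ok := m.endpoints[resource]; !ok {
		return fmt.Errorf("no cached endpoint for resource %q", resource)
	}
	return nil
}

func main() {
	m := &deviceManager{endpoints: map[string]struct{}{}} // plugin already removed
	fmt.Println(m.runContainerOptions("nvidia.com/gpu"))
}
```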
This commit therefore breaks apart the existing test into two:
- One active test that validates that assignments are maintained across
restarts
- One skipped test that validates the behaviour after GPUs have been
removed, in case we decide that this is a bug that should be fixed in
the future.
Prior to this change, the pod was not getting scheduled onto the node,
because there is no running scheduler in e2e_node. PodClient solves this
problem by manually assigning the pod to the node.
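A minimal sketch of what that assignment amounts to (illustrative
client-go code; the helper name is made up and the framework's actual
PodClient does more than this):

```go
package e2enode

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createPodOnNode is a hypothetical stand-in for what PodClient does for us in
// e2e_node: with no scheduler running, the pod only starts if spec.nodeName is
// filled in by the caller before creation.
func createPodOnNode(ctx context.Context, cs kubernetes.Interface, ns, nodeName string, pod *v1.Pod) (*v1.Pod, error) {
	pod.Spec.NodeName = nodeName // normally set by the scheduler's binding
	return cs.CoreV1().Pods(ns).Create(ctx, pod, metav1.CreateOptions{})
}
```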
The current GPU installer was built in 2017, from source that no longer
exists in Kubernetes ([adding commit][1]). The image was built on 2017-06-13.
Unfortunately, this installer no longer appears to work: when debugging
on the same node type as used by test-infra, it failed to build the
driver because the kernel sha was no longer available.
This led to needing to find a new way to install GPUs. The smallest
logical change was switching to [cos-gpu-installer][2]. There is a newer
version of this available on [googlesource][3] that I have not yet
tested, as it's not clear what the state of the project is and I
couldn't find docs outside of the source itself.
We install things to the same location as before to avoid needing
extra downstream changes. There are a couple of quirks here, however,
such as needing to run the container twice to correctly update the
LD cache.
[1]: 1e77594958/cluster/gce/gci/nvidia-gpus/Dockerfile
[2]: https://github.com/GoogleCloudPlatform/cos-gpu-installer
[3]: https://cos.googlesource.com/cos/tools/+/refs/heads/master/src/cmd/cos_gpu_installer/
If the device plugin returns a device without topology, keep it
internally as NUMA node -1; this lets the podresources level avoid
exporting NUMA topology. Otherwise the topology is exported with NUMA
node ID 0, which is not accurate.
It's impossible to uncover this bug just by tracing json.Marshal(resp)
in the podresources client, because the ID field of NUMANode has the
JSON property omitempty, so a node with ID=0 shows up as an empty
NUMANode.
To reproduce it, it's better to iterate over the devices and trace
dev.Topology.Nodes[0].ID directly.
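A small self-contained example of the omitempty effect (the types
below are simplified stand-ins for the podresources API messages, not
the generated ones):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NUMANode and Topology are simplified stand-ins for the podresources messages;
// the important part is the omitempty tag on ID.
type NUMANode struct {
	ID int64 `json:"ID,omitempty"`
}

type Topology struct {
	Nodes []*NUMANode `json:"nodes,omitempty"`
}

func main() {
	t := Topology{Nodes: []*NUMANode{{ID: 0}}}
	out, _ := json.Marshal(t)
	// Prints {"nodes":[{}]}: the ID=0 node serializes as empty, hiding the bug,
	// which is why tracing dev.Topology.Nodes[0].ID directly is more reliable.
	fmt.Println(string(out))
}
```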
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
Even though DynamicKubeletConfig is deprecated, it is still used in the
e2e_node tests.
The bug is hidden by the forcibly enabled option
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true';
if this option is not enabled, setKubeletConfiguration tries to set the
kubelet config via the apiserver interface and fails with a timeout.
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
This commit adds an e2e test for the kubelet flags `--lock-file` and
`--exit-on-lock-contention`. Eventually we would like to move them to
the kubelet configuration file rather than flags.
This test is based on the premise that whenever there is contention on
the lock file (e.g. /var/run/kubelet.lock), the running kubelet must
terminate and wait for the lock on the lock file to be released before
starting again.
In this test we simulate that behaviour by creating contention on the
file: the test tries to acquire the lock on the lock file itself.
Success of the test is determined by the kubelet health check, which
should fail while the test holds the lock and pass once the lock on the
lock file is released.
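A rough sketch of the contention the test creates (illustrative paths
and error handling, not the test's actual code): take an exclusive
flock on the kubelet's lock file, expect the kubelet started with
`--exit-on-lock-contention` to terminate, then release the lock.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	const lockFile = "/var/run/kubelet.lock" // same path passed to --lock-file

	fd, err := unix.Open(lockFile, unix.O_RDONLY|unix.O_CREAT, 0o644)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	// Acquiring LOCK_EX creates the contention; the kubelet is expected to exit
	// and its health check to fail while we hold the lock.
	if err := unix.Flock(fd, unix.LOCK_EX); err != nil {
		panic(err)
	}
	fmt.Println("holding the kubelet lock; kubelet should exit on contention")

	// Releasing the lock lets the restarted kubelet reacquire it and become
	// healthy again, which is what the test asserts.
	if err := unix.Flock(fd, unix.LOCK_UN); err != nil {
		panic(err)
	}
}
```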
Signed-off-by: Imran Pochi <imran@kinvolk.io>
For some reason, when we send the kubelet logs to journald, many log
lines are consistently dropped as soon as the PLEG is started.
If we log directly to a file, we don't have this problem. As a bonus,
if the tests crash, the kubelet logs will always be available since
they were already written; otherwise we normally wait until the end of
the test run to collect them from journald, meaning that we often end
up with empty logs.
- recover to last-known-good ConfigMap.KubeletConfigKey: ~12m to run in CI, 13m locally
- non-nil last-known-good to a new non-nil last-known-good: ~24m to run in CI
- recover to last-known-good ConfigMap: ~12m to run in CI
- state transitions: ~8m to run in CI
Add e2e tests to cover the basic flows for the `full-pcpus-only`
option: a negative flow to ensure rejection with a proper error
message, and a positive flow to verify the actual CPU allocation.
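As a rough sketch, the positive flow is expected to run the kubelet
with a configuration along these lines (a hypothetical helper; the
field and feature-gate names are my reading of the kubelet config API,
not copied from the tests):

```go
package e2enode

import (
	kubeletconfig "k8s.io/kubernetes/pkg/kubelet/apis/config"
)

// configureFullPCPUsOnly shows the assumed shape of the configuration under
// test: the static CPU manager policy plus the full-pcpus-only policy option,
// gated by the CPUManagerPolicyOptions feature gate.
func configureFullPCPUsOnly(cfg *kubeletconfig.KubeletConfiguration) {
	cfg.CPUManagerPolicy = "static"
	if cfg.FeatureGates == nil {
		cfg.FeatureGates = map[string]bool{}
	}
	cfg.FeatureGates["CPUManagerPolicyOptions"] = true
	cfg.CPUManagerPolicyOptions = map[string]string{"full-pcpus-only": "true"}
}
```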
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>