kubernetes

Author	SHA1	Message	Date
Kubernetes Prow Robot	8ee9b82b10	Merge pull request #115984 from tzneal/init-container-tests add more init container testing	2023-03-08 15:42:08 -08:00
Todd Neal	123ab80333	add more init container lifetime testing Add some additional init container tests that work via monitoring container lifetime based on logs written to a common file. This allows more easily writing assertions about the container lifetimes with respect to one another.	2023-03-08 14:39:10 -06:00
David Porter	9c20cee504	Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist"	2023-03-07 11:50:52 -08:00
David Porter	d3214226de	test: Fix node e2e shutdown test flake Bump the timeout as the previous timeout was sometimes too short, resulting in the pod status update not sent. Also, fixed a typo in previous refactor. Signed-off-by: David Porter <david@porter.me>	2023-03-06 15:38:45 -08:00
Swati Sehgal	01a9148887	node: device-mgr: e2e: adapt to sample device plugin refactoring These updates are to adapt to the sample device plugin refactoring done here: `92e00203e0`. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 12:15:59 +00:00
Swati Sehgal	bae8a164e0	node: device-mgr: e2e: address e2e test review comments Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 12:15:58 +00:00
Swati Sehgal	674879a959	node: device-mgr: e2e: Update the e2e test to reproduce issue:109595 Breakdown of the steps implemented as part of this e2e test is as follows: 1. Create a file `registration` at path `/var/lib/kubelet/device-plugins/sample/` 2. Create sample device plugin with an environment variable with `REGISTER_CONTROL_FILE=/var/lib/kubelet/device-plugins/sample/registration` that waits for a client to delete the control file. 3. Trigger plugin registration by deleting the above-mentioned directory. 4. Create a test pod requesting devices exposed by the device plugin. 5. Stop kubelet. 6. Remove pods using CRI to ensure new pods are created after kubelet restart. 7. Restart kubelet. 8. Wait for the sample device plugin pod to be running. In this case, the registration is not triggered. 9. Ensure that resource capacity/allocatable exported by the device plugin is zero. 10. The test pod should fail with `UnexpectedAdmissionError` 11. Delete the test pod. 12. Delete the sample device plugin pod. 13. Remove `/var/lib/kubelet/device-plugins/sample/` and its content, the directory created to control registration Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 12:15:58 +00:00
Swati Sehgal	db7afc1cd8	node: device-mgr: e2e: Implement End to end test This commit reuses e2e tests implemented as part of https://github.com/kubernetes/kubernetes/pull/110729. The commit is borrowed from the aforementioned PR as is to preserve authorship. Subsequent commit will update the end to end test to simulate the problem this PR is trying to solve by reproducing the issue: 109595. Co-authored-by: Francesco Romani <fromani@redhat.com> Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 11:52:23 +00:00
Kubernetes Prow Robot	20c3a007f5	Merge pull request #115693 from bobbypage/shutdown_test test: e2e node shutdown test logging improvements	2023-03-03 15:20:57 -08:00
David Porter	8647c23c11	test: Fix path to e2e node sample device plugin The existing path is incorrect (missing the `sample-device-plugin` directory), thus causing test failures. The full path should be `test/e2e/testing-manifests/sample-device-plugin/sample-device-plugin.yaml`. Signed-off-by: David Porter <david@porter.me>	2023-03-02 19:22:59 -08:00
Kubernetes Prow Robot	78e5db0931	Merge pull request #115107 from swatisehgal/handle-device-mgr-recovery-sample-dp-changes node: device-mgr: sample device plugin: Add support to control registration process	2023-03-02 05:42:55 -08:00
Kubernetes Prow Robot	59a7e34052	Merge pull request #115442 from bobbypage/unknown_pods_test test: Add e2e node test to check for unknown pods	2023-03-01 19:08:55 -08:00
Kubernetes Prow Robot	1646ed8222	Merge pull request #116057 from bobbypage/nodee2elog test: Add log artifact for ginkgo node e2e and tune default ginkgo flags	2023-03-01 16:55:16 -08:00
Swati Sehgal	7ea35d0cd8	node: device-mgr: sample device plugin: manifest to avoid registration Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-01 10:01:34 +00:00
Kubernetes Prow Robot	0469455ff7	Merge pull request #116082 from mimowo/fix-oomkiller-test Fix the flaky OOMKiller test by sleep at start	2023-02-28 14:53:37 -08:00
David Porter	e001884594	test: Add some default ginkgo flags for node e2e Add the following ginkgo flags for each node e2e similar to the existing hack/ginkgo-e2e.sh script. * --no-color, colors aren't rendered properly in prow and make examining the log in text editors more difficult, so let's disable them. `hack/ginkgo-e2e.sh` (used for kind e2e tests) also disables them already. * -v, enable verbose logs. This is needed so we get more detailed info even when the tests pass. This is useful so we can compare successful runs to failed runs. Signed-off-by: David Porter <david@porter.me>	2023-02-28 00:24:40 -08:00
David Porter	e9ecdf3534	test: Emit ginkgo log for each node e2e When running multiple node e2e with multiple machine images, the tests are run separately for each node. The final build log has all of the results for each of the hosts combined together which make debugging the log difficult. To make it easier, emit a log for each host that was run. This log will be written to the results directory and uploaded as an artifact in prow jobs. Signed-off-by: David Porter <david@porter.me>	2023-02-28 00:21:34 -08:00
David Porter	0980f026c9	test: Remove tests argument from node e2e image config This was never being used, the only config that used it was deleted in https://github.com/kubernetes/test-infra/pull/26017 so we don't need this anymore, so let's delete it. Signed-off-by: David Porter <david@porter.me>	2023-02-28 00:21:03 -08:00
Kubernetes Prow Robot	015e2fa20c	Merge pull request #115953 from pohly/lint-gomega test: fixing + linting gomega usage	2023-02-27 00:56:20 -08:00
Michal Wozniak	36eef0600d	Fix the flaky OOMKiller test by sleep at start	2023-02-27 08:15:46 +01:00
Kubernetes Prow Robot	10cdaefc1f	Merge pull request #116005 from gjkim42/fix-createStaticPod Fix createStaticPod to not use container.RestartPolicy	2023-02-23 12:33:35 -08:00
Gunju Kim	f690a0ce41	Fix createStaticPod to not use container.RestartPolicy	2023-02-23 21:18:24 +09:00
Kubernetes Prow Robot	3702411ef9	Merge pull request #115926 from ffromani/e2e-node-remove-kubevirt-device-plugin e2e: node remove: kubevirt device plugin	2023-02-23 04:13:35 -08:00
Patrick Ohly	41f23f52d0	test: fix ginkgolinter issues All of these issues were reported by https://github.com/nunnatsa/ginkgolinter. Fixing these issues is useful (several expressions get simpler, using framework.ExpectNoError is better because it has additional support for failures) and a necessary step for enabling that linter in our golangci-lint invocation.	2023-02-22 19:36:05 +01:00
Francesco Romani	00b41334bf	e2e: node: podresources: fix restart wait Fix the waiting logic in the e2e test loop to wait for resources to be reported again instead of making logic on the timestamp. The idea is that waiting for resource availability is the canonical way clients should observe the desired state, and it should also be more robust than comparing timestamps, especially on CI environments. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-02-22 14:04:55 +01:00
Francesco Romani	92e00203e0	e2e: node: unify sample device plugin utilities Start to consolidate the sample device plugin utility and constants in a central place, because we need to use it in different e2e tests. Having a central dependency is better than a maze of entangled e2e tests depending on each other helpers. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-02-22 14:04:55 +01:00
Francesco Romani	871201ba64	e2e: node: remove kubevirt device plugin The podresources e2e tests want to exercise the case on which a device plugin doesn't report topology affinity. The only known device plugin which had this requirement and didn't depend on specialized hardware was the kubevirt device plugin, which was however deprecated after we started using it. So the e2e tests are now broken, and in any case they can't depend on unmaintained and liable to be obsolete code. To unblock the state and preserve some e2e signal, we switch to the sample device plugin, which is a stub implementation and which is managed in-tree, so we can maintain it and ensure it fits the e2e test usecase. This is however a regression, because e2e tests should try their hardest to use real devices and avoid any mocking or faking. The upside is that using a OS-neutral device plugin for the tests enables us to run on all the supported platform (windows!) so this could allow us to transition these tests to conformance. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-02-22 14:04:22 +01:00
Francesco Romani	aa1a0385e2	e2e: node: podresources: internal cleanup rename getPodResources for clarity. Allow to return error (and not use ginkgo expectations), so it can actually be used as intended inside `Eventually` blocks without blow up at the first failure. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-02-22 13:49:13 +01:00
Michal Wozniak	fd28f69ca4	Add e2e_node test for oom killed container reason	2023-02-20 08:15:45 +01:00
Kubernetes Prow Robot	e18fa74551	Merge pull request #115590 from swatisehgal/topology-mgr-duration-metrics node: topology-mgr: Add metric to measure topology manager admission latency	2023-02-15 07:12:25 -08:00
Swati Sehgal	cf21dcef51	node: topology-mgr: e2e: changes to validate admission latency metrics The component was previously incorrect. This patch updates to the correct component name. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-02-15 13:59:56 +00:00
ravisantoshgudimetla	d65262d1f9	Remove cgo dependency	2023-02-13 11:16:39 -05:00
David Porter	826472c99d	test: e2e node shutdown test logging improvements Since the pod names are reused across the test, searching the logs is currently difficult. Use a uuid for each pod name to make grepping the logs easier. Also, always include the pod name and pod namespace in any logs or error messages to make debugging easier. Signed-off-by: David Porter <david@porter.me>	2023-02-10 16:54:31 -08:00
Sascha Grunert	85106dc327	Allow SSH e2e node base64 key injection With the change of the CRI-O jobs to use butane, we now have a verification for base64 data urls in place. This means that the following URL is invalid: ``` data:text/plain;base64,GCE_SSH_PUBLIC_KEY_FILE_CONTENT ``` This means we have to pass valid base64 to the URL. To fix that, we now allow to inject SSH key values with both, the `GCE_SSH_PUBLIC_KEY_FILE_CONTENT` field and its base64 encoded variant. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2023-02-09 16:17:11 +01:00
Patrick Ohly	136f89dfc5	e2e: use error wrapping with %w The recently introduced failure handling in ExpectNoError depends on error wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then ExpectNoError cannot detect that the root cause is an assertion failure and then will add another useless "unexpected error" prefix and will not dump the additional failure information (currently the backtrace inside the E2E framework). Instead of manually deciding on a case-by-case basis where %w is needed, all error wrapping was updated automatically with sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*) This may be unnecessary in some cases, but it's not wrong.	2023-02-06 15:39:13 +01:00
Patrick Ohly	9df3e2a47a	e2e: replace WaitForPodToDisappear with WaitForPodNotFoundInNamespace WaitForPodToDisappear was always called such that it listed all pods, which made it less efficient than trying to get just the one pod it was checking for. Being able to customize the poll interval in practice wasn't useful, therefore it can be replaced with WaitForPodNotFoundInNamespace.	2023-02-06 15:39:12 +01:00
Antonio Ojea	7f5ae1c0c1	Revert "e2e: wait for pods with gomega"	2023-02-06 12:08:22 +01:00
Kubernetes Prow Robot	85aa0057c6	Merge pull request #113298 from pohly/e2e-wait-for-pods-with-gomega e2e: wait for pods with gomega	2023-02-04 05:26:29 -08:00
David Porter	039a848274	test: Add e2e node test to check for unknown pods Unknown pods are pods which are unknown to the kubelet, but are still running in the container runtime. If kubelet detects a pod which is not in the config (i.e. not present in API-server or static pod), but running as detected in container runtime, kubelet should aggressively terminate the pod. This situation can be encountered if a pod is running, then kubelet is stopped, and while stopped, the manifest is deleted (by force deleting the API pod or deleting the static pod manifest), and then the kubelet is restarted. Upon restart, kubelet will see the pod as running via the container runtime, but it will not be present in the config, thus making the pod an "unknown pod". Kubelet should then proceed to terminate these unknown pods. Add two tests that ensure that unknown pods will be terminated: (1) static pods and (2) API pods. The test will start a pod, stop the kubelet, force delete the pod (by deleting the manifest or force deleting the pod), and then restart the kubelet. The container runtime is then queried to ensure the containers are terminated by kubelet. Signed-off-by: David Porter <david@porter.me>	2023-02-03 23:04:45 -08:00
David Porter	c2923c472d	test: Move waitForAllContainerRemoval() into node e2e util This is used across multiple tests, so let's move into the util file. Also, refactor it a bit to provide a better error message in case of a failure. Signed-off-by: David Porter <david@porter.me>	2023-02-03 23:04:35 -08:00
Kubernetes Prow Robot	d415647739	Merge pull request #115441 from bobbypage/busybox-mirror-test test: Use preloaded busybox image in mirror pod test	2023-02-01 12:21:36 -08:00
Kubernetes Prow Robot	3a4cef70f2	Merge pull request #115445 from bobbypage/gh-115381 test: Fix node e2e device plugin flake	2023-02-01 02:55:06 -08:00
Kubernetes Prow Robot	bb7c9739a3	Merge pull request #114759 from my-git9/chore/k8staint chore: add k8s node-role.kubernetes.io/control-plane taint	2023-01-31 21:01:17 -08:00
David Porter	225658884b	test: Fix node e2e device plugin flake The device plugin test expects that no other pods are running prior to the test starting. However, it has been observed that in some cases some resources may still be around from previous tests. This is because the deletion of resources from other tests is handled by deleting that test's framework's namespace which is done asynchronously without waiting for the other test's namespace to be deleted. As a result, when the node e2e device plugin starts, there may still be other pods in process of termination. To work around this, add a retry to the device plugin test to account for the time it takes to delete the resources from the prior test. Signed-off-by: David Porter <david@porter.me>	2023-01-31 17:36:10 -08:00
David Porter	a3291a87d7	test: Use preloaded busybox image in mirror pod test Instead of hardcoding the busybox image, use the one that is preloaded during the test using imageutils. Signed-off-by: David Porter <david@porter.me>	2023-01-31 13:34:13 -08:00
Patrick Ohly	222f655062	e2e: use error wrapping with %w The recently introduced failure handling in ExpectNoError depends on error wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then ExpectNoError cannot detect that the root cause is an assertion failure and then will add another useless "unexpected error" prefix and will not dump the additional failure information (currently the backtrace inside the E2E framework). Instead of manually deciding on a case-by-case basis where %w is needed, all error wrapping was updated automatically with sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*) This may be unnecessary in some cases, but it's not wrong.	2023-01-31 13:01:39 +01:00
Patrick Ohly	6eea1b2efa	e2e: replace WaitForPodToDisappear with WaitForPodNotFoundInNamespace WaitForPodToDisappear was always called such that it listed all pods, which made it less efficient than trying to get just the one pod it was checking for. Being able to customize the poll interval in practice wasn't useful, therefore it can be replaced with WaitForPodNotFoundInNamespace.	2023-01-31 13:01:39 +01:00
Kubernetes Prow Robot	981c4d59fb	Merge pull request #115155 from adrianreber/2023-01-18-checkpoint-test-result Extend checkpoint e2e test to check for results	2023-01-30 18:43:16 -08:00
Kubernetes Prow Robot	4df945853e	Merge pull request #115137 from swatisehgal/topologymgr-metrics node: topologymgr: add metrics about admission requests and errors	2023-01-30 18:43:00 -08:00
David Porter	b96290c08f	e2e node: Update runtime class handler skip logic There are two runtime class tests which required the container runtime config to include explicit configuration for `test-handler`. The current logic skips these tests in non GCE environments. This skip is too strict since the test is skipped in node e2e environments and in other environments such as kind, which support running the test and also configure `test-handler`. Instead of skipping based on provider, add a new function `NodeSupportsPreconfiguredRuntimeClassHandler` which examines the underlying container runtime config and checks if the config includes `test-handler`. The check is a bit brittle since it assumes container runtime config paths, but it is a net improvement over skipping the test entirely on non GCE environments. This results in the test working in the common test environments, namely GCE kube-up, node e2e, and kind. Signed-off-by: David Porter <david@porter.me>	2023-01-24 14:43:24 -08:00


2397 Commits