Commit Graph

2854 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
3e9a73d558 Merge pull request #126058 from AnishShah/patch-2
Deflake kubernetes-node-swap-fedora-serial jobs
2024-07-22 15:48:42 -07:00
Kubernetes Prow Robot
d21b17264e Merge pull request #125488 from pohly/dra-1.31
DRA for 1.31
2024-07-22 11:45:55 -07:00
Patrick Ohly
d11b58efe6 DRA kubelet: refactor gRPC call timeouts
Some of the E2E node tests were flaky. Their timeout apparently was chosen
under the assumption that kubelet would retry immediately after a failed gRPC
call, with a factor of 2 as safety margin. But according to
0449cef8fd,
kubelet has a different, higher retry period of 90 seconds, which was exactly
the test timeout. The test timeout has to be higher than that.

As the tests don't use the gRPC call timeout anymore, it can be made
private. While at it, the name and documentation get updated.
2024-07-22 18:09:34 +02:00
Patrick Ohly
0b62bfb690 DRA e2e: adapt to v1alpha3 API
2024-07-22 18:09:34 +02:00
Itamar Holder
a6df16af85 node e2e test: exclude critical pods from swapping
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-07-22 17:56:52 +03:00
Anish Shah
665df5794e wait for pod to be ready before continuing with the test
This test is flaky. I have noticed that this happens because the pod is not READY when it is being deleted at the end of the test. This fix ensures that the pod is READY before continuing with the rest of the test.
2024-07-22 05:26:59 +00:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io API completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
Kubernetes Prow Robot
f2428d66cc Merge pull request #125163 from pohly/dra-kubelet-api-version-independent-no-rest-proxy
DRA: make kubelet independent of the resource.k8s.io API version
2024-07-18 17:47:48 -07:00
Patrick Ohly
616a014347 DRA: move ResourceSlice publishing into DRA drivers
This is a first step towards making kubelet independent of the resource.k8s.io
API versioning because it now doesn't need to copy structs defined by that API
from the driver to the API server. The next step is removing the other
direction (reading ResourceClaim status and passing the resource handle to
drivers).

The drivers must get deployed so that they have their own connection to the API
server. Securing at least the writes via a validating admission policy should
be possible.

As before, the kubelet removes all ResourceSlices for its node at startup, then
DRA drivers recreate them if (and only if) they start up again. This ensures
that there are no orphaned ResourceSlices when a driver gets removed while the
kubelet was down.

While at it, logging gets cleaned up and updated to use structured, contextual
logging as much as possible. gRPC requests and streams now use a shared,
per-process request ID and streams also get logged.
2024-07-18 09:09:19 +02:00
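The startup behavior described in 616a014347 (remove every ResourceSlice belonging to the local node, leave everything else alone) can be sketched as a simple filter. The `resourceSlice` type and `staleSlicesForNode` function are simplified, hypothetical stand-ins and not the kubelet's actual types or code.

```go
package main

import "fmt"

// resourceSlice is a simplified, hypothetical stand-in for the real API object.
type resourceSlice struct {
	Name     string
	NodeName string
}

// staleSlicesForNode returns the slices published for nodeName. At startup all
// of them are treated as stale; drivers republish slices when they start again.
// Slices belonging to other nodes are never selected.
func staleSlicesForNode(all []resourceSlice, nodeName string) []resourceSlice {
	var stale []resourceSlice
	for _, s := range all {
		if s.NodeName == nodeName {
			stale = append(stale, s)
		}
	}
	return stale
}

func main() {
	all := []resourceSlice{
		{Name: "slice-a", NodeName: "node-1"},
		{Name: "slice-b", NodeName: "node-2"},
		{Name: "slice-c", NodeName: "node-1"},
	}
	for _, s := range staleSlicesForNode(all, "node-1") {
		fmt.Println("delete", s.Name)
	}
}
```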
Patrick Ohly
3d4bc44a2f dra e2e node: add test case for ResourceSlice handling during kubelet startup
Any redundant object must get deleted, but not the ones of other names.
2024-07-18 09:09:19 +02:00
Kubernetes Prow Robot
b68a58d372 Merge pull request #126141 from Nordix/esotsal/fix-126135
test/e2e_node:  Fix pod_resize tests in CI
2024-07-17 16:29:25 -07:00
Peter Hunt
3d8cb4fa89 e2e_node: loosen proc mount test
the exact number of lines/ro lines is not important, just that there are more than 0 ro lines
and more than 1 line total.

this helps accommodate different architectures that implement different kernel APIs

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2024-07-17 13:26:23 -04:00
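The loosened condition from 3d8cb4fa89 (more than one line in total and at least one read-only line) can be sketched as follows. The sketch assumes input in the /proc/mounts column layout, where the fourth field holds the comma-separated mount options; the function names are hypothetical and the real test may gather and inspect its data differently.

```go
package main

import (
	"fmt"
	"strings"
)

// countMounts returns the number of non-empty lines and the number of lines
// whose options field (fourth column, as in /proc/mounts) contains "ro".
func countMounts(mounts string) (total, readOnly int) {
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue // skip blank lines
		}
		total++
		if len(fields) < 4 {
			continue // no options column to inspect
		}
		for _, opt := range strings.Split(fields[3], ",") {
			if opt == "ro" {
				readOnly++
				break
			}
		}
	}
	return total, readOnly
}

// checkLoosened applies the relaxed rule: exact counts do not matter, only
// that there is more than one line and more than zero read-only lines.
func checkLoosened(mounts string) error {
	total, readOnly := countMounts(mounts)
	if total <= 1 {
		return fmt.Errorf("expected more than 1 line, got %d", total)
	}
	if readOnly == 0 {
		return fmt.Errorf("expected at least 1 read-only line, got none")
	}
	return nil
}

func main() {
	sample := "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n" +
		"proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0\n"
	fmt.Println(checkLoosened(sample))
}
```

Because the rule only sets lower bounds, the same check passes on kernels or architectures that expose a different number of entries.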
Kubernetes Prow Robot
ad72be434d Merge pull request #125417 from bitoku/splitfs
KEP-4191: Split Image Filesystem add end-to-end tests
2024-07-16 23:27:06 -07:00
Sotiris Salloumis
3a01281d2f test/e2e_node: pod_resize tests
add NodeAlphaFeature label, as the feature is in alpha to be skipped in CI
add missing Arm64 check
2024-07-17 07:55:44 +02:00
Kubernetes Prow Robot
a00c834ebf Merge pull request #123303 from haircommander/proc-mount-e2e-tests
KEP-4265: add e2e tests for ProcMountType
2024-07-16 19:37:05 -07:00
Peter Hunt
a20a8225cf e2e_node: skip proc mount tests on nodes without userns support in the runtime
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Co-authored-by: Sohan Kunkerkar <sohank2602@gmail.com>
2024-07-16 17:46:23 -04:00
Peter Hunt
d6ee9ca860 test/e2e_node: add proc mount tests
including one Alpha only test, as the feature is in alpha

Signed-off-by: Peter Hunt <pehunt@redhat.com>
Co-authored-by: Sohan Kunkerkar <sohank2602@gmail.com>
2024-07-16 17:45:26 -04:00
Kubernetes Prow Robot
157f4b94d8 Merge pull request #125753 from SergeyKanzhelev/devicePluginFailuresTests
device plugin failure tests
2024-07-16 04:36:59 -07:00
Kubernetes Prow Robot
bfffd43108 Merge pull request #124296 from Nordix/esotsal/e2e_node_pod_resize_test
Add Pod Resize Node E2E test using framework in test/e2e_node
2024-07-15 19:27:23 -07:00
Kubernetes Prow Robot
2263f2d719 Merge pull request #124148 from cyclinder/add_flag_kubelet
kubelet: Add a TopologyManager policy option: max-allowable-numa-nodes
2024-07-15 19:27:16 -07:00
Kubernetes Prow Robot
5427708866 Merge pull request #125404 from mimowo/fix-kubelet-podip
Fix that PodIP field is temporarily removed for a terminal pod
2024-07-15 16:41:10 -07:00
Kubernetes Prow Robot
48eef1fc4f Merge pull request #125867 from zhifei92/fix-e2e-node-density
Fix the bug related to cleaning up density test pods
2024-07-15 11:55:09 -07:00
Davanum Srinivas
133c4290c7 Fix for OOMKiller test consistently failing in EC2 cgroupv1 serial jobs
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-07-13 18:44:15 -04:00
Michal Wozniak
5f1ab75d27 Fix that PodIP field is not set for terminal pod
2024-07-12 21:36:12 +02:00
Davanum Srinivas
2db4c4aaab Set ginkgo time if not specified explicitly
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-07-12 11:33:22 -04:00
Kevin Hannon
950781a342 add e2e tests for split filesystem
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
2024-07-12 14:19:17 +00:00
Sotiris Salloumis
99f90934b4 Add Pod Resize Node E2E test using framework in test/e2e_node
2024-07-12 15:53:53 +02:00
zhifei92
115092b374 fix(e2e_node): density cleanup pods
2024-07-11 15:39:52 +08:00
Sergey Kanzhelev
541f2af78d device plugin failure tests
2024-07-10 20:14:59 +00:00
Kubernetes Prow Robot
672af9406e Merge pull request #125981 from dims/cleanup-pods-after-test-runs
[e2e-node] Cleanup pods after the test runs
2024-07-09 15:01:01 -07:00
Davanum Srinivas
f6836df520 [e2e-node] Cleanup pods after the test runs
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-07-09 16:53:28 -04:00
Kubernetes Prow Robot
4a214f6ad9 Merge pull request #125461 from mimowo/pod-disruption-conditions-ga
Graduate PodDisruptionConditions to stable
2024-07-09 11:08:13 -07:00
cyclinder
87129c350a kubelet: Add a TopologyManager policy options: "max-allowable-numa-nodes"
Signed-off-by: cyclidner <kuocyclinder@gmail.com>
2024-07-09 22:26:24 +08:00
Davanum Srinivas
2dccf29f33 Fix for Merged kubelet config does not match the expected configuration in cgroupv1 based jobs
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-07-03 10:54:09 -04:00
Sascha Grunert
2df920120a Fix kubelet AppArmor rejection test
The corresponding e2e test needs to be adjusted alongside the
merged PR: https://github.com/kubernetes/kubernetes/pull/125776.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2024-07-03 10:54:22 +02:00
Kubernetes Prow Robot
ac9aec9f9b Merge pull request #125116 from pohly/dra-one-of-source
DRA: remove "source" indirection from v1 Pod API
2024-06-28 12:46:45 -07:00
Michal Wozniak
780191bea6 review remarks for graduating PodDisruptionConditions
2024-06-28 17:32:27 +02:00
Michal Wozniak
bf0c9885a4 Graduate PodDisruptionConditions to stable
2024-06-28 16:36:51 +02:00
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
2024-06-27 17:53:24 +02:00
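The field move described in bde9b64cdf can be sketched in Go with simplified local types. The type names `oldClaimSource`, `oldPodResourceClaim`, and `newPodResourceClaim` and the `flatten` helper are hypothetical and exist only to mirror the two YAML shapes shown in the message; they are not the actual v1 Pod API definitions.

```go
package main

import "fmt"

// oldClaimSource mirrors the former "source" wrapper from the YAML above.
type oldClaimSource struct {
	ResourceClaimName         *string
	ResourceClaimTemplateName *string
}

// oldPodResourceClaim is the previous shape, with the extra indirection.
type oldPodResourceClaim struct {
	Name   string
	Source oldClaimSource
}

// newPodResourceClaim is the flattened shape: both fields sit next to Name.
type newPodResourceClaim struct {
	Name                      string
	ResourceClaimName         *string
	ResourceClaimTemplateName *string
}

// flatten (hypothetical helper) copies the fields out of the "source" wrapper.
func flatten(old oldPodResourceClaim) newPodResourceClaim {
	return newPodResourceClaim{
		Name:                      old.Name,
		ResourceClaimName:         old.Source.ResourceClaimName,
		ResourceClaimTemplateName: old.Source.ResourceClaimTemplateName,
	}
}

func main() {
	tmpl := "test-inline-claim-template"
	old := oldPodResourceClaim{
		Name:   "with-template",
		Source: oldClaimSource{ResourceClaimTemplateName: &tmpl},
	}
	claim := flatten(old)
	fmt.Println(claim.Name, *claim.ResourceClaimTemplateName)
}
```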
Kubernetes Prow Robot
25a43070ee Merge pull request #123468 from ffromani/fix-mm-metrics-test
node: memory manager: fix the metrics tests
2024-06-26 12:00:45 -07:00
Kubernetes Prow Robot
b29dce0757 Merge pull request #125627 from yt-huang/clean-up
drop deprecated PollWithContext and adopt PollUntilContextTimeout ins…
2024-06-26 10:58:55 -07:00
Kubernetes Prow Robot
2200f5ef1b Merge pull request #125446 from AkihiroSuda/rro-e2e-remove-withserial
e2e_node/mount_rro_linux_test.go: remove unneeded WithSerial
2024-06-25 14:18:12 -07:00
Kubernetes Prow Robot
0913b90809 Merge pull request #125402 from iholder101/swap/skip-e2e-test-if-no-swap
[KEP-2400]: Swap e2e tests: skip swap stress tests if swap is not provisioned
2024-06-25 14:17:58 -07:00
Francesco Romani
5b6fe2f8db e2e: node: ensure no pod leaks in the container_manager test
During the debugging of https://github.com/kubernetes/kubernetes/pull/123468
it became quite evident there are unexpected pods, leftovers from
the container_manager_test. But we need stronger isolation among tests
to have good signal, so we add these safeguards (xref:
https://github.com/kubernetes/kubernetes/pull/123468#issuecomment-1977977609).

Signed-off-by: Francesco Romani <fromani@redhat.com>
2024-06-25 07:40:41 +02:00
yintong.huang
2db1b321e0 drop deprecated PollWithContext and adopt PollUntilContextTimeout instead
Signed-off-by: yintong.huang <yintong.huang@daocloud.io>
2024-06-21 19:23:31 +08:00
Kubernetes Prow Robot
0c955f7cbb Merge pull request #124617 from bart0sh/PR144-e2e_node-DRA-test-plugin-failures
e2e_node: DRA: test plugin failures
2024-06-18 01:14:19 -07:00
Francesco Romani
a5d771c911 node: memory manager: fix the mm metrics test
Fix the memory manager tests by correctly restoring
the kubelet config after each test. We need to do this before all
the related tests run, in order to make sure to restore the
correct values.

Add more debug facilities to troubleshoot further failures.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2024-06-17 08:51:44 +02:00
Francesco Romani
086a500d8f e2e: node: use test owner tracking
e2e_node tests depend on very specific shared state (node state).
Pod leakages between tests oftentimes cause the test preconditions
to be silently corrupted, causing hard-to-debug CI failures.

Use the new facility to annotate pods with test owner (= the
test code which created the test) to help debug these failures.

For more context, please check the conversation in #123468

Signed-off-by: Francesco Romani <fromani@redhat.com>
2024-06-14 13:42:14 +02:00
Akihiro Suda
8a5e476582 e2e_node/mount_rro_linux_test.go: remove unneeded WithSerial
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2024-06-12 01:41:44 +09:00
Itamar Holder
37d80518d2 skip swap stress tests if swap is not provisioned
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-06-10 10:05:17 +03:00