Commit Graph

2381 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
1646ed8222
Merge pull request #116057 from bobbypage/nodee2elog
test: Add log artifact for ginkgo node e2e and tune default ginkgo flags
2023-03-01 16:55:16 -08:00
Kubernetes Prow Robot
0469455ff7
Merge pull request #116082 from mimowo/fix-oomkiller-test
Fix the flaky OOMKiller test by sleep at start
2023-02-28 14:53:37 -08:00
David Porter
e001884594 test: Add some default flags ginkgo flags for node e2e
Add the following ginkgo flags for each node e2e similar to the
existing hack/ginkgo-e2e.sh script.

* --no-color, colors aren't rendered properly in prow and make examining
  the log in text editors more difficult, so let's disable them.
  `hack/ginkgo-e2e.sh` (used for kind e2e tests) also disables them
  already.
* -v, enable verbose logs. This is needed so we get more detailed info
  even when the tests pass. This is useful so we can compare successful
  runs to failed runs.

Signed-off-by: David Porter <david@porter.me>
2023-02-28 00:24:40 -08:00
David Porter
e9ecdf3534 test: Emit ginkgo log for each node e2e
When running multiple node e2e with multiple machine images, the tests
are run separately for each node. The final build log has all of the
results for each of the hosts combined together which make debugging the
log difficult. To make it easier, emit a log for each host that was run.
This log will be written to the results directory and uploaded as an
artifact in prow jobs.

Signed-off-by: David Porter <david@porter.me>
2023-02-28 00:21:34 -08:00
David Porter
0980f026c9 test: Remove tests argument from node e2e image config
This was never being used, the only config that used it was deleted in
https://github.com/kubernetes/test-infra/pull/26017 so we don't need
this anymore, so let's delete it.

Signed-off-by: David Porter <david@porter.me>
2023-02-28 00:21:03 -08:00
Kubernetes Prow Robot
015e2fa20c
Merge pull request #115953 from pohly/lint-gomega
test: fixing + linting gomega usage
2023-02-27 00:56:20 -08:00
Michal Wozniak
36eef0600d Fix the flaky OOMKiller test by sleep at start 2023-02-27 08:15:46 +01:00
Kubernetes Prow Robot
10cdaefc1f
Merge pull request #116005 from gjkim42/fix-createStaticPod
Fix createStaticPod to not use container.RestartPolicy
2023-02-23 12:33:35 -08:00
Gunju Kim
f690a0ce41
Fix createStaticPod to not use container.RestartPolicy 2023-02-23 21:18:24 +09:00
Kubernetes Prow Robot
3702411ef9
Merge pull request #115926 from ffromani/e2e-node-remove-kubevirt-device-plugin
e2e: node remove: kubevirt device plugin
2023-02-23 04:13:35 -08:00
Patrick Ohly
41f23f52d0 test: fix ginkgolinter issues
All of these issues were reported by https://github.com/nunnatsa/ginkgolinter.
Fixing these issues is useful (several expressions get simpler, using
framework.ExpectNoError is better because it has additional support for
failures) and a necessary step for enabling that linter in our golangci-lint
invocation.
2023-02-22 19:36:05 +01:00
Francesco Romani
00b41334bf e2e: node: podresources: fix restart wait
Fix the waiting logic in the e2e test loop to wait
for resources to be reported again instead of making logic on the
timestamp. The idea is that waiting for resource availability
is the canonical way clients should observe the desired state,
and it should also be more robust than comparing timestamps,
especially on CI environments.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-02-22 14:04:55 +01:00
Francesco Romani
92e00203e0 e2e: node: unify sample device plugin utilities
Start to consolidate the sample device plugin utility
and constants in a central place, because we need
to use it in different e2e tests.

Having a central dependency is better than a maze of
entangled e2e tests depending on each other helpers.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-02-22 14:04:55 +01:00
Francesco Romani
871201ba64 e2e: node: remove kubevirt device plugin
The podresources e2e tests want to exercise the case on which a device
plugin doesn't report topology affinity. The only known device plugin
which had this requirement and didn't depend on specialized hardware
was the kubevirt device plugin, which was however deprecated after
we started using it.

So the e2e tests are now broken, and in any case they can't depend on
unmaintained and liable to be obsolete code.

To unblock the state and preserve some e2e signal, we switch to the
sample device plugin, which is a stub implementation and which is
managed in-tree, so we can maintain it and ensure it fits the e2e test
usecase.

This is however a regression, because e2e tests should try their hardest
to use real devices and avoid any mocking or faking.
The upside is that using a OS-neutral device plugin for the tests enables
us to run on all the supported platform (windows!) so this could allow
us to transition these tests to conformance.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-02-22 14:04:22 +01:00
Francesco Romani
aa1a0385e2 e2e: node: podresources: internal cleanup
rename getPodResources for clarity. Allow to return error
(and not use ginkgo expectations), so it can actually be used
as intended inside `Eventually` blocks without blow up at the
first failure.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-02-22 13:49:13 +01:00
Michal Wozniak
fd28f69ca4 Add e2e_node test for oom killed container reason 2023-02-20 08:15:45 +01:00
Kubernetes Prow Robot
e18fa74551
Merge pull request #115590 from swatisehgal/topology-mgr-duration-metrics
node: topology-mgr: Add metric to measure topology manager admission latency
2023-02-15 07:12:25 -08:00
Swati Sehgal
cf21dcef51 node: topology-mgr: e2e: changes to validate admission latency metrics
The component was previously incorrect. This patch updates to
the correct component name.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-02-15 13:59:56 +00:00
ravisantoshgudimetla
d65262d1f9 Remove cgo dependency 2023-02-13 11:16:39 -05:00
Sascha Grunert
85106dc327
Allow SSH e2e node base64 key injection
With the change of the CRI-O jobs to use butane, we now have a
verification for base64 data urls in place. This means that the
following URL is invalid:

```
data:text/plain;base64,GCE_SSH_PUBLIC_KEY_FILE_CONTENT
```

This means we have to pass valid base64 to the URL. To fix that, we now
allow to inject SSH key values with both, the
`GCE_SSH_PUBLIC_KEY_FILE_CONTENT` field and its base64 encoded variant.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2023-02-09 16:17:11 +01:00
Patrick Ohly
136f89dfc5 e2e: use error wrapping with %w
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).

Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with

    sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)

This may be unnecessary in some cases, but it's not wrong.
2023-02-06 15:39:13 +01:00
Patrick Ohly
9df3e2a47a e2e: replace WaitForPodToDisappear with WaitForPodNotFoundInNamespace
WaitForPodToDisappear was always called such that it listed all pods, which
made it less efficient than trying to get just the one pod it was checking for.

Being able to customize the poll interval in practice wasn't useful, therefore
it can be replaced with WaitForPodNotFoundInNamespace.
2023-02-06 15:39:12 +01:00
Antonio Ojea
7f5ae1c0c1
Revert "e2e: wait for pods with gomega" 2023-02-06 12:08:22 +01:00
Kubernetes Prow Robot
85aa0057c6
Merge pull request #113298 from pohly/e2e-wait-for-pods-with-gomega
e2e: wait for pods with gomega
2023-02-04 05:26:29 -08:00
Kubernetes Prow Robot
d415647739
Merge pull request #115441 from bobbypage/busybox-mirror-test
test: Use preloaded busybox image in mirror pod test
2023-02-01 12:21:36 -08:00
Kubernetes Prow Robot
3a4cef70f2
Merge pull request #115445 from bobbypage/gh-115381
test: Fix node e2e device plugin flake
2023-02-01 02:55:06 -08:00
Kubernetes Prow Robot
bb7c9739a3
Merge pull request #114759 from my-git9/chore/k8staint
chore: add k8s node-role.kubernetes.io/control-plane taint
2023-01-31 21:01:17 -08:00
David Porter
225658884b test: Fix node e2e device plugin flake
The device plugin test expects that no other pods are running prior to
the test starting. However, it has been observed that in some cases
some resources may still be around from previous tests. This is because
the deletion of resources from other tests is handled by deleting that
test's framework's namespace which is done asynchronously without
waiting for the other test's namespace to be deleted.

As a result, when the node e2e device plugin starts, there may still be
other pods in process of termination. To work around this, add a retry
to the device plugin test to account for the time it takes to delete the
resources from the prior test.

Signed-off-by: David Porter <david@porter.me>
2023-01-31 17:36:10 -08:00
David Porter
a3291a87d7 test: Use preloaded busybox image in mirror pod test
Instead of hardcoding the busybox image, use the one that is preloaded
during the test using imageutils.

Signed-off-by: David Porter <david@porter.me>
2023-01-31 13:34:13 -08:00
Patrick Ohly
222f655062 e2e: use error wrapping with %w
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).

Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with

    sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)

This may be unnecessary in some cases, but it's not wrong.
2023-01-31 13:01:39 +01:00
Patrick Ohly
6eea1b2efa e2e: replace WaitForPodToDisappear with WaitForPodNotFoundInNamespace
WaitForPodToDisappear was always called such that it listed all pods, which
made it less efficient than trying to get just the one pod it was checking for.

Being able to customize the poll interval in practice wasn't useful, therefore
it can be replaced with WaitForPodNotFoundInNamespace.
2023-01-31 13:01:39 +01:00
Kubernetes Prow Robot
981c4d59fb
Merge pull request #115155 from adrianreber/2023-01-18-checkpoint-test-result
Extend checkpoint e2e test to check for results
2023-01-30 18:43:16 -08:00
Kubernetes Prow Robot
4df945853e
Merge pull request #115137 from swatisehgal/topologymgr-metrics
node: topologymgr: add metrics about admission requests and errors
2023-01-30 18:43:00 -08:00
David Porter
b96290c08f e2e node: Update runtime class handler skip logic
There are two runtime class tests which required the container runtime
config to include explicit configuration for `test-handler`. The current
logic skips these tests in non GCE environments. This skip is too strict
since the test is skipped in node e2e environments and in other
environments such as kind, which support running the test and also
configure `test-handler`.

Instead of skipping based on provider, add a new function
`NodeSupportsPreconfiguredRuntimeClassHandler` which examines the
underlying container runtime config and checks if the config includes
`test-handler`. The check is a bit brittle since it assumes container
runtime config paths, but it is a net improvement over skipping the test
entirely on non GCE environments.

This results in the test working in the common test environments, namely
GCE kube-up, node e2e, and kind.

Signed-off-by: David Porter <david@porter.me>
2023-01-24 14:43:24 -08:00
Kubernetes Prow Robot
765f2ef7c7
Merge pull request #114981 from adisky/revert
[Test] Revert "Fix:[Flake] [sig-node] Restart [Serial] [Slow] [Disruptive] K…
2023-01-24 03:58:15 -08:00
Adrian Reber
86b62b86d8
Extend checkpoint e2e test to check for results
When the e2e_node/checkpoint_container.go test was introduced no CRI
implementation supported the new CheckpointContainer RPC yet.

With the release of CRI-O 1.25 the CheckpointContainer is implemented
and the test has been extended to see if the content of the checkpoint
is as expected.

The test is skipped if the ContainerCheckpoint feature gate is disabled
or if the CRI implementation does not support the CheckpointContainer
RPC.

Signed-off-by: Adrian Reber <areber@redhat.com>
2023-01-23 18:07:35 +00:00
Swati Sehgal
340db7109d node: e2e: topologymgr: add tests for topology manager metrics
Add node e2e tests to verify population of topology metrics.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-01-19 14:40:37 +00:00
Swati Sehgal
51c6a1fbe7 node: e2e: cpumgr: Rename: s/getCPUManagerMetrics/getKubeletMetrics
Since we need to gather kubelet metrics for CPU Manager and Topology
Manager, renaming this function to a more generic name.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-01-19 14:18:05 +00:00
Aditi Sharma
d83c37c311 Update CNI version to 1.2.0
Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2023-01-18 13:24:40 +05:30
Aditi Sharma
0a7b2eeae0 Revert "Fix:[Flake] [sig-node] Restart [Serial] [Slow] [Disruptive] Kubelet should correctly account for terminated pods after restart"
This reverts commit 572360c5a5.
2023-01-11 15:05:49 +05:30
Ian K. Coolidge
cbb985a310 cpuset: Delete 'builder' methods
All usage of builder pattern is convertible to cpuset.New()
with the same or fewer lines of code.

Migrate Builder.Add to a private method of CPUSet, with a comment
that it is only intended for internal use to preserve immutable
propoerty of the exported interface.

This also removes 'require' library dependency, which avoids
non-standard library usage.
2023-01-06 23:32:51 +00:00
Ian K. Coolidge
f3829c4be3 cpuset: Rename 'NewCPUSet' to 'New' 2023-01-06 23:32:51 +00:00
Ian K. Coolidge
e5143d16c2 cpuset: Make 'ToSlice*' methods look like 'set' methods
In 'set', conversions to slice are done also, but with different names:

ToSliceNoSort() -> UnsortedList()
ToSlice() -> List()

Reimplement List() in terms of UnsortedList to save some duplication.
2023-01-06 23:32:51 +00:00
Ian K. Coolidge
a0c989b99a cpuset: Remove *Int64 methods
These are rarely used and can be accommodated with a trivial helper.
2023-01-06 23:32:51 +00:00
Ian K. Coolidge
67a057d4f2 cpuset: Remove 'MustParse' method
Removes exit/fatal from cpuset library.

Usage in podresources test was not necessary.

Library reference in cpu_manager_test was moved to a local function, and
converted to use e2e test framework error catching.
2023-01-06 23:32:51 +00:00
ZhangKe10140699
572360c5a5 Fix:[Flake] [sig-node] Restart [Serial] [Slow] [Disruptive] Kubelet should correctly account for terminated pods after restart 2023-01-05 09:10:57 +08:00
xin.li
10ca605cdd chroe: add k8s node-role.kubernetes.io/control-plane taint
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-01-02 21:04:43 +08:00
Antonio Ojea
e0a23577d2 pass context to gomega
Change-Id: Ibef02a52d8922984a09efa48361b9876fc91287c
2022-12-19 13:14:02 +00:00
Patrick Ohly
2f6c4f5eab e2e: use Ginkgo context
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
2022-12-16 20:14:04 +01:00
Kubernetes Prow Robot
770b39c65b
Merge pull request #114072 from Tal-or/deflake_e2e_cpumanager_metrics_tests
e2e: cpumanager: proper test clean-up
2022-12-14 11:55:45 -08:00