Commit Graph

148 Commits

Author SHA1 Message Date
David Porter
1c75c2cda8
test: Add e2e to verify static pod termination
Add a node e2e to verify that if a static pod is terminated while the
container runtime or CRI returns an error, the pod is eventually
terminated successfully.

This test serves as a regression test for k8s.io/issue/113145 which
fixes an issue where force deleted pods may not be terminated if the
container runtime fails during a `syncTerminatingPod`.

To test this behavior, start a static pod, stop the container runtime,
and later start the container runtime. The static pod is expected to
eventually terminate successfully.

To start and stop the container runtime, we need to find the container
runtime systemd unit name. Introduce a util function
`findContainerRuntimeServiceName` which finds the unit name by getting
the pid of the container runtime from the existing
`ContainerRuntimeProcessName` flag passed into node e2e and using
systemd dbus `GetUnitNameByPID` function to convert the pid of the
container runtime to a unit name. Using the unit name, introduce helper
functions to start and stop the container runtime.

Signed-off-by: David Porter <david@porter.me>
2023-03-03 10:00:48 -06:00
Kubernetes Prow Robot
59a7e34052
Merge pull request #115442 from bobbypage/unknown_pods_test
test: Add e2e node test to check for unknown pods
2023-03-01 19:08:55 -08:00
Patrick Ohly
136f89dfc5 e2e: use error wrapping with %w
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).

Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with

    sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)

This may be unnecessary in some cases, but it's not wrong.
2023-02-06 15:39:13 +01:00
Antonio Ojea
7f5ae1c0c1
Revert "e2e: wait for pods with gomega" 2023-02-06 12:08:22 +01:00
David Porter
c2923c472d test: Move waitForAllContainerRemoval() into node e2e util
This is used across multiple tests, so let's move into the util file.

Also, refactor it a bit to provide a better error message in case of a
failure.

Signed-off-by: David Porter <david@porter.me>
2023-02-03 23:04:35 -08:00
Patrick Ohly
222f655062 e2e: use error wrapping with %w
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).

Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with

    sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)

This may be unnecessary in some cases, but it's not wrong.
2023-01-31 13:01:39 +01:00
Patrick Ohly
2f6c4f5eab e2e: use Ginkgo context
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
2022-12-16 20:14:04 +01:00
Swagat Bora
caa83c25ae Support otel tracing in cri remote image service
Signed-off-by: Swagat Bora <sbora@amazon.com>
2022-09-29 22:15:07 +00:00
Sally O'Malley
47e7d8034f
kubelet tracing
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Co-authored-by: David Ashpole <dashpole@google.com>
2022-08-01 12:55:02 -04:00
Dave Chen
857458cfa5 update ginkgo from v1 to v2 and gomega to 1.19.0
- update all the import statements
- run hack/pin-dependency.sh to change pinned dependency versions
- run hack/update-vendor.sh to update go.mod files and the vendor directory
- update the method signatures for custom reporters

Signed-off-by: Dave Chen <dave.chen@arm.com>
2022-07-08 10:44:46 +08:00
Francesco Romani
23147ff4b3 e2e: node: devplugin: tolerate node readiness flip
In the AfterEach check of the e2e node device plugin tests,
the tests want really bad to clean up after themselves:
- delete the sample device plugin
- restart again the kubelet
- ensure that after the restart, no stale sample devices
  (provided  by the sample device plugin) are reported anymore.

We observed that in the AfterEach block of these e2e tests
we have quite reliably a flip/flop of the kubelet readiness
state, possibly related to a race with/ a slow runtime/PLEG check.

What happens is that the kubelet readiness state is true,
but goes false for a quick interval and then goes true again
and it's pretty stable after that (observed adding more logs
to the check loop).

The key factor here is the function `getLocalNode` aborts the
test (as in `framework.ExpectNoError`) if the node state is
not ready. So any occurrence of this scenario, even if it
is transient, will cause a test failure. I believe this will
make the e2e test unnecessarily fragile without making it more
correct.

For the purpose of the test we can tolerate this kind of glitches,
with kubelet flip/flopping the ready state, granted that we meet
eventually the final desired condition on which the node reports
ready AND reports no sample devices present - which was the condition
the code was trying to check.

So, we add a variant of `getLocalNode`, which just fetches the
node object the e2e_node framework created, alongside to a flag
reporting the node readiness. The new helper does not make
implicitly the test abort if the node is not ready, just bubbles
up this information.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 14:22:25 +02:00
Stephen Benjamin
b351745c1c Replace use of Sprintf with net.JoinHostPort
On IPv6 clusters, one of the most frequent problems I encounter is
assumptions that one can build a URL with a host and port simply by
using Sprintf, like this:

```go
fmt.Sprintf("http://%s:%d/foo", host, port)
```

When `host` is an IPv6 address, this produces an invalid URL as it must
be bracketed, like this:

```
http://[2001:4860:4860::8888]:9443
```

This change fixes the occurences of joining a host and port with the
purpose built `net.JoinHostPort` function.

I encounter this problem often enough that I started to [write a linter
for it](https://github.com/stbenjam/go-sprintf-host-port).  I don't
think the linter is quite ready for wide use yet, but I did run it
against the Kube codebase and found these.  While the host portion in
some of these changes may always be an FQDN or IPv4 IP today, it's an
easy thing that can break later on.
2022-05-04 06:37:50 -04:00
Abhijit Hoskeri
49dc59873b e2e_node/{service,util}: use kubelet healthz port.
The readonly port could be disabled.

Since we are only using the /healthz endpoint,
we can use the healthz port for this.

Change-Id: Ie0e05a5ab4ec6f51e4d3c63226aa23c1b3a69956
2022-04-22 16:14:31 -07:00
ahrtr
fe95aa614c io/ioutil has already been deprecated in golang 1.16, so replace all ioutil with io and os 2022-02-03 05:32:12 +08:00
Sergey Kanzhelev
7e7bc6d53b remove DynamicKubeletConfig logic from kubelet 2022-01-19 22:38:04 +00:00
Paco Xu
f0e7025371 skip reduceAllocatableMemoryUsage if cgroup v2 is enabled 2021-12-16 14:46:50 +08:00
Artyom Lukianov
117141eee3 e2e_node: fix tests after Kubelet dynamic configuration removal
- CPU manager
- Memory Manager
- Topology Manager

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-08 09:42:24 +02:00
Kubernetes Prow Robot
adcd2feb5e
Merge pull request #104153 from cynepco3hahue/e2e_node_provide_static_kubelet_config
e2e node: provide static kubelet config
2021-11-04 17:11:53 -07:00
Kubernetes Prow Robot
27d3a9ec57
Merge pull request #104481 from AlexeyPerevalov/E2eIsKubeletConfiguration
e2e_node: Properly check for DynamicKubeletConfig
2021-11-04 16:11:53 -07:00
Artyom Lukianov
50fdcdfc59 e2e_node: refactor code to use a single method to update the kubelet config
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-04 15:44:35 +02:00
Artyom Lukianov
b6211657bf e2e_node: drop usage of DynamicKubeletConfig
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-04 15:26:19 +02:00
Francesco Romani
b382b6cd0a node: e2e: add test for the checkpoint recovery
Add a e2e test to exercise the checkpoint recovery flow.
This means we need to actually create a old (V1, pre-1.20) checkpoint,
but if we do it only in the e2e test, it's still fine.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-26 09:55:11 +02:00
Francesco Romani
baa55935f3 node: e2e: clarify findKubeletService
Add docstrings to findKubeletService and restartKubelet,
fix typos along the way.
xref: https://github.com/kubernetes/kubernetes/pull/105516#pullrequestreview-780230582

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 11:19:03 +02:00
Francesco Romani
d15bff2839 e2e: node: expose the running flag
Each e2e test knows it wants to restart a running kubelet or a
non-running kubelet. The vast majority of times, we want to
restart a running kubelet (e.g. to change config or to check
some properties hold across kubelet crashes/restarts), but sometimes
we stop the kubelet, do some actions and only then restart.

To accomodate both use cases, we just expose the `running` boolean
flag to the e2e tests.

Having the `restartKubelet` explicitly restarting a running kubelet
helps us to trobuleshoot e2e failures on which the kubelet
was supposed to be running, while it was not; attempting a restart
in such cases only murkied the waters further, making the
troubleshooting and the eventual fix harder.

In the happy path, no expected change in behaviour.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-07 22:15:28 +02:00
Francesco Romani
e878c20ac7 e2e: node: improve error logging
In the `restartKubelet` helper, we use `exec.Command`, whose
return value is the output as the command, but as `[]byte`.
The way we logged the output of the command was as value, making
the output, meant to be human readable, unnecessarily hard to read.

We fix this annoying behaviour converting the output to string before
to log it out, making pretty obvious to understand the outcome of
the command.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-07 22:13:49 +02:00
Alexey Perevalov
461d8f51f0 e2e_node: Check for DynamicKubeletConfig properly
Even DynamicKubeletConfig is deprecated it still used in e2e_node test.
The bug is hidden by forcibly enabled option
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'
if this option is not enabled setKubeletConfiguration tries to set
kubelet config via apiserver interface and failed with timeout.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2021-08-20 15:17:58 +00:00
Davanum Srinivas
dab19517e5
Explicitly restart kubelet to stabilize serial-containerd job
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-08-02 11:24:11 -04:00
Artyom Lukianov
ef3e0fd02f e2e node: wait for kubelet health check to pass after kubelet restart
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-03-02 14:48:03 +02:00
Artyom Lukianov
a6b4868b8d e2e node: stop kubelet service instead of restarting it
The server service monitors the kubelet service and restart it
once the service is down, to avoid kubelet double restarting
we will stop the kubelet service and wait until the kubelet will be
restarted and the node will be ready.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-03-02 14:48:03 +02:00
Renaud Gaubert
501f7b16d9 Update podresources api e2e_node tests 2020-10-27 11:23:39 -07:00
Marek Siarkowicz
7d309e0104 Move Kubelet Summary API to staging repo 2020-09-22 18:23:28 +02:00
Kubernetes Prow Robot
fd9828b02a
Merge pull request #92632 from RenaudWasTaken/move-podresources-api
Move external facing podresources apis to staging
2020-09-15 10:04:07 -07:00
Renaud Gaubert
4eadf40448 Run gofmt
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 06:22:44 -07:00
Renaud Gaubert
60304452ff Move podresources api to k8s.io/kubelet/pkg/apis
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Pu Wang
40f2d1b8ff
e2e test support microk8s
microk8s run kubelet service as `snap.microk8s.daemon-kubelet.service`, instead of `kubelet.service`.
so e2e should use `systemctl list-units *kubelet* --state=running` to find out kubelet service of microk8s.
2020-09-13 16:11:50 +08:00
Artyom Lukianov
ab7acb9ee3 e2e node: fix kubelet service restart failure
Under e2e tests possible the situation when we restart the kubelet
number of times in the short time frame. When it happens the systemd
can fail the service restart with the `Failed with result 'start-limit-hit'.`
error.

To avoid this situation the code will reset the kubelet service start failures
on each call to the kubelet restart command.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2020-07-23 10:56:32 +03:00
Amim Knabben
2a392bf8fc Fetching kubelet address:port from kubelet configuration 2020-07-02 14:15:44 -04:00
Kubernetes Prow Robot
8ce1b535ee
Merge pull request #80831 from odinuge/hugetlb-pagesizes-cleanup
Add support for removing unsupported huge page sizes
2020-06-04 23:41:43 -07:00
Davanum Srinivas
5692926914
Move packages for slightly better UX for consumers
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-20 10:57:46 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Odin Ugedal
a233b9aab0
Add verbose message when more than one kubelet is running 2020-03-19 13:08:08 +01:00
Odin Ugedal
8b6160a367
Add support for stopping kubelet in node-e2e
This makes it possible to stop the kubelet, do some work, and then
start it again.
2020-03-19 13:08:08 +01:00
Francesco Romani
bb770c0325 e2e: getCurrentKubeletConfig: move in subpkg
Address review comments and move the helper function
in the `framework/kubelet` package to avoid circular deps
(see https://github.com/kubernetes/kubernetes/issues/81245)

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-14 10:51:08 +01:00
Francesco Romani
08ba240c6b e2e: e2e_node: refactor getCurrentKubeletConfig
this patch moves the helper getCurrentKubeletConfig function,
used in both e2e and e2e_node tests and previously duplicated,
in the common framework.

There are no intended changes in behaviour.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-13 12:53:15 +01:00
Mike Danese
25651408ae generated: run refactor 2020-02-08 12:30:21 -05:00
Mike Danese
2637772298 some manual fixes 2020-02-07 18:17:40 -08:00
Mike Danese
3aa59f7f30 generated: run refactor 2020-02-07 18:16:47 -08:00
SataQiu
d2bdf89a8b fix golint issues in test/e2e_node 2019-11-26 16:26:55 +08:00
wojtekt
ccded14941 Eliminate some default conversions 2019-11-06 14:08:15 +01:00
Kenichi Omichi
ca4c349096 Move functions from e2e/framework/util.go
- SimpleGET: Moved to ingress sub package of e2e framework
- PollURL: Moved to ingress sub package of e2e framework
- ProxyMode: Moved to service e2e test package
- ListNamespaceEvents: Moved to e2e_node test package
- NewE2ETestNodePreparer: Removed since 59533f0cd1
2019-11-01 17:39:29 +00:00