Commit Graph

43 Commits

Author SHA1 Message Date
Francesco Romani
92e00203e0 e2e: node: unify sample device plugin utilities
Start to consolidate the sample device plugin utility
and constants in a central place, because we need
to use it in different e2e tests.

Having a central dependency is better than a maze of
entangled e2e tests depending on each other helpers.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-02-22 14:04:55 +01:00
David Porter
225658884b test: Fix node e2e device plugin flake
The device plugin test expects that no other pods are running prior to
the test starting. However, it has been observed that in some cases
some resources may still be around from previous tests. This is because
the deletion of resources from other tests is handled by deleting that
test's framework's namespace which is done asynchronously without
waiting for the other test's namespace to be deleted.

As a result, when the node e2e device plugin starts, there may still be
other pods in process of termination. To work around this, add a retry
to the device plugin test to account for the time it takes to delete the
resources from the prior test.

Signed-off-by: David Porter <david@porter.me>
2023-01-31 17:36:10 -08:00
Patrick Ohly
2f6c4f5eab e2e: use Ginkgo context
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
2022-12-16 20:14:04 +01:00
Swati Sehgal
213a6edc57 node: e2e: Add descriptive messages for operation/error checks
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-12-13 14:54:48 +00:00
Swati Sehgal
a9e3689e63 node: e2e: ensure clean cluster state before e2e tests are run
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-12-12 14:50:36 +00:00
Patrick Ohly
df5d84ae81 e2e: accept context from Ginkgo
Every ginkgo callback should return immediately when a timeout occurs or the
test run manually gets aborted with CTRL-C. To do that, they must take a ctx
parameter and pass it through to all code which might block.

This is a first automated step towards that: the additional parameter got added
with

    sed -i 's/\(framework.ConformanceIt\|ginkgo.It\)\(.*\)func() {$/\1\2func(ctx context.Context) {/' \
        $(git grep -l -e framework.ConformanceIt -e ginkgo.It )
    $GOPATH/bin/goimports -w $(git status | grep modified: | sed -e 's/.* //')

log_test.go was left unchanged.
2022-12-10 19:50:18 +01:00
Swati Sehgal
ef54dbb5cc node: e2e: device plugins: Add more logs for clarity
The device plugin test in https://testgrid.k8s.io/sig-node-release-blocking#node-kubelet-serial-containerd
has been flaky for a while now when it runs on the test infrastructure.
Locally running this test resulted in test passing without issues.

Based on the existing logs, it is not clear why podresource
API endpoint is returning 3 pods rather than the expected
two pods (device plugin pod and the test pod requesting
devices). For more clarity and debugaability on why an
addtional pod seems to be appearing we expose the output
from podresource API endpoint.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-10-19 13:57:47 +01:00
Patrick Ohly
dfdf88d4fa e2e: adapt to moved code
This is the result of automatically editing source files like this:

    go install golang.org/x/tools/cmd/goimports@latest
    find ./test/e2e* -name "*.go" | xargs env PATH=$GOPATH/bin:$PATH ./e2e-framework-sed.sh

with e2e-framework-sed.sh containing this:

sed -i \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecCommandInContainer(/e2epod.ExecCommandInContainer(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecCommandInContainerWithFullOutput(/e2epod.ExecCommandInContainerWithFullOutput(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecShellInContainer(/e2epod.ExecShellInContainer(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecShellInPod(/e2epod.ExecShellInPod(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecShellInPodWithFullOutput(/e2epod.ExecShellInPodWithFullOutput(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecWithOptions(/e2epod.ExecWithOptions(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.MatchContainerOutput(/e2eoutput.MatchContainerOutput(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.PodClient(/e2epod.NewPodClient(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.PodClientNS(/e2epod.PodClientNS(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.TestContainerOutput(/e2eoutput.TestContainerOutput(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.TestContainerOutputRegexp(/e2eoutput.TestContainerOutputRegexp(\1, /" \
    -e "s/framework.AddOrUpdateLabelOnNode\b/e2enode.AddOrUpdateLabelOnNode/" \
    -e "s/framework.AllNodes\b/e2edebug.AllNodes/" \
    -e "s/framework.AllNodesReady\b/e2enode.AllNodesReady/" \
    -e "s/framework.ContainerResourceGatherer\b/e2edebug.ContainerResourceGatherer/" \
    -e "s/framework.ContainerResourceUsage\b/e2edebug.ContainerResourceUsage/" \
    -e "s/framework.CreateEmptyFileOnPod\b/e2eoutput.CreateEmptyFileOnPod/" \
    -e "s/framework.DefaultPodDeletionTimeout\b/e2epod.DefaultPodDeletionTimeout/" \
    -e "s/framework.DumpAllNamespaceInfo\b/e2edebug.DumpAllNamespaceInfo/" \
    -e "s/framework.DumpDebugInfo\b/e2eoutput.DumpDebugInfo/" \
    -e "s/framework.DumpNodeDebugInfo\b/e2edebug.DumpNodeDebugInfo/" \
    -e "s/framework.EtcdUpgrade\b/e2eproviders.EtcdUpgrade/" \
    -e "s/framework.EventsLister\b/e2edebug.EventsLister/" \
    -e "s/framework.ExecOptions\b/e2epod.ExecOptions/" \
    -e "s/framework.ExpectNodeHasLabel\b/e2enode.ExpectNodeHasLabel/" \
    -e "s/framework.ExpectNodeHasTaint\b/e2enode.ExpectNodeHasTaint/" \
    -e "s/framework.GCEUpgradeScript\b/e2eproviders.GCEUpgradeScript/" \
    -e "s/framework.ImagePrePullList\b/e2epod.ImagePrePullList/" \
    -e "s/framework.KubectlBuilder\b/e2ekubectl.KubectlBuilder/" \
    -e "s/framework.LocationParamGKE\b/e2eproviders.LocationParamGKE/" \
    -e "s/framework.LogSizeDataTimeseries\b/e2edebug.LogSizeDataTimeseries/" \
    -e "s/framework.LogSizeGatherer\b/e2edebug.LogSizeGatherer/" \
    -e "s/framework.LogsSizeData\b/e2edebug.LogsSizeData/" \
    -e "s/framework.LogsSizeDataSummary\b/e2edebug.LogsSizeDataSummary/" \
    -e "s/framework.LogsSizeVerifier\b/e2edebug.LogsSizeVerifier/" \
    -e "s/framework.LookForStringInLog\b/e2eoutput.LookForStringInLog/" \
    -e "s/framework.LookForStringInPodExec\b/e2eoutput.LookForStringInPodExec/" \
    -e "s/framework.LookForStringInPodExecToContainer\b/e2eoutput.LookForStringInPodExecToContainer/" \
    -e "s/framework.MasterAndDNSNodes\b/e2edebug.MasterAndDNSNodes/" \
    -e "s/framework.MasterNodes\b/e2edebug.MasterNodes/" \
    -e "s/framework.MasterUpgradeGKE\b/e2eproviders.MasterUpgradeGKE/" \
    -e "s/framework.NewKubectlCommand\b/e2ekubectl.NewKubectlCommand/" \
    -e "s/framework.NewLogsVerifier\b/e2edebug.NewLogsVerifier/" \
    -e "s/framework.NewNodeKiller\b/e2enode.NewNodeKiller/" \
    -e "s/framework.NewResourceUsageGatherer\b/e2edebug.NewResourceUsageGatherer/" \
    -e "s/framework.NodeHasTaint\b/e2enode.NodeHasTaint/" \
    -e "s/framework.NodeKiller\b/e2enode.NodeKiller/" \
    -e "s/framework.NodesSet\b/e2edebug.NodesSet/" \
    -e "s/framework.PodClient\b/e2epod.PodClient/" \
    -e "s/framework.RemoveLabelOffNode\b/e2enode.RemoveLabelOffNode/" \
    -e "s/framework.ResourceConstraint\b/e2edebug.ResourceConstraint/" \
    -e "s/framework.ResourceGathererOptions\b/e2edebug.ResourceGathererOptions/" \
    -e "s/framework.ResourceUsagePerContainer\b/e2edebug.ResourceUsagePerContainer/" \
    -e "s/framework.ResourceUsageSummary\b/e2edebug.ResourceUsageSummary/" \
    -e "s/framework.RunHostCmd\b/e2eoutput.RunHostCmd/" \
    -e "s/framework.RunHostCmdOrDie\b/e2eoutput.RunHostCmdOrDie/" \
    -e "s/framework.RunHostCmdWithFullOutput\b/e2eoutput.RunHostCmdWithFullOutput/" \
    -e "s/framework.RunHostCmdWithRetries\b/e2eoutput.RunHostCmdWithRetries/" \
    -e "s/framework.RunKubectl\b/e2ekubectl.RunKubectl/" \
    -e "s/framework.RunKubectlInput\b/e2ekubectl.RunKubectlInput/" \
    -e "s/framework.RunKubectlOrDie\b/e2ekubectl.RunKubectlOrDie/" \
    -e "s/framework.RunKubectlOrDieInput\b/e2ekubectl.RunKubectlOrDieInput/" \
    -e "s/framework.RunKubectlWithFullOutput\b/e2ekubectl.RunKubectlWithFullOutput/" \
    -e "s/framework.RunKubemciCmd\b/e2ekubectl.RunKubemciCmd/" \
    -e "s/framework.RunKubemciWithKubeconfig\b/e2ekubectl.RunKubemciWithKubeconfig/" \
    -e "s/framework.SingleContainerSummary\b/e2edebug.SingleContainerSummary/" \
    -e "s/framework.SingleLogSummary\b/e2edebug.SingleLogSummary/" \
    -e "s/framework.TimestampedSize\b/e2edebug.TimestampedSize/" \
    -e "s/framework.WaitForAllNodesSchedulable\b/e2enode.WaitForAllNodesSchedulable/" \
    -e "s/framework.WaitForSSHTunnels\b/e2enode.WaitForSSHTunnels/" \
    -e "s/framework.WorkItem\b/e2edebug.WorkItem/" \
    "$@"

for i in "$@"; do
    # Import all sub packages and let goimports figure out which of those
    # are redundant (= already imported) or not needed.
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2edebug "k8s.io/kubernetes/test/e2e/framework/debug"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2ekubectl "k8s.io/kubernetes/test/e2e/framework/kubectl"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2enode "k8s.io/kubernetes/test/e2e/framework/node"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2eoutput "k8s.io/kubernetes/test/e2e/framework/pod/output"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2epod "k8s.io/kubernetes/test/e2e/framework/pod"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2eproviders "k8s.io/kubernetes/test/e2e/framework/providers"' "$i"
    goimports -w "$i"
done
2022-10-06 08:19:47 +02:00
Dave Chen
857458cfa5 update ginkgo from v1 to v2 and gomega to 1.19.0
- update all the import statements
- run hack/pin-dependency.sh to change pinned dependency versions
- run hack/update-vendor.sh to update go.mod files and the vendor directory
- update the method signatures for custom reporters

Signed-off-by: Dave Chen <dave.chen@arm.com>
2022-07-08 10:44:46 +08:00
Francesco Romani
f3e157d168 e2e: node: re-enable the device plugin tests
Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 16:05:13 +02:00
Francesco Romani
48b5af49e0 e2e: node: reorder imports
trivial cleanup

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 16:04:01 +02:00
Francesco Romani
98eb6db7c0 e2e: node: fix plugins directory
Previously, the e2e test was overriding the plugins socket directory to
"/var/lib/kubelet/plugins_registry". This seems wrong, and with that
setting the e2e test was already failing, because the registration
process was timing out, in turn because the kubelet was trying to call
back the device plugin in the wrong place (see below for details).

I can't explain why it worked before - or it if worked at all - but
it really seems that `pluginapi.DevicePluginPath` is the right
setting here.

+++

In a nutshell, the device plugin registration process works like this:

1. The kubelet runs and creates the device plugin socket registration
   endpoint:
	KubeletSocket = DevicePluginPath + "kubelet.sock"
	DevicePluginPath = "/var/lib/kubelet/device-plugins/"
2. Each device plugin will listen to an ENDPOINT the kubelet will connect
   backk to.  IOW the kubelet will act like a client to each device plugin,
   to perform allocation requests (and more)
   Each device plugin will serve from a endpoint.
   The endpoint name is plugin-specific, but they all must be inside a
   well-known directory: pluginapi.DevicePluginPath
3. The kubelet creates the device plugin pod, like any other pod
4. During the startup, each device plugin wants to register itself in the
   kubelet. So it sends a request through
   the registration endpoint. Key details:
	grpc.Dial(kubelet registration socket)
	registration request
	reqt := &pluginapi.RegisterRequest{
		Version:      pluginapi.Version,
		Endpoint:     endpointSocket,	<- socket relative to pluginapi.DevicePluginPath
		ResourceName: resourceName, 	<- resource name to be exposed
}
5. While handling the registration request, kubelet dial back the
   device plugin on socketDir + req.Endpoint.
   But socketDir is hardcoded in the device manager code to
   pluginapi.KubeletSocket

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 16:03:50 +02:00
Francesco Romani
23147ff4b3 e2e: node: devplugin: tolerate node readiness flip
In the AfterEach check of the e2e node device plugin tests,
the tests want really bad to clean up after themselves:
- delete the sample device plugin
- restart again the kubelet
- ensure that after the restart, no stale sample devices
  (provided  by the sample device plugin) are reported anymore.

We observed that in the AfterEach block of these e2e tests
we have quite reliably a flip/flop of the kubelet readiness
state, possibly related to a race with/ a slow runtime/PLEG check.

What happens is that the kubelet readiness state is true,
but goes false for a quick interval and then goes true again
and it's pretty stable after that (observed adding more logs
to the check loop).

The key factor here is the function `getLocalNode` aborts the
test (as in `framework.ExpectNoError`) if the node state is
not ready. So any occurrence of this scenario, even if it
is transient, will cause a test failure. I believe this will
make the e2e test unnecessarily fragile without making it more
correct.

For the purpose of the test we can tolerate this kind of glitches,
with kubelet flip/flopping the ready state, granted that we meet
eventually the final desired condition on which the node reports
ready AND reports no sample devices present - which was the condition
the code was trying to check.

So, we add a variant of `getLocalNode`, which just fetches the
node object the e2e_node framework created, alongside to a flag
reporting the node readiness. The new helper does not make
implicitly the test abort if the node is not ready, just bubbles
up this information.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 14:22:25 +02:00
Francesco Romani
56c539bff0 e2e: node: deviceplug: deepcopy the pod dev template
Let's avoid unexpected side effects

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 14:22:24 +02:00
Francesco Romani
19ae360af9 e2e: node: inline getSampleDevicePluginPod
Starting golangci-lint >= 1.45, the tool is complaining
about the function being unused:
```bash
test/e2e_node/device_plugin_test.go:82:6: func `getSampleDevicePluginPod` is unused (unused)
func getSampleDevicePluginPod() *v1.Pod {
     ^

Please review the above warnings. You can test via "./hack/verify-golangci-lint.sh"
If the above warnings do not make sense, you can exempt this warning with a comment
 (if your reviewer is okay with it).
In general please prefer to fix the error, we have already disabled specific lints
 that the project chooses to ignore.
See: https://golangci-lint.run/usage/false-positives/}
```

thing is the code is not changed lately, and manual inspection trivially
confirms it is used.
Older versions of golangci-lint (tested with
```
golangci-lint has version 1.41.1 built from a2074809 on 2021-06-19T16:01:50Z
```)
indeed do NOT complain about the function, so this seems a golangci-lint
bug.

To move forward, we can disable the warning, but this leaves a sour
taste.
Instead, since the function is pretty trivias, was used just once and the caller
was undoing some of the work done by the function, we just inline it,
which solves the linter warning and makes the code a bit better.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-04 17:05:00 +02:00
Francesco Romani
017998e889 e2e: node: explicit skip for device plugin tests
The device plugin e2e tests where failing lately and to unblock the
release a skip was added in the prow job configuration:
71cf119c84/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml (L401)

The problem here is not only the broken test which need to be
fixed, but also the fact that this is the only skip (for a specific
test) we do this way, which is surprising (xref:
https://github.com/kubernetes/kubernetes/issues/106635#issuecomment-1105627265)

As next step towards improvement, we add an explicit skip in the tests
proper. This makes at least more obvious these tests need more work,
and allow us to remove the edge case in the prow configuration.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-03 18:44:11 +02:00
Sergiusz Urbaniak
1495c9f2cd
test/e2e/*: default existing tests to privileged pod security policy
This is to ensure that all existing tests don't break when defaulting
the pod security policy to restricted in the e2e test framework.
2022-04-05 08:41:12 +02:00
Danielle Lancashire
03de802434 e2e_node: unify device tests
The device_plugin_tests have not run successfully in a very long time,
initially being marked flaky and then eventually becoming stale.

The gpu_device_plugin_tests have been used to test the same behaviour,
but are incredibly high maintenance due to external changes in behaviour
from GCP/Nvidia that we have no control over.

This commit takes the existing device plugin tests, makes them look more
like the GPU tests, and removes the cases that have been unsupported for
a long time (namely restarting containers while the plugin is
unavailable).

It also removes the GPU plugin tests, as we do not get more signal by
using real devices here.
2021-11-11 14:10:27 +01:00
Artyom Lukianov
50fdcdfc59 e2e_node: refactor code to use a single method to update the kubelet config
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-11-04 15:44:35 +02:00
Kubernetes Prow Robot
e450e3331f
Merge pull request #105482 from endocrimes/dani/kubeletconfig
e2e_node: remove unnecessary dynamic config changes
2021-10-28 07:04:27 -07:00
Francesco Romani
d15bff2839 e2e: node: expose the running flag
Each e2e test knows it wants to restart a running kubelet or a
non-running kubelet. The vast majority of times, we want to
restart a running kubelet (e.g. to change config or to check
some properties hold across kubelet crashes/restarts), but sometimes
we stop the kubelet, do some actions and only then restart.

To accomodate both use cases, we just expose the `running` boolean
flag to the e2e tests.

Having the `restartKubelet` explicitly restarting a running kubelet
helps us to trobuleshoot e2e failures on which the kubelet
was supposed to be running, while it was not; attempting a restart
in such cases only murkied the waters further, making the
troubleshooting and the eventual fix harder.

In the happy path, no expected change in behaviour.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-07 22:15:28 +02:00
Danielle Lancashire
8b1b06c507 e2e_node: Remove KubeletPodResources enablement as it is a default gate 2021-10-05 10:26:10 +02:00
Elana Hashman
59a7cc12c9
Mark failing node serial tests as flaky
Tracked in:
- https://github.com/kubernetes/kubernetes/issues/103690
- https://github.com/kubernetes/kubernetes/issues/103691
2021-07-28 10:39:30 -07:00
wojtekt
a74737eb03 Mark remaining e2e_node tests with [sig-*] label 2021-02-23 20:11:09 +01:00
Ikko Ashimine
5155decbbf
Fix typo in device_plugin_test.go
assignement -> assignment
2021-01-24 17:42:34 +09:00
Renaud Gaubert
501f7b16d9 Update podresources api e2e_node tests 2020-10-27 11:23:39 -07:00
Renaud Gaubert
4eadf40448 Run gofmt
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 06:22:44 -07:00
Renaud Gaubert
60304452ff Move podresources api to k8s.io/kubelet/pkg/apis
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Aaron Crickenberger
28768166f5 decouple testfiles from framework
This drops testfiles.ReadOrDie and updated testfiles.Exists to return an
error, forcing the caller to decide whether to call framework.Fail or do
something else.

It makes for a slightly less friendly API, but also means the package is
decoupled from framework again, as per the comments at the top of the
file
2020-06-29 14:54:09 -07:00
drfish
dfab6b637f Update .import-aliases for e2e test framework 2020-03-25 11:40:02 +08:00
Mike Danese
76f8594378 more artisanal fixes
Most of these could have been refactored automatically but it wouldn't
have been uglier. The unsophisticated tooling left lots of unnecessary
struct -> pointer -> struct transitions.
2020-03-05 14:59:47 -08:00
Mike Danese
c58e69ec79 automated refactor 2020-03-05 14:59:46 -08:00
Mike Danese
25651408ae generated: run refactor 2020-02-08 12:30:21 -05:00
Mike Danese
3aa59f7f30 generated: run refactor 2020-02-07 18:16:47 -08:00
Haosdent Huang
4536ed50a0 e2e: move funs of framework/deviceplugin to e2e_node 2019-12-16 00:46:56 +08:00
Xuewei Zhang
c0db5b2562 Convert ExpectEqual(err, nil) to ExpectNoError(err) 2019-12-04 20:28:31 -08:00
Lantao Liu
32850dc47d Revert "Use ExpectEqual test/e2e_node"
This reverts commit 561ee6ece9.
2019-12-04 18:14:13 -08:00
tanjunchen
561ee6ece9 Use ExpectEqual test/e2e_node 2019-12-03 18:01:30 +08:00
SataQiu
d2bdf89a8b fix golint issues in test/e2e_node 2019-11-26 16:26:55 +08:00
carlory
d1290ffdef clean up test code 2019-09-05 23:44:19 +08:00
toyoda
c9ec5210e9 Use ExpectEqual in test/e2e_node/[a-d] 2019-08-01 09:27:52 +09:00
SataQiu
641d330f89 e2e_node: clean up non-recommended import 2019-07-28 12:49:36 +08:00
Benjamin Elder
4542cb937f fix building test/e2e_node/ with bazel 0.28.1 2019-07-26 14:32:40 -07:00