Commit Graph

2335 Commits

Author SHA1 Message Date
ZhangKe10140699
572360c5a5 Fix:[Flake] [sig-node] Restart [Serial] [Slow] [Disruptive] Kubelet should correctly account for terminated pods after restart 2023-01-05 09:10:57 +08:00
Antonio Ojea
e0a23577d2 pass context to gomega
Change-Id: Ibef02a52d8922984a09efa48361b9876fc91287c
2022-12-19 13:14:02 +00:00
Patrick Ohly
2f6c4f5eab e2e: use Ginkgo context
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
2022-12-16 20:14:04 +01:00
Kubernetes Prow Robot
770b39c65b Merge pull request #114072 from Tal-or/deflake_e2e_cpumanager_metrics_tests
e2e: cpumanager: proper test clean-up
2022-12-14 11:55:45 -08:00
Kubernetes Prow Robot
7403090e40 Merge pull request #113309 from swatisehgal/devicemgr-e2e-remove-flakiness
node: e2e: device plugins: Deflake e2e tests
2022-12-14 10:47:34 -08:00
Swati Sehgal
213a6edc57 node: e2e: Add descriptive messages for operation/error checks
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-12-13 14:54:48 +00:00
Patrick Ohly
d4729008ef e2e: simplify test cleanup
ginkgo.DeferCleanup has multiple advantages:
- The cleanup operation can get registered if and only if needed.
- No need to return a cleanup function that the caller must invoke.
- Automatically determines whether a context is needed, which will
  simplify the introduction of context parameters.
- Ginkgo's timeline shows when it executes the cleanup operation.
2022-12-13 08:09:01 +01:00
Swati Sehgal
62e4d39c2f node: e2e: address review comments (2022/12/12)
- use `ginkgo.DeferCleanup` instead of clean up in the AfterEach block
- encourage use of ginkgo by not extending expect.go

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-12-12 16:31:40 +00:00
Swati Sehgal
a9e3689e63 node: e2e: ensure clean cluster state before e2e tests are run
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-12-12 14:50:36 +00:00
Swati Sehgal
7e880d1bab node: e2e: ensure log rotation pod is deleted after test
Some node e2e tests check for expected number of pods running
on the node to verify the correct state of that node after running
test scenarios. An example of such a check is in the device plugin
end to end test here: [1].

If the node is not left in a clean state after an e2e test finishes
running, it can lead to flaky tests because the node might have
unexpected pods running on the node.

In order to avoid that, we make sure that the test pods are
cleaned up after the test runs.

[1]: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/device_plugin_test.go#L189-L190

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-12-12 14:50:32 +00:00
Patrick Ohly
df5d84ae81 e2e: accept context from Ginkgo
Every ginkgo callback should return immediately when a timeout occurs or the
test run manually gets aborted with CTRL-C. To do that, they must take a ctx
parameter and pass it through to all code which might block.

This is a first automated step towards that: the additional parameter got added
with

    sed -i 's/\(framework.ConformanceIt\|ginkgo.It\)\(.*\)func() {$/\1\2func(ctx context.Context) {/' \
        $(git grep -l -e framework.ConformanceIt -e ginkgo.It )
    $GOPATH/bin/goimports -w $(git status | grep modified: | sed -e 's/.* //')

log_test.go was left unchanged.
2022-12-10 19:50:18 +01:00
Talor Itzhak
56c5a95849 e2e: cpumanager: proper test clean-up
One of the cpumanager tests doesn't remove the pod
that got created during the test.

This causes pollution of other tests and failures
from time to time (depends on the test execution order).

In order to defalke the tests, we should delete the pod
and wait for it to be completely remove.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2022-11-22 17:25:52 +02:00
Andrew Sy Kim
6c8eacb157 test/e2e_node: set apiserver kubelet preferred addresses
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2022-11-21 15:09:04 -05:00
Michal Wozniak
c803892bd8 Enable the feature into beta 2022-11-09 09:02:40 +01:00
Michal Wozniak
52cd6755eb Add pod disruption conditions for kubelet initiated failures 2022-11-07 11:23:22 +01:00
Kubernetes Prow Robot
565f582c4b Merge pull request #113199 from bobbypage/node_e2e_stop_kubelet
test: Stop kubelet systemd service after node e2e
2022-11-06 01:34:16 -08:00
David Ashpole
64af1adace Second attempt: Plumb context to Kubelet CRI calls (#113591)
* plumb context from CRI calls through kubelet

* clean up extra timeouts

* try fixing incorrectly cancelled context
2022-11-05 06:02:13 -07:00
Kubernetes Prow Robot
6fe5429969 Merge pull request #113273 from bobbypage/restart_test_fix
test: Fix e2e_node restart_test flake
2022-11-04 05:14:14 -07:00
Kubernetes Prow Robot
a9f87ad6c8 Merge pull request #113384 from pohly/e2e-formatting
e2e: formatting enhancements
2022-11-02 21:40:08 -07:00
Francesco Romani
ff44dc1932 cpumanager: the FG is locked to default (ON)
hence we can remove the if() guards, the feature
is always available.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-11-02 18:41:41 +01:00
Kubernetes Prow Robot
63e40b1ed4 Merge pull request #113548 from aojea/revert_113408
Revert "plumb context from CRI calls through kubelet"
2022-11-02 10:13:14 -07:00
Kubernetes Prow Robot
e8449012e2 Merge pull request #113512 from ehashman/rm-ehashman-node
Remove ehashman from sig-node roles
2022-11-02 08:36:00 -07:00
Antonio Ojea
9c2b333925 Revert "plumb context from CRI calls through kubelet"
This reverts commit f43b4f1b95.
2022-11-02 13:37:23 +00:00
Kubernetes Prow Robot
7b84436168 Merge pull request #113408 from dashpole/kubelet_context
Plumb context to Kubelet CRI calls
2022-11-01 19:59:08 -07:00
Kubernetes Prow Robot
4a0bb39d2a Merge pull request #113282 from xmcqueen/master
Image Version Bump in Manifest for Node Perf Test tf-wide-deep
2022-11-01 19:58:45 -07:00
Elana Hashman
9d2d392802 Remove ehashman from sig-node roles 2022-11-01 12:16:43 -07:00
Patrick Ohly
5a01a52b0c test: extend gomega to use YAML for API types
Some of our API types contain fields that get rendered very poorly by
gomega.format.Object because they contain lots of internal information, for
example CreationTimestamp. As a result, dumping full API object typically gets
truncated.

What we want is a representation that is a) multi-line (in contrast to the
stringer implemented by our types) and b) drops empty fields where it
was defined that this is okay.

The normal YAML representation fits that requirement. We just need to teach
gomega how and when to do that. This cannot be done for each type through a
generated GomegaString method (lots of code, additional dependency in public
API on YAML encoder), but it can be done inside tests by adding a formatting
handler (new gomega feature).
2022-10-28 15:43:48 +02:00
David Ashpole
f43b4f1b95 plumb context from CRI calls through kubelet 2022-10-28 02:55:28 +00:00
Francesco Romani
bdc08eaa4b e2e: node: add tests for cpumanager metrics
Add tests to verify the cpumanager metrics are populated.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-27 14:40:56 +02:00
Brian McQueen
08c22d6d9a bumped version of tf-wide-deep image to 1.3 in test manifest, and removed the data download from the tf-wide-deep pod spec command 2022-10-23 10:13:13 -07:00
David Porter
efc9e73266 test: Fix e2e_node restart_test flake
In the `should correctly account for terminated pods after restart`, the
test first creates a set of `restartNever` pods, followed by a set of
`restartAlways` pods. Both the `restartNever` and `restartAlways` pods
request an entire CPU. As a result, the `restartAlways` pods will not be
admitted, if the `restartNever` pods did not terminate yet.

Depending on the timing/how fast the pods terminate, the test can pass
sometimes fail which results in flakes. To de-flake the test, the test
should wait until the `restartNever` pods enter a terminal `Succeeded`
phase, before creating the `restartAlways` pods.

To do this, generalize the function `waitForPods` to accept a pod
condition (`testutils.PodRunningReadyOrSucceeded`, or
`testutils.PodSucceeded`). Also introduce a new "Succeeded" pod
condition, so the test can explicitly wait until the pods enter the
Succeeded phase.

Signed-off-by: David Porter <david@porter.me>
2022-10-21 17:14:56 -07:00
David Porter
048ed7ddc0 test: Stop kubelet systemd service after node e2e
Currently, when running node e2e it's not possible to use the ginkgo `--repeat`
flag to run the test suite multiple times. This is useful when debugging tests
and ensuring they are not flaky by re-running them several times. Currently if
using `--repeat` ginkgo flag, the 2nd run of the test will fail due to kubelet
not starting with message like:

```
Failed to start transient service unit: Unit kubelet-20221020T040841.service already exists.
```

This is because during the test startup, kubelet is started as a transient unit
file via `systemd-run`. The unit is started with the `--remain-after-exit` flag
to ensure that the unit will remain even if the kubelet is restarted. The test
suite currently uses `systemd kill` command to stop kubelet. This works fine for
stopping the kubelet, but on the second run, when `systemd-run` is used to start
systemd unit again it will fail because the unit already exists. This is because
`systemd kill` will not delete the systemd unit, only send SIGTERM signal to it.

To fix this, add `unitName` as a field to the `server` struct. When
kubelet server is constructed, set the unit name. As part of e2e test
termination, in `E2EServices.Stop()``, stop the kubelet systemd unit. By
stopping the kubelet systemd unit, systemd will delete the systemd
transient unit, allowing it to be created and started again in a
subsequent e2e run.

Signed-off-by: David Porter <david@porter.me>
2022-10-20 13:31:23 -07:00
Kubernetes Prow Robot
45636684a4 Merge pull request #112897 from fromanirh/podresources-metrics-e2e-tests
register podresources metrics
2022-10-19 13:57:18 -07:00
Kubernetes Prow Robot
42c1f881cc Merge pull request #113165 from swatisehgal/e2e-deviceplugin-logs
node: e2e: device plugins: Add more logs for clarity
2022-10-19 08:59:14 -07:00
Swati Sehgal
ef54dbb5cc node: e2e: device plugins: Add more logs for clarity
The device plugin test in https://testgrid.k8s.io/sig-node-release-blocking#node-kubelet-serial-containerd
has been flaky for a while now when it runs on the test infrastructure.
Locally running this test resulted in test passing without issues.

Based on the existing logs, it is not clear why podresource
API endpoint is returning 3 pods rather than the expected
two pods (device plugin pod and the test pod requesting
devices). For more clarity and debugaability on why an
addtional pod seems to be appearing we expose the output
from podresource API endpoint.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-10-19 13:57:47 +01:00
Kubernetes Prow Robot
e1812683e3 Merge pull request #113042 from swatisehgal/memorymgr-fix-rejection-test
node: e2e: memorymgr: Fix test failure
2022-10-17 04:39:07 -07:00
Kubernetes Prow Robot
6f579d3ceb Merge pull request #111616 from ndixita/credential-api-ga
Move the Kubelet Credential Provider feature to GA and Update the Credential Provider API to GA
2022-10-15 07:53:09 -07:00
Swati Sehgal
6c6865af28 node: e2e: memorymgr: Fix test failure
The change made in https://github.com/kubernetes/kubernetes/pull/112644
resulted in an update to the rejection message. In the memory manager
node e2e test, we still checked against the old expected error message
giving the impression that the pod succeeded to run even though it failed
as expected mainly because the check wasn't performed correctly.

In this patch, we update to the correct rejection message to make sure
that the memory manager is no longer failing.

NOTE: This test is supposed to run on multi NUMA systems and if the
underlying node does not have multi NUMA nodes, the test is skipped
which is what happens in upstream test infrastructure as it is mainly
composed of single NUMA nodes. Because of this, this test failure
wasn't evident via testgrid.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-10-13 12:45:14 +01:00
Francesco Romani
3c60c1a10c node: e2e: add podresources metrics tests
add tests to ensure the podresources metrics are exposed,
and basic sanity tests for their values.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-06 15:14:56 +02:00
Patrick Ohly
dfdf88d4fa e2e: adapt to moved code
This is the result of automatically editing source files like this:

    go install golang.org/x/tools/cmd/goimports@latest
    find ./test/e2e* -name "*.go" | xargs env PATH=$GOPATH/bin:$PATH ./e2e-framework-sed.sh

with e2e-framework-sed.sh containing this:

sed -i \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecCommandInContainer(/e2epod.ExecCommandInContainer(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecCommandInContainerWithFullOutput(/e2epod.ExecCommandInContainerWithFullOutput(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecShellInContainer(/e2epod.ExecShellInContainer(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecShellInPod(/e2epod.ExecShellInPod(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecShellInPodWithFullOutput(/e2epod.ExecShellInPodWithFullOutput(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.ExecWithOptions(/e2epod.ExecWithOptions(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.MatchContainerOutput(/e2eoutput.MatchContainerOutput(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.PodClient(/e2epod.NewPodClient(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.PodClientNS(/e2epod.PodClientNS(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.TestContainerOutput(/e2eoutput.TestContainerOutput(\1, /" \
    -e "s/\(f\|fr\|\w\w*\.[fF]\w*\)\.TestContainerOutputRegexp(/e2eoutput.TestContainerOutputRegexp(\1, /" \
    -e "s/framework.AddOrUpdateLabelOnNode\b/e2enode.AddOrUpdateLabelOnNode/" \
    -e "s/framework.AllNodes\b/e2edebug.AllNodes/" \
    -e "s/framework.AllNodesReady\b/e2enode.AllNodesReady/" \
    -e "s/framework.ContainerResourceGatherer\b/e2edebug.ContainerResourceGatherer/" \
    -e "s/framework.ContainerResourceUsage\b/e2edebug.ContainerResourceUsage/" \
    -e "s/framework.CreateEmptyFileOnPod\b/e2eoutput.CreateEmptyFileOnPod/" \
    -e "s/framework.DefaultPodDeletionTimeout\b/e2epod.DefaultPodDeletionTimeout/" \
    -e "s/framework.DumpAllNamespaceInfo\b/e2edebug.DumpAllNamespaceInfo/" \
    -e "s/framework.DumpDebugInfo\b/e2eoutput.DumpDebugInfo/" \
    -e "s/framework.DumpNodeDebugInfo\b/e2edebug.DumpNodeDebugInfo/" \
    -e "s/framework.EtcdUpgrade\b/e2eproviders.EtcdUpgrade/" \
    -e "s/framework.EventsLister\b/e2edebug.EventsLister/" \
    -e "s/framework.ExecOptions\b/e2epod.ExecOptions/" \
    -e "s/framework.ExpectNodeHasLabel\b/e2enode.ExpectNodeHasLabel/" \
    -e "s/framework.ExpectNodeHasTaint\b/e2enode.ExpectNodeHasTaint/" \
    -e "s/framework.GCEUpgradeScript\b/e2eproviders.GCEUpgradeScript/" \
    -e "s/framework.ImagePrePullList\b/e2epod.ImagePrePullList/" \
    -e "s/framework.KubectlBuilder\b/e2ekubectl.KubectlBuilder/" \
    -e "s/framework.LocationParamGKE\b/e2eproviders.LocationParamGKE/" \
    -e "s/framework.LogSizeDataTimeseries\b/e2edebug.LogSizeDataTimeseries/" \
    -e "s/framework.LogSizeGatherer\b/e2edebug.LogSizeGatherer/" \
    -e "s/framework.LogsSizeData\b/e2edebug.LogsSizeData/" \
    -e "s/framework.LogsSizeDataSummary\b/e2edebug.LogsSizeDataSummary/" \
    -e "s/framework.LogsSizeVerifier\b/e2edebug.LogsSizeVerifier/" \
    -e "s/framework.LookForStringInLog\b/e2eoutput.LookForStringInLog/" \
    -e "s/framework.LookForStringInPodExec\b/e2eoutput.LookForStringInPodExec/" \
    -e "s/framework.LookForStringInPodExecToContainer\b/e2eoutput.LookForStringInPodExecToContainer/" \
    -e "s/framework.MasterAndDNSNodes\b/e2edebug.MasterAndDNSNodes/" \
    -e "s/framework.MasterNodes\b/e2edebug.MasterNodes/" \
    -e "s/framework.MasterUpgradeGKE\b/e2eproviders.MasterUpgradeGKE/" \
    -e "s/framework.NewKubectlCommand\b/e2ekubectl.NewKubectlCommand/" \
    -e "s/framework.NewLogsVerifier\b/e2edebug.NewLogsVerifier/" \
    -e "s/framework.NewNodeKiller\b/e2enode.NewNodeKiller/" \
    -e "s/framework.NewResourceUsageGatherer\b/e2edebug.NewResourceUsageGatherer/" \
    -e "s/framework.NodeHasTaint\b/e2enode.NodeHasTaint/" \
    -e "s/framework.NodeKiller\b/e2enode.NodeKiller/" \
    -e "s/framework.NodesSet\b/e2edebug.NodesSet/" \
    -e "s/framework.PodClient\b/e2epod.PodClient/" \
    -e "s/framework.RemoveLabelOffNode\b/e2enode.RemoveLabelOffNode/" \
    -e "s/framework.ResourceConstraint\b/e2edebug.ResourceConstraint/" \
    -e "s/framework.ResourceGathererOptions\b/e2edebug.ResourceGathererOptions/" \
    -e "s/framework.ResourceUsagePerContainer\b/e2edebug.ResourceUsagePerContainer/" \
    -e "s/framework.ResourceUsageSummary\b/e2edebug.ResourceUsageSummary/" \
    -e "s/framework.RunHostCmd\b/e2eoutput.RunHostCmd/" \
    -e "s/framework.RunHostCmdOrDie\b/e2eoutput.RunHostCmdOrDie/" \
    -e "s/framework.RunHostCmdWithFullOutput\b/e2eoutput.RunHostCmdWithFullOutput/" \
    -e "s/framework.RunHostCmdWithRetries\b/e2eoutput.RunHostCmdWithRetries/" \
    -e "s/framework.RunKubectl\b/e2ekubectl.RunKubectl/" \
    -e "s/framework.RunKubectlInput\b/e2ekubectl.RunKubectlInput/" \
    -e "s/framework.RunKubectlOrDie\b/e2ekubectl.RunKubectlOrDie/" \
    -e "s/framework.RunKubectlOrDieInput\b/e2ekubectl.RunKubectlOrDieInput/" \
    -e "s/framework.RunKubectlWithFullOutput\b/e2ekubectl.RunKubectlWithFullOutput/" \
    -e "s/framework.RunKubemciCmd\b/e2ekubectl.RunKubemciCmd/" \
    -e "s/framework.RunKubemciWithKubeconfig\b/e2ekubectl.RunKubemciWithKubeconfig/" \
    -e "s/framework.SingleContainerSummary\b/e2edebug.SingleContainerSummary/" \
    -e "s/framework.SingleLogSummary\b/e2edebug.SingleLogSummary/" \
    -e "s/framework.TimestampedSize\b/e2edebug.TimestampedSize/" \
    -e "s/framework.WaitForAllNodesSchedulable\b/e2enode.WaitForAllNodesSchedulable/" \
    -e "s/framework.WaitForSSHTunnels\b/e2enode.WaitForSSHTunnels/" \
    -e "s/framework.WorkItem\b/e2edebug.WorkItem/" \
    "$@"

for i in "$@"; do
    # Import all sub packages and let goimports figure out which of those
    # are redundant (= already imported) or not needed.
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2edebug "k8s.io/kubernetes/test/e2e/framework/debug"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2ekubectl "k8s.io/kubernetes/test/e2e/framework/kubectl"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2enode "k8s.io/kubernetes/test/e2e/framework/node"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2eoutput "k8s.io/kubernetes/test/e2e/framework/pod/output"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2epod "k8s.io/kubernetes/test/e2e/framework/pod"' "$i"
    sed -i -e '/"k8s.io.kubernetes.test.e2e.framework"/a e2eproviders "k8s.io/kubernetes/test/e2e/framework/providers"' "$i"
    goimports -w "$i"
done
2022-10-06 08:19:47 +02:00
Patrick Ohly
92047da152 e2e: make import blocks consistent 2022-10-06 08:16:47 +02:00
Patrick Ohly
5614a9d064 e2e framework: eliminate interim sub packages
The "todo" packages were necessary while moving code around to avoid hitting
cyclic dependencies. Now that any sub package can depend on the framework, they
are no longer needed and the code can be moved into the normal sub packages.
2022-10-06 08:16:47 +02:00
Patrick Ohly
802451b6ca e2e framework: move metrics gathering into sub package
This reduces the size of the test/e2e/framework itself. Because it does not
gather metrics data anymore by default, E2E test suites must set their
callbacks function or set the original one by importing
"k8s.io/kubernetes/test/e2e/framework/todo/metrics/init".
2022-10-06 08:16:47 +02:00
Patrick Ohly
b8d28cb6c3 e2e framework: move node helper code into sub package
This reduces the size of the test/e2e/framework itself. Because it does not
check nodes anymore by default, E2E test suites must set their own check
function or set the original one by importing
"k8s.io/kubernetes/test/e2e/framework/todo/node/init".
2022-10-06 08:16:47 +02:00
Patrick Ohly
c45a924c5e e2e framework: move dumping of information into sub package
This reduces the size of the test/e2e/framework itself. Because it does not
dump anything anymore by default, E2E test suites must set their own dump
function or set the original one by importing
"k8s.io/kubernetes/test/e2e/framework/debug/init".
2022-10-06 08:16:47 +02:00
Kubernetes Prow Robot
98233be715 Merge pull request #112709 from swagatbora90/kubelet-tracing
Support otel tracing in cri remote image service
2022-10-04 14:12:00 -07:00
Tim Hockin
70c1c795e8 Remove generated file rules in make
This is all covered by update-codegen.sh now.

The old `make generated_files` rule still exists, but just prints a
warning.
2022-10-04 08:50:30 -07:00
Dixita Narang
d6ab1da1b5 Update test to validate against v1 kubelet APIs 2022-10-03 17:57:25 +00:00
Dixita Narang
a016b06bbd Update test plugin to use v1 kubelet APIs 2022-10-03 17:57:14 +00:00
Dixita Narang
1ac4fc779b Update kubelet credential provider tests to use new v1 APIs 2022-09-30 20:51:39 +00:00