The test creates a Service exposing two protocols on the same port
and a backend that replies on both protocols.
1. Test that the Service works for both protocols
2. Update Service to expose only the TCP port
3. Verify that TCP works and UDP does not work
4. Update Service to expose only the UDP port
5. Verify that TCP does not work and UDP does work
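A minimal sketch of such a dual-protocol Service; the name, port numbers,
and selector are illustrative, not taken from the actual test:

```go
package e2e

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// dualProtocolService builds a Service that exposes TCP and UDP on the
// same port number, backed by the same set of pods.
func dualProtocolService() *v1.Service {
	return &v1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "dual-protocol"},
		Spec: v1.ServiceSpec{
			Selector: map[string]string{"app": "backend"},
			Ports: []v1.ServicePort{
				{Name: "tcp", Protocol: v1.ProtocolTCP, Port: 80, TargetPort: intstr.FromInt(8080)},
				{Name: "udp", Protocol: v1.ProtocolUDP, Port: 80, TargetPort: intstr.FromInt(8080)},
			},
		},
	}
}
```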
Change-Id: Ic4f3a6509e332aa5694d20dfc3b223d7063a7871
Test 2 scenarios:
- a pod can connect to a terminating pod
- terminating pod can connect to other pods
Change-Id: Ia5dc4e7370cc055df452bf7cbaddd9901b4d229d
v1.Container is still changing a lot, which caused the test to fail each time a
new field was added. To test loading, it's better to use something that is
unlikely to change. The runtimev1.VersionResponse gets logged by the kubelet
and seems to be stable.
The benchmarks and unit tests were written so that they used custom APIs for
each log format. This made them less realistic because there were subtle
differences between the benchmark and a real Kubernetes component. Now all
logging configuration is done with the official
k8s.io/component-base/logs/api/v1.
To make the different test cases more comparable, "messages/s" is now reported
instead of the generic "ns/op".
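A sketch of how a Go benchmark can emit that unit via
testing.B.ReportMetric; the logger setup here is illustrative:

```go
package logbench

import (
	"testing"
	"time"

	"k8s.io/klog/v2"
)

func BenchmarkMessages(b *testing.B) {
	logger := klog.Background() // stand-in for the configured logger
	start := time.Now()
	for i := 0; i < b.N; i++ {
		logger.Info("ping", "iteration", i)
	}
	// Report throughput in a unit that is comparable across test cases,
	// instead of relying on the generic ns/op.
	b.ReportMetric(float64(b.N)/time.Since(start).Seconds(), "messages/s")
}
```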
When trying again with recent log files from the CI job, it was found that some
JSON messages get split across multiple lines, both in container logs and in
the systemd journal:
2022-12-21T07:09:47.914739996Z stderr F {"ts":1671606587914.691,"caller":"rest/request.go:1169","msg":"Response ...
2022-12-21T07:09:47.914984628Z stderr F 70 72 6f 78 79 10 01 1a 13 53 ... \".|\n","v":8}
Note the different time stamp on the second line. That first line is
long (17384 bytes). This seems to happen because the data must pass through a
stream-oriented pipe and thus may get split up by the Linux kernel.
The implication is that lines must get merged whenever the JSON decoder
encounters an incomplete line. The benchmark loader now supports that. To
simplify this, stripping the non-JSON line prefixes must be done before using
a log as test data.
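A minimal sketch of that merging step, assuming the non-JSON prefixes were
already stripped:

```go
package loader

import "encoding/json"

// mergeSplitLines concatenates consecutive lines until they form valid
// JSON, compensating for messages that the kernel split across multiple
// writes to the stream-oriented pipe.
func mergeSplitLines(lines []string) []string {
	var merged []string
	pending := ""
	for _, line := range lines {
		pending += line
		if json.Valid([]byte(pending)) {
			merged = append(merged, pending)
			pending = ""
		}
	}
	if pending != "" {
		merged = append(merged, pending) // incomplete trailing data
	}
	return merged
}
```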
The updated README explains how to strip those prefixes when downloading a CI
job result. The amount of manual work gets reduced by committing symlinks
under data to the expected location under ci-kubernetes-kind-e2e-json-logging
and ignoring them when the data is not there.
Support for symlinks gets removed and path/filepath is used instead of path
because it has better Windows support.
A Service can use multiple EndpointSlices for its backend. When using
custom EndpointSlices, the data plane should forward traffic to any of
the endpoints in the EndpointSlices that belong to the Service.
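For reference, a custom EndpointSlice is tied to its Service via the
kubernetes.io/service-name label; a minimal sketch, with names and the
address as placeholders:

```go
package e2e

import (
	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// customEndpointSlice builds an EndpointSlice that belongs to the Service
// named "svc" through the kubernetes.io/service-name label.
func customEndpointSlice() *discoveryv1.EndpointSlice {
	return &discoveryv1.EndpointSlice{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "svc-custom-1",
			Labels: map[string]string{discoveryv1.LabelServiceName: "svc"},
		},
		AddressType: discoveryv1.AddressTypeIPv4,
		Endpoints: []discoveryv1.Endpoint{
			{Addresses: []string{"10.0.0.10"}},
		},
	}
}
```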
Change-Id: I80b42522bf6ab443050697a29b94d8245943526f
* migrating pkg/controller/serviceaccount to contextual logging
Signed-off-by: Naman <namanlakhwani@gmail.com>
* small nit
Signed-off-by: Naman <namanlakhwani@gmail.com>
* capitalising first letter of error
Signed-off-by: Naman <namanlakhwani@gmail.com>
* addressed review comments
Signed-off-by: Naman <namanlakhwani@gmail.com>
* small nit to add key
Signed-off-by: Naman <namanlakhwani@gmail.com>
---------
Signed-off-by: Naman <namanlakhwani@gmail.com>
* migrated controller/replicaset to contextual logging
Signed-off-by: Naman <namanlakhwani@gmail.com>
* small nits
Signed-off-by: Naman <namanlakhwani@gmail.com>
* addressed changes
Signed-off-by: Naman <namanlakhwani@gmail.com>
* small nit
Signed-off-by: Naman <namanlakhwani@gmail.com>
* taking t as input
Signed-off-by: Naman <namanlakhwani@gmail.com>
---------
Signed-off-by: Naman <namanlakhwani@gmail.com>
Bump the timeout, as the previous timeout was sometimes too short,
resulting in the pod status update not being sent. Also, fix a typo from a
previous refactor.
Signed-off-by: David Porter <david@porter.me>
Breakdown of the steps implemented as part of this e2e test is as follows:
1. Create a file `registration` at path `/var/lib/kubelet/device-plugins/sample/`
2. Create the sample device plugin with the environment variable
`REGISTER_CONTROL_FILE=/var/lib/kubelet/device-plugins/sample/registration` set,
so that it waits for a client to delete the control file.
3. Trigger plugin registration by deleting the above-mentioned control file.
4. Create a test pod requesting devices exposed by the device plugin.
5. Stop kubelet.
6. Remove pods using CRI to ensure new pods are created after kubelet restart.
7. Restart kubelet.
8. Wait for the sample device plugin pod to be running. In this case,
the registration is not triggered.
9. Ensure that resource capacity/allocatable exported by the device plugin is zero.
10. The test pod should fail with `UnexpectedAdmissionError`.
11. Delete the test pod.
12. Delete the sample device plugin pod.
13. Remove `/var/lib/kubelet/device-plugins/sample/` and its contents, i.e. the
directory created to control registration.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
This commit reuses e2e tests implemented as part of https://github.com/kubernetes/kubernetes/pull/110729.
The commit is borrowed from the aforementioned PR as-is to preserve
authorship. A subsequent commit will update the end-to-end test to
simulate the problem this PR is trying to solve by reproducing
issue 109595.
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Add node e2e test to verify that static pods can be started after a
previous static pod with the same config temporarily failed termination.
The scenario is:
1. Static pod is started
2. Static pod is deleted
3. Static pod termination fails (internally `syncTerminatedPod` fails)
4. At later time, pod termination should succeed
5. New static pod with the same config is (re)-added
6. New static pod is expected to start successfully
To reproduce this scenario, set up a pod using an NFS mount. The NFS server is
stopped, which will result in volumes failing to unmount and
`syncTerminatedPod` failing. The NFS server is later started, allowing
the volume to unmount successfully.
xref:
1. https://github.com/kubernetes/kubernetes/pull/113145#issuecomment-1289587988
2. https://github.com/kubernetes/kubernetes/pull/113065
3. https://github.com/kubernetes/kubernetes/pull/113093
Signed-off-by: David Porter <david@porter.me>
Add a node e2e to verify that if a static pod is terminated while the
container runtime or CRI returns an error, the pod is eventually
terminated successfully.
This test serves as a regression test for k8s.io/issue/113145 which
fixes an issue where force deleted pods may not be terminated if the
container runtime fails during a `syncTerminatingPod`.
To test this behavior, start a static pod, stop the container runtime,
and later start the container runtime. The static pod is expected to
eventually terminate successfully.
To start and stop the container runtime, we need to find the container
runtime systemd unit name. Introduce a util function
`findContainerRuntimeServiceName` which finds the unit name by getting
the pid of the container runtime from the existing
`ContainerRuntimeProcessName` flag passed into node e2e and using
systemd dbus `GetUnitNameByPID` function to convert the pid of the
container runtime to a unit name. Using the unit name, introduce helper
functions to start and stop the container runtime.
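A condensed sketch of that flow with github.com/coreos/go-systemd/v22/dbus;
error handling and the pid lookup from the flag are simplified:

```go
package util

import (
	"context"

	"github.com/coreos/go-systemd/v22/dbus"
)

// findServiceNameByPID maps the container runtime's pid to its systemd
// unit name using the GetUnitNameByPID D-Bus call.
func findServiceNameByPID(ctx context.Context, pid uint32) (string, error) {
	conn, err := dbus.NewWithContext(ctx)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.GetUnitNameByPID(ctx, pid)
}

// stopUnit stops the unit and waits for systemd to report the job result.
func stopUnit(ctx context.Context, unit string) error {
	conn, err := dbus.NewWithContext(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()
	done := make(chan string, 1)
	if _, err := conn.StopUnitContext(ctx, unit, "replace", done); err != nil {
		return err
	}
	<-done // e.g. "done", "failed", "timeout"
	return nil
}
```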
Signed-off-by: David Porter <david@porter.me>
The existing path is incorrect (it is missing the `sample-device-plugin`
directory) and is thus causing test failures. The full path should be
`test/e2e/testing-manifests/sample-device-plugin/sample-device-plugin.yaml`.
Signed-off-by: David Porter <david@porter.me>
Update go-jose from v2.2.2 to v2.6.0.
This is to make the kubernetes code compatible with newer go-jose versions that have a small breaking change (`jwt.NewNumericDate()` returns a pointer).
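A small sketch of the affected call site, assuming the v2.6.0 field types:

```go
package claims

import (
	"time"

	"gopkg.in/square/go-jose.v2/jwt"
)

func exampleClaims(now time.Time) jwt.Claims {
	return jwt.Claims{
		// In v2.6.0, jwt.NewNumericDate returns *jwt.NumericDate and the
		// Claims fields are pointers; in v2.2.2 both were value types,
		// so callers storing the result need a small update.
		IssuedAt: jwt.NewNumericDate(now),
		Expiry:   jwt.NewNumericDate(now.Add(time.Hour)),
	}
}
```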
Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
This improves performance of the text formatting and ktesting.
Because ktesting no longer buffers messages by default, one unit
test needs to ask for that explicitly.
When running as part of the scheduler_perf benchmark testing, we want to print
less information by default, so we should use V to limit verbosity.
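For example, a detail message guarded with V only shows up at higher
verbosity; the level used here is illustrative:

```go
package perf

import "k8s.io/klog/v2"

// reportProgress is only visible when running with -v=4 or higher, so the
// default scheduler_perf output stays terse.
func reportProgress(logger klog.Logger, scheduled, total int) {
	logger.V(4).Info("scheduling progress", "scheduled", scheduled, "total", total)
}
```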
Pretty-printing doesn't belong in "application" code. I am moving that into
the ktesting formatting (https://github.com/kubernetes/kubernetes/pull/116180).
Update the sample device plugin to enable the e2e node tests (or any
other entity with full access to the node filesystem) to control the
registration process. We add a new environment variable `REGISTER_CONTROL_FILE`.
The value of this variable must be a file which prevents the plugin
from registering itself while it is present. Once the file is removed, the
plugin will go on and complete the registration. The plugin will
automatically detect the parent directory in which the file resides and
watch for deletions, unblocking the registration process. If the file is
specified but inaccessible, the plugin will fail. If the file is not
specified, the registration process will progress as usual and never
pause. The plugin will need read access to the parent directory.
This feature is useful because it is not possible to control the order
in which the pods are recovered after node reboot/kubelet restart.
In this approach, the testing environment will create a directory and
then an empty file inside it to pause the registration process of the
plugin. Once pointed at that file, the plugin will start and wait for it
to be deleted. Only after the file has been deleted will the plugin
proceed with registration.
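A condensed sketch of that wait, assuming github.com/fsnotify/fsnotify for
the directory watch:

```go
package plugin

import (
	"os"
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// waitForControlFileRemoval blocks until the control file disappears. It
// watches the parent directory, which the plugin needs read access to.
func waitForControlFileRemoval(controlFile string) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()
	if err := watcher.Add(filepath.Dir(controlFile)); err != nil {
		return err // inaccessible file/directory: fail, per the contract above
	}
	// Re-check after arming the watch so a deletion that happened in the
	// meantime is not missed.
	if _, err := os.Stat(controlFile); os.IsNotExist(err) {
		return nil
	}
	for event := range watcher.Events {
		if event.Name == controlFile && event.Op&fsnotify.Remove != 0 {
			return nil
		}
	}
	return nil
}
```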
This feature is used in #114640, where an e2e test is implemented to
simulate scenarios where application pods requesting devices come up before
the device plugin pod on node reboot/kubelet restart.
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
This replaces the pretty useless us/op metric (useless because it includes
setup and teardown times) with the same values that also get stored in the JSON
file.
The main advantage is that benchstat can be used to analyze and compare
results.
The upstream ktesting has to be very flexible to accommodate different ways of
using it. In Kubernetes, we can be opinionated and make certain choices, like
using klog flags, and only those.
In contrast to EtcdMain, it can be called by individual tests or benchmarks and
each caller will get a fresh etcd instance. However, it uses the same
underlying code and the same port for all instances, so tests cannot run in
parallel.
Add the following ginkgo flags for each node e2e run, similar to the
existing hack/ginkgo-e2e.sh script.
* --no-color, colors aren't rendered properly in prow and make examining
the log in text editors more difficult, so let's disable them.
`hack/ginkgo-e2e.sh` (used for kind e2e tests) also disables them
already.
* -v, enable verbose logs. This is needed so we get more detailed info
  even when the tests pass, which is useful for comparing successful runs
  to failed runs.
Signed-off-by: David Porter <david@porter.me>
When running node e2e with multiple machine images, the tests are run
separately for each node. The final build log has the results for all of
the hosts combined together, which makes debugging the log difficult. To
make it easier, emit a separate log for each host that was run.
This log will be written to the results directory and uploaded as an
artifact in prow jobs.
Signed-off-by: David Porter <david@porter.me>
This was never used: the only config that referenced it was deleted in
https://github.com/kubernetes/test-infra/pull/26017, so we don't need it
anymore. Let's delete it.
Signed-off-by: David Porter <david@porter.me>
1. Scheduler bug fix + scheduler-focused E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes.
Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature.
5. Fix corner-case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources
Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
1. Define ContainerResizePolicy and add it to Container struct.
2. Add ResourcesAllocated and Resources fields to ContainerStatus struct.
3. Define ResourcesResizeStatus and add it to PodStatus struct.
4. Add InPlacePodVerticalScaling feature gate and drop disabled fields.
5. ResizePolicy validation & defaulting and Resources mutability for CPU/Memory.
6. Various fixes from code review feedback (originally committed on Apr 12, 2022)
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources
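For orientation, a deliberately simplified sketch of the shape of these
additions; the exact type and field names in the real API are defined by
the KEP and may differ from this simplification:

```go
package sketch

// Item 1: per-resource policy on a Container describing how it reacts to
// a resize of that resource (hypothetical value strings).
type ContainerResizePolicy struct {
	ResourceName string // "cpu" or "memory"
	Policy       string // e.g. "RestartNotRequired" or "Restart"
}

// Item 2: ContainerStatus gains the node-level allocation and the
// resources actually configured for the running container.
type ContainerStatusAdditions struct {
	ResourcesAllocated map[string]string
	Resources          map[string]string
}

// Item 3: PodStatus gains a resize status field.
type ResourcesResizeStatus string
```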
All wrappers except for ExpectNoError are identical to their gomega
counterparts. The only advantage that they have is that their invocations are
shorter.
That advantage does not outweigh their disadvantages:
- cannot be used in combination with gomega.Eventually/Consistently
- not a full replacement for gomega, so we just end up using both
- don't support passing a stack offset and thus cannot be used in helper
functions
- ginkgolinter does not work for them, so sub-optimal calls like this one
are not reported:
framework.ExpectEqual(len(items), 0)
->
gomega.Expect(items).To(gomega.BeEmpty())
- developers try to make do with what's available in the framework, leading
to sub-optimal checks like this:
framework.ExpectEqual(true, strings.Contains(event.Message, expectedEventError), "Event error should indicate non-root policy caused container to not start")
->
gomega.Expect(event.Message).To(gomega.ContainSubstring(expectedEventError), "Event error should indicate non-root policy caused container to not start")
So let's remove these wrappers. As a first step, they get marked as
deprecated. This enables stricter linting
(https://github.com/kubernetes/kubernetes/pull/109728), once enabled, to
report new code that uses them.
All of these issues were reported by https://github.com/nunnatsa/ginkgolinter.
Fixing these issues is useful (several expressions get simpler, using
framework.ExpectNoError is better because it has additional support for
failures) and a necessary step for enabling that linter in our golangci-lint
invocation.
* refine: the server-side http Request Body is always non-nil
* revert changes under vendor
* Update staging/src/k8s.io/pod-security-admission/cmd/webhook/server/server.go
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
* Update main.go
---------
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
Fix the waiting logic in the e2e test loop to wait
for resources to be reported again instead of basing the logic on
timestamps. The idea is that waiting for resource availability
is the canonical way clients should observe the desired state,
and it should also be more robust than comparing timestamps,
especially in CI environments.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Start consolidating the sample device plugin utilities
and constants in a central place, because we need
to use them in different e2e tests.
Having a central dependency is better than a maze of
entangled e2e tests depending on each other's helpers.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The podresources e2e tests want to exercise the case in which a device
plugin doesn't report topology affinity. The only known device plugin
which exhibited this behavior and didn't depend on specialized hardware
was the kubevirt device plugin, which was however deprecated after
we started using it.
So the e2e tests are now broken, and in any case they can't depend on
unmaintained code that is liable to become obsolete.
To unblock the situation and preserve some e2e signal, we switch to the
sample device plugin, which is a stub implementation managed in-tree, so
we can maintain it and ensure it fits the e2e test use case.
This is however a regression, because e2e tests should try their hardest
to use real devices and avoid any mocking or faking.
The upside is that using an OS-neutral device plugin for the tests enables
us to run on all the supported platforms (Windows!), so this could allow
us to transition these tests to conformance.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Rename getPodResources for clarity. Allow it to return an error
(instead of using ginkgo expectations), so it can actually be used
as intended inside `Eventually` blocks without blowing up at the
first failure.
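The resulting pattern, sketched with gomega; the helper names and timeouts
are illustrative:

```go
package e2e

import (
	"context"
	"testing"
	"time"

	"github.com/onsi/gomega"
)

// Because the getter returns an error instead of asserting internally,
// Eventually can simply retry it instead of aborting the whole spec on
// the first failure.
func TestPodResourcesEventually(t *testing.T) {
	g := gomega.NewWithT(t)
	ctx := context.Background()

	getPodResourceCount := func(ctx context.Context) (int, error) {
		// hypothetical stand-in for the real getPodResources helper
		return 1, nil
	}

	g.Eventually(ctx, getPodResourceCount).
		WithTimeout(2 * time.Minute).
		WithPolling(10 * time.Second).
		Should(gomega.BeNumerically(">", 0))
}
```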
Signed-off-by: Francesco Romani <fromani@redhat.com>
`f framework.Framework` does not need to be global; it's used in only a few
places.
This fixes vSphereDriver.PrepareTest() in in_tree.go, which schedules a
ginkgo.DeferCleanup() that uses the global `f` variable even though its
value is no longer valid at the time of the ginkgo cleanup.
We should only use this env var for `arm`; since `arm64` is fully
supported by the etcd folks, let us drop it!
(e.g. https://github.com/etcd-io/etcd/releases/tag/v3.5.6)
The ppc64le comment should be dropped as well.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
The check for "resources available on a node" must treat nodes that are not
listed as "no resources available". The previous logic only worked because all
nodes were listed during E2E testing. The upcoming integration testing is
covering additional scenarios and triggered this broken case.
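The required semantics, sketched; the names are illustrative:

```go
package scheduler

// availableOnNode treats a node that is absent from the listing as having
// no resources available, instead of assuming it was simply not updated.
func availableOnNode(listed map[string]int64, node string) int64 {
	if avail, ok := listed[node]; ok {
		return avail
	}
	return 0 // unlisted node: no resources available
}
```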
It provides more readable output and has additional APIs for using it inside a
unit test. goleak.IgnoreCurrent is needed to filter out the goroutine that gets
started when importing go.opencensus.io/stats/view.
In order to handle background goroutines that get created on demand and cannot
be stopped (like the one for LogzHealth), a helper function ensures that those
are running before calling goleak.IgnoreCurrent. Keeping those goroutines
running is not a problem and thus not worth the effort of adding new APIs to
stop them.
Other goroutines are genuine leaks for which no fix is available. Those get
suppressed via IgnoreTopFunction, which works as long as that function
is unique enough.
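How these pieces fit together with go.uber.org/goleak, sketched; the
suppressed function name below is a hypothetical placeholder:

```go
package etcd

import (
	"testing"

	"go.uber.org/goleak"
)

func TestMain(m *testing.M) {
	// Any on-demand background goroutines should be started before this
	// point so that IgnoreCurrent records them.
	goleak.VerifyTestMain(m,
		// Ignore everything that is already running, e.g. the goroutine
		// started when importing go.opencensus.io/stats/view.
		goleak.IgnoreCurrent(),
		// Suppress a genuine leak without a fix; the function name must
		// be unique enough for the match to be safe (placeholder here).
		goleak.IgnoreTopFunction("example.com/somepkg.(*Worker).run"),
	)
}
```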
Example output for the leak fixed in https://github.com/kubernetes/kubernetes/pull/115423:
E0202 09:30:51.641841 74789 etcd.go:205] "EtcdMain goroutine check" err=<
found unexpected goroutines:
[Goroutine 4889 in state chan receive, with k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop on top of the stack:
goroutine 4889 [chan receive]:
k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop(0xc0076183c0)
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/watch/mux.go:268 +0x65
created by k8s.io/apimachinery/pkg/watch.NewBroadcaster
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/watch/mux.go:77 +0x116
>