* Some periodic job results show that the kubelet log, along with a few other logs, is not copied to the artifacts directory after the node e2e tests are executed
* A sample error log displayed after running the tests:
```
I1031 13:15:49.056897 40204 ssh.go:146] Running the command ssh, with args: [-o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o CheckHostIP=no -o StrictHostKeyChecking=no -o ServerAliveInterval=30 -o LogLevel=ERROR -i /home/svanka/.ssh/google_compute_engine core@35.185.108.51 -- sudo ls core@35.185.108.51:/tmp/node-e2e-20231031T125637/results/*.log]
E1031 13:16:15.346641 40204 ssh.go:149] failed to run SSH command: out: ls: cannot access 'core@35.185.108.51:/tmp/node-e2e-20231031T125637/results/*.log': No such file or directory
, err: exit status 2
```
* This change fixes the above issue so that the required test artifacts are gathered once test execution completes
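The log above shows the root cause: the remote `ls` receives an scp-style `user@host:` prefix glued onto the path. A minimal sketch of the corrected call follows; the helper and variable names are assumptions, not the actual ones in test/e2e_node:
```
package main

import (
	"os/exec"
	"path/filepath"
)

// listRemoteLogs runs "ls" on the remote results directory. The
// "user@host" target belongs to the ssh argument list; ls must get the
// bare remote path. Before the fix the path was effectively
// target+":"+logGlob, which ls on the remote host cannot resolve.
func listRemoteLogs(target, resultsDir string) ([]byte, error) {
	logGlob := filepath.Join(resultsDir, "*.log") // e.g. /tmp/node-e2e-.../results/*.log
	return exec.Command("ssh", target, "--", "sudo", "ls", logGlob).CombinedOutput()
}
```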
Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
This test depends on CDI support in the runtime and doesn't work
with out-of-the-box Containerd. Marking it as a NodeSpecialFeature
should fix the Containerd CI job failures.
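For context, a minimal sketch of how such a tag typically appears in a node e2e spec (the exact tag value here is an assumption):
```
import "github.com/onsi/ginkgo/v2"

// The "[NodeSpecialFeature:...]" tag in the spec text lets default
// Containerd CI jobs exclude the test via their skip regex.
var _ = ginkgo.It("should pass CDI devices to the container [NodeSpecialFeature:CDI]", func() {
	// test body elided
})
```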
update pod replacement policy feature flag comment and refactor the e2e test for pod replacement policy
minor fixes for pod replacement policy and e2e test
fix wrong assertions for pod replacement policy e2e test
more fixes to pod replacement policy e2e test
refactor PodReplacementPolicy e2e test to use finalizers
fix unit tests when pod replacement policy feature flag is promoted to beta
fix podgc controller unit tests when pod replacement feature is enabled
fix lint issue in pod replacement policy e2e test
assert no error in defer function for removing finalizer in pod replacement policy e2e test
implement test using a sh trap for pod replacement policy
reduce sleep after SIGTERM in pod replacement policy e2e test to 5s
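For reference, a minimal sketch of the trap-based container command from the commits above (the 5s sleep matches the last item; the surrounding names are assumptions):
```
import v1 "k8s.io/api/core/v1"

// Sketch: the container traps SIGTERM, sleeps 5s, then exits cleanly,
// giving the test a window to observe the terminating pod before its
// replacement is created. "sleep 1000 & wait" (instead of a plain sleep)
// lets the shell run the trap as soon as the signal arrives.
var container = v1.Container{
	Name:    "main",
	Image:   "busybox",
	Command: []string{"sh", "-c", "trap 'sleep 5; exit 0' TERM; sleep 1000 & wait"},
}
```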
Linting together with an upcoming klog update finds this problem:
```
test/images/sample-device-plugin/sampledeviceplugin.go:165:4: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
	klog.Errorf("Failed to add watch to %q: %w", triggerPath, err)
	^
```
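Since klog's printf-style functions format but do not wrap errors, the straightforward fix is to log with `%v` (a sketch of the flagged line):
```
// %w is only meaningful for fmt.Errorf; klog.Errorf does not wrap,
// so %v is the correct verb here.
klog.Errorf("Failed to add watch to %q: %v", triggerPath, err)
```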
It looks like the test or the branch is never executed, because it wouldn't
pass: a []v1.HostIP value is never the same as a []string. Found by the
upcoming ginkgolinter update.
```
ERROR: test/e2e_node/pod_host_ips.go:167:45: ginkgo-linter: use Equal with different types: Comparing []k8s.io/api/core/v1.HostIP with []string; either change the expected value type if possible, or use the BeEquivalentTo() matcher, instead of Equal() (ginkgolinter)
ERROR: 	gomega.Expect(p.Status.HostIPs).Should(gomega.Equal(nodeIPs))
ERROR: 	^
```
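One possible fix, sketched with the variable names from the quoted assertion: convert the expected `[]string` to `[]v1.HostIP` so both sides of the comparison have the same type.
```
// Build the expected slice with the same element type as the field
// under test, then Equal compares like with like.
expected := make([]v1.HostIP, 0, len(nodeIPs))
for _, ip := range nodeIPs {
	expected = append(expected, v1.HostIP{IP: ip})
}
gomega.Expect(p.Status.HostIPs).Should(gomega.Equal(expected))
```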
It looks like the test is never executed, because it wouldn't pass: an int32
value is never the same as an int 0. Found by the upcoming ginkgolinter update.
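The usual fix is to give the literal the same type as the value under test; a generic sketch (names assumed):
```
var ready int32 = readyPods() // assumed helper returning int32
// gomega.Equal(0) would never match an int32; type the literal instead.
gomega.Expect(ready).To(gomega.Equal(int32(0)))
```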
Container runtimes like CRI-O actually show the image identifier in the
`ImageID` field rather than the repo digest. For the digest we already
have the `Image` field. We still allow the digest in the `ImageID` field
for historic reasons.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
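For reference, a sketch of reading the two fields discussed above from a pod's container statuses:
```
import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// printImageInfo shows the distinction: Image carries the image
// reference/digest, while ImageID may be a runtime-specific image
// identifier (as with CRI-O) or, for historic reasons, a repo digest.
func printImageInfo(pod *v1.Pod) {
	for _, s := range pod.Status.ContainerStatuses {
		fmt.Println("Image:  ", s.Image)
		fmt.Println("ImageID:", s.ImageID)
	}
}
```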
Currently, the downward API tests flake on Windows with a failure
to allocate memory when starting the agnhost binary used in these
tests. The tests are spawning pods with a memory limit of 64MB,
which is a bit on the low side for a Windows Pod, even if it's
a nanoserver-based image.
This increases the memory limit to 128MB; the primary goal of the tests
is not to enforce and test the limits, but to check that these details
are projected into the Pod.
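A minimal sketch of the adjusted limit in a pod spec (variable names assumed):
```
import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// 128Mi leaves enough headroom for agnhost on a nanoserver-based image
// while still exercising the downward API projection of the limit.
var resources = v1.ResourceRequirements{
	Limits: v1.ResourceList{
		v1.ResourceMemory: resource.MustParse("128Mi"),
	},
}
```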
* cleanup: refactor pod replacement policy integration test into staged assertion
* cleanup: remove typo in job_test.go
* refactor PodReplacementPolicy test and remove test for defaulting the policy
* fix issue with missing update in job controller for terminating status and refactor pod replacement policy integration test
* use t.Cleanup instead of defer in PodReplacementPolicy integration tests
* revert t.Cleanup to defer for resetting feature flag in PodReplacementPolicy integration tests
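The defer-based reset mentioned in the last item typically looks like the sketch below; the exact helper signature has varied across releases, so treat the call as an assumption:
```
// At the time, SetFeatureGateDuringTest returned a restore func, so
// deferring its result resets the flag when the test itself unwinds,
// rather than in t.Cleanup ordering.
defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.JobPodReplacementPolicy, true)()
```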
Currently there is ambiguity in the tests about whether the host setup
refers to CPUs or cores. This commit disambiguates that.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
* Promote plugin resolution to beta
* Do not use plugins for kubectl create -f command execution
`kubectl create -f` is a legitimate command invocation, and we shouldn't
search for plugins when the user invokes it.
* Add integration test for plugin resolution for create command
* Reintroduce the feature flag so the behavior can be disabled explicitly
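A rough sketch of the intended lookup order (not the actual kubectl code; `tryPluginHandler` is an assumed helper): plugins are consulted only when no built-in command matches.
```
// If cobra resolves a built-in command (like "create"), execute it;
// only otherwise fall back to plugin resolution.
if cmd, _, err := rootCmd.Find(os.Args[1:]); err == nil && cmd != rootCmd {
	return rootCmd.Execute()
}
return tryPluginHandler(os.Args[1:])
```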
There are some tests which want to insert a tag before the main Describe text,
for example:
    sigDescribe("[Feature:Windows] Cpu Resources [Serial]",
        skipUnlessWindows(func() { ... }))
In order to support this without changing existing test names, it must be
possible to do this instead:
    sigDescribe(feature.Windows, "Cpu Resources", framework.WithSerial(),
        skipUnlessWindows(func() { ... }))
There are similar examples for the other functions.
While at it, replace one left-over panic with ReportBug and add the missing
`NodeFeature:` prefix.
The sysctl tests have to be skipped when the node components are running in UserNS,
because the tests fail due to `open /proc/sys/kernel/shm_rmid_forced: permission denied`
(as expected).
Can be verified with Rootless kind (https://kind.sigs.k8s.io/docs/user/rootless/):
```
dockerd-rootless-setuptool.sh install
: The following steps are added because 'kubetest2 kind --build' does not seem to build e2e.test and ginkgo
make WHAT=test/e2e/e2e.test
make ginkgo
cp -f _output/bin/{e2e.test,ginkgo} _output/dockerized/bin/linux/amd64
kubetest2 kind --build --up --down --test=ginkgo -- \
--use-built-binaries \
--focus-regex='\[NodeConformance\]' \
--skip-regex='\[Environment:NotInUserNS\]'
```
Tested with the following host environment:
- kubernetes-sigs/kind@ac28d7fb19 (main)
- kubernetes-sigs/kubetest2@89f09b65e8 (master)
- Docker 24.0.6
- Ubuntu 22.04 amd64, kernel 5.15
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Using the feature gate has the advantage that the stability tag gets added
automatically and no changes are needed when graduating it. Once it reaches GA,
the tag needs to be removed together with the feature gate.
In fact, the current `[ALPHA]` is already wrong: the feature has already
graduated to beta...
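A sketch of the gate-based tagging, assuming the e2e framework's WithFeatureGate helper (the exact name and signature may differ; the feature name is a placeholder):
```
// The framework derives the stability tag from the gate's current stage,
// so the test needs no changes when the feature graduates.
var _ = framework.It("supports the new behaviour", framework.WithFeatureGate(features.SomeFeature), func(ctx context.Context) {
	// test body elided
})
```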
After a CRD or an APIService was deleted, the corresponding group was
never unregistered. This left a stale entry in the root path and could
potentially lead to a memory leak, as the groupDiscoveryHandler was
never released and handledGroups was never cleaned up.
The commit implements the cleanup: it tracks each group's usage and
unregisters a group once no versions remain for it.
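A minimal sketch of the tracking described above, with assumed names:
```
import "sync"

// groupRegistry counts how many versions are registered per API group;
// a group's discovery handler is unregistered once its count drops to
// zero, releasing the handler and cleaning up handledGroups.
type groupRegistry struct {
	mu    sync.Mutex
	usage map[string]int
}

func (r *groupRegistry) addVersion(group string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.usage[group]++
}

func (r *groupRegistry) removeVersion(group string, unregister func(string)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.usage[group]--
	if r.usage[group] <= 0 {
		delete(r.usage, group)
		unregister(group)
	}
}
```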
Signed-off-by: Quan Tian <qtian@vmware.com>
* Job: Handle error returned from AddEventHandler function
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Use an error message similar to CronJob's
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Clean up error messages
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.T on the second place in the args for the newControllerFromClient function
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.T on the second place in the args for the newControllerFromClientWithClock function
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Call t.Helper()
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.TB on the second place in the args for the createJobControllerWithSharedInformers function and call tb.Helper() there
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.TB on the second place in the args for the startJobControllerAndWaitForCaches function and call tb.Helper() there
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Adapt TestFinalizerCleanup to the event handler error
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
---------
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
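The pattern from the first item, sketched with client-go (the informer and handler names are assumptions):
```
// AddEventHandler started returning an error in newer client-go; the
// controller now surfaces it instead of ignoring the return values.
if _, err := podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc:    jm.addPod,
	UpdateFunc: jm.updatePod,
	DeleteFunc: jm.deletePod,
}); err != nil {
	return nil, fmt.Errorf("adding Pod event handler: %w", err)
}
```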
TestSampleAPIServer tried to validate APIService deletion, but it used
an unmatched selector to delete and list APIServices, which essentially
validated nothing.
Signed-off-by: Quan Tian <qtian@vmware.com>
Some test cases can make nodes not ready and use DeferCleanup to bring
nodes back online. Checking whether all nodes are online would fail
in such cases, as AfterEach runs before DeferCleanup.
Moving the node readiness check to DeferCleanup should solve this
issue, as nodes would be brought back to a `Ready` state before the
check runs.
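A sketch of the reordering (helper names assumed): registering the check via DeferCleanup in the suite setup makes it run after the per-test cleanups that restore the nodes.
```
// DeferCleanup callbacks run LIFO after AfterEach, so a check registered
// early runs only after later-registered cleanups have restored the nodes.
ginkgo.DeferCleanup(func(ctx context.Context) {
	framework.ExpectNoError(e2enode.AllNodesReady(ctx, f.ClientSet, 5*time.Minute))
})
```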
This is a workaround for the issue that the kubelet cannot differentiate
the container statuses of the previous podSandbox from the current one.
If the node is rebooted, all containers will be in the exited state and
the kubelet will try to recreate a new podSandbox. In this case, the
kubelet should not mistakenly think that the newly created podSandbox
has been initialized.
If the user specifies the intent to control the registration process, we rely on
registration triggers (deletion of the control file) to prompt registration.
This behaviour is expected to be consistent across kubelet restarts, and therefore
across the watch calls where we watch for changes to the unix socket, so we make
this part of the Stub object instead of a parameter.
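A rough sketch of the trigger handling with fsnotify (the control-file variable and registration helper are assumed names):
```
// Deletion of the control file is the signal to (re-)register with the
// kubelet; keeping the watch in the stub itself makes the behaviour
// consistent across kubelet restarts.
watcher, err := fsnotify.NewWatcher()
if err != nil {
	return err
}
defer watcher.Close()
if err := watcher.Add(filepath.Dir(controlFile)); err != nil {
	return err
}
for ev := range watcher.Events {
	if ev.Name == controlFile && ev.Op&fsnotify.Remove != 0 {
		registerWithKubelet() // assumed helper
	}
}
```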
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In case `REGISTER_CONTROL_FILE` is specified, we want to ensure that
registration is triggered by deletion of the control file. This applies
both when registration happens for the first time and to subsequent
registrations caused by kubelet restarts.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In issue 115107 we added an environment variable to control the registration of the sample
device plugin to kubelet. The intent of this patch is to ensure that the default
behaviour of the plugin is to register with kubelet (in case no environment
variable is specified).
In addition, we want to ensure that the plugin does not register itself just once:
it should re-register itself with kubelet in case of a node reboot or kubelet restart.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>