* It is observed in some of the periodic job results that the kubelet log, along with a few other logs,
is not copied to the artifacts directory once the node e2e tests are executed
* The following sample error log is displayed once the tests are run
```
I1031 13:15:49.056897 40204 ssh.go:146] Running the command ssh, with args: [-o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o CheckHostIP=no -o StrictHostKeyChecking=no -o ServerAliveInterval=30 -o LogLevel=ERROR -i /home/svanka/.ssh/google_compute_engine core@35.185.108.51 -- sudo ls core@35.185.108.51:/tmp/node-e2e-20231031T125637/results/*.log]
E1031 13:16:15.346641 40204 ssh.go:149] failed to run SSH command: out: ls: cannot access 'core@35.185.108.51:/tmp/node-e2e-20231031T125637/results/*.log': No such file or directory
, err: exit status 2
```
* This change fixes the above issue and helps in gathering the required test artifacts once test execution is completed
Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
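A minimal sketch of the distinction behind the error above, assuming the failure comes from passing an scp-style `user@host:` path to an `ls` that already runs on the remote host. The helper names `remoteLogGlob` and `scpSource` are hypothetical and are not taken from the node e2e runner.

```go
package main

import (
	"fmt"
	"path"
)

// remoteLogGlob returns the glob to hand to `ls` when the command is already
// executed on the remote host over SSH, so it carries no "user@host:" prefix.
// Hypothetical helper, for illustration only.
func remoteLogGlob(resultsDir string) string {
	return path.Join(resultsDir, "*.log")
}

// scpSource returns an scp-style source; only there is the "user@host:"
// prefix meaningful. Hypothetical helper, for illustration only.
func scpSource(userHost, remotePath string) string {
	return fmt.Sprintf("%s:%s", userHost, remotePath)
}

func main() {
	dir := "/tmp/node-e2e-20231031T125637/results"
	fmt.Println("ls argument:", remoteLogGlob(dir))
	fmt.Println("scp source: ", scpSource("core@35.185.108.51", remoteLogGlob(dir)))
}
```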
Add retry logic to the `assertConsistentConnectivity` function from
the `test/e2e/windows/hybrid_network.go` file.
Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
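The general shape of such retry logic can be sketched as below. This is not the code added to `assertConsistentConnectivity`; the `retry` helper, its parameters, and the `errTransient` value are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a made-up error used to simulate a flaky connectivity check.
var errTransient = errors.New("connection refused")

// retry calls fn up to attempts times (attempts must be at least 1), sleeping
// for interval between failed attempts. It returns nil on the first success
// and wraps the last error otherwise.
func retry(attempts int, interval time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```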
This test depends on CDI support in a runtime and doesn't work
with the out-of-the-box Containerd. Marking it as a NodeSpecialFeature
should fix Containerd CI job failures.
update pod replacement policy feature flag comment and refactor the e2e test for pod replacement policy
minor fixes for pod replacement policy and e2e test
fix wrong assertions for pod replacement policy e2e test
more fixes to pod replacement policy e2e test
refactor PodReplacementPolicy e2e test to use finalizers
fix unit tests when pod replacement policy feature flag is promoted to beta
fix podgc controller unit tests when pod replacement feature is enabled
fix lint issue in pod replacement policy e2e test
assert no error in defer function for removing finalizer in pod replacement policy e2e test
implement test using a sh trap for pod replacement policy
reduce sleep after SIGTERM in pod replacement policy e2e test to 5s
Linting together with an upcoming klog update finds this problem:
test/images/sample-device-plugin/sampledeviceplugin.go:165:4: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
klog.Errorf("Failed to add watch to %q: %w", triggerPath, err)
^
It looks like the test or the branch is never executed, because it wouldn't
pass: a []v1.HostIP value is never the same as a []string. Found by the
upcoming ginkgolinter update.
ERROR: test/e2e_node/pod_host_ips.go:167:45: ginkgo-linter: use Equal with different types: Comparing []k8s.io/api/core/v1.HostIP with []string; either change the expected value type if possible, or use the BeEquivalentTo() matcher, instead of Equal() (ginkgolinter)
ERROR: gomega.Expect(p.Status.HostIPs).Should(gomega.Equal(nodeIPs))
ERROR: ^
It looks like the test is never executed, because it wouldn't pass: an int32
value is never the same as an int 0. Found by the upcoming ginkgolinter update.
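Both linter findings come down to strict, type-aware equality. The sketch below uses `reflect.DeepEqual` from the standard library as a stand-in for that kind of comparison. The local `HostIP` type only mirrors the shape of `v1.HostIP`, and `hostIPStrings` is a hypothetical helper, not the fix applied in the tests.

```go
package main

import (
	"fmt"
	"reflect"
)

// HostIP mirrors the shape of v1.HostIP for illustration only.
type HostIP struct {
	IP string
}

// hostIPStrings converts []HostIP to []string so that both sides of a
// comparison have the same type. Hypothetical helper.
func hostIPStrings(ips []HostIP) []string {
	out := make([]string, 0, len(ips))
	for _, ip := range ips {
		out = append(out, ip.IP)
	}
	return out
}

// strictEqual compares two values including their dynamic types.
func strictEqual(a, b interface{}) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	hostIPs := []HostIP{{IP: "10.0.0.1"}}
	nodeIPs := []string{"10.0.0.1"}

	fmt.Println(strictEqual(hostIPs, nodeIPs))                // false: different slice types
	fmt.Println(strictEqual(hostIPStrings(hostIPs), nodeIPs)) // true: same type, same content
	fmt.Println(strictEqual(int32(0), 0))                     // false: int32 vs int
	fmt.Println(strictEqual(int32(0), int32(0)))              // true
}
```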
Container runtimes like CRI-O actually show the image identifier in the
`ImageID` field rather than the repo digest. For the digest we already
have the `Image` field. We still allow the digest in the `ImageID` field
for historic reasons.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Currently, the downward API tests flake on Windows with a failure
to allocate memory when starting the agnhost binary used in these
tests. The tests are spawning pods with a memory limit of 64MB,
which is a bit on the low side for a Windows Pod, even if it's
a nanoserver-based image.
Increase the memory limit to 128MB. The primary goal of the tests
is not to enforce and test the limits, but to check whether these details
are projected into the Pod.
* cleanup: refactor pod replacement policy integration test into staged assertion
* cleanup: remove typo in job_test.go
* refactor PodReplacementPolicy test and remove test for defaulting the policy
* fix issue with missing update in job controller for terminating status and refactor pod replacement policy integration test
* use t.Cleanup instead of defer in PodReplacementPolicy integration tests
* revert t.Cleanup to defer for resetting feature flag in PodReplacementPolicy integration tests
Currently in the tests there is ambiguity in terms of host setup
when it comes to cpus or cores. This commit disambiguates that.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
* Promote plugin resolution to beta
* Do not use plugins for kubectl create -f command execution
`kubectl create -f` is a legitimate command execution and we shouldn't
search for plugins if the user invokes it.
* Add integration test for plugin resolution for create command
* Reintroduce the feature flag so that the feature can be disabled explicitly
There are some tests which want to insert a tag before the main Describe text,
for example:
sigDescribe("[Feature:Windows] Cpu Resources [Serial]",
skipUnlessWindows(func() { ... })
In order to support this without changing existing test names, it must be
possible to do this instead:
sigDescribe(feature.Windows, "Cpu Resources", framework.WithSerial(),
skipUnlessWindows(func() { ... })
There are similar examples for the other functions.
While at it, replace one left-over panic with ReportBug and add the missing
`NodeFeature:` prefix.
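How a tag placed before the main text can still yield the same test name is sketched below. The `label` type and `describeText` function are hypothetical stand-ins written for this illustration; they are not the framework's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// label is a hypothetical stand-in for typed tags such as feature.Windows
// or the value returned by framework.WithSerial().
type label string

// describeText builds the test text from its arguments in the order given:
// labels are rendered in square brackets, plain strings are kept as-is, and
// anything else (for example the test body) is ignored.
func describeText(args ...interface{}) string {
	var parts []string
	for _, arg := range args {
		switch v := arg.(type) {
		case label:
			parts = append(parts, "["+string(v)+"]")
		case string:
			parts = append(parts, v)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	text := describeText(label("Feature:Windows"), "Cpu Resources", label("Serial"), func() {})
	fmt.Println(text) // [Feature:Windows] Cpu Resources [Serial]
}
```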
The sysctl tests have to be skipped when the node components are running in UserNS,
because the tests fail due to `open /proc/sys/kernel/shm_rmid_forced: permission denied`
(as expected).
Can be verified with Rootless kind (https://kind.sigs.k8s.io/docs/user/rootless/):
```
dockerd-rootless-setuptool.sh install
: The following steps are added because 'kubetest2 kind --build' does not seem to build e2e.test and ginkgo
make WHAT=test/e2e/e2e.test
make ginkgo
cp -f _output/bin/{e2e.test,ginkgo} _output/dockerized/bin/linux/amd64
kubetest2 kind --build --up --down --test=ginkgo -- \
--use-built-binaries \
--focus-regex='\[NodeConformance\]' \
--skip-regex='\[Environment:NotInUserNS\]'
```
Tested with the following host environment:
- kubernetes-sigs/kind@ac28d7fb19 (main)
- kubernetes-sigs/kubetest2@89f09b65e8 (master)
- Docker 24.0.6
- Ubuntu 22.04 amd64, kernel 5.15
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Using the feature gate has the advantage that the stability tag gets added
automatically and no changes are needed when graduating it. Once it reaches GA,
the tag needs to be removed together with the feature gate.
In fact, the current `[ALPHA]` is already wrong: the feature has already
graduated to beta...
After a CRD or an APIService was deleted, the corresponding group was
never unregistered. This caused a stale entry to remain in the root path
and could potentially lead to a memory leak, as the groupDiscoveryHandler
was never released and the handledGroups was never cleaned up.
The commit implements the cleanup. It tracks each group's usage and
unregisters a group when no version remains for it.
Signed-off-by: Quan Tian <qtian@vmware.com>
* Job: Handle error returned from AddEventHandler function
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Use an error message similar to CronJob's
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Clean up error messages
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.T in the second position in the args for the newControllerFromClient function
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.T in the second position in the args for the newControllerFromClientWithClock function
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Call t.Helper()
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.TB in the second position in the args for the createJobControllerWithSharedInformers function and call tb.Helper() there
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.TB in the second position in the args for the startJobControllerAndWaitForCaches function and call tb.Helper() there
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Adapt TestFinializerCleanup to the eventhandler error
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
---------
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
TestSampleAPIServer tried to validate APIService deletion, but it used
an unmatched selector to delete and list APIServices, which essentially
validated nothing.
Signed-off-by: Quan Tian <qtian@vmware.com>
Some test cases can make nodes not ready and use DeferCleanup to bring
nodes back online. Checking if all nodes are online would fail
in such cases as AfterEach runs before DeferCleanup.
Scheduling the node readiness check via DeferCleanup should solve this
issue as nodes would be brought back to a `Ready` state before the
check.
This is a workaround for the issue that the kubelet cannot differentiate
the container statuses of the previous podSandbox from the current one.
If the node is rebooted, all containers will be in the exited state and
the kubelet will try to recreate a new podSandbox. In this case, the
kubelet should not mistakenly think that the newly created podSandbox
has been initialized.
If the user specifies the intent to control the registration process, we rely on
registration triggers (deletion of the control file) to prompt registration.
This behaviour is expected to be consistent across kubelet restarts and therefore
across the watch calls where we watch for changes to the unix socket, so we make
this part of the Stub object instead of a parameter.
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In case `REGISTER_CONTROL_FILE` is specified, we want to ensure that the
registration is triggered by deletion of the control file. This is
applicable both when the registration happens for the first time and
subsequent ones because of kubelet restarts.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In issue 115107 we added an environment variable to control the registration of the sample
device plugin to kubelet. The intent of this patch is to ensure that the default
behaviour of the plugin is to register to kubelet (in case no environment
variable is specified).
In addition to that, we want to ensure that the plugin registers itself not just once.
It should re-register itself to kubelet in case of node reboot or kubelet restarts.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
These were found with a modified klog that enables "go vet" to check klog call
parameters:
cmd/kubeadm/app/features/features.go:149:4: printf: k8s.io/klog/v2.Warningf format %t has arg v of wrong type string (govet)
klog.Warningf("Setting deprecated feature gate %s=%t. It will be removed in a future release.", k, v)
test/images/sample-device-plugin/sampledeviceplugin.go:147:5: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
klog.Errorf("error: %w", err)
test/images/sample-device-plugin/sampledeviceplugin.go:155:3: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
klog.Errorf("Failed to add watch to %q: %w", triggerPath, err)
staging/src/k8s.io/code-generator/cmd/prerelease-lifecycle-gen/prerelease-lifecycle-generators/status.go:207:5: printf: k8s.io/klog/v2.Fatalf does not support error-wrapping directive %w (govet)
klog.Fatalf("Package %v: unsupported %s value: %q :%w", i, tagEnabledName, ptag.value, err)
staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:286:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
klog.V(4).Infof("Node %s missing in vSphere cloud provider cache, trying node informer")
staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:302:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
klog.V(4).Infof("Node %s missing in vSphere cloud provider caches, trying the API server")
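What these mistakes produce at run time can be reproduced with `fmt.Sprintf` alone, assuming the logger formats its message in the usual printf style. The `logCall` type and `render` helper are hypothetical; the formats are held as data so that the example itself is not rejected by the same vet check.

```go
package main

import (
	"errors"
	"fmt"
)

// logCall holds a format string and its arguments as data. Hypothetical type.
type logCall struct {
	format string
	args   []interface{}
}

// render formats the call the way a printf-style logger would.
func render(c logCall) string {
	return fmt.Sprintf(c.format, c.args...)
}

// sampleCalls returns three calls shaped like the findings above:
// %t with a string argument, %w outside fmt.Errorf, and a missing argument.
func sampleCalls() []logCall {
	err := errors.New("boom")
	return []logCall{
		{format: "feature gate %s=%t", args: []interface{}{"Foo", "true"}},
		{format: "error: %w", args: []interface{}{err}},
		{format: "Node %s missing"},
	}
}

func main() {
	// Each line contains a "%!" marker where the verb could not be applied.
	for _, c := range sampleCalls() {
		fmt.Println(render(c))
	}
}
```

Replacing `%w` with `%v`, using `%s` for the string value, and passing the node name as an argument would give well-formed messages in this sketch.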
KEP-2593 proposed to expand the existing node-ipam controller
to be configurable via ClusterCIDR objects. However, there
were reasonable doubts in the SIG about the feature, and after
several months of discussions we decided not to move forward
with the KEP in-tree. Hence, we are going to remove the existing
code, which is still in alpha.
https://groups.google.com/g/kubernetes-sig-network/c/nts1xEZ--gQ/m/2aTOUNFFAAAJ
Change-Id: Ieaf2007b0b23c296cde333247bfb672441fe6dfc
* Add warning handler callback function in shortcut expander
Currently, errors in client-go are propagated back to the callers via
function returns. However, there is no elegant way to merely warn users.
For example, when a user wants to get a resource by its short name
and multiple resources share this short name, we need to
warn the user about the ambiguity: which one is picked and which ones are discarded.
To handle this particular case, and also to provide a
way to raise other warnings in the future, this commit adds a warningHandler
callback function in shortcutExpander.
* Add warningPrinter functionality in ConfigFlags
ConfigFlags can neither warn users in a standardized
format nor pass warning callback functions to other upper-level
libraries such as client-go.
This commit lets users set warningPrinters
according to their IOStreams; these warningPrinters are used
to raise warnings that occur not only in cli-runtime but
also in client-go.
* Pass warning callback function in ConfigFlags to shortcutExpander
This commit passes the warning callback function to the shortcut
expander so that warnings raised there are shown to the user in a
standardized format.
* Add integration test for CRDs having ambiguous short names
This commit adds an integration test to ensure that the warning message
about this ambiguity is printed when resources are retrieved via their short name
in cases where multiple resources have the same
short name.
This integration test also ensures that the logic determining which resource
is selected hasn't changed, as a change could cause discrepancies in
clusters.
* Remove defaultConfigFlag global variable
* Move default config flags initialization into function
* Skip warning for versions of same group/resource
* Run update-vendor
* Warn only once when there are multiple versions registered for ambiguous resource
* Apply gocritic review
* Add multi-resource multi-version ambiguity unit test
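The callback approach described in this series can be sketched as follows. The `shortcutExpander` type here is a simplified, hypothetical model with a plain map lookup; it is not the cli-runtime implementation, and the warning text is made up.

```go
package main

import "fmt"

// warningHandler receives human-readable warnings; the caller decides how to
// print them (for example via its own IO streams).
type warningHandler func(msg string)

// shortcutExpander is a simplified, hypothetical model: it maps a short name
// to the matching resources in priority order.
type shortcutExpander struct {
	shortNames map[string][]string
	warn       warningHandler
}

// expand resolves a short name. If several resources share it, the first one
// is picked and the handler is told which ones were discarded.
func (e shortcutExpander) expand(name string) string {
	matches := e.shortNames[name]
	if len(matches) == 0 {
		return name // not a known short name: return it unchanged
	}
	if len(matches) > 1 && e.warn != nil {
		e.warn(fmt.Sprintf("short name %q is ambiguous: using %q, ignoring %q",
			name, matches[0], matches[1:]))
	}
	return matches[0]
}

func main() {
	e := shortcutExpander{
		shortNames: map[string][]string{
			"ex": {"examples.a.example.com", "examples.b.example.com"},
		},
		warn: func(msg string) { fmt.Println("Warning:", msg) },
	}
	fmt.Println(e.expand("ex"))
}
```

Passing the handler in, rather than printing inside the expander, keeps the lower-level code free of assumptions about where warnings should go.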
The kubelet restarts working pods with an exponential back-off delay,
capped at 5 minutes. A wait of 1 minute may fall entirely
within the back-off period.
Signed-off-by: Ruquan Zhao <ruquan.zhao@arm.com>
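To see why a 1-minute wait can be too short, the sketch below lists capped exponential delays. The 10-second initial delay and the doubling factor are assumptions for illustration; only the 5-minute cap is taken from the text above.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelays returns the delay before each of the first n restarts,
// doubling from initial and never exceeding maxDelay.
func backoffDelays(initial, maxDelay time.Duration, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	d := initial
	for i := 0; i < n; i++ {
		delays = append(delays, d)
		d *= 2
		if d > maxDelay {
			d = maxDelay
		}
	}
	return delays
}

// firstDelayOver returns the index of the first delay longer than wait,
// or -1 if every delay fits within it.
func firstDelayOver(delays []time.Duration, wait time.Duration) int {
	for i, d := range delays {
		if d > wait {
			return i
		}
	}
	return -1
}

func main() {
	// Assumed initial delay of 10s; the cap of 5 minutes comes from the text.
	delays := backoffDelays(10*time.Second, 5*time.Minute, 7)
	fmt.Println(delays)
	fmt.Println(firstDelayOver(delays, time.Minute))
}
```

Under these assumptions the fourth restart already waits longer than one minute, so a test that waits only one minute can observe no restart at all.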
framework.SIGDescribe is better because:
- Ginkgo uses the source code location of the test, not of the wrapper,
when reporting progress.
- Additional annotations can be passed.
To make this a drop-in replacement, framework.SIGDescribe generates a function
that can be used instead of the former SIGDescribe functions.
windows.SIGDescribe contained some additional code to ensure that tests are
skipped when not running with a suitable node OS. This gets moved into a
separate wrapper generator, to allow using framework.SIGDescribe as intended.
To ensure that all callers were modified, the windows.sigDescribe isn't
exported anymore (wasn't necessary in the first place!).
These wrapper functions set labels in addition to injecting the annotation into
the test text. It then becomes possible to select tests in different ways:
ginkgo -v --focus="should respect internalTrafficPolicy.*\[FeatureGate:ServiceInternalTrafficPolicy\]"
ginkgo -v --label-filter="FeatureGate:ServiceInternalTrafficPolicy"
ginkgo -v --label-filter="Beta"
When a test runs, ginkgo shows it as:
[It] should respect internalTrafficPolicy=Local Pod to Pod [FeatureGate:ServiceInternalTrafficPolicy] [Beta] [FeatureGate:ServiceInternalTrafficPolicy, Beta]
The test name and the labels at the end are in different colors. Embedding the
annotations inside the text is redundant and only done because users of the e2e
suite might expect it. Also, our tooling that consumes test results currently
doesn't know about ginkgo labels.
Environments, features and node features as described by
https://github.com/kubernetes/enhancements/tree/master/keps/sig-testing/3041-node-conformance-and-features
are also supported.
The framework and thus (at the moment) test/e2e do not have any pre-defined
environments and features. Adding those and modifying tests will follow in
a separate commit.
If something goes wrong during the test registration phase, the only solution
so far was to panic. This is not user-friendly and only allows reporting one
problem at a time.
If initialization can continue, then a better solution is to record a bug,
continue, and then report all bugs together.
This also works when just listing tests. The new verify-e2e-suites.sh uses that
to check all test suites (identified as "packages that call
framework.AfterReadingAllFlags", with some exceptions) as part of
pull-kubernetes-verify.
Example output for a fake
framework.RecordBug(framework.NewBug("fake bug during SIGDescribe", 0))
in test/e2e/storage/volume_metrics.go:
```
$ hack/verify-e2e-suites.sh
go version go1.21.1 linux/amd64
ERROR: E2E test suite invocation failed for test/e2e.
ERROR: E2E suite initialization was faulty, these errors must be fixed:
ERROR: test/e2e/storage/volume_metrics.go:49: fake bug during SIGDescribe
E2E suite test/e2e_kubeadm passed.
E2E suite test/e2e_node passed.
```
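The record-then-report pattern can be sketched as below. The `bugRecorder` type is a hypothetical simplification and not the framework's `RecordBug` implementation; in particular, locations are passed in as strings here instead of being derived from the call stack.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// bug is one problem found while registering tests.
type bug struct {
	location string
	message  string
}

// bugRecorder collects bugs instead of panicking on the first one.
// Hypothetical type, for illustration only.
type bugRecorder struct {
	bugs []bug
}

// record stores a bug and lets initialization continue.
func (r *bugRecorder) record(location, message string) {
	r.bugs = append(r.bugs, bug{location: location, message: message})
}

// report returns nil if nothing was recorded, otherwise a single error that
// lists every recorded bug on its own line.
func (r *bugRecorder) report() error {
	if len(r.bugs) == 0 {
		return nil
	}
	lines := make([]string, 0, len(r.bugs))
	for _, b := range r.bugs {
		lines = append(lines, fmt.Sprintf("ERROR: %s: %s", b.location, b.message))
	}
	return errors.New(strings.Join(lines, "\n"))
}

func main() {
	var r bugRecorder
	r.record("test/e2e/storage/volume_metrics.go:49", "fake bug during SIGDescribe")
	r.record("example/other_test.go:10", "second made-up problem")
	if err := r.report(); err != nil {
		fmt.Println(err)
	}
}
```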
-list-tests is a more concise alternative for `ginkgo --dry-run` with one line
per test. In contrast to `--dry-run`, it really lists all tests. `--dry-run`
without additional parameters uses the default skip expression from the E2E
context, which filters out flaky and feature-gated tests. The output includes
the source code location where each test is defined. It is sorted by test
name (not source code location) because that order is independent of
reorganizing the source code and ordering by location can be achieved with
"sort".
-list-labels has no corresponding feature in Ginkgo.
One possible usage is to figure out what values might make sense for
-focus/skip/label-filter.
Unit tests will follow in a future commit.
Always printing "Enabling in-tree volume drivers" whenever the E2E suite is
initializing doesn't provide any useful information and makes output of the
upcoming -list-tests look weird.
The rate limitter.go file is a generic file implementing the
gRPC Limiter interface. This file can be reused by other gRPC
APIs, not only by podresource.
Change-Id: I905a46b5b605fbb175eb9ad6c15019ffdc7f2563