With the change of the CRI-O jobs to use butane, we now have a
verification for base64 data urls in place. This means that the
following URL is invalid:
```
data:text/plain;base64,GCE_SSH_PUBLIC_KEY_FILE_CONTENT
```
This means we have to pass valid base64 to the URL. To fix that, we now
allow to inject SSH key values with both, the
`GCE_SSH_PUBLIC_KEY_FILE_CONTENT` field and its base64 encoded variant.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).
Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with
sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)
This may be unnecessary in some cases, but it's not wrong.
WaitForPodToDisappear was always called such that it listed all pods, which
made it less efficient than trying to get just the one pod it was checking for.
Being able to customize the poll interval in practice wasn't useful, therefore
it can be replaced with WaitForPodNotFoundInNamespace.
The device plugin test expects that no other pods are running prior to
the test starting. However, it has been observed that in some cases
some resources may still be around from previous tests. This is because
the deletion of resources from other tests is handled by deleting that
test's framework's namespace which is done asynchronously without
waiting for the other test's namespace to be deleted.
As a result, when the node e2e device plugin starts, there may still be
other pods in process of termination. To work around this, add a retry
to the device plugin test to account for the time it takes to delete the
resources from the prior test.
Signed-off-by: David Porter <david@porter.me>
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).
Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with
sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)
This may be unnecessary in some cases, but it's not wrong.
WaitForPodToDisappear was always called such that it listed all pods, which
made it less efficient than trying to get just the one pod it was checking for.
Being able to customize the poll interval in practice wasn't useful, therefore
it can be replaced with WaitForPodNotFoundInNamespace.
There are two runtime class tests which required the container runtime
config to include explicit configuration for `test-handler`. The current
logic skips these tests in non GCE environments. This skip is too strict
since the test is skipped in node e2e environments and in other
environments such as kind, which support running the test and also
configure `test-handler`.
Instead of skipping based on provider, add a new function
`NodeSupportsPreconfiguredRuntimeClassHandler` which examines the
underlying container runtime config and checks if the config includes
`test-handler`. The check is a bit brittle since it assumes container
runtime config paths, but it is a net improvement over skipping the test
entirely on non GCE environments.
This results in the test working in the common test environments, namely
GCE kube-up, node e2e, and kind.
Signed-off-by: David Porter <david@porter.me>
When the e2e_node/checkpoint_container.go test was introduced no CRI
implementation supported the new CheckpointContainer RPC yet.
With the release of CRI-O 1.25 the CheckpointContainer is implemented
and the test has been extended to see if the content of the checkpoint
is as expected.
The test is skipped if the ContainerCheckpoint feature gate is disabled
or if the CRI implementation does not support the CheckpointContainer
RPC.
Signed-off-by: Adrian Reber <areber@redhat.com>
Since we need to gather kubelet metrics for CPU Manager and Topology
Manager, renaming this function to a more generic name.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
All usage of builder pattern is convertible to cpuset.New()
with the same or fewer lines of code.
Migrate Builder.Add to a private method of CPUSet, with a comment
that it is only intended for internal use to preserve immutable
propoerty of the exported interface.
This also removes 'require' library dependency, which avoids
non-standard library usage.
In 'set', conversions to slice are done also, but with different names:
ToSliceNoSort() -> UnsortedList()
ToSlice() -> List()
Reimplement List() in terms of UnsortedList to save some duplication.
Removes exit/fatal from cpuset library.
Usage in podresources test was not necessary.
Library reference in cpu_manager_test was moved to a local function, and
converted to use e2e test framework error catching.
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
ginkgo.DeferCleanup has multiple advantages:
- The cleanup operation can get registered if and only if needed.
- No need to return a cleanup function that the caller must invoke.
- Automatically determines whether a context is needed, which will
simplify the introduction of context parameters.
- Ginkgo's timeline shows when it executes the cleanup operation.
- use `ginkgo.DeferCleanup` instead of clean up in the AfterEach block
- encourage use of ginkgo by not extending expect.go
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Some node e2e tests check for expected number of pods running
on the node to verify the correct state of that node after running
test scenarios. An example of such a check is in the device plugin
end to end test here: [1].
If the node is not left in a clean state after an e2e test finishes
running, it can lead to flaky tests because the node might have
unexpected pods running on the node.
In order to avoid that, we make sure that the test pods are
cleaned up after the test runs.
[1]: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/device_plugin_test.go#L189-L190
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Every ginkgo callback should return immediately when a timeout occurs or the
test run manually gets aborted with CTRL-C. To do that, they must take a ctx
parameter and pass it through to all code which might block.
This is a first automated step towards that: the additional parameter got added
with
sed -i 's/\(framework.ConformanceIt\|ginkgo.It\)\(.*\)func() {$/\1\2func(ctx context.Context) {/' \
$(git grep -l -e framework.ConformanceIt -e ginkgo.It )
$GOPATH/bin/goimports -w $(git status | grep modified: | sed -e 's/.* //')
log_test.go was left unchanged.
One of the cpumanager tests doesn't remove the pod
that got created during the test.
This causes pollution of other tests and failures
from time to time (depends on the test execution order).
In order to defalke the tests, we should delete the pod
and wait for it to be completely remove.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>