The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).
Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with
sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)
This may be unnecessary in some cases, but it's not wrong.
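As a minimal illustration of why `%w` matters here (the error type and messages below are made up, only the wrapping verbs are the point):

```go
package main

import (
	"errors"
	"fmt"
)

// assertionFailure stands in for the framework's failure error type.
type assertionFailure struct{ msg string }

func (f assertionFailure) Error() string { return f.msg }

func main() {
	root := assertionFailure{msg: "pod never became ready"}

	// %v flattens the error into plain text: errors.As can no longer find
	// the root cause, so ExpectNoError cannot recognize it as a failure.
	flattened := fmt.Errorf("waiting for pod: %v", root)

	// %w wraps the error: the original failure stays detectable.
	wrapped := fmt.Errorf("waiting for pod: %w", root)

	var f assertionFailure
	fmt.Println(errors.As(flattened, &f)) // false
	fmt.Println(errors.As(wrapped, &f))   // true
}
```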
Instead of pod responses being printed to the log each time polling fails, we
get a consolidated failure message with all unexpected pod responses if (and
only if) the check times out or a progress report gets produced.
This renames PodsResponding to WaitForPodsResponding for the sake of
consistency and adds a timeout parameter. That is necessary because some other
users of NewProxyResponseChecker used a much lower timeout (2min vs. 15min).
Besides simplifying some code, it also makes it easier to rewrite
ProxyResponseChecker because it only gets used in WaitForPodsResponding.
WaitForPodToDisappear was always called such that it listed all pods, which
made it less efficient than trying to get just the one pod it was checking for.
Being able to customize the poll interval wasn't useful in practice, so
WaitForPodToDisappear can be replaced with WaitForPodNotFoundInNamespace.
WaitForPods is now a generic function which lists pods and then checks the pods
that it found against some provided condition. A parameter determines how many
pods must be found and match the condition for the check to succeed.
The code becomes simpler (78 insertions, 91 deletions), easier to read (all
code entirely inside WaitForPodsRunningReady, no need to declare and later
overwrite variables) and possibly more correct (if all API calls failed,
the resulting error was ignored when allowedNotReadyPods > 0).
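Roughly, the shape of such a generic helper looks like the sketch below; names and the signature are simplified and may not match the actual WaitForPods API:

```go
package sketch

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	clientset "k8s.io/client-go/kubernetes"
)

// waitForPods lists pods in a namespace and succeeds once at least minPods
// of them match the provided condition. Simplified sketch, not the real API.
func waitForPods(ctx context.Context, c clientset.Interface, ns string,
	minPods int, timeout time.Duration, condition func(*v1.Pod) bool) error {
	var lastErr error
	err := wait.PollImmediateWithContext(ctx, 5*time.Second, timeout,
		func(ctx context.Context) (bool, error) {
			pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
			if err != nil {
				lastErr = err
				return false, nil // keep polling on API errors
			}
			matching := 0
			for i := range pods.Items {
				if condition(&pods.Items[i]) {
					matching++
				}
			}
			return matching >= minPods, nil
		})
	if err != nil && lastErr != nil {
		// Do not swallow the last API error when reporting the timeout.
		return fmt.Errorf("waiting for %d matching pods: %w (last API error: %v)", minPods, err, lastErr)
	}
	return err
}
```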
None of the users of these functions passed anything other than nil or an empty
map, and the implementation ignores the parameter - it seems like a candidate for
simplification.
When a Gomega failure is converted to an error, the stack at the time when the
failure occurs may be useful: error wrapping provides some breadcrumbs that
can be followed to determine where the failure really occurred, but error
wrapping may be missing or ambiguous.
To provide the additional information, a FailureError now includes a full stack
backtrace. The backtrace intentionally makes no attempt to exclude framework
functions besides the gomega support itself because helpers like
e2e/framework/pod may be relevant.
That backtrace is not included in the failure message for the sake of
brevity. Instead, it gets logged as part of the test's output.
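Reduced to a minimal sketch, the idea looks like this (the real FailureError in the framework may carry additional fields and methods):

```go
package sketch

import (
	"fmt"
	"runtime/debug"
)

// failureError records the full stack at the point where the failure was
// produced so that it can be logged later, separate from the failure message.
type failureError struct {
	msg       string
	fullStack []byte
}

func (e failureError) Error() string     { return e.msg }
func (e failureError) Backtrace() string { return string(e.fullStack) }

// newFailureError captures the backtrace when the failure is created.
func newFailureError(format string, args ...interface{}) error {
	return failureError{
		msg:       fmt.Sprintf(format, args...),
		fullStack: debug.Stack(),
	}
}
```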
gomega.Eventually provides better progress reports: instead of filling up the
log with rather useless one-line messages that are not enough to understand
the current state, it integrates with Ginkgo's progress reporting (SIGUSR1,
--poll-progress-after) and then dumps the same complete failure message as
after a timeout. That makes it possible to understand why progress isn't
getting made without having to wait for the timeout.
The other advantage is that the failure message for some unexpected pod state
becomes more readable: instead of encapsulating it as "observed object" inside
an error, it directly gets rendered by gomega.
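In practice the pattern looks roughly like the sketch below; the pod getter, timeouts, and matcher are illustrative and not necessarily what the framework uses:

```go
package sketch

import (
	"context"
	"time"

	"github.com/onsi/gomega"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
)

func waitForPodRunning(ctx context.Context, g gomega.Gomega, c clientset.Interface, ns, name string) {
	// Eventually takes the context first, polls the getter, and on a timeout
	// or a Ginkgo progress report renders the most recent pod object directly
	// instead of wrapping it inside an "observed object" error.
	g.Eventually(ctx, func(ctx context.Context) (*v1.Pod, error) {
		return c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
	}).WithTimeout(5*time.Minute).WithPolling(2*time.Second).
		Should(gomega.HaveField("Status.Phase", v1.PodRunning))
}
```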
Calling gomega.Expect/Eventually/Consistently deep inside a helper call chain
has several challenges:
- the stack offset must be tracked correctly, otherwise the callstack
for the failure starts at some helper code, which is often not informative
- augmenting the failure message with additional information from each
caller implies that each caller must pass down a string and/or format
string plus arguments
Both challenges can be solved by returning errors:
- the stacktrace is taken at the level where the error is
treated as a failure instead of being passed back further,
i.e. inside the It callback
- traditional error wrapping can add additional information, if
desirable
What was missing was some easy way to generate an error via a gomega
assertion. The new infrastructure achieves that by mirroring the
Gomega/Assertion/AsyncAssertion interfaces with errors as return values instead
of calling a fail handler.
It is intentionally less flexible than the gomega APIs:
- A context must be passed to Eventually/Consistently as first
parameter because that is needed for proper timeout handling.
- No additional text can be added to the failure through this
API because error wrapping is meant to be used for this.
- No need to adjust the callstack offset because no backtrace
is recorded when a failure occurs.
To avoid the useless "unexpected error" log message when passing back a gomega
failure, ExpectNoError gets extended to recognize such errors and then skips
the logging.
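One way to picture the error-returning flavor is the sketch below. It uses gomega's public NewGomega constructor with a recording fail handler; the framework's own mirror interfaces are likely shaped differently, so treat this purely as an illustration of the concept:

```go
package sketch

import (
	"context"
	"errors"
	"time"

	"github.com/onsi/gomega"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
)

// podIsRunning returns an error instead of failing the test directly. The
// caller inside the It callback decides whether that error is a failure and
// can add context with normal %w error wrapping.
func podIsRunning(ctx context.Context, c clientset.Interface, ns, name string) error {
	var failure error
	g := gomega.NewGomega(func(message string, _ ...int) {
		failure = errors.New(message)
	})
	g.Eventually(ctx, func(ctx context.Context) (*v1.Pod, error) {
		return c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
	}).WithTimeout(2 * time.Minute).
		Should(gomega.HaveField("Status.Phase", v1.PodRunning))
	return failure
}
```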
Calling WaitForPodTerminatedInNamespace after testFlexVolume is useless because
the client pod that it waits for always gets deleted by testVolumeClient:
0fcc3dbd55/test/e2e/framework/volume/fixtures.go (L541-L546)
Worse, because WaitForPodTerminatedInNamespace treats "not found" as "must keep
polling", these two tests always kept waiting for 5 minutes:
Kubernetes e2e suite: [It] [sig-storage] Flexvolumes should be mountable
when non-attachable 6m4s
The only reason why these tests passed is that WaitForPodTerminatedInNamespace
used to return the "not found" API error. That is not guaranteed and about to
change.
* Add integration tests for StatefulSetStartOrdinal feature
* Move expensive test setup (apiserver and running controller) to be run once in StatefulSetStartOrdinal parameterized tests
Image versions are already explicitly set in our manifests
configuration, so tests should not set these values; this ensures
we're using the same versions across the board.
Adds integration tests for the following scenarios with
MultiCIDRRangeAllocator enabled:
- ClusterCIDR is released when an associated node is deleted.
- ClusterCIDR delete when a node is associated, validate the finalizer
behavior, make sure that deleted ClusterCIDR is cleaned up after the
associated node is deleted.
- ClusterCIDR marked as terminating due to deletion must not be used for
allocating PodCIDRs to new nodes.
- Tie break behavior when multiple ClusterCIDRs are eligible to
allocate PodCIDRs to a node.
The device plugin test expects that no other pods are running prior to
the test starting. However, it has been observed that in some cases
some resources may still be around from previous tests. This is because
resources from other tests are cleaned up by deleting that test's framework
namespace, which happens asynchronously, without waiting for the namespace
deletion to complete.
As a result, when the node e2e device plugin starts, there may still be
other pods in process of termination. To work around this, add a retry
to the device plugin test to account for the time it takes to delete the
resources from the prior test.
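A hedged sketch of such a retry (namespace, intervals, and helper name are illustrative; the real test may scope the check differently):

```go
package sketch

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	clientset "k8s.io/client-go/kubernetes"
)

// waitForLeftoverPodsDeleted retries for a while so that pods from a previous
// test's namespace get time to finish terminating before the device plugin
// test makes its "no other pods" assumption.
func waitForLeftoverPodsDeleted(ctx context.Context, c clientset.Interface, ns string) error {
	return wait.PollImmediateWithContext(ctx, 5*time.Second, 2*time.Minute,
		func(ctx context.Context) (bool, error) {
			pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
			if err != nil {
				return false, err
			}
			return len(pods.Items) == 0, nil
		})
}
```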
Signed-off-by: David Porter <david@porter.me>
In the `NodeSupportsPreconfiguredRuntimeClassHandler`, update the check
for the runtime handler to return a failure if the
`/etc/containerd/config.toml` or `/etc/crio/crio.conf` config files do
not exist. If an error is returned, then the underlying test will be
skipped.
Tested manually by starting a kind cluster, moving the containerd
config file, and verifying that the test is skipped:
```
$ docker exec -it kind-worker /bin/bash
root@kind-worker:/# mv /etc/containerd/config.toml /etc/containerd/config.toml.bak
```
```
$ make WHAT="test/e2e/e2e.test"
$ ./_output/bin/e2e.test -kubeconfig /tmp/kubeconfig_kind -ginkgo.focus=".*should run a Pod requesting a RuntimeClass with a configured handler.*" --num-nodes=1 2>&1 -ginkgo.v=1 | tee -i "/tmp/build-log.txt"
[sig-node] RuntimeClass [It] should run a Pod requesting a RuntimeClass with a configured handler [NodeFeature:RuntimeHandler]
test/e2e/common/node/runtimeclass.go:85
[SKIPPED] Skipping test as node does not have E2E runtime class handler preconfigured in container runtime config: command terminated with exit code 1
```
Signed-off-by: David Porter <david@porter.me>
The test validates the following endpoints
- deleteApiregistrationV1CollectionAPIService
- patchApiregistrationV1APIServiceStatus
- replaceApiregistrationV1APIService
- replaceApiregistrationV1APIServiceStatus
The `runPausePod` timeout was previously 1 minute, which appears to be
too short and was causing timeouts in some tests.
Switch to `f.Timeouts.PodStartShort`, the common timeout used to wait
for pods to start, which defaults to 5min.
Also refactor to remove `runPausePodWithoutTimeout` and instead rely on
`runPausePod` since we do not make the timeout customizable directly
(it can be changed via the test framework if desired).
Signed-off-by: David Porter <david@porter.me>
PVCs using the ReadWriteOncePod access mode can only be referenced by a
single pod. When a pod is scheduled that uses a ReadWriteOncePod PVC,
return "Unschedulable" if the PVC is already in-use in the cluster.
To support preemption, the "VolumeRestrictions" scheduler plugin
computes cycle state during the PreFilter phase. This cycle state
contains the number of references to the ReadWriteOncePod PVCs used by
the pod-to-be-scheduled.
During scheduler simulation (AddPod and RemovePod), we add and remove
reference counts from the cycle state if they use any of these
ReadWriteOncePod PVCs.
In the Filter phase, the scheduler checks if there are any PVC reference
conflicts, and returns "Unschedulable" if there is a conflict.
This is a required feature for the ReadWriteOncePod beta. See for more context:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2485-read-write-once-pod-pv-access-mode#beta
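A condensed sketch of that bookkeeping (types and method names are simplified; the real VolumeRestrictions plugin works with the scheduler framework's CycleState and Status types):

```go
package sketch

import "errors"

// preFilterState is a simplified stand-in for the cycle state computed in
// PreFilter: the ReadWriteOncePod PVCs used by the incoming pod, plus a count
// of how many existing pods reference any of them.
type preFilterState struct {
	readWriteOncePodPVCs   map[string]struct{} // keyed by namespace/name
	conflictingPVCRefCount int
}

// updateWithPod mirrors the AddPod/RemovePod simulation during preemption:
// delta is +1 when a pod using one of the tracked PVCs is added and -1 when
// it is removed.
func (s *preFilterState) updateWithPod(podPVCs []string, delta int) {
	for _, pvc := range podPVCs {
		if _, ok := s.readWriteOncePodPVCs[pvc]; ok {
			s.conflictingPVCRefCount += delta
		}
	}
}

// filter mirrors the Filter phase: any remaining reference means the PVC is
// already in use somewhere in the cluster, so the pod is unschedulable.
func (s *preFilterState) filter() error {
	if s.conflictingPVCRefCount > 0 {
		return errors.New("Unschedulable: ReadWriteOncePod PVC is already in use")
	}
	return nil
}
```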
Node E2E tests do not run a scheduler, so the host exec pod must have
the `spec.nodeName` set explicitly.
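For illustration, the relevant part of the pod construction (pod name and image are placeholders):

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func hostExecPod(nodeName string) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "host-exec"},
		Spec: v1.PodSpec{
			// Node E2E runs without a scheduler, so bind the pod to the
			// node explicitly instead of waiting for a binding that will
			// never happen.
			NodeName:      nodeName,
			Containers:    []v1.Container{{Name: "exec", Image: "busybox"}},
			RestartPolicy: v1.RestartPolicyNever,
		},
	}
}
```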
Signed-off-by: David Porter <david@porter.me>
The ClusterIP allocator tries to reserve one part of the ServiceCIDR
to allocate static IPs to the Services.
The heuristic the allocator uses to obtain the offset was taking into
account the whole range size, not the number of IPs available in the range;
the subnet address and, for IPv4, the broadcast address are not available.
As a result, for CIDRs with 16 hosts, /28 for IPv4 and /124 for IPv6, the
calculated offset was higher than the maximum number of available addresses
in the allocator, causing it to panic.
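A back-of-the-envelope illustration of the failure mode (the formula below is deliberately not the allocator's real heuristic; only the size-versus-available mismatch matters):

```go
package main

import "fmt"

func main() {
	// An IPv4 /28 spans 16 addresses, but the subnet address and the
	// broadcast address cannot be allocated, leaving 14 usable IPs.
	rangeSize := 16
	available := rangeSize - 2

	// Any offset heuristic derived from the raw range size rather than
	// from the number of allocatable addresses can land past the end of
	// the allocator's bitmap, which is what caused the panic.
	offset := rangeSize             // illustrative worst case, not the real formula
	fmt.Println(offset > available) // true: out of bounds
}
```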
Change-Id: I6c6f527b0a600b3612be37769e405b8fb3dd33a8
There are two runtime class tests which required the container runtime
config to include explicit configuration for `test-handler`. The current
logic skips these tests in non-GCE environments. This skip is too strict
since the test is skipped in node e2e environments and in other
environments such as kind, which support running the test and also
configure `test-handler`.
Instead of skipping based on provider, add a new function
`NodeSupportsPreconfiguredRuntimeClassHandler` which examines the
underlying container runtime config and checks if the config includes
`test-handler`. The check is a bit brittle since it assumes container
runtime config paths, but it is a net improvement over skipping the test
entirely in non-GCE environments.
This results in the test working in the common test environments, namely
GCE kube-up, node e2e, and kind.
Signed-off-by: David Porter <david@porter.me>
When the e2e_node/checkpoint_container.go test was introduced, no CRI
implementation supported the new CheckpointContainer RPC yet.
With the release of CRI-O 1.25, CheckpointContainer is implemented, and the
test has been extended to check whether the content of the checkpoint
is as expected.
The test is skipped if the ContainerCheckpoint feature gate is disabled
or if the CRI implementation does not support the CheckpointContainer
RPC.
Signed-off-by: Adrian Reber <areber@redhat.com>
We are saving this information in the env variable `KUBE_INTEGRATION_ETCD_URL`,
so just pick it up from there when needed. Currently, when someone uses
framework.RunCustomEtcd directly, the global variable is *not* set and the
code that uses `GetEtcdURL` returns an empty string.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Since we need to gather kubelet metrics for CPU Manager and Topology
Manager, rename this function to a more generic name.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
The GlogSetter method is used by three components to change verbosity at
runtime through HTTP APIs. This used to work only for text output with klog
calls, but not for text output through the klog logger or for JSON output.
Now loggers can also provide a callback for changing their verbosity at
runtime. Implementing that implies that the Create factory method has to be
extended, which is an API break for the Go package, but not an API break for
the configuration file and command line flags, which is what matters for the
"api/v1" component API.
The condition methods will eventually all take a context. Since we
have been provided one, alter the accepted condition type and
change the four references in tree.
Callers of ExponentialBackoffWithContext should use a condition-aware
function (ConditionWithContextFunc). If the context can be
ignored the helper ConditionFunc.WithContext can be used to convert
an existing function to the new type.
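For example, assuming the updated ExponentialBackoffWithContext signature that accepts a ConditionWithContextFunc:

```go
package sketch

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func waitForReady(ctx context.Context, ready func() (bool, error)) error {
	backoff := wait.Backoff{Duration: time.Second, Factor: 2.0, Steps: 5}

	// An existing context-unaware condition can be adapted with WithContext
	// instead of rewriting it.
	var legacy wait.ConditionFunc = ready
	return wait.ExponentialBackoffWithContext(ctx, backoff, legacy.WithContext())
}
```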
Using WaitTimeoutForPodRunningInNamespace followed by ExpectError was not very
precise (any error passed the check, not just the expected timeout) and
hard to read. Now the test's expectation is spelled out explicitly: the pod
must stay in pending.
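Spelled out with gomega, the expectation looks roughly like this (durations and the matcher are illustrative):

```go
package sketch

import (
	"context"
	"time"

	"github.com/onsi/gomega"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
)

func expectPodStaysPending(ctx context.Context, g gomega.Gomega, c clientset.Interface, ns, name string) {
	// The pod must not leave the Pending phase for the whole observation
	// window; any phase change or API error fails the expectation.
	g.Consistently(ctx, func(ctx context.Context) (*v1.Pod, error) {
		return c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
	}).WithTimeout(30*time.Second).WithPolling(2*time.Second).
		Should(gomega.HaveField("Status.Phase", v1.PodPending))
}
```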
These helper functions can be used in combination with
gomega.Eventually/Consistently to implement polling of objects that is aware of
Kubernetes apiserver conventions (see the sketch after this list):
- retry on certain errors instead of giving up,
with "not found" handling decided by the caller (may or may not
be fatal, depending on the test)
- sleep if requested by apiserver
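A sketch of one such helper with made-up names; the framework's actual helpers may be shaped differently:

```go
package sketch

import (
	"context"
	"time"

	"github.com/onsi/gomega"
	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
)

// getPodForEventually returns a getter suitable for gomega.Eventually that
// follows apiserver conventions: "not found" is passed through so the caller
// decides whether it is fatal, a delay suggested by the apiserver is honored,
// and other errors simply let Eventually keep polling until its timeout.
func getPodForEventually(c clientset.Interface, ns, name string) func(ctx context.Context) (*v1.Pod, error) {
	return func(ctx context.Context) (*v1.Pod, error) {
		pod, err := c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
		if err == nil || apierrors.IsNotFound(err) {
			return pod, err
		}
		if seconds, ok := apierrors.SuggestsClientDelay(err); ok {
			// Sleep as requested by the apiserver before the next attempt.
			return nil, gomega.TryAgainAfter(time.Duration(seconds) * time.Second).Wrap(err)
		}
		return nil, err
	}
}
```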
The E2E framework contains several functions which only differ in how they get
name and namespace: from an API object (WaitForPodRunningInNamespace) or as
separate parameters (WaitTimeoutForPodRunningInNamespace).
NamespacedName and the NamedObject interface enable writing helper functions
that can be called with both an API object (like *v1.Pod, which implements
Object and thus NamedObject) and name+namespace string (via
NamespacedName).
The other advantage of NamespacedName is that the order of name and namespace
parameter cannot be mixed up.
NamespacedName was derived from k8s.io/apimachinery/pkg/types.NamespacedName. A
separate type in the framework package was chosen a) to avoid additional
imports in test code and b) because the interface might not be suitable for
k8s.io/apimachinery.
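A sketch of the resulting call pattern (method names follow the standard accessor conventions; the framework's actual interface may differ in detail):

```go
package sketch

import "fmt"

// NamedObject is anything with a name and a namespace. *v1.Pod satisfies it
// through its embedded ObjectMeta, and NamespacedName satisfies it trivially.
type NamedObject interface {
	GetName() string
	GetNamespace() string
}

// NamespacedName is a plain value type for callers that only have strings.
type NamespacedName struct {
	Namespace string
	Name      string
}

func (n NamespacedName) GetName() string      { return n.Name }
func (n NamespacedName) GetNamespace() string { return n.Namespace }

// describeLocation is an illustrative helper that accepts both call styles:
//
//	describeLocation(pod)                                            // *v1.Pod
//	describeLocation(NamespacedName{Namespace: "ns", Name: "pod-0"}) // strings
func describeLocation(obj NamedObject) string {
	return fmt.Sprintf("%s/%s", obj.GetNamespace(), obj.GetName())
}
```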