The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure; it
then adds another useless "unexpected error" prefix and does not dump the
additional failure information (currently the backtrace inside the E2E
framework).
Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with
sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)
This may be unnecessary in some cases, but it's not wrong.
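As a hedged illustration of why %w matters here (failureError below is a
hypothetical stand-in for the framework's internal failure type, not actual
framework code): errors.As and errors.Is only traverse wrapping chains built
with %w, so the root cause stays detectable after wrapping.

package main

import (
	"errors"
	"fmt"
)

// failureError stands in for the assertion-failure error the framework produces.
type failureError struct{ msg string }

func (e failureError) Error() string { return e.msg }

func main() {
	var root error = failureError{msg: "assertion failed"}

	wrappedV := fmt.Errorf("foo: %v", root) // flattens root into a string, chain is lost
	wrappedW := fmt.Errorf("foo: %w", root) // keeps root reachable via Unwrap

	var f failureError
	fmt.Println(errors.As(wrappedV, &f)) // false: root cause no longer detectable
	fmt.Println(errors.As(wrappedW, &f)) // true: ExpectNoError-style detection keeps working
}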
The `runPausePod` timeout was previously 1 minute, which appears to be too
short and was causing timeouts in some tests.
Switch to `f.Timeouts.PodStartShort`, the common timeout used to wait for pods
to start, which defaults to 5 minutes.
Also refactor to remove `runPausePodWithoutTimeout` and instead rely on
`runPausePod`, since we do not make the timeout customizable directly
(it can be changed via the test framework if desired).
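A minimal sketch of the underlying wait pattern, assuming client-go and a
caller-supplied timeout; waitForPausePodRunning is a hypothetical helper, not
the framework's actual runPausePod:

package e2esketch

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPausePodRunning polls until the pod is Running or the caller-supplied
// timeout (e.g. a framework-level "pod start" default) expires.
func waitForPausePodRunning(ctx context.Context, c kubernetes.Interface, ns, name string, timeout time.Duration) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, timeout, true, func(ctx context.Context) (bool, error) {
		pod, err := c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		return pod.Status.Phase == v1.PodRunning, nil
	})
}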
Signed-off-by: David Porter <david@porter.me>
There are two runtime class tests which require the container runtime
config to include explicit configuration for `test-handler`. The current
logic skips these tests in non-GCE environments. This skip is too strict,
since it also excludes node e2e environments and other environments such
as kind, which support running the test and also
configure `test-handler`.
Instead of skipping based on provider, add a new function
`NodeSupportsPreconfiguredRuntimeClassHandler` which examines the
underlying container runtime config and checks whether the config includes
`test-handler`. The check is a bit brittle since it assumes container
runtime config paths, but it is a net improvement over skipping the test
entirely in non-GCE environments.
This results in the test working in the common test environments, namely
GCE kube-up, node e2e, and kind.
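For illustration only, a hedged sketch of what such a check could look like;
the function name mirrors the new helper, but the config paths are assumptions
and the real check runs against the node rather than the local filesystem.

package e2esketch

import (
	"os"
	"strings"
)

// nodeSupportsPreconfiguredRuntimeClassHandler looks for a "test-handler"
// runtime entry in well-known container runtime config locations.
func nodeSupportsPreconfiguredRuntimeClassHandler() bool {
	// Assumed default config paths; brittle by design, as noted above.
	candidates := []string{
		"/etc/containerd/config.toml",
		"/etc/crio/crio.conf",
		"/etc/crio/crio.conf.d/crio.conf",
	}
	for _, path := range candidates {
		data, err := os.ReadFile(path)
		if err != nil {
			continue
		}
		if strings.Contains(string(data), "test-handler") {
			return true
		}
	}
	return false
}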
Signed-off-by: David Porter <david@porter.me>
All code must use the context from Ginkgo when doing API calls or polling for a
change; otherwise the code will not return immediately when the test gets
aborted.
ginkgo.DeferCleanup has multiple advantages:
- The cleanup operation can get registered if and only if needed.
- No need to return a cleanup function that the caller must invoke.
- Automatically determines whether a context is needed, which will
simplify the introduction of context parameters.
- Ginkgo's timeline shows when it executes the cleanup operation.
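A minimal sketch of the pattern, assuming the e2e framework's ClientSet and
Namespace fields; createTestConfigMap is a hypothetical helper, not code from
this change:

package e2esketch

import (
	"context"

	"github.com/onsi/ginkgo/v2"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/test/e2e/framework"
)

// createTestConfigMap registers its cleanup only after the object actually
// exists, callers get no cleanup func to forget, and Ginkgo supplies the
// context and records the cleanup in its timeline.
func createTestConfigMap(ctx context.Context, f *framework.Framework) *v1.ConfigMap {
	cm, err := f.ClientSet.CoreV1().ConfigMaps(f.Namespace.Name).Create(ctx,
		&v1.ConfigMap{ObjectMeta: metav1.ObjectMeta{GenerateName: "defer-cleanup-"}},
		metav1.CreateOptions{})
	framework.ExpectNoError(err)
	ginkgo.DeferCleanup(func(ctx context.Context) {
		_ = f.ClientSet.CoreV1().ConfigMaps(f.Namespace.Name).Delete(ctx, cm.Name, metav1.DeleteOptions{})
	})
	return cm
}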
Every ginkgo callback should return immediately when a timeout occurs or the
test run is manually aborted with CTRL-C. To do that, they must take a ctx
parameter and pass it through to all code which might block.
This is a first automated step towards that: the additional parameter got added
with
sed -i 's/\(framework.ConformanceIt\|ginkgo.It\)\(.*\)func() {$/\1\2func(ctx context.Context) {/' \
$(git grep -l -e framework.ConformanceIt -e ginkgo.It )
$GOPATH/bin/goimports -w $(git status | grep modified: | sed -e 's/.* //')
log_test.go was left unchanged.
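A hedged sketch of the resulting shape (the spec below is illustrative, not one
of the rewritten tests): the callback now receives the Ginkgo-provided context
and threads it into the API call, so the call returns promptly on timeout or
CTRL-C.

package e2esketch

import (
	"context"

	"github.com/onsi/ginkgo/v2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/test/e2e/framework"
)

// Before the rewrite: ginkgo.It("lists pods", func() { ... context.TODO() ... })
var _ = ginkgo.Describe("ctx plumbing sketch", func() {
	f := framework.NewDefaultFramework("ctx-sketch")

	ginkgo.It("lists pods without blocking past abort", func(ctx context.Context) {
		_, err := f.ClientSet.CoreV1().Pods(f.Namespace.Name).List(ctx, metav1.ListOptions{})
		framework.ExpectNoError(err)
	})
})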
The "pause" pods that are being run in the scheduling tests are
sometimes launched in system namespaces. Therefore even if a test
is considered to be running on a "baseline" Pod Security admission
level, its "baseline" pods would fail to run if the global PSa
enforcement policy is set to "restricted" - the system namespaces
have no PSa labels.
The "pause" pods run by this test can actually easily run with
"restricted" security context, and so this patch turns them
into just that.
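For illustration, a hedged sketch of the kind of security settings that satisfy
the "restricted" level; restrictedPausePodSecurity is a hypothetical helper,
not the test's actual code:

package e2esketch

import (
	v1 "k8s.io/api/core/v1"
)

// restrictedPausePodSecurity returns pod- and container-level security settings
// that pass the "restricted" Pod Security level: non-root, RuntimeDefault
// seccomp, no privilege escalation, all capabilities dropped.
func restrictedPausePodSecurity() (*v1.PodSecurityContext, *v1.SecurityContext) {
	runAsNonRoot := true
	allowPrivilegeEscalation := false
	podSC := &v1.PodSecurityContext{
		RunAsNonRoot:   &runAsNonRoot,
		SeccompProfile: &v1.SeccompProfile{Type: v1.SeccompProfileTypeRuntimeDefault},
	}
	containerSC := &v1.SecurityContext{
		AllowPrivilegeEscalation: &allowPrivilegeEscalation,
		Capabilities:             &v1.Capabilities{Drop: []v1.Capability{"ALL"}},
	}
	return podSC, containerSC
}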
- update all the import statements
- run hack/pin-dependency.sh to change pinned dependency versions
- run hack/update-vendor.sh to update go.mod files and the vendor directory
- update the method signatures for custom reporters
Signed-off-by: Dave Chen <dave.chen@arm.com>
* De-share the Handler struct in core API
An upcoming PR adds a handler that only applies on one of these paths.
Having fields that don't work seems bad.
This never should have been shared. Lifecycle hooks are like a "write"
while probes are more like a "read". HTTPGet and TCPSocket don't really
make sense as lifecycle hooks (but I can't take that back). When we add
gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary
RPC - so a probe makes sense but a hook does not.
In the future I can also see adding lifecycle hooks that don't make sense as
probes. E.g. 'sleep' is a common lifecycle request, and the only option today
is `exec`, which requires having a sleep binary in your image. A simplified
sketch of the resulting split follows below.
* Run update scripts
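A simplified, hedged sketch of the shape after the split, reusing the core/v1
action types; field sets are illustrative and JSON tags/docs are omitted.

package e2esketch

import v1 "k8s.io/api/core/v1"

// LifecycleHandler covers only the actions that make sense as hooks.
type LifecycleHandler struct {
	Exec      *v1.ExecAction
	HTTPGet   *v1.HTTPGetAction
	TCPSocket *v1.TCPSocketAction
}

// ProbeHandler covers the actions that make sense as probes, including
// probe-only mechanisms that have no hook analogue.
type ProbeHandler struct {
	Exec      *v1.ExecAction
	HTTPGet   *v1.HTTPGetAction
	TCPSocket *v1.TCPSocketAction
	GRPC      *v1.GRPCAction // gRPC health checking is explicitly a probe, not a hook
}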
The test "validates that there is no conflict between pods with same
hostPort but different hostIP and protocol" was testing the scheduler
capability to schedule pods on the same node with hostPorts, however,
it wasn´t validating that the HostPorts was working, causing false
positives, because the pods were scheduled, but the HostPort exposed
wasn´t working.
In order to test the HostPort functionality, we have to use HostNetwork
pods, which are incompatible with Windows platforms. Also, since this
touches both networking and scheduling, there is no clear ownership,
but sig-network is happy to adopt it.
We also add a new test for scheduling only under "scheduling", so Windows
folks can use it to test scheduling on that platform.
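A hedged sketch of the pods involved (image, tag, and port numbers are
illustrative): each pod asks for the same HostPort but a different HostIP
and/or Protocol, so all of them should be schedulable on one node.

package e2esketch

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// hostPortPod builds a pod that exposes the same HostPort as its siblings but
// binds it to a different HostIP and/or Protocol.
func hostPortPod(name, hostIP string, protocol v1.Protocol) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:  "agnhost",
				Image: "registry.k8s.io/e2e-test-images/agnhost:2.45", // illustrative image/tag
				Args:  []string{"netexec", "--http-port=8080"},
				Ports: []v1.ContainerPort{{
					ContainerPort: 8080,
					HostPort:      54321, // same on every pod
					HostIP:        hostIP,
					Protocol:      protocol,
				}},
			}},
		},
	}
}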
The test does not clean up all the pods it creates.
Memory balancing pods are only deleted once the test namespace is.
This leaves the pods running or in a terminating state when a new test is run.
In case the next test is "[sig-scheduling] SchedulerPredicates [Serial] validates resource limits of pods that are allowed to run",
that test can fail.
The e2e test, included as part of Conformance,
"validates that there is no conflict between
pods with same hostPort but different hostIP and protocol"
was only testing that the pods were scheduled without conflict
but was never testing the functionality.
The test should check that pods with containers forwarding the same
hostPort can be scheduled without conflict, and that the exposed
HostPorts are forwarding the ports to the corresponding pods.
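A minimal sketch of the functional half of the check, using plain Go networking
rather than the e2e framework's helpers; in the real test the connection
attempt runs from a HostNetwork pod on the same node.

package e2esketch

import (
	"fmt"
	"net"
	"time"
)

// checkHostPortReachable verifies that an exposed hostIP:hostPort is actually
// forwarded, not merely accepted by the scheduler.
func checkHostPortReachable(hostIP string, port int) error {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(hostIP, fmt.Sprint(port)), 5*time.Second)
	if err != nil {
		return fmt.Errorf("hostPort %s:%d not reachable: %w", hostIP, port, err)
	}
	return conn.Close()
}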
The predicate tests were using loopback addresses for the hostPort test;
however, those have different semantics depending on the IP family, i.e. you
can not bind to ::1 and ::2 simultaneously. In addition, IP forwarding from
localhost to localhost does not work in IPv6, since it doesn't have the kernel
route_localnet hack.
Currently, when checking for unscheduled pods, an exception will be raised
if a pod is not scheduled and its status is unknown. This update modifies
the logic to include any pod without a NodeName in the returned set of
unscheduled pods.
Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
When node scheduling tests were updated to use worker instead of master
nodes, the GetPodsScheduled function, which is tasked with getting all
scheduled and not-yet-scheduled pods, was inadvertently changed to ignore all
pods that have an empty NodeName before checking whether pods had been
scheduled or not. This updates the function to include pods without a
NodeName in the check for unscheduled pods.
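A simplified, hedged sketch of the corrected logic (the real function also
looks at the PodScheduled condition and the set of worker nodes):

package e2esketch

import (
	v1 "k8s.io/api/core/v1"
)

// getPodsScheduled puts pods with an empty NodeName into the notScheduled
// bucket instead of silently skipping them.
func getPodsScheduled(pods []v1.Pod) (scheduled, notScheduled []v1.Pod) {
	for _, pod := range pods {
		if pod.Spec.NodeName != "" {
			scheduled = append(scheduled, pod)
		} else {
			notScheduled = append(notScheduled, pod)
		}
	}
	return scheduled, notScheduled
}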
Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Ready schedulable nodes are being inserted into an uninitialized string
set, causing an "assignment to entry in nil map" panic in the underlying data
structure. This initializes the string set before attempting to insert
nodes.
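A hedged sketch of the fix, assuming the apimachinery sets package:

package e2esketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/sets"
)

// readySchedulableNodeNames collects node names into a string set. sets.String
// is a map underneath, so inserting into its zero value panics with
// "assignment to entry in nil map"; it must be created with sets.NewString().
func readySchedulableNodeNames(nodes []v1.Node) sets.String {
	names := sets.NewString() // was: var names sets.String (nil map)
	for _, node := range nodes {
		names.Insert(node.Name)
	}
	return names
}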
Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
WaitForStableCluster() checks all pods run on worker nodes, and the
function used to refer to master nodes to skip checking control plane
pods.
GetMasterAndWorkerNodes() was used for getting master nodes, but the
implementation is not good because it uses DeprecatedMightBeMasterNode().
This makes WaitForStableCluster() refer to worker nodes directly to avoid
using GetMasterAndWorkerNodes().
Conformance tests must not rely on the kubelet API in order to
pass. SchedulerPredicates tests attempt to use the kubelet API
in their BeforeEach, some of which are tagged as Conformance.
Is there a compelling reason to use the kubelet's view of pods
for a given node instead of the apiserver's view of the pods?
This is gross, but because NewDeleteOptions is used by various parts of
storage that still pass around pointers, the return type can't be
changed without significant refactoring within the apiserver. I think
this would be good to clean up, but I want to minimize apiserver-side
changes as much as possible in the client signature refactor.