e2e_node tests depend on very specific shared state (node state).
Pods leaked between tests often silently corrupt the test
preconditions, causing hard-to-debug CI failures.
Use the new facility to annotate pods with their test owner (= the
test code which created the pod) to help debug these failures.
For more context, please check the conversation in #123468
Signed-off-by: Francesco Romani <fromani@redhat.com>
e2e_node tests depend on very specific shared state (node state).
Pods leaked between tests often silently corrupt the test
preconditions, causing hard-to-debug CI failures.
We add the option to annotate pods with the source location
(source file:line) which triggered the pod creation, so it becomes
easier to track which test needs better cleanup.
The relevant e2e framework code is used in all the e2e suites,
so to minimize any unwanted consequences we make the feature
opt-in, planning to enable it initially (and likely only)
in the e2e_node tests.
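
A minimal sketch of how such an opt-in annotation helper could look
(the flag name, annotation key and function name below are
illustrative, not the actual framework API):

    package podutil

    import (
        "flag"
        "fmt"
        "runtime"

        v1 "k8s.io/api/core/v1"
    )

    // annotateCreatingTest is a hypothetical opt-in flag; the real framework
    // flag is expected to be wired up similarly and enabled only for e2e_node.
    var annotateCreatingTest = flag.Bool("annotate-creating-test", false,
        "record the source location that created each test pod as a pod annotation")

    // WithCreatorAnnotation records the caller's source file and line on the
    // pod, so a leaked pod can be traced back to the test that created it.
    func WithCreatorAnnotation(pod *v1.Pod) *v1.Pod {
        if !*annotateCreatingTest {
            return pod
        }
        // Skip one frame so we record the test code, not this helper.
        _, file, line, ok := runtime.Caller(1)
        if !ok {
            return pod
        }
        if pod.Annotations == nil {
            pod.Annotations = map[string]string{}
        }
        pod.Annotations["e2e.test/created-by"] = fmt.Sprintf("%s:%d", file, line)
        return pod
    }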
Signed-off-by: Francesco Romani <fromani@redhat.com>
This should avoid the 30s delay caused by the shell not responding
to SIGTERM and only being killable by SIGKILL.
If the pod is deleted along with the namespace during cleanup, this
also makes the cleanup faster and frees up the resources for the next
test cases sooner.
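
One common way to avoid that delay is to run the process directly
rather than under a non-exec'ing shell, so SIGTERM actually reaches
it; a sketch of that general pattern (image and command are
illustrative, not necessarily the exact change made here):

    package main

    import v1 "k8s.io/api/core/v1"

    // execFormContainer returns a container whose entrypoint is the process
    // itself rather than a shell wrapper. A process started via a
    // non-exec'ing shell never sees SIGTERM and is only reaped by SIGKILL
    // after the termination grace period (30s by default).
    func execFormContainer() v1.Container {
        return v1.Container{
            Name:  "test",
            Image: "registry.k8s.io/e2e-test-images/agnhost:2.45",
            // Direct invocation: PID 1 is the binary, so SIGTERM reaches it.
            Command: []string{"/agnhost", "pause"},
            // Shell form to avoid:
            //   Command: []string{"/bin/sh", "-c", "/agnhost pause"},
        }
    }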
The cancellation of the context happened after the cleanup of the apiserver, so
clients using that context were kept running. That wasn't the intent and caused
a slow shutdown because the apiserver delays its shutdown while it has active
clients.
The fix is to create a new cancellation context and to use that for the
clients. The automatic cancellation of it then happens before the apiserver
cleanup.
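
A minimal sketch of the intended ordering (the helper names below are
placeholders, not the real test code): because deferred calls run
LIFO, creating the client context after registering the apiserver
cleanup means the clients get cancelled first.

    package example

    import (
        "context"
        "testing"
    )

    // startTestServer and newClient stand in for the real test helpers;
    // their names and signatures here are illustrative only.
    func startTestServer(t *testing.T) (stop func(), serverURL string) {
        return func() {}, ""
    }
    func newClient(ctx context.Context, serverURL string) {}

    func TestExample(t *testing.T) {
        // The apiserver cleanup is registered first ...
        stop, serverURL := startTestServer(t)
        defer stop()

        // ... and the client context is created (and its cancel deferred)
        // afterwards. Deferred calls run LIFO, so cancel() fires before
        // stop(): the clients are already gone when the apiserver starts
        // its graceful shutdown, so it no longer waits for active clients.
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()

        newClient(ctx, serverURL)
        // ... test body ...
    }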
It previously assumed that pod-to-other-node-nodeIP traffic would not
be masqueraded, but this is not the case for most network plugins. Use
a HostNetwork exec pod to avoid the problem.
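
For reference, a hedged sketch of a host-network exec pod (image,
names and fields are illustrative, not the exact spec used by the
test):

    package main

    import (
        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // hostNetworkExecPod runs in the node's network namespace, so connections
    // it makes use the node's real IP as the source address instead of a pod
    // IP that may or may not be masqueraded by the network plugin.
    func hostNetworkExecPod(ns, node string) *v1.Pod {
        return &v1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "exec-host", Namespace: ns},
            Spec: v1.PodSpec{
                HostNetwork: true,
                NodeName:    node,
                Containers: []v1.Container{{
                    Name:    "exec",
                    Image:   "registry.k8s.io/e2e-test-images/agnhost:2.45",
                    Command: []string{"/agnhost", "pause"},
                }},
            },
        }
    }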
This also requires putting the client and endpoint on different nodes,
because with most network plugins, a node-to-same-node-pod connection
will end up using the internal "docker0" (or whatever) IP as the
source address rather than the node's public IP, and we don't know
what that IP is.
Also make it work with IPv6.
The existing test had two problems:
- It only made connections from within the cluster, so for VIP-type
LBs, the connections would always be short-circuited, meaning it
only tested kube-proxy's LBSR implementation, not the cloud's.
- For non-VIP-type LBs, it would only work if pod-to-LB connections
were not masqueraded, which is not the case for most network
plugins.
Fix this by (a) testing connectivity from the test binary, so as to
test the filtering of external IPs and ensure we're testing the
cloud's behavior; and (b) using both pod and node IPs when testing the
in-cluster case.
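
A hedged sketch of what probing from the test binary can look like
(function name and timeout are illustrative):

    package main

    import (
        "fmt"
        "net"
        "time"
    )

    // reachable dials host:port from wherever the test binary runs, i.e. from
    // outside the cluster, so the check exercises the cloud load balancer's
    // source-range filtering rather than kube-proxy's short-circuit path.
    func reachable(host string, port int) bool {
        addr := net.JoinHostPort(host, fmt.Sprint(port))
        conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
        if err != nil {
            return false
        }
        conn.Close()
        return true
    }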
Also some general cleanup of the test case.