The original logic was that dumping can stop (for example, due to
loosing the connection to the apiserver) and then will start again as
long as the container exists. That it duplicates output on restarts
is better than skipping output that might not have been dumped yet.
But that logic then also dumped the output of containers that have
terminated multiple times:
- logging is started, dumps all output and stops because the
container has terminated
- next check finds the container again, sees no active logger,
repeats
This wasn't a problem for short-lived logging in a custom
namespace (the way how it is done for CSI drivers in Kubernetes E2E),
but other testsuites (like the one from PMEM-CSI) keep logging running
for the entire test suite duration: there duplicate output became a
problem when adding driver redeployment as part of the suite's run.
To avoid duplicated output for terminated containers, which containers
have been handled is now stored permanently. For terminated containers,
restarting of dumping is prevented. This comes with the risk that if
the previous dumping ended before capturing all output, some output
will get lost.
Marking the start and stop of the log was also useful when streaming
to a single writer and thus gets enabled.
There were several sshPort values in e2e test packages because
we've migrated code from e2e framework by copying and pastting.
This adds common SSHPort on e2essh package to reduce such duplicated
code.
Conformance tests must not rely on the kubelet API in order to
pass. In this case, I think it's unnecessary to verify that a
kubelet observes the deletion within gracePeriod seconds. The
remaining checks in this test verify that pod deletion happens,
and that the pod is removed.
Conformance tests must not rely on the kubelet API in order to
pass. SchedulerPredicates tests attempt to use the kubelet API
in their BeforeEach, some of which are tagged as Conformance.
Is there a compelling reason to use the kubelet's view of pods
for a given node instead of the apiserver's view of the pods?
we print yaml, so you can use yaml tools like `yq`:
```
e2e.test --list-conformance-tests | yq r - --collect *.testname
```
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
it turns out that the e2e test was not using the timeout used to
hold the CLOSE_WAIT status, hence the test was flake depending
on how fast it checked the conntrack table.
This PR replaces the dependency on ssh using a pod to check the conntrack
entries on the host in a loop, to make the test more robust
and reduce the flakiness due to race conditions and/or ssh issues.
It also fixes a bug trying to grep the conntrack entry, where
the error was swallowed if a conntrack entry wasn't found.