The e2e test that rapidly creates and deletes pods and verifies their status
was reporting an implausibly long timing in a few scenarios:
timings total=12.211347385s t=491ms run=2s execute=450402h8m25s
Add error checks that clarify when and why this happens. Also report the
p50/p75/p90/p99 teardown latencies observed by the test, as a baseline for
future changes.
A previous commit added a few agnhost-related helper functions that create
agnhost pods/containers for general purposes.
Refactor the tests to use those functions.
The kubelet would attempt to create a new sandbox for a pod whose
RestartPolicy is OnFailure even after all of its containers had succeeded.
This caused unnecessary CRI and CNI calls, confusing logs, and conflicts
between the routine that creates the new sandbox and the routine that kills
the pod.
This patch determines which containers need to be started and skips creating
a sandbox if no container is supposed to start.
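The decision described above can be modeled roughly as follows. This is a simplified, hypothetical sketch, not the kubelet's actual code: `RestartPolicy`, `containerStatus`, `shouldStart`, `containersToStart` and `needsNewSandbox` are illustrative names, and a container that has never run is assumed to always need starting.

```go
package main

import "fmt"

// RestartPolicy mirrors the three pod restart policies (hypothetical, simplified type).
type RestartPolicy string

const (
	RestartAlways    RestartPolicy = "Always"
	RestartOnFailure RestartPolicy = "OnFailure"
	RestartNever     RestartPolicy = "Never"
)

// containerStatus is a hypothetical, simplified view of a container's last run.
type containerStatus struct {
	Name     string
	Exited   bool // true if the container has run and exited; false if it never ran
	ExitCode int  // meaningful only when Exited is true
}

// shouldStart reports whether a container needs to be (re)started under policy.
func shouldStart(policy RestartPolicy, c containerStatus) bool {
	if !c.Exited {
		return true // never ran: it still has to be started
	}
	switch policy {
	case RestartAlways:
		return true
	case RestartOnFailure:
		return c.ExitCode != 0 // restart only containers that failed
	default: // RestartNever
		return false
	}
}

// containersToStart returns the names of the containers that still need to run.
func containersToStart(policy RestartPolicy, cs []containerStatus) []string {
	var names []string
	for _, c := range cs {
		if shouldStart(policy, c) {
			names = append(names, c.Name)
		}
	}
	return names
}

// needsNewSandbox only allows sandbox creation if some container will run in it.
func needsNewSandbox(policy RestartPolicy, cs []containerStatus) bool {
	return len(containersToStart(policy, cs)) > 0
}

func main() {
	done := []containerStatus{{Name: "app", Exited: true, ExitCode: 0}}
	fmt.Println(needsNewSandbox(RestartOnFailure, done)) // false: nothing left to run
	failed := []containerStatus{{Name: "app", Exited: true, ExitCode: 1}}
	fmt.Println(needsNewSandbox(RestartOnFailure, failed)) // true: "app" must be restarted
}
```

Deriving the sandbox decision from the same list of containers to start keeps the two checks from disagreeing: if nothing would run in the sandbox, no CRI or CNI work is triggered.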
The WaitForPod*() functions are just wrappers around the e2epod package, and
they introduced an invalid dependency from the core framework on an e2e
sub-framework. This replaces WaitForPodRunning() with the equivalent e2epod
function.
The condition being checked for was not part of the error message, so the
check would not match errors such as:
OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/128aea1f-bde3-43d5-8b5f-dd86b9a5ef33/volumes/kubernetes.io~secret/default-token-v55hm\\\" to rootfs \\\"/var/lib/docker/overlay2/813487ba91d534ded546ae34f2a05e7d94c26bd015d356f9b2641522d8f0d6da/merged\\\" at \\\"/var/run/secrets/kubernetes.io/serviceaccount\\\" caused \\\"stat /var/lib/kubelet/pods/128aea1f-bde3-43d5-8b5f-dd86b9a5ef33/volumes/kubernetes.io~secret/default-token-v55hm: no such file or directory\\\"\"": unknown
Updated the check and regex.
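As an illustration only, a pattern that does match the error above could anchor on the final `stat ...: no such file or directory` part for the secret volume path. `missingSecretMount` and `isMissingSecretMount` are hypothetical names, and this is not the regex the test actually uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// missingSecretMount is a hypothetical pattern (not the test's actual regex).
// It matches the trailing "stat .../volumes/kubernetes.io~secret/<name>: no such
// file or directory" part of an OCI runtime error. The class [^\\"/] stops the
// volume name at a slash, a quote, or the backslash of an escaped quote.
var missingSecretMount = regexp.MustCompile(
	`volumes/kubernetes\.io~secret/[^\\"/]+: no such file or directory`)

// isMissingSecretMount reports whether msg describes a mount failure caused by
// a missing secret volume directory.
func isMissingSecretMount(msg string) bool {
	return missingSecretMount.MatchString(msg)
}

func main() {
	msg := `caused \"stat /var/lib/kubelet/pods/uid/volumes/kubernetes.io~secret/` +
		`default-token-v55hm: no such file or directory\"": unknown`
	fmt.Println(isMissingSecretMount(msg))                                    // true
	fmt.Println(isMissingSecretMount("stat /tmp/x: no such file or directory")) // false
}
```

Matching on the path plus the error text, rather than on the error text alone, avoids treating unrelated `no such file or directory` failures as the same condition.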
The kubelet can race when a pod is deleted and report that a container
succeeded when it actually failed, so the pod is reported as succeeded. Add an
e2e test that demonstrates this failure.
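A minimal sketch of the invariant such a test can assert, assuming the test pod's only container is designed to always exit with a non-zero code. `verifyFailingPod` is a hypothetical helper; the phase strings mirror the core/v1 `PodPhase` values, written as plain strings to keep the sketch dependency-free.

```go
package main

import (
	"errors"
	"fmt"
)

// verifyFailingPod is a hypothetical check for a pod whose only container is
// designed to always exit with a non-zero code. phase holds a core/v1 PodPhase
// value ("Pending", "Running", "Succeeded", "Failed"); exitCode is nil until the
// container has terminated. Any sign of success means the race was hit.
func verifyFailingPod(phase string, exitCode *int) error {
	if phase == "Succeeded" {
		return errors.New("pod reported Succeeded although its container always fails")
	}
	if exitCode != nil && *exitCode == 0 {
		return fmt.Errorf("container reported exit code 0 in phase %q", phase)
	}
	return nil
}

func main() {
	one, zero := 1, 0
	fmt.Println(verifyFailingPod("Failed", &one)) // <nil>
	fmt.Println(verifyFailingPod("Succeeded", &zero))
}
```

Running this check on every status observed while pods are created and deleted in quick succession turns an occasional wrong report into a concrete test failure.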
This is currently the top flake against PRs, so I'm tagging it
as [Flaky]. Flaky tests can't be conformance tests, so I'm
removing it from [Conformance] as well until this is resolved.
The e2e test framework provides some helper functions, and using them makes
the test code easier to read.
This replaces gomega calls with these functions under test/e2e/node/.
This is part of the transition to using framework/log instead
of the Logf inside the framework package. This will help with
import size/cycles when importing the framework or subpackages.