- Use the preemptor pod's Status.NominatedNodeName to signal success of the preemption behavior
- Optimize the test to eliminate unnecessary Pod creation
- Increase the timeout from 1 minute to 2 minutes
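As a hedged illustration of the first point, here is a minimal Go sketch (not the actual test code) of waiting on the preemptor pod's Status.NominatedNodeName. The helper name and timeout wiring are hypothetical; only the field and the client-go calls come from the real API.

```go
package e2e

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	clientset "k8s.io/client-go/kubernetes"
)

// waitForNominatedNodeName polls until the scheduler sets a nominated node
// on the preemptor pod, which signals that victim Pods have been chosen for
// preemption, without waiting for the preemptor to actually start running.
func waitForNominatedNodeName(cs clientset.Interface, ns, podName string) error {
	return wait.PollImmediate(time.Second, 2*time.Minute, func() (bool, error) {
		pod, err := cs.CoreV1().Pods(ns).Get(context.TODO(), podName, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		return pod.Status.NominatedNodeName != "", nil
	})
}
```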
GetPodLogs always fails when the tests fail because the tests specify
the wrong container names when getting logs.
When creating a client Pod, the test specifies "<podName>-container" as
the container name and "<podName>-" as the Pod's GenerateName. For
instance, podName "client-a" results in "client-a-container" as the
container name and "client-a-vx5sv" as the actual Pod name, but the
test always uses the actual Pod name to construct the container name
when getting logs, e.g. "client-a-vx5sv-container".
This patch fixes the mismatch by specifying the same static container
name both when creating the Pod and when getting logs.
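A minimal sketch of the fix, with illustrative constant and helper names (only GenerateName and the container-name mismatch come from the description above):

```go
package e2e

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// clientContainerName is a single static name shared by Pod creation and
// log retrieval, so the two paths can no longer drift apart.
const clientContainerName = "client-container"

func newClientPod(podName string) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			// The actual Pod name becomes e.g. "client-a-vx5sv".
			GenerateName: podName + "-",
		},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:  clientContainerName, // previously podName + "-container"
				Image: "busybox",
			}},
			RestartPolicy: v1.RestartPolicyNever,
		},
	}
}
```

Logs are then fetched with the generated Pod name but the same constant, e.g. GetPodLogs(cs, ns, pod.Name, clientContainerName), instead of deriving the container name from the generated Pod name.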
ServiceStartTimeout is defined both in the e2e core framework and in
endpoints/ports.go. The core framework's definition is used in many
places, while the copy in endpoints/ports.go is used nowhere else, so
this removes the duplicate from endpoints/ports.go as a cleanup.
The manifest list is stateful, which means that the same list gets amended
with each successive image published. That is unintended and can lead to the
wrong image being pulled from the manifest list.
This patch resets the manifest list before amending new images into it.
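A rough sketch of the publish flow with the reset applied, written in Go for consistency with the other examples; the image names are placeholders, and "docker manifest rm" is shown as one way to clear the stateful local list (the actual change may reset it differently):

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

func docker(args ...string) error {
	out, err := exec.Command("docker", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("docker %v failed: %v\n%s", args, err, out)
	}
	return nil
}

func main() {
	list := "example.com/app:v1" // placeholder manifest list
	images := []string{"example.com/app:v1-amd64", "example.com/app:v1-arm64"}

	// Reset first; the error is ignored because the list may not exist
	// locally yet. Without this, entries amended during a previous publish
	// would linger in the list and could be pulled by mistake.
	_ = docker("manifest", "rm", list)

	for _, img := range images {
		if err := docker("manifest", "create", "--amend", list, img); err != nil {
			log.Fatal(err)
		}
	}
	if err := docker("manifest", "push", list); err != nil {
		log.Fatal(err)
	}
}
```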
The test was flaky because it required the Job to succeed 3 times, with
a pseudorandom 50% failure chance per attempt, within 15 minutes, while
failed Pods are recreated only after an exponential back-off delay
(10s, 20s, 40s …) capped at 6 minutes. Since 7 consecutive failures
(a 1/128 chance) could take 20+ minutes and exceed the timeout, the
test failed intermittently with "timed out waiting for the condition".
This PR forces the Pods of a Job to be scheduled to a single node and
uses a hostPath volume instead of an emptyDir to persist data across new
Pods.
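A hedged sketch of such a Job spec, with hypothetical names (targetNode, image, paths): the Pods are pinned to one node via NodeName, and a hostPath volume carries state across recreated Pods where an emptyDir would be lost:

```go
package e2e

import (
	batchv1 "k8s.io/api/batch/v1"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func newPinnedJob(targetNode string, completions int32) *batchv1.Job {
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "fail-once-non-local"},
		Spec: batchv1.JobSpec{
			Completions: &completions,
			Template: v1.PodTemplateSpec{
				Spec: v1.PodSpec{
					NodeName:      targetNode, // every Pod lands on one node
					RestartPolicy: v1.RestartPolicyNever,
					Volumes: []v1.Volume{{
						Name: "data",
						VolumeSource: v1.VolumeSource{
							// Unlike emptyDir, hostPath outlives the Pod, so
							// a retry can see what earlier attempts recorded.
							HostPath: &v1.HostPathVolumeSource{Path: "/tmp/job-e2e"},
						},
					}},
					Containers: []v1.Container{{
						Name:         "c",
						Image:        "busybox",
						VolumeMounts: []v1.VolumeMount{{Name: "data", MountPath: "/data"}},
					}},
				},
			},
		},
	}
}
```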
The e2e test "Kubectl Port forwarding With a server listening .."
sometimes fails because the received data differs from the expected
data. To receive the data, the test calls CloseWrite(), but it did not
handle a possible error from that call.
This adds the missing error handling to help investigate the issue further.
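A minimal sketch of the added handling, assuming the connection is a *net.TCPConn as in the port-forward test; the surrounding function is hypothetical:

```go
package e2e

import (
	"log"
	"net"
)

func sendAndHalfClose(conn net.Conn, data []byte) {
	if _, err := conn.Write(data); err != nil {
		log.Printf("Couldn't write to the connection: %v", err)
	}
	// Half-close the write side so the server sees EOF and sends its reply.
	// Logging the error instead of dropping it is what the change adds, to
	// help diagnose the mismatch between expected and received data.
	if tcpConn, ok := conn.(*net.TCPConn); ok {
		if err := tcpConn.CloseWrite(); err != nil {
			log.Printf("Couldn't close the write side of the connection: %v", err)
		}
	}
}
```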
Not all errors surface synchronously from Instances.Insert(...).Do(), so
it is important to inspect the returned operation object to see why an
insert failed. One example is exceeding the resource quota, e.g.:
could not create instance test-cos-beta-80-12739-29-0: [&{Code:QUOTA_EXCEEDED Location: Message:Quota 'CPUS' exceeded. Limit: 24.0 in region europe-west6. ForceSendFields:[] NullFields:[]}
This fixes the issue where tests would fail "silently" when the
instance insert fails.
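A hedged sketch using the GCE compute API (google.golang.org/api/compute/v1); the function and its parameters are placeholders, but Operation.Error is where errors such as QUOTA_EXCEEDED appear:

```go
package gce

import (
	"fmt"

	compute "google.golang.org/api/compute/v1"
)

// insertInstance shows that the returned *compute.Operation must be
// inspected as well: failures such as QUOTA_EXCEEDED are reported on the
// operation, not as an error from Do() itself.
func insertInstance(svc *compute.Service, project, zone string, instance *compute.Instance) error {
	op, err := svc.Instances.Insert(project, zone, instance).Do()
	if err != nil {
		return fmt.Errorf("could not create instance %s: API error: %v", instance.Name, err)
	}
	if op != nil && op.Error != nil && len(op.Error.Errors) > 0 {
		// This is where e.g. "Quota 'CPUS' exceeded" surfaces.
		return fmt.Errorf("could not create instance %s: %+v", instance.Name, op.Error.Errors)
	}
	return nil
}
```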