E2e tests often fail with nothing more than the generic error
"timed out waiting for the condition".
Useful information may be in the test logs, but debugging e2e test
failures will be much faster if we add context to errors when they
happen.
This change makes sure we add context to all errors returned from
helpers like wait.Poll().
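As a rough sketch (the helper name and message below are made up; wait.Poll is the real helper from k8s.io/apimachinery):

    import (
        "fmt"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    // waitForReady is a hypothetical helper: instead of returning the bare
    // "timed out waiting for the condition" error from wait.Poll, it wraps
    // the error with a description of what was being waited for.
    func waitForReady(what string, check func() (bool, error)) error {
        if err := wait.Poll(2*time.Second, 5*time.Minute, check); err != nil {
            return fmt.Errorf("waiting for %s to become ready: %v", what, err)
        }
        return nil
    }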
A number of tests were using hardcoded image paths instead of
going through the imageutils package. The reason for centralizing
the logic there is to keep an eye on what images we use and where
they come from.
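For illustration, assuming the test/utils/image package of that time (imageutils.GetE2EImage and the imageutils.BusyBox constant), the pattern looks roughly like this:

    import (
        v1 "k8s.io/api/core/v1"
        imageutils "k8s.io/kubernetes/test/utils/image"
    )

    // busyboxContainer builds a container spec without hardcoding a registry
    // path; imageutils decides which image and registry the e2e suite uses.
    func busyboxContainer() v1.Container {
        return v1.Container{
            Name:  "busybox",
            Image: imageutils.GetE2EImage(imageutils.BusyBox),
        }
    }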
Moving pod-related functions from e2e/framework/pv_util.go to
e2e/framework/pod in order to allow refactoring of pv_util.go into its
own package.
Signed-off-by: alejandrox1 <alarcj137@gmail.com>
This is part of the transition to using framework/log instead
of the Logf inside the framework package. This will help with
import size/cycles when importing the framework or subpackages.
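The mechanical change is an import swap; a sketch assuming the framework/log package layout of that time:

    // Before: logging required importing the entire framework package.
    //
    //     import "k8s.io/kubernetes/test/e2e/framework"
    //     framework.Logf("created pod %q", podName)
    //
    // After: only the small log subpackage is imported, which keeps the
    // import graph of subpackages free of the framework package itself.
    import e2elog "k8s.io/kubernetes/test/e2e/framework/log"

    func logPodCreated(podName string) {
        e2elog.Logf("created pod %q", podName)
    }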
Other tests that check for a default storage class also check for a
cloud provider such as gce, aws, or openstack, and hence are already
skipped in bare-metal environments. This particular test keeps failing
because no such check exists.
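A sketch of the kind of guard the other default-storageclass tests already use (framework.SkipUnlessProviderIs was the helper of that era; the exact provider list varies per test):

    import "k8s.io/kubernetes/test/e2e/framework"

    // On bare metal none of these providers match, so the test is skipped
    // instead of failing on a missing default storage class.
    func skipWithoutDefaultStorageClassProvider() {
        framework.SkipUnlessProviderIs("gce", "gke", "aws", "openstack")
    }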
- moves these helper functions into e2e/framework/auth
- removes logging from helper functions
- in some cases, explicitly returns errors that were previously ignored
  or only logged. Where an error really should be ignored, we now
  explicitly check that the expected condition is met before ignoring it
  (see the sketch after this list)
- fixes references to these methods to use the right package and
  return values
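A minimal sketch of that third point, with a hypothetical helper and the pre-context client-go API of that era:

    import (
        rbacv1 "k8s.io/api/rbac/v1"
        apierrors "k8s.io/apimachinery/pkg/api/errors"
        clientset "k8s.io/client-go/kubernetes"
    )

    // bindClusterRole is a hypothetical helper in the spirit of
    // e2e/framework/auth: it returns the error instead of logging it, and
    // only ignores the one condition that is actually expected.
    func bindClusterRole(c clientset.Interface, binding *rbacv1.ClusterRoleBinding) error {
        _, err := c.RbacV1().ClusterRoleBindings().Create(binding)
        if apierrors.IsAlreadyExists(err) {
            return nil // explicitly checked before ignoring
        }
        return err
    }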
The GlusterDynamicProvisioner test will not work on GKE because the
master node is in a different project and cannot talk to the pod
running on the node that hosts the gluster provisioner. So add code to
skip the test on GKE.
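A sketch of such a skip (framework.SkipIfProviderIs was the framework helper at the time):

    import "k8s.io/kubernetes/test/e2e/framework"

    // On GKE the master cannot reach the gluster server pod on a node,
    // so bail out early before doing any provisioning.
    func skipGlusterProvisionerOnGKE() {
        framework.SkipIfProviderIs("gke")
    }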
e2e test "[It] should provision storage with different parameters"
depends on cloud providers as gce/gke, aws, openstack, vsphere and
azure. If the other cloud providers like local, the test is skipped.
However getRandomClusterZone() was called before the above cloud
provider check, and if the zone label is not found the test was failed.
This makes the test call getRandomClusterZone() only if necessary
to avoid such unnecessary failures.
Fixes: #73771
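The shape of the fix, as a sketch with a hypothetical wrapper around the real getRandomClusterZone helper:

    import "k8s.io/kubernetes/test/e2e/framework"

    // checkProviderThenPickZone sketches the reordering: the provider guard
    // runs first, and the zone lookup (getRandomClusterZone in the real test,
    // passed in here as a callback) only runs on providers where zone labels
    // are expected to exist.
    func checkProviderThenPickZone(pickZone func() string) string {
        framework.SkipUnlessProviderIs("gce", "gke", "aws", "openstack", "vsphere", "azure")
        return pickZone()
    }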
CreateDriver (now called SetupTest) is a potentially expensive
operation, depending on the driver. Creating and tearing down a
framework instance also takes time (measured at 6 seconds on a fast
machine) and produces quite a bit of log output.
Both can be avoided for tests that skip based on static
information (for instance the current OS, vendor, driver, and test
pattern) by making the test suite responsible for creating the
framework and driver.
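Roughly, with illustrative names (the real storage testsuite interfaces differ in detail), the new ordering looks like this:

    import (
        . "github.com/onsi/ginkgo"

        "k8s.io/kubernetes/test/e2e/framework"
    )

    // testDriver is a trimmed-down stand-in for the real storage test driver
    // interface; SetupTest does the potentially expensive per-test setup.
    type testDriver interface {
        SetupTest(f *framework.Framework) (config interface{}, cleanup func())
    }

    // defineTests sketches the new split: the static skip is registered first,
    // so a skipped test never pays for framework or driver setup.
    func defineTests(driver testDriver, skipStatically func()) {
        BeforeEach(func() {
            skipStatically() // e.g. wrong OS, vendor, driver or test pattern
        })

        // Registered after the skip check above, so the framework's own
        // BeforeEach (namespace creation etc.) only runs for tests that
        // were not skipped.
        f := framework.NewDefaultFramework("storage-testsuite")

        It("should provision storage", func() {
            config, cleanup := driver.SetupTest(f)
            defer cleanup()
            _ = config
            // ... actual test steps ...
        })
    }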
The lifecycle of the TestConfig instance was confusing because it was
stored inside the DriverInfo, a struct which conceptually is static,
while the TestConfig is dynamic. It is cleaner to separate the two,
even if that means that an additional pointer must be passed into some
functions. Now CreateDriver is responsible for initializing the
PerTestConfig that is to be used by the test.
To make this approach simpler to implement (= less functions which
need the pointer) and the tests easier to read, the entire setup and
test definition is now contained in a single function. This is how it
is normally done in Ginkgo. This is easier to read because one can see
at a glance where variables are set, instead of having to trace values
through two additional structs (TestResource and TestInput).
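A sketch of the separation, with illustrative fields (the real PerTestConfig carries more):

    import "k8s.io/kubernetes/test/e2e/framework"

    // PerTestConfig holds the dynamic, per-test state that used to be stored
    // inside the conceptually static DriverInfo. Fields are illustrative.
    type PerTestConfig struct {
        Driver    string
        Framework *framework.Framework
        // ... client node selection, server pods, etc.
    }

    // The driver's setup call (CreateDriver/SetupTest in the text above) hands
    // back the config it prepared instead of mutating a shared DriverInfo.
    func setupExampleDriver(f *framework.Framework) (*PerTestConfig, func()) {
        config := &PerTestConfig{Driver: "example", Framework: f}
        cleanup := func() {
            // tear down whatever the driver created for this test
        }
        return config, cleanup
    }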
Because we are changing the API already, other changes are made as well:
- some function prototypes get simplified
- the naming of functions is changed to match their purpose
(tests aren't executed by the test suite, they only get defined
for later execution)
- unused methods get removed (TestSuite.skipUnsupportedTest is redundant)
Whether the read-after-write test ran on the same node used to be
random for drivers that weren't locked onto a single node. Now it is
deterministic: the read always happens on the same node.
The case with reading on another node is covered separately for test
configurations that support it (not locked onto a single node, more
than one node in the test cluster).
As before, the TestConfig.ClientNodeSelector is ignored by the
provisioning testsuite.
TestDynamicProvisioning had multiple ways of choosing additional
checks:
- the PvCheck callback
- the builtin write/read check controlled by a boolean
- the snapshot testing
Complicating matters further, that builtin write/read check had been
made more customizable with the new fields `NodeSelector` and
`ExpectUnschedulable`, which were only set by one particular test (see
https://github.com/kubernetes/kubernetes/pull/70941).
That is confusing and will only get more confusing when adding more
checks in the future. Therefore the write/read check is now a separate
function that must be enabled explicitly by tests that want to run it.
The snapshot checking is also defined only for the snapshot test.
The test that expects unschedulable pods now also checks for that
particular situation itself. Instead of testing it with two pods (the
behavior from the write/read check) that both fail to start, only a
single unschedulable pod is created.
Because node name, node selector and the `ExpectUnschedulable` were
only used for checking, it is possible to simplify `StorageClassTest`
by removing all of these fields.
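As a sketch of the resulting shape (fields are trimmed and helper names are hypothetical; the real provisioning suite may name them differently):

    import (
        v1 "k8s.io/api/core/v1"
        clientset "k8s.io/client-go/kubernetes"
    )

    // StorageClassTest, fields trimmed for illustration: the check-only
    // fields (node name, node selector, ExpectUnschedulable) are gone and
    // the optional PvCheck callback remains.
    type StorageClassTest struct {
        Claim   *v1.PersistentVolumeClaim
        PvCheck func(claim *v1.PersistentVolumeClaim)
    }

    // pvWriteReadCheck stands in for the separate, explicitly enabled
    // write/read check.
    func pvWriteReadCheck(c clientset.Interface, claim *v1.PersistentVolumeClaim) {
        // write a file through one pod, then read it back on the same node ...
    }

    // A test opts in by wiring the check into PvCheck:
    func withWriteReadCheck(c clientset.Interface, claim *v1.PersistentVolumeClaim) StorageClassTest {
        return StorageClassTest{
            Claim: claim,
            PvCheck: func(claim *v1.PersistentVolumeClaim) {
                pvWriteReadCheck(c, claim)
            },
        }
    }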
Expect(err).NotTo(HaveOccurred()) is an anti-pattern in Ginkgo testing
because a test failure doesn't explain what failed (see
https://github.com/kubernetes/kubernetes/issues/34059). We avoid it
now by making the check function itself responsible for checking
errors and including more information in those checks.
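A sketch of the difference, using gomega's optional description arguments:

    import . "github.com/onsi/gomega"

    func checkCreate(err error, claimName string) {
        // Anti-pattern: on failure the report only says that an error occurred.
        //     Expect(err).NotTo(HaveOccurred())

        // Better: the failure message explains what was being done and to what.
        Expect(err).NotTo(HaveOccurred(), "create PVC %q", claimName)
    }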