test/e2e/storage/testsuites creates volumes dynamically. Initially, the size of those volumes was
hard-coded in the test, which prevented using the tests with storage backends that couldn't support
that hard-coded size
The assumption so far was that all drivers support read/write
volumes. That might not necessarily be true, so we have to let the
test driver specify it and then test accordingly.
Another aspect that is worth testing is whether the driver correctly
creates a new volume for each pod even if the volume attributes are
the same. However, drivers are not required to do that, so again we
have to let the test driver specify that.
Using a "normal" CSI driver for an inline ephemeral volume may have
unexpected and potentially harmful effects when the driver gets a
NodePublishVolume call that it isn't expecting. To prevent that mistake,
driver deployments for a driver that supports such volumes must:
- deploy a CSIDriver object for the driver
- set CSIDriver.Spec.VolumeLifecycleModes such that it contains "ephemeral"
The default for that field is "persistent", so existing deployments
continue to work and are automatically protected against incorrect
usage.
For the E2E tests we need a way to specify the driver mode. The
existing cluster-driver-registrar doesn't support that and also was
deprecated, so we stop using it altogether and instead deploy and
patch a CSIDriver object.
The conceptual change is that the mode in which a volume gets handled
is derived from it's spec, not from the ability of the driver. In
practice, that is already how the code worked because it didn't
actually look at CSIDriver.Spec.Mode at all.
Therefore the code change itself is mostly just renaming "driver mode"
to "volume mode". In some places (CanDeviceMount, CanAttach) the
feature check that was used elsewhere seemed to be missing. Now their
code path for ephemeral volumes are also only entered if that feature
is enabled.
The sanity check whether a CSI driver is being used correctly still
needs to be implemented.
Related-to: https://github.com/kubernetes/kubernetes/issues/79624
One previously undocumented expectation is that
GetDynamicProvisionStorageClass can be called more than once per test
and then each time returns a new, unique storage class. The in-memory
implementation in driveroperations.go:GetStorageClass ensured that,
but loading from a .yaml file didn't. This caused the multivolume tests
to fail when applied to an already installed GCE driver with the
-storage.testdriver parameter.
It is useful to apply the storage testsuite also to "external" (=
out-of-tree) storage drivers. One way of doing that is setting up a
custom E2E test suite, but that's still quite a bit of work.
An easier alternative is to parameterize the Kubernetes e2e.test
binary at runtime so that it instantiates the testsuite for one or
more drivers. Some parameters have to be provided before starting the
test because they define configuration and capabilities of the driver
and its storage backend that cannot be discovered at runtime. This is
done by populating the DriverDefinition with the content of the file
that the new -storage.testdriver parameters points to.
The universal .yaml and .json decoder from Kubernetes is used. It's
flexible, but has some downsides:
- currently ignores unknown fields (see https://github.com/kubernetes/kubernetes/pull/71589)
- poor error messages when fields have the wrong type
Storage drivers have to be installed in the test cluster before
starting e2e.test. Only tests involving dynamically provisioned
volumes are currently supported.
CreateDriver (now called SetupTest) is a potentially expensive
operation, depending on the driver. Creating and tearing down a
framework instance also takes time (measured at 6 seconds on a fast
machine) and produces quite a bit of log output.
Both can be avoided for tests that skip based on static
information (like for instance the current OS, vendor, driver and test
pattern) by making the test suite responsible for creating framework
and driver.
The lifecycle of the TestConfig instance was confusing because it was
stored inside the DriverInfo, a struct which conceptually is static,
while the TestConfig is dynamic. It is cleaner to separate the two,
even if that means that an additional pointer must be passed into some
functions. Now CreateDriver is responsible for initializing the
PerTestConfig that is to be used by the test.
To make this approach simpler to implement (= less functions which
need the pointer) and the tests easier to read, the entire setup and
test definition is now contained in a single function. This is how it
is normally done in Ginkgo. This is easier to read because one can see
at a glance where variables are set, instead of having to trace values
though two additional structs (TestResource and TestInput).
Because we are changing the API already, also other changes are made:
- some function prototypes get simplified
- the naming of functions is changed to match their purpose
(tests aren't executed by the test suite, they only get defined
for later execution)
- unused methods get removed (TestSuite.skipUnsupportedTest is redundant)
This increases type safety and makes the code easier to read because
it becomes obvious that the "test resource" passed to some functions
must be the result of a previous CreateVolume.
This makes it possible to remove:
- functions that never did anything (the DeleteVolume methods in
drivers that never create a volume)
- type casts (in the DeleteVolume implementation)
- the unused DeleteVolume parameters
- the stand-alone DeleteVolume function (which would be just a non-nil
check)
GetPersistentVolumeSource and GetVolumeSource could also become
methods on more specific interfaces - they don't actually use anything
from TestDriver instance which provides them.
The main motivation however is to reduce the number of methods which
might need an explicit test config parameter.
Exposing framework.VolumeTestConfig as part of the testsuite package
API was confusing because it was unclear which of the values in it
really have an effect. How it was set also was a bit awkward: a test
driver had a copy that had to be overwritten at test runtime and then
might have been updated and/or overwritten again by the driver.
Now testsuites has its own test config structure. It contains the
values that might have to be set dynamically at runtime. Instead of
overwriting a copy of that struct inside the test driver, the test
driver takes some common defaults (specifically, the framework pointer
and the prefix) when it gets initialized and then manages its own
copy. For example, the hostpath driver has to lock the pods to a
single node.
framework.VolumeTestConfig is still used internally and test drivers
can decide to run tests with a fully populated instance if needed (for
example, after setting up an NFS server).
This makes it possible to use the testsuites package out-of-tree
without pulling in unnecessary dependencies and code (in
test/e2e/storage/vsphere) that defines tests that are not wanted in a
custom test suite.