In our current mock CSI driver e2e test, we are not waiting
for the CSI driver register successfully to perform test
including provision PVC. This can lead to timeout when the
csi driver takes longer to register the socket.
This change adds the waiting part so that the system will
wait for up to 10 minutes for the driver to be ready. This
normally won't take this long. However, under a resource
constraint environment it can take longer than expected time.
https://github.com/kubernetes/kubernetes/issues/93358
The way gingko handles interrupts is:
- It starts running AfterSuite hooks in a separate goroutine (this includes cleanupAction hooks)
- Once AfterSuite hook is done executing it calls
os.Exit(1) on test suite.
So how cleanupFunc() that runs via defer in test can be interrupted
is:
- cleanupFunc starts running via defer (or AfterEach hook) but first
thing that function does is to remove cleanupHandle from
framework.RemoveCleanupAction.
- Test suite receives interrupt from user and AfterSuite block
starts executing
- remember that while cleanupFunc is running in goroutine#1,
AfterSuite is running concurrently in goroutine#2.
- AfterSuite hook has bunch of CleanupActions it needs to run which
were registered via framework.AddCleanupAction(cleanupFunc) but
once cleanupFunc starts executing via defer in the test, it will
remove the cleanupHandle from framework's aftersuite hooks.
- So if AfterSuite did not had anything to run (because
those actions were removed via framework.RemoveCleanupAction
then it will simply go to the last framework.AfterEach action and call os.Exit(1)
- So if os.Exit(1) is called before cleanupFunc has a chance to finish in defer, it will not complete.
The mock driver gets instructed to return a ResourceExhausted error
for the first CreateVolume invocation via the storage class
parameters.
How this should be handled depends on the situation: for normal
volumes, we just want external-scheduler to retry. For late binding,
we want to reschedule the pod. It also depends on topology support.
Especially related to "uncertain" global mounts. A large refactoring of CSI
mock tests were necessary:
- to be able to script the driver to return errors as required by the test
- to parse the CSI driver logs to check kubelet called the right CSI calls
Many times an e2e test fails with an unexpected error,
"timed out waiting for the condition".
Useful information may be in the test logs, but debugging e2e test
failures will be much faster if we add context to errors when they
happen.
This change makes sure we add context to all errors returned from
helpers like wait.Poll().
- add csi pd driver manifests
- modify snapshottable test case
- fix tests of pod has to be created first for delay-binding PVC, otherwise PVC won't be bound
test/e2e/storage/testsuites creates volumes dynamically. Initially, the size of those volumes was
hard-coded in the test, which prevented using the tests with storage backends that couldn't support
that hard-coded size
The assumption so far was that all drivers support read/write
volumes. That might not necessarily be true, so we have to let the
test driver specify it and then test accordingly.
Another aspect that is worth testing is whether the driver correctly
creates a new volume for each pod even if the volume attributes are
the same. However, drivers are not required to do that, so again we
have to let the test driver specify that.
We need the 1.2.0 driver for that because that has support for
detecting the volume mode dynamically, and we need to deploy a
CSIDriver object which enables pod info (for the dynamic detection)
and both modes (to satisfy the new mode sanity check).
This ensures that the files are in sync with:
hostpath: v1.2.0-rc3
external-attacher: v2.0.1
external-provisioner: v1.3.0
external-resizer: v0.2.0
external-snapshotter: v1.2.0
driver-registrar/rbac.yaml is obsolete because only
node-driver-registrar is in use now and does not need RBAC rules.
mock/e2e-test-rbac.yaml was not used anywhere.
The README.md files were updated to indicate that these really are
files copied from elsewhere. To avoid the need to constantly edit
these files on each update, <version> is used as placeholder in the URL.
Using a "normal" CSI driver for an inline ephemeral volume may have
unexpected and potentially harmful effects when the driver gets a
NodePublishVolume call that it isn't expecting. To prevent that mistake,
driver deployments for a driver that supports such volumes must:
- deploy a CSIDriver object for the driver
- set CSIDriver.Spec.VolumeLifecycleModes such that it contains "ephemeral"
The default for that field is "persistent", so existing deployments
continue to work and are automatically protected against incorrect
usage.
For the E2E tests we need a way to specify the driver mode. The
existing cluster-driver-registrar doesn't support that and also was
deprecated, so we stop using it altogether and instead deploy and
patch a CSIDriver object.