By creating CSIStorageCapacity objects in advance, we get the
FailedScheduling pod event if (and only if!) the test is expected to
fail because of insufficient or missing capacity. We can use that as
indicator that waiting for pod start can be stopped early. However,
because we might not get to see the event under load, we still need
the timeout.
Setting testParameters.scName had no effect because
StorageClassTest.StorageClassName isn't used anywhere. Instead, the
storage class name is generated dynamically.
kubelet sometimes calls NodeStageVolume an NodePublishVolume too
often, which breaks this test and leads to flakiness. The test isn't
about that, so we can relax the checking and it still covers what it
was meant to cover.
The timeout for the two loops inside the test itself are now bounded
by an upper limit for the duration of the entire test instead of
having their own, rather arbitrary timeouts.
The "error waiting for expected CSI calls" is redundant because it's
immediately followed by checking that error with:
framework.ExpectNoError(err, "while waiting for all CSI calls")
The mock driver gets instructed to return a ResourceExhausted error
for the first CreateVolume invocation via the storage class
parameters.
How this should be handled depends on the situation: for normal
volumes, we just want external-scheduler to retry. For late binding,
we want to reschedule the pod. It also depends on topology support.
The code became obsolete with the introduction of parseMockLogs
because that will retrieve the log itself. For debugging of a running
test the normal pod output logging is sufficient.
parseMockLogs is called potentially multiple times while waiting for
output. Dumping all CSI calls each time is quite verbose and
repetitive. To verify what the driver has done already, the normal
capturing of the container log can be used instead:
csi-mockplugin-0/mock@127.0.0.1: gRPCCall: {"Method":"/csi.v1.Node/NodePublishVolume","Request"...
As seen in some test
runs (https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/89041),
retrieving output can fail with "the server rejected our request for
an unknown reason (get pods csi-mockplugin-0)".
If this truly an intermittent error, then the existing retry logic in
the callers can deal with this.
Especially related to "uncertain" global mounts. A large refactoring of CSI
mock tests were necessary:
- to be able to script the driver to return errors as required by the test
- to parse the CSI driver logs to check kubelet called the right CSI calls
When deleting fails, the tests should be considered as failed,
too. Ignoring the error caused a wrong return code in the CSI mock
driver to go unnoticed (see
https://github.com/kubernetes-csi/csi-test/pull/250). The v3.1.0
release of the CSI mock driver fixes that.
The function is for persistent volumes and it doesn't have any
reason why it stays in core test framework. So this moves the
function into e2epv package for reducing e2e/framework/util.go
code.
We don't want to set the name directly because then starting the pod
can fail when the node is temporarily out of resources
(https://github.com/kubernetes/kubernetes/issues/87855).
For CSI driver deployments, we have three options:
- modify the pod spec with custom code, similar
to how the NodeSelection utility code does it
- add variants of SetNodeSelection and SetNodeAffinity which
work with a pod spec instead of a pod
- change their parameter from pod to pod spec and then use
them also when patching a pod spec
The last approach is used here because it seems more general. There
might be other cases in the future where there's only a pod spec that
needs to be modified.
In order to promote the volume limits e2e test (from CSI Mock driver)
to Conformance, we can't rely on specific output of optional Condition
fields. Thus, this commit changes the test to only check the presence
of the right condition and verify that the optional fields are not empty.
Many times an e2e test fails with an unexpected error,
"timed out waiting for the condition".
Useful information may be in the test logs, but debugging e2e test
failures will be much faster if we add context to errors when they
happen.
This change makes sure we add context to all errors returned from
helpers like wait.Poll().
While this is looser check than original check, I do not think
we can quite expect NodePublish and NodeUnpublish call counts to match
NodePublishvolume call count may not be same as NodeUnpublishVolume
call count because reconciler may have a mount operation queued up
while previous one is finishing. So, it is not unusual to have more than one
NodePublishVolume call for same pod+volume combination, similarly
unmount may also run more than once.
Once we have deleted the pod and the volume, we want to be sure that
NodeUnpublishVolume was called for it. The main motivation was to
check this for inline ephemeral volumes, but the same additional check
also makes sense for other volumes.
The feature is complete and supported by an increasing number of CSI
drivers, but before it can be really used, it should be moved out of
alpha into beta.
Moving pod related functions from e2e/framework/pv_util.go to
e2e/framework/pod in order to allow refactoring of pv_util.go into its
own package.
Signed-off-by: alejandrox1 <alarcj137@gmail.com>
The PodInfo tests can be extended to also cover the new
csi.storage.k8s.io/ephemeral flag. However, the presence of that flag
depends on whether inline volume support is enabled, so tests that run
with and without the feature have to detect that at runtime.
Other tests have a feature tag and thus can assume that they only run
when that feature is enabled. However, we need a newer csi-mock driver
before we can actually ask it to publish an ephemeral inline volume.