This extends the existing "ephemeral volume" tests to also cover
generic ephemeral inline volumes. They get instantiated for all
drivers (CSI and others) which support persistent volume provisioning,
for several different filesystems.
Configuring the number of inline volumes via a flag with a computed
name had been identified as problematic before and now gets removed
because re-using the tests as a stress test with a higher number of
volumes should be added and configured separately.
The test steps are as follows:
1. Write some data
2. Take a snapshot
3. Write more data
4. Create a new volume from snapshot
5. Validate data is the old data
1. Use ginkgo before each to do common setup
2. Use volume resource to create SC, PV, PVC and handle cleanup
3. Add SnapshotResource to handle creating and cleanup of VS, VSC, VSClass
4. Add test pattern for deletion policy: Delete vs Retain
5. Use test pattern to determine test behaviour
6. Add test pattern for preprovisioned snapshot (not implemented)
These changes are made to consolidate common setup steps and stop resource
leaks by waiting for objects to be deleted.
By creating CSIStorageCapacity objects in advance, we get the
FailedScheduling pod event if (and only if!) the test is expected to
fail because of insufficient or missing capacity. We can use that as
indicator that waiting for pod start can be stopped early. However,
because we might not get to see the event under load, we still need
the timeout.
Setting testParameters.scName had no effect because
StorageClassTest.StorageClassName isn't used anywhere. Instead, the
storage class name is generated dynamically.
When using the entire test name as file name, the name became too
long (> 256 characters, which wasn't supported by all file systems)
and the artifact directory got cluttered.
The original reason (a limitation in Gubernator) no longer applies
because Spyglass is used now for log viewing.
This drops testfiles.ReadOrDie and updated testfiles.Exists to return an
error, forcing the caller to decide whether to call framework.Fail or do
something else.
It makes for a slightly less friendly API, but also means the package is
decoupled from framework again, as per the comments at the top of the
file
As its name, DeprecatedMightBeMasterNode is deprecated.
In e2e metrics, the function was used for knowing master node name to
get metrics from kube-scheduler and kube-controller-manager pods.
This make e2e metrics get these metrics directly by getting those pod
names without calling DeprecatedMightBeMasterNode().
There were nits in invokeStaleDummyVMTestWithStoragePolicy() like
- The error message didn't contain necessary space
- IsVMPresent() can return an error, but lack of the error handling
- IsVMPresent() returns true/false, but didn't use ExpectEqual() and
less code readability
This fixes those things.
WaitForStableCluster() checks all pods run on worker nodes, and the
function used to refer master nodes to skip checking controller plane
pods.
GetMasterAndWorkerNodes() was used for getting master nodes, but the
implementation is not good because it usesDeprecatedMightBeMasterNode().
This makes WaitForStableCluster() refer worker nodes directly to avoid
using GetMasterAndWorkerNodes().
The behavior for exec'ing a backgrounded command is not specified
with CRI so modify the test to run the command directly instead
of using exec.
Signed-off-by: Mrunal Patel <mpatel@redhat.com>
The way gingko handles interrupts is:
- It starts running AfterSuite hooks in a separate goroutine (this includes cleanupAction hooks)
- Once AfterSuite hook is done executing it calls
os.Exit(1) on test suite.
So how cleanupFunc() that runs via defer in test can be interrupted
is:
- cleanupFunc starts running via defer (or AfterEach hook) but first
thing that function does is to remove cleanupHandle from
framework.RemoveCleanupAction.
- Test suite receives interrupt from user and AfterSuite block
starts executing
- remember that while cleanupFunc is running in goroutine#1,
AfterSuite is running concurrently in goroutine#2.
- AfterSuite hook has bunch of CleanupActions it needs to run which
were registered via framework.AddCleanupAction(cleanupFunc) but
once cleanupFunc starts executing via defer in the test, it will
remove the cleanupHandle from framework's aftersuite hooks.
- So if AfterSuite did not had anything to run (because
those actions were removed via framework.RemoveCleanupAction
then it will simply go to the last framework.AfterEach action and call os.Exit(1)
- So if os.Exit(1) is called before cleanupFunc has a chance to finish in defer, it will not complete.
ginkgo has a weird bug that - AfterEach does not get called when
testsuite exits with certain kind of interrupt (Ctrl-C for example).
More info - https://github.com/onsi/ginkgo/issues/222
We workaround this issue in Kubernetes by adding a special hook into
AfterSuite call, but AfterSuite can not be used to peforms certain
kind of cleanup because it can race with AfterEach hook and
framework.AfterEach hook will set framework.ClientSet to nil.
This presents a problem in cleaning up CSI driver and testpods. This
PR removes cleanup of driver manifest via CleanupAction because that
is not safe and racy (such as f.ClientSet may disappear!) and makes
AfterSuite hooks run in a ordered fashion