1157 Commits

Author SHA1 Message Date
Davanum Srinivas
b1742f19ef Switch kube-controller-manager to distroless image
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-21 22:33:54 -04:00
Kubernetes Prow Robot
bded41a817 Merge pull request #90689 from aojea/nfsv6
add ipv6 support to the e2e nfs tests
2020-05-21 03:30:36 -07:00
Kubernetes Prow Robot
0e8a2d2244 Merge pull request #90793 from pohly/flaky-mount-volume-calls
mock e2e test: reduce flakiness by not testing all calls
2020-05-19 15:22:19 -07:00
Davanum Srinivas
07d88617e5 Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kubernetes Prow Robot
9978c281ec Merge pull request #90773 from gnufied/fix-csi-e2e-orphans
Fix CSI e2e leaving pods in terminating state
2020-05-13 22:14:21 -07:00
Hemant Kumar
da941d8d3e Create mock CSI driver resources in different namespace
2020-05-13 11:16:00 -04:00
Hemant Kumar
708261e06c Make AfterSuite hooks ordered
ginkgo has a bug: AfterEach does not get called when the test suite
exits on certain kinds of interrupt (Ctrl-C, for example).
More info: https://github.com/onsi/ginkgo/issues/222

We work around this issue in Kubernetes by adding a special hook into
the AfterSuite call, but AfterSuite cannot be used to perform certain
kinds of cleanup because it can race with the AfterEach hook, and the
framework.AfterEach hook sets framework.ClientSet to nil.

This presents a problem in cleaning up the CSI driver and test pods.
This PR removes cleanup of the driver manifest via CleanupAction because
that is racy and unsafe (f.ClientSet may disappear) and makes
AfterSuite hooks run in an ordered fashion.
2020-05-13 11:15:27 -04:00
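One way to get the ordered behaviour described in the commit above is to keep cleanup hooks in a slice rather than an unordered collection. The following is only a minimal sketch under that assumption; `cleanupRegistry`, `add`, and `runAll` are hypothetical names and not the Kubernetes e2e framework API.

```go
package main

import (
	"fmt"
	"sync"
)

// cleanupRegistry is a hypothetical container for suite-level cleanup hooks.
// Hooks are kept in a slice so that they run in registration order.
type cleanupRegistry struct {
	mu      sync.Mutex
	actions []func()
}

// add registers a hook to be run by runAll.
func (r *cleanupRegistry) add(fn func()) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.actions = append(r.actions, fn)
}

// runAll runs every registered hook, in the order it was added.
// The slice is copied under the lock so hooks run without holding it.
func (r *cleanupRegistry) runAll() {
	r.mu.Lock()
	actions := append([]func(){}, r.actions...)
	r.mu.Unlock()
	for _, fn := range actions {
		fn()
	}
}

func main() {
	var r cleanupRegistry
	r.add(func() { fmt.Println("delete test pods") })
	r.add(func() { fmt.Println("delete CSI driver") })
	r.runAll()
}
```

With a map keyed by handle, iteration order in Go is unspecified, so a driver could be removed before the pods that still use it; a slice avoids that.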
Kubernetes Prow Robot
620b7720e6 Merge pull request #90828 from gaurav1086/fix_data_race_storage
Fix data race in storage tests
2020-05-13 00:18:40 -07:00
Gaurav Singh
af74fbabf4 Remove unused err variable
2020-05-08 14:20:35 -04:00
Saikat Roychowdhury
dcfaaefc60 Pickup Snapshot Provisioner from the snapshot class "driver" info.
When using the FromFile or FromExistingClass options, the snapshot
provisioner should be picked up from the "driver" tag of the
VolumeSnapshotClass object.
2020-05-08 05:45:36 +00:00
Kubernetes Prow Robot
7f78048594 Merge pull request #90781 from msau42/increase-timeout
Increase timeout waiting for driver to start on nodes
2020-05-06 22:23:08 -07:00
Gaurav Singh
37458b350e Fix data race in storage
2020-05-06 22:57:08 -04:00
Patrick Ohly
5aa3805a5f mock e2e test: reduce flakiness by not testing all calls
kubelet sometimes calls NodeStageVolume and NodePublishVolume too
often, which breaks this test and leads to flakiness. The test isn't
about that, so we can relax the checking and it still covers what it
was meant to cover.
2020-05-06 11:43:16 +02:00
Michelle Au
fc08f74157 Increase timeout waiting for driver to start on nodes to reduce test flakiness
Change-Id: Id553943e4473b387bf0ae14a18a90cb3a1bcd5c1
2020-05-05 18:10:10 -07:00
Kubernetes Prow Robot
fbacb6e264 Merge pull request #90335 from pohly/cleanup-late-binding
e2e storage: wait for PV deletion also for late binding
2020-05-04 18:05:07 -07:00
Antonio Ojea
26a00f9032 add ipv6 support to the e2e nfs tests
The NFS mount command needs the IP enclosed in square brackets
if it is an IPv6 address.
2020-05-03 11:06:10 +02:00
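The bracketing rule from the commit above can be sketched in Go as follows. This is an illustration only, not the code from the PR; `nfsServerAddress` and the `/exports` path are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// nfsServerAddress returns host in the form used in an NFS mount source.
// IPv6 literals are wrapped in square brackets so that the ":" separating
// server and export path stays unambiguous; anything else is unchanged.
func nfsServerAddress(host string) string {
	if ip := net.ParseIP(host); ip != nil && strings.Contains(host, ":") {
		return "[" + host + "]"
	}
	return host
}

func main() {
	for _, h := range []string{"10.0.0.5", "fd00::5", "nfs.example.com"} {
		// Prints the "server:/path" source argument for a mount command.
		fmt.Printf("%s:/exports\n", nfsServerAddress(h))
	}
}
```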
Patrick Ohly
e3d258d6ca e2e storage: wait for PV deletion also for late binding
When a test pattern or storage class uses late binding, the cleanup
code didn't know about the PV that may have been created for the PVC
after setup and therefore didn't wait for PV deletion.

This is problematic for test isolation because the next test was
allowed to start before cleanup had finished. Worse, if the driver
gets removed after the test, the volume might never get deleted.
2020-04-27 10:34:50 +02:00
Di Xu
3f5e09b6e2 add e2e tests for HostPathType and mark as slow
2020-04-23 10:52:40 +08:00
Kubernetes Prow Robot
fc9d174102 Merge pull request #88248 from claudiubelu/tests/reduce-to-agnhost-mounttest
tests: Replaces mounttest images used with agnhost (part 4)
2020-04-22 04:53:52 -07:00
Kubernetes Prow Robot
07179d0207 Merge pull request #87998 from msau42/e2e-reattach-stress
Add stress test to repeatedly restart Pods with PVCs in parallel
2020-04-21 03:04:57 -07:00
Kubernetes Prow Robot
7c53c1eb91 Merge pull request #89819 from pohly/enhance-podlogs-master
tests: enhance podlogs
2020-04-16 22:19:07 -07:00
Kubernetes Prow Robot
ae8d30631d Merge pull request #90214 from pohly/stop-pod-master
storage tests: really wait for pod to disappear
2020-04-16 20:49:07 -07:00
Michelle Au
6596e20b18 Make stress test parameters configurable
Change-Id: Ia062f3433b6043825a51a54c7c07eb4cdf809631
2020-04-16 14:18:21 -07:00
Kubernetes Prow Robot
aa0665dfee Merge pull request #90147 from gnufied/use-random-node-zone-for-inline-e2e
Use random zone for inline volume e2e tests
2020-04-16 13:59:08 -07:00
Patrick Ohly
0cdd5365a1 storage tests: really wait for pod to disappear
As seen in one case (https://github.com/intel/pmem-csi/issues/587), a
pod can reach the "not running" state although its ephemeral volumes
are still being torn down by kubelet and the CSI driver. The test then
returns too early, and even deleting the namespace (and thus the pod)
succeeds before NodeUnpublishVolume really finishes.

To avoid this, StopPod now waits for the pod to really disappear.
2020-04-16 21:10:56 +02:00
Michelle Au
e132b77ae4 Add stress test to repeatedly restart Pods with PVCs in parallel
Change-Id: I499571cc86b1058d0e16d79e5e998d1dedfd9a4a
2020-04-15 18:10:35 -07:00
Hemant Kumar
7d6712632c Use random zone for inline volume e2e tests
2020-04-14 23:37:21 -04:00
Patrick Ohly
2ae6cf5984 mock tests: per-test timeout for ResourceExhausted
The two loops inside the test itself are now bounded by an upper
limit for the duration of the entire test instead of having their
own, rather arbitrary timeouts.
2020-04-14 09:11:42 +02:00
Patrick Ohly
48f8e398fb mock tests: remove redundant wrapping of error
The "error waiting for expected CSI calls" is redundant because it's
immediately followed by checking that error with:

   framework.ExpectNoError(err, "while waiting for all CSI calls")
2020-04-07 13:09:31 +02:00
Patrick Ohly
2550051f3b mock tests: add timeout
The for loop that waited for the signal to delete pod had no timeout,
so if something went wrong, it would wait for the entire test suite to
time out.
2020-04-07 13:09:31 +02:00
Patrick Ohly
f117849582 mock tests: ResourceExhausted error handling in external-provisioner
The mock driver gets instructed to return a ResourceExhausted error
for the first CreateVolume invocation via the storage class
parameters.

How this should be handled depends on the situation: for normal
volumes, we just want external-provisioner to retry. For late binding,
we want to reschedule the pod. It also depends on topology support.
2020-04-07 13:09:31 +02:00
Patrick Ohly
367a23e4d9 mock tests: remove redundant retrieval of log output
The code became obsolete with the introduction of parseMockLogs
because that will retrieve the log itself. For debugging of a running
test the normal pod output logging is sufficient.
2020-04-07 13:07:09 +02:00
Patrick Ohly
d06589e4b6 mock tests: less verbose log output checking
parseMockLogs is called potentially multiple times while waiting for
output. Dumping all CSI calls each time is quite verbose and
repetitive. To verify what the driver has done already, the normal
capturing of the container log can be used instead:

csi-mockplugin-0/mock@127.0.0.1: gRPCCall: {"Method":"/csi.v1.Node/NodePublishVolume","Request"...
2020-04-07 13:07:09 +02:00
Patrick Ohly
981aae35dd mock tests: do not give up immediately for pod output errors
As seen in some test
runs (https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/89041),
retrieving output can fail with "the server rejected our request for
an unknown reason (get pods csi-mockplugin-0)".

If this is truly an intermittent error, then the existing retry logic
in the callers can deal with this.
2020-04-06 15:03:44 +02:00
Jan Safranek
e23a26a380 Update to new javascript
2020-04-06 15:03:22 +02:00
Jan Safranek
a4f080861f Test NodeStage error cases
Especially related to "uncertain" global mounts. A large refactoring of CSI
mock tests was necessary:
- to be able to script the driver to return errors as required by the test
- to parse the CSI driver logs to check that kubelet made the right CSI calls
2020-04-06 15:03:22 +02:00
Patrick Ohly
b9c5c55c09 podlogs: avoid dumping a terminated container more than once
The original logic was that dumping can stop (for example, due to
losing the connection to the apiserver) and then will start again as
long as the container exists. Duplicating output on restarts
is better than skipping output that might not have been dumped yet.

But that logic then also dumped the output of terminated containers
multiple times:
- logging is started, dumps all output and stops because the
  container has terminated
- next check finds the container again, sees no active logger,
  repeats

This wasn't a problem for short-lived logging in a custom
namespace (the way it is done for CSI drivers in Kubernetes E2E),
but other test suites (like the one from PMEM-CSI) keep logging running
for the entire test suite duration: there, duplicate output became a
problem when adding driver redeployment as part of the suite's run.

To avoid duplicated output for terminated containers, which containers
have been handled is now stored permanently. For terminated containers,
restarting of dumping is prevented. This comes with the risk that if
the previous dumping ended before capturing all output, some output
will get lost.

Marking the start and stop of the log was also useful when streaming
to a single writer and thus gets enabled.
2020-04-03 14:45:00 +02:00
Patrick Ohly
dbac2a369a podlogs: adapt to modified error message
Commit 8a495cb5e4 changed the spelling of the error message that we
want to ignore. In case of version skew we suppress both the old and
new spelling.
2020-04-03 14:43:52 +02:00
Kubernetes Prow Robot
7bd48eb3f6 Merge pull request #89784 from oomichi/sshPort
Add common SSHPort on e2essh
2020-04-02 21:40:40 -07:00
Kenichi Omichi
48fdb95a82 Add common SSHPort on e2essh
There were several sshPort values in e2e test packages because
we've migrated code from the e2e framework by copying and pasting.
This adds a common SSHPort to the e2essh package to reduce such
duplicated code.
2020-04-02 17:41:49 +00:00
Kubernetes Prow Robot
8d773421ee Merge pull request #80973 from xiaoanyunfei/bugfix/orphan-volume
fix orphaned pod flexvolume can not be cleaned up
2020-04-01 20:50:23 -07:00
Kubernetes Prow Robot
ed00f42848 Merge pull request #89563 from oomichi/RestartControllerManager
Separate RestartControllerManager() as e2ekubesystem
2020-04-01 13:42:23 -07:00
Kubernetes Prow Robot
8d257ad315 Merge pull request #88118 from pohly/wait-for-persistent-volume-deleted-error
e2e/storage: check result of WaitForPersistentVolumeDeleted
2020-03-30 07:01:55 -07:00
Kubernetes Prow Robot
4e9dd8fd36 Merge pull request #89454 from gavinfish/import-aliases
Update .import-aliases for e2e test framework
2020-03-27 14:35:54 -07:00
Kenichi Omichi
42bb845f40 Separate RestartControllerManager() as e2ekubesystem
RestartControllerManager() is a kube-controller specific function
and it is better to separate it out as a subpackage of the e2e
test framework.
In addition, the function introduced an invalid dependency into e2essh.
So this separates the function into the e2ekubesystem subpackage.
2020-03-27 03:52:17 +00:00
Patrick Ohly
c9004e704d e2e/storage: check result of WaitForPersistentVolumeDeleted
When deleting fails, the tests should be considered as failed,
too. Ignoring the error caused a wrong return code in the CSI mock
driver to go unnoticed (see
https://github.com/kubernetes-csi/csi-test/pull/250). The v3.1.0
release of the CSI mock driver fixes that.
2020-03-26 13:22:04 +01:00
Kubernetes Prow Robot
3afbcad669 Merge pull request #89436 from oomichi/RestartKubelet
Move RestartKubelet() into e2e/storage/vsphere
2020-03-25 02:03:17 -07:00
drfish
dfab6b637f Update .import-aliases for e2e test framework
2020-03-25 11:40:02 +08:00
Kenichi Omichi
2158989d6f Move WaitForPersistentVolumeDeleted() to e2epv
The function is for persistent volumes and there is no reason for
it to stay in the core test framework. So this moves the function
into the e2epv package to reduce the e2e/framework/util.go code.
2020-03-24 22:54:07 +00:00