Tests should accept a context from Ginkgo and pass it through to all functions
which may block for a longer period of time. In particular all Kubernetes API
calls through client-go should use that context. Then if a timeout occurs,
the test returns immediately because everything that it could block on will
return.
Cleanup code then needs to run in a separate Ginkgo node, typically
DeferCleanup, which ensures that it gets a separate context which has not timed
out yet.
Adds two tests for the enforcement of the ReadWriteOncePod
PersistentVolume access mode.
1. Tests that when two Pods are scheduled that reference the same
ReadWriteOncePod PVC, the latter-scheduled Pod will be marked
unschedulable because the PVC is in-use.
2. Tests that when two Pods are scheduled on the same node (setting
Pod.Spec.NodeName to bypass scheduling for the second Pod), the
latter Pod will fail to start because the PVC is already mounted on
the Node.
Included are changes to update the hostpath CSI driver to accept new CSI
access modes. Its sidecar containers are already at supported versions
for ReadWriteOncePod and don't need updating. The GCP PD CSI driver does
not yet support the new CSI access modes, but its sidecar containers are
at supported versions and so the feature will work.
To support ReadWriteOncePod, the following CSI sidecars must be updated
to these versions or greater:
- csi-provisioner:v3.0.0+
- csi-attacher:v3.3.0+
- csi-resizer:v1.3.0+
For more details, see:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/2485-read-write-once-pod-pv-access-mode/README.md
It turned out to be unreliable (see
https://github.com/kubernetes/kubernetes/issues/112834). Encoding the data
inside the command as input for base64 is a workaround that is fine for small
amounts of data. It becomes less efficient and/or unusable for large amounts.
The "todo" packages were necessary while moving code around to avoid hitting
cyclic dependencies. Now that any sub package can depend on the framework, they
are no longer needed and the code can be moved into the normal sub packages.
This is useful for running a driver on a subset of all ready nodes:
- use e2enode.GetBoundedReadySchedulableNodes with a suitable
maximum number of nodes to determine how much nodes are available
for a test
- define pod anti-affinity in the PodTemplate:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app.kubernetes.io/instance: xxxxxxx
topologyKey: kubernetes.io/hostname
- set the ReplicaSetSpec.Replicas value to the number of nodes
The framework.AddCleanupAction API was a workaround for Ginkgo v1 not invoking
AfterEach callbacks after a test failure. Ginkgo v2 not only fixed that, but
also added a DeferCleanup API which can be used to run some code if (and only
if!) the corresponding setup code ran. In several cases that makes the test
cleanup simpler.
For cleanup purposes the ginkgo.DeferCleanup is a better replacement for
f.AddAfterEach:
- the cleanup only gets executed when the corresponding setup code ran
and can use the same local variables
- the callback runs after the test and before the framework
deletes namespaces (as before)
- if one callback fails, the others still get executed
For the original purpose (https://github.com/kubernetes/kubernetes/pull/86177 "This is
very useful for custom gathering scripts.") it is now possible to use
ginkgo.AfterEach because it will always get executed. Just beware that its
callbacks run in first-in-first-out order.
After updating gRPC in node-driver-registrar from v1.40.0 to v1.47.0 the
behavior of gRPC change in a way such that it no longer detected the
single-sided closing of the stream as a loss of connection. This caused gRPC in
the e2e.test to get stuck, possibly in a Read or Write for the HTTP stream
because those have neither a context nor a timeout.
Changing the connection handling so that all active connections are tracking in
the listener and closing them when the listener gets closed fixed this problem.
Followup on https://github.com/kubernetes/kubernetes/pull/111846. This
particular test was left out from that PR because once it was enabled it
started failing. It was desired to merge
https://github.com/kubernetes/kubernetes/pull/111846 irrespective of
this particular test.
The failure in the test was caused due to the
`createFSGroupRequestPreHook` mock CSI driver hook function assuming
that the request object passed to it is an instance of the respective
struct, but it's actually a pointer instead. This resulted in the hook
function not fulfilling its purpose, and the so the test failed.
Fixes instances of #98213 (to ultimately complete #98213 linting is
required).
This commit fixes a few instances of a common mistake done when writing
parallel subtests or Ginkgo tests (basically any test in which the test
closure is dynamically created in a loop and the loop doesn't wait for
the test closure to complete).
I'm developing a very specific linter that detects this king of mistake
and these are the only violations of it it found in this repo (it's not
airtight so there may be more).
In the case of Ginkgo tests, without this fix, only the last entry in
the loop iteratee is actually tested. In the case of Parallel tests I
think it's the same problem but maybe a bit different, iiuc it depends
on the execution speed.
Waiting for the CI to confirm the tests are still passing, even after
this fix - since it's likely it's the first time those test cases are
executed - they may be buggy or testing code that is buggy.
Another instance of this is in `test/e2e/storage/csi_mock_volume.go` and
is still failing so it has been left out of this commit and will be
addressed in a separate one