During HA API server hiccups, we saw a prow job fail with this error:
fail [k8s.io/kubernetes@v1.19.0/test/e2e/e2e.go:284]: Sep 30 17:06:08.313: Error waiting for all pods to be running and ready: 0 / 0 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
POD NODE PHASE GRACE CONDITIONS
The failure message should include the last error from the API server, if
one is available:
fail [k8s.io/kubernetes@v1.19.0/test/e2e/e2e.go:284]: Oct 1 11:29:45.220: Error waiting for all pods to be running and ready: 0 / 0 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
Last error: Get "https://localhost:6443/api/v1/namespaces/kube-system/replicationcontrollers": dial tcp [::1]:6443: connect: connection refused
POD NODE PHASE GRACE CONDITIONS
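A minimal sketch of how the wait helper could record the most recent API
error so that a timeout can surface it; the helper and variable names here
are illustrative, not the framework's actual ones:

    package e2e

    import (
        "context"
        "fmt"
        "time"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/wait"
        clientset "k8s.io/client-go/kubernetes"
    )

    // waitForPodsRunningReady polls through transient API server errors
    // instead of aborting, remembering the last error so that a timeout
    // can report it in the failure message.
    func waitForPodsRunningReady(c clientset.Interface, ns string, timeout time.Duration) error {
        var lastAPIError error
        err := wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
            pods, listErr := c.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{})
            if listErr != nil {
                lastAPIError = listErr // remember, but keep retrying: an HA API server may recover
                return false, nil
            }
            lastAPIError = nil
            for i := range pods.Items {
                if pods.Items[i].Status.Phase != v1.PodRunning {
                    return false, nil
                }
            }
            return true, nil
        })
        if err != nil && lastAPIError != nil {
            return fmt.Errorf("error waiting for all pods to be running and ready: %v; last error: %v", err, lastAPIError)
        }
        return err
    }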
This fixes a problem when using the framework helper
e2epod.CreateExecPodOrFail() more than once in the same test.
The problem was that this function created pods with both pod.Name and
pod.GenerateName set.
The second pod then failed to be created with a 500 whose Reason was
ServerTimeout, indicating that a unique name could not be found in the
time allotted; xref https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/
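The fix is to set only one of the two fields. A minimal sketch (the real
helper builds out a complete pod spec as well):

    import (
        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // execPodTemplate leaves Name empty and sets only GenerateName, so
    // the API server appends a fresh random suffix on every create and a
    // second pod in the same test gets a distinct name.
    func execPodTemplate(ns string) *v1.Pod {
        return &v1.Pod{
            ObjectMeta: metav1.ObjectMeta{
                GenerateName: "exec-pod-", // Name deliberately not set
                Namespace:    ns,
            },
        }
    }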
For testing certain features, the BusyBox image does not provide all the
tools that are needed. Notably, 'dd' from BusyBox does not support the
direct I/O that is required to skip caches while doing writes and reads
on a Block-mode PVC attached to different nodes.
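A sketch of the kind of container this enables; the image tag and device
path are illustrative. The point is oflag=direct, which GNU dd in agnhost
supports and BusyBox dd does not:

    import v1 "k8s.io/api/core/v1"

    // directIOWriteContainer writes one block to a raw block device with
    // the page cache bypassed, so a read from another node sees the data.
    func directIOWriteContainer(devicePath string) v1.Container {
        return v1.Container{
            Name:  "write-direct",
            Image: "k8s.gcr.io/e2e-test-images/agnhost:2.21", // illustrative tag
            Command: []string{
                "dd", "if=/dev/zero", "of=" + devicePath,
                "bs=4096", "count=1",
                "oflag=direct", // bypass caches; unsupported by BusyBox dd
            },
            VolumeDevices: []v1.VolumeDevice{{Name: "data", DevicePath: devicePath}},
        }
    }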
Creates a few agnhost-related functions that create agnhost
pods / containers for general purposes.
The following commits will refactor tests to use those functions.
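A sketch of the shape such a helper could take (imports as in the earlier
sketches; the framework's actual signature may also accept volumes,
mounts, and ports):

    // NewAgnhostPod returns a minimal agnhost pod running the given
    // agnhost subcommand and arguments; the image tag is illustrative.
    func NewAgnhostPod(ns, podName string, args ...string) *v1.Pod {
        return &v1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: podName, Namespace: ns},
            Spec: v1.PodSpec{
                Containers: []v1.Container{{
                    Name:  "agnhost-container",
                    Image: "k8s.gcr.io/e2e-test-images/agnhost:2.21",
                    Args:  args,
                }},
            },
        }
    }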
Some of the tests are negative test cases, which are supposed to ensure that those
invalid use cases are handled properly.
However, some of these tests are false positives: they can pass for various unrelated reasons.
One such example is: "should fail substituting values in a volume subpath with absolute path".
This test can pass if:
- the Pod cannot start for various reasons (e.g., the container image cannot be pulled or does
not exist).
- the Pod ran to completion, even though the container was not supposed to start in the first place.
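One way to harden such a test is to also check why the container did not
start, instead of only asserting that the pod never ran. A sketch; the
expected reason string (e.g. "CreateContainerConfigError" for the subpath
case) is illustrative:

    // failedForExpectedReason reports whether a container is stuck in a
    // Waiting state with the reason the negative test expects, rather
    // than e.g. an unrelated image pull failure.
    func failedForExpectedReason(pod *v1.Pod, wantReason string) bool {
        for _, status := range pod.Status.ContainerStatuses {
            if w := status.State.Waiting; w != nil && w.Reason == wantReason {
                return true
            }
        }
        return false
    }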
This PR fixes a few things so the e2e storage suite can run on a Windows
cluster:
1. Increase timeouts to account for longer pod startup times on Windows
2. Only set SELinuxOptions or fsGroup if the OS is not Windows
3. Add VolumeSnapshot delete policy for Windows
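A sketch of point 2, using the framework's NodeOSDistroIs check; the
SELinux level and the choice of fields are illustrative:

    import (
        v1 "k8s.io/api/core/v1"
        "k8s.io/kubernetes/test/e2e/framework"
    )

    // podSecurityContextFor only sets Linux-specific security fields
    // when the suite is not targeting Windows nodes.
    func podSecurityContextFor(fsGroup *int64) *v1.PodSecurityContext {
        sc := &v1.PodSecurityContext{}
        if !framework.NodeOSDistroIs("windows") {
            sc.SELinuxOptions = &v1.SELinuxOptions{Level: "s0:c0,c1"} // illustrative
            sc.FSGroup = fsGroup
        }
        return sc
    }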
This is useful when the pod owns other resources, because then waiting
for the pod to be gone ensures that those resources were also removed.
This should not matter at the moment, because pods typically are not
owners of any other object, but that will change with the introduction
of generic ephemeral inline
volumes (https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1698-generic-ephemeral-volumes).
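A sketch of deleting a pod and then polling until the object is actually
gone; names and timings are illustrative, and the framework helper this
mirrors is e2epod.DeletePodWithWait:

    import (
        "context"
        "time"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/wait"
        clientset "k8s.io/client-go/kubernetes"
    )

    // deletePodAndWait returns only after the pod object has disappeared;
    // only then can objects owned by the pod be garbage collected.
    func deletePodAndWait(c clientset.Interface, ns, name string, timeout time.Duration) error {
        err := c.CoreV1().Pods(ns).Delete(context.TODO(), name, metav1.DeleteOptions{})
        if apierrors.IsNotFound(err) {
            return nil // already gone
        }
        if err != nil {
            return err
        }
        return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
            _, getErr := c.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
            if apierrors.IsNotFound(getErr) {
                return true, nil
            }
            return false, getErr
        })
    }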
We don't want to set the name directly because then starting the pod
can fail when the node is temporarily out of resources
(https://github.com/kubernetes/kubernetes/issues/87855).
For CSI driver deployments, we have three options:
- modify the pod spec with custom code, similar
to how the NodeSelection utility code does it
- add variants of SetNodeSelection and SetNodeAffinity which
work with a pod spec instead of a pod
- change their parameter from pod to pod spec and then use
them also when patching a pod spec
The last approach is used here because it seems more general: there
might be other cases in the future where only a pod spec is available
and needs to be modified.
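A sketch of the pod-spec-based variant; NodeSelection mirrors the e2e
framework's struct, though its exact fields may differ:

    import v1 "k8s.io/api/core/v1"

    // NodeSelection pins a pod to a node by name, label selector, or affinity.
    type NodeSelection struct {
        Name     string
        Selector map[string]string
        Affinity *v1.Affinity
    }

    // SetNodeSelection works on a bare pod spec, so callers with a full
    // pod simply pass &pod.Spec, and CSI deployment patch code can pass
    // the spec embedded in a DaemonSet or StatefulSet template.
    func SetNodeSelection(podSpec *v1.PodSpec, node NodeSelection) {
        podSpec.NodeSelector = node.Selector
        podSpec.Affinity = node.Affinity
        if node.Name != "" {
            podSpec.NodeName = node.Name
        }
    }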