When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources in kubelet's host path and node
limits checks was overlooked. An ephemeral volume source is another
way of referencing a claim and has to be treated the same way.
All dependencies of VolumeBinding plugin from
"k8s.io/kubernetes/pkg/controller/volume/scheduling" package moved to
"k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding" package:
- whole file pkg/controller/volume/scheduling/scheduler_assume_cache.go
- whole file pkg/controller/volume/scheduling/scheduler_assume_cache_test.go
- whole file pkg/controller/volume/scheduling/scheduler_binder.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_fake.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_test.go
Package "k8s.io/kubernetes/pkg/controller/volume/scheduling/metrics" moved
to "k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/metrics"
because it only used in VolumeBinding plugin and (e2e) tests.
More described in issue #89930 and PR #102953.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
The previous approach with grabbing via a nginx proxy had some
drawbacks:
- it did not work when the pods only listened on localhost (as
configured by kubeadm) and the proxy got deployed on a different
node
- starting the proxy raced with starting the pods, causing
sporadic test failures because the proxy was not set up
properly unless it saw all pods when starting the e2e.test
- the proxy was always started, whether it is needed or not
- the proxy was left running after a test and then the next
test run triggered potentially confusing messages when
it failed to create objects for the proxy
The new approach is similar to "kubectl port-forward" + "kubectl get
--raw". It uses the port forwarding feature to establish a TCP
connection via a custom dialer, then lets client-go handle TLS and
credentials.
Somehow verifying the server certificate did not work. As this
shouldn't be a big concern for E2E testing, certificate checking gets
disabled on the client side instead of investigating this further.
This replaces embedding of JavaScript code into the mock driver that
runs inside the cluster with Go callbacks which run inside the
e2e.test suite itself. In contrast to the JavaScript hooks, they have
direct access to all parameters and can fabricate arbitrary responses,
not just error codes.
Because the callbacks run in the same process as the test itself, it
is possible to set up two-way communication via shared variables or
channels. This opens the door for writing better tests. Some of the
existing tests that poll mock driver output could be simplified, but
that can be addressed later.
For now, only tests using hooks use embedding. How gRPC calls are
retrieved is abstracted behind the CSIMockTestDriver interface, so
tests don't need to be modified when switching between embedding
and remote mock driver.
Extract TestSuite, TestDriver, TestPattern, TestConfig
and VolumeResource, SnapshotVolumeResource from testsuite
package and put them into a new package called api.
The ultimate goal here is to make the testsuites as clean
as possible. And only testsuites in the package.
As discussed in https://github.com/kubernetes/kubernetes/pull/93658,
relying on a watch to deliver all events is not acceptable, not even
for a test, because it can and did (at least for OpenShift testing) to
test flakes.
The solution in #93658 was to replace log flooding with a clear test
failure, but that didn't solve the flakiness.
A better solution is to use a RetryWatcher which "under normal
circumstances" (https://github.com/kubernetes/kubernetes/pull/93777#discussion_r467932080)
should always deliver all changes.
In our current mock CSI driver e2e test, we are not waiting
for the CSI driver register successfully to perform test
including provision PVC. This can lead to timeout when the
csi driver takes longer to register the socket.
This change adds the waiting part so that the system will
wait for up to 10 minutes for the driver to be ready. This
normally won't take this long. However, under a resource
constraint environment it can take longer than expected time.
https://github.com/kubernetes/kubernetes/issues/93358
By creating CSIStorageCapacity objects in advance, we get the
FailedScheduling pod event if (and only if!) the test is expected to
fail because of insufficient or missing capacity. We can use that as
indicator that waiting for pod start can be stopped early. However,
because we might not get to see the event under load, we still need
the timeout.
Setting testParameters.scName had no effect because
StorageClassTest.StorageClassName isn't used anywhere. Instead, the
storage class name is generated dynamically.
kubelet sometimes calls NodeStageVolume an NodePublishVolume too
often, which breaks this test and leads to flakiness. The test isn't
about that, so we can relax the checking and it still covers what it
was meant to cover.
The timeout for the two loops inside the test itself are now bounded
by an upper limit for the duration of the entire test instead of
having their own, rather arbitrary timeouts.
The "error waiting for expected CSI calls" is redundant because it's
immediately followed by checking that error with:
framework.ExpectNoError(err, "while waiting for all CSI calls")