Close outbound connections when using a cert callback and certificates rotate. This means that we won't get into a situation where we have open TLS connections using expired certs, which would get unauthorized errors at the apiserver.
Attempt to retrieve a new certificate if open connections are near expiry, to prevent the case where the cert expires but we haven't yet opened a new TLS connection and so GetClientCertificate hasn't been called.
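For context, the cert callback in question has roughly this shape; a minimal sketch, with certFile/keyFile as illustrative names rather than the PR's actual code:

```go
package rotation

import "crypto/tls"

// newRotatingTLSConfig returns a tls.Config whose client certificate is
// re-read from disk on every new TLS handshake, so a rotated cert is
// picked up as soon as a fresh connection is opened.
func newRotatingTLSConfig(certFile, keyFile string) *tls.Config {
	return &tls.Config{
		GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
			cert, err := tls.LoadX509KeyPair(certFile, keyFile)
			if err != nil {
				return nil, err
			}
			return &cert, nil
		},
	}
}
```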
Move certificate rotation logic to a separate function
Rely on generic transport approach to handle closing TLS client connections in exec plugin; no need to use a custom dialer as this is now the default behaviour of the transport when faced with a cert callback. As a result of handling this case, it is now safe to apply the transport approach even in cases where there is a custom Dialer (this will not affect kubelet connrotation behaviour, because that uses a custom transport, not just a dialer).
Check expiry of the full TLS certificate chain that will be presented, not only the leaf. Only do this check when the certificate actually rotates. Start the certificate as a zero value, not nil, so that we don't see a rotation when there is in fact no client certificate
Drain the timer when we first initialize it, to prevent immediate rotation. Additionally, calling Stop() on the timer isn't necessary.
Don't close connections on the first 'rotation'
Remove RotateCertFromDisk and RotateClientCertFromDisk flags.
Instead simply default to rotating certificates from disk whenever files are exclusively provided.
Add integration test for client certificate rotation
Simplify logic; rotate every 5 mins
Instead of trying to be clever and checking for rotation just before an
expiry, let's match the new apiserver cert rotation logic as much
as possible. We write a controller that checks for rotation
every 5 mins. We also check on every new connection.
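A minimal sketch of that controller shape, assuming a closeConns hook on the transport and using wait.Until; names are illustrative, not the PR's actual code:

```go
package rotation

import (
	"bytes"
	"io/ioutil"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// startRotationChecker polls the cert file every 5 minutes and invokes
// closeConns when its contents change.
func startRotationChecker(certFile string, closeConns func(), stopCh <-chan struct{}) {
	var last []byte
	wait.Until(func() {
		cur, err := ioutil.ReadFile(certFile)
		if err != nil {
			return // keep existing connections if the file is unreadable
		}
		if last != nil && !bytes.Equal(last, cur) {
			closeConns() // certificate rotated: drop open TLS connections
		}
		last = cur
	}, 5*time.Minute, stopCh)
}
```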
Respond to review
Fix kubelet certificate rotation logic
The kubelet rotation logic seems to be broken because it expects its
cert files to end up as cert data whereas in fact they end up as a
callback. We should just call the tlsConfig GetCertificate callback
as this obtains a current cert even in cases where a static cert is
provided, and check that for validity.
Later on we can refactor all of the kubelet logic so that all it does is
write files to disk, and the cert rotation work does the rest.
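A sketch of that validity check, assuming (per the description above) the kubelet's tls.Config always carries a GetCertificate callback even for static certs:

```go
package kubeletcert

import (
	"crypto/tls"
	"crypto/x509"
	"time"
)

// servingCertValid asks the tls.Config for its current certificate and
// checks the leaf for validity; a sketch, not the kubelet's actual code.
func servingCertValid(cfg *tls.Config, now time.Time) (bool, error) {
	cert, err := cfg.GetCertificate(&tls.ClientHelloInfo{})
	if err != nil || cert == nil || len(cert.Certificate) == 0 {
		return false, err
	}
	leaf, err := x509.ParseCertificate(cert.Certificate[0])
	if err != nil {
		return false, err
	}
	return now.After(leaf.NotBefore) && now.Before(leaf.NotAfter), nil
}
```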
Only read certificates once a second at most
Respond to review
1) Don't blat the cert file names
2) Make it more obvious where we have a neverstop
3) Naming
4) Verbosity
Avoid cache busting
Use filenames as cache keys when rotation is enabled, and add the
rotation later in the creation of the transport.
Caller should start the rotating dialer
Add continuous request rotation test
Rebase: use context in List/Watch
Swap goroutine around
Retry GETs on net.IsProbableEOF
Refactor certRotatingDialer
For simplicity, don't affect cert callbacks
To reduce change surface, let's not try to handle the case of a changing
GetCert callback in this PR. Reverting this commit should be sufficient
to handle that case in a later PR.
This PR will focus only on rotating certificate and key files.
Therefore, we don't need to modify the exec auth plugin.
Fix copyright year
Quite a few images are only used a few times in a few tests. Thus,
the images are being centralized into the agnhost image, reducing
the number of images that have to be pulled and used.
This PR replaces the usage of the following images with agnhost:
- mounttest
- mounttest-user
Additionally, removes the mounttest-user image from kubernetes/test/images;
RunAsUser is set instead of relying on that image.
Most of these could have been refactored automatically, but it wouldn't
have been any prettier: the unsophisticated tooling left lots of unnecessary
struct -> pointer -> struct transitions.
This is gross but because NewDeleteOptions is used by various parts of
storage that still pass around pointers, the return type can't be
changed without significant refactoring within the apiserver. I think
this would be good to clean up, but I want to minimize apiserver-side
changes as much as possible in the client signature refactor.
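As a hedged illustration of the struct -> pointer -> struct churn described above, assuming the by-value, context-taking Delete signature of modern client-go (names are illustrative):

```go
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteNow shows the churn: Delete takes DeleteOptions by value, so the
// pointer returned by the NewDeleteOptions helper is dereferenced on the spot.
func deleteNow(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	return c.CoreV1().Pods(ns).Delete(ctx, name, *metav1.NewDeleteOptions(0))
}
```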
The condition was not part of the message and so would not
match:
OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/128aea1f-bde3-43d5-8b5f-dd86b9a5ef33/volumes/kubernetes.io~secret/default-token-v55hm\\\" to rootfs \\\"/var/lib/docker/overlay2/813487ba91d534ded546ae34f2a05e7d94c26bd015d356f9b2641522d8f0d6da/merged\\\" at \\\"/var/run/secrets/kubernetes.io/serviceaccount\\\" caused \\\"stat /var/lib/kubelet/pods/128aea1f-bde3-43d5-8b5f-dd86b9a5ef33/volumes/kubernetes.io~secret/default-token-v55hm: no such file or directory\\\"\"": unknown
Updated the check and regex.
Make sure the SR-IOV device plugin is ready, and that
there are enough SR-IOV devices allocatable before
spinning up test pods.
Signed-off-by: vpickard <vpickard@redhat.com>
1. move the integration test of TaintBasedEvictions to test/integration/node
2. move the e2e test of TaintBasedEvictions to test/e2e/node
3. modify the conformance file to adapt the TaintBasedEviction test
The current agnhost version is 2.12, 2.11 was not previously built as the
VERSION bumps merged one after the other, and the Image Promoter did not get to
build the 2.11 image.
In the current version, due to how make works, when building all the conformance
images (make all-push WHAT=all-conformance), ALL the images are being built first
before being pushed.
This PR will allow images to be built and pushed immediately afterwards, so the first
images that have been successfully built are already pushed and promotable, even if
the task failed on the last image, or it timed out.
A previous PR (#76838) introduced the ability to build and publish
Windows Test Images to kubernetes/test/images/image-util.sh.
Additionally, that PR also configured the Image Promoter to use a
few Windows Remote Docker build nodes to build the Windows Test Images,
however, there is a minor issue: the build container has a different $HOME
folder than expected (is: /builder/home, expected: /root - since it's the
root user), and the Remote Docker credentials are mounted in /root.
Because of that, image-util.sh cannot find the credentials it needs.
This will have to be properly fixed, but for now, we can just skip
the Windows image building part.
Quite a few images are only used a few times in a few tests. Thus,
the images are being centralized into the agnhost image, reducing
the number of images that have to be pulled and used.
This PR replaces the usage of the following images with agnhost:
- dnsutils
dnsmasq is a Linux-specific binary. In order for the tests to also
pass on Windows, CoreDNS should be used instead.
- Search/replace Google Infra kube-cross locations for K8s Infra
- Update kube-cross make targets
- Don't attempt to pre-pull image (docker build --pull)
This prevents CI failures when the image under test doesn't exist
yet in the registry.
- 'make all' now builds and pushes the kube-cross image
- Allow 'TAG' to be specified via env var
- Use 'KUBE_CROSS_VERSION' to represent the kube-cross version
- Tag kube-cross images with both a kubernetes version
('git describe') and a kube-cross version
- Add a GCB (Google Cloud Build) config file (cloudbuild.yaml)
Signed-off-by: Stephen Augustus <saugustus@vmware.com>
We don't want to set the name directly because then starting the pod
can fail when the node is temporarily out of resources
(https://github.com/kubernetes/kubernetes/issues/87855).
For CSI driver deployments, we have three options:
- modify the pod spec with custom code, similar
to how the NodeSelection utility code does it
- add variants of SetNodeSelection and SetNodeAffinity which
work with a pod spec instead of a pod
- change their parameter from pod to pod spec and then use
them also when patching a pod spec
The last approach is used here because it seems more general. There
might be other cases in the future where there's only a pod spec that
needs to be modified.
A previous PR replaced the usage of Redis in the guestbook app test
with Agnhost. The replacement went well for Linux setups and Containers,
which is why the tests are green, but there is a network particularity on
Windows setups which won't allow the test to pass.
The issue was observed with another test hitting the same issue:
https://github.com/kubernetes/kubernetes/issues/83072
Here's exactly what happens during the test:
- frontend containers are created, having the /guestbook endpoint. Its main
purpose is to forward the call to either agnhost-master (cmd=set), or
agnhost-slave (cmd=get).
- agnhost-master container is created, having the /set endpoint, and the
/register endpoint, through which the agnhost-slave containers would
register to it. Its purpose is to propagate all data received through /set
to its clients.
- agnhost-slave containers are created, having the /set and /get endpoints.
They would register to agnhost-master, and then receive any and all updates
from it, which are then served through the /get endpoint.
For simplicity, all 3 types have the same agnhost subcommand (agnhost guestbook), each
able to satisfy its given purpose. For this, HTTP servers were being used, including
for the /register endpoints. agnhost-master would send its /set updates as /set HTTP
requests. However, because of the issue listed above, agnhost-master did not receive
the client's IP, but rather the container host's IP, resulting in the request being
sent to the wrong destination.
This PR updates the agnhost guestbook subcommand. Now, the agnhost subscriber nodes will
send their own IP to the /register endpoint (/endpoint?host=myip).
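A sketch of the subscriber side of that change: announce our own IP explicitly rather than letting the master infer it from the connection. Reading POD_IP from the environment (e.g. via the downward API) is an assumption here, not necessarily agnhost's actual mechanism:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// register announces this subscriber's own IP to the master's /register
// endpoint, sidestepping the Windows issue where the master sees the
// container host's IP instead of the client's.
func register(masterAddr string) error {
	ip := os.Getenv("POD_IP") // assumption: pod IP injected into the env
	resp, err := http.Get(fmt.Sprintf("http://%s/register?host=%s", masterAddr, ip))
	if err != nil {
		return err
	}
	return resp.Body.Close()
}
```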
In order to promote the volume limits e2e test (from CSI Mock driver)
to Conformance, we can't rely on specific output of optional Condition
fields. Thus, this commit changes the test to only check the presence
of the right condition and verify that the optional fields are not empty.
The existing walk.go and conformance.txt have a few shortcomings
which we'd like to resolve:
- difficult to get the full test name due to test context nesting
- complicated AST logic and understanding necessary due to the
different ways a test can be invoked and written
This changes the AST parsing logic to be much simpler: it just
looks for the comments at/around a specific line. This file/line
information (and the full test name) is gathered by a custom ginkgo
reporter which dumps the SpecSummary data to a file.
Also, the SpecSummary dump can, itself, be potentially useful for
other post-processing and debugging tasks.
Signed-off-by: John Schnake <jschnake@vmware.com>
The service session affinity allows setting the maximum session
sticky timeout.
This commit adds e2e tests to check that the session is sticky
before the timeout and is not after.
Executing commands in pods is expensive in terms of time, and the
execution time is unpredictable.
The session affinity tests send several http requests from a pod
to check that the session is sticky. Instead of executing one
http request at a time, we can execute several requests from the
pod at one time and process the output.
The image "gcr.io/authenticated-image-pulling/windows-nanoserver:v1" is not a
manifest list, and it is only useful for Windows Server 1809, which means that the
test "should be able to pull from private registry with secret" will fail for
environments with Windows Server 1903, 1909, or any other future version we might
want to test.
This commit adds the ability to have an alternative private image to pull by
using a configurable docker config file which contains the necessary credentials
needed to pull the image.
Add a new e2e test to test multiple stacked NetworkPolicies with
Except clauses in IPBlock which overlaps with an allowed CIDR in
another NetworkPolicy. This test ensures that the order of the
creation of NetworkPolicies should not matter while evaluating
a Pod's access to another Pod.
Previously, we've centralized several images into agnhost, including
test-webserver.
The Hybrid cluster network test was using the test-webserver image, and
was updated to use agnhost, but without properly making it so it behaves like
test-webserver, resulting in a failing test.
We have added and enabled the Image Promoter on the k/k test images, which
will build the conformance images after a PR that affects kubernetes/test/images
merges.
We have added support for image-util.sh to handle external Windows Docker connections
in order to build Windows images.
This PR enables the Image Promoter to use some Windows nodes to build the necessary
Windows images.
In order to build Windows container images for multiple OS versions,
--isolation=hyperv is required. However, not all clouds / nodes support it
or have it enabled by default, which is why we're going to rely on
having multiple nodes to build the Windows images, until this issue
is addressed.
This commit adds support for building test images for multiple
Windows versions, as we have to support both LTS and SAC channels.
With this, the format for Windows images in the BASEIMAGE files is:
OS/ARCH/OS_VERSION
Also adds --isolation=hyperv to the Windows docker build command, making sure
that container images for multiple OS versions can be built using the same
Windows node.
Adds Windows support to the test/images/image-util.sh script.
A Windows node with Docker installed is required to build Windows images.
The connection URL to it must be set in the REMOTE_DOCKER_URL env variable.
Additionally, the authentication to the remote docker node is done through
certificates, which must be found in ~/.docker.
By default, the REMOTE_DOCKER_URL env variable is set to "" in the Makefile,
and because of it, the image-util.sh script will skip building and pushing
Windows images.
Added GOOS argument to the go build process in order to be able to build
Windows binaries. Additionally, the OS env variable was added to the images
Makefiles (default value is "linux") in order to maintain default behaviour.
Some images require a different Dockerfile for Windows images, since they
have different ways of installing dependencies. Because of this, if an image
needs to be built for Windows, it will first check for a Dockerfile_windows
file instead of the default one. If there isn't one, it means that the
same Dockerfile can be used for both Windows and Linux.
All Windows images will be based on the image
"mcr.microsoft.com/windows/servercore:ltsc2019". There are a couple of features
that are needed from this image, especially powershell.
Added busybox image for Windows. Most Windows images will be based on it, which
will help reduce the command line differences between Linux and Windows, but
not entirely.
Added Windows support for agnhost image.
Changes the image naming template from:
$REGISTRY/$image-$arch:$TAG
to
$REGISTRY/$image:$TAG-$os_name-$arch
The previous naming template would generate a plethora of images (A1 + ... + AN
image names, where Ai is the number of OS/architecture combinations for image i
and N is the number of images), while the new naming template will reduce the
number of image names to N.
The new template also includes the OS name, as we plan to integrate Windows
images into the manifest lists as well.
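The two templates compose tags roughly as follows; a runnable illustration using values from this changelog (agnhost 2.12, the e2e registry, linux/amd64):

```go
package main

import "fmt"

func main() {
	registry, image, tag := "gcr.io/kubernetes-e2e-test-images", "agnhost", "2.12"
	osName, arch := "linux", "amd64"
	// previous template: one image name per architecture
	fmt.Printf("%s/%s-%s:%s\n", registry, image, arch, tag) // .../agnhost-amd64:2.12
	// new template: one name per image; OS and arch live in the tag
	fmt.Printf("%s/%s:%s-%s-%s\n", registry, image, tag, osName, arch) // .../agnhost:2.12-linux-amd64
}
```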
When building images, their REGISTRY can be set to a custom
one, instead of the default "gcr.io/kubernetes-e2e-test-images" or
"us.gcr.io/k8s-artifacts-prod/e2e-test-images".
Some images are based on other images we're already building
(e.g.: kitten, nautilus), but their base images
are set in the default registry name, which can be undesirable.
This commit addresses this issue.
Windows images will require other base images, and thus, we will need
to explicitly specify the OS type a base image is for in order to
avoid confusion or errors.
The way the images are built is going to be changed, and in order to avoid
overwriting and breaking the current images, the image versions are bumped.
Similar functionality is required across e2e tests for RuntimeClass.
Let's create runtimeclass as part of the framework/node package.
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Use bytes instead of strings, and an in-place slice filter
(see https://github.com/golang/go/wiki/SliceTricks#filter-in-place)
to avoid copying strings around.
In my benchmark it shows almost 2x improvement:
BenchmarkString-8 1477207 10198 ns/op
BenchmarkBuffer-8 1561291 7622 ns/op
BenchmarkInPlace-8 2295714 5202 ns/op
String is the original implementation, Buffer is an intermediary
one that uses strings.Builder, and InPlace is the one from this commit.
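The in-place pattern looks roughly like this; a sketch of the SliceTricks idiom, not the commit's exact code:

```go
package peerfinder

// filterLines keeps only the lines matching keep, reusing the backing array
// of the input slice instead of allocating a new one (filter-in-place).
func filterLines(lines [][]byte, keep func([]byte) bool) [][]byte {
	out := lines[:0]
	for _, l := range lines {
		if keep(l) {
			out = append(out, l)
		}
	}
	return out
}
```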
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Add a new e2e test to test the Except clauses in IPBlock CIDR
based NetworkPolicies. This test adds an egress rule which
allows the client to connect to a CIDR which includes the
ServerPod's IP, but carves out an except subnet which excludes
this ServerPod.
The test "should enforce egress policy allowing traffic to a server in a
different namespace based on PodSelector and NamespaceSelector
[Feature:NetworkPolicy]" is flaky because it doesn't wait for the server
Pod to be ready before testing traffic via its service; then even though the
NetworkPolicy allows it, the SYN packets will be rejected by iptables
because the service has no endpoints at that moment.
This PR fixes it by making it wait for Pods to be ready like other
tests.
Due to an oversight, the e2e topology manager tests
were leaking a configmap and a serviceaccount.
This patch ensures a proper cleanup
Signed-off-by: Francesco Romani <fromani@redhat.com>
Up until now, the test validated the alignment of resources
only in the first container in a pod. That was just an oversight.
With this patch, we validate all the containers in a given pod.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add autodetection code to figure out which NUMA node the
devices are attached to.
This autodetection works under the assumption that all the VFs in
the system must be used for the tests.
Should this not be the case, or in general to handle non-trivial
configurations, we keep the annotations mechanism added to the
SRIOV device plugin config map.
Signed-off-by: Francesco Romani <fromani@redhat.com>
On single-NUMA node systems the numa_node of sriov devices was
sometimes reported as "-1" instead of, say, 0. This makes some
tests that should succeed[0] fail unexpectedly.
The reporting works as expected on real multi-NUMA node systems.
This small workaround was added to handle this corner case,
but overall it makes the code less readable and a bit too lenient,
hence we remove it.
+++
[0] on a single NUMA node system some resources are obviously
always aligned if the pod can be admitted. It boils down to the
node capacity at pod admittal time.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The e2e_node topology_manager check has an early, quick check
to rule out systems without sriov devices, thus skipping the tests.
The first version of the check detected PFs (Physical Functions),
under the assumption that VFs (Virtual Functions) had already been
created. This works because, obviously, you can't have VFs without PFs.
However, it's a little safer and easier to understand if we check
directly for VFs, bailing out on systems which don't provide them.
Nothing changes for properly configured test systems.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This is useful for logs from daemonset pods because for those it is
often relevant which node they ran on, since they interact with
resources or other pods on the host.
To keep the log prefix short, it gets limited to a maximum length of
10 characters.
Quite a few images are only used a few times in a few tests. Thus,
the images are being centralized into the agnhost image, reducing
the number of images that have to be pulled and used.
This PR replaces the usage of the following images with agnhost:
- resource-consumer-controller
- test-webserver
Commit 69a473be3 broke the test case (that was checking
that a file from a volume can't be read) by adding a
(wrong) assumption that the error should be nil.
Fix the assumption (we do expect an error here).
Note that we were checking file contents before; now when
we check the error from read, checking the contents is
redundant.
The other issue with the test is SELinux should be in enforcing
mode for this to work, so let's check that first to avoid
false positives.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Address review comments and move the helper function
in the `framework/kubelet` package to avoid circular deps
(see https://github.com/kubernetes/kubernetes/issues/81245)
Signed-off-by: Francesco Romani <fromani@redhat.com>
The test "should allow ingress access from updated pod" fails regardless
of which CNI plugin is enabled. It's because the test assumes the client
Pod can recheck connectivity after updating its label, but the client
won't restart after the first failure, so the second check will always
fail. The PR creates a client Pod with OnFailure RestartPolicy to fix it.
In addition to the above test that checks rule selector takes effect on
updated client pod, the PR adds a test "should deny ingress access to
updated pod" to ensure network policy selector can take effect on updated
server pod.
In addition to getting overall performance measurements from golang benchmark,
collect metrics that provide information about the internals of the scheduler itself.
This is a first step towards improving what we collect about the scheduler.
Metrics in question:
- scheduler_scheduling_algorithm_predicate_evaluation_seconds
- scheduler_scheduling_algorithm_priority_evaluation_seconds
- scheduler_binding_duration_seconds
- scheduler_e2e_scheduling_duration_seconds
Scheduling throughput is computed on the fly inside perfScheduling.
This patch moves the helper getCurrentKubeletConfig function,
used in both e2e and e2e_node tests and previously duplicated,
into the common framework.
There are no intended changes in behaviour.
Signed-off-by: Francesco Romani <fromani@redhat.com>
All tests remove the test client pod, usually in TestVolumeClient.
Rename TestCleanup to TestServerCleanup.
In addition, remove a few calls to Test(Server)Cleanup that do not do anything
useful (server pod is not used in these tests).
- Add handlers for service account issuer metadata.
- Add option to manually override JWKS URI.
- Add unit and integration tests.
- Add a separate ServiceAccountIssuerDiscovery feature gate.
Additional notes:
- If not explicitly overridden, the JWKS URI will be based on
the API server's external address and port.
- The metadata server is configured with the validating key set rather
than the signing key set. This allows for key rotation because tokens
can still be validated by the keys exposed in the JWKs URL, even if the
signing key has been rotated (note this may still be a short window if
tokens have short lifetimes).
- The trust model of OIDC discovery requires that the relying party
fetch the issuer metadata via HTTPS; the trust of the issuer metadata
comes from the server presenting a TLS certificate with a trust chain
back to the relying party's root(s) of trust. For tests, we use
a local issuer (https://kubernetes.default.svc) for the certificate
so that workloads within the cluster can authenticate it when fetching
OIDC metadata. An API server cannot validly claim https://kubernetes.io,
but within the cluster, it is the authority for kubernetes.default.svc,
according to the in-cluster config.
Co-authored-by: Michael Taufen <mtaufen@google.com>
This adds the [Feature:vsphere] tag to those vSphere tests which were
missing it. This makes it easier to specifically target the vSphere
storage E2E test suite.
Reorganize the code with setup and teardown functions,
to make room for the future addition of more device plugin
support, and to make the code a bit tidier.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add a helper function to check if a Pod failed
admission for Topology Affinity Error.
So far we only check the Status.Reason.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Five minutes was initially used only to be overcautious.
From my experiments, the node is usually ready in less than a minute.
Double it to give some buffer space.
Signed-off-by: Francesco Romani <fromani@redhat.com>
To properly implement some e2e tests, we need to know
some basic topology facts about the system running the tests.
The bare minimum we need to know is how many PCI SRIOV devices
are attached to which NUMA node.
This way we know which core we can reserve for kube services,
and which NUMA socket we can take to test full socket reservation.
To let the tests know the PCI device topology, we use annotations
in the SRIOV device plugin ConfigMap we need anyway.
The format is
```yaml
metadata:
  annotations:
    pcidevice_node0: "2"
    pcidevice_node1: "0"
```
with one annotation per NUMA node in the system.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Negative tests are when we request a gu Pod we know the system cannot
fulfill - hence we expect rejection from the topology manager.
Unfortunately, besides the trivial case of excessive cores (requesting
more cores than a NUMA node provides) we cannot easily test the
devices, because crafting a proper pod would require detailed knowledge
of the hw topology.
Let's consider a hypothetical two-node NUMA system with two PCIe busses,
one per NUMA node, with a SRIOV device on each bus.
A proper negative test would require two SRIOV devices that the system
can provide, but not on the same single NUMA node.
Requiring for example three devices (one more than the system provides)
will lead to a different, legitimate admission error.
For these reasons we bootstrap the testing infra for the negative tests,
but we add just the simplest one.
Signed-off-by: Francesco Romani <fromani@redhat.com>
We cannot anticipate all the possible configurations
needed by the SRIOV device plugin: there is too much variety.
Hence, we need to allow the test environment to supply
a host-specific ConfigMap to properly configure the device
plugin and avoid false negatives.
We still provide the default config map as a fallback and reference.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The SRIOV device plugin can create different resources depending
on both the hardware present on the system and the configuration.
As long as we have at least one SRIOV device, the tests don't actually
care which specific device it is.
Previously, the test hardcoded the most common intel SRIOV device
identifier. This patch lifts the restriction and lets the test
autodetect and use what's available.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch extends and completes the previously-added
empty topology manager test for single-NUMA node policy
by adding reporting in the test pod and checking
the resource alignment.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch adds all the testing infra and utilities needed
to run e2e topology manager tests. This includes setting up
a guaranteed pod which needs some devices.
The simplest real device available for the purpose
are the SRIOV devices, hence we use them.
This patch pulls the SRIOV device plugin from
the official, yet external, repository.
We do it as closely as possible to how the nvidia GPU plugin is handled.
This patch also performs minor refactoring for some
test framework utilities, needed to support the new
e2e tests.
Finally, we add an empty e2e topology manager test,
to be completed by the next patch.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The kubelet test here is using a one minute timeout, instead of the
normal framework.PodStartTimeout.
The DNS results validation functions pull several images including
the jessie-dnsutils which is a bit bigger than usual.
We have enabled the Image Promoter on the k/k test E2E images. This updates the
kubernetes/test/images/README.md file to include information about the images
the Promoter runs for, and how to promote a new image from the staging registry to the
regular one.
Cleans up some TODO items, since the Image Centralization Part 4 merged.
In 24d105995d, a fix was made in bazel-based
builds to ensure that we add the `selinux` tag when we build all
binaries, especially the `kubelet`. We need to do the same in our
hack scripts so things like `make release` will work properly as well.
Some scripts use `GOFLAGS=-tags=providerless`, for example, so we should
support tags being specified in GOFLAGS as well. We parse out the
tags from there and ensure selinux is added to the list of tags we use
for building the binaries. Note that we add our own `-tags` with the
full set of tags, and since we specify our parameter at the end, our
full list takes precedence.
Organizes the behaviors directory based on SIG, and provides a few
example behavior descriptions for Pods and Services to start. Note
that these are very incomplete lists of behaviors at this point.
Some tests under e2e/storage never end up calling the
Framework#BeforeEach() prolog. Handle such cases by returning
early in AfterEach() by checking a new field "beforeEachStarted".
Also add a nil check for ClientSet in AfterEach().
As of https://github.com/kubernetes/kubernetes/pull/72831, the minimum
docker version is 1.13.1. (and the minimum API version is 1.26). The
only time the `RuntimeAdmitHandler` returns anything other than accept
is when the Docker API version < 1.24. In other words, we can be
confident that Docker will always support sysctl.
As a result, we can delete this unnecessary and docker-specific code.
Re conversation in https://github.com/kubernetes/kubernetes/pull/87373,
we should keep the current behavior (i.e. using the docker binary
instead of the docker client). Delete the TODO instructing us to change
the behavior.
Due to the lack of the line "test/e2e/framework/" in import-boss/main.go,
verify-import-boss didn't work for the e2e framework.
This enables the check by adding that line.
In addition, this adds some e2e sub framework packages and some
k8s.io/utils packages which were implemented after creating
.import-restrictions to pass the check.
if it is unset. Otherwise, kubectl exec through http/kubectl proxy tests in
test/e2e/kubectl/kubectl.go would fail with "--host variable must be set to
the full URI to the api server on e2e run" error.
With this change, running the following tests can now pass:
$ e2e.test --kubeconfig=xxx --ginkgo.focus="should support exec through"
NodeResourceFit plugin's Filter method, responsible for checking if a pod fits
a given node, ignores resource limits and acknowledges resource requests only.
Given that both tests validating resource limits of pods were setting only pod resource limits,
the ability of the NodeResourceFit plugin to properly filter nodes was not tested at all.
Many times an e2e test fails with an unexpected error,
"timed out waiting for the condition".
Useful information may be in the test logs, but debugging e2e test
failures will be much faster if we add context to errors when they
happen.
This change makes sure we add context to all errors returned from
helpers like wait.Poll().
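The pattern looks roughly like this; a sketch with illustrative names, not the PR's exact helpers:

```go
package e2e

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForPodRunning wraps the opaque "timed out waiting for the condition"
// error from wait.Poll with what we were actually waiting for.
func waitForPodRunning(podName string, cond wait.ConditionFunc) error {
	if err := wait.Poll(2*time.Second, 5*time.Minute, cond); err != nil {
		return fmt.Errorf("error waiting for pod %q to be running: %v", podName, err)
	}
	return nil
}
```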
- Use preemptor pod's Status.NominatedNodeName to signal success of the Preemption behavior
- Optimize the test to eliminate unnecessary Pods creation
- Increase timeout from 1 minute to 2 minutes
GetPodLogs always fails when the tests fail, which is because the tests
specify wrong container names when getting logs.
When creating a client Pod, it specifies "<podName>-container" as
container name and "<podName>-" as Pod GenerateName. For instance,
podName "client-a" will result in "client-a-container" as the container
name and "client-a-vx5sv" as the actual Pod name, but it always uses the
actual Pod name to construct the container name when getting logs, e.g.
"client-a-vx5sv-container".
This patch fixes it by specifying the same static container name when
creating Pod and getting logs.
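A sketch of the fix, assuming the usual client-go types (the container name here is illustrative): the pod name is randomized via GenerateName, so the container name must be static for log retrieval to find it.

```go
package e2e

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// clientPod builds a pod whose name is generated but whose container name
// is fixed, so GetPodLogs can always address the container.
func clientPod(podName, image string) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: podName + "-"},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:  "client-container", // static, not derived from the generated pod name
				Image: image,
			}},
		},
	}
}
```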
ServiceStartTimeout is defined in the e2e core framework and that definition
is used in many places, but the copy in this endpoints/ports.go
is not used anywhere else.
So this removes the endpoints/ports.go copy as cleanup.
The manifest list is stateful, which means that the same list will get amended
with each successive image published. That's unintended, and can lead to the
wrong image being pulled from the manifest list.
Resets the manifest list before amending new images into it.
The test was flaky because it required the job succeeds 3 times with
pseudorandom 50% failure chance within 15 minutes, while there is an
exponential back-off delay (10s, 20s, 40s …) capped at 6 minutes before
recreating failed pods. As 7 consecutive failures (1/128 chance) could
take 20+ minutes, exceeding the timeout, the test failed intermittently
because of "timed out waiting for the condition".
This PR forces the Pods of a Job to be scheduled to a single node and
uses a hostPath volume instead of an emptyDir to persist data across new
Pods.
The e2e test "Kubectl Port forwarding With a server listening .."
fails sometimes due to the difference between expected data and
received data. To receive the data, the test does CloseWrite() but
it didn't have the corresponding error handling.
This adds it to investigate the issue more.
Not all errors will happen synchronously during Instances.Insert(...).Do(), so
it is important to verify the operation object to see why insert fails.
An example is when exceeding the resource quota.
Eg.
could not create instance test-cos-beta-80-12739-29-0: [&{Code:QUOTA_EXCEEDED Location: Message:Quota 'CPUS' exceeded. Limit: 24.0 in region europe-west6. ForceSendFields:[] NullFields:[]}
This fixes the issue where tests will fail "silently" when instance
insert fails.
This is currently the top flake against PRs, so I'm tagging it
as [Flaky]. Flaky tests can't be conformance tests, so I'm
removing it from [Conformance] as well until this is resolved.
It turns out that the e2e test was not using the timeout used to
hold the CLOSE_WAIT status, hence the test was flaky depending
on how fast it checked the conntrack table.
This PR replaces the dependency on ssh using a pod to check the conntrack
entries on the host in a loop, to make the test more robust
and reduce the flakiness due to race conditions and/or ssh issues.
It also fixes a bug trying to grep the conntrack entry, where
the error was swallowed if a conntrack entry wasn't found.
It seems that the Image Promoter is running containers without the -t flag, which causes the error:
the input device is not a TTY
Removing the -it from the docker command in kubernetes/test/images/image-util.sh solves this.
Issue: https://github.com/kubernetes/kubernetes/issues/86884
This PR fixes an issue where the err variable gets shadowed; because of
this, it might cause a nil pointer error.
Change-Id: Ib7da918418a7c8148a6ca598db12b3744eb3b7c8
Prior to the Image Centralization part 4 (https://github.com/kubernetes/kubernetes/pull/81170),
a PR merged that enables the Image Promoter to run on the k/k test images.
The Image Promoter currently only builds the Conformance-related images, but the
Image Centralization part 4 centralized some of those images into agnhost, so they
need to be removed from the conformance_images list.
Additionally, https://github.com/kubernetes/kubernetes/pull/81226 proposes mounttest-user
image to be removed, and RunAsUser to be used in tests instead.
The image used by the Image Promoter (gcr.io/k8s-testimages/gcb-docker-gcloud:v20190906-745fed4)
is based on busybox, and thus, the sed binary is actually busybox. image-util.sh calls
kube::util::ensure-gnu-sed several times, which ensures that a GNU sed binary exists
(it checks by grepping for GNU in its --help output). Obviously, it won't match the busybox sed
binary. But the sed usage in image-util.sh is fairly simple, and the busybox sed is sufficient.
Bumps image versions for: jessie-dnsutils, nonewprivs, resource-consumer, sample-apiserver. These
images are included in the conformance_images that are being built by the Image Promoter, so
we're bumping them just to make sure we're not breaking anything and causing all the CIs to fail.
We're going to bump the image versions used in tests in a subsequent PR. The image version was not
bumped for: agnhost, kitten, nautilus, as they were already bumped by the Image Centralization part 4
PR.
In order to build the image binaries, the GOARM variable is required,
but not all Makefiles have it defined, causing the make to fail on
those images.
This adds the required GOARM variable to the Makefiles in question.
Remove TODOs for verification of "allowed to
post CSR” and "allowed to auto approve CSR"
for bootstraptoken group, as these privilege
bindings are not managed by kubeadm source
code
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
Now we are facing flaky TestPreemption runs due to too few available
nodes. TestPreemption consists of multiple test cases, and the resources
are shared among them.
At this time, we cannot see which test cases ran before the flake
happens. So it is better to know that, to distinguish whether the cleanup of
pods has completed or not.
The log of a flake test says
"Pod did not start running: timed out waiting for the condition"
but it is hard to know the actual status of the pod.
So this adds a debugging message to show that.
DeleteSyncInNamespace() was used only by an e2e node test and DeleteSync().
In addition, that part of the e2e node test can be replaced with
DeleteSync(). CreateSyncInNamespace() is in the same situation and can be
replaced with CreateSync(). So this replaces these functions and
removes them as cleanup.
In order for the E2E test images to be automatically built and published
to the staging registry (from which they will be promoted to the regular
E2E test registry), the cloudbuild.yaml file has been added.
The file was added in conformance with [1].
Adds the ability to build all test images:
make -C test/images WHAT=all-images
[1] https://github.com/kubernetes/test-infra/blob/master/config/jobs/image-pushing/README.md
kubeadm creates the bootstrap token with the extra group
system:bootstrappers:kubeadm:default-node-token, which
should be able to be used for authentication and
signing.
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
While this is a looser check than the original one, I do not think
we can quite expect NodePublish and NodeUnpublish call counts to match.
The NodePublishVolume call count may not be the same as the NodeUnpublishVolume
call count because the reconciler may have a mount operation queued up
while the previous one is finishing. So, it is not unusual to have more than one
NodePublishVolume call for the same pod+volume combination; similarly,
unmount may also run more than once.
We've had a fun case of `Sample API Server using the current
Aggregator` failing due to DNS returning a response for localhost that
is not 127.0.0.1 and does not exist on the node, causing etcd to try to
bind to a non-existent address and consequently fail. Trying to spare
others this fun :)
This reverts commit 4d3d364d2b.
This commit promoted a test to conformance and also changed the test at
the same time. It should have been split up into two PRs to verify
whether the change had the intended effect. The test is now failing
consistently, so reverting.
Currently hollow nodes communicate with kubemark master using public
master IP, which results in each call going through cloud NAT. Cloud NAT
limitations become a performance bottleneck (see kubernetes/perf-tests/issues/874).
To mitigate this, in this change, a second kubeconfig called "internal"
is created. It uses private master IP and is used to set up hollow nodes.
Note that we still need the original kubemark kubeconfig (using public master IP)
to be able to communicate with the master from outside the cluster (when
setting it up or running tests).
Testing:
- set up kubemark cluster, verified apiserver logs to confirm that the call
from hollow nodes did not go through NAT
Errors from staticcheck:
test/images/agnhost/dns/common.go:79:6: func runCommand is unused (U1000)
test/images/agnhost/inclusterclient/main.go:75:16: using time.Tick leaks the underlying ticker, consider using it only in endless functions, tests and the main package, and use time.NewTicker here (SA1015)
test/images/agnhost/net/nat/closewait.go:41:5: var leakedConnection is unused (U1000)
test/images/agnhost/netexec/netexec.go:157:17: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
test/images/agnhost/netexec/netexec.go:250:18: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
test/images/agnhost/netexec/netexec.go:317:18: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
test/images/agnhost/netexec/netexec.go:331:19: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
test/images/agnhost/netexec/netexec.go:345:19: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
test/images/agnhost/netexec/netexec.go:357:19: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
test/images/agnhost/netexec/netexec.go:370:19: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
test/images/agnhost/netexec/netexec.go:380:9: this value of err is never used (SA4006)
test/images/agnhost/netexec/netexec.go:381:17: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
test/images/agnhost/netexec/netexec.go:386:17: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
test/images/agnhost/pause/pause.go:43:23: syscall.SIGKILL cannot be trapped (did you mean syscall.SIGTERM?) (SA1016)
test/images/agnhost/serve-hostname/serve_hostname.go:125:15: the channel used with signal.Notify should be buffered (SA1017)
test/images/agnhost/webhook/patch_test.go:131:13: this value of err is never used (SA4006)
test/images/pets/peer-finder/peer-finder.go:155:6: this value of newPeers is never used (SA4006)
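For instance, the SA1015 finding is typically resolved by switching from time.Tick to an explicitly stopped ticker; a generic sketch of the fix, not the file's actual code:

```go
package netexec

import "time"

// pollEverySecond runs f once a second until stop closes; unlike time.Tick,
// the ticker is stopped on exit, so its goroutine and channel don't leak.
func pollEverySecond(stop <-chan struct{}, f func()) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			f()
		case <-stop:
			return
		}
	}
}
```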
The storage e2e test suite uses given CSI driver names as pod names. For
pod names that also get enriched by a prefix and suffix, it is very easy
to exceed the 63 character limit that pod names are subject to, thereby
causing tests to fail.
This change fixes the described problem by omitting the driver name from
the pod name suffix.
It also allows us to drop VolumeResource.VolType.
This is the initial commit for E2E testing for Topology
Manager.
For now, run a subset of the CPU Manager tests.
Additional tests will be forthcoming.
Signed-off-by: vpickard <vpickard@redhat.com>
For now, we just pass 'nil' as the set of 'initialContainers' for
migrating from old state semantics to new ones. In a subsequent commit
we will pull this information from higher layers so that we can pass it
down at this stage properly.
- eliminate running paths of Predicates in scheduler; use Filter Plugins instead.
- refactor all unit tests
- adjust the TestPreemptWithPermitPlugin integration test
Add the "get" and "watch" verbs to the ClusterRole created
for the sample apiserver. Without this, the test complains about
"Failed to watch..." the resources in question.
Strictly speaking the "get" verb doesn't seem to be needed, but
this aligns the e2e test with the example at
staging/src/k8s.io/sample-apiserver/artifacts/example/rbac.yaml
Some of the content was out-dated (`ShortName` was removed,
`dataSource` renamed). Better to refer to the actual definitions with
functional links.
To make it simpler to find those, `driverDefinition` gets moved up in
`external.go`.
This will catch accidentally adding a new interface function which
isn't exported. For example, an attempt to implement a new unexported
"foobar()" function then leads to:
test/e2e/storage/testsuites/api_test.go:54:5: cannot use &fakeSuite literal (type *fakeSuite) as type testsuites.TestSuite in assignment:
*fakeSuite does not implement testsuites.TestSuite (missing testsuites.foobar method)
have foobar()
want testsuites.foobar()
FAIL k8s.io/kubernetes/test/e2e/storage/testsuites [build failed]
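The error above comes from the standard Go compile-time assertion idiom, roughly:

```go
// in api_test.go: fails to build as soon as fakeSuite stops implementing
// the interface, e.g. when an unexported method like foobar() is added.
var _ testsuites.TestSuite = &fakeSuite{}
```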
Implementing a test suite was impossible outside of the
k8s.io/kubernetes/test/e2e/storage/testsuites package because all
interfaces and structs used by them were private.
As part of revamping the API, genericVolumeTestResource also gets
exported because it is useful for other test suites. Because the
TestResource interface became obsolete a while ago and isn't used
anymore, the new name is just testsuites.VolumeResource.
testpatterns.CSIInlineVolume needs special handling in a few places.
It can now be used in a test pattern for a test suite that uses a
VolumeResource instance.
On systems with SELinux enabled, non-privileged containers can't access
data of privileged containers. Since the CSI driver socket is exposed
by a privileged container, all sidecars must be privileged too.
This removes setting KUBE_GCE_PRIVATE_CLUSTER=false flag when creating
kubemark master.
In result, util.sh detect-master function detects both private and
public master IPs. The comment about cloud NAT does not apply after
https://github.com/kubernetes/kubernetes/pull/81073/files got merged
(see comments in the PR discussion).
This is first PR to change kubemark clusters to use private master IPs:
https://github.com/kubernetes/perf-tests/issues/874.
Note that the kubemark kubeconfig will still contain the public master IP. This
will be addressed in follow-up PRs.
Testing:
* set up kubemark cluster
* verified that both private and public kubemark master IPs are logged
* ran tests on kubemark cluster using cluster loader
* Added an extractor to extract raw logs from json format and then pipe
them into the benchmark parser.
* Also added -alsologtostderr=false -logtostderr=false to reduce noisy logs.
Some tests are setting HostNetwork=true, even if it is not required
for them to pass.
This patch will set the HostNetwork to false for those tests, allowing
them to be run on Windows nodes as well.
We've added the WindowsRunAsUserName feature some time ago, and it was
promoted to Beta for v1.17. We can now remove the [LinuxOnly] tag for
a few tests.
Depends On: #83058
Depends On: #84882
- add csi pd driver manifests
- modify snapshottable test case
- fix tests of pod has to be created first for delay-binding PVC, otherwise PVC won't be bound
This ended up causing far more problems than it was worth, especially
given that it just attempted to provide backwards compatibility with
the alpha release.
This patch removes pkg/util/mount completely, and replaces it with the
mount package now located at k8s.io/utils/mount. The code found at
k8s.io/utils/mount was moved there from pkg/util/mount, so the code is
identical, just no longer in-tree to k/k.
update tests
add comment
amend var name
update comment
add check for empty slice
fix tests
fix mask size in test
review feedback
add ipv4 and ipv6 flag for mask sizes
add to violation exception list
remove import alias
run update-openapi-spec
review feedback
run update-bazel
review feedback
review feedback
suites.go is used from e2e.go only, and as part of the e2e core framework
it has an invalid dependency on a subpackage of the e2e framework.
So this moves suites.go out of the e2e core framework.
Since 59533f0cd1, which removes the
deprecated scalability tests, functions in profile_gatherer.go have
not been used at all.
So this removes e2e/framework/profile_gatherer.go
Ensures that requests that require large packets work properly, and that
they are not dropped.
Adds AgnhostPrivate to test/utils/image/manifest. Some tests are trying to pull
the agnhost image from the private registry, meaning that we would need to
always build and push the agnhost image to both e2e and private registry
whenever we bump its version. Decoupling them would mean that we only need
to push the image to the e2e registry.
The e2e core framework and the subpackages of the e2e framework are defined.
The subpackages can import the core framework, but the core framework
should not import the subpackages. We've defined this dependency rule
after a circular dependency issue happened.
This adds TODOs to understand what we should do under this rule.
Changes scheduler.New so that algorithm source is moved from the
parameter to an option. The default algorithm source is the source with
the DefaultProvider.
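The shape of that change is the usual functional-options pattern; a sketch with illustrative type names (the real code uses the scheduler's config types):

```go
package scheduler

// schedulerOptions holds configurable knobs; a string stands in for the
// real AlgorithmSource config type.
type schedulerOptions struct {
	algorithmSource string
}

// Option mutates schedulerOptions before construction.
type Option func(*schedulerOptions)

// WithAlgorithmSource overrides the default algorithm source.
func WithAlgorithmSource(src string) Option {
	return func(o *schedulerOptions) { o.algorithmSource = src }
}

// New applies options over defaults, so callers that don't care simply
// get the DefaultProvider source.
func New(opts ...Option) schedulerOptions {
	o := schedulerOptions{algorithmSource: "DefaultProvider"}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}
```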
This is the last PR which moves functions from e2e/framework/util.go
- WaitForServiceWithSelector: Moved to e2e/cloud/gcp
- WaitForStatefulSetReplicasReady: Moved to e2e/storage
- WaitForRCToStabilize: Moved to e2e/kubectl
- CheckInvariants: Moved to e2e/common
- ContainerInitInvariant: Moved to e2e/common
- DumpEventsInNamespace: Renamed to local function
- WaitForDaemonSets: Moved to e2e/e2e.go
Since EndpointSlices and Endpoints are expected to be enabled by default
for some time, it's important to have tests covering the functionality
of both controllers. With some consumers relying on Endpoints and others
relying on EndpointSlices, it's important to ensure both resources are
generated in a timely manner with consistent attributes.
The redis version has been bumped to version 5.0.5, but the maximum version supported on
Windows is 3.2. This can lead to failing tests, as the output and behaviour can be different
(see #80516). In order to prevent such failures, the number of times the Redis image is
used can be reduced.
This commit uses the previously added agnhost guestbook subcommand as a replacement for the
Guestbook application created by the test "should create and stop a working application".
cs is not used any more after a previous change removed the cs arg
from TestVolumeClient. Functions down the call path
get the cs value from the framework parameter.
Use the POST method instead of running local kubectl.
Use ExecCommandInContainerWithFullOutput() instead of RunKubectl().
PodExec() takes an additional framework arg, passed down the call chain.
VerifyExecInPodFail uses a different error code cast, as the original
one causes the test code to panic if used with the new call method.
This is required for promoting volume limits to GA. The new version of
the driver reports the max number of volumes it supports. That number
should be specified as a CLI argument when starting the driver.
Clients have to renegotiate frequently, and the shape of the interface
that a client wants is slightly different than the interface on the
server. Create a new ClientNegotiator interface that represents what
the client->server code would want to use (mostly, still evolving
Encoder) and move versioning details out of RESTClient.
In the long run, we want to remove internal clients and conversions
from clients. This takes a step in that direction and also makes sure
watch negotiation is consistent with the server.
One common frustration of end users running the e2e suite is that
they take a significant amount of time and it is difficult to
gauge progress.
Even if tailing the logs it can be difficult to see where one
test starts and another ends or understand the if there have been
failures in the past 1h of logs.
This change adds a new custom reporter which prints summary information
as tests complete. This includes the number of tests to run and how
many have been passed/failed/skipped along with which tests have failed.
A new flag can be set which pushes these values to an endpoint. This is
intended for integration with Sonobuoy but any endpoint could consume and
surface this data to the user so they can better understand the state
of the current test run.
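A minimal sketch of such a reporter against the ginkgo v1 reporter interface; only SpecDidComplete does real work, the other methods just satisfy the interface, and this is an illustration rather than the PR's actual reporter:

```go
package reporter

import (
	"fmt"

	"github.com/onsi/ginkgo/config"
	"github.com/onsi/ginkgo/types"
)

// progressReporter counts outcomes and prints a running summary.
type progressReporter struct {
	passed, failed, skipped int
}

func (r *progressReporter) SpecSuiteWillBegin(config.GinkgoConfigType, *types.SuiteSummary) {}
func (r *progressReporter) BeforeSuiteDidRun(*types.SetupSummary)                           {}
func (r *progressReporter) SpecWillRun(*types.SpecSummary)                                  {}
func (r *progressReporter) AfterSuiteDidRun(*types.SetupSummary)                            {}
func (r *progressReporter) SpecSuiteDidEnd(*types.SuiteSummary)                             {}

// SpecDidComplete prints pass/fail/skip counts as each test finishes.
func (r *progressReporter) SpecDidComplete(s *types.SpecSummary) {
	switch {
	case s.Passed():
		r.passed++
	case s.Failed():
		r.failed++
	case s.Skipped():
		r.skipped++
	}
	fmt.Printf("progress: %d passed, %d failed, %d skipped\n", r.passed, r.failed, r.skipped)
}
```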
The following functions are called at some specific places only,
so this moves these functions to the places and makes them local.
- WaitForPersistentVolumeClaimDeleted: Moved to e2e storage
- PrintSummaries: Moved to e2e framework.go
- GetHostExternalAddress: Moved to e2e node
- WaitForMasters: Moved to e2e cloud gcp
- WaitForApiserverUp: Moved to e2e network
- WaitForKubeletUp: Moved to e2e storage vsphere
The util.go file is so huge; it is better to make it as small and simple
as possible for maintenance.
This removes the following unused function:
- WaitForNodeHasTaintOrNot: Unused since 7e1794dcb1
The `kubectl get output` e2e test I'd previously added ended up being
flaky in certain e2e test scenarios. It relies on getting a list of API
resources in the cluster and running `kubectl get` calls against them.
The problem ended up being that other e2e tests could create resources
that would cause this test to fail. This change limits the scope of the
tests to not cover CRDs. This should still allow the test to catch new
Kubernetes resources with improperly configured kubectl output while
limiting the flakiness of the test.
When running ./hack/verify-import-aliases.sh locally, the following
error happened:
$ ./hack/verify-import-aliases.sh
checking-imports:
ERROR wrong alias for import "k8s.io/api/node/v1beta1" should be
nodev1beta1 in file test/e2e/node/runtimeclass.go
test/e2e_node/system/specs
exit status 1
This fixes it.
This was changed recently in #84640, but result must be pre-populated.
Example error:
`Nov 6 00:18:05.296: INFO: Unable to retrieve kubelet pods for node master-us-central1-c-t81q: expected pointer, but got nil`
Since we've added support for RunAsUserName, we can now run some new
tests. However, the [LinuxOnly] tag will have to remain until the
WindowsRunAsUserName feature becomes enabled by default.
Additionally, Containerd supports file mounting on Windows, and some
tests will be able to pass on Windows with Containerd instead of Docker.
As of go1.13, test flags like test.timeout are registered lazily.
This means they are not available in package init() methods:
> Testing flags are now registered in the new Init function,
> which is invoked by the generated main function for the test.
> As a result, testing flags are now only registered when running
> a test binary, and packages that call flag.Parse during package
> initialization may cause tests to fail.
This moves the copy of CLI flags into TestMain, just prior to parse.
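The resulting shape is roughly the following; a generic sketch, with the flag-copying step left as a comment since the actual helpers vary per suite:

```go
package e2e

import (
	"flag"
	"os"
	"testing"
)

// TestMain copies/registers CLI flags just before parsing. With go1.13 the
// testing flags only exist once testing.Init has run (the generated test
// main calls it before TestMain), so this must not happen in init().
func TestMain(m *testing.M) {
	// e.g. copy custom framework flags into flag.CommandLine here
	flag.Parse()
	os.Exit(m.Run())
}
```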
This removes the following functions from e2e/framework/skip.go:
- SkipUnlessTaintBasedEvictionsEnabled: Unused since 7e1794dcb1
- SkipIfContainerRuntimeIs: Unused since 19a588eeda
Tests should provide more verbose output when their core functionality
fails; in this case we were only logging that the job count did not
match but this is not sufficient for understanding a failure.
This PR moves functions from test/e2e/framework.util.go for making e2e
core framework small and simple:
- RestartKubeProxy: Moved to e2e network package
- CheckConnectivityToHost: Moved to e2e network package
- RemoveAvoidPodsOffNode: Move to e2e scheduling package
- AddOrUpdateAvoidPodOnNode: Move to e2e scheduling package
- UpdateDaemonSetWithRetries: Move to e2e apps package
- CheckForControllerManagerHealthy: Moved to e2e storage package
- ParseKVLines: Removed because of e9345ae5f0
- AddOrUpdateLabelOnNodeAndReturnOldValue: Removed because of ff7b07c43c
1. move GetOneTimeResourceUsageOnNode() from test/e2e/framework/kubelet/stats.go to getOneTimeResourceUsageOnNode() in test/e2e/framework/resource_usage_gatherer.go
2. copy GetKubeletPods() from test/e2e/framework/kubelet/kubelet_pods.go to getKubeletPods() in test/e2e/framework/util.go
Signed-off-by: clarklee92 <clarklee1992@hotmail.com>
- add Policy API to pkg/scheduler/apis/config and staging/src/k8s.io/kube-scheduler/config/v1
- dual-register Policy as apiGroup "v1" and "kubescheduler.config.k8s.io"
- move/merge pkg/scheduler/api to pkg/scheduler/apis/config/...
- alias schedulerapi to pkg/scheduler/apis/config
- alias legacyapi to pkg/scheduler/api
- eliminate latest.Codec; use scheme.Codecs instead
- unit tests to verify Policy YAML with version "v1" or "kubescheduler.config.k8s.io/v1" can be loaded properly
- update api/api-rules/violation_exceptions.list
This uses etcd test data to load resources and then ensures that
kubectl get output contains a different set of columns than default.
This assumes that all future API resources will either have appropriate
kubectl get output or add an exception here.
Co-authored-by: Luis Sanchez <sanchezl@redhat.com>
- SimpleGET: Moved to ingress sub package of e2e framework
- PollURL: Moved to ingress sub package of e2e framework
- ProxyMode: Moved to service e2e test package
- ListNamespaceEvents: Moved to e2e_node test package
- NewE2ETestNodePreparer: Removed since 59533f0cd1
In case the node object update fails due to a revision conflict, it does not make sense to return success.
It needs to be retried so the v1.PreferAvoidPodsAnnotationKey annotation can be properly set in the next round.
This is needed for raw block volumes. It mirrors a change made in the upstream
deployment in https://github.com/kubernetes-csi/csi-driver-host-path/pull/109
Raw block volumes use loop devices under the hood. "losetup --find
--show" uses LOOP_CTL_GET_FREE to get a free loop device. It then
expects to have the corresponding /dev/loopX already available. When
/dev inside the container is a static tmpfs which doesn't already have
those /dev/loop* devices (*) the new device fails to show up,
resulting in:
I1028 13:25:19.937846 1 server.go:117] GRPC call: /csi.v1.Controller/CreateVolume
I1028 13:25:19.938083 1 server.go:118] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"topology.hostpath.csi/node":"pmem-csi-pmem-govm-worker3"}}],"requisite":[{"segments":{"topology.hostpath.csi/node":"pmem-csi-pmem-govm-worker3"}}]},"capacity_range":{"required_bytes":5368709120},"name":"pvc-24985a49-5638-4bf6-b789-bb99a28d1073","volume_capabilities":[{"AccessType":{"Block":{}},"access_mode":{"mode":1}}]}
I1028 13:25:19.961124 1 volume_path_handler_linux.go:41] Creating device for path: /csi-data-dir/635c6569-f986-11e9-baa6-0242ac110004
I1028 13:25:20.391472 1 volume_path_handler_linux.go:75] Failed device create command for path: /csi-data-dir/635c6569-f986-11e9-baa6-0242ac110004 exit status 1 losetup: /csi-data-dir/635c6569-f986-11e9-baa6-0242ac110004: failed to set up loop device: No such file or directory
E1028 13:25:20.392916 1 server.go:121] GRPC error: rpc error: code = Internal desc = failed to create volume 635c6569-f986-11e9-baa6-0242ac110004: failed to attach device /csi-data-dir/635c6569-f986-11e9-baa6-0242ac110004: exit status 1
(*) It seems that the static tmpfs gets populated by Docker based on
what's currently on the host when the container starts. That would
explain why it worked in the Kubernetes Prow testing - the host must
have had enough loop devices already defined.
This updates to the releases meant to be used with Kubernetes 1.16
except for external-snapshotter, which is kept at the more recent
2.0.0-rc1 which targets 1.17.
The new external-attacher v2.0.0 needs updated RBAC rules, copied
verbatim from the v2.0.0 release.
This PR makes minimal changes to the interface to allow removing all
references to prometheus from the `test` directory. In the future I would expect
wrapping prometheus samples to provide a better abstraction. Changes:
Move generic_metrics.go to testutil/metrics.go
Remove etcd.go as it was not called
Move prometheus label consts to testutil.