WaitForPod*() are just wrapper functions for the e2epod package, and they
created an invalid dependency on an e2e subpackage from the core framework.
So this replaces WaitForPodTerminated() with the corresponding e2epod function.
These wrappers likewise created an invalid dependency on an e2e subpackage from the core framework.
So we can use e2epod.WaitTimeoutForPodReadyInNamespace to remove the invalid dependency.
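For illustration, a minimal sketch of calling that helper directly from a test; the helper name comes from this change, but the exact signature (clientset, pod name, namespace, timeout) is assumed here:

    import (
        "time"

        framework "k8s.io/kubernetes/test/e2e/framework"
        e2epod "k8s.io/kubernetes/test/e2e/framework/pod"
    )

    // waitForPodReady calls the e2epod helper directly instead of going
    // through a core-framework wrapper.
    func waitForPodReady(f *framework.Framework, podName string) error {
        return e2epod.WaitTimeoutForPodReadyInNamespace(
            f.ClientSet, podName, f.Namespace.Name, 2*time.Minute)
    }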
The main purpose of this PR is to remove the core framework package's dependency on the pod subpackage.
WaitForPod*() are just wrapper functions for the e2epod package, and they
created an invalid dependency on an e2e subpackage from the core framework.
So this replaces WaitForPodNoLongerRunning() with the corresponding e2epod function.
The e2e framework package podlogs is used only in e2e/storage/testsuites.
In addition, we concluded that there should be a single e2e framework
package for pod, without podlogs. So this moves podlogs into
e2e/storage/podlogs for the e2e storage tests.
Windows test "[sig-windows] [Feature:Windows] Cpu Resources Container
limits should not be exceeded after waiting 2 minutes" should be run
serially to prevent flakiness.
WaitForPod*() are just wrapper functions for the e2epod package, and they
created an invalid dependency on an e2e subpackage from the core framework.
So this replaces WaitForPodRunning() with the corresponding e2epod function.
Adds splitOsArch function to image-util.sh, which makes the script DRY-er.
When building a Windows test image, if REMOTE_DOCKER_URL is not set, skip the rest of the
building process for that image, which will save some time (no need to build binaries).
If a REMOTE_DOCKER_URL was not set for a particular OS version, exclude that image from the
manifest list. This fixes an issue where, if REMOTE_DOCKER_URL was not set for Windows Server 1909,
the Windows images were completely excluded from the manifest list, including those for
Windows Server 1809 and 1903, which could have been built and pushed.
Sets "test-webserver" as the default CMD for kitten and nautilus. Since they are now based on
agnhost, they should be set to run test-webserver to maintain previous behaviour.
This allows multiple instances of kube-apiserver to bind to the same address and
port, to provide seamless upgrades.
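A minimal sketch of the mechanism this relies on, SO_REUSEPORT on the listening socket; this is not the apiserver's actual wiring, and it assumes golang.org/x/sys/unix on Linux:

    import (
        "context"
        "net"
        "syscall"

        "golang.org/x/sys/unix"
    )

    // listenReusePort opens a TCP listener with SO_REUSEPORT set, so another
    // process (e.g. a second kube-apiserver instance) can bind the same
    // address and port while the old one is still draining.
    func listenReusePort(addr string) (net.Listener, error) {
        lc := net.ListenConfig{
            Control: func(network, address string, c syscall.RawConn) error {
                var sockErr error
                err := c.Control(func(fd uintptr) {
                    sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
                })
                if err != nil {
                    return err
                }
                return sockErr
            },
        }
        return lc.Listen(context.Background(), "tcp", addr)
    }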
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
This change removes support for basic authn in v1.19 via the
--basic-auth-file flag. This functionality was deprecated in v1.16
in response to ATR-K8S-002: Non-constant time password comparison.
Similar functionality is available via the --token-auth-file flag
for development purposes.
Signed-off-by: Monis Khan <mok@vmware.com>
Some e2e tests depend on the controller-manager to expose metrics
on the path /metrics.
It may happen that when the test runs, the pod is not available or the
URL is not ready, causing the test to fail.
Previously, the tests were waiting until the pod was running, but we
need to wait until the /metrics URL is ready.
The MetricsGrabber may use the controller-manager pod
to gather metrics; however, it doesn't wait until
the pod is ready to serve, which makes the test fail in that
case.
We wait until the controller-manager pod is running
before trying to get metrics from it.
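A hedged sketch of the kind of wait this implies: poll the /metrics URL until it answers, instead of only checking the pod phase (the URL handling and timeouts here are illustrative, not the exact test code):

    import (
        "context"
        "net/http"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    // waitForMetricsReady polls the given /metrics URL until it returns 200 OK.
    func waitForMetricsReady(metricsURL string) error {
        return wait.PollImmediate(2*time.Second, 2*time.Minute, func() (bool, error) {
            req, err := http.NewRequestWithContext(context.TODO(), http.MethodGet, metricsURL, nil)
            if err != nil {
                return false, err
            }
            resp, err := http.DefaultClient.Do(req)
            if err != nil {
                return false, nil // not ready yet, keep polling
            }
            defer resp.Body.Close()
            return resp.StatusCode == http.StatusOK, nil
        })
    }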
There were framework.ExpectNoError(fmt.Errorf(..)) calls which did not
perform any actual value checks; they just raised the
specified error messages. These usages of framework.ExpectNoError()
seemed a little tricky, so this replaces them with the corresponding check
functions for readability.
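A small before/after illustration (assuming the usual framework helpers plus fmt and v1 = k8s.io/api/core/v1; the actual replacements vary per call site):

    // Before: an error is constructed just to trigger a failure.
    if pod.Status.Phase != v1.PodRunning {
        framework.ExpectNoError(fmt.Errorf("pod %q is not running: %v", pod.Name, pod.Status.Phase))
    }

    // After: fail directly, which reads as an actual check.
    if pod.Status.Phase != v1.PodRunning {
        framework.Failf("pod %q is not running: %v", pod.Name, pod.Status.Phase)
    }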
The configuration file was designed as a yaml file on purpose,
to make it easy to extend the test cases without needing to modify
the testing binary. Also, it's possible to extend the configuration
itself to enrich individual test cases.
The kubelet can race when a pod is deleted and report that a container succeeded
when it instead failed, and thus the pod is reported as succeeded. Create an e2e
test that demonstrates this failure.
After moving Permit() to the scheduling cycle, the test PermitPlugin should
no longer wait inside Permit() for another pod to enter Permit() and become the waiting pod.
In the past this was a way to make the test work regardless of the order in
which pods enter Permit(), but now only one Permit() can be executed at
any given moment, and waiting for another pod to enter Permit() inside
Permit() leads to timeouts.
In this change, the waitAndRejectPermit and waitAndAllowPermit flags make the first
pod to enter Permit() the waiting pod, and the second pod to enter Permit()
either the rejecting or the allowing pod.
Mentioned in #88469
Extends agnhost with the capability to validate a mounted token against
the API server's OIDC endpoints.
Co-authored-by: Michael Taufen <mtaufen@google.com>
Close outbound connections when using a cert callback and certificates rotate. This means that we won't get into a situation where we have open TLS connections using expired certs, which would get unauthorized errors at the apiserver.
Attempt to retrieve a new certificate if open connections are near expiry, to prevent the case where the cert expires but we haven't yet opened a new TLS connection and so GetClientCertificate hasn't been called.
Move certificate rotation logic to a separate function
Rely on generic transport approach to handle closing TLS client connections in exec plugin; no need to use a custom dialer as this is now the default behaviour of the transport when faced with a cert callback. As a result of handling this case, it is now safe to apply the transport approach even in cases where there is a custom Dialer (this will not affect kubelet connrotation behaviour, because that uses a custom transport, not just a dialer).
Check expiry of the full TLS certificate chain that will be presented, not only the leaf. Only do this check when the certificate actually rotates. Start the certificate as a zero value, not nil, so that we don't see a rotation when there is in fact no client certificate
Drain the timer when we first initialize it, to prevent immediate rotation. Additionally, calling Stop() on the timer isn't necessary.
Don't close connections on the first 'rotation'
Remove RotateCertFromDisk and RotateClientCertFromDisk flags.
Instead simply default to rotating certificates from disk whenever files are exclusively provided.
Add integration test for client certificate rotation
Simplify logic; rotate every 5 mins
Instead of trying to be clever and checking for rotation just before an
expiry, let's match the logic of the new apiserver cert rotation logic
as much as possible. We write a controller that checks for rotation
every 5 mins. We also check on every new connection.
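A minimal sketch of that pattern (not the actual transport code): re-read the key pair from disk on a ticker and flag a rotation when the bytes on disk change.

    import (
        "bytes"
        "crypto/tls"
        "os"
        "time"
    )

    // watchCertFiles re-reads certFile/keyFile every interval and calls onRotate
    // when the certificate on disk has changed.
    func watchCertFiles(certFile, keyFile string, interval time.Duration, onRotate func(tls.Certificate), stop <-chan struct{}) error {
        var lastCert, lastKey []byte
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            certPEM, err := os.ReadFile(certFile)
            if err != nil {
                return err
            }
            keyPEM, err := os.ReadFile(keyFile)
            if err != nil {
                return err
            }
            if !bytes.Equal(certPEM, lastCert) || !bytes.Equal(keyPEM, lastKey) {
                cert, err := tls.X509KeyPair(certPEM, keyPEM)
                if err != nil {
                    return err
                }
                onRotate(cert)
                lastCert, lastKey = certPEM, keyPEM
            }
            select {
            case <-ticker.C:
            case <-stop:
                return nil
            }
        }
    }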
Respond to review
Fix kubelet certificate rotation logic
The kubelet rotation logic seems to be broken because it expects its
cert files to end up as cert data whereas in fact they end up as a
callback. We should just call the tlsConfig GetCertificate callback
as this obtains a current cert even in cases where a static cert is
provided, and check that for validity.
Later on we can refactor all of the kubelet logic so that all it does is
write files to disk, and the cert rotation work does the rest.
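A hedged sketch of the check described above: ask the tls.Config for its current certificate via its callback and verify the leaf has not expired (fields per crypto/tls and crypto/x509; the surrounding kubelet plumbing is omitted):

    import (
        "crypto/tls"
        "crypto/x509"
        "fmt"
        "time"
    )

    // currentCertValid obtains the current certificate through the GetCertificate
    // callback (which also covers statically configured certs) and checks expiry.
    func currentCertValid(cfg *tls.Config) error {
        if cfg.GetCertificate == nil {
            return fmt.Errorf("no GetCertificate callback configured")
        }
        cert, err := cfg.GetCertificate(&tls.ClientHelloInfo{})
        if err != nil {
            return err
        }
        leaf, err := x509.ParseCertificate(cert.Certificate[0])
        if err != nil {
            return err
        }
        if time.Now().After(leaf.NotAfter) {
            return fmt.Errorf("serving certificate expired at %v", leaf.NotAfter)
        }
        return nil
    }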
Only read certificates once a second at most
Respond to review
1) Don't blat the cert file names
2) Make it more obvious where we have a neverstop
3) Naming
4) Verbosity
Avoid cache busting
Use filenames as cache keys when rotation is enabled, and add the
rotation later in the creation of the transport.
Caller should start the rotating dialer
Add continuous request rotation test
Rebase: use context in List/Watch
Swap goroutine around
Retry GETs on net.IsProbableEOF
Refactor certRotatingDialer
For simplicity, don't affect cert callbacks
To reduce change surface, let's not try to handle the case of a changing
GetCert callback in this PR. Reverting this commit should be sufficient
to handle that case in a later PR.
This PR will focus only on rotating certificate and key files.
Therefore, we don't need to modify the exec auth plugin.
Fix copyright year
Most of these could have been refactored automatically, but the result would
not have been any cleaner. The unsophisticated tooling left lots of unnecessary
struct -> pointer -> struct transitions.
This is gross but because NewDeleteOptions is used by various parts of
storage that still pass around pointers, the return type can't be
changed without significant refactoring within the apiserver. I think
this would be good to cleanup, but I want to minimize apiserver side
changes as much as possible in the client signature refactor.
The condition was not part of the message and so would not
match:
OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/128aea1f-bde3-43d5-8b5f-dd86b9a5ef33/volumes/kubernetes.io~secret/default-token-v55hm\\\" to rootfs \\\"/var/lib/docker/overlay2/813487ba91d534ded546ae34f2a05e7d94c26bd015d356f9b2641522d8f0d6da/merged\\\" at \\\"/var/run/secrets/kubernetes.io/serviceaccount\\\" caused \\\"stat /var/lib/kubelet/pods/128aea1f-bde3-43d5-8b5f-dd86b9a5ef33/volumes/kubernetes.io~secret/default-token-v55hm: no such file or directory\\\"\"": unknown
Updated the check and regex.
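Purely as a hypothetical illustration of the looser matching (the test's real regex differs), a pattern that keys on the stable fragments of the runtime error rather than the full nested message:

    import "regexp"

    // Hypothetical pattern: match the mount target and the underlying stat
    // failure instead of the exact, deeply nested "caused ..." wrapping.
    var mountFailurePattern = regexp.MustCompile(
        `secrets/kubernetes\.io/serviceaccount.*no such file or directory`)

    func isTokenMountFailure(msg string) bool {
        return mountFailurePattern.MatchString(msg)
    }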
Make sure the SR-IOV device plugin is ready, and that
there are enough SR-IOV devices allocatable before
spinning up test pods.
Signed-off-by: vpickard <vpickard@redhat.com>
1. move the integration test of TaintBasedEvictions to test/integration/node
2. move the e2e test of TaintBasedEvictions to test/e2e/node
3. modify the conformance file to adapt to the TaintBasedEviction test
The current agnhost version is 2.12; 2.11 was not previously built because the
VERSION bumps merged one after the other, and the Image Promoter did not get to
build the 2.11 image.
In the current version, due to how make works, when building all the conformance
images (make all-push WHAT=all-conformance), ALL the images are being built first
before being pushed.
This PR will allow images to be built and pushed immediately afterwards, so the first
images that have been successfully built are already pushed and promotable, even if
the task failed on the last image, or it timed out.
A previous PR (#76838) added to kubernetes/test/images/image-util.sh the ability
to build and publish Windows Test Images.
Additionally, that PR also configured the Image Promoter to use a
few Windows Remote Docker build nodes to build the Windows Test Images,
however, there is a minor issue: the build container has a different $HOME
folder than expected (is: /builder/home, expected: /root - since it's the
root user), and the Remote Docker credentials are mounted in /root.
Because of that, image-build.sh cannot find the credentials it needs.
This will have to be properly fixed, but for now, we can just skip
the Windows image building part.
Quite a few images are only used a few times in a few tests. Thus,
the images are being centralized into the agnhost image, reducing
the number of images that have to be pulled and used.
This PR replaces the usage of the following images with agnhost:
- dnsutils
dnsmasq is a Linux-specific binary. In order for the tests to also
pass on Windows, CoreDNS should be used instead.
- Search/replace Google Infra kube-cross locations for K8s Infra
- Update kube-cross make targets
- Don't attempt to pre-pull image (docker build --pull)
This prevents CI failures when the image under test doesn't exist
yet in the registry.
- 'make all' now builds and pushes the kube-cross image
- Allow 'TAG' to be specified via env var
- Use 'KUBE_CROSS_VERSION' to represent the kube-cross version
- Tag kube-cross images with both a kubernetes version
('git describe') and a kube-cross version
- Add a GCB (Google Cloud Build) config file (cloudbuild.yaml)
Signed-off-by: Stephen Augustus <saugustus@vmware.com>
We don't want to set the node name directly because then starting the pod
can fail when the node is temporarily out of resources
(https://github.com/kubernetes/kubernetes/issues/87855).
For CSI driver deployments, we have three options:
- modify the pod spec with custom code, similar
to how the NodeSelection utility code does it
- add variants of SetNodeSelection and SetNodeAffinity which
work with a pod spec instead of a pod
- change their parameter from pod to pod spec and then use
them also when patching a pod spec
The last approach is used here because it seems more general. There
might be other cases in the future where there's only a pod spec that
needs to be modified.
A previous PR replaced the usage of Redis in the guestbook app test
with Agnhost. The replacement went well for Linux setups and Containers,
which is why the tests are green, but there is a network particularity on
Windows setups which won't allow the test to pass.
The issue was observed with another test hitting the same issue:
https://github.com/kubernetes/kubernetes/issues/83072
Here's exactly what happens during the test:
- frontend containers are created, having the /guestbook endpoint. Their main
purpose is to forward the call to either agnhost-master (cmd=set) or
agnhost-slave (cmd=get).
- agnhost-master container is created, having the /set endpoint, and the
/register endpoint, through which the agnhost-slave containers would
register to it. Its purpose is to propagate all data received through /set
to its clients.
- agnhost-slave containers are created, having the /set and /get endpoints.
They register to agnhost-master and then receive any and all updates
from it, which are then served through the /get endpoint.
For simplicity, all 3 types run the same agnhost subcommand (agnhost guestbook), which
can satisfy each of the purposes above. For this, HTTP servers were being used, including
for the /register endpoints. agnhost-master would send its /set updates as /set HTTP
requests. However, because of the issue listed above, agnhost-master did not receive
the client's IP, but rather the container host's IP, resulting in the request being
sent to the wrong destination.
This PR updates the agnhost guestbook subcommand. Now, the agnhost subscriber nodes will
send their own IP to the /register endpoint (/register?host=<own IP>).
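A minimal illustration of the new registration call (host and port values are made up for the example; the real agnhost subcommand wires this up internally):

    import (
        "fmt"
        "net/http"
    )

    // register announces this subscriber's own, externally reachable IP to the
    // master, instead of relying on the master to infer it from the connection.
    func register(masterHost, ownIP string) error {
        url := fmt.Sprintf("http://%s/register?host=%s", masterHost, ownIP)
        resp, err := http.Get(url)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("registration failed: %s", resp.Status)
        }
        return nil
    }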
In order to promote the volume limits e2e test (from CSI Mock driver)
to Conformance, we can't rely on specific output of optional Condition
fields. Thus, this commit changes the test to only check the presence
of the right condition and verify that the optional fields are not empty.
The existing walk.go and conformance.txt have a few shortcomings
which we'd like to resolve:
- difficult to get the full test name due to test context nesting
- complicated AST logic and understanding necessary due to the
different ways a test can be invoked and written
This changes the AST parsing logic to be much simpler: it now just
looks for the comments at/around a specific line. This file/line
information (and the full test name) is gathered by a custom ginkgo
reporter which dumps the SpecSummary data to a file.
Also, the SpecSummary dump can, itself, be potentially useful for
other post-processing and debugging tasks.
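A hedged sketch of such a reporter against the ginkgo v1 interfaces (the JSON shape shown is illustrative, not the exact dump format used):

    import (
        "encoding/json"
        "os"

        "github.com/onsi/ginkgo/config"
        "github.com/onsi/ginkgo/types"
    )

    // specDumpReporter writes one JSON record per completed spec, keeping the
    // nested test name and the code locations for later post-processing.
    type specDumpReporter struct {
        out *os.File
    }

    func (r *specDumpReporter) SpecDidComplete(s *types.SpecSummary) {
        _ = json.NewEncoder(r.out).Encode(map[string]interface{}{
            "name":      s.ComponentTexts,         // nested Describe/Context/It texts
            "locations": s.ComponentCodeLocations, // file/line per nesting level
        })
    }

    // The remaining Reporter interface methods are no-ops for this purpose.
    func (r *specDumpReporter) SpecSuiteWillBegin(config.GinkgoConfigType, *types.SuiteSummary) {}
    func (r *specDumpReporter) BeforeSuiteDidRun(*types.SetupSummary)                           {}
    func (r *specDumpReporter) SpecWillRun(*types.SpecSummary)                                  {}
    func (r *specDumpReporter) AfterSuiteDidRun(*types.SetupSummary)                            {}
    func (r *specDumpReporter) SpecSuiteDidEnd(*types.SuiteSummary)                             {}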
Signed-off-by: John Schnake <jschnake@vmware.com>
The service session affinity allows setting the maximum session
stickiness timeout.
This commit adds e2e tests to check that the session is sticky
before the timeout and is not after.
Executing commands in pods is expensive in terms of time and the
execution time is unpredictable and random.
The session affinity tests send several http requests from a pod
to check that the session is sticky. Instead of executing one
http request at a time, we can execute several requests from the
pod at one time and process the output.
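A sketch of that batching: one exec per batch, with the loop over requests done inside the pod (the service address, request count, and exec helper are placeholders, not the test's exact code):

    import (
        "fmt"
        "strings"
    )

    // buildAffinityProbe returns a single shell command that performs n HTTP
    // requests in one pod exec, so we pay the exec cost once instead of n times.
    func buildAffinityProbe(serviceIPPort string, n int) string {
        return fmt.Sprintf(
            "for i in $(seq 1 %d); do curl -s --connect-timeout 2 http://%s/ ; echo; done",
            n, serviceIPPort)
    }

    // countHosts tallies how many times each backend answered, from the combined output.
    func countHosts(output string) map[string]int {
        hosts := map[string]int{}
        for _, line := range strings.Split(strings.TrimSpace(output), "\n") {
            if line != "" {
                hosts[line]++
            }
        }
        return hosts
    }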
The image "gcr.io/authenticated-image-pulling/windows-nanoserver:v1" is not a
manifest list, and it is only useful for Windows Server 1809, which means that the
test "should be able to pull from private registry with secret" will fail for
environments with Windows Server 1903, 1909, or any other future version we might
want to test.
This commit adds the ability to have an alternative private image to pull, by
using a configurable docker config file which contains the credentials
needed to pull the image.
Add a new e2e test for multiple stacked NetworkPolicies with
Except clauses in IPBlock which overlap with an allowed CIDR in
another NetworkPolicy. This test ensures that the order of
creation of the NetworkPolicies does not matter when evaluating
a Pod's access to another Pod.
Previously, we've centralized several images into agnhost, including
test-webserver.
The Hybrid cluster network test was using the test-webserver image, and
was updated to use agnhost, but without properly configuring it to behave like
test-webserver, resulting in a failing test.
We have added and enabled the Image Promoter on the k/k test images, which
will build the conformance images after a PR that affects kubernetes/test/images
merges.
We have added support for image-util.sh to handle external Windows Docker connections
in order to build Windows images.
This PR enables the Image Promoter to use some Windows nodes to build the necessary
Windows images.
In order to build Windows container images for multiple OS versions,
--isolation=hyperv is required. However, not all clouds / nodes support
it or have it enabled by default, which is why we're going to rely on
having multiple nodes to build the Windows images, until this issue
is addressed.
This commit adds support for building test images for multiple
Windows versions, as we have to support both LTS and SAC channels.
With this, the format for Windows images in the BASEIMAGE files is:
OS/ARCH/OS_VERSION
Also adds --isolation=hyperv to the Windows docker build command, making sure
that container images for multiple OS versions can be built using the same
Windows node.
Adds Windows support to the test/images/image-util.sh script.
A Windows node with Docker installed is required to build Windows images.
The connection URL to it must be set in the REMOTE_DOCKER_URL env variable.
Additionally, the authentication to the remote docker node is done through
certificates, which must be found in ~/.docker.
By default, the REMOTE_DOCKER_URL env variable is set to "" in the Makefile,
and because of it, the image-util.sh script will skip building and pushing
Windows images.
Added GOOS argument to the go build process in order to be able to build
Windows binaries. Additionally, the OS env variable was added to the images
Makefiles (default value is "linux") in order to maintain default behaviour.
Some images require a different Dockerfile for Windows images, since they
have different ways of installing dependencies. Because of this, if an image
needs to be built for Windows, it will first check for a Dockerfile_windows
file instead of the default one. If there isn't one, it means that the
same Dockerfile can be used for both Windows and Linux.
All Windows images will be based on the image
"mcr.microsoft.com/windows/servercore:ltsc2019". There are a couple of features
that are needed from this image, especially powershell.
Added busybox image for Windows. Most Windows images will be based on it, which
will help reduce the command line differences between Linux and Windows, but
not entirely.
Added Windows support for agnhost image.
Changes the image naming template from:
$REGISTRY/$image-$arch:$TAG
to
$REGISTRY/$image:$TAG-$os_name-$arch
The previous naming template would generate a plethora of image repositories
(one per image and per OS/architecture combination), while the new naming
template reduces the number of repositories to N, the number of images, by
moving the OS name and architecture into the tag. Including the OS name also
matters because we plan to integrate Windows images into the manifest lists as well.
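For example (illustrative registry and tag), an amd64 Linux build of agnhost that the
old template would have pushed as gcr.io/kubernetes-e2e-test-images/agnhost-amd64:2.12
is pushed by the new template as gcr.io/kubernetes-e2e-test-images/agnhost:2.12-linux-amd64.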
When building images, their REGISTRY can be set to a custom
one, instead of the default "gcr.io/kubernetes-e2e-test-images" or
"us.gcr.io/k8s-artifacts-prod/e2e-test-images".
Some images are based on other images we're already building
(e.g.: kitten, nautilus), but their base images
are set in the default registry name, which can be undesirable.
This commit addresses this issue.
Windows images will require other base images, and thus, we will need
to explicitly specify the OS type a base image is for in order to
avoid confusion or errors.
The way the images are built is going to be changed, and in order to avoid
overwriting and breaking the current images, the image versions are bumped.
Similar functionality is required across e2e tests for RuntimeClass.
Let's create runtimeclass as part of the framework/node package.
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Use bytes instead of strings, and slice in-place filter
(see https://github.com/golang/go/wiki/SliceTricks#filter-in-place)
to avoid copying strings around.
In my benchmark it shows almost 2x improvement:
BenchmarkString-8 1477207 10198 ns/op
BenchmarkBuffer-8 1561291 7622 ns/op
BenchmarkInPlace-8 2295714 5202 ns/op
String is the original implementation, Buffer is an intermediary
one that uses strings.Builder, and InPlace is the one from this commit.
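A minimal, generic sketch of the filter-in-place pattern from the linked SliceTricks page (the real code operates on its own element type):

    // filterInPlace keeps only the elements for which keep returns true,
    // reusing the backing array instead of allocating a new slice.
    func filterInPlace(lines [][]byte, keep func([]byte) bool) [][]byte {
        n := 0
        for _, l := range lines {
            if keep(l) {
                lines[n] = l
                n++
            }
        }
        return lines[:n]
    }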
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Add a new e2e test to test the Except clauses in IPBlock CIDR
based NetworkPolicies. This test adds an egress rule which
allows the client to connect to a CIDR which includes the
ServerPod's IP, but carves out an except subnet which excludes
this ServerPod.
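A sketch of such a policy using the networking/v1 API types; the CIDR values, policy name, and pod labels are placeholders for the test's computed values (e.g. a /24 around the server pod's IP, with the pod's own /32 carved out):

    import (
        networkingv1 "k8s.io/api/networking/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // egressAllowCIDRExceptServer builds a policy that allows egress to a CIDR
    // containing the server pod's IP, except for the server pod's own address.
    func egressAllowCIDRExceptServer(ns, allowCIDR, serverPodIP string) *networkingv1.NetworkPolicy {
        return &networkingv1.NetworkPolicy{
            ObjectMeta: metav1.ObjectMeta{Name: "allow-egress-cidr-except-server", Namespace: ns},
            Spec: networkingv1.NetworkPolicySpec{
                PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"pod-name": "client-a"}},
                PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeEgress},
                Egress: []networkingv1.NetworkPolicyEgressRule{{
                    To: []networkingv1.NetworkPolicyPeer{{
                        IPBlock: &networkingv1.IPBlock{
                            CIDR:   allowCIDR,                     // e.g. "10.0.0.0/24"
                            Except: []string{serverPodIP + "/32"}, // carve out the server pod
                        },
                    }},
                }},
            },
        }
    }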