A previous PR (#76838) introduced the ability to build and publish
Windows Test Images to kubernetes/test/images/image-util.sh.
Additionally, that PR also configured the Image Promoter to use a
few Windows Remote Docker build nodes to build the Windows Test Images,
however, there is a minor issue: the build container has a different $HOME
folder than expected (is: /builder/home, expected: /root - since it's the
root user), and the Remote Docker credentials are mounted in /root.
Because of that, image-build.sh cannot find the credentials it needs.
This will have to be properly fixed, but for now, we can just skip
the Windows image building part.
Quite a few images are only used a few times in a few tests. Thus,
the images are being centralized into the agnhost image, reducing
the number of images that have to be pulled and used.
This PR replaces the usage of the following images with agnhost:
- dnsutils
dnsmasq is a Linux specific binary. In order for the tests to also
pass on Windows, CoreDNS should be used instead.
- Search/replace Google Infra kube-cross locations for K8s Infra
- Update kube-cross make targets
- Don't attempt to pre-pull image (docker build --pull)
This prevents CI failures when the image under test doesn't exist
yet in the registry.
- 'make all' now builds and pushes the kube-cross image
- Allow 'TAG' to be specified via env var
- Use 'KUBE_CROSS_VERSION' to represent the kube-cross version
- Tag kube-cross images with both a kubernetes version
('git describe') and a kube-cross version
- Add a GCB (Google Cloud Build) config file (cloudbuild.yaml)
Signed-off-by: Stephen Augustus <saugustus@vmware.com>
We don't want to set the name directly because then starting the pod
can fail when the node is temporarily out of resources
(https://github.com/kubernetes/kubernetes/issues/87855).
For CSI driver deployments, we have three options:
- modify the pod spec with custom code, similar
to how the NodeSelection utility code does it
- add variants of SetNodeSelection and SetNodeAffinity which
work with a pod spec instead of a pod
- change their parameter from pod to pod spec and then use
them also when patching a pod spec
The last approach is used here because it seems more general. There
might be other cases in the future where there's only a pod spec that
needs to be modified.
A previous PR replaced the usage of Redis in the guestbook app test
with Agnhost. The replacement went well for Linux setups and Containers,
which is why the tests are green, but there is a network particularity on
Windows setups which won't allow the test to pass.
The issue was observed with another test hitting the same issue:
https://github.com/kubernetes/kubernetes/issues/83072
Here's exactly what happens during the test:
- frontend containers are created, having the /guestbook endpoint. Its main
purpose is to forward the call to either agnhost-master (cmd=set), or
agnhost-slave (cmd=get).
- agnhost-master container is created, having the /set endpoint, and the
/register endpoint, through which the agnhost-slave containers would
register to it. Its purpose is to propagate all data received through /set
to its clients.
- agnhost-slave containers are created, having the /set and /get endpoints.
They would register to agnhost-master, and then receive any and all updates
from it, which was then served through the /get endpoint.
For simplicity, all 3 types have the same agnhost subcommand (agnhost guestbook), being
able to satisfy its given purpose. For this, HTTP servers were being used, including
for the /register endpoints. agnhost-master would send its /set updates as /set HTTP
requests. However, because of the issue listed above, agnhost-master did not receive
the client's IP, but rather the container host's IP, resulting in the request being
sent to the wrong destination.
This PR updates the agnhost guestbook subcommand. Now, the agnhost subscriber nodes will
send their own IP to the /register endpoint (/endpoint?host=myip).
In order to promote the volume limits e2e test (from CSI Mock driver)
to Conformance, we can't rely on specific output of optional Condition
fields. Thus, this commit changes the test to only check the presence
of the right condition and verify that the optional fields are not empty.
The existing walk.go and conformance.txt have a few shortcomings
which we'd like to resolve:
- difficult to get the full test name due to test context nesting
- complicated AST logic and understanding necessary due to the
different ways a test can be invoked and written
This changes the AST parsing logic to be much more simple and simply
looks for the comments at/around a specific line. This file/line
information (and the full test name) is gathered by a custom ginkgo
reporter which dumps the SpecSummary data to a file.
Also, the SpecSummary dump can, itself, be potentially useful for
other post-processing and debugging tasks.
Signed-off-by: John Schnake <jschnake@vmware.com>
The service session affinity allows to set the maximum session
sticky timeout.
This commit adds e2e tests to check that the session is sticky
before the timeout and is not after.
Executing commands in pods is expensive in terms of time and the
execution time is unpredictable and random.
The session affinity tests send several http requests from a pod
to check that the session is sticky. Instead of executing one
http request at a time, we can execute several requests from the
pod at one time and process the output.
Previously, we've centralized several images into agnhost, including
test-webserver.
The Hybrid cluster network test was using the test-webserver image, and
was updated to use agnhost, but without properly making it so it behaves like
test-webserver, resulting in a failing test.
We have added and enabled the Image Promoter on the k/k test images, which
will build the conformance images after a PR that affects kubernetes/test/images
merges.
We have added support for image-util.sh to handle external Windows Docker connections
in order to build Windows images.
This PR enables the Image Promoter to use some Windows nodes to build the necessary
Windows images.
In order to build Windows container images for multiple OS versions,
--isolation=hyperv is required. However, not all clouds / nodes supports
or have it enabled by default, which is why we're going to rely on
having multiple nodes to build the Windows images, until this issue
is addressed.
This commit adds support for building test images for multiple
Windows versions, as we have to support both LTS and SAC channels.
With this, the format for Windows images in the BASEIMAGE files is:
OS/ARCH/OS_VERSION
Also adds --isolation-hyperv to the Windows docker build command, making sure
that container images for multiple OS versions can be built using the same
Windows node.
Adds Windows support to the test/images/image-util.sh script.
A Windows node with Docker installed is required to build Windows images.
The connection URL to it must be set in the REMOTE_DOCKER_URL env variable.
Additionally, the authentication to the remote docker node is done through
certificates, which must be found in ~/.docker.
By default, the REMOTE_DOCKER_URL env variable is set to "" in the Makefile,
and because of it, the image-util.sh script will skip building and pushing
Windows images.
Added GOOS argument to the go build process in order to be able to build
Windows binaries. Additionally, the OS env variable was added to the images
Makefiles (default value is "linux") in order to maintain default behaviour.
Some images require a different Dockerfile for Windows images, since they
have different ways of installing dependencies. Because of this, if a image
needs to be built for Windows, it will first check for a Dockerfile_windows
file instead of the default one. If there isn't one, it means that the
same Dockerfile can be used for both Windows and Linux.
All Windows images will be based on the image
"mcr.microsoft.com/windows/servercore:ltsc2019". There are a couple of features
that are needed from this image, especially powershell.
Added busybox image for Windows. Most Windows images will be based on it, which
will help reduce the command line differences between Linux and Windows, but
not entirely.
Added Windows support for agnhost image.
Changes the image naming template from:
$REGISTRY/$image-$arch:$TAG
to
$REGISTRY/$image:$TAG-$os_name-$arch
The previous naming template would generate a plethora of images (Ai * N images,
where Ai is the number of OS/architectures for the image i and N is the number
of images), while the new naming template will reduce the number of images to N.
The new template also includes the OS name, as we plan to integrate Windows
images into the manifest lists as well.
When building images, their REGISTRY can be set to a custom
one, instead of the default "gcr.io/kubernetes-e2e-test-images" or
"us.gcr.io/k8s-artifacts-prod/e2e-test-images".
Some images are based on other images we're already building
(e.g.: kitten, nautilus), but their base images
are set in the default registry name, which can be undesirable.
This commit addresses this issue.
Windows images will require other base images, and thus, we will need
to explicitly specify the OS type a base image is for in order to
avoid confusion or errors.
The way the images are built is going to be changed, and in order to avoid
overwritting and breaking the current images, the image versions are bumped.
Similar functionality is required across e2e tests for RuntimeClass.
Let's create runtimeclass as part of the framework/node package.
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Use bytes instead of strings, and slice in-place filter
(see https://github.com/golang/go/wiki/SliceTricks#filter-in-place)
to avoid copying strings around.
In my benchmark it shows almost 2x improvement:
BenchmarkString-8 1477207 10198 ns/op
BenchmarkBuffer-8 1561291 7622 ns/op
BenchmarkInPlace-8 2295714 5202 ns/op
String is the original implementation, Buffer is an intermediary
one that uses strings.Builder, and InPlace is the one from this commit.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Add a new e2e test to test the Except clauses in IPBlock CIDR
based NetworkPolicies. This test adds an egress rule which
allows client to connect to a CIDR which includes the
ServerPod's IP, however carves an except subnet which excludes
this ServerPod.
The test "should enforce egress policy allowing traffic to a server in a
different namespace based on PodSelector and NamespaceSelector
[Feature:NetworkPolicy]" is flaky because it doesn't wait for the server
Pod to be ready before testing traffic via its service, then even the
NetworkPolicy allows it, the SYN packets will be rejected by iptables
because the service has no endpoints at that moment.
This PR fixes it by making it wait for Pods to be ready like other
tests.
Due to an oversight, the e2e topology manager tests
were leaking a configmap and a serviceaccount.
This patch ensures a proper cleanup
Signed-off-by: Francesco Romani <fromani@redhat.com>
Up until now, the test validated the alignment of resources
only in the first container in a pod. That was just an overlook.
With this patch, we validate all the containers in a given pod.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add autodetection code to figure out on which NUMA node are
the devices attached to.
This autodetection work under the assumption all the VFs in
the system must be used for the tests.
Should not this be the case, or in general to handle non-trivial
configurations, we keep the annotations mechanism added to the
SRIOV device plugin config map.
Signed-off-by: Francesco Romani <fromani@redhat.com>
On single-NUMA node systems the numa_node of sriov devices was
sometimes reported as "-1" instead of, say, 0. This makes some
tests that should succeed[0] fail unexpectedly.
The reporting works as expected on real multi-NUMA node systems.
This small workaround was added to handle this corner case,
but it makes overall the code less readable and a bit too lenient,
hence we remove it.
+++
[0] on a single NUMA node system some resources are obviously
always aligned if the pod can be admitted. It boils down to the
node capacity at pod admittal time.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The e2e_node topology_manager check have a early, quick check
to rule out systems without sriov device, thus skipping the tests.
The first version of the ckeck detected PFs, (Physical Functions),
under the assumption that VFs (Virtual Functions) were already been
created. This works because, obviously, you can't have VFs without PFs.
However, it's a little safer and easier to understand if we check
firectly for VFs, bailing out from systems which don't provide them.
Nothing changes for properly configured test systems.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This is useful for logs from daemonset pods because for those it is
often relevant which node they ran on because they interact with
resources or other pods on the host.
To keep the log prefix short, it gets limited to a maximum length of
10 characters.
Quite a few images are only used a few times in a few tests. Thus,
the images are being centralized into the agnhost image, reducing
the number of images that have to be pulled and used.
This PR replaces the usage of the following images with agnhost:
- resource-consumer-controller
- test-webserver
Commit 69a473be3 broke the test case (that was checking
that a file from a volume can't be read) by adding a
(wrong) assumption that error should be nil.
Fix the assumption (we do expect the error here).
Note that we were checking file contents before; now when
we check the error from read, checking the contents is
redundant.
The other issue with the test is SELinux should be in enforcing
mode for this to work, so let's check that first to avoid
false positives.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Address review comments and move the helper function
in the `framework/kubelet` package to avoid circular deps
(see https://github.com/kubernetes/kubernetes/issues/81245)
Signed-off-by: Francesco Romani <fromani@redhat.com>
The test "should allow ingress access from updated pod" fails regardless
of which CNI plugin is enabled. It's because the test assumes the client
Pod can recheck connectivity after updating its label, but the client
won't restart after the first failure, so the second check will always
fail. The PR creates a client Pod with OnFailure RestartPolicy to fix it.
In addition to the above test that checks rule selector takes effect on
updated client pod, the PR adds a test "should deny ingress access to
updated pod" to ensure network policy selector can take effect on updated
server pod.
In addition to getting overall performance measurements from golang benchmark,
collect metrics that provides information about insides of the scheduler itself.
This is a first step towards improving what we collect about the scheduler.
Metrics in question:
- scheduler_scheduling_algorithm_predicate_evaluation_seconds
- scheduler_scheduling_algorithm_priority_evaluation_seconds
- scheduler_binding_duration_seconds
- scheduler_e2e_scheduling_duration_seconds
Scheduling throughput is computed on the fly inside perfScheduling.
this patch moves the helper getCurrentKubeletConfig function,
used in both e2e and e2e_node tests and previously duplicated,
in the common framework.
There are no intended changes in behaviour.
Signed-off-by: Francesco Romani <fromani@redhat.com>
All tests remove the test client pod, usually in TestVolumeClient.
Rename TestCleanup to TestServerCleanup.
In addition, remove few calls to Test(Server)Cleanup that do not do anything
useful (server pod is not used in these tests).