* Rename const for topology.../zone
* Rename const for topology.../region
* Rename const for failure-domain.../zone
* Rename const for failure-domain.../region
* Restore old names for compat
This reverts commit 0ed8fd6dc9.
It turns out that ExternalIPs are not expected to be reachable from
pods until the IP is actually present on a node.
However, due to a kube-proxy limitation this did work in environments
whose CNIs do not use bridges for the pods.
Currently, the Image Builder job is failing because it cannot build images
for other architectures. This happens because the Image Builder image
does not contain any of the qemu-* binaries expected in /usr/bin/, which are
needed to run qemu-binfmt-conf.sh with the -p yes flag, so that flag is removed.
docker buildx requires DOCKER_CLI_EXPERIMENTAL=enabled to be set
in order to be used.
This environment variable is not plumbed through from the
test/images/cloudbuild.yaml file, causing the docker buildx commands
to fail.
The default cloudbuild environment sets HOME=/builder/home, while docker buildx lives in /root/.docker/cli-plugins/docker-buildx.
We need to set HOME to /root explicitly since we're using docker buildx.
Exec is a utility function, but if you call it while already planning to either suppress or print the exec command beforehand, its built-in logging can be redundant or a hindrance to test readability.
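A hypothetical sketch of the kind of variant this enables (names are illustrative, not the actual framework API):

    package utils

    import "os/exec"

    // execQuiet runs the command without logging it, leaving the caller
    // free to print or suppress the command line itself.
    // Hypothetical helper for illustration only.
    func execQuiet(cmd string, args ...string) (string, error) {
        out, err := exec.Command(cmd, args...).CombinedOutput()
        return string(out), err
    }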
- re-enable e2e_node services
- call GenerateSecureToken for e2e_node Conformance test-suite
- add log messages indicating where we are in the process
- move log messages to more accurate locations
Since the insecure port of the apiserver has been disabled in e2e node tests,
we can create a service account in the test for the node problem detector
and then bind the cluster role `system:node-problem-detector` to this
service account.
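A rough client-go sketch of that setup (illustrative; the actual test wiring differs):

    package npd

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        rbacv1 "k8s.io/api/rbac/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // createNPDServiceAccount creates a dedicated ServiceAccount and binds
    // it to the pre-existing system:node-problem-detector ClusterRole, so
    // the pod no longer depends on the insecure apiserver port.
    func createNPDServiceAccount(ctx context.Context, c kubernetes.Interface, ns string) error {
        sa := &corev1.ServiceAccount{
            ObjectMeta: metav1.ObjectMeta{Name: "node-problem-detector", Namespace: ns},
        }
        if _, err := c.CoreV1().ServiceAccounts(ns).Create(ctx, sa, metav1.CreateOptions{}); err != nil {
            return err
        }
        crb := &rbacv1.ClusterRoleBinding{
            ObjectMeta: metav1.ObjectMeta{Name: "node-problem-detector"},
            RoleRef: rbacv1.RoleRef{
                APIGroup: rbacv1.GroupName,
                Kind:     "ClusterRole",
                Name:     "system:node-problem-detector",
            },
            Subjects: []rbacv1.Subject{{
                Kind:      rbacv1.ServiceAccountKind,
                Name:      sa.Name,
                Namespace: ns,
            }},
        }
        _, err := c.RbacV1().ClusterRoleBindings().Create(ctx, crb, metav1.CreateOptions{})
        return err
    }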
Signed-off-by: knight42 <anonymousknight96@gmail.com>
Provides a response that includes both a body and the method. This response
enables a client (the e2e test) to confirm that a proxy did not alter
the HTTP method.
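A minimal sketch of the idea as a plain net/http handler (the path and port are illustrative):

    package main

    import (
        "fmt"
        "io/ioutil"
        "net/http"
    )

    // echoHandler returns both the request body and the HTTP method, so a
    // client can verify that a proxy did not rewrite the method on the way.
    func echoHandler(w http.ResponseWriter, r *http.Request) {
        body, _ := ioutil.ReadAll(r.Body)
        fmt.Fprintf(w, "%s %s", r.Method, string(body))
    }

    func main() {
        http.HandleFunc("/echo", echoHandler)
        http.ListenAndServe(":8080", nil)
    }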
Service has had a problem since forever:
- User creates a service type=LoadBalancer
- We silently allocate them a NodePort
- User changes type to ClusterIP
- We fail the operation because they did not clear NodePort
They never asked for or used the NodePort!
Dual-stack introduced some dependent fields that get auto-wiped on
updates. This carries it further.
If you squint, you can see Service as a big, messy discriminated union,
with type as the discriminator. Ignoring fields for non-selected
union-modes seems right.
This introduces the potential for an apply loop. Specifically, we will
accept YAML that we did not previously accept. Apply could see the
field in local YAML and not in the server and repeatedly try to patch it
in. But since that YAML is currently an error, it seems like a very low
risk. Almost nobody actually specifies their own NodePort values.
To mitigate this somewhat, we only auto-wipe on updates. The same YAML
would fail to create. This is a little inconsistent. We could
auto-wipe on create, too, at the risk of more potential impact.
To do this properly, we need to know the old and new values, which means
we cannot do it in defaulting or conversion. So we do it in strategy.
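A simplified sketch of that strategy-level check (illustrative; the real helper in the Service strategy handles more cases):

    package service

    import api "k8s.io/kubernetes/pkg/apis/core"

    // dropNodePortsOnTypeChange clears auto-allocated NodePorts when an
    // update moves the Service away from a NodePort-carrying type. Only the
    // strategy sees both the old and the new object, which is why this
    // cannot live in defaulting or conversion.
    func dropNodePortsOnTypeChange(oldSvc, newSvc *api.Service) {
        needsNodePort := func(svc *api.Service) bool {
            return svc.Spec.Type == api.ServiceTypeNodePort ||
                svc.Spec.Type == api.ServiceTypeLoadBalancer
        }
        if needsNodePort(oldSvc) && !needsNodePort(newSvc) {
            for i := range newSvc.Spec.Ports {
                newSvc.Spec.Ports[i].NodePort = 0
            }
        }
    }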
This change also adds unit tests and updates e2e tests to rely on and
verify this behavior.
add integration tests to verify the behaviour of the endpoints
and endpointslices controllers with dual stack services.
Since services can be single or dual stack, the endpoint slice controller
should generate endpoints for each IP family.
The legacy endpoints controller will only generate endpoints
for the first IP family configured on the service.
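A sketch of the per-family check the integration tests make, assuming client-go and the discovery v1beta1 API:

    package dualstack

    import (
        "context"

        discovery "k8s.io/api/discovery/v1beta1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // slicesByFamily collects the EndpointSlices that belong to a Service
    // and groups them by AddressType; a dual-stack service is expected to
    // have slices for both families.
    func slicesByFamily(ctx context.Context, c kubernetes.Interface, ns, svc string) (map[discovery.AddressType][]discovery.EndpointSlice, error) {
        list, err := c.DiscoveryV1beta1().EndpointSlices(ns).List(ctx, metav1.ListOptions{
            LabelSelector: discovery.LabelServiceName + "=" + svc,
        })
        if err != nil {
            return nil, err
        }
        out := map[discovery.AddressType][]discovery.EndpointSlice{}
        for _, s := range list.Items {
            out[s.AddressType] = append(out[s.AddressType], s)
        }
        return out, nil
    }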
integration fix
We are missing tests that check connectivity against services
whose backend pods run with hostNetwork: true.
Because the tests run in parallel, the pods used as backends may
try to bind to the same port, and since all of them use the
host network, the scheduler will fail to create them due to port conflicts,
so we run these tests serially.
We have to skip the networking tests that use UDP and hostNetwork
endpoints, because they have a known issue.
NetworkingTest is used to test different network scenarios.
As new capabilities and scenarios are added, like SCTP or HostNetwork
for pods, we need a way to configure it with minimal disruption and code
changes.
The idiomatic Go way to achieve this is to use functional options.
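A minimal sketch of the functional-options pattern as applied here (option and field names are illustrative):

    package e2enetwork

    // NetworkingTestConfig holds the knobs for a networking scenario.
    type NetworkingTestConfig struct {
        SCTPEnabled          bool
        EndpointsHostNetwork bool
    }

    // Option mutates the config; adding a scenario means adding an option,
    // not changing every caller's signature.
    type Option func(*NetworkingTestConfig)

    func EnableSCTP(c *NetworkingTestConfig)     { c.SCTPEnabled = true }
    func UseHostNetwork(c *NetworkingTestConfig) { c.EndpointsHostNetwork = true }

    // NewNetworkingTestConfig applies the supplied options on top of the
    // defaults.
    func NewNetworkingTestConfig(opts ...Option) *NetworkingTestConfig {
        cfg := &NetworkingTestConfig{}
        for _, o := range opts {
            o(cfg)
        }
        return cfg
    }

    // Usage: cfg := NewNetworkingTestConfig(EnableSCTP, UseHostNetwork)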
The e2e test container used for the "Networking Granular Checks: Services"
tests only needs to listen on one port to perform its network checks.
This port is unrelated to the other ports used in the test, so we can
use a different number to avoid possible conflicts.
* api: structure change
* api: defaulting, conversion, and validation
* [FIX] validation: auto remove second ip/family when service changes to SingleStack
* [FIX] api: defaulting, conversion, and validation
* api-server: clusterIPs alloc, printers, storage and strategy
* [FIX] clusterIPs default on read
* alloc: auto remove second ip/family when service changes to SingleStack
* api-server: repair loop handling for clusterIPs
* api-server: force kubernetes default service into single stack
* api-server: tie dualstack feature flag with endpoint feature flag
* controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* [FIX] controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* kube-proxy: feature-flag, utils, proxier, and meta proxier
* [FIX] kubeproxy: call both proxiers at the same time
* kubenet: remove forced pod IP sorting
* kubectl: modify describe to include ClusterIPs, IPFamilies, and IPFamilyPolicy
* e2e: fix tests that depend on the IPFamily field AND add dual stack tests
* e2e: fix expected error message for ClusterIP immutability
* add integration tests for dualstack
the third phase of dual stack is a very complex change in the API;
basically it introduces Dual Stack services. The main changes are:
- It pluralizes the Service IPFamily field to IPFamilies,
and removes the singular field.
- It introduces a new field, IPFamilyPolicyType, that can take
3 values to express the "dual-stack(mad)ness" of the cluster:
SingleStack, PreferDualStack and RequireDualStack.
- It pluralizes ClusterIP to ClusterIPs (see the sketch after this list).
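A hedged sketch of how the pluralized fields fit together on a Service, using the k8s.io/api/core/v1 types (field and constant names as described above; exact shapes may differ between releases):

    package dualstack

    import (
        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // requireDualStackService asks for both families explicitly; the
    // apiserver then allocates one ClusterIP per family into ClusterIPs.
    func requireDualStackService() *v1.Service {
        policy := v1.IPFamilyPolicyRequireDualStack
        return &v1.Service{
            ObjectMeta: metav1.ObjectMeta{Name: "demo"},
            Spec: v1.ServiceSpec{
                Selector:       map[string]string{"app": "demo"},
                IPFamilyPolicy: &policy,
                IPFamilies:     []v1.IPFamily{v1.IPv4Protocol, v1.IPv6Protocol},
                Ports:          []v1.ServicePort{{Port: 80}},
            },
        }
    }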
The goal is to add coverage to the services API operations,
taking into account the 6 different modes a cluster can have:
- single stack: IPv4 or IPv6 (as of today)
- dual stack: IPv4 only, IPv6 only, IPv4-IPv6, IPv6-IPv4
* [FIX] add integration tests for dualstack
* generated data
* generated files
Co-authored-by: Antonio Ojea <aojea@redhat.com>
The kube_proxy SIGDescribe previously had only Network in the title,
which made it difficult to select just the test cases in the
kube_proxy file: a selector would end up running anything with Network
in the SIGDescribe text of the e2e tests.
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
Simulating a cluster with 500 nodes in 3 zones, deploying 3, 12 and 27 Pods belonging to the same service.
Change-Id: I16425594012ea7bd24b888acedb12958360bff97
A spreading test is more meaningful with a greater number of Pods. However, we cannot always expect perfect spreading. We accept a skew of 2 for 5*z Pods, where z is the number of zones; for example, with z=3 zones and 15 Pods, perfect spreading would be 5 per zone, and a 6/5/4 split is still accepted.
Change-Id: Iab0de06a95974fbfec604f003b550f15db618ebd
Due to a rebase glitch, the fmt.Sprintf() call was lost.
This patch restores it, improving the logs' readability.
Signed-off-by: Francesco Romani <fromani@redhat.com>
CRUD operations are the extent of conformance testing that we can add
for NetworkPolicy tests, since enforcement requires a 3rd party plugin
such as a CNI implementation.
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
Adds Windows support for most of the images.
Adds a README explaining the image building process, including the
Windows Container image building process.
A previous commit added a few agnhost-related functions that create agnhost
pods / containers for general purposes.
This refactors the tests to use those functions.
The CreateSync method waits for the pod to become running
and returns a fresh pod instance.
In addition, errors are asserted inside the method.
Therefore, there is no need for the callers to repeat these operations.
Some of them, like the caller-side error assertions, would never even be
reached, because a failure would already abort inside the method itself.
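For illustration, a call site shrinks to roughly this (sketch, assuming the e2e framework's PodClient):

    package example

    import (
        v1 "k8s.io/api/core/v1"
        "k8s.io/kubernetes/test/e2e/framework"
    )

    // runPod creates the pod, waits for it to be Running, and fails the
    // test from inside CreateSync if anything goes wrong, so the caller
    // needs no extra waiting or error assertions.
    func runPod(f *framework.Framework, pod *v1.Pod) *v1.Pod {
        return f.PodClient().CreateSync(pod)
    }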
Signed-off-by: Edward Haas <edwardh@redhat.com>
I've observed this test occasionally failing due to 403 errors. I think there's something racing within the apiserver with respect to RBAC, and if this test were more patient it would not flake this way.
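A minimal sketch of the "be more patient" approach, assuming apimachinery's wait and errors helpers:

    package example

    import (
        "time"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        "k8s.io/apimachinery/pkg/util/wait"
    )

    // pollUntilAuthorized treats a 403 as transient while RBAC propagates
    // inside the apiserver and only fails once the timeout is exhausted.
    func pollUntilAuthorized(do func() error) error {
        return wait.PollImmediate(time.Second, 2*time.Minute, func() (bool, error) {
            err := do()
            if apierrors.IsForbidden(err) {
                return false, nil // not authorized yet, retry
            }
            return err == nil, err
        })
    }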
- Due to performance issues, service controller updates are slow
in large clusters, causing tests to fail. The tag can be removed once
the performance issues are resolved.
Currently, some of the E2E test images have Windows support, and one of the goals is for most of
them to have Windows support. For that, the Image Builder currently builds those Windows
container images using a few Windows Server nodes (for 1809, 1903, 1909) with Remote Docker
enabled, hosted on an Azure subscription dedicated to CNCF.
With this, the Windows nodes dependency is removed entirely, as the images can also be built with
docker buildx. One additional benefit is that adding new supported Windows OS versions
to the E2E test image manifest lists becomes a lot easier (we wouldn't have to create a new Windows
Server node that matches the new OS version, assign a DNS name, update certificates, etc.), and it
also becomes easier for other people to build their own E2E Windows test images.
However, some dependencies are still required to run on a Windows machine. To solve this, we can
just pull helper images: e2eteam/powershell-helper:6.2.7 and e2eteam/busybox-helper:1.29.0. Their
Dockerfiles and a Makefile for them have been included in this commit. If any change is required,
a new image will be built and tagged under a different version, but they are pretty
straightforward and shouldn't require changes.
However, there is a small concern when it comes to build time: Windows servercore images are
very large (for example, mcr.microsoft.com/windows/servercore:ltsc2019 is 4.99 GB uncompressed and
about 2 GB compressed; those images are already cached on the Windows Server builder nodes, so
this isn't an issue there), and we currently support 1809, 1903, and 1909 (soon to add 2004).
This can lead to build times that are too long.
We have changed the base image to nanoserver (uncompressed size: 250 MB), but some images still
require some DLLs or other dependencies that can be fetched from a servercore image.
A separate job has been defined that builds a scratch windows-servercore-cache image monthly,
and then we can just get those dependencies from this cache, which will be very small.
This is preferred, as the Windows images are updated periodically and those dependencies
could be updated as well.
We need to make sure we tear down the sriov device plugin pod
should the tests fail, to avoid leaking pods in the test environment.
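A sketch of the pattern, assuming client-go (the actual e2e node code goes through the framework helpers):

    package example

    import (
        "context"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // withSRIOVDevicePlugin registers the teardown right after the device
    // plugin pod is created, so the pod is deleted even if the test fails.
    func withSRIOVDevicePlugin(ctx context.Context, c kubernetes.Interface, pod *v1.Pod, test func()) error {
        created, err := c.CoreV1().Pods(pod.Namespace).Create(ctx, pod, metav1.CreateOptions{})
        if err != nil {
            return err
        }
        defer func() {
            _ = c.CoreV1().Pods(created.Namespace).Delete(ctx, created.Name, metav1.DeleteOptions{})
        }()
        test()
        return nil
    }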
Signed-off-by: Francesco Romani <fromani@redhat.com>