The default cloudbuild has HOME=/builder/home and docker buildx is in /root/.docker/cli-plugins/docker-buildx
We need to set the Home to /root explicitly since we're using docker buildx
Exec is a utility function but, if you call it and are already planning to either suppress or print the exec command before hand, its ability to log can be redundant or a hinderance on test readability
- re-enable e2e_node services
- call GenerateSecureToken for e2e_node Conformance test-suite
- add log messages indicating location in process
- move log messages to some more accurate locations
Since the insecure port of apiserver has been disabled in e2e node tests,
we could create a service account in the test for node problem detector
and then bind the cluster role `system:node-problem-detector` with this
service account.
Signed-off-by: knight42 <anonymousknight96@gmail.com>
Provides a response that includes a body and a method. This response
will enable a client (e2e test) to confirm that a proxy did not alter
the http method.
Service has had a problem since forever:
- User creates a service type=LoadBalancer
- We silently allocate them a NodePort
- User changes type to ClusterIP
- We fail the operation because they did not clear NodePort
They never asked for or used the NodePort!
Dual-stack introduced some dependent fields that get auto-wiped on
updates. This carries it further.
If you squint, you can see Service as a big, messy discriminated union,
with type as the discriminator. Ignoring fields for non-selected
union-modes seems right.
This introduces the potential for an apply loop. Specifically, we will
accept YAML that we did not previously accept. Apply could see the
field in local YAML and not in the server and repeatedly try to patch it
in. But since that YAML is currently an error, it seems like a very low
risk. Almost nobody actually specifies their own NodePort values.
To mitigate this somewhat, we only auto-wipe on updates. The same YAML
would fail to create. This is a little inconsistent. We could
auto-wipe on create, too, at the risk of more potential impact.
To do this properly, we need to know the old and new values, which means
we can not do it in defaulting or conversion. So we do it in strategy.
This change also adds unit tests and updates e2e tests to rely on and
verify this behavior.
add integration tests to verify the behaviour of the endpoints
and endpointslices controller with dual stack services.
Since services can be single or dual stack, endpoints should be
generated for each IP family in the endpoint slice controller.
The legacy endpoint controller only will generate endpoints
in the first IP family configured in the service.
integration fix
we are missing tests that check the connectivity against services
that have backend pods with hostNetwork: true.
Because the tests run in parallel, it is possible that the pods used as
backends try to bind to the same port, and since all of them use the
host network, the scheduler will fail to create them due to port conflicts,
so we run them serially.
We have to skip networking tests with udp and endpoints using
hostNetwork, because they have a known issue.
NetworkingTest is used to test different network scenarios.
Since new capabilites and scenarios are added, like SCTP or HostNetwork
for pods, we need a way to configure it with minimum disruption and code
changes.
Go idiomatic way to achieve this is using functional options.
the e2e test container used for the "Networking Granular Checks: Services"
tests only needs to listen in one port to perform do network checks.
This port is unrelated to the other ports used in the test, so we may
use a different number to avoid possible conflicts.
* api: structure change
* api: defaulting, conversion, and validation
* [FIX] validation: auto remove second ip/family when service changes to SingleStack
* [FIX] api: defaulting, conversion, and validation
* api-server: clusterIPs alloc, printers, storage and strategy
* [FIX] clusterIPs default on read
* alloc: auto remove second ip/family when service changes to SingleStack
* api-server: repair loop handling for clusterIPs
* api-server: force kubernetes default service into single stack
* api-server: tie dualstack feature flag with endpoint feature flag
* controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* [FIX] controller-manager: feature flag, endpoint, and endpointSlicecontrollers handling multi family service
* kube-proxy: feature-flag, utils, proxier, and meta proxier
* [FIX] kubeproxy: call both proxier at the same time
* kubenet: remove forced pod IP sorting
* kubectl: modify describe to include ClusterIPs, IPFamilies, and IPFamilyPolicy
* e2e: fix tests that depends on IPFamily field AND add dual stack tests
* e2e: fix expected error message for ClusterIP immutability
* add integration tests for dualstack
the third phase of dual stack is a very complex change in the API,
basically it introduces Dual Stack services. Main changes are:
- It pluralizes the Service IPFamily field to IPFamilies,
and removes the singular field.
- It introduces a new field IPFamilyPolicyType that can take
3 values to express the "dual-stack(mad)ness" of the cluster:
SingleStack, PreferDualStack and RequireDualStack
- It pluralizes ClusterIP to ClusterIPs.
The goal is to add coverage to the services API operations,
taking into account the 6 different modes a cluster can have:
- single stack: IP4 or IPv6 (as of today)
- dual stack: IPv4 only, IPv6 only, IPv4 - IPv6, IPv6 - IPv4
* [FIX] add integration tests for dualstack
* generated data
* generated files
Co-authored-by: Antonio Ojea <aojea@redhat.com>
The kube_proxy SIGDescribe previously only had Network in the title
and made it more difficult to select just the test cases in the
kube_proxy file and would end up running anything with Network in the
text area of SIGDescribe e2e tests.
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
Simulating a cluster with 500 nodes in 3 zones, deploying 3, 12 and 27 Pods belonging to the same service.
Change-Id: I16425594012ea7bd24b888acedb12958360bff97
A spreading test is more meaningful with a greater number of Pods. However, we cannot always expect perfect spreading. We accept a skew of 2 for 5*z Pods, where z is the number of zones.
Change-Id: Iab0de06a95974fbfec604f003b550f15db618ebd
Due to a rebase glitch the fmt.Sprintf() was lost.
This patches restores it improving the logs readability.
Signed-off-by: Francesco Romani <fromani@redhat.com>
CRUD operations are the extent of conformance testing that we can add
for NetworkPolicy tests since we require a 3rd party plugin like CNI
for enforcement.
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
Adds Windows support for most of the images.
Adds a README explaining the image building process, including the
Windows Container image building process.
A previous commit added a few agnhost related functions that creates agnhost
pods / containers for general purposes.
Refactors tests to use those functions.
The CreateSync method includes the waiting for the pod to become running
and returns a fresh new pod instance.
In addition, errors are asserted in the method.
Therefore, there is no need for the callers to repeat these operations.
Some, like the error assertions, will never be reached in case they
occur as they will explode from within the method itself.
Signed-off-by: Edward Haas <edwardh@redhat.com>
I've observed this test occasionally failing due to 403 errors. I think there's something racing within apiserver w/ respect to RBAC and that if this test were more patient, then it would not flake this way.
- Due to performance issues, service controller updates are slow
in large clusters, causing failing tests. Tag can be removed once
performance issues are resolved
Currently, some of the E2E test images have Windows support, and one of the goals is for most of
them to have Windows support. For that, the Image Builder is currently building those Windows
container images using a few Windows Server nodes (for 1809, 1903, 1909) with Remote Docker
enabled which are hosted on an azure subscription dedicated for CNCF.
With this, the Windows nodes dependency is removed entirely, as the images can be also built with
docker buildx. One additional benefit to this is that adding new supported Windows OS versions
to the E2E test images manifest lists becomes a lot easier (we wouldn't have to create a new Windows
Server node that matches that new OS version, assign DNS name, update certificates, etc.), and it
also becomes easier for other people to build their own E2E windows test images.
However, some dependencies are still required to run on a Windows machine. To solve this, we can
just pull helper images: e2eteam/powershell-helper:6.2.7 and e2eteam/busybox-helper:1.29.0. Their
Dockerfiles and a Makefile for them has been included in this commit. If any change is required to
them, then a new image will be built and tagged under a different version, but they are pretty
straight-forward and shouldn't require changes.
However, there is a small concern when it comes to the build time: Windows servercore images are
very large (for example, mcr.microsoft.com/windows/servercore:ltsc2019 is 4.99GB uncompressed, and
about ~2 GB compressed - those images are already cached on the Windows Server builder nodes, so
this isn't an issue there), and we currently support 1809, 1903, and 1909 (soon to add 2004).
This can lead to build times that are too big.
We have changed the base image to nanoserver (uncompressed size: 250MB), but some images still
require some DLLs or some other dependencies that can be fetched from a servercore image.
A separate job has been defined that would build a scratch windows-servercore-cache image monthly,
and then we can just get those dependencies from this cache, which will be very small.
This would be preferred, as the Windows images update periodically, and those dependencies
could be updated as well.
We need to make sure we tear down the sriov device plugin pod
should the tests fail, to avoid leaking pods in the test environment.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The NetworkPolicy tests work by trying to connect to a service by its
name, which means that for the tests that involved creating egress
policies, it had to always create an extra rule allowing egress for
DNS, but this assumed that DNS was running on UDP port 53. If it was
running somewhere else (eg if you changed the CoreDNS pods to use port
5353 to avoid needing to give them the NET_BIND_SERVICE capability)
then the NetworkPolicy tests would fail.
Fix this by making the tests connect to their services by IP rather
than by name, and removing all the DNS special-case rules. There are
other tests that ensure that Service DNS works.
The integration test for pods produces a warning caused by using deprecated
default cluster IPs.
$ make test-integration WHAT=./test/integration/pods GOFLAGS="-v"
W1007 17:25:28.217410 100721 services.go:37] No CIDR for service cluster IPs specified. Default value which was 10.0.0.0/24 is deprecated and will be removed in future releases. Please specify it using --service-cluster-ip-range on kube-apiserver.
This warning appears 36 times after running all tests. This patch removes all
the warnings.
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
PATCH verb is used when creating a namespace using server-side apply,
while POST verb is used when creating a namespace using client-side
apply.
The difference in path between the two ways to create a namespace led to
an inconsistency when calling webhooks. When server-side apply is used,
the request sent to webhooks has the field "namespace" populated with
the name of namespace being created. On the other hand, when using
client-side apply the "namespace" field is omitted.
This commit aims to make the behaviour consistent and populates the
"namespace" field when creating a namespace using POST verb (i.e.
client-side apply).
On HA API server hiccups we saw a prow job with error:
fail [k8s.io/kubernetes@v1.19.0/test/e2e/e2e.go:284]: Sep 30 17:06:08.313: Error waiting for all pods to be running and ready: 0 / 0 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
POD NODE PHASE GRACE CONDITIONS
The failure should include the last error from API server, if it's
available:
fail [k8s.io/kubernetes@v1.19.0/test/e2e/e2e.go:284]: Oct 1 11:29:45.220: Error waiting for all pods to be running and ready: 0 / 0 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
Last error: Get "https://localhost:6443/api/v1/namespaces/kube-system/replicationcontrollers": dial tcp [::1]:6443: connect: connection refused
POD NODE PHASE GRACE CONDITIONS
This fixes a problem when using the framework helper
e2epod.CreateExecPodOrFail() more than once in the same test.
The problem is that this function was creating pods with both the
pod.Name and pod.GenerateName set.
The second pod failed to be created with a 500 with Reason ServerTimeout
indicating a unique name could not be found in the time allotted,
xref https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/
With the changes introduced by agnhost refactoring PR ( #87266 ), this test was left searching for a invalid container name. Updated to the proper name.
fixed syntax, wrote a test
fixed a test
.
1
Update staging/src/k8s.io/apimachinery/pkg/util/intstr/intstr_test.go
Co-Authored-By: Joel Speed <Joel.speed@hotmail.co.uk>
added test
.
fix
fix test
fixed a test
gofmt
lint
fix
function name
validation fix
.
godocs added
.
A previous commit created a few agnhost related functions that creates agnhost
pods / containers for general purposes.
Refactors tests to use those functions.
A previous commit added a few agnhost related functions that creates agnhost
pods / containers for general purposes.
Refactors tests to use those functions.
Disable subpath test "should be able to unmount after the subpath
directory is deleted" for windows because the test will fail when
deleting a dir while another container still use it.
having framework.DumpDebugInfo after the FailF was
a noop and we are losing those potentially helpful
logs when we need them the most (on a failure)
Signed-off-by: Jamo Luhrsen <jluhrsen@redhat.com>
By passing "oflag=nocache" and "iflag=direct", caching should be
disabled while writing/reading with "dd" to a block device. The
TestConcurrentAccessToSingleVolume() test is known to fail with certain
storage backends (like Ceph RBD) when caching is enabled.
The default BusyBox image used for testing does not support the required
options for "dd". So instead of running with BusyBox, run the test with
a Debian image.
For testing certain features, the BusyBox image does not provide all the
tools that are needed. Notably 'dd' from BusyBox does not support
direct-io that is required for skipping caches while doing writes and
reads on a Block-mode PVC attached to different nodes.
'agnhost' image uses hardcoded 'cluster.local' value for DNS domain.
It leads to failure of a bunch of HPA tests when test cluster is
configured to use custom DNS domain and there is no alias for
default 'cluster.local' one.
So, fix it by reusing it's own function for reading DNS domain suffixes.
Signed-off-by: Valerii Ponomarov <kiparis.kh@gmail.com>
Testing with the default FS (ext4) is IMO enough, ext2/ext3 does not add much
value. It's handled by the same kernel module anyway.
Leave ext2/ext3 only in GCE PD which is tested in kubernetes/kubernetes CI
jobs regularly to catch regressions.