A spreading test is more meaningful with a greater number of Pods. However, we cannot always expect perfect spreading. We accept a skew of 2 for 5*z Pods, where z is the number of zones (for example, with z=3 zones and 15 Pods, a 6/5/4 distribution is still acceptable).
Change-Id: Iab0de06a95974fbfec604f003b550f15db618ebd
Due to a rebase glitch the fmt.Sprintf() call was lost.
This patch restores it, improving the readability of the logs.
Signed-off-by: Francesco Romani <fromani@redhat.com>
CRUD operations are the extent of conformance testing that we can add
for NetworkPolicy tests, since we require a third-party network plugin (CNI)
for enforcement.
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
Adds Windows support for most of the images.
Adds a README explaining the image building process, including the
Windows Container image building process.
A previous commit added a few agnhost-related functions that create agnhost
pods / containers for general purposes.
Refactors tests to use those functions.
The CreateSync method waits for the pod to become running
and returns a fresh pod instance.
In addition, errors are asserted inside the method itself,
so there is no need for callers to repeat these operations.
Some of them, like the error assertions, would never even be reached
by callers, because a failure aborts the test from within the method.
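A minimal sketch of the resulting caller-side simplification, assuming the e2e framework's PodClient helper (the function name and pod argument here are illustrative, not the actual test code):

    import (
        v1 "k8s.io/api/core/v1"
        "k8s.io/kubernetes/test/e2e/framework"
    )

    // createRunningPod relies on CreateSync to create the pod, assert any error,
    // wait for the Running phase, and return a freshly fetched pod object, so the
    // caller needs no extra ExpectNoError or wait-for-running calls of its own.
    func createRunningPod(f *framework.Framework, pod *v1.Pod) *v1.Pod {
        return f.PodClient().CreateSync(pod)
    }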
Signed-off-by: Edward Haas <edwardh@redhat.com>
I've observed this test occasionally failing due to 403 errors. I think there's something racing within the apiserver with respect to RBAC, and that if this test were more patient, it would not flake this way.
- Due to performance issues, service controller updates are slow
in large clusters, causing tests to fail. The tag can be removed once
the performance issues are resolved.
Currently, some of the E2E test images have Windows support, and one of the goals is for most of
them to have Windows support. For that, the Image Builder is currently building those Windows
container images using a few Windows Server nodes (for 1809, 1903, 1909) with Remote Docker
enabled, which are hosted on an Azure subscription dedicated to CNCF.
With this change, the Windows node dependency is removed entirely, as the images can also be built with
docker buildx. One additional benefit is that adding new supported Windows OS versions
to the E2E test images' manifest lists becomes a lot easier (we would not have to create a new Windows
Server node that matches the new OS version, assign a DNS name, update certificates, etc.), and it
also becomes easier for other people to build their own E2E Windows test images.
However, some dependencies are still required to run on a Windows machine. To solve this, we can
just pull helper images: e2eteam/powershell-helper:6.2.7 and e2eteam/busybox-helper:1.29.0. Their
Dockerfiles and a Makefile for them have been included in this commit. If any change is required,
a new image will be built and tagged under a different version, but they are pretty
straightforward and shouldn't require changes.
However, there is a small concern when it comes to build time: Windows servercore images are
very large (for example, mcr.microsoft.com/windows/servercore:ltsc2019 is 4.99 GB uncompressed and
about 2 GB compressed; those images are already cached on the Windows Server builder nodes, so
this isn't an issue there), and we currently support 1809, 1903, and 1909 (soon to add 2004).
This can lead to build times that are too long.
We have changed the base image to nanoserver (uncompressed size: 250MB), but some images still
require some DLLs or some other dependencies that can be fetched from a servercore image.
A separate job has been defined that would build a scratch windows-servercore-cache image monthly,
and then we can just get those dependencies from this cache, which will be very small.
This would be preferred, as the Windows images update periodically, and those dependencies
could be updated as well.
We need to make sure we tear down the sriov device plugin pod
should the tests fail, to avoid leaking pods in the test environment.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The NetworkPolicy tests work by trying to connect to a service by its
name, which means that for the tests that involved creating egress
policies, it had to always create an extra rule allowing egress for
DNS, but this assumed that DNS was running on UDP port 53. If it was
running somewhere else (e.g. if you changed the CoreDNS pods to use port
5353 to avoid needing to give them the NET_BIND_SERVICE capability)
then the NetworkPolicy tests would fail.
Fix this by making the tests connect to their services by IP rather
than by name, and removing all the DNS special-case rules. There are
other tests that ensure that Service DNS works.
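A minimal sketch of the probe-side change (the helper name and port index are illustrative, not the actual test code): build the target address from the Service's ClusterIP, so no DNS lookup, and hence no DNS egress rule, is needed:

    import (
        "net"
        "strconv"

        v1 "k8s.io/api/core/v1"
    )

    // targetAddr returns "<cluster-ip>:<port>" for the service under test,
    // replacing the previous "<service-name>:<port>" form that required DNS.
    func targetAddr(svc *v1.Service) string {
        return net.JoinHostPort(svc.Spec.ClusterIP, strconv.Itoa(int(svc.Spec.Ports[0].Port)))
    }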
The integration test for pods produces a warning caused by using deprecated
default cluster IPs.
$ make test-integration WHAT=./test/integration/pods GOFLAGS="-v"
W1007 17:25:28.217410 100721 services.go:37] No CIDR for service cluster IPs specified. Default value which was 10.0.0.0/24 is deprecated and will be removed in future releases. Please specify it using --service-cluster-ip-range on kube-apiserver.
This warning appears 36 times after running all tests. This patch removes all
the warnings.
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
PATCH verb is used when creating a namespace using server-side apply,
while POST verb is used when creating a namespace using client-side
apply.
The difference between these two code paths for creating a namespace led to
an inconsistency when calling webhooks. When server-side apply is used,
the request sent to webhooks has the field "namespace" populated with
the name of namespace being created. On the other hand, when using
client-side apply the "namespace" field is omitted.
This commit makes the behaviour consistent by also populating the
"namespace" field when creating a namespace using the POST verb (i.e.
client-side apply).
On HA API server hiccups we saw a prow job with error:
fail [k8s.io/kubernetes@v1.19.0/test/e2e/e2e.go:284]: Sep 30 17:06:08.313: Error waiting for all pods to be running and ready: 0 / 0 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
POD NODE PHASE GRACE CONDITIONS
The failure should include the last error from the API server, if it is
available:
fail [k8s.io/kubernetes@v1.19.0/test/e2e/e2e.go:284]: Oct 1 11:29:45.220: Error waiting for all pods to be running and ready: 0 / 0 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
Last error: Get "https://localhost:6443/api/v1/namespaces/kube-system/replicationcontrollers": dial tcp [::1]:6443: connect: connection refused
POD NODE PHASE GRACE CONDITIONS
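A minimal sketch of the reporting change (the helper and variable names are assumed, not the actual framework code): keep the last error returned by the API server during the wait and append it to the timeout failure:

    import (
        "fmt"

        "k8s.io/kubernetes/test/e2e/framework"
    )

    // failWithLastError reports the wait timeout together with the last error the
    // API server returned, if one was observed, so HA hiccups show up in the failure.
    func failWithLastError(waitErr, lastAPIError error) {
        msg := fmt.Sprintf("Error waiting for all pods to be running and ready: %v", waitErr)
        if lastAPIError != nil {
            msg += fmt.Sprintf("\nLast error: %v", lastAPIError)
        }
        framework.Failf("%s", msg)
    }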
This fixes a problem when using the framework helper
e2epod.CreateExecPodOrFail() more than once in the same test.
The problem is that this function was creating pods with both the
pod.Name and pod.GenerateName set.
The second pod then failed to be created with a 500 (Reason: ServerTimeout),
indicating a unique name could not be found in the time allotted;
xref https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/
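A minimal sketch of the fix (illustrative, not the exact helper code): set only GenerateName, so the apiserver generates a fresh unique name on every call, instead of setting both Name and GenerateName:

    import (
        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // newExecPod leaves Name empty; with only GenerateName set, each call yields
    // a distinct pod name, so repeated invocations in one test no longer collide.
    func newExecPod(ns string) *v1.Pod {
        return &v1.Pod{
            ObjectMeta: metav1.ObjectMeta{
                GenerateName: "exec-pod-",
                Namespace:    ns,
            },
        }
    }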
With the changes introduced by the agnhost refactoring PR (#87266), this test was left searching for an invalid container name. Updated it to the proper name.
fixed syntax, wrote a test
fixed a test
.
1
Update staging/src/k8s.io/apimachinery/pkg/util/intstr/intstr_test.go
Co-Authored-By: Joel Speed <Joel.speed@hotmail.co.uk>
added test
.
fix
fix test
fixed a test
gofmt
lint
fix
function name
validation fix
.
godocs added
.
A previous commit created a few agnhost-related functions that create agnhost
pods / containers for general purposes.
Refactors tests to use those functions.
A previous commit added a few agnhost-related functions that create agnhost
pods / containers for general purposes.
Refactors tests to use those functions.
Disable the subpath test "should be able to unmount after the subpath
directory is deleted" for Windows, because the test will fail when
deleting a directory while another container is still using it.
Having framework.DumpDebugInfo after the Failf call was
a no-op, and we were losing those potentially helpful
logs when we need them the most (on a failure).
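A minimal sketch of the reordering (signatures assumed, not verbatim): dump the debug info before failing, since Failf aborts the test and anything placed after it never runs:

    if err != nil {
        framework.DumpDebugInfo(c, ns)               // runs while the test is still alive
        framework.Failf("unexpected error: %v", err) // aborts the test here
    }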
Signed-off-by: Jamo Luhrsen <jluhrsen@redhat.com>
By passing "oflag=nocache" and "iflag=direct", caching should be
disabled while writing/reading with "dd" to a block device. The
TestConcurrentAccessToSingleVolume() test is known to fail with certain
storage backends (like Ceph RBD) when caching is enabled.
The default BusyBox image used for testing does not support the required
options for "dd". So instead of running with BusyBox, run the test with
a Debian image.
For testing certain features, the BusyBox image does not provide all the
tools that are needed. Notably, 'dd' from BusyBox does not support the
direct-io options that are required for skipping caches while doing writes and
reads on a Block-mode PVC attached to different nodes.
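A minimal sketch of how the test's pod commands might be assembled (variable names are illustrative): write with caching disabled and read with direct I/O, so a second pod on another node observes the data:

    // oflag=nocache disables caching on the write side,
    // iflag=direct bypasses the page cache on the read side.
    writeCmd := fmt.Sprintf("dd if=/dev/zero of=%s bs=%d count=1 oflag=nocache", devicePath, blockSize)
    readCmd := fmt.Sprintf("dd if=%s of=/dev/null bs=%d count=1 iflag=direct", devicePath, blockSize)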
The 'agnhost' image uses a hardcoded 'cluster.local' value for the DNS domain.
This leads to the failure of a bunch of HPA tests when the test cluster is
configured to use a custom DNS domain and there is no alias for the
default 'cluster.local' one.
So, fix it by reusing its own function for reading DNS domain suffixes.
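The real helper lives in agnhost; a minimal illustrative sketch of the idea is to derive the suffixes from /etc/resolv.conf instead of hardcoding 'cluster.local':

    import (
        "fmt"
        "os"
        "strings"
    )

    // dnsSuffixList returns the cluster's DNS search domains from /etc/resolv.conf.
    func dnsSuffixList() ([]string, error) {
        data, err := os.ReadFile("/etc/resolv.conf")
        if err != nil {
            return nil, err
        }
        for _, line := range strings.Split(string(data), "\n") {
            if strings.HasPrefix(line, "search") {
                return strings.Fields(line)[1:], nil
            }
        }
        return nil, fmt.Errorf("no search line found in /etc/resolv.conf")
    }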
Signed-off-by: Valerii Ponomarov <kiparis.kh@gmail.com>
Testing with the default FS (ext4) is IMO enough; ext2/ext3 do not add much
value. They are handled by the same kernel module anyway.
Leave ext2/ext3 only in GCE PD, which is tested regularly in kubernetes/kubernetes CI
jobs to catch regressions.
The tests can execute whether or not the hosts have SSH.
Relevant cases:
"should be able to up and down services"
"should implement service.kubernetes.io/service-proxy-name"
"should implement service.kubernetes.io/headless"
A previous commit created a few agnhost-related functions that create agnhost
pods / containers for general purposes.
Refactors tests to use those functions.
This reverts commit 61490bba46, reversing
changes made to 9ecab1b4b2.
Some methods from the networking e2e tools dial from one
container to another container, failing the test if there was no
connectivity. This PR modified the methods to return an error instead of
failing the test.
However, these methods were used by other tests in the framework, and
those tests do not check whether the method returns an error, expecting
the method to fail the test itself. With this change, any connectivity problem
would go unnoticed in the tests that are not asserting the error, so we
need to revert to the previous state.
Using Windows nanoserver container images as a base instead of the current
Windows servercore image will reduce the image size by about 10x.
However, the nanoserver image lacks several things we need:
- netapi32.dll
- powershell
- certain powershell commands
- chocolatey cannot be used
When building the nanoserver images, we are going to use a Windows servercore helper,
in which we are going to install the necessary dependencies, and then copy them over
to our nanoserver image, including necessary DLLs.
Other notable changes include:
- switch from wget to curl (wget was a PowerShell alias).
- implement in code getting the DNS suffix list and DNS server list.
- reimplement getting file permissions for mounttest.
The provided DialContext wraps existing clients' DialContext in an attempt to
preserve any existing timeout configuration. In some cases, we may replace
infinite timeouts with Go defaults.
- scaleio: tcp connect/keepalive values changed from 0/15 to 30/30
- storageos: no change
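A minimal sketch of the wrapping idea (function and type names are illustrative, not the driver code): reuse the client's own DialContext when one is configured, and only substitute finite Go defaults when none is provided:

    import (
        "context"
        "net"
        "time"
    )

    type dialContextFunc func(ctx context.Context, network, addr string) (net.Conn, error)

    // wrapDialContext preserves an existing dialer (and its timeout settings);
    // otherwise it falls back to 30s connect/keepalive instead of an infinite timeout.
    func wrapDialContext(existing dialContextFunc) dialContextFunc {
        if existing != nil {
            return existing
        }
        d := &net.Dialer{Timeout: 30 * time.Second, KeepAlive: 30 * time.Second}
        return d.DialContext
    }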
When trying to build the s390x image, it would fail when running the apk
command with the following error:
ERROR: Unable to open root: Bad address
ERROR: Failed to open apk database: Bad address
This can be fixed by updating the third_party/multiarch/qemu-user-static/register/register.sh
and third_party/multiarch/qemu-user-static/register/qemu-binfmt-conf.sh scripts
and their usage to a newer version [1].
Additionally, the packages nginx-mod-http-lua and nginx-mod-http-lua-upstream
cannot be found in the regular http://dl-cdn.alpinelinux.org/alpine/v3.9/main/s390x/
repository, but we can use an older one [2].
[1] https://github.com/qemu/qemu/blob/master/scripts/qemu-binfmt-conf.sh
[2] http://dl-cdn.alpinelinux.org/alpine/v3.8/main
All images used by e2e tests must use templates in order to allow
relocation. In addition, this is hitting Docker Hub, which will soon
start throttling pulls.
microk8s runs the kubelet service as `snap.microk8s.daemon-kubelet.service` instead of `kubelet.service`,
so e2e should use `systemctl list-units *kubelet* --state=running` to find the kubelet service of microk8s.
And same for go_test_conditional_pure.
Instead of aliasing. Aliases are annoying in a number of ways. This is
specifically bugging me now because they make the action graph harder to
analyze programmatically. By using aliases here, we would need to handle
potentially aliased go_binary targets and dereference to the effective
target.
The comment references an issue with `pure = select(...)` which appears
to be resolved considering this now builds.
When updating ephemeral containers, convert Pod to EphemeralContainers
in storage validation. This resolves a bug where admission webhook
validation fails for ephemeral container updates because the webhook
client cannot perform the conversion.
Also enable the EphemeralContainers feature gate for the admission
control integration test, which would have caught this bug.
Even with foreground deletion, removal of the PVs that may have been
created for a pod with generic ephemeral volumes happens
asynchronously, in the worst case after the test has completed and the
driver for the volume got removed.
Perhaps this can be fixed in Kubernetes itself, but for now we need to
deal with it as part of the test.
The endpoints API handler was using the Canonicalize() method to
reorder the endpoints. However, due to differences from the
endpoints controller's RepackSubsets(), the controller was considering
the endpoints different even though they were not, generating unnecessary
updates every resync period.
In 1.19, we made it so that upgrading from client-side `kubectl apply`
to server-side `kubectl apply --server-side` does not conflict.
This means that `kubectl diff --server-side` should work the same:
server-side apply diffing a resource managed originally with client-side apply
should not result in conflicts.
Since the SCTP module verification tests were added, their result may be affected by
running the SCTPConnectivity tests. For this reason, the SCTPConnectivity tests are now marked as disruptive.
Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Creates a few agnhost-related functions that create agnhost
pods / containers for general purposes.
The following commits will refactor tests to use those functions.
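A minimal sketch of what such a helper might look like (the name and signature are illustrative, not necessarily what the commit adds):

    import (
        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        imageutils "k8s.io/kubernetes/test/utils/image"
    )

    // NewAgnhostPod builds a general-purpose agnhost pod; callers pass the agnhost
    // subcommand and its arguments.
    func NewAgnhostPod(ns, name string, args ...string) *v1.Pod {
        return &v1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: ns},
            Spec: v1.PodSpec{
                Containers: []v1.Container{{
                    Name:  "agnhost-container",
                    Image: imageutils.GetE2EImage(imageutils.Agnhost),
                    Args:  args,
                }},
            },
        }
    }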
Make pre-provisioned snapshots using a CSI driver by:
1. Taking a dynamic snapshot with a Retain deletion policy
2. Deleting the dynamic snapshot and content
3. Creating a pre-provisioned snapshot with the snapshotHandle
This commit adds a pre-provisioned test pattern; all snapshots made using the
create-snapshot resource become pre-provisioned snapshots. All existing test cases
now run again with pre-provisioned snapshots.
After PR https://github.com/kubernetes/kubernetes/pull/92555, a number of GCE PD default-fs tests are skipped. Here the test pattern has SnapshotType set because some provisioning tests use snapshots. But for drivers such as the in-tree GCE PD driver, the tests will be skipped because of the logic in skipUnsupportedTest: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/testsuites/base.go#L154
Since multiple drivers might test with the same pattern, I think we need to keep SnapshotType here.
This PR removes that part of the logic in skipUnsupportedTest. This should be OK because all snapshot tests check whether a driver has snapshot capability or not.
Currently, if a group is specified for an impersonated user,
'system:authenticated' is not added to the 'Groups' list inside the
request context.
This causes the priority and fairness match to fail. The catch-all flow
schema needs the user to be in the 'system:authenticated' or in the
'system:unauthenticated' group. An impersonated user with a specified
group is in neither.
As a general rule, if an impersonated user has passed authorization
checks, we should consider them authenticated.
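A minimal sketch of the intent (not the actual apiserver change): once the impersonation request has passed authorization, make sure the resulting user info carries the authenticated group so the catch-all flow schema can match:

    import "k8s.io/apiserver/pkg/authentication/user"

    // ensureAuthenticatedGroup appends "system:authenticated" to the impersonated
    // groups if it is not already present.
    func ensureAuthenticatedGroup(groups []string) []string {
        for _, g := range groups {
            if g == user.AllAuthenticated {
                return groups
            }
        }
        return append(groups, user.AllAuthenticated)
    }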
The test "should fail substituting values in a volume subpath with absolute path" creates a pod
with a variable expansion path which is set as an absolute path. However "/tmp" is not an absolute
on Windows, it has to be prefixed with the drive letter (C:\tmp). But C:\tmp does not typically
exist on Windows nodes, so we use C:\Users instead.
This test was trying to create an Endpoints resource that the Endpoints
controller would also attempt to create. This could result in a failure
if the Endpoints controller created the resource before the test did.