The e2e test "should have Endpoints and EndpointSlices pointing to
the API Server Service" was verifying the current endpoints
reconciler implementation on the apiservers. However, users may
disable the endpoint reconciler and create their own.
This e2e test is also a conformance test, so we should test the
behaviour and not the implementation details. The test verifies
that a kubernetes.default service exists, and that an Endpoints object
and EndpointSlice objects referencing that service exist and are
equivalent.
The Container Images for Windows Server 2022 have been published, and we can
start adding jobs for them.
The ltsc2022-based images have been built and promoted with these image versions.
The PR https://github.com/kubernetes/kubernetes/pull/104575 introduces
some intermediate types which make the 32GiB memory machine kill the
typecheck process. To resolve that issue and make the test more robust,
we now reduce the number of typechecks run in parallel to `2`.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Prior to this change, the pod was not getting scheduled on the node as
we don't have a running scheduler in e2e_node. PodClient solves this
problem by manually assigning the pod to the node.
The current GPU installer was built in 2017, from source that no longer
exists in Kubernetes ([adding commit][1]). The image was built on 2017-06-13.
Unfortunately, this installer no longer appears to work. When debugging
on the same node type as used by test-infra, it failed to build the
driver as the kernel sha was no longer available.
This led to needing to find a new way to install the GPU drivers. The
smallest logical change was switching to [cos-gpu-installer][2].
There is a newer version of this available on [googlesource][3] that
I have not yet tested, as it's not clear what the state of the project
is and I couldn't find docs outside of the source itself.
We install things to the same location as previously to avoid needing
extra downstream changes. There are a couple of quirks here, however,
such as needing to run the container twice to correctly update the
LD cache.
[1]: 1e77594958/cluster/gce/gci/nvidia-gpus/Dockerfile
[2]: https://github.com/GoogleCloudPlatform/cos-gpu-installer
[3]: https://cos.googlesource.com/cos/tools/+/refs/heads/master/src/cmd/cos_gpu_installer/
Different CSI drivers have different error messages, making it difficult
to check them accurately. We remove the check for the error message and
only check the failure type instead, since that is all we need.
If a device plugin returns a device without topology, keep it internally
as NUMA node -1. This helps at the podresources level to not export NUMA
topology; otherwise topology is exported with NUMA node ID 0,
which is not accurate.
It's impossible to uncover this bug just by tracing json.Marshal(resp)
in the podresource client, because the NUMANode ID field has the json
property omitempty, so ID=0 is shown as an empty NUMANode.
To reproduce it, it is better to iterate over the devices and
trace dev.Topology.Nodes[0].ID.
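
The omitempty behaviour described above can be seen in a minimal sketch.
The `numaNode` type below is a hypothetical stand-in for the podresources
NUMANode message, not the real generated type; only the omitempty tag on
the ID field matters here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// numaNode is a hypothetical stand-in for the podresources NUMANode message.
type numaNode struct {
	ID int64 `json:"ID,omitempty"`
}

// marshalNode returns the JSON encoding of a node with the given ID.
func marshalNode(id int64) string {
	b, err := json.Marshal(numaNode{ID: id})
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(marshalNode(0)) // prints {} because ID 0 is omitted
	fmt.Println(marshalNode(1)) // prints {"ID":1}
}
```

Because a node with ID 0 and a node with no ID marshal to the same output,
reading the field directly is the only reliable way to tell them apart.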
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
The Container Images for Windows Server 2022 have been published, and
we can start building test images using them, so we can start adding
jobs for them.
The image versions for the e2e test images have been bumped in a previous
commit, but haven't been promoted yet. We don't need to bump them here.
httpd-2.4.46-win64-VC15.zip no longer exists, so we have to use
httpd-2.4.48-win64-VC15.zip instead.
Agnhost's serve-hostname at endpoint /hostname
returns the hostname. The pod's host node name may
be an FQDN, so a comparison between the two fails.
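
One possible way to make such a comparison tolerant of FQDNs is to compare
only the first DNS label of each name. This is a sketch under that
assumption and not necessarily what this change does; `shortName` and
`sameHost` are hypothetical helpers, and `strings.Cut` requires Go 1.18 or
newer.

```go
package main

import (
	"fmt"
	"strings"
)

// shortName returns the part of name before the first dot.
func shortName(name string) string {
	short, _, _ := strings.Cut(name, ".")
	return short
}

// sameHost reports whether a and b share the same first DNS label,
// ignoring case.
func sameHost(a, b string) bool {
	return strings.EqualFold(shortName(a), shortName(b))
}

func main() {
	fmt.Println(sameHost("node-1", "node-1.example.com")) // prints true
	fmt.Println(sameHost("node-1", "node-2.example.com")) // prints false
}
```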
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
The Container Images for Windows Server 2022 have been published, and
we can start building test images using them, so we can start adding
jobs for them.
The image versions for the e2e test images have been bumped in a previous
commit, but haven't been promoted yet. We don't need to bump them here.
We're starting with windows-servercore-cache and busybox images, since
they are needed for the other images the most.
A previous commit added LD_FLAGS for the go binary compilation, but it's
not defined for all images.
The pods using hostNetwork use the host network namespace, hence
they have to share it with the rest of the processes and pods.
If several pods try to bind to the same port, the test will fail,
so we try to use an uncommon port and run the different scenarios
in the same test. This way we only have to bind once and avoid
consuming extra ports, reducing the risk of port collisions.