The purpose of the pod created by `createBalancedPodForNodes()` is to ensure
that all nodes have equal resource requests (as seen by the scheduler). This
prevents the default scheduling behavior (which attempts to balance resource requests)
from interfering with e2e's which test other priorities/score plugins.
Because the scheduler only worries about requests, specifying `Limits` in this pod
is unnecessary. In fact, if the calculated "balancing" limit is too low, it can cause
the balancing pod to never start due to OOMKill errors, leading to flakes and failures.
To run the tests in a single node cluster, create two pods consuming 2/5 of the extended resource instead of one consuming 2/3.
The low priority pod will be consuming 2/5 of the extended resource instead so in case there's only a single node,
a high priority pod consuming 2/5 of the extended resource can be still scheduled. Thus, making sure only the low priority pod
gets preempted once the preemptor pod consuming 2/5 of the extended resource gets scheduled while keeping the high priority pod untouched.
This adds a call to createBalancedPods during the ubernetes_lite scheduling e2es,
which are prone to improper score balancing due to unbalanced utilization.
The test "validates that there is no conflict between pods with same
hostPort but different hostIP and protocol" was testing the scheduler
capability to schedule pods on the same node with hostPorts, however,
it wasn´t validating that the HostPorts was working, causing false
positives, because the pods were scheduled, but the HostPort exposed
wasn´t working.
In order to test the HostPort functionality, we have to use HostNetwork
pods, that are incompatible with Windows platforms. Also, since this
is touching both network and scheduling, there is no clear the ownership,
but sig-network is happy to adopt it.
We also add a new test for scheduling only under "scheduling", so Windows
folks can use it to test the scheduled in that platform.
The test is not cleaning all pods it created.
Memory balancing pods are deleted once the test namespace is.
Thus, leaving the pods running or in terminating state when a new test is run.
In case the next test is "[sig-scheduling] SchedulerPredicates [Serial] validates resource limits of pods that are allowed to run",
the test can fail.
The e2e test, included as part of Conformance,
"validates that there is no conflict between
pods with same hostPort but different hostIP and protocol"
was only testing that the pods were scheduled without conflict
but was never testing the functionality.
The test should check that pods with containers forwarding the same
hostPort can be scheduled without conflict, and that those exposed
HostPort are forwarding the ports to the corresponding pods.
the predicate tests were using loopback addresses for the the
hostPort test, however, those have different semantics depending
on the IP family, i.e. you can not bind to ::1 and ::2 simultanously,
in addition, IP forwarding from localhost to localhost in IPv6 is
not working since it doesn't have the kernel route_localnet hack.
* Rename const for topology.../zone
* Rename const for topology.../region
* Rename const for failure-domain.../zone
* Rename const for failure-domain.../region
* Restore old names for compat
A spreading test is more meaningful with a greater number of Pods. However, we cannot always expect perfect spreading. We accept a skew of 2 for 5*z Pods, where z is the number of zones.
Change-Id: Iab0de06a95974fbfec604f003b550f15db618ebd
Updates sig-scheduling e2e Nvidia GPU tests to install drivers using
local manifest by default. Currently the DaemonSet is fetched from the
GoogleCloudPlatform/container-enginer-accelerators repo by default.
Using a local manifest allows for manually specifying the image
cos-gpu-installer image rather than always using latest. A remote
manifest can still be fetched by setting
NVIDIA_DRIVER_INSTALLER_DAEMONSET env var.
Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Currently when checking for unscheduled pods an exception will be raised
if a pod is not scheduled and the status is unknown. This update modifies
the logic to include any pod without a NodeName in the not scheduled
pods returned.
Signed-off-by: hasheddan <georgedanielmangum@gmail.com>