The KEP specifies that the controller will "mirror all labels from the
Endpoints resource and all endpoints and ports from the corresponding subset".
I'd missed that in my initial implementation; this commit fixes that.
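As a rough sketch of that rule (using the v1beta1 discovery API; the helper name and structure are hypothetical, not the controller's actual code):

```go
package mirror

import (
	corev1 "k8s.io/api/core/v1"
	discovery "k8s.io/api/discovery/v1beta1"
)

// mirrorSubset is a hypothetical helper showing the rule above: copy every
// label from the Endpoints object, and the addresses and ports from one of
// its subsets, onto the mirrored EndpointSlice.
func mirrorSubset(ep *corev1.Endpoints, subset corev1.EndpointSubset) *discovery.EndpointSlice {
	slice := &discovery.EndpointSlice{AddressType: discovery.AddressTypeIPv4}

	// Mirror all labels from the Endpoints resource.
	slice.Labels = make(map[string]string, len(ep.Labels))
	for k, v := range ep.Labels {
		slice.Labels[k] = v
	}

	// Mirror all endpoints from the corresponding subset.
	for _, addr := range subset.Addresses {
		slice.Endpoints = append(slice.Endpoints, discovery.Endpoint{
			Addresses: []string{addr.IP},
		})
	}

	// Mirror all ports from the corresponding subset.
	for i := range subset.Ports {
		p := subset.Ports[i] // copy, so the pointers below stay stable
		slice.Ports = append(slice.Ports, discovery.EndpointPort{
			Name:     &p.Name,
			Port:     &p.Port,
			Protocol: &p.Protocol,
		})
	}
	return slice
}
```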
- Move "ForgetPod" after "RunReservePluginsUnreserve", so that the cache would hold the pod to
avoid it's being retried simutaneously until Unreserve is completed.
- Move "assume" ahead of "RunReservePluginsReserve". This is based on the fact that "ForgetPod" is
the last step of failure path, so "assume" should be reversly treated as the first step. The
current failure path is like this:
assume -> reserve -> unreserve -> forgetPod -> recordingFailure
- Make subtests of TestReservePluginUnreserve stateless
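A minimal sketch of the intended ordering, with stand-in types and stubbed steps (the real equivalents live in kubernetes/pkg/scheduler):

```go
package scheduler

// Stand-in type and stubs for the scheduler steps named above.
type Pod struct{ Name string }

func assume(p *Pod)                             {}
func runReservePluginsReserve(p *Pod) error     { return nil }
func runReservePluginsUnreserve(p *Pod)         {}
func forgetPod(p *Pod)                          {}
func recordSchedulingFailure(p *Pod, err error) {}

// scheduleOne sketches the ordering above: "assume" is the first step, so on
// failure "forgetPod" must be the last one, keeping the pod in the cache
// until Unreserve has finished and preventing a concurrent retry.
func scheduleOne(pod *Pod) {
	assume(pod)
	if err := runReservePluginsReserve(pod); err != nil {
		runReservePluginsUnreserve(pod)
		forgetPod(pod)
		recordSchedulingFailure(pod, err)
	}
}
```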
When creating fake data for the nodeTree unit tests, 'append' was invoked
on the shared fake data set, so subsequent unit tests ran against
unexpected fake data.
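This is the classic Go slice-aliasing pitfall; a minimal sketch of the problem (illustrative, not the actual test code):

```go
package main

import "fmt"

// commonNodes stands in for the shared fake data set; the spare capacity is
// what makes append reuse the same backing array.
var commonNodes = make([]string, 0, 8)

func main() {
	// Two "subtests" each append their own node to the shared slice.
	a := append(commonNodes, "zone-a-node")
	b := append(commonNodes, "zone-b-node")

	// Both appends wrote into the same backing array, so the first
	// subtest's data was silently overwritten by the second's.
	fmt.Println(a[0], b[0]) // prints: zone-b-node zone-b-node

	// Fix: give each subtest its own copy of the fake data.
	fresh := append([]string(nil), commonNodes...)
	fresh = append(fresh, "zone-a-node")
	fmt.Println(fresh[0]) // prints: zone-a-node
}
```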
When nodes are added in multiple zones at once, the nodeTree next
function does not return a correct list of nodes but repeats some of them.
This commit resets the index before starting to call next() to
prevent this issue.
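A minimal stand-in for the iterator (field and method names are illustrative, not the real implementation):

```go
package nodetree

// nodeTree keeps nodes grouped by zone, with per-zone cursors.
type nodeTree struct {
	zones     []string
	tree      map[string][]string // zone -> node names
	zoneIndex int
	nodeIndex map[string]int
}

// reset rewinds the cursors. Calling this before a fresh round of next()
// calls is what keeps the returned list correct after nodes were added to
// several zones at once.
func (t *nodeTree) reset() {
	t.zoneIndex = 0
	for zone := range t.nodeIndex {
		t.nodeIndex[zone] = 0
	}
}

// next returns nodes one at a time, interleaving across zones; the second
// return value is false once every zone is exhausted.
func (t *nodeTree) next() (string, bool) {
	for range t.zones {
		zone := t.zones[t.zoneIndex]
		t.zoneIndex = (t.zoneIndex + 1) % len(t.zones)
		if i := t.nodeIndex[zone]; i < len(t.tree[zone]) {
			t.nodeIndex[zone] = i + 1
			return t.tree[zone][i], true
		}
	}
	return "", false
}
```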
Special thanks to igraecao for the help in finding the bug
Co-authored-by: igraecao <matvej.yolli@outlook.com>
Clamp cpu.shares to the maximum value allowed by the kernel.
This is not an issue when using cgroupfs, as the kernel makes sure the
value is in range and clamps it automatically, but systemd has an
additional check that prevents the cgroup from being created.
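A sketch of the clamping, using the kernel's documented bounds for cpu.shares (2 to 262144); the names here are illustrative:

```go
package cm

// The kernel accepts cpu.shares values in [2, 262144].
const (
	minShares uint64 = 2
	maxShares uint64 = 262144
)

// clampShares keeps a computed cpu.shares value inside the kernel's range,
// so systemd's stricter validation no longer rejects the cgroup.
func clampShares(shares uint64) uint64 {
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return shares
}
```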
Closes: https://github.com/kubernetes/kubernetes/issues/92855
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.
As such, it would sometimes:
1) Generate a hint for a different NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error.
This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This functionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to exercise it.
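As a simplified sketch of the change (stand-in names; the real code builds NUMA bitmask hints rather than plain node lists):

```go
package topology

// getHints is a simplified stand-in for TopologyHint generation. The fix is
// the "+ len(reusable[node])" term: devices already set aside for reuse now
// count toward each NUMA node's capacity.
func getHints(requested int, available, reusable map[int][]string) []int {
	var fittingNodes []int
	for node := range available {
		// Before the fix only the free pool was counted, so a node whose
		// capacity was mostly held as reusable devices looked too small.
		if len(available[node])+len(reusable[node]) >= requested {
			fittingNodes = append(fittingNodes, node)
		}
	}
	return fittingNodes
}
```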