When kubelet starts a Pod that requires device resources, if the device
plug-in updates the device at the same time, it may cause kubelet to crash.
Signed-off-by: huyinhou <huyinhou@bytedance.com>
Device Plugins that wish to leverage the Topology Manager can send back a populated
TopologyInfo struct as part of the device registration, along with the device IDs
and the health of the device. TopologyInfo is converted to TopologyHints and
used by TopologyManager to find the optimal/desired resource allocation for a Pod.
If a plugin sends an empty but non-nil instance of TopologyInfo for a resource,
devicemanager passes it on as an empty instance of TopologyHint which is
currently interpreted as "Hint Provider has no possible NUMA affinities
for resource" which further means that pods requesting that resource will fail.
To not block device resources that pass TopologyInfo{Nodes:[]*NUMANode{}} from being
used, interprete that as nil set of hints and not a []TopologyHint{}.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.
As such, it would sometimes:
1) Generate a hint for a differnent NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error
This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This frunctionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.
This ensures that we have the most up-to-date state when generating
topology hints for a container. Without this, it's possible that some
resources will be seen as allocated, when they are actually free.
- As discussed in reviews and other public channels,
this abstraction is used to represent numa nodes, not sockets.
- There is nothing inherently related to sockets in this package anyway.